The idea is to combine two small transformers rather than use one large model.

Would each be more specialised on its given task, and would the combination prove to be Turing-complete?