I can think of a few ways you could make a good attempt at it:

- A generator/discriminator relationship: a model whose purpose is to evaluate model outputs for things other than just toxicity (basically RLHF for capabilities), then use that signal to train.
- Code generation tasks where the code is actually executed, with a success/failure signal sent based on how it performs.
- Logic-puzzle generator/solver pairs, with a separate system evaluating answer correctness against a human-labelled baseline, or maybe a model built to translate natural-language logic puzzles into API calls to a formal logic solver and compare the answers.
- A simple program that turns randomly generated mathematical problems into word problems, an AI that adds extraneous detail while preserving the core problem description, then a second AI to extract the final answer or conclusion from the response and compare it to what a calculator says for the original maths problem.
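The execution-based reward idea can be sketched in a few lines. This is a minimal illustration, not a real training setup: `candidate_code` stands in for a model's output, the `solve` entry-point name and the test-case format are assumptions, and in practice you'd sandbox the execution rather than call `exec()` directly.

```python
def execution_reward(candidate_code: str, test_cases) -> float:
    """Run candidate code and score it by the fraction of test cases passed."""
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the function under test
        fn = namespace["solve"]          # assumed entry-point name
    except Exception:
        return 0.0  # unrunnable code gets zero reward
    passed = 0
    for args, expected in test_cases:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # runtime failure on this case: no credit
    return passed / len(test_cases)

# Example: a (pretend) model output that adds two numbers.
candidate = "def solve(a, b):\n    return a + b"
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(execution_reward(candidate, tests))  # → 1.0
```

The reward here is just pass-rate; a real system would also have to worry about timeouts, resource limits, and models gaming the test cases.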
All of those have problems and would be very compute-expensive, and there's a limitation I struggle to see a way around: if you're using a model to train another model, you maybe can't get better than that model. But I think we could build architectures that provide large labelled training datasets to LLMs for any problem we can solve deterministically with traditional computing methods, like maths and some logic puzzles. Maybe with those datasets we could make LLMs able to do maths and difficult logic problems natively, and maybe the internal machinery they develop to do that would help them in other areas. Would be a fun research project.
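The dataset-generation part is the easy bit, since the labels come for free from ordinary computation. A minimal sketch, assuming simple arithmetic and hand-written templates (a real pipeline would use an LLM for the "add extraneous detail" step described above):

```python
import random

# Hypothetical templates mapping an operator to a word-problem phrasing.
TEMPLATES = {
    "+": "Sam has {a} apples and buys {b} more. How many apples does Sam have?",
    "-": "Sam has {a} apples and gives away {b}. How many apples remain?",
    "*": "There are {a} boxes with {b} apples each. How many apples in total?",
}

def make_example(rng: random.Random) -> dict:
    """Generate one word problem with a deterministically computed label."""
    op = rng.choice(sorted(TEMPLATES))
    a, b = rng.randint(2, 99), rng.randint(1, 9)
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]  # the "calculator"
    return {"question": TEMPLATES[op].format(a=a, b=b), "answer": answer}

rng = random.Random(0)  # seeded for reproducibility
dataset = [make_example(rng) for _ in range(3)]
for ex in dataset:
    print(ex["question"], "->", ex["answer"])
```

Because the label is computed, not annotated, the dataset can be made arbitrarily large, and the same comparison logic doubles as the grader when you check an LLM's extracted answer against the ground truth.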