That (thankfully) can't compound, so it would never be more than a one-time offset. E.g. if you report a score of 60% on SWE-bench Verified for new model A, then dumb A down until it scores 50%, and later report a 20% improvement over A with new model B, it's pretty obvious something is off when your last two model blogposts both say 60%.
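To make the arithmetic concrete, a minimal sketch (the model names and scores are made up, and I'm assuming the claimed improvement is relative to the dumbed-down score):

    # Release 1: model A ships; its blogpost reports the real score.
    a_reported = 0.60               # A's blogpost: 60% on SWE-bench Verified

    # Later, A is quietly dumbed down and now scores lower.
    a_degraded = 0.50

    # Release 2: model B is claimed to be 20% better than (degraded) A.
    b_reported = a_degraded * 1.20  # = 0.60

    # Both blogposts now report ~60%, so the trick only buys a one-time
    # offset: anyone comparing the two posts sees no net improvement.
    print(a_reported, round(b_reported, 2))  # 0.6 0.6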

The only way around this is to never report the same benchmark version twice, and they include far too many benchmarks to realistically swap them all out every release.



The benchmarks are not typically ongoing; we do not often see comparisons between week 1 and week 8. Sprinkle in a bit of training on the benchmarks and you can ensure higher scores for the next model. A perfect scam loop to keep people happy until they wise up.


> The benchmarks are not typically ongoing, we do not often see comparisons between week 1 and week 8

You don't need to compare "A (Week 1)" to "A (Week 8)" to show that "B (Week 1)" is genuinely x% better than "A (Week 1)".


As I said, sprinkle a bit of benchmark data into the training and you have your loop. Each iteration will be better at the benchmarks if that's the goal, and that goal/context reinforces itself.


Sprinkling in benchmark training isn't a loop; it's just plain cheating. Regardless, not all of these benchmarks are public, and even with mass collusion across the board, it wouldn't make sense that only open-weight LLMs have been improving.



