It's a fitted Bradley-Terry model, rescaled to familiar Elo-style scores and anchored so that Mixtral-8x7B sits at 1114 (at least the last time I looked at it). When you fit the model against historical data and then refit after adding another month of data that includes newer models, the relative strength of a given model might decline even if its absolute ability stayed fixed.
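To make that concrete, here's a rough sketch of the idea, not the leaderboard's actual code. It fits Bradley-Terry strengths with the standard minorize-maximize iteration, maps them to an Elo-like scale (I'm assuming the usual 400-points-per-10x-odds convention), and shifts everything so the anchor lands on 1114. The function names, the `model-a` / `model-new` names, and the battle counts are all made up for illustration.

```python
import math
from collections import defaultdict

def fit_bradley_terry(battles, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic MM update. Assumes every model has at least one
    win and one loss; otherwise its strength diverges to 0 or infinity.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)  # games played per unordered pair
    models = set()
    for winner, loser in battles:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            denom = sum(
                n / (p[i] + p[j])
                for pair, n in pair_counts.items() if i in pair
                for j in pair - {i}
            )
            new_p[i] = wins[i] / denom
        # Strengths are only defined up to a constant factor, so renormalize.
        total = sum(new_p.values())
        p = {m: v * len(models) / total for m, v in new_p.items()}
    return p

def to_elo(p, anchor, anchor_rating=1114.0, scale=400.0):
    """Map strengths to an Elo-like scale, pinning `anchor` at `anchor_rating`."""
    offset = anchor_rating - scale * math.log10(p[anchor])
    return {m: scale * math.log10(v) + offset for m, v in p.items()}

# Hypothetical battle logs: an "old" period, then an extra month with a new model.
old = [("model-a", "mixtral-8x7b")] * 6 + [("mixtral-8x7b", "model-a")] * 4
new = (
    [("model-new", "model-a")] * 8 + [("model-a", "model-new")] * 2
    + [("model-new", "mixtral-8x7b")] * 7 + [("mixtral-8x7b", "model-new")] * 3
)

before = to_elo(fit_bradley_terry(old), anchor="mixtral-8x7b")
after = to_elo(fit_bradley_terry(old + new), anchor="mixtral-8x7b")
for name in sorted(after):
    print(name, round(before.get(name, float("nan")), 1), round(after[name], 1))
```

The point is that every rating comes out of one joint fit, so refitting on a larger battle log can move any non-anchor model's number (and its rank) even though none of its old games changed. Only the anchor is pinned by construction.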