On most of their tests GPT-4 is not actually worse [1]. In particular, the coding results changed because of a different output format rather than worse abilities [2]. But that's OK, because the message of the paper is that there is strong drift between versions and developers should be aware of it, not that GPT is getting worse [3].
[1] https://www.aisnakeoil.com/p/is-gpt-4-getting-worse-over-tim...
[2] https://twitter.com/Si_Boehm/status/1681801371656536068
[3] https://twitter.com/matei_zaharia/status/1681805357516210177
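
To make the output-format point concrete, here is a rough sketch of how it can play out. It assumes, as I understand the discussion in [1] and [2], that the newer version wraps its code in a markdown fence and that the benchmark checks whether the raw output runs as-is. The helper names (strip_markdown_fence, is_directly_executable) and the sample strings are mine and purely illustrative, not the paper's actual harness.

```python
import re

# Build the fence marker programmatically so this snippet can itself be quoted.
FENCE = "`" * 3


def strip_markdown_fence(text: str) -> str:
    """Return the body of the first fenced block, or the text unchanged."""
    pattern = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(text)
    return match.group(1) if match else text


def is_directly_executable(text: str) -> bool:
    """Hypothetical stand-in for a 'does the raw output parse as Python' check."""
    try:
        compile(text, "<model-output>", "exec")
        return True
    except SyntaxError:
        return False


# Made-up model outputs: same code, once bare and once wrapped in a fence.
bare = "def add(a, b):\n    return a + b\n"
fenced = FENCE + "python\n" + bare + FENCE + "\n"

print(is_directly_executable(bare))
print(is_directly_executable(fenced))
print(is_directly_executable(strip_markdown_fence(fenced)))
```

The code is identical in both cases. Only the wrapper decides whether the raw-output check passes, which is why a drop on that metric alone says little about coding ability.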