That website doesn't load for me, but anyone who uses ChatGPT semi-regularly can see that it's getting steadily worse whenever you ask for anything that borders on the risqué. It has even refused to provide me with things like bolt torque specs, citing risk.
It could be a bias; that's why we do blinded comparisons for a more accurate rating. If my opinion counts for anything, as someone who uses it often: no, it hasn't gotten worse over time.
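For what it's worth, the core idea is simple enough to sketch. Something like the following (hypothetical names, not their actual code) randomizes which model's answer appears first, so the rater can't favor the version they already expect to be worse:

```python
import random

def blind_compare(prompt, response_a, response_b, get_rating):
    """Show two responses in random order with no model labels,
    and record which one the rater prefers ("A" or "B")."""
    pair = [("A", response_a), ("B", response_b)]
    random.shuffle(pair)  # rater can't tell which model produced which
    (first_label, first), (second_label, second) = pair
    choice = get_rating(prompt, first, second)  # rater returns 1 or 2
    return first_label if choice == 1 else second_label

def tally(winners):
    """Count wins per model across many blinded trials."""
    wins = {"A": 0, "B": 0}
    for w in winners:
        wins[w] += 1
    return wins
```

Run enough trials like that across varied prompts and you get a preference rate that doesn't depend on anyone's memory of how the model "used to be."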
Well, I can't load that website, so I can't assess their methodology. But I'm telling you it is objectively worse for me now, and many others report the same.
Edit: the website finally loaded for me, and while their methodology is listed, the actual prompts they use are not. The only example prompt is "correct grammar: I are happy", which does nothing at all to assess what we're talking about: ChatGPT's inability to deal with subjects deemed "risky" (where "risky" is defined as "Americans think it's icky to talk about").
"Worse" is really subjective. More limited functionality on a specific set of topics? Sure. More difficult to trick into getting around said topic bans? Sure.
Worse overall? You can use ChatGPT 4 and 3.5 side by side and see an obvious difference.
Your specific example seems fairly reasonable, though. Is there liability in saying bolt x can handle torque y if that turns out not to be true? I don't know. What if that bolt causes an accident and someone dies? I'm sure a lawyer could argue that case if ChatGPT gave a bad answer.