Gemini Pro initially refused (!) but it was quite simple to get a response: > gi...

b7894 · 2025-10-15T20:41:38 1760560898

Gemini 3.0 Pro (or what is presumed to be 3.0 Pro; you can get access to it via A/B testing on AI Studio) does a noticeably better job:

https://x.com/cannn064/status/1972349985405681686

https://x.com/whylifeis4/status/1974205929110311134

https://x.com/cannn064/status/1976157886175645875

rozab · 2025-10-16T13:12:23 1760620343

It was Google that featured a bicycling pelican in a presentation a few months back:

https://simonwillison.net/2025/Jun/6/six-months-in-llms/#ai-...

So I think the benchmark can be considered dead as far as Gemini goes.

fellowmartian · 2025-10-15T23:37:59 1760571479

There’s obviously no improvement on this metric, and there hasn’t been in a while.

jiggawatts · 2025-10-16T00:55:14 1760576114

How do people trigger A/B testing?

simonw · 2025-10-16T03:30:03 1760585403

As far as I can tell they just keep on hammering the same prompt in https://aistudio.google.com/ until they get lucky and the A/B test triggers for them on one of those prompts.

qingcharles · 2025-10-16T02:09:31 1760580571

That 2nd one is wild.

Ugh. I hate this hype train. I'll be foaming at the mouth with excitement for the first couple of days until the shine is off.

hnuser123456 · 2025-10-15T20:24:08 1760559848

"create svg code that will create an image of svg code that will create a pelican riding a bicycle"

https://chatgpt.com/share/68f0028b-eb28-800a-858c-d8e1c811b6...

(can be rendered using simon's page at your link)

ru552 · 2025-10-15T19:05:22 1760555122

I like this workflow

actionfromafar · 2025-10-15T21:43:15 1760564595

What is dada?