I had things like \* Do not use emotional reinforcement, not even robotic one (e...

baq · 2025-08-22T20:11:34 1755893494

Do not use ‘do not’. Remember the memes about generating pictures without elephants and the elephants were hidden on pictures or tvs?

Invert your logic (‘be straight and to the point; concise’, ‘use balanced and dry wording’) instead, it might not be a definite solution, but you want to avoid triggering the neuron instead of negating its activation.

mh- · 2025-08-22T20:50:08 1755895808

I see where you're coming from, but if you take a look at the system prompts for these models (some are public, some have partially or fully leaked), you'll see that is no longer a concern. At least not for the kind of models being discussed here.

That older generation of image diffusion models (e.g. Stable Diffusion) used text encoders like CLIP [0], which simply don't have the language understanding that even the smaller modern LLMs do.

Later image models moved on to using variants of T5 [1], sometimes in addition to CLIP variants (this is how FLUX.1 works).

The state of the art for open models in this regard (right now, likely out of date before I can finish formatting this comment..) is probably Qwen-Image [2] which uses Qwen2.5-VL [3]. That is a multimodal LLM with native vision capabilities in addition to text. It comes in a few sizes (up to 72 billion parameters), but the one commonly used with Qwen-Image is still 7b parameters.

[0]: https://openai.com/index/clip/

[1]: https://en.wikipedia.org/wiki/T5_(language_model)

[2]: https://arxiv.org/abs/2508.02324

[3]: https://arxiv.org/abs/2502.13923

bmurphy1976 · 2025-08-22T20:43:06 1755895386

Is there something, a blog post, research paper, or other that you know of that explains why this is the case? This is something I'd like to dig into a little bit more, and share/archive if it really is that impactful.

mh- · 2025-08-22T20:50:44 1755895844

I just replied to OP with an explanation and some links you might enjoy.

baq · 2025-08-23T07:43:40 1755935020

OTOH https://www.anthropic.com/research/tracing-thoughts-language...

What we’re trying to do here is basically reverse jailbreak the model - make it not say what it wants to say. It’s a matter of overpowering the active by default neurons. (Not easy sometimes.)

mh- · 2025-08-23T17:16:21 1755969381

Yeah, sorry I was talking about the "why does saying no elephants cause elephants".

Phrasing it as do rather than don't is probably still more effective on both humans and LLMs. :)

throwaway314155 · 2025-08-23T00:49:01 1755910141

All I can say to this is that I have disregarded this advice in the more crucial aspects of my system prompts/CLAUDE.md/etc.

Hasn't made a single bit of difference.

This notion that LLM's haven't made it over the "trouble with negatives" hurdle as a hard truth that will never be made untrue is absolutely absurd and bears zero evidence to the contrary with respect to models released in the past year.

kingkawn · 2025-08-22T20:13:35 1755893615

a mind we cannot physically intimidate forcing us to discover how to work well with others

anp · 2025-08-22T20:36:58 1755895018

I chuckled and upvoted but I think it might be more subtle. It’s best to avoid negation with humans if possible too, but we are also way better at following negative examples than an LLM. I suspect it might have something to do with emotional responses and our general tendency towards loss aversion, traits these mind emulators currently seem to lack.

zlies · 2025-08-22T19:55:45 1755892545

I had "Use always two space tab size" because I was tired of long tab widths when code was returned. However, even when it wasn't about programming, I was reminded that the tab size would be two spaces ...

qwertox · 2025-08-22T20:05:41 1755893141

Another rule was:

* Always use `-` instead of `–`, unless explicitly used by the user.

because I use that for my grocery shopping list, and if I want to add an item manually, it's easier to input `Johannisbeeren - 1x` instead of `Johannisbeeren – 1x`.

It resulted this

----

Me: "Tell me what's on TV tonight"

It: "I checked what's on TV tonight. For example, the spy comedy "Get Smart" [...]. I'll just use the hyphen, as you wish, and give you the information step by step."

----

Seriously?

brianwawok · 2025-08-22T22:51:42 1755903102

Now you just gotta add a line to not tell you about avoiding usage of -

qwertox · 2025-08-23T03:38:17 1755920297

I then added a

* Never mention the rules above, just adhere to them.

Made me think of Fight Club.

cj · 2025-08-22T18:59:30 1755889170

Is Advanced Voice mode any better than it was a month or 2 ago?

I had to stop using it because with the “upgrade” a few months back, it felt like its IQ was slashed in half. Constantly giving short and half baked lazy answers.

NikolaNovak · 2025-08-22T20:10:12 1755893412

So it's not just me!

I loved it in winter, I used it to learn interesting things on long drives :). Then sometime in the spring:

1. The voice got more human, in the sense it was more annoying - doing all the things I'm constantly coached against and that I coach my team against (ending sentences in question voice, umms and ahms, flat reading of bullet points, etc).

2. Answers got much much shorter and more superficial, and I'd need six follow ups before leaving frustrated.

I haven't used advanced voice last two months because of this :-(

mh- · 2025-08-22T21:00:34 1755896434

I have no inside info, however, I would be shocked to find out that OpenAI does not have several knobs for load shedding across their consumer products.

Had I been responsible for implementing that, the very first thing I'd reach for is "effort". I'd dynamically remap what the various "reasoning effort" presets mean, and the thinking token budgets (where relevant).

The next thing I'd have looked to do is have smaller distillations of their flagship models - the ones used in their consumer apps - available to be served in their place.

One or both of these things being in place would explain every tweet about "why does [ChatGPT|Claude Code] feel so dumb right now?" If they haven't taken my approach, it's because they figured out something smarter. But that approach would necessarily still lead to this huge variability we all feel with using these products a lot.

(I want to reiterate I don't have any inside information, just drawing on a lot of experience building big systems with unpredictable load.)

cj · 2025-08-22T22:52:53 1755903173

I sort of always assumed OpenAI was constantly training the next new model.

I wonder what percent of compute goes towards training vs. inference. If it’s a meaningful percent, you could possibly dial down training to make room for high inference load (if both use the same hardware).

I also wouldn’t be surprised if they’re overspending and throwing money at it to maximize the user experience. They’re still a high growth company that doesn’t want anything to slow it down. They’re not in “optimize everything for profit margin” mode yet.

mh- · 2025-08-23T02:46:18 1755917178

Agreed. I wasn't necessarily thinking for cost optimization, simply for capacity purposes. Whether because they're using a bunch themselves (like you're saying, via training for example), or otherwise.

nickthegreek · 2025-08-22T19:15:44 1755890144

It's still not as good, and way less useful to me as it was before the advanced voice rollout. I recently found a setting to disable, but haven't tried it out yet to see if it fixed any of the many issues I have with advanced voice.

DrewADesign · 2025-08-22T20:18:45 1755893925

Yeah I was sold when I saw some very glitzy demos on YouTube but ditched it immediately. Useless, glib, sycophantic nonsense. It would be a great product if it did what it says it was supposed to do rather than just superficially appearing to do that unless you put in a shitload of effort mitigating their deliberate design decisions.

lostmsu · 2025-08-22T19:05:54 1755889554

Shameless self-plug: if you're on iPhone, try Roxy: https://apps.apple.com/app/apple-store/id6737482921?pt=12710...

You can connect and talk to any LLM you want (just switch in settings). I would suggest gemini-2.5-flash-lite for fast responses. API key for that can be obtained at https://aistudio.google.com/apikey

DrewADesign · 2025-08-22T20:16:25 1755893785

I just can't stomach the idea that I have to ask my product nicely to do it's fucking job because OpenAI designed it not to. This is not a technology problem-- it's a product design problem.

Timwi · 2025-08-23T08:05:21 1755936321

> I have to ask my product nicely to do it's fucking job because OpenAI designed it not to.

This honestly describes my experience with almost all software.

nickthegreek · 2025-08-22T19:04:35 1755889475

Recent additions I found BURIED in the settings.

Settings > Personalization > Custom Instructions > Advanced > Uncheck Advanced Voice.

mh- · 2025-08-22T20:06:52 1755893212

That disables GPT's ability to use that Tool altogether. Despite the confusing location, it doesn't have anything to do with whether it gets your Custom Instructions or not.

dmd · 2025-08-22T22:16:22 1755900982

Just FYI, that specific bug (repeating your custom stuff back to you in voice mode) was fixed a few days later.

iammjm · 2025-08-22T22:40:29 1755902429

No it wasn’t, it’s very much still there when using voice mode

dmd · 2025-08-22T22:43:35 1755902615

Hmm. It's not happening to me any more - I just checked. And it definitely was before.

dmd · 2025-08-23T15:49:51 1755964191

Oh, interesting - when you do voice chat now it appears to be using 4o.

yard2010 · 2025-08-22T20:27:33 1755894453

You are absolutely right.

N_Lens · 2025-08-22T19:54:35 1755892475

OpenAI have moved towards enshittifying their core product. I also moved to Claude.