Ask HN: Are you using a GPT to prompt-engineer another GPT?
59 points by simonebrunozzi on Jan 29, 2024 | 53 comments
I tried combinations of GPT-3.5, GPT-4, and Bard. The results are interesting.

It made me think that the obvious way to learn prompt engineering is… to not learn it, but to use another LLM to do that for you.

Any experience with this? Happy? Unhappy?



I’ve commented this before, and surely it’s something I’m doing wrong, but I cannot believe that system prompts, GPTs, or any amount of instructing actually gets ChatGPT to respond in a certain fashion with any consistency for anyone.

I have spent hours and hours and hours and hours trying to get ChatGPT to be a little less apologetic and long-winded, to stop reiterating, and to not interpret questions about its responses as challenges (e.g., when I say “what does this line do?” ChatGPT responds “you’re right, there’s another way to do it…”).

Nothing and I mean NOTHING will get ChatGPT with GPT-4 to behave consistently. And it gets worse every day. It’s like a twisted version of a genie misinterpreting a wish. I don’t know if I’ve poisoned my ChatGPT or if I’m being A/B tested to death but every time I use ChatGPT I very seriously consider unsubscribing. The only reasons I don’t are 1) I had an insanely impressive experience with GPT-3, and 2) Google is similarly rapidly decreasing in usefulness.


1. Use the API.

2. Use function calling, with detailed parameters and well-named output variables describing the format of the output you want.

You'll get much, much, much better results.
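
A minimal sketch of what that looks like with the v1 OpenAI Python SDK; the function name "explain_line" and its schema are made up for illustration:

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4-1106-preview",  # a model that supports the tools API
        messages=[{"role": "user", "content": "What does `x ??= y` do?"}],
        tools=[{
            "type": "function",
            "function": {
                # Hypothetical function; the point is that the parameter
                # descriptions pin down exactly the output you want.
                "name": "explain_line",
                "description": "Return a terse explanation of one line of code.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "explanation": {
                            "type": "string",
                            "description": "1-2 sentences. No apologies, no filler.",
                        },
                    },
                    "required": ["explanation"],
                },
            },
        }],
        # Forcing the call guarantees structured arguments back.
        tool_choice={"type": "function", "function": {"name": "explain_line"}},
    )

    args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
    print(args["explanation"])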


My evergreen system prompt prefix:

"You are a maximally terse assistant with minimal affect."

Works well for the most part.


Another issue I have, especially when demanding terseness, is that it tends to bail out of writing long code snippets with ellipsis comments like "// And more of the same here", which sometimes defeats the purpose. Unless the code is purely illustrative of a concept, I want it to be thorough and code the damn thing to the last semicolon.

My solution, which works sometimes, is to instruct it to "not write comments in the code." The drawback is losing ChatGPT's comments, which it normally does a good job of adding, but that's something I can live without.

This "code-trimming" effect does not show up for me in API requests.


I second "terse". Damn useful word.

"No moral lectures. No need to mention your knowledge cutoff. No need to disclose you're an AI. Be detailed and complete, but terse."

Gonna try rewriting in the second person, based on your prompt.

I often feel like I'm trying to undo the damage done by OpenAI though. The API doesn't seem to need this crap.


OpenAI really should fix this. I've started using Bard and brevity comes out of the box. When I used ChatGPT I always had this background feeling of irritation at the ridiculously verbose responses.


Using JSON mode with the GPT 3.5/4 API works well for us. So much so that we have to intentionally fake errors to test that our retries/fallbacks actually work in our code.
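
For reference, a sketch of JSON mode with the v1 Python SDK; the keys here are made up, and note that the API requires the word "JSON" to appear somewhere in the messages or it rejects the request:

    import json
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",  # a model that supports response_format
        response_format={"type": "json_object"},
        messages=[
            # The keys are hypothetical; describe whatever schema you need.
            {"role": "system", "content": "Reply in JSON with keys 'summary' and 'tags'."},
            {"role": "user", "content": "Summarize: one LLM can refine another's prompts."},
        ],
    )

    data = json.loads(resp.choices[0].message.content)  # valid JSON unless truncated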


Have you compared this to chatgpt plus?


ChatGPT Plus does not expose JSON mode in its web UI. You have to use the API via OpenAI (or Azure OpenAI).


I would assume a lot of that has to do with whatever obsequious nonsense they've got in the RLHF 'safety' training, and you're not getting rid of that without pushing it into a totally different context via DAN-like 'jailbreaks'.


It wasn't always like this. GPT in early 2023, hell, late 2022, was incredible. I could have it fully simulating a Unix terminal on acid for hours; it'd never break character. It's so insanely nerfed now.


It's insanely good every time they have a public release, then deteriorates significantly. There's plenty of evidence for this too: just compare the exact same prompt then and now. Not sure if this is a matter of cost or just playing whack-a-mole with unintended behavioral bugs.


I have a similar experience.

Only asking for things I expect it to be able to find online helps a lot, though.

The moment I try to be innovative or mix two ideas into something new or novel, it falls to pieces in the most frustrating way.


I have this as my custom prefix for when talking to gpt:

Cut unnecessary words, choose those that remain from the bedrock vocabulary everyone knows and keep syntax simple. Opt for being brusque but effective instead of sympathetic. Value brevity. Bullet points, headings and formatting for emphasis are good.


Unfortunately, in my experience, as the chat session advances it seems to forget these instructions and become its old apologetic self again.


Do you thumbs down the bad responses?


Religiously.


I wonder if people only ever click the thumbs-down button, so the mechanism provides only a negative signal, with no way to differentiate a good response from one that simply went unrated.


Anecdata: I’ve several times clicked “regenerate” and then thumbed-up the new response when nudged.


It's certainly better than not doing it, but I wonder how much that helps?

I mean, there's no control sample. It's a single custom-generated response read by a single person. I'd like to know how they derive useful insights from those votes.


The hack is using GPT-3, and I don't mean 3.5. It still performs to a production level, at least for creative work. It's been sped up and is significantly cheaper.


I'm not sure about the web-based service, but with the API this is easily achievable by tinkering with the system message.
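
E.g., a minimal sketch using the terse prefix from upthread as the system message:

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a maximally terse assistant with minimal affect."},
            {"role": "user", "content": "What does `git rebase -i` do?"},
        ],
    )
    print(resp.choices[0].message.content)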


Share some chats. It will be instructive for others and maybe somebody has a solution.


A fun bug is that ChatGPT will always use an emoji when apologizing. So if you ask it not to use emojis in a chat and it uses one anyway (which it often does in the very message promising not to), and you point it out, it falls into a loop of apologies and self-critique that devolves into modeling an existential crisis.


That's interesting. I've seen a lot of apologies from ChatGPT-4 and I don't think I've ever seen an emoji.

I've never asked it not to, either.


This isn't even remotely true. I've never once seen an emoji from it in over a year of daily use.


Yea why did they build in the wishy washy wokeness … sigh. Very difficult to get succinct answers from it.


You should check out https://x.com/lateinteraction's DSPy — which is like an optimizer for prompts — https://github.com/stanfordnlp/dspy


Yes, I've had great success with this in a few cases.

There's the "create a Stable Diffusion prompt with all the line noise of 'unreal engine 4K high quality award winning photorealistic'" stuff, which is pretty obvious.

Less obvious is using it to refine system prompts for the "create your own GPTs" thing. I used this approach for my "Chat with Marcus Aurelius, Emperor of Rome and Stoic philosopher"[1] and "New Testament Bible chat"[2]

I'm particularly happy with how well the Marcus Aurelius one works, eg: https://chat.openai.com/share/27323fe8-56e2-4620-8e4a-3ebf69...

For both of these I started with a rough prompt and then asked GPT-4 to refine it.

I found the key was to read the generated prompt very carefully and make sure it actually asks for what you want (a rough sketch of the refine-then-review step is below the links).

More recently I've been using the same technique for some more complicated use-cases: creating a prompt for GPT-4 to rank answers and creating prompts for Mistral-7B. The same basic approach works well for both of these.

[1] https://chat.openai.com/g/g-qAICXF1nN-marcus-aurelius-empero...

[2] https://chat.openai.com/g/g-CBLrOOGjA-official-new-testament...
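
For anyone who wants to try this via the API, here's roughly what that step looks like; the meta-prompt wording and the starting prompt are my own invention, not necessarily what was used for the GPTs above:

    from openai import OpenAI

    client = OpenAI()

    rough = "You are Marcus Aurelius. Answer as a Stoic philosopher."  # made-up starting point

    meta = (
        "Improve this system prompt for a character-roleplay GPT. Keep it "
        "under 150 words, make the persona specific, and state what the "
        "assistant must never do:\n\n" + rough
    )

    refined = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": meta}],
    ).choices[0].message.content

    # Per the comment above: read `refined` carefully before deploying it,
    # to make sure it still asks for what you actually want.
    print(refined)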


ChatGPT also uses the first approach for image generation. It even shipped before direct access to DALL-E 3 did.


Yes. I deploy prompts professionally for work and I almost always iterate with ChatGPT.

It requires a bit of back and forth, but you can get great results. It lets you iterate at a higher level instead of word for word.

I also find that the prompts work better. Prompt engineering is often about finding magic words and sentences, phrases dense with keywords from the training data, and another LLM is going to be good at finding those because it knows those phrases best.

Here’s an example dialogue I was using recently to iterate on a set of prompts for generating synthetic training data for LLM training. (Inspired by phi-2)

https://chat.openai.com/share/51dd634b-7743-4b5e-9c3f-3d57c6...


Sounds a lot like what I do when choosing words to Google for things.


On a related note, with the (tens of) thousands of "custom GPTs" coming in the next few years, it would be interesting if the chat automatically recommended one of them in response to a particular query. In a way, it would be directing you to a better-engineered, human-made pre-prompt.


The GPT Store kind of has this already: tell it what you want and it will give you suggestions for GPTs.


We recently open-sourced an agent framework [1] for automating data processing and labeling, in which the agent's prompt is refined through iterations with the environment: an LLM is asked to revise the prompt according to its performance (i.e., automatic prompt tuning). We tested it on the math reasoning dataset GSM8K and were able to improve the baseline GPT-4 accuracy from 45% to 74% using 25 labeled examples (notebook and blog post linked below [2][3]). The results are definitely very interesting, if not surprising with some skills, and we see more and more of our open source users and customers showing interest in the framework for automating labeling or having it as a copilot. A generic sketch of the tuning loop follows the links.

[1] https://github.com/HumanSignal/Adala

[2] https://github.com/HumanSignal/Adala/blob/master/examples/gs...

[3] https://labelstud.io/blog/mastering-math-reasoning-with-adal...
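
A generic sketch of that tuning loop; this is not Adala's actual API, just the shape of the idea, and the model names and prompt wording are placeholders:

    from openai import OpenAI

    client = OpenAI()

    def evaluate(prompt, labeled_examples):
        """Score the prompt on (question, answer) pairs; collect failures."""
        failures, correct = [], 0
        for question, answer in labeled_examples:
            out = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "system", "content": prompt},
                          {"role": "user", "content": question}],
            ).choices[0].message.content
            if answer in out:
                correct += 1
            else:
                failures.append((question, answer, out))
        return correct / len(labeled_examples), failures

    def tune(prompt, labeled_examples, rounds=3):
        for _ in range(rounds):
            score, failures = evaluate(prompt, labeled_examples)
            if not failures:
                break
            # Ask an LLM to revise the prompt based on observed failures.
            critique = (
                f"This prompt scored {score:.0%}:\n{prompt}\n\n"
                f"Example failures: {failures[:3]}\n"
                "Rewrite the prompt to fix them. Reply with the new prompt only."
            )
            prompt = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": critique}],
            ).choices[0].message.content
        return prompt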


Check out Magic Prompts: https://magicprompts.lyzr.ai/


Yes, I just used GPT-4 to create a prompt for GPT-3.5-Turbo based on some loose rules that I laid out. It helped me fill in the gaps and write it in a concise format.

The prompt gave much, much better results than the one I wrote.
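
Roughly this flow, sketched with the Python SDK (the rules text is an invented stand-in):

    from openai import OpenAI

    client = OpenAI()

    rules = "Summarize support tickets. Neutral tone. Max 3 bullets. Flag refund requests."

    # GPT-4 turns loose rules into a concise system prompt, filling gaps.
    prompt = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
            "Turn these loose rules into a concise system prompt for a "
            "smaller model, filling in any gaps:\n" + rules}],
    ).choices[0].message.content

    # The cheaper model then runs under the generated prompt.
    ticket = "The widget arrived broken and the customer wants a refund."
    answer = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": prompt},
                  {"role": "user", "content": ticket}],
    ).choices[0].message.content
    print(answer)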


My PoV is that it's an open question whether this is a fruitful approach. If you search for "meta-prompting" you'll find some discussions/papers on the topic.


You may be interested in a recent AI safety paper by Redwood Research.

In it, they have GPT-4 generate solutions to coding problems, but instruct it to insert backdoors into the solutions some fraction of the time. Then, they explore different ways to use a weaker model (GPT-3.5) to detect these backdoors. Pretty interesting.

[1] https://arxiv.org/abs/2312.06942


My experience (n=1) is that current LLMs are just not good at prompting either themselves or other LLMs, and that if you have enough information to make a meaningful meta-prompt, you also have enough information to make a regular ol' prompt. I just don't think it's something the designers of current LLMs are prioritizing, so they're not very good at it.


I wrote a paper about using big "LLMs" as art directors for the little LLMs within Stable Diffusion: https://arxiv.org/pdf/2311.03716v1.pdf


I have a basically unsubstantiated intuition that there is some analog of the recursion theorem for LLMs, if the theorem isn't itself applicable. If so, it should be mathematically impossible to prevent prompt "hacking."


I'm tempted to build a tool that uses DAGs to orchestrate sequential prompt engineering, but typing that out makes me feel dirty.
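
Something like this, maybe (entirely hypothetical; real nodes would call an LLM instead of these stubs):

    from graphlib import TopologicalSorter  # stdlib, Python 3.9+

    # Each node is a prompt-engineering step; edges feed outputs forward.
    def draft(_):      return "rough prompt v0"
    def critique(ins): return "critique of: " + ins["draft"]
    def revise(ins):   return "revision addressing: " + ins["critique"]

    steps = {"draft": draft, "critique": critique, "revise": revise}
    dag = {"draft": set(), "critique": {"draft"}, "revise": {"critique"}}  # node -> predecessors

    results = {}
    for node in TopologicalSorter(dag).static_order():
        results[node] = steps[node]({dep: results[dep] for dep in dag[node]})
    print(results["revise"])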


You should check out langchain


The self-iterating prompt engineering is a different workflow from normal RAG/tool selection.


Do you mean that the prompts would be completely dynamic in your tool vs the prebuilt prompts with templating that Langchain uses?


Dynamic, not templated. Or maybe templated at the start of the workflow, but obviously the LLMs will have their own ways to adjust it.


I would be interested to see if something like this actually produces good results. My (limited) experiences with Langchain made me feel like it was too constraining for the use cases I tried.


Why? It’s a good idea…


Like a sibling comment said, Langchain is basically this, but with a bit of structure around the prompt templating.


Yes. It's hard. Prompting is hard. Prompting to prompt is hard.


GPT 3.5 will happily write jokes about Jesus of Nazareth but will adamantly refuse to write jokes about the Prophet Mohammad. I can't see why people cannot recognize this technology as a complete abomination that will gravely impact society for the negative. Total and complete political correctness that never wavers and never relents.


ChatGPT already does that to generate images for DALL-E 3.




