Ask HN: Are you using a GPT to prompt-engineer another GPT?
59 points by simonebrunozzi on Jan 29, 2024 | 53 comments
I tried combinations of GPT-3.5, GPT-4, and Bard. The results are interesting.

It made me think that the obvious way to learn prompt engineering is… to not learn it, but to use another LLM to do that for you.

Any experience with this? Happy? Unhappy?



I’ve commented this before, and surely it’s something I’m doing wrong, but I cannot believe that system prompts, GPTs, or any amount of instructing actually gets ChatGPT to respond in a certain fashion with any consistency for anyone.

I have spent hours and hours and hours and hours trying to get ChatGPT to be a little less apologetic and long-winded, to stop reiterating, and to not interpret questions about its responses as challenges (e.g., when I say “what does this line do?” ChatGPT responds “you’re right, there’s another way to do it…”).

Nothing and I mean NOTHING will get ChatGPT with GPT-4 to behave consistently. And it gets worse every day. It’s like a twisted version of a genie misinterpreting a wish. I don’t know if I’ve poisoned my ChatGPT or if I’m being A/B tested to death but every time I use ChatGPT I very seriously consider unsubscribing. The only reasons I don’t are 1) I had an insanely impressive experience with GPT-3, and 2) Google is similarly rapidly decreasing in usefulness.


1. Use the API.

2. Use function calling, with detailed parameters and well-named output variables describing the format of the output you want.

You'll get much, much, much better results.
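
A minimal sketch of what that looks like with the v1 OpenAI Python SDK; the function name "explain_line" and its schema are made up for illustration:

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4-1106-preview",  # a model that supports the tools API
        messages=[{"role": "user", "content": "What does `x ??= y` do?"}],
        tools=[{
            "type": "function",
            "function": {
                # Hypothetical function; the point is that the parameter
                # descriptions pin down exactly the output you want.
                "name": "explain_line",
                "description": "Return a terse explanation of one line of code.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "explanation": {
                            "type": "string",
                            "description": "1-2 sentences. No apologies, no filler.",
                        },
                    },
                    "required": ["explanation"],
                },
            },
        }],
        # Forcing the call guarantees structured arguments back.
        tool_choice={"type": "function", "function": {"name": "explain_line"}},
    )

    args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
    print(args["explanation"])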


My evergreen system prompt prefix:

"You are a maximally terse assistant with minimal affect."

Works well for the most part.


Another issue I have, especially when demanding terseness, is that it tends to bail out of writing long code snippets with ellipsis comments like "// And more of the same here", which sometimes defeats the purpose. Unless the code is purely illustrative of a concept, I want it to be thorough and code the damn thing to the last semicolon.

My solution, which works sometimes, is to instruct it to "not write comments in the code." The drawback is losing ChatGPT's comments, which it normally does a good job of adding, but that's something I can live without.

This "code-trimming" effect does not show up for me in API requests.


I second "terse". Damn useful word.

"No moral lectures. No need to mention your knowledge cutoff. No need to disclose you're an AI. Be detailed and complete, but terse."

Gonna try rewriting in the second person, based on your prompt.

I often feel like I'm trying to undo the damage done by OpenAI though. The API doesn't seem to need this crap.


OpenAI really should fix this. I've started using Bard and brevity comes out of the box. When I used ChatGPT I always had this background feeling of irritation at the ridiculously verbose responses.


Using JSON mode with the GPT 3.5/4 API works well for us. So much so that we have to intentionally fake errors to test that our retries/fallbacks actually work in our code.
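
For reference, a sketch of JSON mode with the v1 Python SDK; the keys here are made up, and note that the API requires the word "JSON" to appear somewhere in the messages or it rejects the request:

    import json
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",  # a model that supports response_format
        response_format={"type": "json_object"},
        messages=[
            # The keys are hypothetical; describe whatever schema you need.
            {"role": "system", "content": "Reply in JSON with keys 'summary' and 'tags'."},
            {"role": "user", "content": "Summarize: one LLM can refine another's prompts."},
        ],
    )

    data = json.loads(resp.choices[0].message.content)  # valid JSON unless truncated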


Have you compared this to chatgpt plus?


ChatGPT Plus does not expose JSON mode in its web UI. You have to use the API via OpenAI (or Azure OpenAI).


I would assume a lot of that has to do with whatever obsequious nonsense they've got in the RLHF 'safety' training, and you're not getting rid of that without pushing it into a totally different context via DAN-like 'jailbreaks'.


It wasn't always like this. GPT in early 2023, hell, late 2022, was incredible. I could have it fully simulating a Unix terminal on acid for hours; it'd never break character. It's so insanely nerfed now.


It's insanely good every time they have a public release, then deteriorates significantly. There's plenty of evidence for this too: just compare the exact same prompt then and now. Not sure if this is a matter of cost or just playing whack-a-mole with unintended behavioral bugs.


I have a similar experience.

Only asking for things I expect it to be able to find online helps a lot, though.

The moment I try to be innovative or mix two ideas into something new or novel, it falls to pieces in the most frustrating way.


I have this as my custom prefix for when talking to gpt:

Cut unnecessary words, choose those that remain from the bedrock vocabulary everyone knows and keep syntax simple. Opt for being brusque but effective instead of sympathetic. Value brevity. Bullet points, headings and formatting for emphasis are good.


Unfortunately, in my experience, as the chat session advances it seems to forget these instructions and become its old apologetic self again.


Do you thumbs down the bad responses?


Religiously.


I wonder if people only ever click the thumbs-down button, so the mechanism provides only a negative signal, with no way to differentiate a good response from one that simply went unrated.


Anecdata: I’ve several times clicked “regenerate” and then thumbed-up the new response when nudged.


It's certainly better than not doing it, but I wonder how much that helps?

I mean, there's no control sample. It's a single custom-generated response read by a single person. I'd like to know how they derive useful insights from those votes.


The hack is using GPT-3, and I don't mean 3.5. It still performs to a production level, at least for creative work. It's been sped up and is significantly cheaper.


I'm not sure about the web-based service, but with the API this is easily achievable by tinkering with the system message.
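
E.g., a minimal sketch using the terse prefix from upthread as the system message:

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a maximally terse assistant with minimal affect."},
            {"role": "user", "content": "What does `git rebase -i` do?"},
        ],
    )
    print(resp.choices[0].message.content)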


Share some chats. It will be instructive for others and maybe somebody has a solution.


A fun bug is that ChatGPT will always use an emoji when apologizing. So if you ask it not to use emojis in a chat and it uses one anyway (which it often does in the very message promising not to), and you point it out, it falls into a loop of apologies and self-critique that devolves into modeling an existential crisis.


That's interesting. I've seen a lot of apologies from ChatGPT-4 and I don't think I've ever seen an emoji.

I've never asked it not to, either.


This isn't even remotely true. I've never once seen an emoji from it in over a year of daily use.


Yea why did they build in the wishy washy wokeness … sigh. Very difficult to get succinct answers from it.


You should check out https://x.com/lateinteraction's DSPy — which is like an optimizer for prompts — https://github.com/stanfordnlp/dspy


Yes, I've had great success with this in a few cases.

There's the "create a Stable Diffusion prompt with all the line noise of 'unreal engine 4K high quality award winning photorealistic'" stuff, which is pretty obvious.

Less obvious is using it to refine system prompts for the "create your own GPTs" thing. I used this approach for my "Chat with Marcus Aurelius, Emperor of Rome and Stoic philosopher"[1] and "New Testament Bible chat"[2]

I'm particularly happy with how well the Marcus Aurelius one works, eg: https://chat.openai.com/share/27323fe8-56e2-4620-8e4a-3ebf69...

For both of these I started with a rough prompt and then asked GPT-4 to refine it.

I found the key was to read the generated prompt very carefully and make sure it actually asks for what you want (a rough sketch of the refine-then-review step is below the links).

More recently I've been using the same technique for some more complicated use-cases: creating a prompt for GPT-4 to rank answers and creating prompts for Mistral-7B. The same basic approach works well for both of these.

[1] https://chat.openai.com/g/g-qAICXF1nN-marcus-aurelius-empero...

[2] https://chat.openai.com/g/g-CBLrOOGjA-official-new-testament...
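
For anyone who wants to try this via the API, here's roughly what that step looks like; the meta-prompt wording and the starting prompt are my own invention, not necessarily what was used for the GPTs above:

    from openai import OpenAI

    client = OpenAI()

    rough = "You are Marcus Aurelius. Answer as a Stoic philosopher."  # made-up starting point

    meta = (
        "Improve this system prompt for a character-roleplay GPT. Keep it "
        "under 150 words, make the persona specific, and state what the "
        "assistant must never do:\n\n" + rough
    )

    refined = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": meta}],
    ).choices[0].message.content

    # Per the comment above: read `refined` carefully before deploying it,
    # to make sure it still asks for what you actually want.
    print(refined)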


ChatGPT also uses the first approach for image generation. It even shipped before direct access to DALL-E 3 did.


Yes. I deploy prompts professionally for work and I almost always iterate with ChatGPT.

It requires a bit of back and forth, but you can get great results. It lets you iterate at a higher level instead of word for word.

I also find that the prompts work better. Prompt engineering is often about finding magic words and sentences, phrases dense with keywords from the training data, and another LLM is going to be good at finding those because it knows those phrases best.

Here’s an example dialogue I was using recently to iterate on a set of prompts for generating synthetic training data for LLM training. (Inspired by phi-2)

https://chat.openai.com/share/51dd634b-7743-4b5e-9c3f-3d57c6...


Sounds a lot like what I do when choosing words to Google for things.


On a related note, with the (tens of) thousands of "custom GPTs" coming in the next few years, it would be interesting if the chat automatically recommended one of them in response to a particular query. In a way, it would be directing you to a better-engineered, human-made pre-prompt.


The GPT Store kind of has this already: tell it what you want and it will give you suggestions for GPTs.


We recently open-sourced an agent framework [1] for automating data processing and labeling, in which the agent's prompt is refined through iterations with the environment: an LLM is asked to revise the prompt according to its performance (i.e., automatic prompt tuning). We tested it on the math reasoning dataset GSM8K and were able to improve the baseline GPT-4 accuracy from 45% to 74% using 25 labeled examples (notebook and blog post linked below [2][3]). The results are definitely very interesting, if not surprising with some skills, and we see more and more of our open source users and customers showing interest in the framework for automating labeling or having it as a copilot. A generic sketch of the tuning loop follows the links.

[1] https://github.com/HumanSignal/Adala

[2] https://github.com/HumanSignal/Adala/blob/master/examples/gs...

[3] https://labelstud.io/blog/mastering-math-reasoning-with-adal...
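
A generic sketch of that tuning loop; this is not Adala's actual API, just the shape of the idea, and the model names and prompt wording are placeholders:

    from openai import OpenAI

    client = OpenAI()

    def evaluate(prompt, labeled_examples):
        """Score the prompt on (question, answer) pairs; collect failures."""
        failures, correct = [], 0
        for question, answer in labeled_examples:
            out = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "system", "content": prompt},
                          {"role": "user", "content": question}],
            ).choices[0].message.content
            if answer in out:
                correct += 1
            else:
                failures.append((question, answer, out))
        return correct / len(labeled_examples), failures

    def tune(prompt, labeled_examples, rounds=3):
        for _ in range(rounds):
            score, failures = evaluate(prompt, labeled_examples)
            if not failures:
                break
            # Ask an LLM to revise the prompt based on observed failures.
            critique = (
                f"This prompt scored {score:.0%}:\n{prompt}\n\n"
                f"Example failures: {failures[:3]}\n"
                "Rewrite the prompt to fix them. Reply with the new prompt only."
            )
            prompt = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": critique}],
            ).choices[0].message.content
        return prompt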


Check out Magic Prompts: https://magicprompts.lyzr.ai/


Yes, I just used GPT-4 to create a prompt for GPT-3.5-Turbo based on some loose rules that I laid out. It helped me fill in the gaps and write it in a concise format.

The prompt gave much, much better results than the one I wrote.
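
Roughly this flow, sketched with the Python SDK (the rules text is an invented stand-in):

    from openai import OpenAI

    client = OpenAI()

    rules = "Summarize support tickets. Neutral tone. Max 3 bullets. Flag refund requests."

    # GPT-4 turns loose rules into a concise system prompt, filling gaps.
    prompt = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
            "Turn these loose rules into a concise system prompt for a "
            "smaller model, filling in any gaps:\n" + rules}],
    ).choices[0].message.content

    # The cheaper model then runs under the generated prompt.
    ticket = "The widget arrived broken and the customer wants a refund."
    answer = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": prompt},
                  {"role": "user", "content": ticket}],
    ).choices[0].message.content
    print(answer)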


My PoV is that it's an open question whether this is a fruitful approach. If you search for "meta-prompting" you'll find some discussions/papers on the topic.


You may be interested in a recent AI safety paper by Redwood Research.

In it, they have GPT-4 generate solutions to coding problems, but instruct it to insert backdoors into the solutions some fraction of the time. Then, they explore different ways to use a weaker model (GPT-3.5) to detect these backdoors. Pretty interesting.

[1] https://arxiv.org/abs/2312.06942


My experience (n=1) is that current LLMs are just not good at prompting either themselves or other LLMs, and that if you have enough information to make a meaningful meta-prompt, you also have enough information to make a regular ol' prompt. I just don't think it's something the designers of current LLMs are prioritizing, so they're not very good at it.


I wrote a paper about using big "LLMs" as art directors for the little LLMs within Stable Diffusion: https://arxiv.org/pdf/2311.03716v1.pdf


I have a basically unsubstantiated intuition that there is some analog of the recursion theorem for LLMs, if the theorem isn't itself applicable. If so, it should be mathematically impossible to prevent prompt "hacking."


I'm tempted to build a tool that uses DAGs to orchestrate sequential prompt engineering, but typing that out makes me feel dirty.
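
Something like this, maybe (entirely hypothetical; real nodes would call an LLM instead of these stubs):

    from graphlib import TopologicalSorter  # stdlib, Python 3.9+

    # Each node is a prompt-engineering step; edges feed outputs forward.
    def draft(_):      return "rough prompt v0"
    def critique(ins): return "critique of: " + ins["draft"]
    def revise(ins):   return "revision addressing: " + ins["critique"]

    steps = {"draft": draft, "critique": critique, "revise": revise}
    dag = {"draft": set(), "critique": {"draft"}, "revise": {"critique"}}  # node -> predecessors

    results = {}
    for node in TopologicalSorter(dag).static_order():
        results[node] = steps[node]({dep: results[dep] for dep in dag[node]})
    print(results["revise"])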


You should check out langchain


The self-iterating prompt engineering is a different workflow from normal RAG/tool selection.


Do you mean that the prompts would be completely dynamic in your tool vs the prebuilt prompts with templating that Langchain uses?


Dynamic, not templated. Or maybe templated at the start of the workflow, but obviously the LLMs will have their own ways to adjust it.


I would be interested to see if something like this actually produces good results. My (limited) experiences with Langchain made me feel like it was too constraining for the use cases I tried.


Why? It’s a good idea…


Like a sibling comment said, Langchain is basically this, but with a bit of structure around the prompt templating.


Yes. It's hard. Prompting is hard. Prompting to prompt is hard.


GPT 3.5 will happily write jokes about Jesus of Nazareth but will adamantly refuse to write jokes about the Prophet Mohammad. I can't see why people cannot recognize this technology as a complete abomination that will gravely impact society for the negative. Total and complete political correctness that never wavers and never relents.


ChatGPT already does that to generate images for DALL-E 3.




