OpenAI had a real problem with making (for their time) great models but stretching their rollout over months. They gave access to the press and some Twitter users; everyone else had to apply with their use case, only to be put on a waitlist. That completely killed any momentum.
The first version of ChatGPT wasn't a huge leap over simulating chat with instruction-tuned GPT-3.5; the real innovation was scaling it to the point where they could give the world immediate, free access. That built the hype, and that success allowed them to make future ChatGPT versions a lot better than the instruction-tuned models ever were.
The main reasons ChatGPT took off were:
1) Response time at that quality was 10x quicker than the Davinci-instruct-3 model released in summer 2022, making interaction far more feasible, with lower wait times and support for concurrency.
2) OpenAI strictly banned chat applications on the GPT API; even summarizing with more than 150 tokens required you to submit a use case for review. I built an app around this in October 2022 and got through the review, and it was then pointless, because everybody could just use ChatGPT for the purposes of my app's new feature.
It was not possible for anybody to have just whacked the instruct models of GPT-3 into an interface, because of both the restrictions and the latency issues that existed prior to ChatGPT. I agree with you on instruct vs. ChatGPT, and would go further: the real innovation was entirely systematic, in scaling and in changing the interface. Instruct tuning was far more impactful than conversational model tuning because instruct enabled so many synthesizing use cases beyond the training data.
"Instruct tuning was far more impactful than conversational model tuning because instruct enabled so many synthesizing use cases beyond the training data."
I've seen that many model providers nowadays ship what is an instruct model in name as a chat model. What is the specific difference between instruct tuning and conversational model tuning?
The best paper to read is the T5 paper, which introduced instruction training.
BERT showed that training with two tasks (next-sentence prediction and masked-token filling) was more effective than training on one task alone.
T5 showed that multiple instructions could drive one training task (token prediction): not just translating, but also summarizing. They suggested this could generalize (it did).
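Concretely, T5 routed between tasks with plain-text prefixes on the input; the prefix strings below match the style used in the T5 paper, while the helper function and example text are mine, for illustration:

```python
# T5-style setup: one text-to-text objective, many tasks, selected by a
# text prefix prepended to the input. build_t5_input is a hypothetical helper.

def build_t5_input(task: str, text: str) -> str:
    """Prepend a T5 task prefix so a single model can switch between tasks."""
    prefixes = {
        "translate": "translate English to German: ",
        "summarize": "summarize: ",
    }
    return prefixes[task] + text

# Both inputs are trained with the same objective: predict the target tokens.
print(build_t5_input("translate", "That is good."))
print(build_t5_input("summarize", "Authorities dispatched emergency crews ..."))
```

The point is that the "instruction" is just more input text, so adding a new task is a data change, not an architecture change.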
GPT-2 showed that with just token prediction and no instructions you could generate good text; GPT-3 showed this was coherent, and also that sufficient context was reliably continued by models (and shaped by the format of the training data, e.g. StackOverflow used Q: and A: in the training data, so prompts using Q: and A: worked very well for conversation-mimicking).
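For example, a GPT-3-era completion prompt could mimic conversation purely through continued context, with no chat tuning involved (the exchange below is invented):

```python
# GPT-3 base models had no chat format; you steered them by giving context
# they would continue. A "Q: ... A: ..." transcript, echoing formats seen in
# training data, made the model answer in kind.

prompt = (
    "Q: What is the capital of France?\n"
    "A: Paris.\n"
    "Q: And of Germany?\n"
    "A:"
)
# Sent to a completion endpoint, the model continues after the final "A:",
# typically with an answer, because that is the most likely continuation.
print(prompt)
```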
Davinci-instruct essentially made GPT-3's outputs reliable, because they "corrected model outputs" not just to follow the implicitly continued context but to follow text instructions written in general English in the user's submitted prompt. They could have tuned this to always follow a chat format (e.g. use pronouns and refer to the user as "you"), which seems to feel more natural, but the original instruct worked on simple commands that were responded to without the chat framing (e.g. no "I am sorry", just no token; no "I believe the book you are looking for is:"; etc.).
Nowadays most instruct models do actually use prompt formats and training datasets that are conversational anyway (check out the various formats in LM Studio), so the difference is largely lost.
Afaik there's no difference; instruct and chat are used interchangeably. Mistral calls their tunes "modelname-Instruct", Meta calls theirs "modelname-chat".
Strictly speaking, instruct tuning would mean one instruction and one answer, but the models are typically smart enough to still get it if you chain turns together, and most tuning datasets do contain examples of some back-and-forth discussion. That might be closer to what could be considered a chat tune, but in practice it's not a hard distinction.
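The distinction shows up in the prompt formats themselves. Below is a sketch contrasting a single-turn Alpaca-style instruct format with a multi-turn ChatML-style chat format (both format conventions are real and widely used; the messages and helper function are invented for illustration):

```python
# Single-turn instruct format (Alpaca-style): one instruction, one response slot.
alpaca_prompt = (
    "### Instruction:\n"
    "Summarize the following text in one sentence.\n\n"
    "### Response:\n"
)

# Chat format (ChatML-style): a role-tagged multi-turn transcript.
def chatml_prompt(messages):
    """Render a list of {'role', 'content'} dicts into a ChatML-style string."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    out.append("<|im_start|>assistant\n")  # cue the model to reply
    return "\n".join(out)

print(chatml_prompt([
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello."},
    {"role": "user", "content": "What did I say first?"},
]))
```

A model tuned only on the first shape can often still handle the second, which is why the instruct/chat labels blur in practice.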