OpenAI had a real problem with making (for their time) great models but stretching their rollout over months. They gave access to the press and some Twitter users; everyone else had to apply with their use case, only to be put on a waitlist. That completely killed any momentum.
The first version of ChatGPT wasn't a huge leap over simulating chat with instruction-tuned GPT-3.5; the real innovation was scaling it to the point where they could give the world immediate, free access. That built the hype, and that success allowed them to make future ChatGPT versions a lot better than the instruction-tuned models ever were.
The main reasons ChatGPT took off were:
1) Response time at that quality was 10x quicker than the Davinci-instruct-3 model released in summer 2022, making interaction far more feasible, with lower wait times and support for concurrency.
2) OpenAI strictly banned chat applications on the GPT API; even summarizing with more than 150 tokens required you to submit a use case for review. I built an app around this in October 2022 and got through the review, and it was then pointless, because everybody could just use ChatGPT for the purposes of my app's new feature.
It was not possible for anybody to have just whacked the instruct models of GPT-3 into an interface, because of both the restrictions and the latency issues that existed prior to ChatGPT. I agree with you on instruct vs. ChatGPT, and would go further: the real innovation was entirely systematic, in scaling and in changing the interface. Instruct tuning was far more impactful than conversational model tuning because instruct enabled so many synthesizing use cases beyond the training data.
"Instruct tuning was far more impactful than conversational model tuning because instruct enabled so many synthesizing use cases beyond the training data."
I've seen that many model providers nowadays ship what is an instruct model in name as a chat model. What is the specific difference between instruct tuning and conversational model tuning?
The best paper to read is the T5 paper, which introduced instruction training.
BERT showed that training with two tasks (next-sentence prediction and masked-token filling) was more effective than training on one task alone.
T5 showed that multiple instructions could drive one training task (token prediction): not just translating, but also summarizing. They suggested this could generalize (it did).
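Concretely, T5 routed between tasks with plain-text prefixes on the input; the prefix strings below match the style used in the T5 paper, while the helper function and example text are mine, for illustration:

```python
# T5-style setup: one text-to-text objective, many tasks, selected by a
# text prefix prepended to the input. build_t5_input is a hypothetical helper.

def build_t5_input(task: str, text: str) -> str:
    """Prepend a T5 task prefix so a single model can switch between tasks."""
    prefixes = {
        "translate": "translate English to German: ",
        "summarize": "summarize: ",
    }
    return prefixes[task] + text

# Both inputs are trained with the same objective: predict the target tokens.
print(build_t5_input("translate", "That is good."))
print(build_t5_input("summarize", "Authorities dispatched emergency crews ..."))
```

The point is that the "instruction" is just more input text, so adding a new task is a data change, not an architecture change.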
GPT-2 showed that with just token prediction and no instructions you could generate good text; GPT-3 showed this was coherent, and also that sufficient context was reliably continued by models (and shaped by the format of the training data, e.g. StackOverflow used Q: and A: in the training data, so prompts using Q: and A: worked very well for conversation-mimicking).
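For example, a GPT-3-era completion prompt could mimic conversation purely through continued context, with no chat tuning involved (the exchange below is invented):

```python
# GPT-3 base models had no chat format; you steered them by giving context
# they would continue. A "Q: ... A: ..." transcript, echoing formats seen in
# training data, made the model answer in kind.

prompt = (
    "Q: What is the capital of France?\n"
    "A: Paris.\n"
    "Q: And of Germany?\n"
    "A:"
)
# Sent to a completion endpoint, the model continues after the final "A:",
# typically with an answer, because that is the most likely continuation.
print(prompt)
```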
Davinci-instruct essentially made GPT-3's outputs reliable, because they "corrected model outputs" not just to follow the implicitly continued context but to follow text instructions written in general English in the user's submitted prompt. They could have tuned this to always follow a chat format (e.g. use pronouns and refer to the user as "you"), which seems to feel more natural, but the original instruct worked on simple commands that were responded to without the chat framing (e.g. no "I am sorry", just no token; no "I believe the book you are looking for is:"; etc.).
Nowadays most instruct models do actually use prompt formats and training datasets that are conversational anyway (check out the various formats in LM Studio), so the difference is largely lost.
Afaik there's no difference; instruct and chat are used interchangeably. Mistral calls their tunes "modelname-Instruct", Meta calls theirs "modelname-chat".
Strictly speaking, instruct tuning would mean one instruction and one answer, but the models are typically smart enough to still get it if you chain turns together, and most tuning datasets do contain examples of some back-and-forth discussion. That might be closer to what could be considered a chat tune, but in practice it's not a hard distinction.
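The distinction shows up in the prompt formats themselves. Below is a sketch contrasting a single-turn Alpaca-style instruct format with a multi-turn ChatML-style chat format (both format conventions are real and widely used; the messages and helper function are invented for illustration):

```python
# Single-turn instruct format (Alpaca-style): one instruction, one response slot.
alpaca_prompt = (
    "### Instruction:\n"
    "Summarize the following text in one sentence.\n\n"
    "### Response:\n"
)

# Chat format (ChatML-style): a role-tagged multi-turn transcript.
def chatml_prompt(messages):
    """Render a list of {'role', 'content'} dicts into a ChatML-style string."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    out.append("<|im_start|>assistant\n")  # cue the model to reply
    return "\n".join(out)

print(chatml_prompt([
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello."},
    {"role": "user", "content": "What did I say first?"},
]))
```

A model tuned only on the first shape can often still handle the second, which is why the instruct/chat labels blur in practice.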