I included that note because output limits are a personal interest of mine.
Until recently most models capped out at around 4,000 tokens of output, even as they grew to handle 100,000 or even a million input tokens.
For most use-cases this is completely fine - but there are some edge-cases that I care about. One is translation - if you feed in a 100,000-token document in English and ask for it to be translated to German, you want about 100,000 tokens of output, rather than a summary.
The second is structured data extraction: I like being able to feed in large quantities of unstructured text (or images) and get back structured JSON/CSV. This can be limited by low output token counts.
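To make the truncation problem concrete, here is a minimal sketch (not from the thread itself) using the OpenAI Python client; the model name, input file, and token cap are illustrative assumptions. With a low output cap, a long extraction gets cut off mid-JSON and the response's finish_reason reports "length".

```python
# Illustrative sketch: a low output-token cap truncating structured extraction.
# Model name, file name, and max_tokens value are hypothetical choices.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

long_document = open("report.txt").read()  # hypothetical large unstructured input

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {
            "role": "user",
            "content": (
                "Extract every person mentioned in the text below as a "
                "JSON array of {name, role} objects.\n\n" + long_document
            ),
        }
    ],
    max_tokens=1024,  # deliberately low output cap
)

choice = response.choices[0]
print(choice.message.content)

# If the cap was hit, the JSON is cut off partway through and won't parse.
if choice.finish_reason == "length":
    print("Output truncated by the token limit - extracted JSON is incomplete.")
```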
Sure, your cases are perfectly reasonable. I just wish the LLMs had a "feel" for when to output long or short text. Always having to add something like "be as concise as possible" is kinda tedious.