My experience as well. Because of how LLMs work, they often do better when they "reason" things out step by step. Since the model can't actually reason, asking it for a brief answer leaves it with no semblance of a train of thought.
Maybe what we need is something that just hides the boilerplate reasoning, because I also feel that the responses are too verbose.
That one is easy: generate the long answer behind the scenes, then feed it to a special-purpose summarisation model (the kind that lets you set the output length).
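A minimal sketch of that two-stage idea in Python, using Hugging Face's summarisation pipeline (BART-style models accept explicit min/max output-length bounds). The generate_long_answer() stub and the model choice are illustrative assumptions, not any particular product's pipeline:

```python
from transformers import pipeline

def generate_long_answer(question: str) -> str:
    # Stand-in for the verbose LLM call; in practice this would be a
    # chat/completion request where the model reasons step by step.
    return (
        "Step 1: consider what the question is really asking. "
        "Step 2: enumerate the candidate interpretations and weigh them. "
        "Step 3: discard the interpretations that contradict the premise. "
        f"Therefore, the answer to '{question}' is 42."
    )

# Length-controllable summariser; min_length/max_length are counted in tokens.
summarise = pipeline("summarization", model="facebook/bart-large-cnn")

long_answer = generate_long_answer("What is the meaning of life?")
short = summarise(long_answer, min_length=10, max_length=40)
print(short[0]["summary_text"])  # the user only ever sees this
```

The point is just that the verbose reasoning still happens (and still helps), it's only hidden from the final output.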