My point was that a human will always be able to convey more context through tone and inflection than a machine. Machines need manual context tags like <angry>, at which point you might as well just hire a person to talk into a mic. Machines also noticeably struggle with more nuanced emotions like sarcasm or disdain, even with tags. That’s all well and good for something like merchants. It’s not good for main characters.