Deep-learning-based compression techniques may one day be able to get speech down to several hundred bits per second, and non-speech audio to not much more. (They share the computational cost problem, though, and to an even greater degree.) Google's Lyra seems to perform similarly to Opus for speech at less than half the bitrate: https://ai.googleblog.com/2021/02/lyra-new-very-low-bitrate-...
I am aware of Lyra. It's pretty good. Very computationally expensive, though: no audio application I have ever worked on had anything close to the power/thermal budget that would allow use of deep-learning codecs. Maybe someday we will get very low-energy hardware accelerators for them, but until then they are a non-starter for the things I work on.
The thing is (and maybe this is a nitpick), once you are down to several hundred bps for speech, it's getting to be more like speech-to-text (the encoder) plus text-to-speech (the decoder) than an audio codec.
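To put rough numbers on that comparison, here is a back-of-the-envelope sketch of what a plain text transcript of speech costs in bits per second. Every figure in it is an assumption of mine, not a measurement: roughly 150 words per minute of conversational speech, about 6 characters per word including the space, and either 8 bits per character (raw ASCII) or about 2 bits per character (a guess at well-compressed English text). The function name is made up for illustration.

```python
def text_bitrate_bps(words_per_minute=150.0, chars_per_word=6.0, bits_per_char=8.0):
    """Approximate bitrate of sending speech as a text transcript.

    All defaults are illustrative assumptions, not measured values.
    """
    chars_per_second = words_per_minute * chars_per_word / 60.0
    return chars_per_second * bits_per_char


if __name__ == "__main__":
    # Raw 8-bit characters (assumed).
    print(text_bitrate_bps())
    # Assumed ~2 bits/char after text compression.
    print(text_bitrate_bps(bits_per_char=2.0))
```

Under those assumptions the transcript alone comes to about 120 bps uncompressed and about 30 bps compressed, so a codec at several hundred bps has only a few hundred bits per second left over for everything text does not carry (speaker identity, pitch, timing, emotion).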
I am actually not aware of any non-speech audio codecs that can go that low. Any links?