
Deep-learning-based compression techniques may one day be able to get speech down to several hundred bits per second, and non-speech audio at not much more. (They share the computational-expensiveness problem, though; even more so.) Google's Lyra seems to perform similarly to Opus for speech at less than half the bitrate: https://ai.googleblog.com/2021/02/lyra-new-very-low-bitrate-...
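A back-of-envelope sketch of what "less than half the bitrate" means for data usage, assuming Lyra's published ~3 kbps against Opus speech at ~6 kbps (the figures here are ballpark assumptions, not measurements):

```python
# Rough data usage for a one-minute voice clip at a given bitrate.
# Assumed rates: Lyra ~3 kbps, Opus narrowband speech ~6 kbps.
def bytes_per_minute(kbps: float) -> float:
    return kbps * 1000 / 8 * 60  # kbps -> bytes/s -> bytes/min

print(bytes_per_minute(3))  # Lyra: 22500 bytes (~22 KB/min)
print(bytes_per_minute(6))  # Opus: 45000 bytes (~44 KB/min)
```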


I am aware of Lyra. It's pretty good. Very computationally expensive, though - no audio application I have ever worked on had anywhere near the power/thermal budget that would allow use of the deep-learning codecs. Maybe someday we will get very low-energy hardware accelerators for them, but until then, these are a non-starter for the things I work on.

The thing is (and maybe this is a nitpick), once you are down to several hundred bps for speech, it's getting to be more like speech-to-text (the encoder) and text-to-speech (the decoder) than an audio codec.
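Some illustrative arithmetic for why a few-hundred-bps codec is close to transmitting the transcript itself (the figures below are rough assumptions about conversational English, not from the thread):

```python
# Assumed ballpark figures: ~150 words/min conversational speech,
# ~6 characters per word (incl. space), ~1.5 bits/char entropy for
# English text after compression.
words_per_second = 150 / 60
chars_per_word = 6
bits_per_char = 1.5

text_bps = words_per_second * chars_per_word * bits_per_char
print(text_bps)  # 22.5 bps for the words alone
```

At roughly 20-ish bps for the words, a 300 bps speech codec is spending the large majority of its bits on prosody, timing, and speaker identity rather than on the text.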

I am actually not aware of any non-speech audio codecs which can go that low. Any links?


When we have enough compute, or the models get much smaller. At this point it seems wasteful to use a gaming-class GPU to decode an audio stream.



