
In a long-range configuration, LoRa has a theoretical max throughput of about 140bps (bits, not bytes), and that's assuming only one device is transmitting and no packet loss.
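For context, that figure is in the right ballpark. A back-of-the-envelope sketch using the standard LoRa PHY bit-rate formula (from Semtech's modem design documentation; the specific settings below are illustrative, not from the comment):

```python
def lora_bitrate(sf: int, bw_hz: float, coding_rate: float) -> float:
    """Raw LoRa PHY bit rate in bits/second: Rb = SF * (BW / 2**SF) * CR.

    This is the on-air rate before preamble and header overhead, so
    real payload throughput is lower.
    """
    return sf * (bw_hz / 2**sf) * coding_rate

# Longest-range settings: spreading factor 12, 125 kHz bandwidth,
# 4/8 coding rate (the most robust option).
rb = lora_bitrate(12, 125_000, 4 / 8)
print(round(rb))  # 183 bps raw; effective throughput lands near ~140 bps
```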

Human speech has an average rate of 39bps.



Lowest bitrate speech codecs have rates like 450bps, 700bps (Codec2) and 1600bps (LPCNet)

https://github.com/mozilla/LPCNet https://github.com/drowe67/codec2

Conceivably one could reach 39 bps with near-realtime speech, tone, inflection and tempo recognition. Put the result into a zstd compressed SSML stream and perhaps even reach 16 bps. https://stackoverflow.com/questions/46108940/is-there-a-way-...
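The idea above can be sketched in a few lines: wrap the recognized speech plus prosody markers in an SSML-style stream, then compress it. This uses zlib from the stdlib as a stand-in for zstd (zstd with a trained dictionary would do better on short messages); the markup and message are made up for illustration:

```python
import zlib

# Hypothetical recognizer output: words plus coarse tone/tempo tags,
# serialized as SSML-like markup.
ssml = (b'<speak><prosody rate="fast" pitch="+2st">'
        b'need water at north shelter</prosody></speak>')

packed = zlib.compress(ssml, level=9)
print(len(ssml), len(packed))  # the markup overhead shrinks, but very
                               # short messages rarely compress dramatically
```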


If you already carry 39 bps of information efficiently in a 39 bps actual bitstream, it's not losslessly compressible. That's by definition. Not even by zstd :)


You might want to let Motorola know, since DMR needs multiple orders of magnitude more (2-4 kbps) for its voice codec.


That's the bitrate of a codec to transmit speech that humans can hear and understand... but the actual rate of data communicated by speech is about 39 bps -- and that's pretty universal, regardless of language:

https://www.sciencemag.org/news/2019/09/human-speech-may-hav...


how is this number useful or comparable to digital codecs that are being discussed here? It's an interesting piece of research sure, but I don't see the relevance


Because voice comms are typically used by hams in disaster response -- you pointed out that LoRa tops out at 140 bps, so I responded that that's still 3X faster than voice.


As explained above, the most efficient voice codecs still need orders of magnitude more than that.

This magical 39 bps codec doesn't exist and probably never will. Even if it did, it would lose information like timing, the identity of the speaker (how they sound), and their tone. By the time you encode all of that, you'll be right back in the range of DMR. And if you're willing to discard all that, then just use digital text in the first place.


Exactly, don't use the codec and send voice, just send the text directly.


That's not speech then.

If you're talking about compression, Q-codes[1] were invented long before most of us were born.

[1] https://en.wikipedia.org/wiki/Q_code
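Q-codes work as a shared codebook: a three-letter code stands in for a whole stock phrase, so the channel carries far fewer symbols. A toy illustration (meanings paraphrased from the standard amateur-radio list):

```python
# A few standard Q-codes and their (statement-form) expansions.
Q_CODES = {
    "QTH": "My location is ...",
    "QRM": "I am being interfered with.",
    "QSY": "Change to transmission on another frequency.",
}

msg = "QTH"  # 3 characters on the air...
print(Q_CODES[msg])  # ...expand to a full phrase at the receiver
```

The compression comes entirely from the codebook both parties memorize in advance -- the same trick a trained dictionary gives a general-purpose compressor.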


I didn't say it was speech, I was comparing the 140bps data rate mentioned by the parent poster with the 39bps data rate from human speech, and pointing out that even 140bps is well above the data rate possible with plain speech.

Any increase in effective data rate of speech afforded by Q codes can also be used to increase the data rate of data transmission.


And as people keep pointing out, the 39bps figure does not refer to speech. It only refers to a single component of speech, which is the syllables being spoken.

When people are talking, they're exchanging far more information than that, at a far higher rate. Which is why everyone keeps telling you that what you're talking about is not really speech.

The study you linked is focused on linguistics and the effective 'symbol rate' of various spoken languages, computed by multiplying syllables spoken per second by the information carried per syllable in each language.

It says nothing about how much information is actually exchanged between the individuals doing the speaking. It doesn't factor in tone, accent, pronunciation, mood, pacing, etc., all of which are critical components of spoken communication and add up to a lot more than 39bps.
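A rough sketch of the kind of calculation behind the 39 bps figure (the numbers below are illustrative, not taken from the study): information rate is syllable rate times bits per syllable, where bits per syllable comes from the size (really, the entropy) of the language's syllable inventory.

```python
import math

syllables_per_sec = 6.2             # hypothetical speaking rate
bits_per_syllable = math.log2(64)   # hypothetical effective inventory of
                                    # 64 equally likely syllables = 6 bits

rate_bps = syllables_per_sec * bits_per_syllable
print(round(rate_bps, 1))  # 37.2 -- in the neighborhood of the 39 bps claim
```

Note that this counts only which syllables were spoken -- exactly the point being made above: tone, pacing, and speaker identity are outside the model.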

So when you say 'speech is 39bps and this thing does more (it mostly doesn't), therefore this thing is better than speech', people keep telling you that, no, speech is not actually 39bps, and what you're talking about is identical to just written text in this context.


I think the parent means human speech in (compressed) text form.


Only if you downsample to text. The inflection, tone and manner of speaking convey huge amounts of data, regardless of how much is used for framing. This is to say nothing of facial expression, gesticulation, and posture, which are also part and parcel of human speech, phones be damned.


Sure (well, I'm not sure I'd say it's "huge" amounts of data, but there's definitely some non-verbal information embedded in speech), but this is disaster communications, not a newscast. I don't want to try to read subtle emotions from tone of voice; I want information. If the sender is feeling anxious about something, I want him to tell me -- I don't want to have to guess how bad conditions are because his voice was a little shaky.


That's the supposed maximum cognitive throughput, not the (currently) practical encoding rate.


That's the rate that matters. Human speech is sent and received by humans; if you're going to send computer-generated speech faster than that (faster than a human can comprehend) and decode it on the other end by computer, why use speech at all?



