Entropy of a particular string isn't a rigorous mathematical idea: by definition, a known string can take only one value, so its "entropy" is zero bits. The reason we can distinguish non-random data from random data is that only a small subset of all possible states is considered useful to humans, and since we have an idea of what that subset looks like, we can try to estimate what process was used to generate a particular string.
There are of course statistical tests like https://en.wikipedia.org/wiki/Diehard_tests, which are good enough to distinguish low-entropy from high-entropy data, but current pseudo-random number generators have no problem passing all of them, even though their actual "entropy" is just the seed plus, roughly, the complexity of the algorithm.
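For illustration only, here is a minimal sketch of a much simpler check than Diehard, the NIST frequency (monobit) test, in Python; the seed, sample sizes, and byte strings are arbitrary choices of mine:

    import math
    import random

    def monobit_p_value(data: bytes) -> float:
        # NIST SP 800-22 frequency (monobit) test: p-value for the
        # hypothesis that one bits and zero bits occur equally often.
        bits = ''.join(f'{byte:08b}' for byte in data)
        s = sum(1 if b == '1' else -1 for b in bits)
        s_obs = abs(s) / math.sqrt(len(bits))
        return math.erfc(s_obs / math.sqrt(2))

    # A seeded PRNG (Mersenne Twister) passes comfortably, even though its
    # "real" entropy is only the seed plus the algorithm that expands it.
    prng_bytes = random.Random(42).randbytes(10_000)
    print(monobit_p_value(prng_bytes))          # typically well above the 0.01 threshold

    # A low-entropy string fails immediately ('a' has far more zero bits than one bits).
    print(monobit_p_value(b'aaaaaaaa' * 1250))  # effectively 0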
If you’re looking for a rigorous mathematical idea, what people are trying to measure is the Kolmogorov complexity of the code. The compressed length is a rough estimate of that value.
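As a rough sketch of that idea (using zlib as the compressor; the exact sizes vary and include some format overhead):

    import os
    import zlib

    def compressed_len(data: bytes) -> int:
        # Length of the zlib-compressed data: an upper bound (plus a few
        # bytes of format overhead) on the Kolmogorov complexity.
        return len(zlib.compress(data, 9))

    print(compressed_len(b'a' * 10_000))       # a few dozen bytes
    print(compressed_len(os.urandom(10_000)))  # ~10,000 bytes; essentially incompressible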
Yes, although (and here my understanding of Kolmogorov complexity ends) it still depends heavily on the choice of language, and it seems to me that "aaaaaaaaa" is only less complex than "pSE+4z*K58" because we assume a sane, human-centric language that is very different from the "average" of all possible languages. Which then leads me to wonder how to construct an adversarial Turing-complete language with unintuitive Kolmogorov complexities.
There’s no requirement that K-complexity be measured in a human-centric language. Arguably all compression formats are languages too, which can be executed to produce the decompressed result. They are not designed to be human-centric at all, and yet they do a surprisingly decent job of providing an estimate (well, an upper bound) on Kolmogorov complexity, as we can see in this program.
Kolmogorov complexity is conventionally defined with respect to a Turing machine as the reference machine. That indeed makes repeated letters significantly less complex than that other string. (If you want intuition for how much code is needed to do something on a Turing machine, learn and play around a bit with Brainfuck. It's actually quite nice for that.)
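To make that concrete, here is a sketch: a tiny Brainfuck interpreter in Python and a 38-character program that prints nine 'a's. A program that prints a specific random-looking 10-character string has to construct ten different byte values and ends up considerably longer.

    def run_bf(code: str, tape_len: int = 30_000) -> str:
        # Minimal Brainfuck interpreter (no ',' input support).
        jumps, stack = {}, []                # match each '[' with its ']' up front
        for i, c in enumerate(code):
            if c == '[':
                stack.append(i)
            elif c == ']':
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        tape, out, ptr, ip = [0] * tape_len, [], 0, 0
        while ip < len(code):
            c = code[ip]
            if c == '+':   tape[ptr] = (tape[ptr] + 1) % 256
            elif c == '-': tape[ptr] = (tape[ptr] - 1) % 256
            elif c == '>': ptr += 1
            elif c == '<': ptr -= 1
            elif c == '.': out.append(chr(tape[ptr]))
            elif c == '[' and tape[ptr] == 0: ip = jumps[ip]   # skip the loop
            elif c == ']' and tape[ptr] != 0: ip = jumps[ip]   # repeat the loop
            ip += 1
        return ''.join(out)

    # Build 97 ('a') once with a loop, then print it nine times.
    print(run_bf('++++++++++[>++++++++++<-]>---.........'))  # aaaaaaaaa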