I guess a language model like Llama 3 could model surprise on a token-by-token basis and flag the regions it finds most surprising, i.e. the tokens with the highest surprisal (lowest probability under the model). As one example mentioned, the entire alphabet may score as high entropy in a character-level sense, but it should be very unsurprising to a code-aware language model that a codebase contains the Base62 alphabet as a constant.
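
A minimal sketch of what that per-token scoring could look like, assuming a Hugging Face causal LM (using `gpt2` here purely as a small stand-in for something like a Llama 3 checkpoint, which would slot in the same way):

```python
# Score a snippet token by token with a causal LM and report each token's
# surprisal, i.e. -log p(token | preceding tokens). High-surprisal tokens are
# the ones the model finds genuinely unexpected.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; swap in a code-aware model if available
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = 'BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"'

with torch.no_grad():
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits
    # Surprisal of token t given tokens < t: negative log-prob of the
    # actual next token under the shifted predictions.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    surprisal = -log_probs[torch.arange(targets.size(0)), targets]

for tok_id, s in zip(targets.tolist(), surprisal.tolist()):
    print(f"{tokenizer.decode([tok_id])!r}\t{s:.2f} nats")
```

The intuition is that a character-entropy scanner sees the Base62 constant as near-maximal entropy, while a model that has seen lots of code should assign the later characters of the run fairly low surprisal once the pattern is established.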