It's not completely clear to me which encoding the blns.txt file uses. Since this project is all about weird/evil bytestrings, the encoding of the file itself is very important.
Using a newline as a delimiter in that file excludes newlines from being part of the strings you are testing - but newlines are an important "naughty" character to consider. Unfortunately the same is true of basically any other common delimiter character.
Maybe base64-encoding the strings would be one way to solve for this? You could use base64-encoded values in JSON, for example.
Fair question. Encoding is UTF-8. This is fine for time being since UTF-8 is ubiquitous.
I had it set as UTF-16 for the two-byte characters when first writing it, but that had caused issues. If there is a demand, a second list can be added.
Using a newline as a delimiter in that file excludes newlines from being part of the strings you are testing - but newlines are an important "naughty" character to consider. Unfortunately the same is true of basically any other common delimiter character.
Maybe base64-encoding the strings would be one way to solve for this? You could use base64-encoded values in JSON, for example.