
Looking closely, this only works if you know, or can at least manually guess, the font name, size, and weight before feeding the pixelated version into the tool? Still quite fun, but not as scary as the headline made it sound...


Guessing is actually easy. For the kinds of files that end up as redacted PDFs (legal, government, etc.), there are probably 5-8 font options that cover 98% of documents. Sizes and weights are immediately recognizable to the slightly trained eye. I'm pretty sure I could guess all three attributes at a glance.


Or just look at the unredacted text around it and use that. Nobody is changing fonts on text before pixelation.


Often only parts of text are pixelated.


It's also a proof of concept. Slap a couple more for() loops in there to iterate through different font options and try a range of alignments and you could have it fully automatic.
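
For what it's worth, here is a rough sketch of what those extra loops might look like. It is a toy, not the tool's actual code: `render` is a made-up 1-D stand-in for a real font rasterizer, and the `FONTS` table, the offsets, and the word list are all assumptions for illustration. The only point is the shape of the search: try every font, alignment, and candidate text, pixelate each guess the same way, and keep the closest match.

```python
from itertools import product

# Hypothetical stand-ins for real fonts: (glyph width in columns, ink strength).
FONTS = {"sans": (3, 1.0), "serif": (4, 1.5)}

def render(text, font, offset):
    """Toy 1-D rasterizer: one run of 'ink' columns per character."""
    width, gain = FONTS[font]
    cols = [0.0] * offset                   # horizontal alignment of the text
    for ch in text:
        ink = (ord(ch) % 7 + 1) * gain      # fake per-character darkness
        cols.extend([ink] * width + [0.0])  # glyph plus a one-column gap
    return cols

def pixelate(cols, block):
    """Mosaic filter: average each run of `block` columns."""
    return [sum(cols[i:i + block]) / block for i in range(0, len(cols), block)]

def distance(a, b):
    """Sum of squared differences, zero-padding the shorter row."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force(target, block, fonts, offsets, words):
    """Try every (font, offset, word) combination and keep the best match."""
    best = None
    for font, offset, word in product(fonts, offsets, words):
        score = distance(pixelate(render(word, font, offset), block), target)
        if best is None or score < best[0]:
            best = (score, font, offset, word)
    return best

# Build a "redacted" target with settings the search does not know in advance.
target = pixelate(render("secret", "serif", 2), block=4)
best = brute_force(target, 4, FONTS, range(4), ["public", "secret", "hidden"])
print(best)
```

A real version would presumably swap `render` for an actual rasterizer and search character by character instead of over a fixed word list, but the outer loops over fonts and alignments would look much the same.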


There are lots of existing tools that can guess a font accurately if you feed them an image of enough text, so that's not a big obstacle.


Or just use the rest of the document to build the corpus?



