The tesseract CLI (and I'm sure the library does too) can give you hOCR output, an HTML-based format that contains the recognized text along with bounding boxes for paragraphs, lines and individual words.

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line...
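If you're driving it from Python, a minimal sketch is to shell out to the CLI. This assumes a Tesseract 4.x or newer binary on your PATH. Passing `stdout` as the output base should make it print the hOCR instead of writing a file (with a base name like `out` you'd get an `out.hocr` file instead), and `hocr` selects the bundled config that turns hOCR output on. The helper names are hypothetical.

```python
import subprocess
from typing import List


def hocr_command(image_path: str) -> List[str]:
    # Assumed invocation: "stdout" as the output base prints to stdout,
    # and "hocr" is the config name that enables hOCR output.
    return ["tesseract", image_path, "stdout", "hocr"]


def run_tesseract_hocr(image_path: str) -> str:
    # Runs the tesseract binary and returns its hOCR (XHTML) output as text;
    # raises CalledProcessError if tesseract exits with an error.
    result = subprocess.run(
        hocr_command(image_path), capture_output=True, text=True, check=True
    )
    return result.stdout
```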

It's not exactly what you're after, but you could probably filter that output down to the words inside your selected region and get there pretty quickly.
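A rough sketch of that filtering, using only the Python standard library. It assumes Tesseract's usual hOCR markup (each word is an element with class `ocrx_word` and a `title` such as `bbox x0 y0 x1 y1; x_wconf 95`) and that your selection is given in the same pixel coordinates as the image. `WordCollector` and `words_in_region` are made-up names for illustration. A word counts as selected if the center of its box falls inside the region, which tolerates boxes that slightly cross the selection edge.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]
# Assumes the hOCR title attribute carries "bbox x0 y0 x1 y1" (pixel coords).
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class WordCollector(HTMLParser):
    """Collects (text, bbox) for each element whose class includes 'ocrx_word'."""

    def __init__(self) -> None:
        super().__init__()
        self.words: List[Tuple[str, Box]] = []
        self._box: Optional[Box] = None  # bbox of the word currently being read
        self._parts: List[str] = []      # text fragments of that word
        self._depth = 0                  # tags nested inside the word, e.g. <strong>

    def handle_starttag(self, tag, attrs):
        if self._box is not None:
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            m = BBOX_RE.search(attrs.get("title") or "")
            if m:
                self._box = tuple(int(v) for v in m.groups())
                self._parts = []
                self._depth = 0

    def handle_endtag(self, tag):
        if self._box is None:
            return
        if self._depth > 0:  # closing a tag nested inside the word
            self._depth -= 1
            return
        self.words.append(("".join(self._parts).strip(), self._box))
        self._box = None

    def handle_data(self, data):
        if self._box is not None:
            self._parts.append(data)


def words_in_region(hocr: str, region: Box) -> str:
    """Return the words whose box center lies inside region = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    parser = WordCollector()
    parser.feed(hocr)
    parser.close()
    picked = []
    for text, (wx0, wy0, wx1, wy1) in parser.words:
        cx, cy = (wx0 + wx1) / 2, (wy0 + wy1) / 2
        if text and x0 <= cx <= x1 and y0 <= cy <= y1:
            picked.append(text)
    return " ".join(picked)
```

You'd call it with the hOCR text and your selection box, e.g. `words_in_region(open("out.hocr", encoding="utf-8").read(), (100, 200, 600, 400))`. If you'd rather drop words that straddle the edge, swap the center test for full containment of the word's box.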


