Ask HN: ML architecture for text pattern detection?
2 points by ActorNightly on Jan 19, 2023 | 1 comment
I'd like to train a model that takes structured text (HTML) as input and outputs the relevant text. The goal is to build a pipeline that, given a manufacturer's web page for a technical product spec, outputs the relevant data. Every manufacturer's website (about 50 of them) is structured differently, but the data is generally in an HTML table, sometimes laid out in rows, sometimes in columns.

Anyone have links to papers or something I can read to get started? Or is this even a thing that exists?
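
For reference, here is a rough non-ML baseline for the "table in rows or columns" case described above, which might help gauge how much a trained model would need to add. It is only a sketch under several assumptions: the names TableGrid, table_to_specs and extract_specs are made up for illustration, tables are assumed to use explicit closing tags and not to be nested, and the orientation check (a first row made entirely of th cells means a header row) is a guess that will not hold for every site.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect each <table> as a list of rows of (is_header, text) cells.

    Assumes explicit closing tags and no nested tables (illustrative only).
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._table = None  # table currently being built
        self._row = None    # row currently being built
        self._cell = None   # cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = {"header": tag == "th", "text": []}

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            text = " ".join("".join(self._cell["text"]).split())
            self._row.append((self._cell["header"], text))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self._table.append(self._row)
            self._row = None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table = None


def table_to_specs(rows):
    """Turn one table into a list of {label: value} records."""
    if not rows:
        return []
    # Heuristic (an assumption): an all-<th> first row is a header row.
    first_row_is_header = all(is_header for is_header, _ in rows[0])
    if first_row_is_header and len(rows) > 1:
        # Header row on top, one product per following row.
        labels = [text for _, text in rows[0]]
        return [dict(zip(labels, (text for _, text in row))) for row in rows[1:]]
    # Otherwise treat each row as "label, value".
    record = {row[0][1]: row[1][1] for row in rows if len(row) >= 2}
    return [record] if record else []


def extract_specs(html):
    """Return spec records from every table found in an HTML string."""
    parser = TableGrid()
    parser.feed(html)
    parser.close()
    return [rec for table in parser.tables for rec in table_to_specs(table)]
```

Calling extract_specs(page_html) on a fetched page returns a list of dicts, one per product found. Pages where this heuristic fails would be the interesting cases to look at before deciding whether a learned model is needed.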



Is buying an option instead of building? You could try out some free credits with AWS Textract to see if it fits the bill (it specifically has table extraction).



