
> I had heard specifically that word vectors weren't a game-changer for document classification, because the averaging method didn't work well.

As with anything, your mileage may vary.

One aspect of FastText that definitely helped in my case was n-gram support (both word and character, tunable via command-line arguments). My corpus consists of short sentence fragments containing misspelled words, incorrect grammar, and so on, and my test set contains out-of-vocabulary words.

Character n-grams are more robust to these than Word2Vec, which uses a static vocabulary.
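To see why, here's a rough sketch of the subword idea in plain Python (an illustration of the principle, not fastText's actual implementation): a misspelled word still shares most of its character n-grams with the correct spelling, so a model that builds word vectors from n-gram vectors can still place it near the right word.

```python
def char_ngrams(word, min_n=3, max_n=6):
    # fastText-style: wrap the word in boundary markers so that
    # prefixes and suffixes get distinct n-grams.
    w = f"<{word}>"
    return {w[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(w) - n + 1)}

a = char_ngrams("misspelled")
b = char_ngrams("mispelled")  # typo: out of vocabulary for Word2Vec

# Jaccard overlap of the two n-gram sets: large despite the typo,
# so the two words' subword-based vectors end up close together.
overlap = len(a & b) / len(a | b)
```

A static-vocabulary model like Word2Vec would map "mispelled" to an unknown token (or drop it); the subword approach degrades gracefully instead.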



I thanked your top-level comment, but this whole thread is great. I found myself with data that sounds like yours. I'm excited to try fastText.



