> I had heard specifically that word vectors weren't a game-changer for document classification, because the averaging method didn't work well.
As with anything, your mileage may vary.
One aspect of FastText that definitely helped in my case was its n-gram support (both word and character n-grams, tunable via command-line arguments). My corpus consists of short sentence fragments with misspelled words, incorrect grammar, etc., and my test set contains out-of-vocabulary words.
n-grams are more robust to these than Word2Vec, which uses a static vocabulary.
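To make the mechanism concrete, here's a minimal, self-contained sketch of the character n-gram idea (illustrative only, not the real FastText library, which learns its n-gram embeddings rather than hashing them): a word's vector is built from the vectors of its character n-grams, so a misspelled or unseen word still gets a sensible vector as long as it shares n-grams with known words.

```python
# Illustrative sketch of FastText-style character n-grams.
# ngram_vec is a hypothetical stand-in for learned n-gram embeddings.
import hashlib

DIM = 8  # toy embedding dimension

def ngrams(word, n_min=3, n_max=6):
    # FastText wraps words in boundary markers before extracting n-grams
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def ngram_vec(gram):
    # Deterministic pseudo-random vector per n-gram (a stand-in for
    # the embeddings FastText would actually learn during training)
    h = hashlib.md5(gram.encode()).digest()
    return [b / 255.0 - 0.5 for b in h[:DIM]]

def word_vec(word):
    # A word's vector is the average of its n-gram vectors, so even an
    # out-of-vocabulary word gets a vector from the n-grams it shares
    # with training words
    vecs = [ngram_vec(g) for g in ngrams(word)]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

# A typo shares most of its character n-grams with the correct spelling,
# which is what keeps their vectors close
shared = set(ngrams("misspelled")) & set(ngrams("misspeled"))
print(f"shared n-grams between 'misspelled' and 'misspeled': {len(shared)}")
print(f"vector for unseen word: {word_vec('misspeled')[:3]}...")
```

By contrast, a plain Word2Vec model has one row per vocabulary word, so "misspeled" would simply have no entry at all.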