I've been spending a lot more time lately reading up on this type of stuff for a huge scraping/data mining project. My background is mostly in physics, so depending on what I happen to be working on at any given moment, there's always some catching up to do.
Anything particularly interesting gets posted here for others to enjoy and comment on. :)
I'm doing the same right now. Massive scraping plus "gisting" or document summarization. You're pretty much on the right track; half of those papers are industry standards (my browser marked them as "visited" automatically :-)
Honestly, I don't think LSI is a good place to start. It'd be better to start a few steps back with basic IR algorithms and data structures: tries, inverted indexes, stemming, and string search algorithms like KMP and Boyer-Moore.
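To give a feel for how small these building blocks are, here's a rough sketch of an inverted index in Python. The function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are made up for illustration, and the tokenizer is deliberately naive (lowercase alphanumeric runs, no stemming), so treat it as a toy rather than how any real engine does it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # .get avoids creating empty entries for unseen terms in the defaultdict
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents, just for the demo
docs = [
    "Tries store strings by shared prefix.",
    "An inverted index maps each term to the documents containing it.",
    "Boyer-Moore and KMP are string search algorithms.",
]

index = build_inverted_index(docs)
print(search(index, "string search"))  # only the third document matches
print(search(index, "trie"))           # empty: the index only has "tries"
```

The second query comes back empty because "trie" and "tries" are different tokens, which is exactly the gap stemming is meant to close.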
When I got started with IR, I used Baeza-Yates's "Modern Information Retrieval"; Manning's new "Introduction to Information Retrieval" looks great, though I haven't gotten my hands on a copy yet. Here are the Wikipedia pages on the concepts above: