One of the least talked-about search engines is NodeChef Cloud Search. It can be used from any language, as long as that language has a MongoDB driver. Has anyone here used it, and could you comment on it? https://nodechef.com/nodechef-search-and-sql-analytics
The challenges of crawling at large scale still persist, as is evident from Bloomreach and many other companies building custom solutions because the available open source tools cannot handle the scale of such products. SQLBot aims to solve this problem.
The product is a few weeks from launch. If anyone is interested: http://www.amisalabs.com/AmisaSQLBot.html