
>how big did the fulltext table become for x entries on wiby.me

I want Wiby to be composed mainly of human-submitted pages, so for 99% of the index, only the pages submitted by users are indexed and no further crawling was done. However, I recognized that without the capability to crawl through links it would be less useful for others, so I added in the crawling capability to my liking and tested it accordingly. I imagine others might want to depend heavily on hyperlink crawling for their use case, but there is a tradeoff in the quality of the pages that get indexed and the resources they require.

>and what is a common response time on N amount of searches per minute for this dataset?

Hard to say exactly, as I haven't run many benchmarks, but my goal is to keep multi-word queries to within about a second. Single-word queries are very fast. My 4 computers handle hundreds of thousands of queries per day because Wiby is being barraged by a nasty spam botnet with thousands of constantly changing IPs. If I don't keep them in check, they will eventually eat all the available CPU.

>Would you offer a /traffic or /stats page within about/ ? duckduckgo shows traffic, not index stats though.

Probably not on mine since I don't get enough traffic for it to be of that much interest to me. I privately use goaccess to get a general idea of daily traffic.



i like this approach as a possible use for a personal search engine that only has stuff that i have been looking at. for that it would be helpful to have some kind of browser extension that can autosubmit everything in my history. ideally that extension would also autoaccept every submission so that it can work fully in the background without my intervention.

also helpful would be a whitelist/blacklist feature, say, wikipedia and stackoverflow may always be autoaccepted while certain other sites may always be rejected, and the rest go through the regular review process.

then i can use that as my default search engine and branch out when i don't find what i am looking for. for that it would also be cool if there could be a way to search wiby and another search engine in parallel and display like 5 results from each.
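the whitelist/blacklist triage above could be sketched like this. everything here is hypothetical: the domain lists and the decision labels are illustrative, and this is not Wiby's actual API; an extension would additionally POST the "accept" URLs to a submit endpoint.

```python
# Sketch of per-URL triage for an auto-submitting history extension.
# Hypothetical lists and labels; the real extension would feed "accept"
# results to the search engine's submit endpoint in the background.
from urllib.parse import urlparse

ALWAYS_ACCEPT = {"en.wikipedia.org", "stackoverflow.com"}  # example whitelist
ALWAYS_REJECT = {"ads.example.com"}                         # example blacklist

def triage(url: str) -> str:
    """Return 'accept', 'reject', or 'review' for a visited URL."""
    host = urlparse(url).hostname or ""
    if host in ALWAYS_REJECT:
        return "reject"
    if host in ALWAYS_ACCEPT:
        return "accept"
    return "review"  # everything else goes to the normal review queue

if __name__ == "__main__":
    for u in ["https://stackoverflow.com/questions/1",
              "https://ads.example.com/banner",
              "https://some.blog/post"]:
        print(u, "->", triage(u))
```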


Perhaps you can develop such a browser extension. Sounds like a very good idea actually.


>Wiby is being barraged by a nasty spam botnet with thousands of constantly changing IPs.

Short of having a private beta like Kagi, how else could those botnets be excluded? How difficult is it to create a whitelist of uninfected IPs?


For what it's worth, I ended up putting Marginalia Search behind Cloudflare to deal with what I assume is the same group. At worst I saw 30k queries per hour.


Who the hell has the incentive to forcibly shut down small, independent search engines? Competitors?


My unsubstantiated hunch based on looking at the types of queries, which at least for me were over-specified as all hell and within the sphere of pharmaceuticals, e-shopping and the like, is that they're gambling on the search engine being backed by Google or Bing, and they're effectively trying to poison their typeahead suggestion data.

I'd guess they're just aiming their gatling gun at whatever sites have an OpenSearch specification without much oversight.
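for context, the OpenSearch specification here is a small XML description document a site serves so browsers can register it as a search engine; a bot only needs the `Url` template to start firing queries. a minimal example (the ShortName and template URL are illustrative):

```xml
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example Search</ShortName>
  <Description>Search example.org</Description>
  <Url type="text/html" template="https://example.org/search?q={searchTerms}"/>
</OpenSearchDescription>
```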

It's also crossed my mind it might be some sketchy law firm looking for DMCA violations, since a fair number of the queries looked like they were after various forms of contraband. Seems weird they'd use a botnet though. Most of the IPs seemed to be enterprise routers with public-facing admin pages and the like. Does not seem above board at all.


What is the botnet owner thinking of gaining from a small potatoes search engine? Seems rather futile?


I wish I knew. They have nothing to gain. It's effectively a DDoS attack.



