
Might I suggest a spin on this: instead of blocking the IPs, consider serving up different content to those IPs.

You could make a page that shames their domain name for stealing content. You could make a page that redirects people to your website. Or you could make a page with absolutely disgusting content. I think it would discourage them from playing cat and mouse with you by switching to new IPs.
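A minimal sketch of the IP-based switch in Python, assuming you already know which networks the mirror fetches from. `MIRROR_NETWORKS`, `REAL_SITE`, `is_mirror` and `choose_body` are hypothetical names, and the addresses are reserved documentation ranges standing in for the real ones. It implements the "point people back to the original" variant rather than the offensive one.

```python
import ipaddress

# Hypothetical networks the mirror fetches from (documentation ranges as placeholders).
MIRROR_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# Hypothetical URL of the original site.
REAL_SITE = "https://example.com"


def is_mirror(client_ip: str) -> bool:
    """True if the request's source address falls inside a known mirror network.

    Membership tests between IPv4 addresses and IPv6 networks simply return False.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in MIRROR_NETWORKS)


def choose_body(client_ip: str, real_body: str) -> str:
    """Serve the normal page to everyone except the mirror, which gets a notice
    linking readers back to the original site."""
    if is_mirror(client_ip):
        return (
            "<p>This page was copied without permission. "
            f'The original is at <a href="{REAL_SITE}">{REAL_SITE}</a>.</p>'
        )
    return real_body
```

Matching on whole networks rather than single addresses means a mirror hopping between IPs inside the same range stays caught.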



One possibility: Serve different content, but only if the user agent is a search engine scraper. Wait a bit to poison their search rankings, then block them.
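Roughly how "poison first, block later" could look, assuming the mirror passes the search engine's User-Agent header through unchanged (not guaranteed). The network list, crawler tokens, cut-off date and function names are all hypothetical. User-Agent strings are trivially spoofed, so this is a heuristic, not verification.

```python
import ipaddress
from datetime import date
from typing import Optional

# Hypothetical: networks the mirror fetches from (documentation range as a placeholder).
MIRROR_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
# Substrings that appear in common search-engine crawler User-Agent strings.
CRAWLER_TOKENS = ("googlebot", "bingbot")
# Hypothetical date after which the mirror is blocked outright.
BLOCK_AFTER = date(2030, 1, 1)


def looks_like_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match; a heuristic, since the header can be spoofed."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def response_mode(client_ip: str, user_agent: str, today: Optional[date] = None) -> str:
    """Return "normal", "decoy" or "block" for a single request."""
    today = today or date.today()
    addr = ipaddress.ip_address(client_ip)
    if not any(addr in net for net in MIRROR_NETWORKS):
        return "normal"  # ordinary visitors are never affected
    if today >= BLOCK_AFTER:
        return "block"   # phase two: stop serving the mirror entirely
    if looks_like_crawler(user_agent):
        return "decoy"   # phase one: crawlers arriving via the mirror get different content
    return "normal"      # humans browsing through the mirror still see the real page
```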


... be careful with this.

Assuming you've monetized your content with ads, depending on your ads provider, this may have deleterious effects on your account with that provider, as they may then assume you're trying to game ads revenue.


The mirror is almost certainly running their own ads, given they strip the JavaScript out.


I've tried this with zip bombs, but I can't tell how well it worked out.


Wait, what? Care to follow up on this hypothetical topic, please?


Zip bombs are files that expand to enormous sizes when unzipped. I'm not sure whether OP offered one as a download to fill up the offender's disk, or streamed one in the hope that the client browser/scraper would try to decompress it and crash from running out of memory or disk space.

That's my read on it anyway.
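For the streaming variant, one common approach is a small gzip body sent with `Content-Encoding: gzip`, so an HTTP client that decompresses automatically expands it in memory. A sketch using only Python's `zlib`; `gzip_zeros` and `bomb_headers` are hypothetical names. Deflate tops out at roughly 1000:1 on repeated bytes, so a few MB of payload expands to a few GB. It will also hit anyone else sharing the offending IP.

```python
import zlib


def gzip_zeros(total_bytes: int, chunk_size: int = 1 << 20) -> bytes:
    """Gzip-compress `total_bytes` zero bytes without holding them all in memory."""
    comp = zlib.compressobj(level=9, wbits=31)  # wbits=31 selects the gzip container
    chunk = bytes(chunk_size)  # one reusable buffer of zeros
    parts = []
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_size, remaining)
        parts.append(comp.compress(chunk[:n]))
        remaining -= n
    parts.append(comp.flush())
    return b"".join(parts)


def bomb_headers(payload: bytes) -> dict:
    """Headers that tell the client the body is gzip-encoded HTML, so it decodes it itself."""
    return {
        "Content-Encoding": "gzip",
        "Content-Type": "text/html",
        "Content-Length": str(len(payload)),
    }
```

Generating the payload once and caching it keeps the cost on your side to a single small file.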


Did the same thing for spam bots :p


> Or you could make a page with absolutely disgusting content.

Not if you value the people who might move to the real domain.


You could do this without affecting normal traffic, depending on how unique the IPs doing the scraping are.

Love the idea.


I think you missed the point: if people show up at $PROXY expecting nice stuff but see junk, they won't move over to $REAL; they'll blame $REAL instead.

E.g., you'd like some way to redirect people from the $PROXY site to the $REAL site, and disgusting content on $PROXY won't do that; it'll reflect poorly on $REAL.


If you can identify the crawler, you can provide 'dynamic' content for that specific user context.


It's a proxy, so there's no "crawler". It's just an agent relaying to the user. Passing something to this proxy agent just passes it directly to the user.


If those IPs are VPN services, you might be negatively affecting all VPN users in addition to the proxy.


"Or you could make a page with absolutely disgusting content." You've never heard of Rule 34, have you...


Obviously somebody too young to have seen the trick of HTTP-redirecting unwanted requests to the goatse hello.jpg.

edit: or, when somebody hotlinks your image on some forum, replace the file at the original filename with the contents of hello.jpg.
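Hotlink swapping is usually keyed on the `Referer` header. A minimal Python sketch; `ALLOWED_HOSTS`, `REPLACEMENT_IMAGE` and `pick_image` are hypothetical, and a neutral placeholder path stands in for hello.jpg. Requests without a Referer are let through, because many browsers and privacy tools strip or trim it.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical: hostnames of your own site, allowed to embed your images.
ALLOWED_HOSTS = {"example.com", "www.example.com"}
# Hypothetical image served to foreign embeds instead of the original.
REPLACEMENT_IMAGE = "/static/hotlink-replacement.jpg"


def pick_image(requested_path: str, referer: Optional[str]) -> str:
    """Return the path of the image to actually serve for this request."""
    if not referer:
        return requested_path  # direct visits and Referer-stripped requests
    host = urlsplit(referer).hostname or ""  # .hostname is already lowercased
    if host in ALLOWED_HOSTS:
        return requested_path  # embedded on your own pages
    return REPLACEMENT_IMAGE   # embedded somewhere else
```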



