madeofpalk's comments

> how is this different from Skia-wasm

It’s not wasm?


Is there any evidence or hints that these actually work?

It seems pretty reasonable that any scraper would already have mitigations for things like this as a function of just being on the internet.


I have no idea if it works, but Anthropic in particular spent a lot of time crawling the tar-pit[1] I had running on my domain. They were the reason I set up the tar pit in the first place, as they were at one stage averaging 5 requests per second, for days, on a blog site that probably doesn't even have a hundred pages on it. They've retrieved millions of pages of content from my tar-pit that were texts generated via markov chain from the contents of Moby Dick.
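I haven't studied iocaine's internals, so this is only a toy sketch of the general idea (a word-level Markov chain over a local copy of Moby Dick, spewing endless plausible-looking text):

    import random
    from collections import defaultdict

    # Build a word-level Markov chain from a source text (e.g. Moby Dick).
    def build_chain(text, order=2):
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    # Emit a stream of plausible-looking nonsense for crawlers to choke on.
    def generate(chain, length=500):
        out = list(random.choice(list(chain.keys())))
        for _ in range(length):
            out.append(random.choice(chain.get(tuple(out[-2:]), ["whale"])))
        return " ".join(out)

    with open("moby_dick.txt") as f:   # hypothetical local copy of the novel
        print(generate(build_chain(f.read())))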

[1] https://iocaine.madhouse-project.org/


It might work against people who just use their Mac Mini with OpenClaw to summarize news every morning, but it certainly won't work against Google.

More centralized web ftw.


In my experience, Google (among others) plays nice. Just disallow everything in your robots.txt and they won't bother you again.
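For reference, the whole file can be as small as:

    User-agent: *
    Disallow: /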

My current problem is OpenAI, which scrapes massively, ignoring every limit, 426, 444, and whatever else you throw at them, plus botnets from East Asia using one IP per scrape, but thousands of IPs.


It also probably won't work if the person actually wants your content and is checking whether the thing they scraped makes sense or is just noise. None of these are new things: site owners have been sending junk/fake data to web scrapers since web scraping was invented.

> It might work against people who just use their Mac Mini with OpenClaw to summarize news every morning,

Good enough for me.

> More centralized web ftw.

This ain't got anything to do with a "centralized web"; this kind of epistemological vandalism can't be shunned enough.


About two years ago, I made up a reference to a nonexistent Python library and put code "using" it in just 5 GitHub repos. Several months later, the free ChatGPT picked it up. So IMO it works.

Via web search? Or training?

Even if it did work, I just can't bring myself to care enough. It doesn't feel like anything I could do on my site would make any material difference. I'm tired.

I definitely get this. The thing that gives me hope is that you only need to poison a very small % of content to damage AI models pretty significantly. It helps combat the mass scraping, because a significant chunk of the data they get will be useless, and it's very difficult to filter it by hand.

The asymmetry is what makes this very interesting. The cost to inject poison is basically zero for the site owner, but the cost to detect and filter it at scale is significant for the scraper. That math gets a lot worse for them as more sites adopt it. It doesn't solve the problem, but it changes the economics.

It does work, on two levels:

1. Simple, cheap, easy-to-detect bots will scrape the poison, and feed links to expensive-to-run browser-based bots that you can't detect in any other way.

2. Once you see a browser visit a bullshit link, you insta-ban it: you now know it's a bot, because it has been poisoned with the bullshit data (rough sketch below).

My personal preference is using iocaine for this purpose though, in order to protect the entire server as opposed to a single site.
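In practice iocaine handles this for me, but the level-2 ban logic is simple enough to sketch (paths and names made up):

    # Minimal sketch: /trap/ links only ever appear inside the poison pages,
    # so any client requesting one has followed the poisoned data.
    HONEYPOT_PREFIX = "/trap/"
    banned_ips = set()

    def handle_request(client_ip, path):
        if client_ip in banned_ips:
            return 403, "banned"
        if path.startswith(HONEYPOT_PREFIX):
            banned_ips.add(client_ip)   # level 2: insta-ban the browser bot
            return 403, "banned"
        return 200, "normal page"

    print(handle_request("203.0.113.7", "/trap/whale-oil-futures"))
    print(handle_request("203.0.113.7", "/about"))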


The search engine crawlers are sophisticated enough, but Meta's are not. Neither is Anthropic's Claude crawler. Source: personal experience trying garbage generators on Yandex, Blexbot, Meta's, and Anthropic's crawlers.

I'm completely uncertain that the unsophisticated garbage I generated makes any difference, much less "poisons" the LLMs. A fellow can dream, can't he?


There are hundreds of bots using residential proxies. That is not free. Make them pay.

It won't work, especially on Gemini. Googlebot is very experienced when it comes to crawling. It might work on OpenAI and the others, maybe.

What kind of mitigations? How would you detect the poison fountain?

style="display: none;" aria-hidden="true" tabindex="1"

Many scrapers already know not to follow these; it's how sites used to "cheat" PageRank by serving keyword soups.
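A scraper doesn't need anything fancy to skip them either. Something like this (rough BeautifulSoup sketch, not any particular crawler's code) already filters the obvious cases:

    from bs4 import BeautifulSoup

    def visible_links(html):
        soup = BeautifulSoup(html, "html.parser")
        links = []
        for a in soup.find_all("a", href=True):
            style = (a.get("style") or "").replace(" ", "").lower()
            if "display:none" in style or "visibility:hidden" in style:
                continue  # the classic keyword-stuffing tell
            if a.get("aria-hidden") == "true" or a.get("tabindex") == "-1":
                continue
            links.append(a["href"])
        return links

    html = '<a href="/trap" style="display: none;" aria-hidden="true">x</a><a href="/real">real</a>'
    print(visible_links(html))  # ['/real']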


Google will give your website a penalty for doing this.

You don't have to use this. You can have it visible to bots but hidden from humans with other easy tricks.

Scrapers can work around those other easy tricks too.

Because the internet is noisy and not up to date, all recent LLMs are trained using Reinforcement Learning with Verifiable Rewards (RLVR); if a model has learned the wrong signature of a function, for example, it becomes apparent when the code is executed.
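As a toy illustration of the "verifiable" part (not any lab's actual pipeline): the reward comes from executing the generated code against a check, so a hallucinated signature simply scores zero.

    # Toy "verifiable reward": execute the model's code against a known-good
    # check; a hallucinated API or wrong signature fails and earns reward 0.
    def reward(generated_code, check):
        env = {}
        try:
            exec(generated_code, env)  # run the model's answer
            exec(check, env)           # run the verifier's assertions
            return 1.0
        except Exception:
            return 0.0

    good = "def area(w, h):\n    return w * h"
    bad = "def area(rect):\n    return rect.w * rect.h"  # wrong signature
    check = "assert area(3, 4) == 12"
    print(reward(good, check), reward(bad, check))  # 1.0 0.0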

Does anyone actually know? So far I've just seen people guessing, and then seeing that guess repeated.

I don't believe a sudden influx of a few million bots running 24/7, generating PRs and commits and invoking Actions, doesn't impact GitHub.

It even sounds silly when you say it this way.


That is fair; in fact, I just came across their recent blog post on this. They're pointing to usage growth as the issue: https://github.blog/news-insights/company-news/addressing-gi...

I felt the exact same way. It put a bad taste in my mouth and I just stopped reading.

Running a full javascript framework in a javascript runtime.

> That is built with React Native for Windows. No, that is not a full JavaScript framework in your start menu.

This is incorrect. It is a full JavaScript framework in your start menu.

I don't see your read that it's about RAM-hungry web views either. To me, "Start menu uses React" is a dig that Microsoft is so uncommitted to its own native development platform that they (partially) don't use it in one of the most 'core' parts of the operating system.


Shouldn't devs be allowed to select what they feel is the "best" choice for a given component? While I wouldn't expect to see SwiftUI in Windows from Microsoft, Microsoft hasn't been averse to various NIH web frameworks for quite some time now.

If it fits and meets the goals of the project, why not?


If Microsoft developers' "best" choice for a tiny UI component like this is not its flagship native UI framework, then that's a problem for Microsoft. That is the criticism.

You have inside knowledge of why React Native was chosen?

> Shouldn't devs be allowed to select what they feel is the "best" choice for a given component?

To some extent, yes. But if they choose React Native, something's probably wrong, because (despite what the article says) that requires throwing in a JavaScript engine, significantly bloating a core Windows component. If they only use it for a small section ("that can be disabled", or in other words on by default), it seems like an even poorer trade-off: most users suffer the pain, while the devs gain minimal advantage from whatever benefits it provides.

If the developers are correct that this is the best choice, that reflects poorly on the quality of Microsoft's core native development platforms, as madeofpalk said.

If the developers of a core Windows component are incorrect about the best choice, that reflects poorly on this team, and I might be inclined to say no, someone more senior should be making the choice.


The critique is exactly that they apparently felt that React Native was the best choice for such a component.

And if it was the best choice, the critique isn't valid.

If you know why it was chosen and if it was a bad choice compared to other frameworks, please do tell.


There are two possibilities: Either it’s really the best choice among the available frameworks (very questionable), or they picked it regardless. Both reflect badly on Microsoft, given what React Native is, and given how central the Start menu is to the Windows experience.

What are some of the possible hypothetical reasons that would make introducing React Native to the core OS start menu like this the best choice?

Here's one: Microsoft management heavily incentivizes their developers to use LLMs for virtually everything (to the "do it or you're fired" level), and the LLM (due to its training data or whatever) is far more able to pump out code with React Native than with their own frameworks. This makes it the right choice for them. Not for the user, but you can't have everything.

I don't have any inside information; I'm running with the hypothetical.


It was already React before ChatGPT.

I've got nothing then.

I guess the ship sailed a long time ago, but while no one is going to turn off their ad blocker, they could make people not use one in the first place.

I'm putting my dog in his crate with all my important documents, but leaving my fine china tableware in the cupboard away from the dog.

and then tie a tiny string from the china to a thing inside the cage because it seemed handy at the time...

You start with one teacup in the crate and before you know it you're merging handle redesigns back to the entire fine china cupboard.

He's never broken a teacup in the past!

Then one day you forget to close the door of the crate…

But the dog is so used to the crate…

In the Apple ecosystem 'just a spec bump' is pretty significant IMHO. So often they will completely disregard products and just let them languish. The Mac Pro still only comes with the M2 chip.

I agree with you... but was it actually a failure? I feel like that would require some kind of negative consequences, which I don't think Meta has faced over this. They've still been rewarded handsomely.

The hardware is actually pretty decent, and some VR games work really well. For example, this table tennis simulator honestly feels like the real thing, even down to the little "tap" in the bat when you hit the ball:

https://www.meta.com/en-gb/experiences/eleven-table-tennis/1...

The whole time I've owned a Quest, I've never felt the need or desire to launch the Horizon app.

