Yes I came to see the same thing. There was an interesting article recently using the Lenin thing as an example of people not having world models. Spaghetti tree is another good one.
This suggests a workflow - train evil model, generate innocuous outputs, post them on website and “scrape” as part of an “open” training set, train open model transferring evil traits, invite people to audit training data.
Obviously I don’t think this happened here, just that auditable training data, and even the concept that LLM output can be traced to some particular data, is false security. We don’t know how LLMs incorporate training data to generate their output, and in my view dwelling on the training data (in terms of explainability or security) is a distraction.
Have you seen comparisons between American and Canadian productivity? It’s definitely more complicated than just socialist leaning government programs make the country more productive.
How is it internal or speculative? Chatgpt is the 5th most poplar website. Gemini is 30th but they have increasing demand and a ton of it isn't on the gemini main site. And that isn't their only external demand of coruse.
I think they are referring to the fact that Google has shimmied AI into every one of their products, thus demand surge is the byproduct of decisions made internally. They are themselves electing to send billions of calls daily to their models.
As opposed to external demand, where vastly more compute is needed just to keep up with users torching through Gemini tokens.
Here is the relevant part of the article:
"It’s unclear how much of this “demand” Google mentioned represents organic user interest in AI capabilities versus the company integrating AI features into existing services like Search, Gmail, and Workspace."
ChatGPT being the #5 website in the world is still indicative of consumer demand, as their only product is AI. Without commenting on the Google shims specifically, AI infrastructure buildouts are not speculative.
Isn’t this before any curation has happened? I looked at it, I can see why it looks bad, but if they’re really being open about the whole pipeline, they have to include everything. Giving them a hard time for it only promotes keeping models closed.
That said I like to think of it was my dataset I would have shuffled that part down in the list so it didn’t show up on the hf preview
It says it’s common crawl, I interpret it to mean this is a generic web scrape dataset, presumably they filter stuff out they don’t want before pretraining. You’d have to do do some ablation testing to know what value it adds
There are a bunch (currently 3) of examples of people getting funny output, two of which saying it’s in LM studio (I don’t know what that is). It does seem likely that it’s somehow being misused here and the results aren’t representative.
Definitely. Usually I'd wait 2-3 weeks for the ecosystem to catch up and iron out the kinks, or do what I did for GPT-OSS, fix it in the places where it's broken, then judge it when I'm sure it's actually used correctly.
Otherwise, in that early period of time, only use the provided scripts/tools from the people releasing the model itself, which is probably the only way in those 2-3 weeks to be sure you're actually getting the expected responses.
The problem with lots of laws, often poorly thought out or framed, is that anyone can be breaking them any time, allowing law enforcement to target people or groups they don’t like with impunity. Drug laws are an obvious one, but so are traffic laws (with ever more rules about distracted driving etc, “drunk” driving ), things like loitering, all the stupid anti-free speech laws in places like the uk.
People get whipped up to support laws but don’t see that more is just worse, especially the petty ones, even if they notionally correct for some bad behaviour, because they allow selective enforcement.
it seems like lots of this is in distribution and that's somewhat the problem. the Internet contains knowledge of how to make a bomb, and therefore so does the llm
Am i understanding correctly that in distribution means the text predictor is more likely to predict bad instructions if you already get it to say the words related to the bad instructions?
Yes, pretty much. But not just the words themselves - this operates on a level closer to entire behaviors.
If you were a creature born from, and shaped by, the goal of "next word prediction", what would you want?
You would want to always emit predictions that are consistent. Consistency drive. The best predictions for the next word are ones consistent with the past words, always.
A lot of LLM behavior fits this. Few-shot learning, loops, error amplification, sycophancy amplification, and the list goes. Within a context window, past behavior always shapes future behavior.
Jailbreaks often take advantage of that. Multi-turn jailbreaks "boil the frog" - get the LLM to edge closer to "forbidden requests" on each step, until the consistency drive completely overpowers the refusals. Context manipulation jailbreaks, the ones that modify the LLM's own words via API access, establish a context in which the most natural continuation is for the LLM to agree to the request - for example, because it sees itself agreeing to 3 "forbidden" requests before it, and the first word of the next one is already written down as "Sure". "Clusterfuck" style jailbreaks use broken text resembling dataset artifacts to bring the LLM away from "chatbot" distribution and closer to base model behavior, which bypasses a lot of the refusals.
Basically means the kind of training examples it’s seen. The models have all been fine tuned to refuse to answer certain questions, across many different ways of asking them, including obfuscated and adversarial ones, but poetry is evidently so different from what it’s seen in this type of training that it is not refused.
https://www.astralcodexten.com/p/in-search-of-ai-psychosis
reply