Never. Ever. Ever. Tell Claude you have a deadline. It will carry that deadline into every task: it will half-ass things to “get it done in time” and argue about whether an approach can be done “on time”, because it is estimating in human hours.
Doesn't that make sense? It's text prediction. If you give it examples, it can predict. Synthesizing "put semi-colons on new lines" requires it to generate its own examples 'in its head' (so to speak) and remember that. It won't.
It's like when I see people feeding it a whole bunch of "best practices" and expecting it to follow them. It won't. But you could ask it questions about the best practices all day long.
Yes, exactly. Any engineer deep in this stuff right now understands that it is a grounded predictive engine sprinkled with RL training, and is discovering what that means in terms of its strengths and weaknesses for company use.
Saying that it’s dismissive is like saying it’s dismissive to point out that when you write (insert language), you’re just writing assembly.
At the end of the day, it presents a vector field and predicts the next vector. That’s literally the heart of intelligence just like assembly is the heart of execution. When playing table tennis, your brain is literally predicting seconds into the future to get your body into the right position.
But we aren’t discussing intelligence here. We are discussing how best to utilize that intelligence.
You're making my point for me. Saying table tennis is "just a proprioceptive predictor" is dismissively reductive (and not a particularly useful framework for understanding table tennis), even if it is, strictly speaking, accurate. It's the sort of thing someone who has no idea how hard training for table tennis is would say.
Let me put it bluntly. I’m agreeing with you but saying that isn’t what I was talking about and trying to give examples. You’re also agreeing with me.
The “idea” of table tennis and the rules: those are things we can talk about. They are the “best practices” I gave in my example. The actual playing of table tennis would be the examples: how to apply those best practices and what good code looks like.
> That is fascinating how the more knowledge and reasoning we can get our hands on and actually produce, the higher the risk of us, as a species, to become actually much dumber.
This has always been true. There was a time when someone had to teach farming to others, and that information had to spread and be passed down. Eventually, farmers became better at it than hunter-gatherers, and the hunter-gatherers became known simply as hunters. The information on what was safe to gather got passed down as 'safe to eat on the hunt', because the farmers were doing the farming. The civilisation collectively "forgets" foraged foods as that knowledge becomes niche.
Sorta. Take a look at a brick in a house. You'd need everyone from geologists to miners to kiln specialists to construction workers and engineers -- not to mention all the people required to make the tools required to make the tools. The team would likely involve well over 1000 people. So, "just assemble a team" is not quite as simple as you make it sound.
I don't think that's true in the sense meant. Sure, it holds if you want to reproduce a near replica of a specific brick from first principles, but not if you want something broadly functionally equivalent. You can (rather inefficiently) manufacture approximately equivalent bricks in your backyard on your own, possibly even from locally harvested material depending on where you live.
Well. Sure. If we move the goalposts to “something passable and good enough”, you only need a small number of people. In that sense, we are lucky that “blacksmithing” (as a proper trade) only ended in the last hundred years and many people continue it as a hobby. In that case, a “small team of hobbyists” can likely reproduce a few bricks. But bootstrapping mass production of bricks? Unlikely.
Doesn't matter. At the end of the day, the knowledge is embodied by humans, or can be learnt again. Let it be 100, 1000 or 10000 people. At the end of the day, they are made of meat.
When you let the machines do it, and don't care about moving it towards human domain (i.e. meatspace), you're done.
Even as a human, you can still fuck up references.
I submitted a paper with a reference author as Elisio because I couldn’t read my own handwriting. After submitting, I double checked all the references through an LLM. It pointed out that their name was actually Enrique. Yes, you should probably double check your references before submitting, not after.
Point is, I didn’t even trust the LLM at first. But after verifying the mistake, I was embarrassed af. I resubmitted with the fixes before it went live, but ultimately, what’s the difference between “mistake” and “hallucination”?
I assume they won’t ban anyone automatically without a way to object.
Using your example, I wouldn’t assume they would enforce the ban if you object and explain your typo, and if the corrected citation actually says what you cited.
Mistakes like these are explainable; a completely hallucinated citation usually is not.
Given their examples and examples I've seen Thomas talk about in the past, I doubt a typo like that would be grounds for the ban.
Perhaps the issue is that people aren't logged in or using xcancel, so they're missing part of the tweet thread. Here's an important line:
> If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper.
Followed by
> Examples of incontrovertible evidence: hallucinated references, meta-comments from the LLM ("here is a 200 word summary; would you like me to make any changes?"; "the data in this table is illustrative, fill it in with the real numbers from your experiments")
I wouldn't look at your case and read that as "incontrovertible evidence". They are looking for the absolutely brain-dead, no-one-at-the-wheel type of errors. They're looking for things like your paper saying "As an AI language model". There will be real papers with that exact phrase, so it should get flagged, not auto-banned.