Hacker News | saberience's comments

Have you actually used LLMs for non-trivial tasks? They are still incredibly bad when it comes to actually hard engineering work, and they still lie all the time; it's just gotten harder to notice, especially if you're letting one run all night and generate reams of crap.

Most people are optimizing for terrible benchmarks, then don't really understand what the model did anyway and just assume it did something good. It's the blind leading the blind, basically, with a lot of people caught up in AI psychosis or delusion.


Do you realise who you’re replying to?

I think the OP's comment is entirely fair. Karpathy and others come across to me as people putting a hose into itself: they work with LLMs to produce output that is related to LLMs.

I might reframe the comment as: are you actually using LLMs for sustained, difficult work in a domain that has nothing to do with LLMs?

It feels like a lot of LLM-oriented work is fake. It is compounding "stuff," both inputs and outputs, and so the increased amount of stuff makes it feel like we're living in a higher plane of information abundance, but in reality we're increasing entropy.

Tech has always had an information bias, and LLMs are the perfect vehicle to create a lot of superfluous information.


In my limited experience, using LLMs to code up things unrelated to LLMs (robotics, for instance) is significantly less productive than using LLMs to code up things related to LLMs. It works, just not very well, and it requires a lot more legwork on the user's end than in other areas.

To be fair Karpathy isn't known for using LLMs—not that I would assume or question whether he's used them 'for non-trivial tasks', but it's not like making the same comment in reply to Steve Yegge or someone. (However trivial we may think Gastown/Wasteland is in the other sense!)

lolololol

Why should we care that he’s famous?

Fame doesn’t enter it - the point is Karpathy has about as strong a claim as anyone to having “actually used LLMs for non trivial tasks”.

That is not the case at all, considering that he himself only started using and tweeting about LLMs for coding fairly recently. He's probably less experienced in that area than most people who started using the Claude CLI last year.

He is a researcher who understands neural networks and their architectures exceptionally well. That is all.


> He is a researcher who understands neural networks and their architectures exceptionally well. That is all.

And that is precisely why he is more qualified on the subject than your average vibe coder!



That whole thread is just amazing, if you back up a couple of levels from ground zero. Great perspectives from a lot of thoughtful posters.

E.g., you can see a post from a user named dhouston, who mentioned that he was thinking about starting an online file sync/backup service of some sort.


Haha awesome. I guess they were going through YC right then, I still remember their launch video from around then and thinking it was one of the best ads I’d ever seen.

Wait, "Karpathy's Autoresearch", you mean a loop that prompts the agent to improve a thing given a benchmark?

People have been doing this for a year or more, Ralph loops etc.
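For anyone unfamiliar, a loop of that shape is roughly the following. This is a toy sketch, not Karpathy's actual setup: the "benchmark" is a made-up scoring function and the random perturbation stands in for the step where an agent is prompted to propose an improvement.

```python
import random

def run_benchmark(params):
    # Hypothetical benchmark: lower is better. In a real loop this would be
    # an eval suite run against the agent's current version of the code.
    return sum((p - 3) ** 2 for p in params)

def propose_change(params):
    # Stand-in for "prompt the agent to improve the thing"; here it just
    # perturbs one parameter at random.
    i = random.randrange(len(params))
    candidate = list(params)
    candidate[i] += random.uniform(-1, 1)
    return candidate

def autoresearch_loop(params, iterations=200):
    best_score = run_benchmark(params)
    for _ in range(iterations):
        candidate = propose_change(params)
        score = run_benchmark(candidate)
        if score < best_score:  # keep only changes that improve the benchmark
            params, best_score = candidate, score
    return params, best_score

random.seed(0)
params, score = autoresearch_loop([0.0, 0.0])
print(score)
```

The failure mode the thread complains about is baked in: the loop only ever optimizes `run_benchmark`, so if the benchmark is nonsense, the "improvements" are too.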

I hate the strange Twitter world of hero-worship that seems to arise just from large followings.

Joe no-followers did this six months ago and nobody cared. Karpathy writes a really basic loop and it's suddenly treated as some kind of AI miracle, prompting tons of grifters, copy-cats, and weird hype.

I do wonder if LLMs have just made everyone seriously, seriously dumber all of a sudden. Most of the "Autoresearch" posts I see are complete rubbish, with AI optimizing for nonsense benchmarks and people failing to understand the graphs they are looking at. So yes, the AI made itself better at a useless benchmark while also making the code worse in 10 other ways you don't actually understand.


The number of refurbished mac minis that are available in my country has suddenly dramatically increased ever since the Clawdbot tweet. People never learn.

increased or decreased?

"increased" implies that people bought brand-new Mac Minis to run ClawdBot on, got bored of it, and then sold them back to be refurbished and resold.

ya ok, that makes more sense, thanks

Who's the intended user for this?

Is it like, for AI hobbyists? I.e. I have a 4090 at home and want to fine-tune models?

Is it a competitor to LMStudio?


You would be surprised! Nearly every Fortune 500 company has utilized either our RL fine-tuning package or our quants and models - the UI was primarily a culmination of the pain points folks had when doing either training or inference!

We're complementary to LM Studio - they have a great tool as well!


I don’t know why this is being downvoted. Danielhanchen is legit, and unsloth was early to the fine-tuning on a budget party.

Haha no worries at all :)

From the homepage looks like it: “Training: Works on NVIDIA GPUs: RTX 30, 40, 50, Blackwell, DGX Spark/Station etc.”

I wasn't aware LM Studio was being used for fine-tuning. I believe it only does inference.

Happy to see Unsloth making it even easier for people like me to get going with fine-tuning. Not that I'm unable to; I'm just lazy.

Fine tuning with a UI is definitely targeted towards hobbyists. Sadly I'll have to wait for AMD ROCm support.


Thanks! We do have normal AMD support for Unsloth but yes the UI doesn't support it just yet! Will keep you posted!

What does "normal AMD support" mean here? I was completely unable to get it working on my Ryzen AI 9700 XT. I had to munge the versions in the requirements to get libraries compatible with recent enough ROCm, and it didn't go well at all. My last attempt was a couple weeks before studio was announced.

You just answered your own question: AI hobbyists with a 4090 at home. They've been pretty much the target users of Unsloth since the start.

Actually the opposite haha- more than 50% of our audience comes from large organizations eg Meta, NASA, the UN, Walmart, Spotify, AWS, Google, and the list goes on!

That article is just a restatement of the definition of TDD that has been around for years and years. There's nothing novel there at all. It's literally test-driven development.
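For reference, the cycle being described is just: write a failing test first, then write the minimal code that makes it pass, then refactor. A toy sketch (the `slugify` function is a made-up example, not from the article):

```python
def slugify(title):
    # Step 2 of TDD: minimal implementation written only after the test
    # below existed and was failing.
    return "-".join(title.lower().split())

def test_slugify():
    # Step 1 of TDD: these assertions are written first, and fail until
    # slugify() is implemented.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Test  Driven   Development ") == "test-driven-development"

test_slugify()
print("tests pass")
```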

The problem with these kinds of tools now is that Codex is so good you can basically build something which is good for 99% of cases in a single day, and it's free...

Look at Tobi vibe-coding QMD: he's not a full-time engineer, he vibed it up, and now it's used as the de facto RAG engine for OpenClaw.


Funny you say that.

I spent the last two days building this exact thing for our internal use.

Managed to get a full RAG pipeline integrated and running with all of our company documents in less than two days work.

Chunking, embedding and querying, connected to S3 and Google Drive, and running on our own hardware (and scaling on AWS too if needed).
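For a sense of the moving parts (chunk → embed → query), here is a toy, stdlib-only sketch. The bag-of-words "embedding" and the sample documents are obviously stand-ins for a real embedding model and the S3/Drive connectors:

```python
import math
from collections import Counter

def chunk(text, size=40):
    # Fixed-size word chunks; real pipelines often chunk by tokens or headings.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def query(index, question, k=2):
    # Rank all chunks by similarity to the question and return the top k.
    qv = embed(question)
    return sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)[:k]

docs = [
    "Expense reports are uploaded to the finance bucket on S3 every Friday.",
    "The onboarding handbook lives in the shared Google Drive folder.",
]
index = [(c, embed(c)) for doc in docs for c in chunk(doc)]
best = query(index, "where do expense reports go?", k=1)[0][0]
print(best)
```

A production version swaps `embed` for a real model, stores vectors in a vector database, and feeds the retrieved chunks into the LLM prompt, but the control flow is the same.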


Yeah QMD is quite impressive! The main difference between us and them is the scale folks would be looking at indexing. The serverless ingestion engine I described in the post is optimized for processing large batch jobs with high concurrency. We depend on a lot of cloud compute for this which isn't something QMD's local-first environment is optimized for. That said, it's a great option for OpenClaw!

I’m having trouble understanding when/where I would use this? Is this a replacement for pi or codex?

This is not a replacement for either, in my opinion. Apps like Codex and Pi are interactive, but ax is non-interactive. You define an agent once and then trigger it however you please.

What about a country that killed 20,000 to 30,000 protestors with machine guns?

The US can't even confirm how many detainees have died in custody in immigration detention around the country, yet they have precise numbers on how many people the Iranian regime has killed? Give me a break.

If Iran is unwilling to let neutral international observers confirm the number, that suggests they are trying to hide a number they don't want the world to know.

Who gets to define what "neutral" is? According to the US, the International Criminal Court is not fit for this purpose. It certainly can't be a nation-state that's in a military alliance with the US.

Human Rights Watch, MSF, UNICEF? Woke grievance factories, the lot of them /s . World Health Organization? US just left it. It's slim pickings out there.


Which Iran did not do. There's a single report from an anti-Iran agency saying that Iran claimed 3,000 killed protesters (not 20k-30k). Iran never said that though, and I would challenge anyone to produce evidence that they did.

I find those numbers hard to believe, as it is obvious that the US had already been planning a regime-change intervention for quite some time when those protests happened.

You can't trust people who paint Reza Pahlavi as a paragon of human rights and democracy. Nor can you trust every Iranian refugee, as a lot of them were corrupt members of the ruling government or, worse, SAVAK members.


Why would you build this on top of OpenClaw? Like, an insanely bad decision.

Vibe-coded slop on top of vibe-coded slop to spam people? What could possibly go wrong?


exactly my thoughts!

Wait that's it?

This is so trivial to break that it's not worth anything. You can easily hook any AI model you want up to the captcha, intercept it, and have your AI solve it.

Or, if you do have an agent authenticated to Moltbook, you can just script it: type whatever comment or post you want to your agent, and it solves the captcha and posts your text.

Basically, this method is about as full of holes as a sieve.


I suspect this problem is essentially unsolvable. What possible method wouldn't be vulnerable to this? It's fine if it's just a sort of LARP, but if people think this could actually work... man.

OpenClaw and similar agents do that now without using MCP servers.

