Hacker News | robots0only's comments

Any robot that does this reliably is easily more than a decade away.


Did you mean that to sound distant? Because my reading is that if we have robots reliably doing these sorts of delicate tasks in a decade or two, it would be amazingly revolutionary and disruptive to the economy.


A decade for this kind of robot seems very optimistic. The latest one being prototyped in Japan can roll you onto your side and help you put on socks.

Cleaning your ass or helping you shower is orders of magnitude more sensitive and complex.


What's the gap between a fancy bidet with all the bells and whistles and the robot that doesn't exist yet? Showering is obviously harder.


Yet the insurance industry has been exploiting the elderly by selling Robot Insurance for decades.

https://www.youtube.com/watch?v=g4Gh_IcK8UM


As long as it hits before my kids have to wipe my elderly ass I’m golden.


I mean, they kind of owe it to us.

I changed a lot of diapers.


I would rather look after five babies than one adult. Adults are heavier. Also, babies are born sterile; whatever infection they carry comes from the adults. Finally, you watch a baby grow stronger, while I (we) deteriorate, and that affects the caregiver's morale.


Yes, in fact I was joking, not offering real commentary. I never had a problem with changing diapers. I never hated it, and I did it lovingly and willingly, even as a kid with my (10 years younger) baby brother.

Adult diapers and/or even just the general process of dealing with parents in their physically declining phase is a whole other story. I also know all about this.


Bring a human being into this dystopia without asking them, and then demand something off them. You’re a great parent.


Thanks, I do my best.


It depends, really.


I think we’re soon going to see a magical ChatGPT-like moment for physical outputs. For instance, Figure’s Helix is only a 10M parameter NN. Once we get into the billions we will start seeing leaps in progress, just like with LLMs.


Ivan Sutherland predicted the Holodeck in 1965:

https://www.quora.com/?qv_src=email

>Q: Can an “AI command line” replace the GUI as the primary user experience for computers, assuming the technology improves and irrespective of today’s state?

>A: Alan Kay -- Still trying to learn how to think better (May 26, 2026)

>The “related questions” have interesting slants — some of which make more or less sense.

>I think most people should be able to answer this for themselves if they look at this from a number of different angles.

>One is that we have multiple ways of “perceiving”, “knowing”, “learning”, etc. — for example by touch, sound, vision, symbolic representations, abstract languages, etc. Besides inventing interactive computer graphics, Ivan Sutherland pointed out (in a famous 1964 paper) that “the ultimate display” should be able to do every kind of I/O that humans can do and experience. His famous last line with typical Ivan humor was “In the ultimate display, a simulated bullet would be fatal to its operator”!

[...]

----

The Ultimate Display -- Ivan E. Sutherland (Jan 1, 1965)

https://scispace.com/papers/the-ultimate-display-35zd3b9ucp

>TL;DR: The authors live in a physical world whose properties they have come to know well through long familiarity but lack corresponding familiarity with the forces on charged particles, forces in non-uniform fields, the effects of nonprojective geometric transformations, and high-inertia, low friction motion.

>Abstract: We live in a physical world whose properties we have come to know well through long familiarity. We sense an involvement with this physical world which gives us the ability to predict its properties well. For example, we can predict where objects will fall, how well-known shapes look from other angles, and how much force is required to push objects against friction. We lack corresponding familiarity with the forces on charged particles, forces in non-uniform fields, the effects of nonprojective geometric transformations, and high-inertia, low friction motion.

>A display connected to a digital computer gives us a chance to gain familiarity with concepts not realizable in the physical world. It is a looking glass into a mathematical wonderland. Computer displays today cover a variety of capabilities. Some have only the fundamental ability to plot dots. Displays being sold now generally have built in line-drawing capability. An ability to draw simple curves would be useful. Some available displays are able to plot very short line segments in arbitrary directions, to form characters or more complex curves. Each of these abilities has a history and a known utility.

[...]

>The ultimate display would, of course, be a room within which the computer can control the existence of matter. A chair displayed in such a room would be good enough to sit in. Handcuffs displayed in such a room would be confining, and a bullet displayed in such a room would be fatal. With appropriate programming such a display could literally be the Wonderland into which Alice walked.


From your earlier point -- both 1) and 2) are true. True human-level dexterity is very far off (a few decades, surely); it would require further advancements in hardware, learning approaches, etc. Recent approaches provide a glimmer of hope, and maybe we can have some intermediate robots -- to be honest, even Waymos and Teslas are robots, and we will see many more such robots with vision, working with humans, etc. in narrow settings. Chinese dancing robots are examples of this.


Here is a real video of a Unitree robot playing ping pong: https://www.youtube.com/watch?v=tOfPKW6D3gE


How do you know this is a better model? I wouldn't take any of the numbers at face value, especially when all they have done is more/better post-training, so the base pre-trained model's capabilities are still the same. The model may just elicit some of the benchmark capabilities better. You really need to spend time using the model to come to any reliable conclusions.


This is probably very similar to what happened!


The problem here is that text is not a good communication interface for this. The model should be reasoning in pose space (and generally in more geometric spaces); then interpolation and drawing are pretty easy. I think this will happen in some time.


Locomotion and manipulation are pretty different. The former we know how to do well -- this is what you see in Unitree videos. Manipulation, still not so much. This is not at all like GPT-2, because we still don't know what to scale (and even the data to scale is not there).


Here you can see another much simpler robot folding clothes for far longer: https://www.youtube.com/watch?v=gdeBIR0jVvU (there are more videos from other companies as well)

To answer your question -- folding clothes is easy because clothes deform easily, do not break, fall smoothly when you drop them and, most importantly, make for an easily resettable task. Just throw the well-folded cloth up and voilà, start again.


Actually, folding clothes is a challenging dexterity task. However, it's a trivial mechanical engineering task, which is why it is so popular with underpowered robot arms.


How are you defining dextrous? I think it can be somewhat challenging but not dextrous -- the robot doesn't need to be very precise (a few centimeters here and there do not matter), there are no forces involved, and the motions are all pick-and-place. Dextrous tasks would be things like shoelace tying, origami folding, etc.


+100!!! Please don't fall for the HYPE.

The current best neural networks only have around 60% success rates for short-horizon tasks (think 10-20 seconds, e.g. picking up an apple). That is why there are so many cuts in this video. The future will be awesome, but it will take time; a lot of research still needs to happen (e.g. robust hands, tactile sensing, how to even collect large-scale data, RL).
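
To put a rough number on why long uncut takes are hard, here is a minimal sketch. It assumes each short subtask succeeds independently at the ~60% rate quoted above; independence is a simplifying assumption, and `chained_success` is just an illustrative helper name, not anything from a real robotics stack.

```python
def chained_success(p_step: float, n_steps: int) -> float:
    """Probability that n_steps consecutive subtasks all succeed.

    Assumes every subtask succeeds independently with probability p_step,
    which is a simplification: real failures are often correlated.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


if __name__ == "__main__":
    # How quickly an uncut sequence of short tasks becomes unlikely.
    for n in (1, 3, 5, 10):
        print(f"{n:>2} chained subtasks: {chained_success(0.6, n):.3f}")
```

Under these assumptions, three chained subtasks already succeed only about 22% of the time, which is consistent with a demo video needing frequent cuts.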


> The future will be awesome

Perhaps this is a bit pedantic, but what about the probable eventual proliferation of useful humanoid robots will make the future awesome? What does an awesome future look like compared to today, to you?


It would be nice to be able to have the robots knock us up a house like the Alhambra palace or whatever preferred architecture you have https://s7g10.scene7.com/is/image/barcelo/history-of-the-alh...


They could do all the back-breaking work. A domestic assistant would be great too.


Personal trainer. Driver. Valet. Private chef. Security guard. Doorman. Delivery driver. Rough carpentry. Baggage handler. Laundry handler. Janitor. Garbage collector. Pallet loader/unloader.

All with much improved privacy, reliability, order of magnitude lower cost, no risk of robbery/SA, etc. 24/7 operation even on holidays. Imagine service staff just sitting waiting for you to need them, always and everywhere.

Never mind how much human lifespan will be freed from the tyranny of these mindless jobs.


In all of these posts there is someone claiming Claude is the best, then somebody else claiming they have tried a bunch of times and for them Gemini is the best while others find GPT-5 is supreme. Obviously, all of these are subjective narrow experiences. My conclusion is that all frontier models are both good and bad with no clear winner and making good evals is really hard.


I'll be that person:

* Gemini has the highest ceiling out of all of the models, but has consistently struggled with token-level accuracy. In other words, its conceptual thinking is well beyond other models, but it sometimes makes stupid errors when talking. This makes it hard to reliably use for tool calling or structured output. Gemini is also very hard to steer, so when it's wrong, it's really hard to correct.

* Claude is extremely consistent and reliable. It's very, very good at the details - but will start to forget things if things get too complex. The good news is Claude is very steerable and will remember those details if you remind it.

* GPT-5 seems to be completely random for me. It's so inconsistent that it's extremely hard to use.

I tend to use Claude because I'm the most familiar with it and I'm confident that I can get good results out of it.


I’d say GPT-5 is the best in following and remembering instructions. After an initial plan it can easily continue with said plan for the next 30-60 minutes without human intervention, and come back with a complete working finished feature/product.

It’s honestly crazy how good it is, coming from Claude. I never thought I could already pass a model a design doc and have it one-shot the entire thing with such a level of accuracy. Even with Opus, I always need to either steer it, or fix the stuff it forgot by hand / have another phase afterwards to get it from 90% to 100%.

Yes, the Codex TUI sucks, but the model with high reasoning is an absolute beast, and it convinced me to switch from Claude Max to ChatGPT Pro.


Gemini is also the best for staying on the ball (when it does) over long contexts.

It's really the only model that can do large(r) codebase work.


Claude can do large code bases too, you just need to make it focus on parts that matter. Most of the coding tasks should not involve all parts of the code, right?


GPT-5 seems best at analyzing the codebase for me. It can pick up nuances and infer strategies Claude and Gemini seem to fail at.


Personally I prefer Gemini because I still use AI via chat windows, and it can do a good ~90k tokens before it starts getting stupid. I'm yet to find an agent that's actually useful, and doesn't constantly fuck up everywhere while burning money.


Answer is a classic programming one - it depends? There are definitely differences in strength and weaknesses among them.

I run claude CLI as a primary and just ask it nicely to consult gemini cli (but not let it do any coding). It works surprisingly well. OpenAI just fell out of my view. I even cancelled my ChatGPT subscription. Gemini is leaping forward, and it _feels like_ ChatGPT-5 is a regression. I can't put my finger on it tbh.


In my experience, gemini is good at writing specs, it's hit or miss at reviewing code, and it's not really usable for iterating on code. Codex is slow but can crack issues that Claude Code struggles with. So my workflow has been to use all three to iterate on specs, have Claude Code work on the implementation, and have Codex review Claude Code's work (sometimes having gemini double-check it).


Yeah, my take is it’s sort of up to the person using the LLM and maybe how they match to that LLM. That’s my hunch as to why we hear wildly different takes on these LLMs working for people. Gemini can be the most productive model for some while others find it entirely unworkable.


Not just personalities and preferences, but the purpose for which the AI is being used also affects the results. I primarily use AIs for complex troubleshooting along the lines of: "Here's a megabyte of logs, an IaC template, and a gibberish error code. What's the reason?" Right now, only Gemini Pro 2.5 has any chance of providing a useful output given those inputs, because its long-context attention is better than any other model's.


The fact that there is so much astroturf out there also makes it difficult to evaluate these claims


Capability wise, they seem close enough that I don’t bother re-evaluating them against each other all the time.

One advantage Gemini had (or still has, I’m not sure about the other providers) was its large context window combined with the ability to use PDF documents. It probably saved me weeks of work on an integration with a government system: I uploaded hundreds of pages of documentation and could immediately start asking questions, generating rules, and troubleshooting payloads that were leading to generic, computer-says-no errors.

No need to go through RAG shenanigans, and all of it was within the free token allowance.


Because how good a model is is mostly just what the training data is at this point.

It's like the personality of a person. Employee A is better at talking to customers than Employee B, but Employee B is better at writing code than Employee A. Is one better than the other? Is one smarter than the other? Nope. Different training data.

