Hacker News

The future AI you describe has language at the center of its understanding. For example, you would expect the cameras and arm-feedback sensors to produce text (or text-associated weights/tokens) describing what the AI "sees", and the robotic arms to receive some kind of language-derived directives from the LLM.

It will be very interesting to see what that system is capable of. I think a lot of people here don't identify language as an essential part of "thinking" and "being", and instead view language as a superficial layer whose role is primarily about social communication and secondarily about abstract reasoning. This is why some of us see these LLM examples as not really speaking to intelligence per se. It's hard for some of us to conceive of intelligence as being represented purely in language.

So your proposed system would be an extremely interesting exploration of that! I look forward to it.



> So your proposed system would be an extremely interesting exploration of that! I look forward to it.

Right, so let's virtualise it. Actually training AIs using real cameras and real robot arms will be really slow and expensive.

So we provide a system that renders a photorealistic room with a teapot and a robot arm. A virtual camera inside the room 'sees' parts of the room, a vision model processes what the camera 'sees' and feeds that info to the LLM, and likewise the LLM can make the robot arm move, but it's all just simulated.
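The loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in (the renderer, the vision model, and the "LLM" are stubs, not any real library's API); the point is just the shape of the pipeline: simulated camera → vision model → language → directive → simulated arm.

```python
def render_room(arm_angle: float) -> dict:
    """Stand-in for a photorealistic renderer: returns what the
    virtual camera currently 'sees' as structured scene data."""
    return {"teapot_x": 0.4, "teapot_y": 0.7, "arm_angle": arm_angle}

class VisionModel:
    """Stub vision model: turns a rendered frame into text for the LLM."""
    def describe(self, frame: dict) -> str:
        return (f"A teapot at ({frame['teapot_x']:.1f}, {frame['teapot_y']:.1f}); "
                f"robot arm at {frame['arm_angle']:.1f} rad.")

class ToyLLM:
    """Stub policy: maps a text observation to a text directive."""
    def act(self, observation: str) -> str:
        return "rotate_arm 0.1" if "teapot" in observation else "wait"

def step(arm_angle: float, llm: ToyLLM, vision: VisionModel) -> float:
    """One tick of the simulated perception-action loop."""
    frame = render_room(arm_angle)          # simulated camera
    observation = vision.describe(frame)    # vision -> language
    directive = llm.act(observation)        # language -> directive
    if directive.startswith("rotate_arm"):  # directive -> simulated arm
        arm_angle += float(directive.split()[1])
    return arm_angle

angle = 0.0
for _ in range(5):
    angle = step(angle, ToyLLM(), VisionModel())
print(round(angle, 1))  # → 0.5
```

Note that the only thing crossing the boundary between "world" and "model" in this sketch is text, which is exactly the design choice being questioned.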

Does the LLM now have a relationship with reality?


Let me know when you find out.

The interesting and open question to me is what the limitations are of a language model at the center of that experience. How much of a relationship with reality can be captured by language at all, and specifically by the sort of statistical language models we're exploring now? For some of us the intuitive answer is "not all that much"; for others it seems to be at least as much as any human has.

Whether conducted virtually or physically, coming up with an answer sounds like an empirical study, and one that we're some years away from having results for.


Ideally in that scenario you'd have a model that unifies vision, language, and an understanding of 'doing things' and manipulating objects. So it wouldn't just be an LLM, it would be a language-vision-doing-things model. There's no reason why we can't build one.
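A hand-wavy sketch of what "one unified model" means architecturally: a single network consumes both a camera frame and a text instruction, fuses them in a shared latent space, and emits a motor command. The numpy below is a toy with randomly initialised projections standing in for trained encoders; every name and dimension is illustrative, not a real architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class VisionLanguageActionModel:
    """Toy sketch: one model from (image, text) to action."""
    def __init__(self, img_dim=16, txt_dim=8, act_dim=3):
        # Random projections standing in for trained encoders/decoders.
        self.w_img = rng.normal(size=(img_dim, 4))   # "vision encoder"
        self.w_txt = rng.normal(size=(txt_dim, 4))   # "language encoder"
        self.w_act = rng.normal(size=(8, act_dim))   # "action head"

    def forward(self, image: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
        v = image @ self.w_img                 # encode the camera frame
        t = text_emb @ self.w_txt              # encode the instruction
        fused = np.concatenate([v, t])         # shared latent space
        return np.tanh(fused @ self.w_act)     # bounded motor command

model = VisionLanguageActionModel()
action = model.forward(rng.normal(size=16), rng.normal(size=8))
print(action.shape)  # → (3,)
```

The key property is that action selection is conditioned jointly on pixels and language rather than forcing everything through a text bottleneck first.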


Come to think of it, that's kind of what Tesla is building.


In the end, the real question is what morality the AI is being taught. Its own? A religiously inspired one? It can't be none, if a machine can decide not to help or, worse, take actions that hurt or kill someone. Would such machines abide by laws? Be destroyed if anything happens? Could people influence a machine through dialogue and make it do something?

What are the purpose and the imposed limits of such machines?



