
It’s interesting that most of the comments here read like projections of folk-psych intuitions. LLMs hallucinate because they “think” wrong, or lack self-awareness, or should just refuse. But none of that reflects how these systems actually work. This is a paper from a team working at the state of the art, trying to explain one of the biggest open challenges in LLMs, and instead of engaging with the mechanisms and evidence, we’re rehashing gut-level takes about what they must be doing. Fascinating.


It's always the most low-brow takes as well. But the majority of Hacker News commenters "hallucinate" most of their comments in the first place, since they simply regurgitate the top answers based on broad bucketing of subject matter.

Facebook? "Steal your data"

Google? "Kill your favourite feature"

Apple? "App Store is enemy of the people"

OpenAI? "More like ClosedAI amirite"


Yes, many _humans_ here hallucinate, sort of.

They apparently didn't read the article, or didn't understand it, or disregarded it. (Why, why, why?)

And they fail to realize that they don't know what they are talking about, yet they keep talking anyway. Similar to an overconfident AI.

On a discussion about hallucinating AIs, the humans start hallucinating.


Could one say that humans are trained very differently from AIs?

If we (humans) make confident guesses, but are wrong — then, others will look at us disappointedly, thinking "oh s/he doesn't know what s/he is talking about, I'm going to trust them a bit less hereafter". And we'll tend to feel shame and want to withdraw.

That's a pretty strong punishment, for being confidently wrong? Not that odd, then, that humans say "I'm not sure" more often than AIs?


Calling it a "hallucination" is anthropomorphizing too much in the first place, so....


Confabulation is a human behavioral phenomenon that is not all that uncommon. Have you ever heard a grandpa's big-fish story? Have you ever pretended to know something you didn't because you wanted approval or to feel confident? Have you answered a test question wrong when you thought you were right? What I find fascinating about these models is that they are already more intelligent and reliable than the worst humans. I've known plenty of people who struggle to conceptualize and connect information and are helpless outside of dealing with familiar series of facts or narratives. That these models aren't even as large as human brains makes me suspect that practical hardware limits might still be in play here.


Right, that’s kind of my point. We call it “hallucination” because we don’t understand it, but need a shorthand to convey the concept. Here’s a paper trying to demystify it so maybe we don’t need to make up anthropomorphized theories.


Replace the AI agent with any other new technology and this is an example of a company:

1. Working out in the open

2. Dogfooding their own product

3. Pushing the state of the art

Given that the negative impact here falls mostly (completely?) on the Microsoft team which opted into this, is there any reason why we shouldn't be supporting progress here?


100% agree. I'm not sure why everyone is clowning on them here. This process is a win. Do people want all of this hidden away in a forked private repo instead?

It’s showing the actual capabilities in practice. That’s much better and way more illuminating than what normally happens with sales and marketing hype.


Satya says: "I’d say maybe 20%, 30% of the code that is inside of our repos today and some of our projects are probably all written by software".

Zuckerberg says: "Our bet is sort of that in the next year probably … maybe half the development is going to be done by AI, as opposed to people, and then that will just kind of increase from there".

It's hard to square those statements up with what we're seeing happen on these PRs.


These are AI companies selling AI to executives; there's no need to square the circle. The people they're talking to have no interest in what's happening in a repo. It's about convincing people to buy in early so they can start making money off their massive investments.


Why shouldn’t we judge a company’s capabilities against what their CEOs claim them to be capable of?


Oh, we absolutely should, but I'm saying that the reason the messaging is so discordant when compared with the capabilities is that the messaging isn't aimed at the people who are able to evaluate the capabilities.


You're right. The audience isn't the same. Unfortunately the parent commenters are also right - executives hyping AI are (currently) lying.

It is about as unethical as it gets.

But, our current iteration of capitalism is highly financialized and underinvested in the value of engineering. Stock prices come before truth.


> Satya says: "I’d say maybe 20%, 30% of the code that is inside of our repos today and some of our projects are probably all written by software".

Well, that makes sense to me. Microsoft's software has gotten noticeably worse in the last few years. So much so that I have abandoned it as my daily driver for the first time since the early 2000s.


The fact that Zuck is saying "sort of" and "probably" is a big giveaway it's not going to happen.


Who is "we", and how and why would "we" "support" or not "support" anything?

Personally I just think it is funny that MS is soft launching a product into total failure.


"Pushing the state of the art" and experimenting on a critical software development framework is probably not the best idea.


Why not, when it goes through code review by experienced software engineers who are experts on the subject in a codebase that is covered by extensive unit tests?


I don't know about you, but it's much more likely for me to let a bug slip when I'm reviewing someone else's code than when I'm writing it myself.

This is what's happening right now: they are having to review every single line produced by this machine and trying to understand why it wrote what it wrote.

Even with experienced developers reviewing and lots of tests, code like this is much more likely to contain bugs than code written by a real engineer.

Why not do this on less mission critical software at the very least?

Right now I'm very happy I don't write anything on .NET if this is what they'll use as a guinea pig for the snake oil.


That is exactly how you want to evaluate the technology. Not make a buggy commit into software nobody uses, reviewed by an intern, but actually have it reviewed by domain professionals, in a real-world, very well-tested project. That way they can make an informed decision on where it lacks in capabilities and what needs to be fixed before they try it again.

I doubt that anyone expected to merge any of these PRs. The question is: can the machine solve minor (but non-trivial) issues listed on GitHub in an efficient way with minimal guidance? The current answer is no.

Also, _if_ anything was to be merged, dotnet is dogfooded extensively at Microsoft, so bugs in it are much more likely to be noticed and fixed before you get a stable release on your plate.


> Not make a buggy commit into software nobody uses, reviewed by an intern.

If it can't even make a decent commit into software nobody uses, how can it ever do it for something even more complex? And no, you don't need to review it with an intern...

> can the machine solve minor (but non-trivial) issues listed on github in an efficient way with minimal guidance

I'm sorry, but the only way this is even a question is if you've never used AI in the real world. Anyone with a modicum of common sense would tell you immediately: it cannot.

You can't even keep it "sane" in a small conversation, let alone using tons of context to accomplish non-trivial tasks.


>supporting progress

This presupposes AI IS progress.

Never mind that what this actually shows is an executive or engineering team that buys its own hype so thoroughly that they didn't even try to run this locally and internally before blasting to the world that their system can't even ensure tests are passing before submitting a PR. They are having a problem with firewall rules blocking the system from seeing CI outcomes, and that's part of why it's doing so badly. So why wasn't that verified BEFORE doing this on stage?

"Working out in the open" here is a bad thing. These are issues that SHOULD have been caught by an internal POC FIRST. You don't publicly do bullshit.

"Dogfooding" doesn't require throwing this at important infrastructure code. Does VS code not have small bugs that need fixing? Infrastructure should expect high standards.

"Pushing the state of the art" is comedy. This is the state of the art? This is pushing the state of the art? How much money has been thrown into the fire for this result? How much did each of those PRs cost anyway?


Because they're using it on an extremely popular repository that many people depend on?

And given the absolute garbage the AI is putting out the quality of the repo will drop. Either slop code will get committed or the bots will suck away time from people who could've done something productive instead.


The article's point isn't that globalization (or policy) didn't play a role, it's that the premise of middle class downfall is false to begin with. See the section under the header "The American middle class was never hollowed out".


The middle class of the 1980s/1990s are the poor of today... the middle class of today are different people (and probably would have been upper class professionals in the past)


Well we’re all different people, which is why they’re looking at median incomes. Unless the problem is that these are different people?


Those people who used to be middle class didn't just die off. They're still here, a large demographic of the United States, and still very, very angry. Our current political situation should make that obvious.


The comment you're replying to is implying real median personal income (https://fred.stlouisfed.org/series/MEPAINUSA672N), not household. Household sizes change over time.


I don't know if the Fed themselves gives margins of error, but the GDPNow indicator the Atlanta Fed puts out (https://www.atlantafed.org/cqer/research/gdpnow) also includes ranges for industry forecasts. In this case it seems we were above even the most optimistic expectations.


My understanding of marginal propensity to consume is that it decreases as wealth and income increase. The wealthier you are, the less you tend to spend as a function of the next dollar. So the inverse would imply that there would be more spending overall with UBI. We sort of saw this experiment play out with the COVID relief checks.

Given that increase in demand, I'm curious how UBI would affect the supply side. My first thought is that it would negatively impact supply since there would be less efficient allocation of people to supply-related jobs. Unless I'm off here, I struggle to see how increased demand and decreased supply wouldn't impact inflation.
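To make the MPC reasoning above concrete, here is a minimal sketch with made-up numbers (the MPC values, group sizes, and transfer amounts are all hypothetical, chosen only to illustrate the mechanism, not real data):

```python
# Hypothetical marginal propensities to consume, falling with income
# as described above: the wealthier the group, the less of each extra
# dollar it spends.
mpc = {"low": 0.9, "middle": 0.6, "high": 0.2}

# A stylized $1,000 UBI funded by taxes that fall mostly on the
# high-income group; net transfers sum to zero by construction.
net_transfer = {"low": +1000, "middle": +200, "high": -1200}

# Change in total consumption = sum of each group's MPC times its
# net transfer. Money moves from low-MPC to high-MPC hands, so the
# total rises even though no new money is created.
delta_spending = sum(mpc[g] * net_transfer[g] for g in mpc)
print(delta_spending)  # 780.0 — aggregate spending increases
```

Under these toy assumptions, the redistribution alone raises aggregate spending, which is the demand-side effect the comment describes; whether that is inflationary then depends on the supply-side response discussed next.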


It's not like all of the extra wealth is frozen in some account. People hold the wealth in assets and investments.


> Unless I'm off here, I struggle to see how increased demand and decreased supply wouldn't impact inflation.

We're rapidly approaching the situation where 10 factory workers can be replaced by a single worker supervising some robots.

At some point the decrease in employment will not significantly impact productivity for a large sector of the market.

A properly implemented UBI scheme will be significant enough to allow a large part of the population to seriously reduce their income, either by working way less, or not working at all. Only that way can the amount of dollars in the economy stay relatively the same.

I therefore think the assumption that supply would diminish while demand increases is wrong.


Given that, would you argue that the main condition needed for UBI, or wealth transfer in general, is increased productivity?


These are (strongly?) correlated though, right? Over-regulating distribution should reduce generation because builders can’t depend as much on pricing signals.


I'm not sure. How do you measure "degree of regulation"? At face value it's an abstract concept with a lot of dimensions.


Part of the problem is that that "intuition" changes based on how you ask the question and whom you ask it to. A good example of this is polling that shows Americans as a whole believe the country's finances will be worse off a year from now at twice the rate as their own personal finances[0]. So I'd argue that even anecdotes and intuition need to be taken with a grain of salt, particularly given that it's an election year with a polarized electorate.

[0] https://www.pewresearch.org/politics/2024/05/23/views-of-the...


>polarized electorate

And I would add: with a polarized media whose parent companies' fortunes depend on which people get elected. The amount of corporate-financed propaganda out there is out of control.


We’re getting off on a tangent here about the mechanism behind that polarization, but it reflects the broader point I was trying to make with my comment. That is, to be skeptical of simplistic answers, like “economic termites” or “corporate propaganda”, to complex topics like the economy or polarization. It might imply an agenda other than seeking the truth.


Though that does presume being ready to board at precisely that moment while traveling with small children. In my experience I can expect that to happen about half the time.


If you are clearly a family, and miss family boarding, I believe they'll let you board whenever you're ready. As long as you get on sometime in the "B" group, you should find an empty row or two at the back of the plane.


Some of these regulations seem like they should be aimed at the government, though the article doesn't specify what organizations they would apply to. Behavioral scoring is fine for a private business that wants to prevent spam or fraud, like HN's own karma system, while arguably the negative consequences of unfettered, untargeted CCTV scanning by the government could be far greater.

Is the act meant to be anti-Tech, or anti usage of tech?

