As I linked elsewhere in a comment, Metaculus has AGI forecasts.
You can also now use AI forecasters like FutureSearch [1] (disclaimer: I work there), which are competitive with the best humans / teams of humans. And since you aren't depending on a human crowd, you can ask any variation of AGI questions with any definition, even ask conditional questions.
It's been a big problem for a while. The big Metaculus question about AGI depends on the game "Montezuma's Revenge" (!), and there have been many debates about this going back to at least 2020: https://www.metaculus.com/questions/3479/date-weakly-general...
Author here. I drew on this from AI 2027. Yes, a very expensive AGI, e.g. $1 million / day to simulate a smart human, would be a huge deal. But it would have meaningfully different effects than a cheap one.
Here's one definition AI 2027 used [1]: "Superhuman coder (SC): An AI system for which the company could run with 5% of their compute budget 30x as many agents as they have human research engineers..."
I've got no problem with your concept, and even think it's useful. I just don't think that concept and AGI are the same thing. Being economically useful has no relation to what has been called AGI before.
I take it as a sign of how close it is (or how close people think it is). When AGI was SFnal magic, merely having it at all was a fascinating concept. Now that (people think) it's on the horizon, there are more practical concerns, like the fact that running these things might cost a substantial amount of money.
I see a lot of comments like this on the blocking of prediction markets about politics, war, etc.
It's important to remember that ~80% of activity on Polymarket and ~90% on Kalshi, by volume, is sports. These are effectively sports betting websites with prediction markets on the side.
The name of the thread is provocative, but the premise is valid: I have yet to see anything produced by multi-agent frameworks (LangChain or bespoke) that delivered value. Anthropic pushes vibeCAD, vibeVFX, vibePowerPoint, but the results are underwhelming. The real value is in code generation, autonomous infra, and research.
I mostly study web research, and Opus 4.7 was a regression on BrowseComp compared to Opus 4.6, which has been borne out by my usage.
Opus 4.8 is now much better than either 4.7 or 4.6, and having it search the web is one of the primary use cases of chatbots.