> LLMs also love to double down on solutions that don't work.
“Often wrong but never in doubt” is not proprietary to LLMs. It’s off-putting and we want them to be correct and to have humility when they’re wrong. But we should remember LLMs are trained on work created by people, and many of those people have built successful careers being exceedingly confident in solutions that don’t work.
When it comes to programming, tell me you don't know so I can do something else. I ended up just refactoring my UX to work around it. In this case it's a personal prototype, so it's not a big deal.
That is definitely an issue with many LLMs. I've had limited success including instructions like "Don't invent facts" in the system prompt, and more success saying "That was not correct. Please answer again and check to ensure your code works before giving it to me" within the context of chats. More success still comes from requesting second opinions from a different model -- e.g. asking Claude's opinion of Qwen's solution (a sketch of that pattern follows below).
To the other point, not admitting to gaps in knowledge or experience is also something that people do all the time. "I copied & pasted that from the top answer in Stack Overflow so it must be correct!" is a direct analog.
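For illustration, a minimal sketch of that second-opinion pattern might look like the following, assuming a local Qwen model behind an OpenAI-compatible endpoint (e.g. Ollama) and the Anthropic SDK for Claude; the endpoint URL and model names are placeholders:

```python
# Minimal sketch of the "second opinion" pattern: one model drafts the code,
# a different model critiques it. Endpoint URL and model names are placeholders.
from openai import OpenAI
import anthropic

qwen = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

task = "Write a Python function that merges two sorted lists."

# First opinion: Qwen drafts the solution.
draft = qwen.chat.completions.create(
    model="qwen2.5-coder",  # placeholder model name
    messages=[{"role": "user", "content": task}],
).choices[0].message.content

# Second opinion: Claude reviews the draft before it gets used.
review = claude.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            f"Task: {task}\n\nProposed solution:\n{draft}\n\n"
            "Point out any bugs or incorrect claims before I use this."
        ),
    }],
)
print(review.content[0].text)
```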
So now you have an overconfident human using an overconfident tool, both of which will end up coding themselves into a corner? Compilers, at least, for the most part offer very definitive feedback that acts as a guard rail for those overconfident humans.
Also, let's not forget LLMs are a product of the internet and anonymity. Human interaction on the internet is significantly different from in-person interaction, where people are typically more humble and less overconfident. If someone at my office acted like some overconfident SO/reddit/HN users, I would probably avoid them like the plague.
A compiler in the mix is very helpful. That and other sanity checks, wielded by a skilled engineer doing code reviews, can provide valuable feedback to other developers and to LLMs. The knowledgeable human in the loop makes the coding process and the final product so much better. Two LLMs with tool-usage capabilities reviewing the code isn't as good, but it is available today.
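As a rough sketch of what keeping a compiler in the loop can look like, the snippet below asks a model for C code, compiles it with gcc, and feeds any diagnostics back until it compiles or we give up. The model name is a placeholder, and a real version would strip markdown fences from the reply before compiling:

```python
# Rough sketch of a compiler-as-guard-rail loop. Assumes gcc on PATH and an
# OpenAI-compatible chat endpoint; the model name is a placeholder.
import os
import subprocess
import tempfile

from openai import OpenAI

client = OpenAI()
messages = [{"role": "user",
             "content": "Write a complete C program that prints the first 10 primes. "
                        "Reply with only the code, no markdown."}]

for attempt in range(3):
    code = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
    ).choices[0].message.content

    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as src:
        src.write(code)
        path = src.name

    result = subprocess.run(
        ["gcc", "-Wall", "-Werror", path, "-o", path + ".out"],
        capture_output=True, text=True,
    )
    os.unlink(path)

    if result.returncode == 0:
        print(f"Compiled cleanly on attempt {attempt + 1}")
        break

    # The compiler's diagnostics become the next turn of the conversation.
    messages += [
        {"role": "assistant", "content": code},
        {"role": "user", "content": "gcc rejected that:\n" + result.stderr +
                                    "\nPlease fix it and resend only the code."},
    ]
```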
The LLM's overconfidence is based on it spitting out the most probable tokens given its training data and your prompt. When LLMs learn real hubris from actual anonymous internet jackholes, we will have made significant progress toward AGI.
> I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5.
Progress by Google, Meta, Microsoft, Qwen and DeepSeek is unhampered by OpenAI's schedule. Their latest models (including Gemini 2.0, Llama 3.3, and Phi 4) and the coding fine-tunes that follow are all pretty good.
Sure, but major improvements from other vendors are nice and all if they're only catching up to OpenAI, and I don't believe that's what the commenter was implying. Right now the leaders, in my opinion, are OpenAI and Anthropic, and unless they are making major improvements every few months, the industry as a whole is not making major improvements.
OpenAI and Anthropic are definitely among the leaders. Playing catch-up to those leaders' mind-share and technology is some of the motivation for the others. But calling the progress being made in the space by Google (Gemini), MSFT (Phi), Meta (Llama), and Alibaba (Qwen) "nice and all" is a position you might be pleasantly surprised to reconsider if this technology interests you. And don't sleep on Apple and Amazon.
In the space covered by Tabby, Copilot, aider, Continue and others, capabilities continue to improve considerably month-over-month.
In the segments of the industry I care most about, I agree 100% with what the commenter said w/r/t expecting major improvements every few months. Pay even passing attention to Hugging Face and GitHub and you'll see work by indies as well as corporate behemoths happening at breakneck pace. Some work is pushing the SOTA. Some is making the SOTA more widely available. Lots of it takes different approaches to solving similar challenges. Most of it benefits consumers and creators looking to use and learn from all of this.
A deeper version of the same idea is to ask a second model to check the first model’s answers. aider’s “architect” is an automated version of this approach.
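For readers who haven't seen it, the rough shape of that approach is a planner model plus an editor model. The sketch below only illustrates the pattern and is not aider's actual implementation; the model names are placeholders:

```python
# Illustration of a planner/editor split: one model describes the change,
# a second model writes it. Not aider's actual implementation.
from openai import OpenAI

client = OpenAI()

def architect_edit(task: str, source: str) -> str:
    # Step 1: the "architect" model plans the change in prose only.
    plan = client.chat.completions.create(
        model="architect-model",  # placeholder: a strong reasoning model
        messages=[{"role": "user",
                   "content": f"Task: {task}\n\nCurrent code:\n{source}\n\n"
                              "Describe the changes needed. Do not write code."}],
    ).choices[0].message.content

    # Step 2: the "editor" model applies the plan and returns the new file.
    return client.chat.completions.create(
        model="editor-model",  # placeholder: a cheaper, faster model
        messages=[{"role": "user",
                   "content": f"Apply this plan to the code and return only the "
                              f"updated file.\n\nPlan:\n{plan}\n\nCode:\n{source}"}],
    ).choices[0].message.content
```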
Speaking of rabbit-holes: I used to have prototype OS/2 PowerPC 64-bit hardware from IBM before they killed the project. I should have kept that early EFI-based system. When the EFI boot sequence would panic, you would get an error message of "Danger Will Robinson".
From a vision PoV, QvQ has done very impressive work analyzing 0) a photo of celebrities, 1) a photo of a dog, and 2) a cartoon from The New Yorker. Other models have issues with one or more of these.
Among the frightening parts of that is how much time, and how many human lives, it has taken between Boeing destroying its engineering culture and Boeing starting to pay the price for that destruction.
“Banned” means someone else decides what’s considered advertising. Does it include Monday Night Football brought to you by MSFT Surface? The MLB World Series powered by GOOG? Or Pepsi’s American Idol?
Stop buying things from advertisers and it will die on its own.
"Vote with your wallet" works for voting for things, but it has an abysmal track record when it comes to voting against things. It only works in the case organized boycotts, and only for a vanishing minority of those.