
I use LLMs extensively in my field to automate all sorts of tasks. Need to classify a million PDF documents for cheap? Write a prompt and submit a batch job. Need to read 30,000 drilling reports to automatically scan for hazards? Done in 60 minutes.

These are tasks that would have taken months of development or millions of dollars in manual effort before. It's not just hype.
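
For a concrete picture of "write a prompt and submit a batch job", here is a minimal sketch of what that can look like, assuming an OpenAI-style Batch API and a folder of already-extracted report text; the prompt, model name, and paths are placeholders, not a recommendation:

    # Build one JSONL file with a classification request per document,
    # then submit the whole thing as a single batch job.
    import json
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()
    PROMPT = "Classify this drilling report as HAZARD or NO_HAZARD. Answer with one word."

    with open("batch_requests.jsonl", "w") as f:
        for path in Path("reports_text").glob("*.txt"):  # assumes text was already pulled out of the PDFs
            f.write(json.dumps({
                "custom_id": path.stem,
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",
                    "messages": [
                        {"role": "system", "content": PROMPT},
                        {"role": "user", "content": path.read_text()[:20000]},  # crude truncation of long reports
                    ],
                },
            }) + "\n")

    batch_file = client.files.create(file=open("batch_requests.jsonl", "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    print(batch.id)  # poll this later and download the output file of labels

You get back a results file keyed by custom_id, so joining labels back to documents is trivial.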



Boy, I can’t wait for the foundation of my house to disappear because the LLM mis-classified a drilling report as non-hazardous.

What’s the deal here with liability and accountability? That’s a serious problem when considering using these for anything other than toy problems.


You don't actually think the LLM is reviewing those 30k documents, do you? You tell it to write a program (which is easy to audit) to pull the info from the PDFs or whatever. I don't get why this crowd is so goddamn unimaginative with LLMs.
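
To make "pull the info from the PDFs" concrete, here's a rough sketch of the kind of program I mean, using pypdf; the hazard keywords and folder layout are made up for illustration, and a real version would need OCR for scanned documents:

    # Extract text from each PDF and flag reports that mention hazard-related terms.
    from pathlib import Path
    from pypdf import PdfReader

    HAZARD_TERMS = ["void", "sinkhole", "contamination", "unstable ground"]  # illustrative only

    def extract_text(path: Path) -> str:
        reader = PdfReader(str(path))
        # extract_text() can return None for pages with no extractable text (e.g. scans)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    flagged = []
    for pdf in Path("reports").glob("*.pdf"):
        text = extract_text(pdf).lower()
        hits = [term for term in HAZARD_TERMS if term in text]
        if hits:
            flagged.append((pdf.name, hits))

    print(f"{len(flagged)} reports flagged for review")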


> You tell it to write a program (which is easy to audit) to pull the info from the PDFs

Wherein you discover that unless you ask it to consider the fact that PDFs are ... very hard to parse [1] [2], you get something that misses whole blocks of text or turns them into something they aren't, and the rest of the program misses chunks of the document.

[1]: https://news.ycombinator.com/item?id=22473263 [2]: https://web.archive.org/web/20200303102734/https://www.filin...


Why are you expecting them all to be very different? They're all likely very similar.


Because presuming that all of them are produced by the same utility is a _presumption_. They could be - but they could also be produced by many different vendors using many different methods, all of them simply conforming to the specification "a PDF with HIGH LEVEL DESCRIPTION OF THE DATA".


Because I've heard of enough lazy uses of LLMs to be suspicious. Auditing the program means being sure that the info pulled from those documents is reviewed properly. There's also a complete lack of regard for other people's privacy.


No idea where privacy enters in here.


>Boy, I can’t wait for the foundation of my house to disappear because the LLM mis-classified a drilling report as non-hazardous

LMAO! It's so hilarious that people like you forget that the alternative is relying on bureaucracies managed by people that get things wrong more often and are both too lazy and too stubborn to process your application to review your drilling report again.

If using both human-level and AI-level analysis is cheaper and much more accurate (but still imperfect), I'm willing to settle for a better system than oppose all change and die holding out for a perfect system.


What are 'people like me'? It's not like I know nothing about large language models, I just think using them for civil engineering is a bad idea...


One thing I've struggled with while applying LLMs to business problems is identifying and managing system failures, and I'd like to hear how others have dealt with it.

Let's say some of your drilling reports contain a pattern that indicates balrog activity, which the LLM misses. The legal or insurance context requires you to monitor and address potential balrog activity. How do you plan for these failures?

In almost every case I've seen, the plan is to not have a plan, which is another way of saying that the data doesn't matter so long as no one complains about the results.


Same way you manage human failures?


The way we manage human failures is with rules, checklists, and accountability. LLMs struggle with all of these, and I get the sense that spending six months developing long lists of rules isn't what the parent comment has in mind with "just write a prompt".


I think that for low-risk classification tasks and similar, something like an LLM is a great tool, and I can absolutely see it being extremely useful for intelligence work where sifting through stuff is very hard. However, I would not at all trust AI to make actually important decisions independently.


A genuine question and not meant as a snipe: as hallucinations are an inherent “feature” of LLMs, how can you be sure of the accuracy of the model’s interpretation of those 30,000 drilling report hazards? Or what is the acceptable level of risk?


You have it write a program to analyze it. I think a lot of people fail to understand that you don't always need the LLM to do the thing; you can have it write a program to do the thing for you.


That's not very likely to succeed, is it? LLMs can do a lot of things, but writing software that not only parses semi-proprietary file formats but also analyzes unstructured data sounds more than a little bit far-fetched. I'd be impressed if just the first, and by far the easiest, part of that could be accomplished.


It's extremely likely to succeed because there is a documented format. I can't believe how pessimistic this site is about this stuff. Yeah, you're not going to one-shot it with a prompt. If that's your expectation, you're confused.


Give it a go, then! No one would be more happy than me if you would prove me wrong.

Until then, I'd have to side with said pessimists here.


Okay, but you still need to debug the program. If your program must give correct results, you still need to check its output against every case. There's no free lunch there.


Speaking generally: The program doesn't always have to give correct results. The program just needs to reduce 30k documents down to 200 documents for human review.

You're comparing LLMs to a hypothetical alternative where a human reviews all 30k documents in detail. But the real alternative is often just a worse quality sieve where more errors blunder their way through the existing flawed processes. LLMs can improve on that.
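
As a sketch of that sieve, assuming each document already carries an LLM-assigned label and confidence score (the file name, field names, and cutoff below are invented for illustration):

    # Route anything the model labelled hazardous, or was unsure about, to a human reviewer.
    import json

    REVIEW_THRESHOLD = 0.8  # illustrative cutoff, not a recommendation

    total = 0
    review_queue = []
    with open("classified_reports.jsonl") as f:
        for line in f:
            total += 1
            doc = json.loads(line)
            if doc["label"] == "HAZARD" or doc["confidence"] < REVIEW_THRESHOLD:
                review_queue.append(doc["doc_id"])

    print(f"{len(review_queue)} of {total} documents routed to human review")

The threshold is the knob: tighten it and more ends up in front of a human, loosen it and you lean more on the model.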


The epistemology problem never goes away. How should I have any confidence that it's correctly flagging things for review? I need to go through the other 29,800 documents to see if it missed anything.

You're right, I am comparing it to that alternative. There are fields and applications where this is necessary. I do not know if drilling reports are one of them. If you can tolerate a large false-negative rate, then great. But if you need to be catching 99.99% of problems, then IMO you should at least be able to show your work. Taking black-box output and throwing it over the wall sounds so sketchy in engineering contexts.


You can't have confidence, but my point is you often don't need confidence. All you need is an improvement on the flawed status quo.


Yeah, I mean, I had to move some big folders from server to server last week, maybe about 400. It was too random to script (it would have taken longer to write the script), and I, as a human, doing it manually, still fucked up about 10%. 30k to 200 is exactly the stuff I'm talking about. Other people's existential dread is showing in this thread.


You're right. That's why to be sure I don't use software. All paper and pencil. So I can be sure. I have no idea what your point is.


I'm fine with writing software. I do so for a living. Usually when I'm responsible for a piece of software being correct, I'm the one who wrote it and not a black box. I use AI to autocomplete my code all the time and it very frequently suggests the wrong thing and attempts to insert random bugs.

So if my ass was on the line for the output of an AI-written program being correct for 30k cases of parsing unstructured or mixed data I would be extremely careful. That is my point.


Autocomplete is not in the same ballpark as intentionally prompting it to write software.


Both processes produce bugs. And at any rate, LLMs are our best model for reading unstructured text. What program could an LLM possibly produce to read thousands of comments in natural language that would outperform, well, an LLM?


How can you be sure with humans doing the work?


That's where the law comes in. You can prosecute a human for negligence. What about an AI?


Would you trust your LLM to file your taxes for you?


Yes because without an LLM I don't do it.


How did you do your taxes a couple years ago?


I never did. Had to pay fines.


I hope with all the time and money being saved, you're having humans check the results.


Yes, but that is one of those niche tasks I meant.

Once again they are selling it like something that's for everyone right now. This is the problem. The same with the metaverse. It has some really great use cases, but they made it out like next year we would all ditch our phones and work exclusively in a VR headset. Obviously that didn't happen, as the tech was nowhere near that and probably people don't want it either.

Also, if you really need to be sure that those 30,000 drilling reports really didn't contain any hazards, you still have to go through them all yourself. Don't forget LLMs aren't reproducible.

But no, my point was exactly that it's not just hype. There are genuinely useful use cases, I totally agree.

As there were for the metaverse, and probably even for blockchain (NFTs I'm not so sure about, though :) I always thought those were really a solution looking for a problem). The key thing about hype is that the potential benefits get overblown way too much. I see this happening here once again.



