It badly hallucinated in my test. I asked it "Rust crate to access Postgres with Arrow support" and it made up an arrow-postgres crate. It even gave sample Rust code using this fictional crate! Below is its response (code example omitted):
I can recommend a Rust crate for accessing PostgreSQL with Arrow support.
The primary crate you'll want to use is arrow-postgres, which combines the PostgreSQL connectivity of the popular postgres crate with Apache Arrow data format support.
This crate allows you to:
Query PostgreSQL databases using SQL
Return results as Arrow record batches
Use strongly-typed Arrow schemas
Convert between PostgreSQL and Arrow data types efficiently
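For the record, no crate by that name exists on crates.io. If you actually want Postgres query results as Arrow record batches, the closest thing I know of is doing the conversion yourself (something like connectorx is supposed to automate exactly this, though I haven't verified it). A minimal hand-rolled sketch bridging the real `postgres` and `arrow` crates; connection string, table, and column names are invented for illustration:

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use postgres::{Client, NoTls};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = Client::connect("host=localhost user=postgres dbname=demo", NoTls)?;

    // Pull rows with the plain `postgres` crate...
    let rows = client.query("SELECT id, name FROM users", &[])?;

    // ...then build the Arrow columns from them by hand.
    let ids = Int32Array::from_iter_values(rows.iter().map(|r| r.get::<_, i32>("id")));
    let names = StringArray::from_iter_values(rows.iter().map(|r| r.get::<_, String>("name")));

    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, false),
        Field::new("name", DataType::Utf8, false),
    ]));
    let batch = RecordBatch::try_new(
        schema,
        vec![Arc::new(ids) as ArrayRef, Arc::new(names) as ArrayRef],
    )?;
    println!("{} rows in one Arrow record batch", batch.num_rows());
    Ok(())
}
```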
Are you sure it searched the web? You have to go and turn on the web search feature, and then the interface is a bit different while it's searching. The results will also have links to what it found.
Exactly. An LLM is not a conventional search engine and shouldn't be prompted as if it were one. The difference between "Rust crate to access Postgres with Arrow support" and "What would a hypothetical Rust crate to access Postgres with Arrow support look like?" isn't that profound from the perspective of a language model. You'll get an answer, but it's entirely possible that you'll get the answer to a question that isn't the one you thought you were asking.
Some people aren't very good at using tools. You can usually identify them without much difficulty, because they're the ones blaming the tools.
It's absolutely how LLMs should work, and IME they do. Why write a full question if a search phrase works just as well? Everything in "Could you recommend xyz to me?" except "xyz" is redundant and only useful when you talk to actual humans with actual social norms to observe. (Sure, there used to be a time when LLMs would give better answers if you were polite to them, but I doubt that matters anymore.) Indeed I've been thinking of codifying this by adding a system prompt that says something like "If the user makes a query that looks like a search phrase, phrase your response non-conversationally as well".
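Concretely, that could be wired into the Anthropic Messages API along these lines. This is just a sketch: the model name is a placeholder and the prompt wording is the one from above, not a tested recommendation.

```rust
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "claude-3-5-sonnet-latest",
        "max_tokens": 1024,
        "system": "If the user makes a query that looks like a search phrase, phrase your response non-conversationally as well.",
        "messages": [
            { "role": "user", "content": "Rust crate to access Postgres with Arrow support" }
        ]
    });

    // Requires reqwest with the "blocking" and "json" features enabled.
    let resp = reqwest::blocking::Client::new()
        .post("https://api.anthropic.com/v1/messages")
        .header("x-api-key", std::env::var("ANTHROPIC_API_KEY")?)
        .header("anthropic-version", "2023-06-01")
        .json(&body)
        .send()?;
    println!("{}", resp.text()?);
    Ok(())
}
```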
Totally agree here. I tried the following and had a very different experience:
"Answer as if you're a senior software engineer giving advice to a less experienced software engineer. I'm looking for a Rust crate to access PostgreSQL with Apache Arrow support. How should I proceed? What are the pluses and minuses of my various options?"
Think about it: how much marginal influence does it really have whether you use OP's version or a fully formed sentence? The keywords are what get it into the right area.
That is not correct. The keywords mean nothing by themselves. To a transformer model, the relationships between words are where meaning resides. The model wants to answer your prompt with something that makes sense in context, so you have to help it out by providing that context. Feeding it a sentence fragment or a disjoint series of keywords may not have the desired effect.
To mix clichés, "I'm feeling lucky" isn't compatible with "Attention is all you need."
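For anyone who hasn't read the paper: the mechanism in question is scaled dot-product attention, which scores every token's query against every other token's key; it is built on relationships, not isolated keywords.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```

Here Q, K, and V are the query, key, and value matrices and d_k is the key dimension.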
I find that providing more context and details initially leads to far more success for my uses. Once there’s a bit of context, I can start barking terms and commands tersely.
I find more hallucination that way - like when you're taught as a child to reflect the question back at the start of your answer.
If I'm not careful and ask the question in a way that assumes X, the LLM often takes X to be true. ChatGPT has gotten better at correcting this with its web searches.
I am able to get better results with Claude when I ask for answers that include links to the relevant authoritative source of information. But sometimes it still makes up stuff that is not in the source material.
Is this really the case, or is it only the case with Claude etc. because they've already been prompted to act as a "helpful assistant"? If you take a raw LLM and just type Google-search style, it might just continue your keywords as a story or something.
It's funny because many people type full-sentence questions into search engines too. It's usually a sign of being older and/or not very experienced with computers. One thing about geeks like me is that we will always figure out what the bare minimum is (at least for work; I hope everyone has at least a few things they enjoy and don't try to optimise).
It's not about being young or old: search engines have moved away from pure keyword searches, and typing your actual query often gives better results than searching for keywords, especially with Google.
Wonder if that's why so many people hate its results lol. It shifted from keyword searching to full-sentence searching, but many of us didn't follow the shift.
Well, compare it to the really good answer from Grok (https://x.com/i/grok/share/MMGiwgwSlEhGP6BJzKdtYQaXD) for the same prompt. Also, framing it as a question still pointed Claude to the non-existent postgres-arrow.
That's primarily how I do it, though it depends on the search, of course. I use Kagi, though.
I've not yet found much value in the LLM by itself. Facts/math/etc. are too likely to be incorrect; I need them to make some attempt at hydrating real information into the response, and at linking sources.
This was pretty much my first experience with LLM code generation when these things first came out.
It's still a present issue whenever I go light on prompt details and I _always_ get caught out by it and it _always_ infuriates me.
I'm sure there are endless discussions on front-running overconfident false positives, getting better at prompting, and seeding a project context, but 1-2 years in this world is like 20 in regular space, and it shouldn't be happening any more.
Oftentimes I come up with a prompt, stick it in an LLM to enhance it and identify what I've left out, then finally execute the prompt.
Cite things from ID-based specs. You're facing a skill issue. The reason most people don't see it as such is that an LLM doesn't just "fail to run" here. If this were code you wrote in a compiled language, would you post that the language infuriates you because it won't compile your syntax errors? As this kind of dev style becomes prevalent and output expectations adjust, performance reviews won't care that you're mad. So my advice is:
1. Treat it like regular software dev: define tasks with ID prefixes for everything, plus acceptance criteria and exceptions. Ask the LLM to reference them in the code right before the implementation (see the sketch after this list).
2. “Debug” by asking the LLM to self-reflect on the decision-making process that caused the issue - this can give you useful heuristics to use later to further reduce the issues you mentioned.
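To make point 1 concrete, a trivial sketch of what that referencing looks like. The task IDs and criteria are invented for illustration; the point is that the model cites the spec item right above the implementation, so drift between spec and code is visible in review.

```rust
// TASK-104: print users as CSV on stdout.
//   Acceptance: one line per row, header always included.
//   Exception TASK-104.E1: an empty result set prints only the header.
fn print_users_csv(rows: &[(i32, String)]) {
    println!("id,name"); // TASK-104 acceptance: header always included
    for (id, name) in rows {
        println!("{id},{name}");
    }
}

fn main() {
    // TASK-104.E1 would be exercised by passing an empty slice here.
    print_users_csv(&[(1, "alice".to_string()), (2, "bob".to_string())]);
}
```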
“It” happening is a result of your lack of time investment in systematically addressing this.
_You_ should have learned this by now. Complain less, learn more.