
Have you tried this at all?

I find that many of the generative AI models (GPT-4, 3.5, even MPT-30B running on my laptop) are really shockingly good at SQL.

Paste in a query and ask it for a detailed explanation. I've genuinely not seen it NOT provide a good result for that yet.

Generating new SQL queries is a bit harder, because of the context you need to provide - but I've had very strong results from that as well.

I've had the best results from providing both the schema and a couple of example rows from each table - which helps it identify things like "the country column contains abbreviations like US and GB".
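
To illustrate, a prompt might include something along these lines (the users table, its columns, and the sample rows are all invented for the sake of the example):

    -- Schema
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT,
        country TEXT,      -- two-letter codes, e.g. 'US', 'GB'
        created_at TEXT
    );

    -- A couple of example rows:
    -- id | name  | country | created_at
    --  1 | Alice | US      | 2023-01-04
    --  2 | Bob   | GB      | 2023-02-17

    -- Question: how many users signed up per country in 2023?

With the sample rows in the prompt, the model can see for itself that country holds abbreviations rather than full names, without being told.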

If you've found differently I'd love to hear about it.



> Paste in a query and ask it for a detailed explanation. I've genuinely not seen it NOT provide a good result for that yet. [...] If you've found differently I'd love to hear about it.

I have not tried it directly (my employer only allows AI chatbots for experiments, not for anything intended for production, i.e. the more sensitive stuff), but when working on the code I very rarely have the problem of not understanding what some single line of SQL does in the "programming sense".

The problem that does come up often is understanding why a given line exists and why things are implemented the way they are.

Just to give an example: to speed up some queries, I thought an index would make sense (colleagues agreed in principle; it would likely have sped up a particular query I had in mind). But there is a good reason why there is no index on this table (as the respective colleague explained to me). That in turn implies that for ETL work involving particular tables one should use temporary tables where possible instead of JOINs, which is why the code is organized the way it is. This is the kind of explanation that I need, which surely no AI can deliver.
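
Roughly, the restructuring in question is the difference between the two sketches below (table and column names are invented purely to illustrate; whether it actually helps depends entirely on the data and on the missing index described above):

    -- Joining the large, unindexed table directly:
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.order_date >= '2024-01-01'
    GROUP BY c.name;

    -- Versus first pulling the needed slice into a temporary table
    -- and working with that:
    CREATE TEMPORARY TABLE tmp_recent_orders AS
        SELECT customer_id, amount
        FROM orders
        WHERE order_date >= '2024-01-01';

    SELECT c.name, SUM(t.amount) AS total
    FROM tmp_recent_orders t
    JOIN customers c ON c.id = t.customer_id
    GROUP BY c.name;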

Or another example: why does one particular function (1) have a rights check for a "more powerful" role while a related one (2) does not need one? The reason is interesting: in principle this check (for a "more powerful" role) does not make a lot of sense, but for rather red-tape reasons the auditors requested that only a particular group of roles be allowed to execute (1), while they were perfectly fine with a much larger group of users being allowed to execute (2). Again, something that no AI will be able to answer.


Those two questions require a whole bunch of additional context. Has that context ever been written down anywhere, or does it exist only in the heads of the members of staff who understand those decisions? If the latter, then yeah, there's clearly no way an AI could ever guess those things.

Part of the trick of making good use of LLMs is having a good instinct as to what kind of questions they will be able to answer and what kind of questions they are likely to mess up.

As a quick demo, I threw quite a gnarly real-world SQL query from this notebook - https://observablehq.com/@simonw/blog-to-newsletter#sql - into GPT-4 just now and gave it the following prompt:

> Explain this SQL query four times: first, as a high level explanation of what it does. Secondly, as some pithy highlights as to clever tricks it uses. Thirdly, as a step-by-step guide to exactly what each piece of the query does. Finally, provide some advice on how the query could be improved.

Here's what it produced in response: https://chat.openai.com/share/47b2f1c5-6b88-4ff2-b6d1-8f58da...

The high level explanation is, I think, excellent. I wrote that query and I'd forgotten the detail about how it sorts with entries first.

The tips for improvement are the weakest part, since they make assumptions that don't hold for this particular query (for one thing, SQLite doesn't support stored procedures).


Write out a fake table, paste it into ChatGPT, then come back to the discussion.

So far you've said things like "This is the kind of explanation that I need, which surely no AI can deliver", but you have not actually tried the system?

As GP asked: have you tried ChatGPT or similar LLMs? If not, go do it - you may be surprised.


> This is the kind of explanation that I need, which surely no AI can deliver.

But have you tried? GPT-4 is great at coming up with hypotheses for why things are the way they are.


One approach I've had a lot of success with: always ask for multiple options.

In this case I might try a prompt along the lines of:

"Here is the schema for a table: (schema here). Here is a query that runs against it: (query here). Provide several suggestions for potential indexes that might speed up the query, and for each of those suggestions provide several hypothetical reasons that the index might be a bad idea."

I tried a version of that here; the results look pretty good to me: https://chat.openai.com/share/dc5d7007-cf05-4e5c-89c4-fc669b...
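
To give a sense of the shape of the answers this pattern produces, a (completely made-up) pair of suggestions might look like:

    -- Suggestion 1: composite index covering the filter and the sort
    CREATE INDEX idx_orders_customer_date
        ON orders (customer_id, order_date);
    -- Possible downsides: slows down writes on a write-heavy table,
    -- and is useless if most queries filter on order_date alone.

    -- Suggestion 2: partial index for a common status filter
    CREATE INDEX idx_orders_open
        ON orders (customer_id)
        WHERE status = 'open';
    -- Possible downsides: only some engines (e.g. PostgreSQL, SQLite)
    -- support partial indexes, and it won't help other status values.

The useful part is that each option arrives with its own counter-arguments attached - that's the "hypothetical reasons the index might be a bad idea" half of the prompt doing its job.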



