Yeah, I'm surprised they released the demo about multi-temporal crop prediction. Their accuracy is, frankly, pretty terrible. It's basically what I managed the first time I tried to run a classifier against the CDL dataset across years.
I'd be really curious if you're willing to expand more on how it has helped with those workflows. Do you copy/paste chunks in and ask it to explain them? Have it try to refactor them and then clean up?
- generate comments (hit or miss, but at least it can rewrite my random notes into consistent notes)
- generate type annotations
- refactor "broadly" (say, "rename all variables to match the following style" or "turn this class into a dataclass like XXX" or "transform the SQL queries into builder queries using XYZ"). Often requires some manual work but it gets a lot of tedious stuff out of the way
- reverse-engineer clean API specs by just pasting in recorded HTTP logs
- clean up logs into proper enums by generating the regexps (see the sketch after this list)
- write CLI tools to probe the system (say, CLI tool to exercise the APIs mentioned above)
- generate synthetic test data
- transform HTML garbage into a modern component system / React
- transform legacy react/js into consistent redux actions
- generate SQL queries at the speed of mouth
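To make the log/enum item above concrete, here's a minimal sketch of the kind of artifact I ask for. The log format, event names, and regex are all invented for illustration; in practice I paste real log lines into the chat and have the model write the regex and the enum for me:

    import enum
    import re

    # Hypothetical event set and log format -- invented for this sketch.
    class LogEvent(enum.Enum):
        CONNECT = "connect"
        DISCONNECT = "disconnect"
        TIMEOUT = "timeout"
        UNKNOWN = "unknown"

    # The regex is the part the model is good at churning out from samples.
    LINE_RE = re.compile(
        r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
        r"(?P<level>[A-Z]+)\s+"
        r"client (?P<client>\d+) (?P<event>connected|disconnected|timed out)"
    )

    EVENT_MAP = {
        "connected": LogEvent.CONNECT,
        "disconnected": LogEvent.DISCONNECT,
        "timed out": LogEvent.TIMEOUT,
    }

    def parse(line: str) -> LogEvent:
        m = LINE_RE.match(line)
        if not m:
            return LogEvent.UNKNOWN
        return EVENT_MAP.get(m.group("event"), LogEvent.UNKNOWN)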
>- refactor "broadly" (say, "rename all variables to match the following style" or "turn this class into a dataclass like XXX" or "transform the SQL queries into builder queries using XYZ"). Often requires some manual work but it gets a lot of tedious stuff out of the way
Can you go on more about this, please? This sounds, frankly, heavenly, but the second sentence gives me pause. I guess it's not necessarily a question of how reliably it can "broadly" refactor but rather how broadly "broadly" is meant to be taken...
>- generate SQL queries at the speed of mouth
...and this? I'm not really a database guy, but I do keep hearing from them about how (eg, a database guy's) stateful knowledge of a database can result in much, much more efficient queries than eg a sales guy with a query builder. Are the robut's queries more like the former or the latter?
I gave some insights on the SQL thing above. For the refactor broadly, it's useful when I have something that's a bit too squishy for my IDE refactoring tools/multicursor editing/vim macros, but easy enough to do or provide an example for. One thing I mentioned is having consistent variable names.
I would highly recommend taking a piece of code (any code) and then just start experimenting. Here's a few prompt ideas:
- make this a singleton
- use more classes
- use less classes
- create more functions
- use lambdas
- rewrite in a functional pipeline style
- extract higher order types
- use fluent APIs
- use a query builder
- transform to a state machine
- make it async
- add cancellation
- use a work queue
- turn it into a microservice pipeline
- turn it into a text adventure
- create a declarative DSL to simplify the core logic
- list the edge cases
- write unit tests for each edge case
- transform the unit tests into table-driven tests (see the sketch after this list)
- create a fuzzing harness
- transform into a REST API
- write a CLI tool
- write a websocket server to stream updates into a graph
- generate an HTML frontend
- add structured logging
- create a CPU architecture to execute this in hardware
- create a config file
- generate test data
- generate a bayesian model to generate test data
- generate an HTML frontend to generate a bayesian model to generate test data and download as a csv
- etc...
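To give a flavour of what these produce, here's a hand-written sketch (not actual model output) of the kind of result the table-driven-tests prompt tends to land on. The function under test, parse_price, and its module are hypothetical:

    import pytest

    from pricing import parse_price  # hypothetical module/function under test

    # Before the prompt: one test function per case.
    # After: the cases collapse into a single parametrized table.
    @pytest.mark.parametrize(
        "raw,expected",
        [
            ("$1,234.50", 1234.50),
            ("0", 0.0),
            ("  42 ", 42.0),
        ],
    )
    def test_parse_price(raw, expected):
        assert parse_price(raw) == expected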
If you are not feeling inspired, take a random computer science book, open at a random page, and literally just paste some jargon in there and see what happens. You don't need correct sentences or anything, just random words.
There really is nothing that can go wrong; in the worst case the result is gibberish. The code doesn't even need to build or be correct for it to be useful. These models are trained to be plausible, and even more importantly, self-consistent.
When prompted with code in-context, these things are amazing at figuring out consistent, plausible, elegant, mainstream APIs. Implementing them correctly is something I usually tend to do manually instead of bludgeoning the LLM.
Because this point is the nearest to the work that some colleagues do, I'll analyze it here (but you could ask similar questions about many of the other points):
In my experience, writing correct SQL queries (which often tend to be quite non-trivial because of the internal complexity of the projects) typically involves a lot of knowledge about the whole system that my colleagues and I work on. Even if I could copy-paste this information, written down once, into the AI chat window:
- I seriously doubt that any of these AI chatbots would be able to generate a remotely decent SQL query based on this information, if only because these SQL queries look really different from what you would see in typical CRUD web applications (for a very instructive example, think in the direction of ETL for unifying historically separate lines of business, where you often have lots of discussions with the respective colleagues to clear up very subtle details of what the code is actually supposed to do in some strange boundary cases that exist for historical reasons (and which one wants to get rid of))
- even explaining what the SQL query is supposed to do would, in my opinion, take more time than simply writing it down. Even ignoring the previous point: it is very typical that explaining in sufficient detail what the code is supposed to do takes far more time than simply writing it. A lot of programming work is not writing the scaffolding of some CRUD app or implementing a textbook algorithm.
I find that many of the generative AI models (GPT-4, 3.5, even MPT-30B running on my laptop) are really shockingly good at SQL.
Paste in a query and ask it for a detailed explanation. I've genuinely not seen it NOT provide a good result for that yet.
Generating new SQL queries is a bit harder, because of the context you need to provide - but I've had very strong results from that as well.
I've had the best results from providing both the schema and a couple of example rows from each table - which helps it identify things like "the country column contains abbreviations like US and GB".
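As a rough illustration of what "schema plus a couple of example rows" looks like mechanically, here's a small sketch assuming SQLite; the helper name and prompt wording are mine, not any standard recipe:

    import sqlite3

    # Assemble a prompt from each table's DDL plus a couple of example rows,
    # so the model can see both column names and representative values.
    def build_sql_prompt(db_path: str, question: str, sample_rows: int = 2) -> str:
        conn = sqlite3.connect(db_path)
        parts = []
        for name, ddl in conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
        ):
            rows = conn.execute(f"SELECT * FROM {name} LIMIT {sample_rows}").fetchall()
            parts.append(f"{ddl}\n-- example rows from {name}: {rows}")
        conn.close()
        return "\n\n".join(parts) + f"\n\nWrite a SQL query that answers: {question}"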
If you've found differently I'd love to hear about it.
> Paste in a query and ask it for a detailed explanation. I've genuinely not seen it NOT provide a good result for that yet. [...] If you've found differently I'd love to hear about it.
I have not directly tried it (my employer does not allow AI chatbots for anything intended for production, i.e. the more sensitive stuff, but only for experiments), but while working on the code I very rarely had the problem that I could not understand what some single (SQL) line of code does in the "programming sense".
The central problem, which occurs rather often, is understanding why that line exists and why things are implemented the way they are.
Just to give an example: to accelerate some queries, I thought an index would make sense (colleagues agreed in principle; it would likely accelerate a particular query that I had in mind). But there is a good reason why no index exists on that table (as the respective colleague explained to me). This in turn implies that for ETL stuff involving particular tables, one should make use of temporary tables where possible instead of JOINs; this is the reason why the code is organized as it is. This is the kind of explanation that I need, which surely no AI can deliver.
Or another example: why does some particular function (1) have a rights check for a "more powerful" role while a related one (2) does not need one? The reason is very interesting: in principle, having this check (for a "more powerful" role) does not make a lot of sense, but for some very red-tape reasons the auditors requested that only a particular group of roles shall be allowed to execute (1), while they were perfectly fine with a much larger group of users being allowed to execute (2). Again something that no AI will be able to answer.
Those two questions require a whole bunch of additional context. Has it ever been written down anywhere, or does it exist only in the heads of members of staff who understand those decisions? If the latter then yeah, there's clearly no way an AI could ever guess those things.
Part of the trick of making good use of LLMs is having a good instinct as to what kind of questions they will be able to answer and what kind of questions they are likely to mess up.
> Explain this SQL query four times: first, as a high level explanation of what it does. Secondly, as some pithy highlights as to clever tricks it uses. Thirdly, as a step-by-step guide to exactly what each piece of the query does. Finally, provide some advice on how the query could be improved.
The high level explanation is, I think, excellent. I wrote that query and I'd forgotten the detail about how it sorts with entries first.
The tips for improvements are the weakest part, since they make assumptions that don't hold for this particular query (for one thing, SQLite doesn't have the ability to run stored procedures).
Write out a fake table, paste it into ChatGPT, then come back to the discussion.
So far you've said that things like "This is the kind of explanation that I need, which surely no AI can deliver." but have not actually tried the system?
As GP asked: have you tried ChatGPT or similar LLMs? If not, go do it... you may be surprised.
One approach I've had a lot of success with: always ask for multiple options.
In this case I might try a prompt along the lines of:
"Here is the schema for a table: (schema here). Here is a query that runs against it: (query here). Provide several suggestions for potential indexes that might speed up the query, and for each of those suggestions provide several hypothetical reasons that the index might be a bad idea."
I approach things a bit obliquely. I create a custom made DSL (starting from scratch in each conversation, often) that allows me to model my query the way I want. Then, I write a traditional SQL builder on that DSL (or more like, ask GPT to do it for me). Then, I generate DSL statements that match my current domain, and more importantly, modify existing ones.
So, at each step, I do almost trivial transformations.
One key ingredient is that the DSL should include many "description" fields that incorporate english language, because that helps the model "understand" what the terser DSL fields are for.
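To make that less abstract, here's a deliberately toy version of what such a DSL plus builder can look like. The field names, the description fields, and the example query are all invented; the real DSL gets shaped per project and per conversation:

    from dataclasses import dataclass, field

    # Toy query DSL -- the shape is invented for this sketch.
    @dataclass
    class Metric:
        expr: str            # e.g. "SUM(amount)"
        description: str     # plain-English hint that helps the model

    @dataclass
    class Query:
        table: str
        description: str
        metrics: list[Metric] = field(default_factory=list)
        group_by: list[str] = field(default_factory=list)
        where: list[str] = field(default_factory=list)

    def to_sql(q: Query) -> str:
        cols = q.group_by + [m.expr for m in q.metrics]
        sql = f"SELECT {', '.join(cols)} FROM {q.table}"
        if q.where:
            sql += " WHERE " + " AND ".join(q.where)
        if q.group_by:
            sql += " GROUP BY " + ", ".join(q.group_by)
        return sql

    # The model mostly generates and edits instances like this, not raw SQL:
    revenue_by_region = Query(
        table="orders",
        description="accrual revenue per region, paid orders only",
        metrics=[Metric("SUM(amount)", "accrual revenue")],
        group_by=["region"],
        where=["status = 'paid'"],
    )
    print(to_sql(revenue_by_region))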
Straight SQL is a crapshoot, and as you said, more often than not it's either obviously or subtly broken, or written for another database. Which makes sense, considering how many different flavors of SQL are in its training corpus and how much crappy SQL is out there anyway.
Another thing that helps is using extremely specific "jargon" for the domain you want to write queries for. Asking for "accrual revenue" and "yoy avg customer value" (yes, yoy, not year over year) often tends to bring back much higher quality results than just asking for "revenue" or "customer value".
Are you testing ChatGPT's output in any way? I've considered using it for tasks, but after hearing all the talk of how it can write good-looking code that ends up not working as you might expect, I started wondering if the time savings from generating that block are lost to interpreting and testing it.
Eh, I think it's arguable that making a good experience here would really help cement people swapping over. Then again, it's unclear if Twitter is going to remove the super low limit, so people may abandon it anyway.
That's clearly not enough. Tech executives are doing too many drugs and it's leading to this kind of behavior; it's a cultural problem. They need fewer broken families and more time spent in church. And these rotten ones need to be under constant watch by strict, controlling wardens.
Have you considered that the media has jumped to a framing of tech-on-tech crime at the behest of the powerful Big Homeless lobby? When was the last time you saw anyone but a homeless person sleeping in newspapers?
This exact problem happened with an optic in my lab in graduate school. For two years the senior grad student and postdoc blamed each other over the entire apparatus becoming misaligned every couple of days. (It was a really toxic environment.) Eventually they both left, I was the only one there, and it still became misaligned. In one day I tracked it down to a prism from Thorlabs, positioned at the very beginning of the laser line, whose glue had gone bad - it was sliding in its mount.
I wish I had pushed harder on it. We spent probably a full person-day of work every week on that.
Reminds me of that giant pager outage ~ 20 years back. I remember one of the stories mentioned a woman who was going to leave her husband because he wasn't answering her page.
Well that's just abusive, really (threatening to leave someone for not being in minute-by-minute contact with them is a clear sign of an abuser power tripping).
Later in my career, I worked for a company whose principal technical strength was that they knew how to glue optics together in such a way that they NEVER moved, either thermally or from shock. Detachment? The optic would break somewhere other than the glue joint first. And the solution had little to do with which glue was used, though that was also optimized. These assemblies were flown in space, landed on the moon, and were in all U.S. attack helicopters.
Ceres Imaging (Aerial and satellite imagery analytics for farming), Convoy (Trucking load auctions), etc. There are plenty of companies doing very real work that need this kind of heavy numeric lifting.
Very cool examples. Thank you for sharing. I'm going to read into them. I'm not familiar with any web companies using this technology so it'll be interesting to dig in.
Except that's not what the study says. Quoting a comment below, "The linked study (and the Mertens study it's built on top of) classifies defaults as "structural" interventions. In the linked meta-analysis, after stripping out estimates of publication bias, structural interventions have the most "robust" evidence left (95% CI of 0.00-0.43)".
I took it when I was an undergrad (that was ten years ago, but I hope the spirit remains the same) as my first CS class. It was very tough given my preparation, but the thing I appreciated was that it gave a very general introduction to many high-level concepts in fairly good detail. I felt I had a solid starting point in many things - programming paradigms, understanding data abstractions, levels of abstraction in program design, etc. - that I don't always see replicated in other introductory material.
This is a terrible idea. Paper ballots work, are secure, and the processes are well understood. The value an attacker could garner via control over elections is enormous, so there's a big incentive to do so.