Kinda hard to write a test that a value is null-checked, when that value may never actually be returned.
For example, say you have a C function that reads in a file and returns you a string. You can check the returned string to see that malloc actually succeeded, but how do you check that the file actually opened?
Every time you call fopen, you need to do a null check. Every single time. You also need to check that fclose is there to match the call, every time.
Writing a test for that, when it is generally just a call within the function you want to test, isn't really possible. It's not there in the arguments, or the return value, of the function.
How do you check for the right checks in a function expected to do something like this:
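(Something along these lines; this is only a rough sketch, the name read_file and the details are made up, the checks are the point.)

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: read a whole file into a freshly malloc'd,
 * NUL-terminated string; return NULL on any failure. */
char *read_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)                 /* did the file actually open? */
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {              /* did malloc succeed? */
        fclose(fp);
        return NULL;
    }

    size_t n = fread(buf, 1, (size_t)size, fp);
    buf[n] = '\0';

    fclose(fp);                     /* every fopen needs its matching fclose */
    return buf;
}
```

From the caller's side all of that is invisible: a test only sees a path go in and a char * (or NULL) come out, so it can't tell a failed fopen from a failed malloc, and it can't see whether fclose was ever called.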
At least in my experience, possibly due to context limitations or just architecture, SOTA LLMs aren't particularly good at iterating; they tend to loop back around to similar results with the same bad logic / errors.
I've tried these LLM "code from test" things (and vice versa) dozens of times over the last couple of years... they're not even close to being practical.
Why? It will evolve into a slightly higher level language where the compiler is an ML model. Was it a tragedy when developers mostly didn’t have to write assembly any more?
I think it's different... I like high-level languages, but this is not a programming language; it's a technique for writing tests in an existing language and leaving the implementation to the AI.
I like programming for problem solving, and I don't really like writing tests, but that's personal taste. A lot of people like to just use PowerPoint and Jira and tell others what they need to implement, but those people are not software developers.
> Was it a tragedy when developers mostly didn’t have to write assembly any more?
It wasn't, but for starters compilers have always been generally deterministic.
I'm not saying that this is completely useless (I personally think code-completion tools such as GitHub Copilot are fantastic), but it's still too early to compare it to a compiler.
I appreciate that your workflow is so linear.
I often write tests, then the implementation, then I realize that the tests need to be corrected, then I change the implementation, then I change the tests, then I add other tests etc... etc...
I don't really like maintaining tests; it's often a lot of code that needs to be understood and changed carefully.
Really it's just validator code instead of feature code. I think this is the only realistic way forward for production-level code written by AI: don't ask it to write code, ask it to pass your validation tests.
Essentially, everyone becomes a red team member trying to think of clever ways to outwit the AI's code, which I for one think is going to be a lot of fun in the future, though we're still quite a way from there yet!
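As a rough sketch of what I mean, reusing the hypothetical read_file from upthread (the fixture file and the particular checks here are just illustrative; whatever implementation the AI produces gets linked in separately):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *read_file(const char *path);   /* the AI-written implementation under test */

int main(void)
{
    /* A missing file must report failure, not crash or return garbage. */
    assert(read_file("definitely-missing-file.txt") == NULL);

    /* Known contents must round-trip exactly. */
    const char *expected = "hello, validator\n";
    FILE *fp = fopen("fixture.txt", "wb");
    assert(fp != NULL);
    fputs(expected, fp);
    fclose(fp);

    char *got = read_file("fixture.txt");
    assert(got != NULL);
    assert(strcmp(got, expected) == 0);
    free(got);

    remove("fixture.txt");
    puts("all checks passed");
    return 0;
}
```

Of course this still can't see everything: nothing above proves the implementation ever called fclose, which is exactly the complaint upthread.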
Arguably we write instructions: instead of writing out the problem and what the solution looks like, we describe a set of steps we go through—and if those steps are incorrect, there's nothing to compare against, because that was what we called the "specification".
Whether there's a difference there is in the eye of the beholder, but it does look like specification languages such as TLA+/PlusCal/Squint or Alloy, or theorem-proving languages like Coq (to be renamed Rocq) or Lean, look a lot different from the likes of C, JavaScript, or even Haskell.
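To give a feel for the difference, here's a rough, self-contained sketch in Lean of what a specification (as opposed to a list of instructions) might look like; the names isSorted, count and sortSpec are made up for illustration:

```lean
-- A specification says *what* must hold, not *how* to compute it:
-- any correct `sort` must return a sorted list with the same elements.

def isSorted : List Nat → Prop
  | [] => True
  | [_] => True
  | a :: b :: rest => a ≤ b ∧ isSorted (b :: rest)

-- occurrence count, so "same elements" needs no extra libraries
def count (x : Nat) : List Nat → Nat
  | [] => 0
  | y :: ys => (if x = y then 1 else 0) + count x ys

-- the whole contract an implementation has to satisfy
def sortSpec (sort : List Nat → List Nat) : Prop :=
  ∀ xs : List Nat,
    isSorted (sort xs) ∧ (∀ x, count x (sort xs) = count x xs)
```

There's no recipe for computing the sort in there at all, which is the sense in which these languages feel nothing like C or JavaScript.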
Can you elaborate on why you think it won't? That might be a more valuable use of everyone's time than a binary "X will / won't happen" comment.