By the time a junior dev graduates to senior, I expect that they'll be more reliable. In fact, at the end of each project, I expect the junior dev to have grown more reliable.
LLMs don't learn from a project. At best, you learn how to better use the LLM.
They do have other benefits, of course, e.g. once you have trained one generation of Claude, you have as many instances as you need, something that isn't true of human beings. Whether that makes up for the lack of quality is an open question, which presumably depends on the project.
How long do you think that will remain true? I've bootstrapped some workflows with Claude Code where it writes a markdown file at the end of each session for its own reference in later sessions. It worked pretty well. I assume other people are developing similar memory systems that will be more useful and robust than anything I could hack together.
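Roughly, the loop looks like this. This is a toy sketch, not the actual Claude Code setup, and NOTES_FILE / ask_model are placeholders for whatever actually runs the session:

    # Toy sketch of the session-notes idea: prepend prior notes to the prompt,
    # then append a dated summary for the next session to pick up.
    # NOTES_FILE and ask_model() are placeholders, not any real tool's API.
    from datetime import date
    from pathlib import Path

    NOTES_FILE = Path("PROJECT_NOTES.md")

    def ask_model(prompt: str) -> str:
        # Stand-in for whatever actually runs the session (Claude Code, an API call, ...).
        raise NotImplementedError

    def run_session(task: str) -> str:
        notes = NOTES_FILE.read_text() if NOTES_FILE.exists() else ""
        result = ask_model(f"Notes from earlier sessions:\n{notes}\n\nCurrent task:\n{task}")

        # Ask for a short summary and append it so the next session can read it.
        summary = ask_model(f"Summarize what you learned this session:\n{result}")
        with NOTES_FILE.open("a") as f:
            f.write(f"\n## {date.today()}\n{summary}\n")
        return result

The only real trick is keeping the notes file short enough that it doesn't eat the context window.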
For LLMs? Mostly permanently. This is a limitation of the architecture. Yes, there are workarounds, including ChatGPT's "memory" or your technique (which I believe are mostly equivalent), but they are limited, slow and expensive.
Many of the inventors of LLMs have moved on to (what they believe are) better models that would handle this kind of learning much better. I guess we'll see in 10-20 years whether they have succeeded.
I'm not sure that's generally true. However, older people have a track record, and a reliable older person is likely to be more reliable than a younger person without such a track record.
"Reliable" has different meanings. I think in this case the meaning is closer to "deterministic" and "follows instructions." An older worker will more reliably behave the same way twice, and more reliably follow the same set of instructions they've been following throughout their career.
On the other hand, within 120 years (and sometimes decades earlier), every human so far has also reached a point where they can't do anything anymore, reliably or otherwise. I don't think humans provide much of a precedent for sustained growth in reliability over anything other than a fixed period, and it's not clear to me that we have any idea whether we'll reach a ceiling in how reliable we can make LLMs before they outperform humans at tasks like programming.
Of course LLMs aren't people, but an AGI might behave like a person.