This proposal is needlessly complicated. All Python has to do is put all interpreter state into a single struct to allow for completely independent interpreter instances. Then the GIL can be discarded altogether. And finally they have to change the C API to include a PythonInterpreter* as the first parameter rather than relying on the thread-local data hack.
That's how Tcl does it. Each thread gets its own Tcl interpreter, and you communicate among them via message passing. It lets you avoid the context switch that way.
Think of it like running multiple Python processes inside the same memory space, so that copying between interpreters is by reference, not by value. Nothing has to be serialized.
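As a rough Python-level illustration of the "by reference, not value" point (ordinary threads here stand in for the proposed in-process interpreters, so this is an analogy, not the proposal itself): a thread sees the very same object the parent created, with no pickling or copying, which is the behavior you lose when you fall back to separate processes.

```python
# Sketch: threads in one address space share objects by reference.
# This is the property the "processes in one memory space" model keeps.
import threading

shared = {"payload": list(range(1000))}

def consumer(d):
    # No copy, no serialization: the thread records the identity of
    # the exact same list object the parent holds.
    d["seen_by_thread"] = id(d["payload"])

t = threading.Thread(target=consumer, args=(shared,))
t.start()
t.join()

# The id recorded inside the thread matches the parent's object.
assert shared["seen_by_thread"] == id(shared["payload"])
```

With `multiprocessing`, by contrast, the same dict would be pickled and rebuilt on the other side, so the identities would differ.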
Interpreter state isn't the only problem; if it were, this would be a far more tractable problem.
However, built on top of CPython's C API are a huge pile of libraries that assume they can manipulate global or per-data-structure state without locking.
While technically correct, it's not as bad as described here. Extensions don't merely assume they can manipulate global state; they are guaranteed it's safe, because native calls happen with the GIL held. That's an existing, real contract. If you don't want the lock in your library, you can release it, but your native code will always start in the GIL-held state.
Avoiding fixing the C API for the sake of backwards compatibility is the reason for the GIL mess. Python 3 had a chance to correct the API situation since they were starting anew, but it didn't happen. All these API half measures are just dancing around the real problem of not having truly independent interpreter instances.
It is curious how they completely revamped Python making most Python3 code incompatible with Python2 and yet they wanted to preserve C API "compatibility". Compatibility with what? Most Python3 modules had to be rewritten anyway - it was the perfect opportunity to fix the C API and get rid of the GIL once and for all.
> The standard library has lists and dictionaries, which are not thread-safe to access from two threads simultaneously, and are not in any way locked.
Locking lists, dicts, etc. would not be a reasonable thing to do. There has to be some granularity to synchronization; it's clearly not a good idea to allow mutating arbitrary objects from multiple threads without explicit locking.
Things like readline are issues that programmers have to deal with when writing multi-threaded code in any language. There are libraries that are not thread-safe, have global state, or need special handling with threads (e.g. OpenGL). That's no reason to disallow threading entirely.
However, this could be a big culture shock if Python were suddenly to support proper concurrency without the GIL. None of the libraries document whether they are thread-safe, so there would be surprises ahead.
Locking the basic data structures seems like it would destroy Python's simplicity. How do you determine the scope of a per-list lock? Do you add a lock to every single data structure that must be grabbed before reading or modifying it? If you leave that to the programmer, you can easily get race conditions that show up only once in a blue moon, causing incorrect results or, at worst, corruption; that's something the GIL guaranteed couldn't happen (as long as you didn't do anything risky in your C extension). Is there any solution that doesn't require purely functional data structures?
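To make the "require the programmer to do it" scenario concrete, here is a minimal sketch of manual per-object locking with today's `threading` module. Even now, a bare read-modify-write like `counter += 1` is not atomic across threads; the explicit lock is exactly the kind of discipline the comment worries about pushing onto every programmer.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write on `counter` can
        # interleave between threads and lose updates. The lock is the
        # per-object synchronization the programmer must remember.
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 40_000
```

Forget the `with lock:` in even one code path and the program still runs, just occasionally producing a wrong total, which is precisely the once-in-a-blue-moon failure mode described above.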
That's exactly why the approach proposed in this article potentially makes sense: run independent interpreters in different threads, and make all data sharing explicit. Then, anything shared between threads will need to know how to handle that, and anything that doesn't advertise that it's safe to share gets a lock around its use.
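The "make all data sharing explicit" discipline can be sketched with a plain `queue.Queue` between threads (a stand-in for cross-interpreter channels, which is an assumption for illustration; the article's actual mechanism may differ): workers own their locals, and the only shared objects are the queues, which know how to handle concurrent access.

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # The worker touches only its own locals plus the two queues.
    # The queues are the single, explicitly shared, internally
    # synchronized channel between the threads.
    while True:
        n = tasks.get()
        if n is None:  # sentinel: shut down
            break
        results.put(n * n)

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for n in range(10):
    tasks.put(n)
for _ in workers:
    tasks.put(None)  # one sentinel per worker
for w in workers:
    w.join()

squares = sorted(results.get() for _ in range(10))
assert squares == [n * n for n in range(10)]
```

Nothing here needs a lock in user code, because nothing is shared except objects that advertise they are safe to share.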
That's more or less how Tcl does it, and they have supported it for ages. You coordinate by passing messages between the Tcl interpreters, each running in its own thread.
PHP did this (conditionally, at compile-time) with its ZTS builds, to allow multiple PHP runtimes in the same memory space. There is a not-insignificant performance loss from running this way, and so it's just not done, at least on Linux and similar scenarios where multiprocessing is feasible instead.
Although PHP 7 does have some workarounds to improve the performance of this mode, involving thread-local storage on certain operating systems.
That would help, but in the chunk of the thread I read, most of the debate was about what to do after that: specifically (a) avoiding the cost of starting up a new interpreter, especially module import work, and (b) possibly taking advantage of the shared address space to be able to share some immutable objects somehow, rather than copying.
The Lua interpreter interface is so much cleaner; the number of globals in CPython is mind-numbing. I looked into it for a quick hack, and it's not possible. It would take at least 100+ hours of work to put the globals inside a struct.
The problem is that thread local storage is slow. It's an extra layer of indirection. Also, thread safety is just one concern in the GIL debate. One should be able to support many low-overhead independent interpreter instances on a single thread rather than relying on an arcane interpreter context switching scheme as presently exists in Python 2 and 3.
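For readers unfamiliar with thread-local storage, here is a Python-level analog using `threading.local` (an illustration of the concept only; CPython's C implementation uses platform TLS, not this class). Every access to `state.value` goes through an extra per-thread lookup, which is the layer of indirection the comment is pointing at.

```python
import threading

state = threading.local()
observed = {}

def run(name):
    # Each thread resolves `state.value` through its own thread-local
    # slot, so the threads never see each other's value. That per-thread
    # lookup is the extra indirection being discussed.
    state.value = name
    observed[name] = state.value

threads = [threading.Thread(target=run, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert observed == {"t0": "t0", "t1": "t1", "t2": "t2"}
```

The alternative the comment favors, passing an explicit interpreter pointer as an argument, avoids that hidden lookup by making the context a plain function parameter.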
One use case, totally independent of parallelism: if someone embeds Python inside, say, Postgres or some other large system, I cannot also embed Python in my extension. The first one wins, and now our systems have to agree. In Lua, you allocate an interpreter context and use that; any number of Lua interpreters can be embedded in a large system without conflicting.
The use of globals in the CPython interpreter is a fairly large design mistake that has prevented Python from having as much reach as other systems. The whole embedding-vs-extending debate exists because of those globals, and because CPython has historically been difficult to embed properly. Lua, on the other hand, is easy to both embed and extend.
I'd love to use `cffi` to embed Python within itself, but I cannot do that.
A one time cost of 100 hours of work to put the globals into an interpreter struct and fix the C API to have a PythonInterpreter* as the first argument to every function would be a very small price to pay compared to the decade-long debates and hacky workarounds surrounding the bad GIL design.
One side effect of Nick Coghlan's PEP 432 (interpreter startup) is that it makes the effort of consolidating global state (out of static variables) easier. His implementation is progressing. My proposed subinterpreter-based project will likely involve pulling all remaining global interpreter state into the interpreter struct.
As to supporting multiple truly independent Python interpreters in the same process, I'm not clear on what that buys you over subinterpreters. In fact, with subinterpreters you get the benefit of the main interpreter handling the C runtime's global state (env vars, command-line args, the standard streams, etc.). With truly independent interpreters, that's more complicated.
While it's not quite what you described, if you were to suggest that subinterpreters be made even more isolated from one another than they already are then I'd agree. :) It sounds like that's really what you're after: better isolation between interpreters in a single process.
Maybe now that Python 2.7 development is slowing down, the change could be done. But historically, good ideas don't necessarily get adopted in CPython. Stackless was never allowed in, and I'd say it's a pretty compelling fork, and one that would actually help with this exact problem. So unless the change were blessed before the work commenced, I am not optimistic it would be accepted.