
This proposal is needlessly complicated. All Python has to do is put all interpreter state into a single struct to allow for completely independent interpreter instances. Then the GIL can be discarded altogether. And finally they have to change the C API to include a PythonInterpreter* as the first parameter rather than relying on the thread-local data hack.


> All Python has to do is put all interpreter state into a single struct to allow for completely independent interpreter instances.

But that is not what threading means. If you want "completely independent interpreters" just fork. Threading means you share everything.

> Then the GIL can be discarded altogether

And then you start sharing variables between interpreters, and you get lots of little baby GILs to contend with! How fun will that be? Not very.


That's how Tcl does it. Each thread gets its own Tcl interpreter, and you communicate among them via message passing. It lets you avoid the context switch that way.


There is a difference between threads and processes. What you outline is how the python multiprocessing library works.


Think of it like running multiple python processes inside the same memory space so that copying between interpreters is by reference, not value. Nothing has to be serialized.


How do you avoid damaging the consistency of Python objects then?

I think the only way to do that would be to have a reentrant lock for each object individually. Isn't that potentially a big performance issue?

I think that is the main reason behind this quote from the article:

> Several attempts have been made over the years and failed to do it without sacrificing single-threaded performance.


Interpreter state isn't the only problem; if that were the only issue, it'd be a more tractable problem.

However, built on top of CPython's C API are a huge pile of libraries that assume they can manipulate global or per-data-structure state without locking.


While technically correct, it's not as bad as explained here. They don't assume they can manipulate global state. They are guaranteed that it's right because native calls happen with GIL in place. That's an existing and real contract. If you don't want the lock in your library, you can release it. But your native code will always start in GIL-locked state.


Avoiding fixing the C API for the sake of backwards compatibility is the reason for the GIL mess. Python 3 had a chance to correct the API situation since they were starting anew, but it didn't happen. All these API half measures are just dancing around the real problem of not having truly independent interpreter instances.


Python 3 wasn't really starting anew, they were also balancing the ongoing screaming about every difference from Python 2.


It is curious how they completely revamped Python making most Python3 code incompatible with Python2 and yet they wanted to preserve C API "compatibility". Compatibility with what? Most Python3 modules had to be rewritten anyway - it was the perfect opportunity to fix the C API and get rid of the GIL once and for all.


you forgot your <hand-wavy gross exaggeration> tags


I don't understand. Can you give an example of a library that has a global resource that isn't thread safe?

Also, how is this different from the problems faced when using multiprocessing-style parallelism with the same library?


A few random examples, all from in-tree Python modules that are completely typical examples of what you'd find out-of-tree as well:

The readline module uses libreadline, which maintains internal global state on the C side.

The standard library has lists and dictionaries, which are not thread-safe to access from two threads simultaneously, and are not in any way locked.


> The standard library has lists and dictionaries, which are not thread-safe to access from two threads simultaneously, and are not in any way locked.

Locking lists and dicts, etc would not be a reasonable thing to do. There must be some granularity in synchronization, it's clearly not a good idea to attempt to allow mutating any object from multiple threads without explicit locking.

Things like readline are issues that programmers have to deal with when writing multi threaded code in any language. There are libs which are not thread safe or which have global state or need special handling with threads (e.g. OpenGL). That's no reason to disallow threading completely.

However, this might be a big culture shock issue if Python were to suddenly start supporting proper concurrency without GIL. None of the libraries document whether they are thread safe or not, so there would be surprises ahead.


Locking the basic data structures seems like it would destroy Python's simplicity. How do you determine the scope of the per-list lock? Do you add a lock to every single data structure that you are required to grab before reading or modifying it? If you require the programmer to do it, you can easily get race conditions which only show up once in a blue moon, causing incorrect results or, at worst, corruption - something the GIL guaranteed couldn't happen (as long as you didn't do anything risky in your C extension). Is there any solution that doesn't use purely functional data structures?


That's exactly why the approach proposed in this article potentially makes sense: run independent interpreters in different threads, and make all data sharing explicit. Then, anything shared between threads will need to know how to handle that, and anything that doesn't advertise that it's safe to share gets a lock around its use.


That's more or less how Tcl does it. They have had this support forever. You coordinate by passing messages to the different Tcl interpreters, each running in its own thread.


PHP did this (conditionally, at compile-time) with its ZTS builds, to allow multiple PHP runtimes in the same memory space. There is a not-insignificant performance loss from running this way, and so it's just not done, at least on Linux and similar scenarios where multiprocessing is feasible instead.

Although PHP7 does have some workarounds to improve the performance of this mode, involving implementing thread-local storage on certain operating systems.


That would help, but in the chunk of the thread I read, most of the debate was about what to do after that: specifically (a) avoiding the cost of starting up a new interpreter, especially module import work, and (b) possibly taking advantage of the shared address space to be able to share some immutable objects somehow, rather than copying.


The Lua interpreter interface is so much cleaner; the number of globals in CPython is mind-numbing. I looked into it for a quick hack - not possible. At least 100+ hours of work to put the globals inside a struct.


For PyParallel I just altered the decl for global variables to be TLS static, then tweaked initialization, and voila, thread safe globals.

E.g. https://bitbucket.org/tpn/pyparallel/src/3be2954508f9938b85a...


The problem is that thread local storage is slow. It's an extra layer of indirection. Also, thread safety is just one concern in the GIL debate. One should be able to support many low-overhead independent interpreter instances on a single thread rather than relying on an arcane interpreter context switching scheme as presently exists in Python 2 and 3.


> The problem is that thread local storage is slow.

It's slower than accessing a struct, sure:

    5982:     if (ctx) {
    000000001E1A808B 8B 0D FF 7C 28 00          mov         ecx,dword ptr [_tls_index (1E42FD90h)]  
    000000001E1A8091 BA 80 29 00 00             mov         edx,2980h  
    000000001E1A8096 48 89 83 90 02 00 00       mov         qword ptr [rbx+290h],rax  
    000000001E1A809D 65 48 8B 04 25 58 00 00 00 mov         rax,qword ptr gs:[58h]  
    000000001E1A80A6 48 8B 04 C8                mov         rax,qword ptr [rax+rcx*8]  
    000000001E1A80AA 48 8B 0C 02                mov         rcx,qword ptr [rdx+rax]  
    000000001E1A80AE 48 85 C9                   test        rcx,rcx  
    000000001E1A80B1 74 70                      je          new_context+253h (1E1A8123h)
But think of the big picture: I use TLS everywhere for PyParallel, and PyParallel has awesome performance, so eh, net win.

> One should be able to support many low-overhead independent interpreter instances on a single thread

Why? What problem does that solve?


One use case, totally independent of parallelism: if someone, say, embeds Python inside Postgres or some other large system, I cannot also embed Python in my extension. First one wins, and now our systems have to agree. In Lua, you allocate an interpreter context and use that. There can be any number of Lua interpreters embedded in a large system without conflicting.

The use of globals in the CPython interpreter is a fairly large design mistake that has prevented Python from having as much reach as other systems. The whole embedding vs. extending debate exists because of those globals, and because CPython has historically been difficult to embed properly. Lua, on the other hand, is easy to both embed and extend.

I'd love to use `cffi` to embed Python within itself, I cannot do that.


A one time cost of 100 hours of work to put the globals into an interpreter struct and fix the C API to have a PythonInterpreter* as the first argument to every function would be a very small price to pay compared to the decade-long debates and hacky workarounds surrounding the bad GIL design.


One side effect of Nick Coghlan's PEP 432 (interpreter startup) is that it makes the effort of consolidating global state (out of static variables) easier. His implementation is progressing. My proposed subinterpreter-based project will likely involve the effort of pulling all remaining global interpreter state into the interpreter struct.

As to supporting multiple truly independent Python interpreters in the same process, I'm not clear on what that buys you over subinterpreters. In fact, with subinterpreters you get the benefit of the main interpreter handling the C runtime global state (env vars, command line args, the standard streams, etc.). With truly independent interpreters that's more complicated.

While it's not quite what you described, if you were to suggest that subinterpreters be made even more isolated from one another than they already are then I'd agree. :) It sounds like that's really what you're after: better isolation between interpreters in a single process.


Maybe now that Python 2.7 dev is slowing down, the change could be done. But historically, good ideas don't necessarily get adopted in CPython. Stackless was never allowed in, and I'd say it is a pretty compelling fork, and actually one that would help with this exact problem. So, unless the change was blessed before the work commenced, I am not optimistic.



