*...the standard model of threading programming that real-world programmers have...

dmbaggett · on June 29, 2012

> Python threads are not what you are used to. That's pretty much TL;DR of your comment.

Correct. I am used to programming language threads that work the way computer scientists and programmers have typically described them -- for example, as in this (I hope uncontroversial) Wikipedia article:

http://en.wikipedia.org/wiki/Thread_(computer_science)

When I say "standard model of threading", I am not talking about nuances of call conventions to the underlying OS thread primitives. I am talking simply about running multiple streams of instructions, bytecodes, or other units of computation in parallel, within a single OS process.

haberman · on June 29, 2012

That Wikipedia article defines threads in terms of operating systems. Only one small part of that article concerns how threads are exposed to programming languages.

You can't talk about running multiple streams of instructions or bytecodes in parallel without talking about the nuances of how they share memory. Semantics of a multithreaded memory model are a highly "opinionated" thing -- there are lots of possible ways to define it, and the definition can have widespread effects on efficiency, ease of programming, and the guarantees that the runtime can provide. For example, an important aspect of a Python memory model would be that no Python program can SEGV the interpreter due to a race condition.

I recommend the following reading to get an appreciation for how much really goes into a memory model and how far from "simple" or "standard" it is:

  http://en.wikipedia.org/wiki/Memory_model_(computing)
  http://en.wikipedia.org/wiki/Java_Memory_Model
  http://www.kernel.org/doc/Documentation/memory-barriers.txt

It is a lot harder to define a good memory model for Python than for, say, Java, because in Python lists and dictionaries are primitive objects. If you say:

  x['A'] = 1

...that is a single operation that must not corrupt the dictionary, even if multiple concurrent threads are mutating it. In practice, this means that you need to either wrap every such mutation in a lock (which adds a lot of locking overhead) or use lock-free data structures (which are still relatively experimental and architecture-specific).
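To make the guarantee concrete, here is a minimal sketch (not taken from anyone's post above) in which several threads store into one shared dictionary with no explicit locking. The names `fill` and `run` and the worker and key counts are made up for illustration. The point is only that each store leaves the dictionary intact, which is the property the runtime has to pay for somehow:

```python
import threading

def fill(shared, worker_id, count):
    # Each assignment is a single dictionary store, like x['A'] = 1.
    # No explicit lock is taken here; the interpreter is responsible
    # for keeping the dictionary's internal structure consistent.
    for i in range(count):
        shared[(worker_id, i)] = i

def run(workers=4, count=10_000):
    # Hypothetical driver: start the workers, wait for them, return the dict.
    shared = {}
    threads = [
        threading.Thread(target=fill, args=(shared, w, count))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared

if __name__ == "__main__":
    result = run()
    # Every worker wrote distinct keys, so none should be missing.
    print(len(result) == 4 * 10_000)
```

Note that this only shows that individual stores do not corrupt the dictionary; it says nothing about compound operations that read a value and then write it back.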

dmbaggett · on June 29, 2012

I agree that a so-called dynamic language like Python is at something of a disadvantage because it must make atomicity guarantees that lower-level languages like C need not.

I still don't think it's reasonable to conclude that typical programmers are fine with their threads not really running in parallel, or that the GIL isn't worth bothering to fix, even though fixing it would be hard. In my original post yesterday, I pointed out that as the language footprint has grown, Python's disadvantage in this respect has increased: it is much harder to remove the GIL now than it was in, say, the 1.5 era when there actually was a (problematic) GIL removal patch.

We've gotten way off track, but the original point I was trying to make was that 1) the GIL really is a problem for not-purely-theoretical programs written by competent developers, and 2) the 2->3 transition, by complicating the language and increasing the workload for the alternative implementations, has made it less likely than ever that the GIL problem will be resolved.

And, indeed, Nick explicitly confirmed this by saying the GIL is basically a dead issue for the CPython devs. His post made many good points about the merits of the 2->3 transition, and in particular pointed out some ways that 3 has reduced work for the alternative implementations, but I remain unconvinced overall. And not out of ignorance or incompetence, as he implied.

haberman · on June 29, 2012

I still think your position is unreasonable, because your inherent assumption is that the GIL is a "problem" that needs a "fix." This terminology is appropriate for a situation where the status quo could be improved without giving up any of the benefits of the current implementation. But this is not the case; removing the GIL in the way you advocate would add CPU and memory overhead that everyone would pay, even in the single-threaded case. And this is to say nothing of the practical problems of maintaining compatibility with existing C extensions.

The GIL is not a bug, it's a threading model. You wish the threading model were something else. You insist on your particular vision of an alternative threading model without acknowledging its downsides. You give no indication that you have actually considered or tried the alternative concurrency models that CPython does support, like multiprocessing, greenlets, or independent processes. You make no objective arguments for why your desired threading model is better than the ones that are currently available, except that you could avoid changing your code. You accuse Python of failing to live up to some accepted standard for what a "thread" should be, when in fact no such standard exists, especially for high-level, dynamically-typed languages like Python. If anything, newer languages are moving away from shared-state concurrency; see Erlang, Go, and Rust.
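As a rough illustration of the non-shared-state style mentioned above, here is a small sketch using only the standard `queue` and `threading` modules. The function names `worker` and `square_all` are hypothetical. Workers never touch a common data structure directly; they only receive messages on one queue and reply on another. Because it uses threads, this sketch shows the programming style only and will not by itself run CPU-bound work in parallel under the GIL:

```python
import queue
import threading

def worker(inbox, outbox):
    # The worker holds no shared state; it only consumes and emits messages.
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this worker
            break
        outbox.put((item, item * item))

def square_all(values, workers=3):
    # Hypothetical driver: send each value as a message, collect the replies.
    values = list(values)
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for v in values:
        inbox.put(v)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Exactly one reply was produced per input value.
    return dict(outbox.get() for _ in values)

if __name__ == "__main__":
    print(square_all(range(5)))
```

The same worker-and-queue structure is roughly what the `multiprocessing` module offers with separate processes, at the cost of serializing the messages between them.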

I don't think you have malicious intentions, but I urge you to reflect on what you are demanding and whether it is reasonable. What may look to you like "obvious" brokenness that demands an "obvious" fix is really a lot less clear-cut than you seem to think it is. I feel for the Python developers who have to deal with this complaining all the time.

comex · on June 30, 2012

To Python-level code, Python's threading model is pretty much exactly the same as that supported in all "fast" languages such as C and Java (even Go, ever pragmatic, has locks). Given that Jython already allows true multithreading and PyPy is trying to emulate it with STM, it's reasonable to see the GIL more as an implementation bug that won't be fixed for practical reasons than as a threading model... even if Python also supports alternate threading models that are perhaps better for most applications anyway (if strictly less powerful).
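A short sketch of what "the same model" means at the Python level: a compound update such as `total += 1` is a read, an add, and a write, so the GIL alone does not make it atomic, and the code needs a lock just as it would in C or Java. The function name `count_with_lock` and its parameters are invented for this example:

```python
import threading

def count_with_lock(workers=4, increments=50_000):
    total = 0
    lock = threading.Lock()

    def bump():
        nonlocal total
        for _ in range(increments):
            # Without the lock, two threads could both read the same old
            # value of total and one increment would be lost.
            with lock:
                total += 1

    threads = [threading.Thread(target=bump) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

if __name__ == "__main__":
    print(count_with_lock() == 4 * 50_000)
```

Whether a lost update actually shows up when the lock is removed depends on the interpreter version and timing, so the unlocked variant is unreliable rather than reliably wrong.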

haberman · on June 30, 2012

> To Python-level code, Python's threading model is pretty much exactly the same as that supported in all "fast" languages such as C and Java

Yes, but Python also exposes higher-level operations like table manipulation as language primitives.

> Given that Jython already allows true multithreading

That may be, but as I mentioned this has an inherent cost, both in CPU and in memory. Therefore it is not a strict improvement over CPython, just a different direction.