Hacker News

From the article: "The only downsides of this approach are that it means that CPU bound Python code can’t scale to multiple cores within a single machine using threads, and that IO operations can incur unexpected additional latency in the presence of a CPU bound thread."

Response: The only downside? As a Python user suffering from JVM envy, I have to say that that's a SERIOUS downside!

Here's why: (1) Python is slow. Almost any real life program in pure Python will have CPU-bound components. (example: BBCode parsing on my forum).

(2) Most programs that need to scale won't need to scale beyond a single machine (the most active web forum on my continent runs on a single quad-core server).

Therefore the need to scale CPU-bound Python programs to multiple cores on a single machine is very real. Even though we accept that removing the GIL is hard, let's not insult real-life Python users by suggesting that their needs are not real.



The article doesn’t say multi-CPU scaling isn’t necessary. It says that threads are usually the wrong answer anyway.

There are great process-based ways to scale out – look no further than Erlang to see that it's true.


I use Python for my day job. I've experimented extensively with Java and Scala. I must say this: Threads are AWESOME. Threading is the most flexible model of concurrency because you can build programs that EFFICIENTLY implement "alternative concurrency models" like message passing and STM on top of threads. Threading is supported natively by every OS. And it's fast.

Erlang is hyped as the ideal model for concurrency, but in practice is a niche product that's primarily useful for programs that are almost pure IO - chat servers, routing components like proxy servers and packet switches.

The Erlang model does NOT apply to python, anyway, since Python processes are nothing like Erlang processes. Unlike Erlang processes, Python processes are very heavyweight and message passing between them is costly.


If you’re using python, the performance gap between processes with message passing and threads with locking is the last of your problems, believe me.

The big difference is that processes are much more robust and testable. The cases where threads are really needed are fringe cases and – while it's a pity – Python doesn't seem like the right language if you don't want to go the Jython/IronPython way.

The bigger problem is that people go for threads by default even though only few are able to write bug-free threaded code. That habit comes from obsolete but still prevalent performance wisdom, and from the fact that threads were really popular in the Java world.


This is a really short-sighted perspective.

Proper support of threading inherently allows more performance and flexibility than multiprocessing. On top of threads, you can build powerful, Pythonic abstractions like concurrent.futures.ThreadPoolExecutor.map and STM, and on top of those, even more powerful abstractions that help the developer avoid concurrency bugs.
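As a rough illustration of that kind of abstraction, here's a minimal sketch using concurrent.futures.ThreadPoolExecutor.map; the worker function is just a placeholder for real IO-bound work:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_length(text):
    # Placeholder for an IO-bound task such as an HTTP request;
    # the pool's threads run these calls concurrently.
    return len(text)

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() distributes the inputs across the pool's threads
    # and yields results in input order.
    results = list(pool.map(fetch_length, ["a", "bb", "ccc"]))

print(results)  # [1, 2, 3]
```

The point is that the locking and work distribution live inside the executor, not in user code.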

I'm really excited for PyPy. That is a project full of people who are not afraid to quickly iterate on powerful ideas that can make Python the high-performance language it deserves to be, instead of dismissing MT programming as a "fringe case" and resorting to ad hominem attacks.


I wish you’d read the article before making your accusations.


I have. There's an ad hominem attack in the middle of it. Otherwise it's a great article.

I understand that everyone here is acting in good faith and wants Python to be better, and the article otherwise contains lots of great information presented in a reasonable manner. You bring up lots of good points too. But other statements like the ones I mentioned are overly broad or brash.


I think it’s just attrition from explaining the GIL problem over and over again. Nick is one of the major core developers and is probably just fed up with the topic. So this bit of snark is all the pay he’ll ever get for his work on CPython.


I really liked the article. But look at how much of this thread is spent talking about that bit of snark. I think that mixing the snark in with all of the rational reasons for Python 3's existence and not working on the GIL made some people less receptive to rational arguments. In other words, I don't think it was worth it.


"Pythonic abstractions like concurrent.futures.ThreadPoolExecutor.map"

Although it might be great stuff, the word 'Pythonic' is pretty funny next to a long Java-style name like that


> If you’re using python, the performance gap between processes with message passing and threads with locking is the last of your problems, believe me.

What would be the first of my problems?


Not sure if you’re trying to troll me by taking it out of the obvious performance context, but anyway: the performance penalty due to the use of an un-JIT-ed scripting language?


Not trolling you, just wondering what you thought was more important from a performance perspective. Re-writing working software in another language is not always feasible due to real-world time constraints, and in that context the performance difference between message-passing and threads would be the first of my problems. Basically, I just don't see the argument of "use a more appropriate language than Python" as a useful counter to criticism of the GIL. The whole point of criticizing Python (in my case anyway), is to hopefully nudge the language to suiting my needs more closely.


The counter is not (even while some people try to turn it that way): THREADS SUCK, WE WON'T ADD THEM BECAUSE WE DON'T LIKE THEM. It is: given the circumstances (which Nick outlines verbosely), a removal of the GIL is not pragmatic.

And this is the last time I’ll write this; I feel like a street organ. >:(

And all I was saying in this thread is that the performance gap between threads and processes isn’t that big of a deal, if you run non-native code anyway. The multiprocessing module is pretty cool.


I'm not intentionally trying to make you repeat yourself. And I know that removal of the GIL is not pragmatic at the moment (or maybe ever), but that doesn't mean it wouldn't be valuable. The GIL wasn't much of a problem ten years ago because not many personal computers had multiple cores. Today it's become a bit of a pain for me personally, and it will only become more painful as core count increases while single-core performance remains largely stagnant.

> And all I was saying in this thread is that the performance gap between threads and processes isn’t that big of a deal, if you run non-native code anyway. The multiprocessing module is pretty cool.

This is a line that I hear over and over again, but I strongly disagree with it. It's not always easy to predict where your performance bottlenecks will be until you actually start implementing in some language. If I've chosen Python for a project and find I need more cores, I'm stuck with either re-implementing critical sections of code in C extensions or other languages, or using multiprocessing. And multiprocessing is not that great because it splits the memory space across processes and communication between them is extremely expensive. And there are many caveats which cause enormous headaches (e.g., you can't fork your process while having an active CUDA context, not all Python objects are serializable, pickling is slow, marshaling doesn't work well for all data types, you must finish dequeuing large objects from a multiprocessing.Queue before joining the source process, etc.).
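The serialization caveat is easy to demonstrate: anything crossing a process boundary must survive pickling, and not every Python object does. A lambda is a simple example:

```python
import pickle

def square(x):
    return x * x

# A module-level function pickles fine (by reference to its module)...
pickle.dumps(square)

# ...but a lambda does not, so passing one to multiprocessing.Pool.map
# would fail, while a thread could call it directly.
try:
    pickle.dumps(lambda x: x * x)
    lambda_picklable = True
except Exception:
    lambda_picklable = False

print(lambda_picklable)  # False
```

The same constraint bites with open sockets, database connections, and many extension-module objects.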

Yes, I could get a 10-100x speedup by re-writing everything in C. But most of the time, I would be very happy with a 6-12x performance gain from just using threads in a shared memory space.


I assume that for whatever reason, it is absolutely impossible for you to use any other concurrency model.

Did you try Jython?


The specific project I'm vaguely referring to here is described in a little more detail as item 1) here: http://news.ycombinator.com/item?id=4178070

No, I didn't try Jython. The choice of CPython was made before I took over the project, and there are also a dozen or so dependencies which I don't think are compatible with Jython.


If you are trying to exploit parallelism, any kind of data sharing (other than pure read-only sharing) costs you.


Yes, but with threading, the costs are minimized. You don't have to convert your Python objects to bytes, send them over a network, wait, retrieve bytes, and convert back to Python objects every time you need to read or write some shared data.
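A minimal sketch of that difference: with threads, workers mutate one shared dict in place under a lock, with no serialization or copying involved.

```python
import threading

shared = {"hits": 0}
lock = threading.Lock()

def worker():
    for _ in range(1000):
        # Direct access to the shared object; a process-based design
        # would have to pickle updates through a pipe or queue instead.
        with lock:
            shared["hits"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["hits"])  # 4000
```

Each increment here costs a lock acquisition, not a round trip through a serialization layer.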


SMP Erlang DOES use threads; it then uses green processes and a custom scheduler to schedule those processes fairly and preemptively across the threads. It doesn't use one thread per process, but threads are a key component of how it works.

It is absolutely nothing like spawning OS-level processes. They are micro-processes, green processes that live inside the Erlang VM.


You are writing a forum which includes BBcode parsing, and Python's CPU usage is your bottleneck? Really?

You are devoting multiple cores to parsing an individual user's BBcode?

This is your example of a real need to remove the GIL?


No, that was just an example of an unexpectedly CPU-bound operation. My point was that you can't assume your Python program is not CPU-bound. Every time you have to do something non-trivial in Python, performance can become an issue.


How do you know it's not memory I/O bandwidth bound? Just curious.


From the perspective of a Python program, being memory bandwidth bound is the same as being CPU bound: you have the GIL, your process is in the running state in the OS, and is currently executing on a core.

(I assume you truly mean bandwidth between main memory and the processor, and not to disk.)


Yeah, but do most programs that need to scale also need shared context? Otherwise, what are the big downsides of processes vs threads?


Many programs can benefit from shared context. You can push all the shared context to your database, but it's often helpful to keep around some shared data structures for performance reasons. For example, you can cache pure Python functions using functools.lru_cache, and share such caches between threads, but such caches can't be shared between processes. In-process data structures like dicts and lists are much, much faster than alternatives like memcached and redis because they avoid the overhead of IPC and deserialization, and they are also easier to use since they are built-in.
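A minimal sketch of that in-process caching idea with functools.lru_cache: the decorated function is a single object in the process, so every thread sees the same cache, whereas each child process would start with its own empty copy.

```python
import functools

@functools.lru_cache(maxsize=None)
def expensive(n):
    # Stand-in for a costly pure function, e.g. rendering or parsing.
    return n * n

expensive(10)  # first call: a cache miss, the result is computed
expensive(10)  # second call: served from the cache all threads share
info = expensive.cache_info()
print(info.hits, info.misses)  # 1 1
```

With multiprocessing, the second call made from another process would miss again, because the cache lives in per-process memory.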


dicts and lists are fast, but that doesn't mean a threading approach which must protect your dicts and lists with various kinds of locking will be that fast, because your program will now be waiting on locks. What are you trying to do with memcached that it is not fast enough?


> What are you trying to do with memcached that it is not fast enough?

Fine-grained caching of objects that correspond to DB rows. Most pages touch hundreds of DB rows, due to the various relationships between objects. With memcached, you have to cache at a higher granularity and contort your code quite a bit to reduce the number of gets per request.

> dicts and lists are fast, but ... your program will now be waiting on locks

In my experience, the overhead of locking is often negligible. In Java-land, you can have millions of lock operations per second. IPC involves serialization, deserialization, and context switching, in addition to actual work. Most IPC routines are built on locks, anyway.


The overhead of locking itself (which nobody has even mentioned) is completely distinct from the impact of waiting on locks. It's totally irrelevant that you can have millions of lock operations per second if you actually have any shared state to protect. If you are actually USING locks then you have threads waiting on them.



