Hacker News

From the article: "The only downsides of this approach are that it means that CPU bound Python code can’t scale to multiple cores within a single machine using threads, and that IO operations can incur unexpected additional latency in the presence of a CPU bound thread."

Response: The only downside? As a Python user suffering from JVM envy, I have to say that that's a SERIOUS downside!

Here's why: (1) Python is slow. Almost any real life program in pure Python will have CPU-bound components. (example: BBCode parsing on my forum).

(2) Most programs that need to scale won't need to scale beyond a single machine (the most active web forum on my continent runs on a single quad-core server).

Therefore the need to scale CPU-bound Python programs to multiple cores on a single machine is very real. Even though we accept that removing the GIL is hard, let's not insult real-life Python users by suggesting that their needs are not real.



The article doesn’t say multi-CPU scaling isn’t necessary. It says that threads are usually the wrong answer anyway.

There are great process-based ways to scale out – look no further than Erlang to see that it's true.


I use Python for my day job. I've experimented extensively with Java and Scala. I must say this: Threads are AWESOME. Threading is the most flexible model of concurrency because you can build programs that EFFICIENTLY implement "alternative concurrency models" like message passing and STM on top of threads. Threading is supported natively by every OS. And it's fast.

Erlang is hyped as the ideal model for concurrency, but in practice is a niche product that's primarily useful for programs that are almost pure IO - chat servers, routing components like proxy servers and packet switches.

The Erlang model does NOT apply to python, anyway, since Python processes are nothing like Erlang processes. Unlike Erlang processes, Python processes are very heavyweight and message passing between them is costly.


If you’re using python, the performance gap between processes with message passing and threads with locking is the last of your problems, believe me.

The big difference is that processes are much more robust and testable. The cases where threads are really needed are fringe cases and – while it's a pity – Python doesn't seem like the right language if you don't want to go the Jython/IronPython way.

The bigger problem is that people go for threads by default even though only few are able to write bug-free threaded code. That habit comes from obsolete but still prevalent performance wisdom, and from the fact that threads were really popular in the Java world.


This is a really short-sighted perspective.

Proper support of threading inherently allows more performance and flexibility than multiprocessing. On top of threads, you can build powerful, Pythonic abstractions like concurrent.futures.ThreadPoolExecutor.map and STM, and on top of those, even more powerful abstractions that help the developer avoid concurrency bugs.
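As a rough illustration of that kind of abstraction, here's a minimal sketch using concurrent.futures.ThreadPoolExecutor.map; the worker function is just a placeholder for real IO-bound work:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_length(text):
    # Placeholder for an IO-bound task such as an HTTP request;
    # the pool's threads run these calls concurrently.
    return len(text)

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() distributes the inputs across the pool's threads
    # and yields results in input order.
    results = list(pool.map(fetch_length, ["a", "bb", "ccc"]))

print(results)  # [1, 2, 3]
```

The point is that the locking and work distribution live inside the executor, not in user code.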

I'm really excited for PyPy. That is a project full of people who are not afraid to quickly iterate on powerful ideas that can make Python the high-performance language it deserves to be, instead of dismissing MT programming as a "fringe case" and resorting to ad hominem attacks.


I wish you’d read the article before making your accusations.


I have. There's an ad hominem attack in the middle of it. Otherwise it's a great article.

I understand that everyone here is acting in good faith and wants Python to be better, and the article otherwise contains lots of great information presented in a reasonable manner. You bring up lots of good points too. But other statements like the ones I mentioned are overly broad or brash.


I think it’s just attrition from explaining the GIL problem over and over again. Nick is one of the major core developers and is probably just fed up with the topic. So this bit of snark is all the pay he’ll ever get for his work on CPython.


I really liked the article. But look at how much of this thread is spent talking about that bit of snark. I think that mixing the snark in with all of the rational reasons for Python 3's existence and not working on the GIL made some people less receptive to rational arguments. In other words, I don't think it was worth it.


"Pythonic abstractions like concurrent.futures.ThreadPoolExecutor.map"

Although it might be great stuff, the word 'Pythonic' is pretty funny next to a long Java-style name like that


> If you’re using python, the performance gap between processes with message passing and threads with locking is the last of your problems, believe me.

What would be the first of my problems?


Not sure if you’re trying to troll me by taking it out of the obvious performance context, but anyway: the performance penalty due to the use of an un-JIT-ed scripting language?


Not trolling you, just wondering what you thought was more important from a performance perspective. Re-writing working software in another language is not always feasible due to real-world time constraints, and in that context the performance difference between message-passing and threads would be the first of my problems. Basically, I just don't see the argument of "use a more appropriate language than Python" as a useful counter to criticism of the GIL. The whole point of criticizing Python (in my case anyway), is to hopefully nudge the language to suiting my needs more closely.


The counter is not (even while some people try to turn it that way): THREADS SUCK, WE WON'T ADD THEM BECAUSE WE DON'T LIKE THEM. It is: given the circumstances (which Nick outlines verbosely), a removal of the GIL is not pragmatic.

And this is the last time I’ll write this; I feel like a street organ. >:(

And all I was saying in this thread is that the performance gap between threads and processes isn’t that big of a deal, if you run non-native code anyway. The multiprocessing module is pretty cool.


I'm not intentionally trying to make you repeat yourself. And I know that removal of the GIL is not pragmatic at the moment (or maybe ever), but that doesn't mean it wouldn't be valuable. The GIL wasn't much of a problem ten years ago because not many personal computers had multiple cores. Today it's become a bit of a pain for me personally, and it will only become more painful as core count increases while single-core performance remains largely stagnant.

> And all I was saying in this thread is that the performance gap between threads and processes isn’t that big of a deal, if you run non-native code anyway. The multiprocessing module is pretty cool.

This is a line that I hear over and over again, but I strongly disagree with it. It's not always easy to predict where your performance bottlenecks will be until you actually start implementing in some language. If I've chosen Python for a project and find I need more cores, I'm stuck with either re-implementing critical sections of code in C extensions or other languages, or using multiprocessing. And multiprocessing is not that great because it splits the memory space across processes and communication between them is extremely expensive. And there are many caveats which cause enormous headaches (e.g., you can't fork your process while having an active CUDA context, not all Python objects are serializable, pickling is slow, marshaling doesn't work well for all data types, you must finish dequeuing large objects from a multiprocessing.Queue before joining the source process, etc.).
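The serialization caveat is easy to demonstrate: anything crossing a process boundary must survive pickling, and not every Python object does. A lambda is a simple example:

```python
import pickle

def square(x):
    return x * x

# A module-level function pickles fine (by reference to its module)...
pickle.dumps(square)

# ...but a lambda does not, so passing one to multiprocessing.Pool.map
# would fail, while a thread could call it directly.
try:
    pickle.dumps(lambda x: x * x)
    lambda_picklable = True
except Exception:
    lambda_picklable = False

print(lambda_picklable)  # False
```

The same constraint bites with open sockets, database connections, and many extension-module objects.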

Yes, I could get a 10-100x speedup by re-writing everything in C. But most of the time, I would be very happy with a 6-12x performance gain from just using threads in a shared memory space.


I assume that for whatever reason, it is absolutely impossible for you to use any other concurrency model.

Did you try Jython?


The specific project I'm vaguely referring to here is described in a little more detail as item 1) here: http://news.ycombinator.com/item?id=4178070

No, I didn't try Jython. The choice of CPython was made before I took over the project, and there are also a dozen or so dependencies which I don't think are compatible with Jython.


If you are trying to exploit parallelism, any kind of data sharing (other than pure read-only sharing) costs you.


Yes, but with threading, the costs are minimized. You don't have to convert your Python objects to bytes, send them over a network, wait, retrieve bytes, and convert back to Python objects every time you need to read or write some shared data.
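A minimal sketch of that difference: with threads, workers mutate one shared dict in place under a lock, with no serialization or copying involved.

```python
import threading

shared = {"hits": 0}
lock = threading.Lock()

def worker():
    for _ in range(1000):
        # Direct access to the shared object; a process-based design
        # would have to pickle updates through a pipe or queue instead.
        with lock:
            shared["hits"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["hits"])  # 4000
```

Each increment here costs a lock acquisition, not a round trip through a serialization layer.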


SMP Erlang DOES use threads; it then uses green processes and a custom scheduler to schedule those processes fairly and preemptively across the threads. It doesn't use one thread per process, but threads are a key component of how it works.

It is absolutely nothing like spawning OS-level processes. They are micro-processes, green processes that live inside the Erlang VM.


You are writing a forum which includes BBcode parsing, and Python's CPU usage is your bottleneck? Really?

You are devoting multiple cores to parsing an individual user's BBcode?

This is your example of a real need to remove the GIL?


No, that was just an example of an unexpectedly CPU-bound operation. My point was that you can't assume your Python program is not CPU-bound. Every time you have to do something non-trivial in Python, performance can become an issue.


How do you know it's not memory I/O bandwidth bound? Just curious.


From the perspective of a Python program, being memory bandwidth bound is the same as being CPU bound: you have the GIL, your process is in the running state in the OS, and is currently executing on a core.

(I assume you truly mean bandwidth between main memory and the processor, and not to disk.)


Yeah, but do most programs that need to scale also need shared context? Otherwise, what are the big downsides of processes vs threads?


Many programs can benefit from shared context. You can push all the shared context to your database, but it's often helpful to keep around some shared data structures for performance reasons. For example, you can cache pure Python functions using functools.lru_cache, and share such caches between threads, but such caches can't be shared between processes. In-process data structures like dicts and lists are much, much faster than alternatives like memcached and redis because they avoid the overhead of IPC and deserialization, and they are also easier to use since they are built-in.
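A minimal sketch of that in-process caching idea with functools.lru_cache: the decorated function is a single object in the process, so every thread sees the same cache, whereas each child process would start with its own empty copy.

```python
import functools

@functools.lru_cache(maxsize=None)
def expensive(n):
    # Stand-in for a costly pure function, e.g. rendering or parsing.
    return n * n

expensive(10)  # first call: a cache miss, the result is computed
expensive(10)  # second call: served from the cache all threads share
info = expensive.cache_info()
print(info.hits, info.misses)  # 1 1
```

With multiprocessing, the second call made from another process would miss again, because the cache lives in per-process memory.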


dicts and lists are fast, but that doesn't mean a threading approach which must protect your dicts and lists with various kinds of locking will be that fast, because your program will now be waiting on locks. What are you trying to do with memcached that it is not fast enough?


> What are you trying to do with memcached that it is not fast enough?

Fine-grained caching of objects that correspond to DB rows. Most pages touch hundreds of DB rows, due to the various relationships between objects. With memcached, you have to cache at a higher granularity and contort your code quite a bit to reduce the number of gets per request.

> dicts and lists are fast, but ... your program will now be waiting on locks

In my experience, the overhead of locking is often negligible. In Java-land, you can have millions of lock operations per second. IPC involves serialization, deserialization, and context switching, in addition to actual work. Most IPC routines are built on locks, anyway.


The overhead of locking itself (which nobody has even mentioned) is completely distinct from the impact of waiting on locks. It's totally irrelevant that you can have millions of lock operations per second if you actually have any shared state to protect. If you are actually USING locks then you have threads waiting on them.



