One thing I really like about PHP is its execution model. Every request maps to a single .php file, which is executed by a single PHP interpreter process. Execution is essentially stateless (aside from interpreter bugs, which from what I've seen are rare in this area). Every request gets its own execution environment and doesn't have to worry about anything outside itself. You also get code reloading for free: requests currently in progress are handled by the old code, and new requests are handled by the new code as soon as you replace the .php file.
Contrast this with RoR/Django/Node.js/etc., where you are the one controlling the process/thread. You can mess up and create a dirty execution environment, and within a request you have to be aware of what other requests might be doing at the same time. Moreover, running two applications simultaneously requires two separate execution environments (Apache processes, Node.js processes, etc.). The common workaround is either to isolate each application with its own set of interpreters, or to dedicate a whole virtual server to each application, incurring the overhead of an OS per app.
Don't get me wrong, I detest writing code in PHP for many reasons, but the execution model is efficient: if you want more power for your applications, just increase the number of available interpreters. Moreover, every interpreter is the same, so doubling the total number of interpreters doubles the concurrency of all of your applications. This is the pinnacle of cloud apps: a single toggle controls how many requests you can process concurrently. It is also why shared hosting now costs $2/month while Heroku is an order of magnitude more expensive.
> You can mess up and create a dirty execution environment.
Then don't mess up. It's not that hard. If you follow best practices (don't use global variables), you should handle this fine. I haven't heard of anyone having this problem in Rails at all.
> I detest writing code in PHP for many reasons, but the execution model is efficient
No it's not. You'll have to re-open connections to databases all the time. You can't use keep-alive to external HTTP APIs. You'll have to re-parse the PHP for every request.
(Added later: Yes, this isn't how PHP actually works, and that was kinda my point: parent's concept of "PHP execution model" isn't actually used in PHP because it's stupid! There will always be some things that you want to share (connection pooling, opcode cache) so even PHP isn't a "perfect" share-nothing architecture.)
It's perfectly possible to write a shared-nothing architecture without dumbing it down to CGI.
> No it's not. You'll have to re-open connections to databases all the time. You can't use keep-alive to external HTTP APIs. You'll have to re-parse the PHP for every request.
What? This is all factually incorrect.
1) PHP has supported persistent database connections for ages. The fact is, persistent DB connections aren't really that great for websites, where you need a few resources quickly and then nothing for a long time. Having 10,000 idle DB connections is a great way to eat up resources. But you can do it.
>The fact is, persistent DB connections aren't really that great for websites, where you need a few resources quickly and then nothing for a long time. Having 10,000 idle DB connections is a great way to eat up resources. But you can do it.
Which is why you should be using connection pooling, which cannot be done efficiently within PHP itself, necessitating an entire extra layer of database-proxying software to do the pooling for you.
> Then don't mess up. It's not that hard. If you follow best practices (don't use global variables), you should handle this fine. I haven't heard of anyone having this problem in Rails at all.
Sure, though how do you know all the library code you are using isn't going to mess things up for you? Granted, chances are small, but they are there.
> No it's not. You'll have to re-open connections to databases all the time. You can't use keep-alive to external HTTP APIs. You'll have to re-parse the PHP for every request.
Use a bytecode cache so you don't re-parse all the time. Use a special-purpose TCP proxy to keep connections open if that's your bottleneck (in most web apps it isn't). Also, certain DB drivers let you have persistent connections out of the box.
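The bytecode-cache point is easy to demonstrate by analogy in Python, where `compile()` plays the role of PHP's parser and a cached code object plays the role of APC's opcode cache (the source string and function names here are purely illustrative):

```python
# Analogy for an opcode cache (like APC), sketched in Python.
# compile() stands in for parsing the .php source; the cached code
# object stands in for the cached opcodes.
source = "result = sum(i * i for i in range(100))"


def handle_request_no_cache():
    # Re-parse the source on every request (plain CGI style).
    code = compile(source, "<request>", "exec")
    env = {}
    exec(code, env)
    return env["result"]


_cached_code = compile(source, "<request>", "exec")  # parse once, up front


def handle_request_cached():
    # Reuse the already-compiled code object; only execution remains.
    env = {}
    exec(_cached_code, env)
    return env["result"]


print(handle_request_no_cache())  # 328350
print(handle_request_cached())    # 328350, without re-parsing
```

Both handlers return the same result; the cached one just skips the parse step, which is all an opcode cache buys you.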
> It's perfectly possible to write a shared-nothing architecture without dumbing it down to CGI.
It hasn't been done yet for the price that PHP offers. It is very limited in some areas, but if you go with the grain, it is very cheap.
My main point is that two different applications cannot share an interpreter in RoR/Django/nodejs. This leads to overhead which is not there for PHP. Places like Heroku and Google App Engine try to be clever about this. You can technically shut down interpreters that aren't used frequently, and spin them back up when many requests come in, but guessing when to shut things down and spin them back up is tricky. This is why Google App Engine is so pricey: they suck at figuring this out. But even if you have a great way to figure it out, you are still going to have unpredictable performance, and the cost of waking up/spinning up an interpreter is fairly high.
> Sure, though how do you know all the library code you are using isn't going to mess things up for you? Granted, chances are small, but they are there.
> Also, certain DB drivers let you have persistent connections out of the box.
If you have connection pooling, you can have a "dirty environment". PostgreSQL and MySQL both allow per-connection settings, so you have no guarantee that your SQL executes in a clean environment. You might say, "oh, but the library handles the resetting for me", but how do you know all the library code you are using isn't going to mess things up for you?
You got me :). And it's not just a possibility but a certainty if you set your own settings and don't re-set them on every new connection. This is an issue with every framework that allows persistent connections. For example, a while ago I was working on a Django app where I needed the timezone set to the user's timezone when talking to MySQL. That meant that on every request I had to re-send a query setting the timezone, even when it changed nothing. And even if I had relied on the default timezone being UTC, I still had to set it explicitly, because the prior request might not have cleaned up after itself properly.
In short, unless your database has a way of saying "reset all settings" (PostgreSQL's DISCARD ALL, for instance), you cannot safely have persistent connections in any environment.
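A minimal sketch of the problem and the fix, with a plain dict standing in for a real DB connection (all names here are hypothetical): a naive pool hands back a connection still carrying the previous request's session settings, while a pool that resets on checkout restores the defaults.

```python
# Sketch of session-state leakage through a connection pool.
# A dict of session variables stands in for a real DB connection.
import collections

DEFAULTS = {"time_zone": "UTC"}


class Pool:
    def __init__(self, reset_on_checkout=False):
        self._idle = collections.deque([dict(DEFAULTS)])
        self._reset = reset_on_checkout

    def checkout(self):
        conn = self._idle.popleft()
        if self._reset:
            # The "DISCARD ALL"-style fix: wipe session state.
            conn.clear()
            conn.update(DEFAULTS)
        return conn

    def checkin(self, conn):
        self._idle.append(conn)  # note: no cleanup on the way back in


def request_a(pool):
    conn = pool.checkout()
    conn["time_zone"] = "America/Chicago"  # per-user setting
    pool.checkin(conn)                     # forgets to reset it


def request_b(pool):
    conn = pool.checkout()
    tz = conn["time_zone"]  # which timezone does this request see?
    pool.checkin(conn)
    return tz


naive = Pool()
request_a(naive)
print(request_b(naive))  # America/Chicago -- leaked state

safe = Pool(reset_on_checkout=True)
request_a(safe)
print(request_b(safe))   # UTC -- clean environment
```

The reset-on-checkout variant is exactly the "re-send the settings on every request" workaround described above, just pushed down into the pool.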
Somehow I don't think the other languages prevent you from doing what you've just described; it's entirely up to you how you write your application. OTOH, sometimes you do need shared state, handled in a careful fashion (stats collecting, anyone?). The last time I looked at PHP, though, low-latency shared state was really awkward to handle.
Yeah, and other languages certainly don't require a separate server instance per application. I have about 25 .NET apps running on my production server at work and it's humming along quite nicely.
So what if one of those apps gets a lot of use and you want to re-allocate resources from the other 24 to it? Now you have to tune your server settings manually, or build some automated process that does it for you. In a PHP-like environment, that happens automatically. Say you have 100 interpreters, an average of 4 per app. Now app A gets 80 requests: Apache automagically assigns 80 interpreters to app A while the others idle. No reconfiguration on your part necessary.
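The automatic reallocation described above boils down to demand-driven assignment from one shared pool: whichever app has requests gets the workers, with no per-app tuning. A toy sketch, reusing the numbers from the example (the function and app names are made up):

```python
# Sketch of demand-driven worker allocation from one shared pool,
# the way a PHP-style setup shares interpreters across apps.
def allocate(total_workers, demand):
    """Hand each app min(its demand, remaining workers), busiest first."""
    allocation = {}
    remaining = total_workers
    # Serve the busiest apps first so a traffic spike soaks up idle workers.
    for app, requests in sorted(demand.items(), key=lambda kv: -kv[1]):
        allocation[app] = min(requests, remaining)
        remaining -= allocation[app]
    return allocation


# 100 interpreters shared by many apps; app A suddenly gets 80 requests
# while a few others trickle along and the rest sit idle.
demand = {"A": 80, "B": 3, "C": 2, "D": 1, "E": 0}
print(allocate(100, demand))  # {'A': 80, 'B': 3, 'C': 2, 'D': 1, 'E': 0}
```

No config file changes, no reload: the "tuning" is just a function of current demand.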
I am not familiar with how .NET does this. Perhaps it has some mechanism for dealing with it. Here is an example from the Django/Python world: in your virtual-host Apache config you have to specify something like the following:
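(Something like this mod_wsgi daemon-mode stanza; the app name and paths are made up for illustration:)

```apache
WSGIDaemonProcess exampleapp processes=5 threads=5
WSGIProcessGroup exampleapp
WSGIScriptAlias / /srv/exampleapp/wsgi.py
```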
Notice the processes=5 threads=5. This means Apache will run 5 processes with 5 threads each. Now imagine you have several of these apps, all configured for 5 processes and 5 threads, together eating 80% of the RAM on your server. App A gets featured on HN and requests pour in. You can only process 25 concurrent requests (realistically fewer, since Python's GIL prevents CPU-intensive load from being scheduled efficiently across the 5 threads in a process). Meanwhile, while app A is getting slammed, apps B, C, D, and E sit idle. You could buy app A more headroom by reducing the processes/threads for apps B through E and increasing them for app A, but that means doing so by hand and reloading Apache. Less than ideal.
Your example is not a problem with using an efficient execution model; it is a problem with django/wsgi. In fact, your example uses the exact same model as Apache, it just sucks at it and makes you statically define the number of workers per app. You can easily have multiple web apps running in a single application server, and the resource limits will be shared just like in a typical Apache+PHP setup.
Note that in environments where this sort of thing is trivial to do (Java, for example), virtually nobody does it; people prefer to run separate servers per application anyway.
The way I understand it, you either have a pool of interpreters per app or per set of apps. In the second case, life is easy: a simple system can allocate interpreters to apps on demand. In the first case, you need a more complex solution. Perhaps the process manager (in this case Apache) could implement such a system, but thus far it has not.
There's no need for sets of interpreters at all, that's what I am saying. Python being worse at this than PHP doesn't mean PHP is good at it. Look at Go, for example: there's one app server, running as many apps as you want.
I think it's a naive attempt at describing worker processes and application scope. If you have a worker process per endpoint, which is common with .NET apps, and lots of endpoints, it's quite hard to balance resources. That much is true. In PHP, none of this is of consequence.
Now I've got a tiny (280 MB deployable, 5 hosts, 37 application pools, 5,000 in-flight requests 24/7) behemoth on my hands, and I can testify it's an arse pain to manage resources.
Isn't this true of any CGI program? I know PHP is typically run via mod_php, but what you've described applies to any CGI program.
It is somewhat expensive to start up a new interpreter process per HTTP request. FastCGI alleviates this, but then the interpreters are long-lived and no longer completely independent.
Yes, though PHP comes with a few goodies, such as mainstream mod_php and FastCGI support (I prefer the latter) and a bytecode cache (APC). The implementation is pretty good compared to ye olde CGI Perl script.
That may be one reason Heroku is more expensive than a cheap shared-hosting provider, but it's far from the only one. Heroku provides a lot more than hosting files and running server instances: value-added services for deployment, logging/monitoring, security and performance auditing, package management, git integration, database choices and plugin apps, etc. etc.
It is quite the opposite. It is incredibly inefficient, running hundreds of copies of the exact same code, building up an entire framework of objects to use for a single request, then trashing them all to do it again from scratch on the next request. It is easy for beginners, which is why it is so popular. But that is not the same as being efficient.
So what happens when you configure apache to run 100 processes of your Django code? If 100 clients want to view a web page, how many copies of the request object are floating around?
Or are you talking about parsing the code? At that point, something like APC would help immensely.
Are you seriously comparing the same execution model to itself and then expecting me to prove one of the two identical options superior? The shared-nothing, single-process-per-request execution model is the same as the shared-nothing, single-process-per-request execution model. It is not efficient compared to other models, like Node-style non-blocking multiplexing, Java-style multithreading, or the better options available in languages like Haskell, Go, Clojure, and Erlang.
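For contrast, the non-blocking multiplexing model mentioned above can be sketched with Python's asyncio standing in for Node's event loop (the handler and its 0.01 s "I/O wait" are illustrative): one process interleaves many in-flight requests instead of dedicating a process to each.

```python
# Sketch of event-loop multiplexing (the Node-style model), using
# asyncio as a stand-in. One process, many interleaved requests.
import asyncio


async def handle_request(request_id):
    # Pretend this is a non-blocking database or API call; while it
    # "waits", the event loop runs other requests in the same process.
    await asyncio.sleep(0.01)
    return f"response {request_id}"


async def main():
    # 100 concurrent requests share one process; total wall time is
    # close to a single 0.01 s wait, not 100 of them stacked up.
    return await asyncio.gather(*(handle_request(i) for i in range(100)))


responses = asyncio.run(main())
print(len(responses))  # 100
print(responses[0])    # response 0
```

A process-per-request model would need 100 interpreters to do the same work; here one interpreter multiplexes them all, which is the efficiency claim being made.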