The older hardware (Intel Xeon X5550) behind the m2.* instances did not support AVX. The newer hardware does, but AVX is disabled there to maintain compatibility with the old hardware. It was still advertised, however, which is what caused the issue.
AVX is turned on in newer instances like the cc2.8xlarge, which maps to the f2 core on PiCloud.
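To see what a given instance advertises, one option is to read the CPU flags that Linux exposes in /proc/cpuinfo. The sketch below is only an illustration: `advertised_flags` and `has_flag` are hypothetical helper names, and as the issue above shows, an advertised flag does not guarantee that the instruction set is actually usable.

```python
def advertised_flags(cpuinfo_text):
    """Return the set of CPU feature flags listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux kernels list features on a line whose key is "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_flag(cpuinfo_text, flag):
    """True if `flag` (e.g. "avx") is advertised in the given cpuinfo text."""
    return flag in advertised_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("AVX advertised:", has_flag(text, "avx"))
```

Because this only reports what the kernel advertises, code that depends on AVX may still want a runtime fallback.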
It's best to understand the differences from the top down. Celery is software. PiCloud is a service. This largely dictates the automation each system is able to provide.
Using PiCloud, there is no server to set up with your background processing software, in this case Celery. There is no server to deploy your own codebase on, where you have to manage the versioning of code and data; we automatically deploy the correct versions for you. With Celery, if you need more computing power, you'll have to set up a new server. With PiCloud, we're managing your infrastructure, so we'll automatically boot new servers--hundreds, if necessary--or you can do it manually with the click of a button. If there's an update to Celery, you have to shut down your system and deploy it. With PiCloud, we handle all the server-side software updates because we control them.
It all boils down to less management and more automation. A couple more examples: we've built redundancy across our system so that you don't have to design for server failure yourself, and you can choose the type of core (CPU + RAM combo) you want to use with one keyword argument, with no need to change out all the machines you use.
The final result is that with a simple download of our client library and three lines of code, you can be leveraging a cluster of hundreds of machines. That's PaaS at its best.
A much more apt comparison would be with a service-oriented Celery.
Unfortunately, with our new s1 core type, it's even easier to DDoS services. ;)
In all seriousness, we've given thought to this issue, and the solution we've found isn't technological. Users who want to use over a couple of hundred cores simultaneously are capped until we've approved them. It's exactly what Amazon Web Services does as well.
From the get-go, we've targeted users with serious compute-intensive needs. These users from both academia and industry generally do what we call "scientific computing." A couple examples are oil companies doing immense amounts of geophysics simulations, and bioinformatics laboratories in universities doing comparative genomics; we'll be posting a guest blog entry in the next week on successfully reducing the time of sequence alignment from 20-25 hours to less than an hour. Other examples include neuroscience, astronomy, seismology, weather analysis, and protein folding.
As time has gone on, we have noticed two sizable classes of users whom we consider to be non-scientific. The first is the risk analysis divisions of financial firms, which run Monte Carlo simulations in their never-ending endeavors to determine their portfolios' risk profiles. The second is web companies doing all sorts of background processing: video encoding, feed aggregation, social data scraping, and analytics of all sorts.
The REST feature is an experiment on many levels. In the short term, it helps our existing users utilize PiCloud without Python, which is very important in companies with large polyglot codebases (read: Java). In the medium term, we want to see whether there's adequate interest from web and mobile companies that require serious scalability, and thus are interested in how we entirely divorce their algorithms from their servers. In the long term, we want to see how far we can take the idea of "publishing" a function; there are many users out there who would love to share their algorithms in working form but lack the programming know-how and computing hardware to do it.
To answer your other questions: We're releasing a big-data solution a la MapReduce for our users in the coming months. Also, Python has a strong scientific computing community primarily because it interoperates very well with existing C/C++/Fortran code via extensions. In most of our use cases, Python is being used as a wrapper for C code, which is doing the heavy lifting. PiCloud, unlike many of the other background processing services, allows users to deploy more than just Python or Ruby code. We used to allow this through a "package upload" system, but we've recently replaced it (in beta) with the ability to fully customize a file system environment (apt-get away!).
Our users are able to access their databases from PiCloud, often without any code modifications, which means their Django models work just fine. If their database resides on EC2 us-east, then the performance hit is minimal.
We just did a quick non-scientific check of our enqueue time from an EC2 us-east server to PiCloud. A cold start (first job) takes 900ms (it could take longer depending on what needs to get transferred), but once the connection has been made, enqueue takes around 86ms, and as the system heats up it drops to 50ms.
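A check like this is easy to reproduce with a few lines of timing code. The sketch below is an assumption about how such a check might look, not the script actually used: `time_calls` and `summarize` are hypothetical helpers, and `enqueue` stands in for whatever call submits a job. It times each call separately so the cold first call can be compared with the warm ones.

```python
import time


def time_calls(enqueue, n):
    """Call `enqueue()` n times and return each call's duration in milliseconds."""
    durations_ms = []
    for _ in range(n):
        start = time.perf_counter()
        enqueue()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms):
    """Split timings into the cold first call and the mean of the warm rest."""
    cold = durations_ms[0]
    warm = durations_ms[1:]
    warm_mean = sum(warm) / len(warm) if warm else None
    return cold, warm_mean
```

Passing a zero-argument function that submits one job, for example `summarize(time_calls(submit_one_job, 20))`, would give the cold-start time and the average warm enqueue time; `submit_one_job` is a placeholder here.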
1. If you piece the code snippets together in the post, it'll work. Nevertheless, I'll be posting a downloadable source file soon.
2. Our cloud.files module does have a 5GB file size limit. However, you are free to store your data elsewhere (probably on EC2 for fast and free data transfer), and access it using PiCloud. For example, you could store the video files on your own CouchDB server running on EC2, and simply query the server in your code (using the python-couchdb module) for the video.
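For the CouchDB route in point 2, a rough sketch might look like the following. It assumes the video is stored as an attachment on a CouchDB document; `fetch_video` is a hypothetical helper, and the server URL, database name, document ID, and filename in the usage comment are all placeholders. The `db` argument is expected to be a database object from the python-couchdb module.

```python
def fetch_video(db, doc_id, filename):
    """Read an attachment's bytes from a CouchDB database object.

    `db` is assumed to behave like couchdb.Database from python-couchdb,
    whose get_attachment() returns a file-like object, or None if missing.
    """
    attachment = db.get_attachment(doc_id, filename)
    if attachment is None:
        raise KeyError("no attachment %r on document %r" % (filename, doc_id))
    return attachment.read()


# Hypothetical usage inside a function that runs on PiCloud:
#   import couchdb
#   db = couchdb.Server("http://your-ec2-host:5984/")["videos"]
#   data = fetch_video(db, "video-123", "input.mp4")
```

Keeping the CouchDB server on EC2, as suggested above, keeps the transfer between the database and the job fast.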
While this might not be clear in the post, we completely agree: "Needless to say, they do have a full video encoding service with a wide range of options and customer support, whereas we’re showing you a building block that could be used to replicate their service. But, this does give you an idea of the premium they are charging for their service." We strive to be a general service for computation that can be applied to many verticals.
Good point. At the end of the day, we're not a video encoding service, so our users should use us in the same way they would their own Amazon EC2 instance with ffmpeg.