Show HN: Zappa – Serverless Python/Django on AWS Lambda (gun.io)
87 points by Mizza on Feb 9, 2016 | hide | past | favorite | 41 comments


If you're asking yourself: "Why should I care about Lambda? How does it help me build a fast website? In what context?", there's an interesting use case from a merchandise store that made real hats and such for Counter Strike Source.

Valve mentioned them on the official CS:S page, and things went haywire. The team restructured into a Lambda-friendly architecture, scaled without breaking a sweat, and ended up paying pennies in costs.

Link: https://www.reddit.com/r/webdev/comments/3oiilb/our_company_...


It seems like having local Lambda emulation would be useful as a companion to this. If I ever do some Lambda development I'd love to see integration with emulators like https://github.com/HDE/python-lambda-local, https://serverlesscode.com/post/emulambda-testing-aws-lambda..., or https://ashiina.github.io/2015/01/lambda-local/ (some or all of these may already work; I can't tell just by eyeballing the projects from a distance).


This would have been useful for me during development.

But honestly, the big problem is API Gateway. The product is a mess, the docs are a mess, and the whole system is kind of insane. Lambda is awesome, but API Gateway still seems half-baked. 90% of my development time was spent fighting with APIGW. There are some crazy hacks in there (base58-encoding cookies, regexes based on base64-encoded status codes) that shouldn't have been necessary.

Still, now that the system works and it's easy to use, I think this is ready for real usage. And I'm sure Amazon will get their stuff together for future releases of APIGW. They probably weren't completely anticipating that people would use it in this way.


Yeah, I fought with APIGW for a while in the beginning because I had to write scripts to create my route structures so I could call Lambda in a fully RESTful manner for my data model. A lot of stuff just didn't work when calling from the AWS CLI. After a lot of trial and error and a lot of bug reporting, I got to a script that would quickly build my APIGW for me, because that web interface is horrible.


Happy to answer any questions that people might have about this!

To head off a few questions at the pass:

Here are the hacks necessary to make this work: https://github.com/Miserlou/Zappa#hacks

Here's how to avoid the cold-start problem: https://github.com/Miserlou/django-zappa#keeping-the-server-...


Hi,

How are static assets served? Through Django?


This is cool. On this:

To ensure that your servers are kept in a cached state, you can manually configure a scheduled task for your Zappa function that'll keep the server cached by calling it every 5 minutes.

The cost of running the warmer (300s per call * ~8,640 calls/month * 512MB, roughly 3M compute seconds) comes out to about $18, and that's not counting the actual requests from end users. It's interesting, but costly as a substitute.


Each call should only run for under the 100ms billing minimum, as it's just a ping and the instance is usually already loaded (although AWS will likely recycle the instance from time to time, outside of your control). There would be roughly 8640 calls a month, which is about 864 seconds of execution time. At 512MB you get 800,000 free seconds per month. This is totally negligible: even if it landed entirely above the free tier, it would be about 7/10 of a cent per month.

Disclaimer: I am a Lambda fanboy.
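The figures above can be checked with a quick back-of-envelope calculation (a sketch of the reasoning in the comment, assuming the 100ms billing minimum and the $0.00001667/GB-s rate from the pricing page):

```python
# One keep-warm ping every 5 minutes on a 512MB function,
# each billed at the 100ms minimum.
calls_per_month = 30 * 24 * 60 // 5          # ~8640 pings
billed_seconds = calls_per_month * 0.1       # 100ms minimum per ping -> 864s
gb_seconds = billed_seconds * 512 / 1024     # memory-weighted compute: 432 GB-s
cost = gb_seconds * 0.00001667               # ~ $0.007/month, ignoring free tier
```

Even without the free tier, that's under a cent a month.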


From my understanding of the pricing (I could be wrong), even if the call is 100ms, because the timeout is set to 300s so the app stays in cache, you'll be charged for 300s (app is "running" for that long).


That is not correct. You can set the timeout to the maximum if you want; you're only billed for the full timeout when a call actually runs that long without completing.


Where did you get the 3 million from?


I rolled the seconds up into ~3M: roughly 8640 (number of calls a month to keep it in cache) * 300s (the timeout, so it stays in memory). My understanding of AWS Lambda is that if you keep it up for 300 seconds, you get charged for that much, because it's #reqs * #secs. Here's the example from the pricing page (with my comment at the end of the Total compute line):

The monthly compute price is $0.00001667 per GB-s and the free tier provides 400,000 GB-s.

Total compute (seconds) = 3M * (1s) = 3,000,000 seconds # (roughly equal to 8640 * 300 ≈ 2.6M seconds in the case of this app)

Total compute (GB-s) = 3,000,000 * 512MB/1024 = 1,500,000 GB-s

Total compute – Free tier compute = Monthly billable compute GB-s

1,500,000 GB-s – 400,000 free tier GB-s = 1,100,000 GB-s

Monthly compute charges = 1,100,000 * $0.00001667 = $18.34
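The arithmetic can be reproduced directly under this comment's assumption (each ping billed for the full 300s timeout, 512MB function, 400,000 GB-s free tier). Note it lands a few dollars below the $18.34 figure, which comes from the pricing page's 3M-second example rather than this app's ~2.6M seconds:

```python
# 8640 keep-warm calls/month, each (assumed) billed for the full 300s timeout.
seconds = 8640 * 300                    # 2,592,000 compute seconds
gb_s = seconds * 512 / 1024             # 1,296,000 GB-s
billable = max(gb_s - 400000, 0)        # 896,000 GB-s after the free tier
cost = billable * 0.00001667            # ~ $14.94/month
```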


I don't think the timeout is related to the caching. When your function returns, it is over. The caching is not transparent, I don't know how it works under the hood, but it seems like if you just call it every 10 minutes it stays hot.

Would love an AWS engineer to shed some light on this though.


This sheds some pretty good light: https://aws.amazon.com/blogs/compute/container-reuse-in-lamb...

If you have enough traffic, you will wind up with multiple containers in use at once, so a call into the general pool running your Lambda function will most likely only keep one of those containers from recycling.


It looks to me like you're misunderstanding the recommendation here, which is to hit a fast endpoint to ensure that lambda keeps it somewhere in their caches. This is similar to what one might do with Heroku.

The idea is that something like actually distributing the code to a front-line server seems to add to boot time, so if the lambda is not warm, it will be a little slower. If you keep it warm by having at least occasional traffic headed to the server, you're able to avoid this penalty.

If you look above, you'll see someone else did the math for you, and you're actually only talking about 864s of execution time, not 3M.

It also sort of looks like you just pulled up the pricing example which describes 3M requests that take 1s and worked back to it, because when you do the math you describe it works out to ~ 2.5M


> Where normal web servers like Apache and Nginx have to sit idle 24/7, waiting for new requests to come in, with Zappa, the server is created after the HTTP request comes in through API Gateway

To me, this sounds very much like reinventing the plain old CGI, just with different names ("Web server" -> "API Gateway", "CGI script" -> "server").

Am I missing something here?


There are a few differences:

* CGI is the closest UNIX analogue: fork a process, write to STDIN, read from STDOUT. Lambda is actually a framework for running JavaScript/Java/Python code, from which you can call actual binaries.

* Lambda's attractive point is that containers are reused, which means that instead of paying the full price of a new process your context is already "hot", even for the actual binary you run. That means less latency and less CPU usage, allowing you to scale far more easily

I'm no Lambda user so I'm possibly wrong, but the idea and execution behind it sure look nice.
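The container-reuse point can be illustrated with a minimal handler sketch (names here are illustrative, not from the thread): module-level state is initialized once at cold start and then survives across warm invocations that land on the same container.

```python
import time

# Runs once per container, at cold start.
_start = time.time()
_invocations = 0

def handler(event, context):
    """Hypothetical Lambda-style handler demonstrating warm state."""
    global _invocations
    _invocations += 1
    # On a warm container, _invocations > 1 and _start stays fixed.
    return {"invocation": _invocations,
            "container_age_s": time.time() - _start}
```

This is exactly why the "hot context" saves latency: anything expensive done at module scope (imports, loading config, opening connections) is paid once per container, not once per request.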


Both your statements are correct. Amazon is most likely running LXC containers under the hood, though, so anyone could duplicate what Lambda does using an evaluator for JavaScript/Python/etc. with a REST API, containerized in Docker.

Disclaimer: I am a heavy Lambda user.


I haven't done my own experiments, but reading the Lambda docs on Security has made me think these are real VMs and not containers.


You're not completely wrong; the difference is that there is no configuration necessary, no permanent infrastructure, no limitations on scalability, and billing is metered in milliseconds. AWS just takes care of everything. It's just _python manage.py deploy prod_ and you're done.

Obviously there are still ways that this can be optimized since Django wasn't really designed to be used like this, but it still seems performant enough for me, and the other advantages gained are major.

Response times for a warm server are almost always <200ms, averaging just over 100ms (we ran some tests on Reddit yesterday). In my own tests just now, I was getting <80ms response times consistently. And I'm certain there are ways we can shave this down further.


I get sub-100ms times quite often for light calls. The 'boot-up' call will always be longer, but that's because the service had gone idle. If you are getting heavy usage that is not an issue.


Lambda with the API Gateway (the Request/Response control flow, as opposed to the Event triggered control flow that Lambda uses with other services) is basically like a distributed, virtualized implementation of CGI.

However, being distributed and virtualized is a pretty big deal. In practice this means Lambda apps perform differently than traditional CGI, and also require different app architecture to support them.


I'm toying with the idea of writing an open-source implementation of Lambda on top of Docker (for isolation), Go (for running custom code), and Erlang/Elixir/OTP (for coordination). Are people interested?

Feel free to contact me on the address on my profile...


I think there is a strong need for something like this. It's worth noting that API Gateway is also a huge component of this project, not just Lambda. Lambda alone wouldn't provide the benefits of Zappa.


Interesting! I've been playing around with Serverless/Lambda this weekend (the Node version), but one thing that struck me was that I couldn't use a connection pool for database connectivity. This means each request to a Lambda-backed API will need to create new connections.

Anyone who wants to share their thoughts about this?
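One common workaround (a sketch under assumptions, not something the thread prescribes) is to hold a single connection at module scope so a warm container reuses it instead of reconnecting on every request; `connect` and `is_alive` below are hypothetical stand-ins for whatever your database driver actually provides:

```python
# Per-container connection reuse: module-level state survives warm
# invocations, so only cold starts pay the connection-setup cost.
_conn = None

def get_connection(connect, is_alive):
    """Lazily create, then reuse, a single connection per container."""
    global _conn
    if _conn is None or not is_alive(_conn):
        _conn = connect()   # reconnect only on cold start or dead link
    return _conn
```

It's not a real pool (each concurrent container still holds its own connection), but it avoids the per-request reconnect.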


"Serverless" is a framework for building applications on AWS Lambda: https://github.com/serverless/serverless

And it now supports Python. It would be interesting to see someone use the Serverless framework with Django plugin support.


The point of this is that you don't need to learn any new frameworks - it works with your existing code. You can deploy your existing apps on Lambda without having to change anything, and you're not locked in to AWS if you want to go back.


I rather doubt the "without changing anything" part. For example, I write static media to a subdirectory of my Django installation. Will that work without any changes? How about Django uploads? How about temporary files? Etc.

I used to use Django-nonrel for GAE so I wouldn't be locked in, and guess what: by the time I wanted to move away, I had so much GAE-specific behaviour that I pretty much had to rewrite the app.


Well, you really shouldn't be serving static content through Django; you should be serving it through a CDN. If you use something like django-storages for uploads, it'll Just Work.

I'm not guaranteeing that everything will work out of the box on the first try, but I bet you'll be very close. You will have to make a few design decisions, but if you're making the right ones it should just work. At the very least it should be far, far easier than GAE.

(The one thing you'll have to watch out for is C extensions, which, for now, require that you do your deployment from an x86_64 machine.)
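A minimal settings sketch of the approach described above, assuming django-storages for uploads (the bucket and CDN names here are placeholders, not from the thread):

```python
# Hypothetical Django settings fragment: uploads go to S3 via
# django-storages, static files are served from a CDN, and nothing
# is written to the Lambda container's filesystem.
INSTALLED_APPS = [
    # ... your apps ...
    'storages',
]
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_STORAGE_BUCKET_NAME = 'my-app-media'            # placeholder
STATIC_URL = 'https://my-cdn.example.com/static/'   # placeholder
```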


> Well you really shouldn't be serving static content through Django, you should be serving it through a CDN

The two aren't mutually exclusive. My CDN points to Django for retrieving the static media.

> if you're making the right ones it should just work

"If you're making the ones that Zappa requires", you mean.

> The one thing you'll have to watch out for are C-extensions

Eh, that's to be expected, though.


Fantastic! We had been considering using lambda, but were worried that doing so would irreversibly lock us into AWS. Being able to develop for lambda using standard web frameworks (django already being our favorite) goes a long way in increasing our willingness to use it.


Yep! This should make it super easy, there are no code changes needed so you can go back and forth between Lambda and traditional hosts as often as you please. No "lock in" - although one of Lambda's advantages is how well it ties into the rest of the AWS ecosystem with RDS/S3/CloudFront/etc.


Wow -- the fact that this allows deployment of a "regular" Django application to Lambda is awesome!

Have you checked if Postgres works via the py-postgresql driver? It's implemented in pure Python (Python 3 only), with all C optimisations optional.


Lambda only supports Python 2.7 for the moment, and getting Postgres drivers working is a royal pain in the ass.

https://github.com/jkehler/awslambda-psycopg2

https://www.reddit.com/r/aws/comments/3on09a/using_psycopg2_...

https://forums.aws.amazon.com/thread.jspa?messageID=680192


Haven't tried that yet - want to give it a shot? Pull requests welcome! :D :D


Love the name! Django Reinhardt and Frank Zappa are 2 of my favorite artists! ^^


1. sounds nice.

2. What the hell is this: http://imgur.com/a7FevNF? Where's at least the close button? Thank god ESC works.


I think you're right about that. Increasing conversion rates by 5% at the cost of pissing off the other 95%? Dumb. Sorry.


Why is it dumb? You increased conversion rates by 5%.


2. Click anywhere outside the box.


The website sucks. It asked me twice about my email. Fucking twice. Can't read that shit.



