This skips over the hard part: managing Docker containers. Poking a hole directly to the container is a leaky abstraction. A reverse proxy like HAProxy or Varnish should be sitting in front of the container.
Once you have the reverse proxy set up, the next problem that arises is routing to containers based on the domain. Now your HAProxy or Varnish config is going to get bloated, and every time you deploy a container the config needs to be modified and reloaded. By this time you might be looking at Chef or Puppet for automating the config generation.
Chef and Puppet are not simple to learn, and they have their own set of quirks (like unreliable tooling support on Windows). I'm in the process of conquering this, but I hope one day there will be a simpler way.
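For concreteness, the per-domain HAProxy routing that accumulates looks roughly like this (hostnames, backend names, and container addresses here are made up); every newly deployed container means another acl/use_backend/backend stanza plus a reload:

```
frontend http-in
    bind *:80
    acl host_app1 hdr(host) -i app1.example.com
    acl host_app2 hdr(host) -i app2.example.com
    use_backend be_app1 if host_app1
    use_backend be_app2 if host_app2

backend be_app1
    server app1 172.17.0.5:8080 check

backend be_app2
    server app2 172.17.0.6:8080 check
```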
This is a great point. The initial Docker examples make everything seem easy, but we blew way past our estimated time in integrating docker into our workflow because of the points you mention. I am still happy with the choice to use docker though and our team will be better at server administration in the future.
One thing about this getting started guide is that it recommends the Phusion base image which boots init. That seems to go against the best practices outlined in a recent article by Michael Crosby - http://crosbymichael.com/dockerfile-best-practices-take-2.ht...
I'm one of the authors behind Phusion's baseimage.
Phusion's baseimage does not go against Michael Crosby's best practices. His best practices say not to boot init, by which he means a normal init system. He says that not because an init process is itself a bad idea, but because a normal init performs all kinds of work that is either unnecessary or harmful inside a Docker container. In fact, this is exactly what the Baseimage-docker documentation also states: don't use a normal init.
The Phusion baseimage does not contain a normal init, but a special init that is specifically designed for use in Docker.
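Using it is just a matter of basing your image on it and keeping its init as the entry point; a minimal Dockerfile might look like this (the tag choice is up to you):

```dockerfile
# Phusion's baseimage ships /sbin/my_init, a small init that reaps
# zombie processes and runs services under runit; it is not a full
# System V init.
FROM phusion/baseimage
CMD ["/sbin/my_init"]
```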
Nice, hadn't read those. Thanks! I was wondering about that, but I still need a solution for logging, cron jobs, and similar (perhaps running those on the host machine is the answer).
I am still finding good solutions for those too, and trying to add some concepts to my toolbox like orchestration, service discovery, proxies, data containers, ambassador containers, and so on. It's hard for me to wrap my head around the different recommended ways to use docker compared to my initial expectations.
To be clear - I think FreeBSD is an awesome project and the jails look solid, but it is really the Docker community and exploding Docker ecosystem that makes Docker appealing to me. For example, Red Hat embracing Docker means that I can build Docker containers for enterprises that embrace CentOS and Red Hat. None of the corporations or organizations I have worked with use or support FreeBSD, but they all support CentOS. Reference: http://www.infoworld.com/t/application-virtualization/red-ha...
Update etcd with connection details on container start/stop. Then use a script to watch the appropriate directory in etcd for changes and regenerate the config.
Then on the haproxy/varnish box (or put them in a container), put something that does "etcdctl exec-watch /services/website -- updateconfig.sh", where updateconfig.sh would be a script to watch for changes and regenerate the config / reload.
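A minimal sketch of what updateconfig.sh might do. This assumes each service registers a "name addr:port" value under a key like /services/website; the etcd schema, file paths, and key names are made up for illustration:

```shell
#!/bin/sh
# Sketch of updateconfig.sh: turn "name addr:port" lines into HAProxy
# "server" lines. Where the lines come from depends on your etcd schema.
render_backends() {
    awk '{ print "    server " $1 " " $2 " check" }'
}

# In real use, something along these lines:
#   etcdctl ls /services/website | xargs -n1 etcdctl get \
#       | render_backends > /etc/haproxy/backends.cfg
#   haproxy -f /etc/haproxy/haproxy.cfg -sf "$(cat /var/run/haproxy.pid)"
```

The -sf flag makes haproxy finish existing connections on the old process while the new one takes over, so the reload is graceful.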
I don't see how your config will get "bloated" any more than it would otherwise - presumably your number of domains won't increase.
This is what DNS is for. If you specify your backends by hostname instead of IP, then each time the load-balancer tries to connect to the backend, it'll get a list of A records from its DNS resolver and pick one in a round-robin fashion. Thus, if you have a dynamic DNS server that queries your presence service, it can return exactly the hosts that are up right now as the round-robin set.
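As a sketch of the mechanism (assuming a Linux box with getent; the service name web.internal is made up), resolving the name each time is what keeps the backend set fresh:

```shell
#!/bin/sh
# Sketch: resolve a service name to its current IPv4 A records.
# A dynamic DNS server backed by a presence service would return
# only the hosts that are up right now.
resolve_backends() {
    getent ahostsv4 "$1" | awk '{ print $1 }' | sort -u
}

# A load balancer would re-resolve before connecting and rotate
# through the returned list, e.g.:
#   resolve_backends web.internal
```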
Right now, if you use SkyDNS[1] as your DNS server, and attach Skydock to the Docker host, this all Just Works. Most people want to use etcd instead of Skydock, though, so support for that is coming soon[1] too.
SkyDNS looks interesting, but it doesn't appear to do any health checks on the endpoint. I don't want clients to receive an answer to an A record query that contains the IP address of an endpoint that is down.
In order to be robust in the event of a network partition, the client should perform the health check itself. This can be done in a background thread; it doesn't have to be synchronous with the DNS lookup (and that would be very bad for performance anyway).
Doesn't that mean every TCP connection would have to query DNS at the start? What about the client's DNS cache? Also, isn't that expensive in terms of connection time?
> Poking a hole directly to the container is a leaky abstraction. A reverse proxy like HAProxy or Varnish should be sitting in front of the container.
It might be a stupid question but I wonder what's considered a leaky abstraction in this case.
By the way, I'm not sure I fully understand your concerns over reverse proxy routing, but I recall that Ambassador pattern linking[0] is a suggested way of tying Docker containers over network. Also, these slides by dotCloud[1] may be helpful as well (I'm not sure if approaches described are up-to-date, though).
>It might be a stupid question but I wonder what's considered a leaky abstraction in this case.
I consider poking a hole a leaky abstraction because you are exposing the internals of your stack. The consumer should not know or care that you are using Docker containers to serve the application. From a security perspective, directly exposing a container may lead to potential exploits of Docker itself.
Exposing a container would not expose potential exploits of Docker. Docker is not running in the container (and the container doesn't know anything about Docker).
Exposing a container is a hell of a lot safer than exposing a service on the host OS.
Also, I'm not sure I follow how a consumer would know whether you are using Docker or not.
I'm probably just going to show my ignorance here, but why doesn't container linking solve this problem?
Could you not run multiple Docker containers/services behind a single nginx or Apache container on a production server? The nginx container basically gets one of your public IP addresses, and you link it to the other containers to provide it with knowledge of the other running processes' IP addresses (each within their own container, of course). That way you have one public-facing container which has knowledge of the other containers and can use the information provided through -link to configure the nginx server to route requests appropriately. This requires a bit of bash script / sed command-line hackery to update your nginx configuration to accommodate the changing IP addresses of the other containers on restart (unless you can set them by hand now using Docker; we still don't), but once you get it set up you never have to think about it again.
Like I said, maybe I'm just showing my ignorance, but something like the above scenario is how we get around hosting multiple services with limited public IP addresses available.
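The sed/bash hackery in question can be fairly small. A sketch, assuming a container linked with --link web:web (Docker injects WEB_PORT_8080_TCP_ADDR and WEB_PORT_8080_TCP_PORT for that link; the nginx paths here are illustrative):

```shell
#!/bin/sh
# Sketch: emit an nginx upstream block from the address/port that
# Docker's --link option injects as environment variables.
render_upstream() {
    name="$1" addr="$2" port="$3"
    printf 'upstream %s {\n    server %s:%s;\n}\n' "$name" "$addr" "$port"
}

# In the nginx container's startup script, e.g. after --link web:web:
#   render_upstream web "$WEB_PORT_8080_TCP_ADDR" "$WEB_PORT_8080_TCP_PORT" \
#       > /etc/nginx/conf.d/upstreams.conf
#   exec nginx -g 'daemon off;'
```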
But even without using fleet, the overall mechanism is fairly easy to adapt: either use systemd dependencies like in their example, or have a script that queries Docker on each host to spot changes in running containers and updates an etcd instance (or whatever your preferred config server is).
The "extra complexity" needed for multi-machine setups over linking containers is actually pretty minor.
Your services need to read something to get the IP, which ultimately comes from an ENV variable. In the linked container scenario Docker sets that variable. Otherwise you set it manually. That's the only extra complexity.
I was worried about this too, so I tried it out[1]. In this case I have a YAML config file, which can be overridden by ENV variables (which may come from Docker).
This isn't as automatic as CoreOS (eg, no failover etc), but it is a lot less complex.
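The file-plus-ENV-override idea is simple to implement. A sketch in shell, using flat key=value lines rather than YAML to keep it short; the key names and the DB_PORT_5432_TCP_ADDR variable (the kind Docker injects for a linked database) are illustrative:

```shell
#!/bin/sh
# Sketch: read a setting from a config file, but let an environment
# variable (e.g. one injected by Docker's --link) override it.
get_setting() {
    key="$1" file="$2" override="$3"
    if [ -n "$override" ]; then
        printf '%s\n' "$override"
    else
        sed -n "s/^$key=//p" "$file"
    fi
}

# Usage:
#   db_host=$(get_setting db_host app.conf "$DB_PORT_5432_TCP_ADDR")
```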
I've been thinking along these lines recently, specifically service discovery for front-end load-balancers.
Most (all?) of the available reverse proxies will stop sending traffic to a server that is offline, but they won't discover new ones. There are solutions such as etcd which you can hook into, or you can write a toy application that uses UDP broadcasts to advertise "Hey, I'm http://dev.local.com/ on port 4444", but there isn't a lot beyond that.
Templating configuration files and running "haproxy reload" is a common enough middle-ground, but I've seen it fail often. (Specifically keepalived not reloading correctly and still sending traffic to old nodes.)
ObRelated: Varnish is a beast that few people can configure easily. I'd love to work on a caching reverse proxy that was simple, extensible, and fast.
> ObRelated: Varnish is a beast that few people can configure easily. I'd love to work on a caching reverse proxy that was simple, extensible, and fast.
It doesn't have as many caching-specific bells and whistles as Varnish, but nginx is an excellent reverse proxy with some caching abilities (and simple configuration).
A nice feature from the commercial nginx purge package is that it lets you purge by prefix. That's a feature that I've not seen in any of the open source purge modules.
If you are hosting data for several users in the same nginx cache and you want to purge only one user's entries, your only options are to scan the full cache on disk and delete the files whose key has your prefix, or fork out >$1K/year per nginx box for the commercial license.
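The scan-and-delete approach works because nginx writes each entry's cache key in plain text on a "KEY: ..." line inside the cache file. A sketch (the cache path and key layout are examples; run it against your actual proxy_cache_path):

```shell
#!/bin/sh
# Sketch: delete nginx cache entries whose KEY: line starts with a
# given prefix, by grepping the on-disk cache files.
purge_by_prefix() {
    cache_dir="$1" prefix="$2"
    grep -rl "^KEY: $prefix" "$cache_dir" | while read -r f; do
        rm -f "$f"
    done
}

# Example:
#   purge_by_prefix /var/cache/nginx "http://example.com/users/42/"
```

It's O(cache size) per purge, which is exactly the cost the commercial prefix-purge feature saves you.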
> Now your HAProxy or Varnish config is going to get bloated, and every time you deploy a container the config needs to be modified and reloaded. By this time you might be looking at Chef or Puppet for automating the config generation.
Varnish at least can route using DNS [0] - You do need a nameserver or two to handle the internal domain of course, but they're reasonably easy to set up using powerdns for example.
I think you can define the IP address assigned to a container via something like `-p 127.18.0.10:80:80`, if that helps with your HAProxy config (but that assumes your host machine isn't changing as well).
Definitely an interesting issue. Have you seen etcd from CoreOS? Useful for service discovery.
That doesn't assign an IP address to the container; it just determines which of the host's addresses are forwarded to the container. But you can certainly use that to keep a fixed IP, if you use a suitable script to update iptables. You can use it even if the host machine changes, as long as an IP address is only ever used by containers that will be on the same server: just add the IP on the new host, remove it from the old, and use arpsend -U -i [ip] [interface] to speed up the IP takeover. I use it fairly regularly to live-migrate services.