This skips over the hard part: managing Docker containers. Poking a hole directly to the container is a leaky abstraction. A reverse proxy like HAProxy or Varnish should be sitting in front of the container.
Once you have the reverse proxy set up, the next problem that arises is routing to containers based on the domain. Now your HAProxy or Varnish config is going to get bloated, and every time you deploy a container the config needs to be modified and reloaded. By this time you might be looking at Chef or Puppet for automating the config generation.
Chef and Puppet are not simple to learn, and they have their own set of quirks (like unreliable tooling support on Windows). I'm in the process of conquering this, but I hope one day there will be a simpler way.
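For concreteness, the per-domain HAProxy routing that accumulates looks roughly like this (hostnames, backend names, and container addresses here are made up); every newly deployed container means another acl/use_backend/backend stanza plus a reload:

```
frontend http-in
    bind *:80
    acl host_app1 hdr(host) -i app1.example.com
    acl host_app2 hdr(host) -i app2.example.com
    use_backend be_app1 if host_app1
    use_backend be_app2 if host_app2

backend be_app1
    server app1 172.17.0.5:8080 check

backend be_app2
    server app2 172.17.0.6:8080 check
```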
This is a great point. The initial Docker examples make everything seem easy, but we blew way past our estimated time in integrating docker into our workflow because of the points you mention. I am still happy with the choice to use docker though and our team will be better at server administration in the future.
One thing about this getting started guide is that it recommends the Phusion base image which boots init. That seems to go against the best practices outlined in a recent article by Michael Crosby - http://crosbymichael.com/dockerfile-best-practices-take-2.ht...
I'm one of the authors behind Phusion's baseimage.
Phusion's baseimage does not go against Michael Crosby's best practices. His best practices say not to boot init, by which he means a normal init system. He says that not because an init process is itself a bad idea, but because a normal init performs all kinds of work that is either unnecessary or harmful inside a Docker container. In fact, this is exactly what the Baseimage-docker documentation also states: don't use a normal init.
The Phusion baseimage does not contain a normal init, but a special init that is specifically designed for use in Docker.
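Using it is just a matter of basing your image on it and keeping its init as the entry point; a minimal Dockerfile might look like this (the tag choice is up to you):

```dockerfile
# Phusion's baseimage ships /sbin/my_init, a small init that reaps
# zombie processes and runs services under runit; it is not a full
# System V init.
FROM phusion/baseimage
CMD ["/sbin/my_init"]
```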
Nice, hadn't read those. Thanks! I was wondering about that, but I still need a solution for logging, cron jobs, and similar (perhaps running those on the host machine is the answer).
I am still finding good solutions for those too, and trying to add some concepts to my toolbox like orchestration, service discovery, proxies, data containers, ambassador containers, and so on. It's hard for me to wrap my head around the different recommended ways to use docker compared to my initial expectations.
To be clear - I think FreeBSD is an awesome project and the jails look solid, but it is really the Docker community and exploding Docker ecosystem that makes Docker appealing to me. For example, Red Hat embracing Docker means that I can build Docker containers for enterprises that embrace CentOS and Red Hat. None of the corporations or organizations I have worked with use or support FreeBSD, but they all support CentOS. Reference: http://www.infoworld.com/t/application-virtualization/red-ha...
Update etcd with connection details on container start/stop. Then use a script to watch the appropriate directory in etcd for changes and regenerate the config.
Then on the haproxy/varnish box (or put them in a container), put something that does "etcdctl exec-watch /services/website -- updateconfig.sh", where updateconfig.sh would be a script to watch for changes and regenerate the config / reload.
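A minimal sketch of what updateconfig.sh might do. This assumes each service registers a "name addr:port" value under a key like /services/website; the etcd schema, file paths, and key names are made up for illustration:

```shell
#!/bin/sh
# Sketch of updateconfig.sh: turn "name addr:port" lines into HAProxy
# "server" lines. Where the lines come from depends on your etcd schema.
render_backends() {
    awk '{ print "    server " $1 " " $2 " check" }'
}

# In real use, something along these lines:
#   etcdctl ls /services/website | xargs -n1 etcdctl get \
#       | render_backends > /etc/haproxy/backends.cfg
#   haproxy -f /etc/haproxy/haproxy.cfg -sf "$(cat /var/run/haproxy.pid)"
```

The -sf flag makes haproxy finish existing connections on the old process while the new one takes over, so the reload is graceful.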
I don't see how your config will get "bloated" any more than it would otherwise - presumably your number of domains won't increase.
This is what DNS is for. If you specify your backends by hostname instead of IP, then each time the load-balancer tries to connect to the backend, it'll get a list of A records from its DNS resolver and pick one in a round-robin fashion. Thus, if you have a dynamic DNS server that queries your presence service, it can return exactly the hosts that are up right now as the round-robin set.
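As a sketch of the mechanism (assuming a Linux box with getent; the service name web.internal is made up), resolving the name each time is what keeps the backend set fresh:

```shell
#!/bin/sh
# Sketch: resolve a service name to its current IPv4 A records.
# A dynamic DNS server backed by a presence service would return
# only the hosts that are up right now.
resolve_backends() {
    getent ahostsv4 "$1" | awk '{ print $1 }' | sort -u
}

# A load balancer would re-resolve before connecting and rotate
# through the returned list, e.g.:
#   resolve_backends web.internal
```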
Right now, if you use SkyDNS[1] as your DNS server, and attach Skydock to the Docker host, this all Just Works. Most people want to use etcd instead of Skydock, though, so support for that is coming soon[1] too.
SkyDNS looks interesting, but it doesn't appear to do any health checks on the endpoint. I don't want clients to receive an answer to an A record query that contains the IP address of an endpoint that is down.
In order to be robust in the event of a network partition, the client should perform the health check itself. This can be done in a background thread; it doesn't have to be synchronous with the DNS lookup (and that would be very bad for performance anyway).
Doesn't that mean every TCP connection would have to query DNS at the start? What about the client's DNS cache? Also, isn't that expensive in terms of connection time?
> Poking a hole directly to the container is a leaky abstraction. A reverse proxy like HAProxy or Varnish should be sitting in front of the container.
It might be a stupid question but I wonder what's considered a leaky abstraction in this case.
By the way, I'm not sure I fully understand your concerns over reverse proxy routing, but I recall that Ambassador pattern linking[0] is a suggested way of tying Docker containers over network. Also, these slides by dotCloud[1] may be helpful as well (I'm not sure if approaches described are up-to-date, though).
>It might be a stupid question but I wonder what's considered a leaky abstraction in this case.
I consider poking a hole a leaky abstraction because you are exposing the internals of your stack. The consumer should not know or care that you are using Docker containers to serve the application. From a security perspective, directly exposing a container may lead to potential exploits of Docker itself.
Exposing a container would not expose potential exploits of Docker. Docker is not running in the container (and the container doesn't know anything about Docker).
Exposing a container is a hell of a lot safer than exposing a service on the host OS.
Also, I'm not sure I follow how a consumer would know whether you are using Docker or not.
I'm probably just going to show my ignorance here, but why doesn't container linking solve this problem?
Could you not run multiple Docker containers/services behind a single nginx or Apache container on a production server? The nginx container basically gets one of your public IP addresses, and you link it to the other containers to provide it with knowledge of the other running processes' IP addresses (each within their own container, of course). That way you have one public-facing container which has knowledge of the other containers and can use the information provided through -link to configure the nginx server to route requests appropriately. This requires a bit of bash script / sed command-line hackery to update your nginx configuration to accommodate the changing IP addresses of the other containers on restart (unless you can set them by hand now using Docker; we still don't), but once you get it set up you never have to think about it again.
Like I said, maybe I'm just showing my ignorance, but something like the above scenario is how we get around hosting multiple services with limited public IP addresses available.
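The sed/bash hackery in question can be fairly small. A sketch, assuming a container linked with --link web:web (Docker injects WEB_PORT_8080_TCP_ADDR and WEB_PORT_8080_TCP_PORT for that link; the nginx paths here are illustrative):

```shell
#!/bin/sh
# Sketch: emit an nginx upstream block from the address/port that
# Docker's --link option injects as environment variables.
render_upstream() {
    name="$1" addr="$2" port="$3"
    printf 'upstream %s {\n    server %s:%s;\n}\n' "$name" "$addr" "$port"
}

# In the nginx container's startup script, e.g. after --link web:web:
#   render_upstream web "$WEB_PORT_8080_TCP_ADDR" "$WEB_PORT_8080_TCP_PORT" \
#       > /etc/nginx/conf.d/upstreams.conf
#   exec nginx -g 'daemon off;'
```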
But even without using fleet, the overall mechanism is fairly easy to adapt: either use systemd dependencies like in their example, or have a script that queries Docker on each host to spot changes in running containers and updates an etcd instance (or whatever your preferred config server is).
The "extra complexity" needed for multi-machine setups over linking containers is actually pretty minor.
Your services need to read something to get the IP, which ultimately comes from an ENV variable. In the linked container scenario Docker sets that variable. Otherwise you set it manually. That's the only extra complexity.
I was worried about this too, so I tried it out[1]. In this case I have a YAML config file, which can be overridden by ENV variables (which may come from Docker).
This isn't as automatic as CoreOS (eg, no failover etc), but it is a lot less complex.
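The file-plus-ENV-override idea is simple to implement. A sketch in shell, using flat key=value lines rather than YAML to keep it short; the key names and the DB_PORT_5432_TCP_ADDR variable (the kind Docker injects for a linked database) are illustrative:

```shell
#!/bin/sh
# Sketch: read a setting from a config file, but let an environment
# variable (e.g. one injected by Docker's --link) override it.
get_setting() {
    key="$1" file="$2" override="$3"
    if [ -n "$override" ]; then
        printf '%s\n' "$override"
    else
        sed -n "s/^$key=//p" "$file"
    fi
}

# Usage:
#   db_host=$(get_setting db_host app.conf "$DB_PORT_5432_TCP_ADDR")
```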
I've been thinking along these lines recently, specifically service discovery for front-end load-balancers.
Most (all?) of the available reverse proxies will stop sending traffic to a server that is offline, but they won't discover new ones. There are solutions such as etcd which you can hook into, or you can write a toy application that uses UDP broadcasts to advertise "Hey, I'm http://dev.local.com/ on port 4444", but there isn't a lot beyond that.
Templating configuration files and running "haproxy reload" is a common enough middle-ground, but I've seen it fail often. (Specifically keepalived not reloading correctly and still sending traffic to old nodes.)
ObRelated: Varnish is a beast that few people can configure easily. I'd love to work on a caching reverse proxy that was simple, extensible, and fast.
> ObRelated: Varnish is a beast that few people can configure easily. I'd love to work on a caching reverse proxy that was simple, extensible, and fast.
It doesn't have as many caching-specific bells and whistles as Varnish, but nginx is an excellent reverse proxy with some caching abilities (and simple configuration).
A nice feature from the commercial nginx purge package is that it lets you purge by prefix. That's a feature that I've not seen in any of the open source purge modules.
If you are hosting data for several users in the same nginx cache and you want to purge only one user's entries, your only options are to scan the full cache on disk and delete the files whose key has your prefix, or fork out >$1K/year per nginx box for the commercial license.
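The scan-and-delete approach works because nginx writes each entry's cache key in plain text on a "KEY: ..." line inside the cache file. A sketch (the cache path and key layout are examples; run it against your actual proxy_cache_path):

```shell
#!/bin/sh
# Sketch: delete nginx cache entries whose KEY: line starts with a
# given prefix, by grepping the on-disk cache files.
purge_by_prefix() {
    cache_dir="$1" prefix="$2"
    grep -rl "^KEY: $prefix" "$cache_dir" | while read -r f; do
        rm -f "$f"
    done
}

# Example:
#   purge_by_prefix /var/cache/nginx "http://example.com/users/42/"
```

It's O(cache size) per purge, which is exactly the cost the commercial prefix-purge feature saves you.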
> Now your HAProxy or Varnish config is going to get bloated, and every time you deploy a container the config needs to be modified and reloaded. By this time you might be looking at Chef or Puppet for automating the config generation.
Varnish at least can route using DNS [0] - You do need a nameserver or two to handle the internal domain of course, but they're reasonably easy to set up using powerdns for example.
I think you can define the IP address assigned to a container via something like `-p 127.18.0.10:80:80`, if that helps with your HAProxy config (but that assumes your host machine isn't changing as well).
Definitely an interesting issue. Have you seen etcd from CoreOS? Useful for service discovery.
That doesn't assign an IP address to the container; it just determines which of the host's addresses are forwarded to the container. But you can certainly use that to keep a fixed IP, if you use a suitable script to update iptables. You can use it even if the host machine changes, as long as an IP address is only ever used by containers that will be on the same server: just add the IP on the new host, remove it from the old, and use arpsend -U -i [ip] [interface] to speed up the IP takeover. I use it fairly regularly to live-migrate services.