"Don't put secret keys in your repository" is also the wrong lesson.
The right lesson is: Know where your secret keys are and take the appropriate steps to secure them. Whether that's in the codebase, a properties/ini/conf/whatever file, environment variables, whatever - know where they are and make sure you understand possible threats against them.
This story could just as easily have been written about how easy it is to download ALL_THE_SECRETS.txt. Don't feel smugly secure just because you don't store passwords in git.
Putting keys in a text file doesn't fit the narrative of a generally-careful user forgetting about side effects and metadata.
It's important to know where your keys are, but it's also important to not store your keys in certain ways that are easily overlooked.
A lesson of "don't put secret keys inside the web root" is also useful.
But a lesson of "know where your keys are and secure them" is a bit too short-sighted. You don't just want them to be secure right now, you want the mechanisms keeping them secure to be mistake-resistant.
Don't put them in the code, even if you promise to be super careful.
My approach to security, when discussing things with our engineers:
1. Make a list of everything that absolutely positively cannot live without this data/access/permissions/etc.
2. Put the data somewhere where absolutely nothing whatsoever can ever read it (except root).
3. Figure out what one single change will resolve #2 so that the things in #1 can happen without any other things gaining access.
If you don't do #1, you don't understand your requirements/applications. If you don't do #2, then your data is probably vulnerable through some other mechanism. If you can't do #3 then you probably need to change something else (e.g. stop running all processes as the same user, stop running all services on the same box, stop trusting users, set up more granular sudoers rules, etc).
What I find is that when you come up with an idea for #3, and then come up with a list of side effects, you can actually find a lot of the kinds of issues I mentioned above, for example where the public website CMS (as 'daemon') and the accounting backend (as 'daemon') both have access to the same resources, and thus someone gaining access to the CMS can get the accounting DB user/pass and get access to your transaction records, user database, etc.
No, there are an infinity of things that shouldn't have access to your secret keys. So you have to take a default deny approach, and ensure that only things that positively should have access to your secret keys do.
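That default-deny stance can be made concrete at the filesystem level. A minimal Python sketch (the path and helper names are made up for illustration) that writes a secret file readable only by its owner, and a check that fails if anything else has been granted access:

```python
import os
import stat

def write_secret(path, data):
    # Create the file with owner-only permissions from the start;
    # O_EXCL refuses to clobber an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

def check_secret_perms(path):
    # Default deny: fail if group or other have any access bits set.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

Access for the one thing that positively needs it is then granted deliberately (ownership, group membership, sudoers), rather than by loosening the mode for everyone.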
This system is truly beautiful; I think it's one of the best suggestions in the thread (definitely the best self-rolled solution not using other tools).
I have a question about redundancy, or "What happens if your gatekeeper EC2 instance goes down?" If you have multiple gatekeepers, could they be set up this way:
- let's say you have five different web apps using a gatekeeper to hold their secrets
- let's say you have n gatekeepers (let's say 3) and each of the apps knows the address of all three gatekeepers.
- If the primary gatekeeper is unreachable, all five apps would try to contact the secondary gatekeeper, but the secondary would only ever respond if it, too, had found the primary unreachable.
It's like a sleeper cell - at any given moment you have multiple replacement gatekeepers ready and waiting to serve, but each of them is unable to respond unless the one above it in the list stops responding. In this way you could lose gatekeepers (even permanently) and build a little bit of resilience into the apps depending on it while you're able to sort out what happened and restore normal behaviour.
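The client side of that priority-ordered failover could be sketched roughly like this (the endpoint addresses and the fetch interface are made up for illustration; the server-side "only answer if the one above me is down" check is left to each standby gatekeeper):

```python
import urllib.request

# Ordered list of gatekeeper endpoints; earlier entries take priority.
GATEKEEPERS = [
    "https://gk1.internal/secrets",   # hypothetical addresses
    "https://gk2.internal/secrets",
    "https://gk3.internal/secrets",
]

def fetch_secrets(fetch=None, gatekeepers=GATEKEEPERS):
    """Try each gatekeeper in priority order, returning the first answer.

    A standby gatekeeper is expected to refuse to answer (error/timeout)
    unless it has itself observed the one above it to be unreachable.
    """
    if fetch is None:
        fetch = lambda url: urllib.request.urlopen(url, timeout=2).read()
    last_err = None
    for url in gatekeepers:
        try:
            return fetch(url)
        except Exception as err:
            last_err = err
    raise RuntimeError("no gatekeeper responded") from last_err
```

Since every app walks the same ordered list, they all converge on the same surviving gatekeeper without any extra coordination.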
I'm confused by your reference to "gatekeeper EC2 instances." In the scenario I described, the secrets are housed on S3, not a separate EC2 instance. So, theoretically, as long as the underlying instance running the application code can access S3, and S3 doesn't go down (very unlikely), there shouldn't be any issues.
We (Shopify) use https://github.com/Shopify/ejson -- we store encrypted secrets in the repository, relying on the production server to have the decryption key.
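For reference, an ejson file is ordinary JSON with a `_public_key` entry, and `ejson encrypt` rewrites the secret values in place into an `EJ[...]` envelope; a sketch from memory of the project's README (verify details against the repo; the values below are illustrative placeholders):

```json
{
  "_public_key": "<64-char hex key from `ejson keygen`>",
  "database_password": "EJ[1:...]"
}
```

The matching private key lives on the production host (by default under `/opt/ejson/keys`), so the encrypted file is safe to commit.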
It's relatively common to provision secrets with configuration management software like Chef/Puppet/Ansible/etc., using, e.g., Chef's encrypted data bags.
Another slightly heavier-weight solution with some nice properties is to use a credential broker such as Vault: https://www.vaultproject.io/
Environment variables are the best and easiest way that I know of. You can supply those any way you want to, and any programming language can easily get their values.
Glad I don't work at your shop then. Environment variables are a terrible way to give your app secure information. There are well over a dozen reasons why you shouldn't do this in your apps, but one super obvious one is that way too many frameworks expose environment variables in their debug output if not properly configured. Think you'll never misconfigure a server? Guess again: pretty much every major site (Google, FB, Twitter, Yahoo, eBay, Microsoft, etc.) has done it at some point.
Fair point, I potentially should've left off the first sentence. I stand behind the rest of the post, but the first sentence is a bit on the edge and I apologize.
Alright, well, I've never seen an application/framework spit out environment variables when it was misconfigured. But then again, I barely work with web-related stuff so maybe I just don't use the kind of software that does this. Could you provide some examples?
The "dump environment" problem is an issue for novice developers, but mature shops should have security-conscious frameworks for secrets handling that do things like clear the variable from the environment at initialization time.
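A minimal sketch of that read-and-clear pattern (the variable name is made up; note the honest caveat in the comment about what this does and doesn't protect against):

```python
import os

def take_secret(name):
    """Read a secret from the environment and remove it immediately,
    so later debug dumps of os.environ won't include it.

    Note: this does NOT scrub /proc/<pid>/environ, which reflects the
    environment as it was at exec time.
    """
    value = os.environ.pop(name, None)
    if value is None:
        raise RuntimeError(f"missing required secret: {name}")
    return value
```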
If an attacker can get the process that's running webapp.py to exec some arbitrary bash command, that process has the ability to read its own /proc/$PID/environ . In general, you can read /proc/$PID/environ on processes that you own. At least I can do that on my Debian system:
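For example, a minimal Python version of that read (Linux-only):

```python
import os

def read_proc_environ(pid="self"):
    # /proc/<pid>/environ is NUL-separated KEY=VALUE pairs, frozen at
    # exec time; readable for processes you own (absent hardening).
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    return [entry for entry in raw.split(b"\0") if entry]
```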
(I actually gave the wrong example in my previous comment. While it is true that giving the ENV on cmdline will show up in ps eaux, the more appropriate example is what I just explained in this comment.)
If you can get it to exec some arbitrary bash command (or otherwise access the environ of a process) you can also have it cat any file on the server, and even the memory of the running processes that belong to the same user as the exploited process, and also execute network requests. So if you get that far, pretty much nothing will protect you.
Sure, but there are some shops that do their security from a point-of-view of "Attacker can run commands on your server as the user that started whatever-public-service/webapp/api", and go from there. I happen to think that's the best way to think about it.
Now, if an attacker manages to get root access then it's game over[1]. That just shouldn't happen. But nobody should be running their webserver as root. So whatever that user is should be low-powered, with only enough privileges to start the webserver & bind port 8080 (and use iptables or whatever to reroute connections from port 80 to 8080), and the whole setup should be designed so that this account can't escalate things further if someone got a bash shell on it.
______
1. You should at least have some way of detecting that it happened and consider all data & files compromised and just wipe the whole machine & start over. Or take that machine offline for investigation into what happened and put a fresh new one in its place.
If an attacker can run an arbitrary command on your server, it's already time to rotate all the credentials in your system and let any data subjects whose data you hold know that you fucked up, big time. That's just the Linux model.
I agree - I was just explaining the issue the above commenter raised. It just means you should use a saner way of initializing your environment with sensitive values.
My preferred solution currently is to use encrypted strings in config files that are not stored in VCS. The host machine encrypts and decrypts using host-specific keys, so if the file is copied off-server it is not fully compromised immediately. This is usually via a Python script which rewrites the file. (BTW, pretty easy to do on Windows boxes with the MS API.) I've considered using encrypted folders on Windows in addition, but not sure if that really makes a difference.
Usually the base config is in VCS but without user/password/db strings. We then manually configure the file with the encrypted strings on the server (usually with the machine name in the filename, so that we can use the hostname in code to find it, and it makes clear the file is machine-specific). Not all tools make this easy though, and it only works if you can add your own code in between. I also prefer files to environment variables, as the files can be locked down more easily in my opinion, and it's more obvious what is going on.
I like some of the other solutions that are using encrypted strings but with a keystore server and may consider for the future if they support both windows and linux.
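A rough sketch of that rewrite-the-file workflow, with the actual cipher left pluggable. The base64 stand-in below is NOT encryption; it only keeps the sketch runnable anywhere. Swap in a real host-keyed cipher (e.g. the `cryptography` library's Fernet, or DPAPI on Windows). Filenames and the `enc:` prefix are made up for illustration:

```python
import base64
import configparser

# Stand-in transforms; replace with real host-keyed encryption.
def encrypt(plaintext):
    return base64.b64encode(plaintext.encode()).decode()

def decrypt(ciphertext):
    return base64.b64decode(ciphertext.encode()).decode()

SECRET_KEYS = {"password"}  # which config keys get encrypted in place

def rewrite_encrypted(path):
    """Rewrite an .ini-style config, encrypting any plaintext secrets."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    for section in cfg.sections():
        for key in cfg[section]:
            value = cfg[section][key]
            if key in SECRET_KEYS and not value.startswith("enc:"):
                cfg[section][key] = "enc:" + encrypt(value)
    with open(path, "w") as f:
        cfg.write(f)

def read_secret(path, section, key):
    cfg = configparser.ConfigParser()
    cfg.read(path)
    value = cfg[section][key]
    return decrypt(value[4:]) if value.startswith("enc:") else value
```

Because the rewrite skips already-encrypted values, the script is idempotent and can be re-run safely after the file is edited by hand on the server.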
Anything stored in /private/ is not publicly accessible by the web server process, but can be read or written by anything running under the user's username.
It's specifically for storing things like configuration files.
I only just recently had to figure that out. I opted for setting up a .kdb KeePass file in a private git repo and giving everyone ("everyone" = myself + one other) access to that. I'm pretty sure that's not a very good solution.
Config files that are not version controlled, or environment variables. I prefer config files because it's easier for me to communicate to other team members what needs to be present in their local development environment.
I typically handle this by versioning a `config.example` file, which includes all the necessary config keys an application expects. The example file defaults these attrs to various strings meant to show they are examples only. I include instructions to copy the `config.example` to a `config.yml` (or some other appropriate extension) and replace the values as necessary. The `config.yml` file is specifically excluded in the `.gitignore` file. The application will only load the `config.yml` file when started, so I also make sure to raise a descriptive error informing team members when they are missing a local `config.yml`.
This allows the `config.example` to also serve as a self-documenting config for the application, as comments can be included that identify and explain each of the config keys and their purposes.
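A minimal loader in that spirit (sketch only; it uses JSON instead of YAML so it needs nothing outside the standard library, and the filenames are illustrative):

```python
import json
import os

CONFIG_FILE = "config.json"       # gitignored, real values
EXAMPLE_FILE = "config.example"   # versioned, placeholder values

def load_config(directory="."):
    path = os.path.join(directory, CONFIG_FILE)
    if not os.path.exists(path):
        # Descriptive error so a new team member knows what to do.
        raise RuntimeError(
            f"{CONFIG_FILE} not found. Copy {EXAMPLE_FILE} to "
            f"{CONFIG_FILE} and fill in real values."
        )
    with open(path) as f:
        return json.load(f)
```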
I store dummy values in VC, then edit the real data on the production server. (And I obviously never check anything in from production; it helps if you can set the production VC user to read-only.) This has a nice side effect that if I edit the configuration file, the new stuff gets merged in without causing a mess.
Another way is a second file that overrides settings as needed. Although I have found that to be less maintainable if the configuration file changes. That file should be somewhere entirely out of the VC tree.
Either way, the file must be placed in a directory that is not served by the web server.
/include and /public are traditional. Only /public is exposed by the web server.
For me, I keep connection credentials for a configuration database in environment variables. The config library then connects to the configuration server with those credentials and gets everything that application needs to connect to other services. I'd considered using etcd for this, but it was unstable for me at that time. I keep settings cached for 5 minutes, then the library re-fetches them in case they changed.
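That five-minute refresh could look roughly like this (a sketch; the fetch callable standing in for the config-service client is an assumption):

```python
import time

class CachedConfig:
    """Fetch settings via `fetch()` and cache them for `ttl` seconds."""

    def __init__(self, fetch, ttl=300):
        self._fetch = fetch
        self._ttl = ttl
        self._cached = None
        self._fetched_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._cached is None or now - self._fetched_at >= self._ttl:
            self._cached = self._fetch()   # re-read from the config service
            self._fetched_at = now
        return self._cached
```

A bounded TTL like this trades a few minutes of staleness for not hammering the config service on every request.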
In a configuration file that is not version controlled, or even environment variables, so that your application starts with the right variables, but they are not in some config file.
As I detailed in my other response to your original question, use an example config file that is version controlled. It includes all the necessary config keys, but example-only values. All team members would then be able to easily create a local config file based on the example that works. You can even document the config with comments in the example file so devs know what is needed and what it's for.
I think at some point, if you have a shared password for a development DB, production DB, etc., then just keeping those in a pen-and-paper notebook is your best solution. Usually, for shared environments such as that (although I hope the team can set up their own DBs for development!), the number of shared "secrets" is relatively small. Some secrets are best not stored electronically, especially if they can give away user data.
Don't put secret keys in your repository.
Someone getting a copy of your code should be a big annoyance at worst.