Imagine you implement every type of possible security...
Keeping your entire server-stack up-to-date, making sure you have SSL, using strong encryption for logging-in, hashing the passwords, making sure your server can only be reached via SSH, adding firewalls, filters, etc. etc.
Then some hacker in Eastern Europe comes along (or some beginner at the NSA/GCHQ) and finds out that your .git is exposed and somehow gains all vital user-data and admin data.
Being bashed with a boulder repeatedly would probably be less painful than the torture of knowing "I did it all, but they got me with an HTTP request... because nobody thought of double-checking what our VCS is doing".
How many other glaringly obvious mistakes might be out there right now? I can only imagine.
You seem to imply this is a novel attack vector. But it's really just an instance of a very old mistake:
Don't use the root of your app as document root!
It's really as simple as that. Almost all modern apps have a subdirectory "public/" or similar. That one is meant to be used as document root. You only have to ensure there are no sensitive files in there.
If you fail to introduce such a directory, you'll have a game of cat-and-mouse, where you have to add extra webserver rules for each sensitive file: VCS, crypto secrets, private keys, and so on. In that setup it's easy and very likely to forget one. Of course, this then creates the feeling of "How can anybody keep track of this never ending list of security details?"
In the end, this is a blacklist vs. whitelist thing. Like with your firewall, you want one rule that blocks everything and allows only specific stuff. The alternative is to allow everything, have rules to deny all the sensitive stuff, and finally get in trouble for having forgotten one rule (e.g. because an additional service was introduced after the firewall rules were written).
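To make the blacklist/whitelist difference concrete, here is a minimal Python sketch. The file names, the `DENY_PATTERNS` list and the `public/` prefix are all made up for illustration. The only point is that the denylist happily serves a sensitive file nobody wrote a rule for, while the allowlist does not.

```python
from fnmatch import fnmatch

# Hypothetical deny rules an admin remembered to write.
DENY_PATTERNS = [".git/*", "*.pem", "config/*"]

# Hypothetical allow rule: only paths under public/ are served.
PUBLIC_PREFIX = "public/"


def served_by_denylist(path: str) -> bool:
    """Serve everything except what matches a deny rule."""
    return not any(fnmatch(path, pattern) for pattern in DENY_PATTERNS)


def served_by_allowlist(path: str) -> bool:
    """Serve nothing except what lives under the public prefix."""
    return path.startswith(PUBLIC_PREFIX)


# backup.sql is the file nobody thought of when writing the deny rules.
for path in ["public/index.php", ".git/config", "backup.sql"]:
    print(path, served_by_denylist(path), served_by_allowlist(path))
```

With the denylist, `backup.sql` is served because no rule mentions it; with the allowlist it is refused without anyone having to think of it.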
It takes very little effort in PHP. You simply have your index.php and any public assets in the document root and then use index.php as a bootstrap to bring up your application. Everything else goes outside of the document root.
Yet, almost all well-known PHP applications don't have a public directory. And almost all non-PHP web applications do have a public directory (e.g. practically all Python/Django and Ruby/Rails projects).
So the connection to PHP is still apparent today, although it may have more to do with crappy super-cheap hosting providers than with the language itself.
Yes, it is easily avoided. But it is still violated by lots of PHP projects, including well-known projects with large userbases:
* WordPress
* Tiki
* ... and so on.
I don't think this is by accident: This technique is too old and well-known to be ignored by large projects. Rather, they consciously don't do that to cause less hassle for the occasional admin: Those can simply dump it into some directory and don't have to set up a proper docroot or anything. The price is extra work for those who want a more secure setup.
This is clearly a usability versus security issue, resolved in the unfortunate, usual way.
I don't have insight into the decision-making at WordPress, but they probably do this because they want to be compatible with as many web hosts as possible, and so many super-cheap shared hosts just give you a public directory that you're supposed to dump everything in.
It's possible to set it up properly on most of these hosts, but it's much more difficult, and if you ever have issues the support team says you're using a "non-supported configuration." At least WordPress lets you move your config file one level above the webroot and will find it automatically.
Look at any larger Python (Django/Flask/...) or Ruby (Rails/...) project, and you will always find a proper "public" directory, although it may be named differently.
Does it? If you're on a shared host with just the one directory, maybe. But if you are configuring your own server, you still point it to htdocs and keep the config below it.
"Don't put secret keys in your repository" is also the wrong lesson.
The right lesson is: Know where your secret keys are and take the appropriate steps to secure them. Whether that's in the codebase, a properties/ini/conf/whatever file, environment variables, whatever - know where they are and make sure you understand possible threats against them.
This story could just as easily have been written about how easy it is to download ALL_THE_SECRETS.txt. Don't feel smugly secure just because you don't store passwords in git.
Putting keys in a text file doesn't fit the narrative of a generally-careful user forgetting about side effects and metadata.
It's important to know where your keys are, but it's also important to not store your keys in certain ways that are easily overlooked.
A lesson of "don't put secret keys inside the web root" is also useful.
But a lesson of "know where your keys are and secure them" is a bit too short-sighted. You don't just want them to be secure right now, you want the mechanisms keeping them secure to be mistake-resistant.
Don't put them in the code, even if you promise to be super careful.
My approach to security, when discussing things with our engineers:
1. Make a list of everything that absolutely positively cannot live without this data/access/permissions/etc.
2. Put the data somewhere where absolutely nothing whatsoever can ever read it (except root).
3. Figure out what one single change will resolve #2 so that the things in #1 can happen without any other things gaining access.
If you don't do #1, you don't understand your requirements/applications. If you don't do #2, then your data is probably vulnerable through some other mechanism. If you can't do #3 then you probably need to change something else (e.g. stop running all processes as the same user, stop running all services on the same box, stop trusting users, set up more granular sudoers rules, etc).
What I find is that when you come up with an idea for #3, and then come up with a list of side effects, you can actually find a lot of the kinds of issues I mentioned above, for example where the public website CMS (as 'daemon') and the accounting backend (as 'daemon') both have access to the same resources, and thus someone gaining access to the CMS can get the accounting DB user/pass and get access to your transaction records, user database, etc.
No, there are an infinity of things that shouldn't have access to your secret keys. So you have to take a default deny approach, and ensure that only things that positively should have access to your secret keys do.
This system is truly beautiful, I think one of the best suggestions in the thread (definitely the best self-rolled solution not using other tools)
I have a question about redundancy, or "What happens if your gatekeeper EC2 instance goes down?" If you have multiple gatekeepers, could they be set up this way:
- let's say you have five different web apps using a gatekeeper to hold their secrets
- let's say you have n gatekeepers (let's say 3) and each of the apps knows the address of all three gatekeepers.
- If the primary gatekeeper is unreachable, all five apps would try to contact the secondary gatekeeper, but that gatekeeper would only (ever) respond in the event that the secondary gatekeeper also found the primary gatekeeper unreachable.
It's like a sleeper cell - at any given moment you have multiple replacement gatekeepers ready and waiting to serve, but each of them is unable to respond unless the one above it in the list stops responding. In this way you could lose gatekeepers (even permanently) and build a little bit of resilience into the apps depending on it while you're able to sort out what happened and restore normal behaviour.
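A toy Python simulation of that "sleeper cell" ordering, just to check the logic hangs together. Everything here is hypothetical: the `Gatekeeper` class, the `up` flag standing in for a real network health check, and the fake secret values. A standby refuses to answer while anything above it in the list is still reachable, and the app simply walks the list in order.

```python
class Gatekeeper:
    """Hypothetical secret server; `up` simulates reachability."""

    def __init__(self, name, higher_priority=()):
        self.name = name
        self.up = True
        self.higher_priority = list(higher_priority)

    def get_secret(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} unreachable")
        # A standby answers only if every gatekeeper above it is down.
        # In a real setup this would be a health check over the network.
        if any(g.up for g in self.higher_priority):
            raise PermissionError(f"{self.name} is on standby")
        return f"secret-for-{key}"


def fetch_secret(gatekeepers, key):
    """Try each gatekeeper in priority order until one answers."""
    for gk in gatekeepers:
        try:
            return gk.name, gk.get_secret(key)
        except (ConnectionError, PermissionError):
            continue
    raise RuntimeError("no gatekeeper available")


primary = Gatekeeper("gk1")
secondary = Gatekeeper("gk2", [primary])
tertiary = Gatekeeper("gk3", [primary, secondary])
chain = [primary, secondary, tertiary]

print(fetch_secret(chain, "db"))  # all up: the primary answers
primary.up = False
print(fetch_secret(chain, "db"))  # primary down: the secondary takes over
```

One thing the sketch glosses over: the standby and the app can disagree about whether the primary is reachable (a network partition), so a real version would need to decide what happens in that case.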
I'm confused by your reference to "gatekeeper EC2 instances." In the scenario I described, the secrets are housed on S3, not a separate EC2 instance. So, theoretically, as long as the underlying instance running the application code can access S3, and S3 doesn't go down (very unlikely), there shouldn't be any issues.
We (Shopify) use https://github.com/Shopify/ejson -- we store encrypted secrets in the repository, relying on the production server to have the decryption key.
It's relatively common to provision secrets with configuration management software like Chef/puppet/ansible/etc using, e.g. Chef's encrypted data bags.
Another slightly heavier-weight solution with some nice properties is to use a credential broker such as Vault: https://www.vaultproject.io/
Environment variables are the best and easiest way that I know of. You can supply those any way you want to, and any programming language can easily get their values.
Glad I don't work at your shop then. Environment variables are a terrible way to give your app secure information. There's well over a dozen reasons why you shouldn't do this in your apps, but one super obvious one is that there are way too many frameworks that expose environment variables in their debug output if not properly configured. Think you'll never misconfigure a server? Guess again, pretty much every major site (Google, FB, Twitter, Yahoo, eBay, Microsoft, etc.) has done it at some point.
Fair point, I potentially should've left off the first sentence. I stand behind the rest of the post, but the first sentence is a bit on the edge and I apologize.
Alright, well, I've never seen an application/framework spit out environment variables when it was misconfigured. But then again, I barely work with web-related stuff so maybe I just don't use the kind of software that does this. Could you provide some examples?
The "dump environment" problem is an issue for novice developers, but mature shops should have security-conscious frameworks for secrets handling that do things like clear the variable from the environment at initialization time.
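A minimal Python sketch of "clear the variable at initialization time". The helper name `take_secret` and the variable name `APP_DB_PASSWORD` are invented for the example; this is not how any particular framework does it.

```python
import os


def take_secret(name: str) -> str:
    """Read a secret from the environment and remove it right away."""
    try:
        # pop() also unsets the variable for child processes started later.
        return os.environ.pop(name)
    except KeyError:
        raise RuntimeError(f"required secret {name} is not set") from None


# Simulate the deployment having set the variable (hypothetical name).
os.environ["APP_DB_PASSWORD"] = "example-only"

db_password = take_secret("APP_DB_PASSWORD")
print("APP_DB_PASSWORD" in os.environ)
```

After this, a debug page that dumps `os.environ` no longer shows the secret. As far as I know it does not scrub what Linux recorded in /proc/<pid>/environ when the process started, so it helps against accidental dumps more than against someone who can already read that file.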
If an attacker can get the process that's running webapp.py to exec some arbitrary bash command, that process has the ability to read its own /proc/$PID/environ. In general, you can read /proc/$PID/environ on processes that you own. At least I can do that on my Debian system:
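For anyone who wants to check this on their own box, here is a rough Python version of that experiment. It is only a sketch and assumes a Linux-style /proc; the helper name `read_own_environ` is made up. It reads the NUL-separated entries from /proc/self/environ, which is what any code running inside that process could read too.

```python
import os


def read_own_environ(pid="self"):
    """Return the environment recorded in /proc/<pid>/environ as a dict."""
    path = f"/proc/{pid}/environ"
    try:
        with open(path, "rb") as fh:
            raw = fh.read()
    except OSError:
        return {}  # not Linux, /proc not mounted, or not our process
    env = {}
    # Entries are NUL-separated KEY=VALUE byte strings.
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


env = read_own_environ()
print(sorted(env)[:5])  # a few variable names only, not the values
```

Passing another PID you own instead of `"self"` should work the same way on a default setup, which is the part that matters for the argument.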
(I actually gave the wrong example in my previous comment. While it is true that giving the ENV on cmdline will show up in ps eaux, the more appropriate example is what I just explained in this comment.)
If you can get it to exec some arbitrary bash command (or otherwise access the environ of a process) you can also have it cat any file on the server, and even the memory of the running processes that belong to the same user as the exploited process, and also execute network requests. So if you get that far, pretty much nothing will protect you.
Sure, but there are some shops that do their security from a point-of-view of "Attacker can run commands on your server as the user that started whatever-public-service/webapp/api", and go from there. I happen to think that's the best way to think about it.
Now, if an attacker manages to get root access then it's game over[1]. That just shouldn't happen. But nobody should be running their webserver as root. So, whatever that user is should be low-powered with only enough privileges to start the webserver & bind port 8080 (and use iptables or whatever to reroute connections to port 80 --> 8080) and the whole setup should be designed so that this account won't be able to escalate things further if someone got a bash shell to it.
______
1. You should at least have some way of detecting that it happened and consider all data & files compromised and just wipe the whole machine & start over. Or take that machine offline for investigation into what happened and put a fresh new one in its place.
If an attacker can run an arbitrary command on your server, it's already time to rotate all the credentials in your system and let any data subjects whose data you hold know that you fucked up, big time. That's just the Linux model.
I agree - I was just explaining the issue the commenter above raised. It just means you should use a saner way of initializing your environment with sensitive values.
My preferred solution currently is to try to use encrypted strings in config files that are not stored in VCS. The host machine encrypts and decrypts using host-specific keys, so if the file is copied off-server, it is not fully compromised immediately. This is usually done via a Python script which rewrites the file. (BTW, pretty easy to do on Windows boxes with the MS API.) I've considered using encrypted folders on Windows in addition, but I'm not sure that really makes a difference.
Usually the base config is in VCS but without user/password/db strings. We then manually configure the file with the encrypted strings on the server (usually with the machine name in the filename, so that we can use the hostname in code to find it and so that it is clear the file is machine-specific). Not all tools make this easy though, and it only works if you can add your own code in between. I also prefer files to the environment, as files can be locked down more easily in my opinion and it is more obvious what is going on.
I like some of the other solutions that are using encrypted strings but with a keystore server and may consider for the future if they support both windows and linux.
Anything stored in /private/ is not publicly accessible by the web server process, but can be read or written by anything running under the user's username.
It's specifically for storing things like configuration files.
I only just recently had to figure that out. I opted for setting up a .kdb KeePass file in a private git repo and giving everyone ("everyone" = myself + one other) access to that. I'm pretty sure that's not a very good solution.
Config files that are not version controlled, or environment variables. I prefer config files because it's easier for me to communicate to other team members what needs to be present in their local development environment.
I typically handle this by versioning a `config.example` file, which includes all the necessary config keys an application expects. The example file defaults these attrs to various strings meant to show they are examples only. I include instructions to copy the `config.example` to a `config.yml` (or some other appropriate extension), and replace the values as necessary. The `config.yml` file is specifically excluded in the `.gitignore` file. The application will only load the `config.yml` file when started, so I also ensure to raise a descriptive error informing team members when they are missing a local `config.yml`.
This allows the `config.example` to also serve as a self-documenting config for the application, as comments can be included that identify and explain each of the config keys and their purposes.
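A small Python sketch of the "fail loudly when the local config is missing" part. To stay inside the standard library it reads JSON from a hypothetical `config.json` rather than the `config.yml` mentioned above; the function and exception names are made up too.

```python
import json
from pathlib import Path


class MissingConfigError(RuntimeError):
    """Raised when the local, unversioned config file does not exist."""


def load_config(path="config.json", example="config.example"):
    """Load the local config or fail with instructions for the developer."""
    config_file = Path(path)
    if not config_file.exists():
        raise MissingConfigError(
            f"{path} not found. Copy {example} to {path} and replace the "
            "placeholder values with your local settings."
        )
    return json.loads(config_file.read_text())
```

The real file name (here `config.json`) is the one that goes into `.gitignore`; only the example file is committed.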
I store dummy values in VC, then edit the real data on the production server. (And obviously I never check anything in from production; if you can, set the production VC user to read-only.) This has the nice side effect that if I edit the configuration file, the new stuff gets merged in without causing a mess.
Another way is a second file that overrides settings as needed. Although I have found that to be less maintainable if the configuration file changes. That file should be somewhere entirely out of the VC tree.
Either way, the file must be placed in a directory that is not served by the web server.
/include
and
/public
are traditional. Only /public is exposed by the web server.
For me, I have connection criteria for a configuration database as environment variables... the config library will then connect to the configuration server with those credentials and get everything that application needs to connect to other services... I'd considered using etcd for this, but was unstable for me at that time... I keep settings cached for 5 minutes, then the library will re-fetch, in case they changed.
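Roughly what that cache-for-5-minutes behaviour could look like in Python. This is a sketch with invented names (`CachedConfig`, `fetch_from_config_server`); the fetch function is a stand-in for whatever actually talks to the configuration database.

```python
import time


class CachedConfig:
    """Cache settings from a config source and refresh them after a TTL."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch  # callable that returns a dict of settings
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry can be tested
        self._settings = None
        self._fetched_at = None

    def get(self, key):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            self._settings = self._fetch()
            self._fetched_at = now
        return self._settings[key]


def fetch_from_config_server():
    # Stand-in for the real call to the configuration database.
    return {"queue_url": "amqp://example.invalid"}


config = CachedConfig(fetch_from_config_server)
print(config.get("queue_url"))
```

The trade-off is the obvious one: a rotated credential can take up to the TTL to reach every process, so the old value has to stay valid at least that long.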
In a configuration file that is not version controlled, or even environment variables, so that your application starts with the right variables, but they are not in some config file.
As I detailed in my other response to your original question, use an example config file that is version controlled. It includes all the necessary config keys, but example-only values. All team members would then be able to easily create a local config file based on the example that works. You can even document the config with comments in the example file so devs know what is needed and what it's for.
I think at one point, if you have a shared password for a development DB, production DB, etc. then just keeping those on a pen and paper notebook is your best solution. Usually, for shared environments such as that (although I hope the team can set-up their own DB's for development!), the number of shared "secrets" is relatively small. Some secrets are best not stored electronically, especially if they can give away user data.
This is the asymmetric nature of security in general.
You only need to make a single mistake and you are hosed. Your attacker can fail an arbitrary number of times and only needs to succeed once.
If you are 99.9% likely to make the right call on anything that could have a security impact then you only need to make 1000 decisions before you probably screwed one up and have a hole.
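The arithmetic is easy to check, assuming the decisions are independent (which is itself a simplification). With a 99.9% success rate per decision, the chance of at least one mistake in 1000 decisions comes out at roughly 63%, and you cross the 50% mark a bit before 700 decisions:

```python
import math

p_correct = 0.999  # chance of getting a single decision right
decisions = 1000

# Probability that every one of the decisions was right.
p_all_correct = p_correct ** decisions
p_at_least_one_mistake = 1 - p_all_correct

# Smallest number of decisions at which a mistake becomes more likely than not.
break_even = math.ceil(math.log(0.5) / math.log(p_correct))

print(f"{p_at_least_one_mistake:.3f}", break_even)
```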
Some would say this means true security is impossible.
As humans are fallible, it's inevitable we will make mistakes carrying out even what we intend to do, when we know it and are actively trying to do the right thing.
Sometimes our mistakes are not recognizing the right thing to be done, or not recognizing anything at all.
I think the view that true security is impossible is true if security is up to one person. What about a system with multiple layers of sign-off, or an automated system that can help test the security of what you're doing and alert/prevent dangerous behaviour without specific reasons.
A good practice is to disable features that you don't use. I don't think many people need their hidden files to be remotely accessible, so maybe they should either remove the permissions or set a flag in their server so it doesn't allow downloading them.
I did a conference talk at derbycon on exactly this, regarding startups. The amount of obvious holes of founders not knowing what XSS is, or writing bad PHP apps with obvious code execution vulns, or glaring logic and auth mistakes allowing full account hijacks is incredible.
Hell, I recently saw an application with unchecked input that let you download files outside the application... if you passed it a path like `../../somefoo-file`, it would take you out of that application's path.
This is called either a Local File Inclusion or a Directory Traversal Vulnerability. The name depends on the details. It's really really common, and definitely something I see a lot of.
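One common way to guard a download endpoint against this, sketched in Python. The function name `resolve_download` and the directory layout are hypothetical; the idea is to resolve the requested path first and only then check that it is still inside the directory you meant to expose.

```python
from pathlib import Path


def resolve_download(base_dir: str, requested: str) -> Path:
    """Resolve a user-supplied path, refusing anything outside base_dir."""
    base = Path(base_dir).resolve()
    target = (base / requested).resolve()
    # After resolving ".." and symlinks, the target must still be under base.
    if target != base and base not in target.parents:
        raise PermissionError(f"refusing path outside {base}: {requested}")
    return target
```

Checking the raw string for `..` is not enough on its own, since absolute paths and symlinks get around it; comparing resolved paths covers those cases as well.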
This is what makes security hard: To attack successfully you only have to find one significant mistake, to defend successfully you can't make any mistakes.
Well, that's why you either have a team of competent people making sure all your stuff is up to date, routinely performing pentests, etc., or you delegate as much as possible of those responsibilities to 3rd parties (e.g. Heroku).
Extremely good point - we should always aim for security in depth and especially don’t rely on people not doing stupid things, because if it is possible then someone will at some point.
Don't you specifically have to configure Apache to allow access to dot directories in the first place? (disclaimer, I didn't read the article... but if access to dot directories were enabled.. ya, .git would be exposed, but I'm pretty sure it isn't the default Apache setting.)
unless you have a damn good reason not to (--checksum is essential to prevent corruption/malicious modification, without it you are implicitly assuming the version on the remote machine is exactly how you left it: that assumption is why Linus built git around shasum in the first place).
rsync is even easier than SSHing to git pull, or opening up a pushable repo on a server. For once the simple approach is clearly better!
> (--checksum is essential to prevent corruption/malicious modification, without it you are implicitly assuming the version on the remote machine is exactly how you left it: that assumption is why Linus built git around shasum in the first place)
That's not true (though not completely wrong).
rsync is stateless. It does not assume the version on the remote machine is "exactly how you left it"; Rather, it compares file size and file modification time; if either changed, it will do a transfer -- efficient delta transfer, usually - which might be as little as 6 bytes if the contents is exactly the same.
--checksum makes it ignore file size or modification time, and compare the file checksum in order to decide if it's time for a transfer (delta or not).
A malicious actor, or bad memory chips, might change your file's contents, but keep the file size and time/date the same. In that case, --checksum will overwrite that file with your source version, and a --no-checksum wouldn't. So it's not bad advice. Whether the cost in disk activity is worth it depends on your threat model, data size, and disk activity costs. (Though, if corruption is due to bad memory, this is the least of your problems.)
However, a corruption because of a program error / incompetent edit to the file is very unlikely to leave both the size and modification date intact - and a standard rsync will figure that out as well.
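The difference between the two modes can be mimicked in a few lines of Python. This is not rsync's actual code: the function names are made up, and SHA-256 is just a stand-in for whatever checksum rsync uses. It only illustrates that a metadata comparison cannot see a change that preserves size and modification time, while a content comparison can.

```python
import hashlib
import os


def quick_check_same(src: str, dst: str) -> bool:
    """Mimic the default test: same size and same modification time."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def checksum_same(src: str, dst: str) -> bool:
    """Mimic --checksum: compare content digests instead of metadata."""

    def digest(path):
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()

    return digest(src) == digest(dst)
```

If someone flips a byte in the destination file and resets its mtime, `quick_check_same` still returns True and the file would be skipped, while `checksum_same` returns False, at the price of reading every file on both sides.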
If the comparison is with using Git, then it is clear you're not so resource constrained that you can't countenance running MD5s, since Git would run shasum.
I think that in our current laissez-faire climate w.r.t. security, recommending leaving out a security check on the basis of saving a few cycles isn't very wise.
> It does not assume the version on the remote machine is "exactly how you left it"
I was ambiguous and sloppy, sorry. It doesn't check for changes in a secure way, but assumes that, as long as the metadata for the file matches, the content is as you left it.
When Linus built git, he specifically did so around sha1 to ensure that the data you think you have in the file is in fact the data you have. rsync --checksum is thus a reasonable replacement for git deployment, but rsync --no-checksum isn't, imho.
Sorry if I was vague or misleading, thanks for the clarification.
I'm personally a fan of using git-archive to make a tarball that can be deployed. These tarballs won't contain the .git directory and can be pushed/pulled instead.
One approach that I've found works well (YMMV, etc) is deploying with Ansible. It has a Git module built in (so it's almost 0 work to configure), and you can set up SSH agent forwarding so you never have to put keys on the server that have access to your source control, nor manually SSH in and pull.
Amen. Don't use git to deploy code. Use it to version code. Use a script on your CI to compile/test/minify/convert your code into a deployable tar ball and stick that somewhere highly durable like S3, Swift, or your own company filestore.
Remember, git providers go down (e.g. a DDoS against GitHub or an internal failure at Bitbucket). Don't depend on git being up to deploy your code or you'll look like a fool next time a DDoS at GH coincides with a deployment.
Note, if you actually did take it from the stackoverflow, you just infringed on someone's copyright; SO's user content is 'creative commons, attribution required'.
Edit: Thanks for adding the attribution, all clear with copyright now :)
> Note, if you actually did take it from the stackoverflow, you just infringed on someone's copyright
Does a simple access rule, which I can't see there being many sane ways to express, meet the minimum level of creativity/originality to be eligible for copyright...?
HN does not use standard Markdown. It uses a very simple markup language possibly inspired by Markdown, but with much more limited functionality.
On HN, only two spaces are necessary for code:
example
And it only supports asterisks for italics, two blank lines for paragraphs, and turning URLs into links; it doesn't support any of the rest of Markdown.
Obviously you shouldn't be storing sensitive information in your codebase (I hope everybody knows that), but the problem here is that you might have been way back when you were prototyping and then moved them out of the codebase. It's really common to start a codebase just by hacking something together with hardcoded secrets.
If you have the proper secret segregation now, but you're deploying by doing a git pull, now you run the risk of not really having segregated secrets all over again.
You probably should revoke all your existing credentials and replace them with fresh ones as soon as you pull them out of the VCS. That way, your attackers have the credentials, but they don't work anymore.
We've left test keys in our git repos. They don't work for anything except a virtual machine used for local development, but I always thought it would be amusing if a hacker grabbed them and got frustrated trying to use them.
So you leave them in as a kind of poisoned honeypot. They'll go right to the wrong info....
I'm sure you've daydreamed of the facial expression of the scriptkiddie the moment he stumbles across your fake keys illicitly, only to be disappointed hours of unfruitful hacking later :)
> Obviously you shouldn't be storing sensitive information in your codebase (I hope everybody knows that)
Sadly, in my experience hardcoding secrets such as (database) passwords and encryption private keys is not uncommon at all in web applications. I don’t like criticising other developers, but sometimes the people who get to make these decisions don’t necessarily have the perspective or experience to make the right calls.
Growing a project from one shot mindset prototyping is really problematic. Every time I wish I started by using a real project structure and design philosophy.
And then every time I actually start a project with a real project structure and design philosophy, it goes nowhere and I wish I hadn't wasted the time. Or best case, it's used by a few people internal to whatever company is currently employing me, and security doesn't really matter.
The tech industry is shaped like a funnel, with lots of raw, bad ideas at the top and a few smash mega-hits at the bottom. 99% of the ideas at the top are bad; investing more time than is necessary to prove them out is a mistake. 100% of the ideas that make it to the bottom wish that they'd spent more time designing things at the top. But y'know, if they'd actually done that, they wouldn't have made it to the bottom, they'd be outcompeted by the guy who got a quick and dirty prototype up, made his users happy first, and then closed the gaping security holes (hopefully!) before anyone noticed.
The balance is whatever works, gets people using the product, and ideally keeps them happy.
The balance is generally far more on the quick 'n dirty side than most engineers (myself included) would prefer, but we could look at this as a cognitive bias of engineers rather than a failing of nature.
Once it gets to a certain level of "no longer prototype", it can help if you then start VCS fresh by initing a new git repository. You lose the prototyping history, but you probably won't need it anyway.
Is there an automated "security as a service" service that if I subscribed to it, it would have told me that this is a problem on my websites?
It really annoys me randomly hearing about critical security issues through tech news websites - there should be a more systematic way for "non-security professionals" to ensure their sites are protected to best practice levels.
It looks like for some reason Google actually searches for the character "．" (U+FF0E, "fullwidth full stop") when performing those sorts of queries, not "." (U+002E, "full stop").
It seems like if you're storing secrets and the like in your code's repo, the solution is to not do that, rather than just putting a bandaid over it by hiding the repo.
Deploy the secrets separately: they don't belong in your site's codebase.
Hiding the repo is hardly a bandaid. It should never be exposed even if the repo is perfectly secret-free.
Except in the rare cases where it is intentional e.g. an open source repo and you happen to want people to download it from the same domain not github or git.domain.com.
Presumably, for most commercial entities, the parts of the site that are valuable are the assets, which are served from the site as part of it doing the thing it's meant for. For a large percentage, they're running a CMS like WordPress or Drupal or whatever, where the codebase is public anyways. And for even more, we're talking about a directory of hand-crafted HTML files, where the version control is the HTML of the site plus some "damn, I always forget to close my tags" commit messages.
This returns 403 and in my opinion logs should not be turned off for that. I would return 404 to not expose that you are blocking . files with your server.
My suggestion to put in each server { ... }
You find it hard to believe that 0.17% of all websites use git? I'm sure 10x that many do, most probably don't misuse git to deploy rather than solely as a source code manager.
If it's exactly the same repository - no. If it contains some extra branches with local changes, or potentially commits with private information / passwords - definitely.
So in general - it's better not to have it in the first place, because it's unlikely that the person doing the commits knows the whole deployment strategy.
If you don't accidentally expose your credentials in .git/config to push changes to the repo (e.g. GitHub username/password for http auth) there is no risk.
Still - having a real deployment process is much better. It could be as easy as extracting the contents of a tarball generated by git archive.
90% of security incidents are due to human errors, not to some secretive hacker group spending $10m to crack TLS.
Doing system administration right (eg. no secrets in repos) has a lot more impact on security than implementing all the other complex controls.
I wonder what would happen if you searched for .svn, too. I'm sure you'd run into the same problem in many places. But would it be more or less likely to occur?
I think less likely. Svn actually had an `export` command, which allowed you to do a checkout of a specific commit with no svn metadata. If someone was actually using svn for deployment, they likely knew about it. (http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.export.html)
No, archive is very different. `svn export` could be used at the target side. Basically you can export from remote repository to the chosen directory without any extra operations, so .svn is never created.
`git archive` requires you to have a clone of the repo from which you can create an archive. That means people are more likely to just do a local checkout than play with archive on top of it.
Deploy with svn export:
svn export url.of.repo destination/path
Deploy with git archive:
git clone url.of.repo
cd repo_name
git archive --format=tar some_commit_or_branch | (cd destination/path && tar -xf -)
I completely agree, every time I do a new deployment script for git, I miss the old svn export command, I hate the idea of cloning on the deployment machine, having to git sync in the script etc. Whereas 'svn export' feels stateless, and is clean by construction.
In svn's heyday, the standard way to install or update a popular app like WordPress was to download and extract a tarball. Only people who actually participated in the development of the app itself used svn.
Nowadays, lots of open-source projects encourage ordinary webmasters to clone a Github repo and run `git pull` to update.
So I suspect that public .svn folders will be less common.
That's problematic in itself. "git" has no business being installed on your public-facing web server. Nor should there be any compilers installed, nor any scripting languages not explicitly needed, etc. etc.
Some of the other commenters suggest adding git-dir and work-tree to the git commands, but there's a better solution: use the --separate-git-dir option when cloning the repository:
git clone --separate-git-dir=<repo dir> <repo url> <working copy>
where <repo dir> is outside of any directory served by the web server and <working copy> is the htdocs root.
This option makes <working copy>/.git a file whose content is:
gitdir: <repo dir>
The advantage is that all git commands work as usual, without the need to set git-dir and work-tree, and that there's nothing special to add to the web server configuration.
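To make the pointer-file mechanism concrete, here is a minimal Python sketch of how a tool could follow it. The function name `resolve_git_dir` is hypothetical, and the sketch only handles the simple one-line `gitdir: <path>` form described above; it is an illustration, not how git itself is implemented.

```python
import os


def resolve_git_dir(working_copy):
    """Return the directory that holds the repository data for working_copy.

    If <working_copy>/.git is a directory, that directory is returned.
    If it is a file of the form "gitdir: <path>", the path it points to is
    returned (a relative path is resolved against the working copy).
    """
    dot_git = os.path.join(working_copy, ".git")
    if os.path.isdir(dot_git):
        return dot_git

    with open(dot_git, encoding="utf-8") as handle:
        first_line = handle.readline().strip()

    prefix = "gitdir: "
    if not first_line.startswith(prefix):
        raise ValueError("not a gitdir pointer file: %r" % first_line)

    target = first_line[len(prefix):]
    # os.path.join returns `target` unchanged when it is already absolute.
    return os.path.normpath(os.path.join(working_copy, target))
```

With this layout, a web server that exposes the working copy can at worst leak the one-line pointer file, not the repository objects themselves.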
> Production servers have no business having the .git directory anywhere.
I agree with you in principle, but in practice this is not always possible: there are situations where having a git checkout in production is better than nothing.
I've seen WordPress sites where a semi-technical administrator updates plugins and themes directly in production: with that git checkout I would at least be able to track the changes and pull them in a dev or staging environment.
This could be the first step to a saner deployment workflow for those sites, where production gets changes that have been tested and validated elsewhere.
In another project, I found that the server had register_globals turned on, so I could craft a URL like:
http://example.com/admin?valid_user=1, where valid_user was a PHP variable set to true iff their session cookie could be authenticated in the database.
I think it's terrifying that these things still make it through to production websites
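For readers who never met register_globals, the following Python sketch simulates the flaw described above. It is a toy model under stated assumptions: the function names and the dictionary standing in for PHP's global scope are hypothetical, and the page logic is assumed to assign the flag only when authentication succeeds.

```python
from urllib.parse import parse_qs, urlsplit


def _php_truthy(value):
    """Approximate PHP truthiness for the values used in this sketch."""
    return value not in (None, False, "", "0")


def is_admin_allowed(url, session_authenticated, register_globals):
    """Simulate a page that trusts a `valid_user` variable."""
    variables = {}  # stands in for the script's global scope

    if register_globals:
        # Every query parameter becomes a variable before the page runs.
        for name, values in parse_qs(urlsplit(url).query).items():
            variables[name] = values[-1]

    # Page logic: the flag is only assigned on successful authentication,
    # so a value injected above survives when authentication fails.
    if session_authenticated:
        variables["valid_user"] = True

    return _php_truthy(variables.get("valid_user"))
```

In this model the request for `/admin?valid_user=1` is accepted without a valid session only when `register_globals` is true, which is the behaviour the comment describes.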
Someone on StackOverflow says this will tell nginx not to serve hidden files.
location ~ /\. { return 403; }
My question - do I need to put this once at the top of my configuration file and all is good, or does it need to go into multiple places in the nginx config?
It would be great if there was a simple, universal way to say to nginx "don't serve hidden files from anywhere under any circumstances".
Why are people serving web traffic to a folder with a .git folder anyways? I thought it was basic deployment practice to export your code OUT of the VCS before deploying... every shop I've worked at had this in place.
Other solutions just seem hackish to me, but every project is different I suppose.
This is one example for why hierarchical directories are bad. They're not all bad, but with all their power and flexibility, they carry some inherent flaws. It's much the same as JSON, which is also versatile and highly useful. However, both of these abstractions tend to lead to the ad-hoc creation of more complexity and more details to remember, while having no clearly delineated way to be self-describing.
(Is JSON an abstraction? Not really, as it's a concrete spec, but its general kind of serialization format is an (incomplete) abstraction.)
It always felt unclean to have the source repo on production, so we use deployhq.com (similar: circleci.com, Jenkins) to push changes up when code changes are made, rather than pull them from git. We also use a clean-up script that removes any SASS, Grunt, etc source files when deploying to production.
It's clear the problem involves some PHP sites developed with git where, instead of using a specific www directory inside the project, the server points to the root folder of the project, thus exposing .git (and the rest). Classic dumb error by PHP developers. I have a hard time believing one would be able to expose the .git folder in a Rails, Spring or Django application, since the public folder isn't the root folder of the project.
I wish servers would be configured so they don't serve ^\..+$ files by default. I wish servers would behave as securely as possible, so that it's up to the developer to whitelist features rather than the other way around.
True, but you can whitelist /.well-known/. I don't think anything else uses dot-filenames in URLs, because not all operating systems and software even allow such file names (for instance, the file browser in Windows forbids it when creating a new file or folder).
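The "deny dot-paths, but allow /.well-known/" rule can be written down as a small predicate. This is a Python sketch of the policy only, not a web server configuration; the name `is_blocked` is hypothetical, and treating `.well-known` as allowed only when it is the first path segment is an assumption of this sketch.

```python
from urllib.parse import unquote, urlsplit


def is_blocked(url_path):
    """Return True if the request path touches a hidden (dot) segment.

    The only exception is a leading "/.well-known/" segment. Percent-encoded
    dots (e.g. "%2Egit") are decoded first so they cannot bypass the check.
    """
    path = unquote(urlsplit(url_path).path)
    segments = [segment for segment in path.split("/") if segment]

    for index, segment in enumerate(segments):
        if not segment.startswith("."):
            continue
        if index == 0 and segment == ".well-known":
            continue  # explicitly allowed prefix
        return True  # any other dot segment, including "..", is refused
    return False
```

For example, `/.git/config` and `/blog/.svn/entries` are blocked, while `/index.php` and `/.well-known/acme-challenge/token` pass.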
There are configs within nginx that prevent this from happening. Also, I symlink only the public files from a different directory into my /var/www/html folder.
The author doesn't need to give such suggestions; that's out of scope of this article. There are many possibilities however.
The simplest solution is to add something like ' location ~ /\.git { deny all; }' to your nginx config (or just hide all paths starting with '.'; you rarely want to serve hidden files).
The most important practice for remediating this is to have a very clear disconnect between deployment artifacts and development repositories. You want to have your developers write code, and then some preprocessing step to distill that repository into the minimal deployment artifact. This artifact then goes through your integ tests on a gamma stage and onto prod. Whatever deployment system you use should know what git commit this artifact originated from, but the artifact should not have that information on its own.
If you want the really poor man's version of this, just have a build system in your repository which populates an "out" directory with everything you actually want to deploy, and then rsync that sucker around.
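A poor man's build step of that kind might look like the Python sketch below. Everything named here is an assumption for illustration: the function `build_out_dir`, the default allowlist `("public",)`, and the choice to skip every dot-prefixed entry. The point is that the output directory is built from an explicit allowlist, so `.git` and other stray files never reach it.

```python
import shutil
from pathlib import Path


def _skip_hidden(directory, names):
    """copytree `ignore` callback: leave out every dot-prefixed entry."""
    return [name for name in names if name.startswith(".")]


def build_out_dir(project_root, out_dir, deploy_paths=("public",)):
    """Populate out_dir with only the allowlisted paths from project_root.

    deploy_paths is a hypothetical allowlist of files or directories,
    relative to the project root. Names that do not exist are skipped.
    """
    project_root = Path(project_root)
    out_dir = Path(out_dir)

    # Start from a clean directory so removed files do not linger.
    if out_dir.exists():
        shutil.rmtree(out_dir)
    out_dir.mkdir(parents=True)

    for name in deploy_paths:
        source = project_root / name
        target = out_dir / name
        if source.is_dir():
            shutil.copytree(source, target, ignore=_skip_hidden)
        elif source.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
    return out_dir
```

The resulting directory is what would then be handed to rsync; the repository itself never has to exist on the web server.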
If you want a hip answer to this, Docker has the "Dockerfile" which will specify all artifacts that should be added into it, and should serve as a distillation of what's needed to run your application.
I'll point out that this issue won't affect many types of sites in the first place. For example, if you use any sane rails setup, you already have the "public" directory which you use as your static files root, and then you proxy to the unicorn/rack/whatever process which will never serve such files.
I would wager that the majority of these sites are poorly configured apache2 + php messes since php made the massive mistake of having the filesystem act as routing; any web framework with good routing (e.g. rails, django, even sinatra, web.py) will not suffer from this without going out of your way a bit.
Then you just create a symlink to the work tree for your website root, or put the work tree there, or whatever, depending on preference. Can't say this is perfect, but it works pretty well for smaller projects.
I personally prefer to have my web directory be a sub folder of the project root. This not only solves issues with the .git directory being accessible but also helps prevent gotchas with exposing documentation or other data not intended for public consumption.
I simply use a 'checkout' process where all website assets to be deployed are copied into for example a /dst folder and the folder then rsync'd to the webroot (with rsync -ru --delete --chmod). This avoids clutter and still allows to just update changed files. The actual page source is still under git.
599 out of every 600 sites don't have the problem of a public facing .git folder. I don't think alternative deployments need to be suggested as they are commonplace.
It depends on what sort of platform and server you're running on but just do what you have to do so you're not serving your .git.
Besides what TheDong has suggested (hiding your .git folder from public view), you could also have a build server that in the end tarballs everything up and deploys it onto your live servers.
Personally I would use standard practices of the open source world; package your app properly into a .deb/.rpm, deploy that and push configuration for it out using puppet/ansible/etc. aka, debops!
I see a lot of discussion here about best practices to avoid this, but nothing about the obvious question: Why is it that the go-to version control system for developers everywhere is designed such that it puts a single file at the root that magically gives anybody who can see it the ability to download all the source code in the repository?
That is:
- completely non-obvious and unexpected
- a terrible idea
Why, instead of trying to figure out how to avoid handing this magic file out to everybody, are we not trying to fix it so that no such magic file need exist?
Where else would you suggest Git keeps the version control information for a repository, if not at the root of that repository? SVN tried it in every subdirectory, which is worse, using a separate db server would add unwelcome external dependencies. It could be in a non-hidden folder, but that's annoying for those who just want to see their files under VC, not an implementation detail of the VC method. Also, there are hundreds of dot files scattered around your computer, and none of those should go anywhere near your host - .git is just one example.
The error here is uploading sensitive or hidden information to a web host into a public directory, not how it is stored locally.
If you use the root of your app, including source code and hidden files, as the public directory of your website, one permissions error means all sorts of things might be exposed, e.g. other dotfiles would also be exposed, and potentially all of your source code too, because you're relying on the web server to hide it somehow in every instance. That's the problem that needs fixed here (exposing the wrong files to public root), not that one particular hidden folder exists.
The terrible idea is using the root of your source as the root of your published application, which has been considered bad practise as long as I can remember (and that is long before the existence of git).
The root of your project should contain nothing more than build documentation/scripts and other developer/user notes & scripts (which for a web application could be as simple as "expose the subdirectory called "public" via your web server" but could be much more for more complex applications that have larger build requirements).
Ignoring this long held recommended practise (which less experienced developers might not be aware of): git originally came from an environment where you couldn't simply expose your repository as your application. The Linux kernel and other projects needed building from source before they could be put into production. So this is in part due to people using a tool in a new context without sufficiently thinking about the possible implications (that the tool designer, thinking about other environments, might not have considered). Security requires a lot of "due diligence" like this unfortunately: you can't expect the tool designer to be aware of all the potential security considerations in your environment, you have to deduce and mitigate them yourself.
And how would you expect it to work? Keep the repo in ~/.local/ or something like AppData? What happens then if you want to have two clones of the same repo? What if you want to move a clone around (e.g. to a different machine)? Of course it's possible to find solutions to these issues, but they will never be as simple and easy to use as the current model (btw: git supports having the magic directory external to the working tree).
It's non-obvious and unexpected if you have never used a VCS and didn't read a single page describing git. In which case almost everything in git will be unexpected and non-obvious (so will be any programming language or technology).
But I agree that the problem shouldn't be trying to avoid handing out the .git dir to everybody. The problem is - git clone in document root of the webserver for website deployment is a terrible idea and was never a supported usecase.
edit: so I came the third! Should try typing quicker.
It's actually not a problem for many web developers. If you're working in Rails, Django, etc., you're not going to have the DOCROOT pointed at your top level directory. And .git is only in the top level directory, which is already an improvement over .svn.
But let's say we agree it's an issue and should be fixed. What alternative solution would you propose? You can't just get rid of the "magic file" because it actually contains your version history, the thing you wanted to track in the first place.
Imagine you find a typo or bug on some site that you use daily and it really annoys you. How awesome would it be to be able to just `git clone http://www.microsoft.com/`, fix the issue and then either host your own version of the site or submit a pull-request. Seems like it's already possible on 600 websites :) (well they probably didn't run `git update-server-info` ).
git (and hg and svn and tar and cpio and pax and zip and ...) should set the permission of the archive (vcs db) to the most restrictive of the permissions of those added.
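One way to read "the most restrictive of the permissions of those added" is as the intersection of the permission bits of every added file. The Python sketch below shows that reading; the function name `most_restrictive_mode` is hypothetical, and no VCS or archiver mentioned above is claimed to work this way.

```python
import stat
from functools import reduce


def most_restrictive_mode(modes):
    """Return the permission bits that every mode in `modes` grants.

    A bit survives only if all inputs have it set, so the result is never
    more permissive than the most locked-down file that was added.
    """
    permission_bits = [stat.S_IMODE(mode) for mode in modes]
    if not permission_bits:
        raise ValueError("at least one mode is required")
    return reduce(lambda left, right: left & right, permission_bits)
```

Under this rule, adding a single `0o600` file to an archive of `0o644` files would drop the archive itself to `0o600`.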
Also ... webservers should make it very hard to offer up .dotfiles as webcontent.
The model of exporting a Unix directory as the structure of a website is barely good enough for static sites (you'll get into all kinds of problems with URL management), and is completely unsuited for applications.
Now, of course, PHP was created as a tool to add a visitor counter at the bottom to your pages. With a bit of caution, it's indeed secure enough for that. Nowadays people create huge applications using the same security model, and PHP developers don't even think about changing it.