
Re. File size growth in Couch:

Couch files are written in an append-only fashion, so every operation, including an update, is an append. The main upside is durability, meaning the file is always readable and never left in an odd state. As noted, however, the downside is that compaction is needed to reclaim disk space. Note that if you are using Bitcask, the default backend for Riak, you will have the same problem you had with Couch, because Bitcask is also an AOL (append-only log), aka a WOL (write-only log). Bitcask also needs compaction, but as of the latest release there are various knobs for tuning when compaction is queued up and when it is executed.
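A toy illustration of why append-only storage needs compaction, assuming nothing about Couch's or Bitcask's real file formats: every put and delete appends a JSON line to an in-memory buffer standing in for the file, an index remembers the newest offset for each key, and `compact()` rewrites only the live records. The class and method names are hypothetical.

```python
import io
import json


class AppendOnlyStore:
    """Toy append-only key/value store (hypothetical). Every write, including
    updates and deletes, is appended as a new record; nothing is overwritten."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the on-disk file
        self.index = {}          # key -> byte offset of its latest record

    def _append(self, record):
        offset = self.log.seek(0, io.SEEK_END)  # always write at the end
        self.log.write(json.dumps(record).encode() + b"\n")
        return offset

    def put(self, key, value):
        self.index[key] = self._append({"k": key, "v": value})

    def delete(self, key):
        # A delete is also an append (a tombstone); the old bytes stay behind.
        self._append({"k": key, "deleted": True})
        self.index.pop(key, None)

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.log.seek(offset)
        return json.loads(self.log.readline())["v"]

    def size(self):
        return len(self.log.getvalue())

    def compact(self):
        # Copy only the live records into a fresh log, then swap it in.
        fresh = AppendOnlyStore()
        for key in self.index:
            fresh.put(key, self.get(key))
        self.log, self.index = fresh.log, fresh.index
```

After 100 updates to a single key the log holds 100 records but only the last one is live; `compact()` shrinks it to one record while `get()` keeps returning the same value.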

Re. database being one file:

I'm not exactly sure why Couch uses one file, but I suspect it is due to its internal use of b-trees. I'd love to hear more from someone more experienced with Couch. Riak splits its data among many files: specifically, one file (or 'cask' when using the default Bitcask backend) per 'vnode' (virtual node, 64 by default). The drawback is that you need to ensure your environment allows enough open file descriptors to service requests.
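A rough sketch of how keys spread across those 64 vnodes, assuming Riak-style consistent hashing in which SHA-1 places a key on a 2^160 ring that is cut into equal partitions. The `partition_for` helper is hypothetical, and it hashes a plain "bucket/key" string rather than whatever encoding Riak actually hashes, so the partition numbers will not match a real cluster.

```python
import hashlib

RING_SIZE = 2 ** 160   # size of the SHA-1 output space
NUM_PARTITIONS = 64    # the default vnode count mentioned above


def partition_for(bucket: str, key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a bucket/key pair to one of num_partitions equal slices of the ring.

    Hypothetical helper: hashes a simple "bucket/key" string, not Riak's
    real key encoding.
    """
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")          # 0 <= position < 2**160
    return position // (RING_SIZE // num_partitions)  # 0 .. num_partitions - 1


print(partition_for("users", "alice"))
```

Each partition then owns its own file (or directory of files), which is why the number of open descriptors scales with the vnode count rather than with the number of keys.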



There are several reasons for CouchDB's use of a single, strictly append-only file, but the b-tree is not one of them.

In fact, storing a b-tree in an append-only fashion is one of CouchDB's cleverest tricks (it's far more common to update b-trees in place).
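To make the append-only b-tree trick concrete, here is a simplified sketch that uses a binary search tree rather than a real b-tree, with a Python list standing in for the file. An update never touches existing nodes: it appends fresh copies of every node on the path from the changed leaf up to a new root, children before parents, so any older root still describes a complete, consistent snapshot. This shows the general path-copying idea, not CouchDB's actual node layout, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

# A node is (key, value, left_offset, right_offset); offsets index into the log.
Node = Tuple[str, str, Optional[int], Optional[int]]


class AppendOnlyTree:
    """Binary search tree (not a real b-tree) kept in an append-only list."""

    def __init__(self):
        self.log: List[Node] = []      # stands in for the append-only file
        self.root: Optional[int] = None

    def _append(self, node: Node) -> int:
        self.log.append(node)
        return len(self.log) - 1

    def _insert(self, offset: Optional[int], key: str, value: str) -> int:
        if offset is None:
            return self._append((key, value, None, None))  # new leaf
        k, v, left, right = self.log[offset]
        # The child is appended first (argument evaluation), then a new copy
        # of this node pointing at it; the old node is left untouched.
        if key < k:
            return self._append((k, v, self._insert(left, key, value), right))
        if key > k:
            return self._append((k, v, left, self._insert(right, key, value)))
        return self._append((k, value, left, right))  # new value, new node

    def put(self, key: str, value: str) -> None:
        self.root = self._insert(self.root, key, value)

    def get(self, key: str, root: Optional[int] = None) -> Optional[str]:
        """Look up key starting from the given root (default: the latest)."""
        offset = self.root if root is None else root
        while offset is not None:
            k, v, left, right = self.log[offset]
            if key == k:
                return v
            offset = left if key < k else right
        return None
```

Updating one key in a tree of three appends only two nodes (the new leaf and a new root), and reading through the previous root still returns the old value, which is what makes the structure safe to append to without ever rewriting it.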

There are two reasons CouchDB is strictly append-only:

1) Safety: By never overwriting data, it avoids a whole class of data corruption issues that plague update-in-place schemes.

2) Performance: Writing at the end of the file allocates new blocks, leading to lower fragmentation and less seeking than an update-in-place scheme.
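The safety point can be illustrated with a generic length-plus-CRC32 record framing (an assumption for illustration, not CouchDB's on-disk format). Since a crash can only interrupt the record currently being appended, a reader that stops at the first incomplete or mismatched record recovers everything written before it.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # (payload length, CRC32 of payload)


def encode(payload: bytes) -> bytes:
    """Frame a payload as length + checksum + bytes."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def valid_records(data: bytes) -> list:
    """Return every intact record, stopping at the first torn or corrupt one.

    Because records are only ever appended, damage from a crash mid-write is
    confined to the tail; everything before it is still readable.
    """
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete or corrupted tail: ignore it
        records.append(payload)
        pos = start + length
    return records


log = encode(b"doc1 v1") + encode(b"doc1 v2")
torn = log + encode(b"doc1 v3")[:-3]  # simulate a crash mid-append
print(valid_records(torn))
```

With update-in-place, the same crash could leave a half-overwritten block in the middle of the file, and there would be no older copy to fall back on.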

"Durability" does not mean "the file is...never left in an odd state". The right word there is "consistent". CouchDB provides both (in fact, it provides all four ACID properties, but the scope of a transaction is limited to a single document update).

Finally, I should point out that using a single file, as opposed to a series of append-only files (as Berkeley DB Java Edition, or JE, does, for example), is just a pragmatic choice. Nothing in principle prevents a JE-style approach; it's just a lot of (quite hard) work to do well.
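For contrast, a minimal sketch of the multi-file alternative, assuming a simple size-based rollover: appends go to the newest segment file, and once it would exceed `max_bytes` a new segment is started. This is only in the spirit of a JE-style log, not Berkeley DB JE's actual file format or cleaner; `SegmentedLog` and the `NNNNNNNN.log` naming are hypothetical.

```python
import os


class SegmentedLog:
    """Hypothetical append-only log split across numbered segment files.

    Appends go to the newest segment; when it would grow past max_bytes a new
    segment is started. Older segments are never modified again, so they can
    be cleaned up (or compacted) one file at a time.
    """

    def __init__(self, directory: str, max_bytes: int = 1024):
        self.directory = directory
        self.max_bytes = max_bytes
        self.segment = 0

    def _path(self, n: int) -> str:
        return os.path.join(self.directory, f"{n:08d}.log")  # hypothetical naming

    def append(self, record: bytes) -> None:
        path = self._path(self.segment)
        if os.path.exists(path) and os.path.getsize(path) + len(record) > self.max_bytes:
            self.segment += 1  # roll over to a fresh segment
            path = self._path(self.segment)
        with open(path, "ab") as f:
            f.write(record)

    def segments(self) -> list:
        return sorted(n for n in os.listdir(self.directory) if n.endswith(".log"))
```

The payoff is that old segments can be reclaimed individually; the cost is the extra bookkeeping (which segment holds the latest version of each record, and when a segment is safe to delete) that a single file avoids.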


The main two reasons that CouchDB doesn't use multiple files per database are system limits and increased complexity.

CouchDB has a somewhat unusual design in that it accepts that people might run a large number of databases on a single node. I don't remember the exact numbers, but I think we've heard of deployments running 10K to 100K (small) databases on a single node.
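A back-of-the-envelope check of the system-limits point, assuming every database keeps all of its files open at once and using a hypothetical layout of four files per database: the descriptors needed grow as databases × files, which can be compared against the process's soft `RLIMIT_NOFILE` limit (read here with Python's POSIX-only `resource` module). The numbers are illustrative, not measurements.

```python
import resource  # POSIX-only standard-library module


def fd_budget(num_databases: int, files_per_db: int):
    """Return (descriptors needed, soft limit, fits?), assuming every database
    keeps all of its files open at the same time."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = num_databases * files_per_db
    fits = soft == resource.RLIM_INFINITY or needed <= soft
    return needed, soft, fits


# 100K small databases: one file each vs. a hypothetical four files each.
for files in (1, 4):
    needed, soft, fits = fd_budget(100_000, files)
    print(f"{files} file(s)/db -> {needed} descriptors (soft limit {soft}, fits: {fits})")
```

Whatever the actual limit on a given machine, multiplying the per-database file count multiplies the descriptor requirement by the same factor.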

As for complexity, with a single file there's no magical fsync dance to coordinate when committing data to disk. It's not impossible, it's just more complex.
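Roughly, a single-file commit needs only one ordering rule: make the new data durable, then append a header that points at it and make that durable too. The sketch below shows that ordering with `os.write` and `os.fsync`, assuming each descriptor was opened with `O_APPEND`; `commit` and `commit_multi` are hypothetical helpers, not CouchDB code. The multi-file variant shows where the extra coordination comes from.

```python
import os


def _write_all(fd: int, data: bytes) -> None:
    """os.write may write fewer bytes than asked; loop until all are written."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def commit(fd: int, data: bytes, header: bytes) -> None:
    """Single file: flush the data, then append and flush a header that
    references it. A crash before the second fsync completes leaves the
    previous header as the latest valid one, so readers see the old state."""
    _write_all(fd, data)
    os.fsync(fd)
    _write_all(fd, header)
    os.fsync(fd)


def commit_multi(writes: dict, header_fd: int, header: bytes) -> None:
    """Several files: every data file must be flushed before the header is
    written, and recovery must cope with some files being ahead of others."""
    for fd, data in writes.items():
        _write_all(fd, data)
        os.fsync(fd)
    _write_all(header_fd, header)
    os.fsync(header_fd)
```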

It's not out of the question that CouchDB will move to using multiple files per database, but since it's open source, the biggest roadblock so far is that no one has needed it badly enough to implement it.


I've noticed that interesting features tend to show up in BigCouch before CouchDB. So if I were starting a CouchDB-based project today, I'd start with Cloudant.



