Re. File size growth in Couch: Couch files are written in an append only fashion...

rnewson · on March 7, 2011

There are several reasons for CouchDB's use of a single, strictly append-only file, but the b-tree is not one of them.

In fact, storing a b-tree in an append-only fashion is one of CouchDB's cleverest tricks; b-trees are far more commonly updated in place.
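The idea can be illustrated with a simplified copy-on-write tree. This is a minimal sketch that assumes a plain binary search tree instead of a real b-tree, and a Python list standing in for the file, so a node's "offset" is just its list index. The class and method names are hypothetical; this is not CouchDB's code. An update never modifies an existing node. Instead it appends fresh copies of every node on the path from the changed node up to the root, and the new root's offset becomes the current tree. Old roots stay readable because nothing they point to is ever overwritten.

```python
from typing import Any, List, Optional, Tuple

# A node is (key, value, left_offset, right_offset); offsets index into the log.
Node = Tuple[Any, Any, Optional[int], Optional[int]]


class AppendOnlyTree:
    """Copy-on-write binary search tree kept in an append-only log (sketch)."""

    def __init__(self) -> None:
        self.log: List[Node] = []        # stands in for the file; never overwritten
        self.root: Optional[int] = None  # offset of the current root node

    def _append(self, node: Node) -> int:
        self.log.append(node)
        return len(self.log) - 1  # the new node's "file offset"

    def _insert(self, offset: Optional[int], key: Any, value: Any) -> int:
        if offset is None:
            return self._append((key, value, None, None))
        k, v, left, right = self.log[offset]
        if key < k:
            # Write the new left subtree first, then a copy of this node pointing at it.
            return self._append((k, v, self._insert(left, key, value), right))
        if key > k:
            return self._append((k, v, left, self._insert(right, key, value)))
        return self._append((k, value, left, right))  # same key: copy with new value

    def insert(self, key: Any, value: Any) -> int:
        self.root = self._insert(self.root, key, value)
        return self.root  # callers may keep old roots as read-only snapshots

    def get(self, key: Any, root: Optional[int] = None) -> Any:
        offset = self.root if root is None else root
        while offset is not None:
            k, v, left, right = self.log[offset]
            if key == k:
                return v
            offset = left if key < k else right
        raise KeyError(key)
```

For example, after `snap = tree.insert("b", 2)` and a later `tree.insert("a", 10)`, `tree.get("a")` sees the new value while `tree.get("a", root=snap)` still sees the old one. A real b-tree applies the same path-copying idea to wide nodes, so each update appends only a handful of blocks.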

There are two reasons CouchDB is strictly append-only:

1) Safety: By never overwriting data, it avoids a whole class of data corruption issues that plague update-in-place schemes.

2) Performance: Writing at the end of the file allocates new blocks, leading to lower fragmentation and less seeking than an update-in-place scheme.
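The safety argument can be sketched with checksummed records. This is a minimal sketch that assumes a hypothetical record format (a 4-byte length, a 4-byte CRC32, then the payload); it is not CouchDB's on-disk format. Because existing bytes are never rewritten, a crash can at worst leave a torn record at the tail. Recovery reads forward and stops at the first record that is incomplete or fails its checksum, so everything before it is known to be intact.

```python
import io
import struct
import zlib
from typing import BinaryIO, List

HEADER = struct.Struct(">II")  # payload length, CRC32 of payload (hypothetical format)


def append_record(f: BinaryIO, payload: bytes) -> None:
    f.seek(0, io.SEEK_END)  # always write at the end; never overwrite earlier bytes
    f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    f.write(payload)


def recover(f: BinaryIO) -> List[bytes]:
    """Return every complete, valid record, stopping at the first torn one."""
    f.seek(0)
    records = []
    while True:
        header = f.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # clean end of file, or a header cut short by a crash
        length, crc = HEADER.unpack(header)
        payload = f.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupt tail left by an interrupted write
        records.append(payload)
    return records
```

Simulating a crash by chopping the last two bytes off a file holding `b"doc1"` and `b"doc2"` leaves `recover` returning only `[b"doc1"]`. An update-in-place scheme has no such easy boundary, since a torn write can damage data that was already committed.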

"Durability" does not mean "the file is...never left in an odd state"; the right word there is "consistency". CouchDB provides both. (In fact, it provides all four ACID properties, but the scope of a transaction is limited to a single document update.)

Finally, I should point out that using a single file, as opposed to a series of append-only files (as Berkeley DB JE does, for example), is just a pragmatic choice. Nothing in principle prevents a JE-style approach; it's just a lot of (quite hard) work to do well.
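A JE-style layout can be pictured as a log that rolls over to a new numbered segment file once the current one exceeds a size limit. This is a toy sketch that assumes a hypothetical `SegmentedLog` class, file naming scheme, and size threshold; it is not how Berkeley DB JE or CouchDB are implemented. Record addresses become (segment, offset) pairs instead of a single file offset. The sketch deliberately omits fsync and the genuinely hard part: cleaning, which copies the still-live records out of old segments so those files can be deleted.

```python
import os
from typing import Tuple


class SegmentedLog:
    """Append-only log spread across numbered segment files (toy sketch)."""

    def __init__(self, directory: str, max_segment_bytes: int = 1 << 20) -> None:
        self.directory = directory
        self.max_segment_bytes = max_segment_bytes
        os.makedirs(directory, exist_ok=True)
        # Resume appending to the highest-numbered existing segment, if any.
        existing = sorted(
            int(name.split(".")[0]) for name in os.listdir(directory) if name.endswith(".log")
        )
        self.segment = existing[-1] if existing else 0

    def _path(self, segment: int) -> str:
        return os.path.join(self.directory, f"{segment:08d}.log")

    def append(self, payload: bytes) -> Tuple[int, int]:
        """Append payload and return its address as (segment number, byte offset)."""
        path = self._path(self.segment)
        if os.path.exists(path) and os.path.getsize(path) + len(payload) > self.max_segment_bytes:
            self.segment += 1  # roll over; older segments are never written again
            path = self._path(self.segment)
        with open(path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(payload)
        return self.segment, offset

    def read(self, address: Tuple[int, int], length: int) -> bytes:
        segment, offset = address
        with open(self._path(segment), "rb") as f:
            f.seek(offset)
            return f.read(length)
```

Even this toy version shows where the extra work comes from: every reader and pointer must carry a segment number, and reclaiming space means rewriting live records into new segments while they may still be referenced.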

davisp · on March 7, 2011

The two main reasons that CouchDB doesn't use multiple files per database are system limits and increased complexity.

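CouchDB's design is somewhat unusual in that it accepts that people might run a large number of databases on a single node. I don't remember the exact numbers, but I think we've heard of deployments with 10K to 100K (small) databases on a single node. At that scale, multiplying the number of files per database would quickly run into per-process limits on open file descriptors.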

As to complexity: with a single file, there's no magical fsync dance to coordinate when committing data to disk. It's not impossible, it's just more complex.
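With a single file, a commit can be a short, fixed sequence. This is a minimal sketch that assumes a hypothetical record format (a 4-byte tag, a 4-byte length, then the body) and a JSON header; it is not CouchDB's on-disk format, and the function names are illustrative. New data is appended and fsynced first, then a header pointing at it is appended and fsynced; recovery trusts only the last complete header. A real format would also checksum headers, whereas this sketch only detects truncated records.

```python
import json
import os
import struct
from typing import Optional

TAG_DATA, TAG_HEADER = b"DATA", b"HDR1"  # hypothetical record tags
LEN = struct.Struct(">I")                 # 4-byte big-endian length prefix


def commit(path: str, data: bytes, root: dict) -> None:
    """Append data, make it durable, then append a header that points at it."""
    with open(path, "ab") as f:
        data_offset = f.seek(0, os.SEEK_END)
        f.write(TAG_DATA + LEN.pack(len(data)) + data)
        f.flush()
        os.fsync(f.fileno())  # 1) data reaches disk before any header refers to it
        header = json.dumps({"data_offset": data_offset, **root}).encode()
        f.write(TAG_HEADER + LEN.pack(len(header)) + header)
        f.flush()
        os.fsync(f.fileno())  # 2) header is durable: this is the commit point


def last_header(path: str) -> Optional[dict]:
    """Scan forward and return the most recent complete header, if any."""
    with open(path, "rb") as f:
        blob = f.read()
    pos, latest = 0, None
    while pos + 8 <= len(blob):
        tag = blob[pos:pos + 4]
        (length,) = LEN.unpack_from(blob, pos + 4)
        body = blob[pos + 8:pos + 8 + length]
        if len(body) < length:
            break  # torn write at the tail: anything after the last header is ignored
        if tag == TAG_HEADER:
            latest = json.loads(body)
        pos += 8 + length
    return latest
```

With several files per database, each file would need its own fsync, and the files would also have to agree on which combination of their contents forms a committed state; that cross-file ordering is the extra coordination a single file avoids.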

It's not out of the question that CouchDB will move to using multiple files per database, but since it's open source, the biggest roadblock so far is that no one has needed it badly enough to implement it.

rch · on March 7, 2011

I've noticed that interesting things tend to show up in BigCouch before CouchDB. So if I were starting a CouchDB-based project today, I'd start it with Cloudant.