As promised, I’ve jotted down scattered notes on my first experiences with CouchDB, with ample reliance on references. At this point I have very limited practical experience with CouchDB and more RTFM time, so this post will focus on a little of the theory behind the design.
My initial reaction to couch: it’s unlike any other database I’ve worked with before.
While it shares structural similarities with its NoSQL relatives*, its construction in Erlang and its ability to act as a single-solution web server make it stand apart. For the first time since I started working with web tools, I felt a convergence of the RESTful client/server concept with peer-to-peer distributed technology. Every CouchDB instance can act as both client and server, and CouchDB is being ported to a diverse set of platforms.
I’ll begin with a taste of the definitive guide to Couch, which captures the nature of this database design and communicates a clear message:
If there’s one word to describe CouchDB, it is relax. It is in the title of this book, it is the byline to CouchDB’s official logo, and when you start CouchDB, you see:
Apache CouchDB has started. Time to relax.
After installation^, the definitive guide walks you through some basic HTTP calls to the API. Yup, it’s a server and a DB all rolled into one. This simple concept eluded me when I first browsed the documentation. There’s also a built-in administration tool called Futon that guides you through creating or synchronizing databases.
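To make those first HTTP calls concrete, here is a minimal sketch in Python using only the standard library. It assumes a CouchDB instance at the default local address `http://127.0.0.1:5984`; the database name `notes` and the document id `first-post` are made up for illustration, and depending on the CouchDB version the database-creating `PUT` may also need admin credentials. The helper only builds requests, so nothing touches the network until `send` is called.

```python
import json
import urllib.request

COUCH_URL = "http://127.0.0.1:5984"  # assumption: default local CouchDB address


def couch_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the CouchDB API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        COUCH_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a request and decode the JSON reply (needs a running CouchDB)."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


# Requests in the spirit of the guide's first steps.
# "notes" and "first-post" are hypothetical names.
welcome = couch_request("GET", "/")            # server greeting
create_db = couch_request("PUT", "/notes")     # create a database
create_doc = couch_request(                    # store one JSON document
    "PUT", "/notes/first-post", {"title": "Time to relax"}
)
```

With a server running, `send(welcome)` should return the greeting document, and `send(create_db)` followed by `send(create_doc)` would create the database and store the document.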
CouchDB + Another Layer as Web Server is Redundant
After reading a little about CouchDB, I thought it would be best paired with something like multi Node.js as a web server. What I failed to realize was that CouchDB has a server baked into each database. According to this Stack Overflow question, CouchDB is capable of handling a heavy load all on its own.
CouchDB inherits super concurrency handling from Erlang, the language it was written in. Erlang uses lightweight processes and message passing between those processes to achieve excellent performance under high concurrent load. It will take advantage of all cpu cores, too.
Both should easily handle tens of thousands of connections, but I would expect CouchDB to handle concurrency better (and with less effort on your part) than Node. And keep in mind that Node adds some latency if you put it in front of CouchDB. That may only be noticeable if you have them on different machines, though.
Writing directly to Couch via TCPSocket is a-ok as long as you write a well-formed HTTP request that follows the spec. (You’re not passing a faux request…that’s a real HTTP request you’re sending just like any other.)
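As a rough sketch of what "a well-formed HTTP request" over a plain TCP socket could look like, the Python below assembles the request bytes by hand and then writes them to a socket. The function names, the `/notes/first-post` path, and the document contents are illustrative assumptions, not anything from the Stack Overflow answer.

```python
import json
import socket


def build_put_request(host, port, path, doc):
    """Assemble the raw bytes of an HTTP/1.1 PUT carrying a JSON document."""
    body = json.dumps(doc).encode("utf-8")
    head = (
        f"PUT {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"   # must match the body size exactly
        "Connection: close\r\n"              # ask the server to close when done
        "\r\n"                               # blank line ends the headers
    ).encode("ascii")
    return head + body


def send_raw(host, port, payload, timeout=5.0):
    """Write the bytes to a TCP socket and read the reply until the peer closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Hypothetical document and path; 5984 is CouchDB's default port.
raw_request = build_put_request(
    "127.0.0.1", 5984, "/notes/first-post", {"title": "relax"}
)
```

Calling `send_raw("127.0.0.1", 5984, raw_request)` against a running instance would return the raw HTTP response, status line and all. An HTTP library does the same thing with less room for error.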
How CouchDB is Architected
Internally, CouchDB stores JSON documents in a B-tree. This structure allows lookups, insertions, and deletions in O(log N) time. A diagram in the definitive guide’s eventual consistency section shows how a view request is handled.
An important restriction feeds directly into CouchDB’s ability to scale: documents are accessible only by key, and CouchDB relies on multi-version concurrency control (MVCC) rather than locking to manage concurrent access.
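As a toy illustration of the no-locking idea: CouchDB tags every document with a revision (`_rev`), and an update is accepted only if it names the current revision; otherwise the server answers with 409 Conflict. The in-memory class below imitates only that rule. `ToyStore`, `Conflict`, and the way the revision string is computed are inventions for this sketch, not CouchDB’s internals.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when an update names a revision that is no longer current."""


class ToyStore:
    """In-memory stand-in for revision-checked writes (not CouchDB's code)."""

    def __init__(self):
        self._docs = {}  # doc id -> (current revision, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return {**body, "_id": doc_id, "_rev": rev}

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # Someone else wrote first: refuse instead of waiting on a lock.
            raise Conflict(f"{doc_id}: expected {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        digest = hashlib.md5(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = ToyStore()
rev1 = store.put("note", {"text": "draft"})            # first write, no rev needed
rev2 = store.put("note", {"text": "final"}, rev=rev1)  # update names current rev
try:
    store.put("note", {"text": "stale"}, rev=rev1)     # rev1 is out of date
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True
```

Readers never block in this model. A writer holding an outdated revision is told to fetch the latest version and try again.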
When to use CouchDB
At the Berlin Buzzwords conference, Jan Lehnardt (coauthor of the definitive guide) described the compelling benefits of CouchDB as a platform of choice for distributed web apps:
- ease of installation
- document based
- JSON as the common subset of data types shared by all programming languages
- Well-behaved HTTP/REST interface and API
- Couch Apps
- Ability to scale up and DOWN
- Availability on many platforms/devices, also mobile (Android, Nokia’s Maemo/MeeGo and hints on iPhone versions)
- Built-in synchronisation, conflict handling and replication
- Database change notification support via HTTP-Socket
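The change notification item in the list above refers to CouchDB’s `_changes` feed. As a rough sketch, the continuous variant of that feed (`GET /<db>/_changes?feed=continuous`) streams one JSON object per line, with blank lines as heartbeats. The parser below works on any iterable of lines, so it can be tried without a server. The function name and the sample lines are made up for illustration, and newer CouchDB versions report `seq` as a string rather than a number.

```python
import json
from typing import Iterable, Iterator


def parse_changes(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one dict per change row from a line-delimited _changes stream."""
    for raw in lines:
        line = raw.strip()
        if not line:            # blank line is a heartbeat: nothing changed
            continue
        row = json.loads(line)
        if "last_seq" in row:   # closing row sent when the feed ends
            break
        yield row


# Hypothetical sample of what the feed might send (revisions shortened).
sample = [
    b'{"seq":1,"id":"first-post","changes":[{"rev":"1-abc"}]}\n',
    b"\n",
    b'{"seq":2,"id":"first-post","changes":[{"rev":"2-def"}],"deleted":true}\n',
    b'{"last_seq":2}\n',
]
changed_ids = [row["id"] for row in parse_changes(sample)]
```

Against a live server, the same function could be fed the response object returned by `urllib.request.urlopen`, which can be iterated line by line.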
*= There are many NoSQL databases: Redis, MongoDB, Cassandra, MemcacheDB, Bigtable, etc.
^= My very limited experience is with GeoCouch, Tyler’s choice for Victus Media’s location-based GarageDollar service. Installation instructions from source for GeoCouch can be found here at Jason Smith’s batteries-included CouchDB build system (see Cheat Codes). Starting with Ubuntu 9.10, vanilla CouchDB comes pre-installed on Ubuntu distributions.
Just a few of my favorite CouchDB references:
- The Definitive Guide to CouchDB
- The Apache CouchDB Wiki
- Jason Smith’s build-couchdb
- vmx’s GeoCouch: the future is now
- Getting Started
- Stack Overflow’s CouchDB tagged questions