Justin starts a blog

The Elephant on the Couch

rainblog:

I’ve been playing around with CouchDB for a personal project, and finding a lot to like about it. After years of working with SQL databases, adapting to the CouchDB way of doing things requires some mental readjustment, but it definitely has some powerful features.

One of the big differences between CouchDB and a traditional relational database is the way that you get information out of it. In a relational database, you write your request in a query language like SQL and the DBMS goes away, evaluates the query, and returns whatever it can find that matches. If you don’t get what you want, you fiddle with your SQL and try again. Depending on how you organize your database and how you write your SQL, your query can return in fractions of a second or multiple hours.

CouchDB takes a different tack. You don’t, generally speaking, write arbitrary queries. Rather, you figure out in advance what you’re going to need and write a JavaScript map function that will pick the necessary information out of each document. CouchDB uses this function to build a view, which is an index over the document store. The closest equivalent to a SQL query in CouchDB involves asking CouchDB to give you back the contents of a view, or some part thereof. Moreover, when you add a new document to the store, CouchDB checks it against your JavaScript and indexes it into the predefined views so that it can return it lightning-fast in future.
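
To make the idea concrete, here is a minimal sketch in plain JavaScript of what a map function and the resulting view look like. It is a simulation, not CouchDB itself: the documents, the field names, and the helpers `buildView` and `queryView` are all hypothetical, and `emit` is passed in as an argument here, whereas in CouchDB it is a global that the map function simply calls.

```javascript
// Hypothetical documents; the field names are illustrative only.
const docs = [
  { _id: "a", type: "post", author: "justin", title: "Hello" },
  { _id: "b", type: "comment", author: "marco" },
  { _id: "c", type: "post", author: "rain", title: "Elephant" },
];

// CouchDB-style map function: called once per document, it emits a
// (key, value) row for every document that belongs in the view.
// In CouchDB the signature is function(doc) and emit is a global;
// emit is a parameter here only so the sketch runs standalone.
function postsByAuthor(doc, emit) {
  if (doc.type === "post") {
    emit(doc.author, doc.title);
  }
}

// Hypothetical helper: run the map function over every document and
// keep the emitted rows sorted by key, which is what makes this an index.
function buildView(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Hypothetical helper: "querying" means reading rows back out of the
// prebuilt index, not evaluating anything against the documents.
function queryView(rows, key) {
  return rows.filter((row) => row.key === key);
}

const view = buildView(docs, postsByAuthor);
console.log(queryView(view, "rain"));
```

The point of the sketch is that all the work happens in `buildView`; by the time anyone asks a question, the answer is already sitting in the sorted rows.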

And here’s where the elephant comes in, the dirty little secret about CouchDB that you won’t read about in most of the documentation. If you add or change a document, things go very fast. But if you add or change a view, then CouchDB has to rebuild the view. That means that it needs to re-run the view’s JavaScript against every document in the store. And that is slow, slow, slow.

CouchDB recomputes views on demand. If you change ‘view1’, nothing will happen until you first ask for some results from ‘view1’. Then it’ll go off and rebuild the view. And that will take time. In my test store, which contains 130,000 modestly-sized documents, re-computing a very simple view took 30 minutes on a lightly-loaded dual Xeon. The first user who queries that view immediately after I’ve changed its definition is going to be sitting there for half an hour waiting for an answer. The second user who executes the same query, of course, is going to get his results in an eyeblink. But that first half-hour wait comes as a shock if you’re not expecting it.
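
The on-demand behaviour can be sketched as a toy model. This is an illustration of the pattern described above, not CouchDB's implementation: the `LazyView` class, its fields, and the three-document store are hypothetical. The last lines just do the arithmetic on the figures quoted above.

```javascript
// Toy model of a lazily rebuilt view (hypothetical, not CouchDB code).
class LazyView {
  constructor(store) {
    this.store = store;   // array of documents
    this.mapFn = null;    // current view definition
    this.rows = null;     // null means "index is stale, rebuild on next query"
    this.mapCalls = 0;    // how many times the map function has been run
  }

  // Changing the definition is cheap: it only marks the index as stale.
  define(mapFn) {
    this.mapFn = mapFn;
    this.rows = null;
  }

  // The first query after a change pays for the full rebuild;
  // later queries reuse the stored rows.
  query() {
    if (this.rows === null) {
      const rows = [];
      for (const doc of this.store) {
        this.mapCalls += 1;
        this.mapFn(doc, (key, value) => rows.push({ key, value }));
      }
      this.rows = rows;
    }
    return this.rows;
  }
}

const store = [{ n: 1 }, { n: 2 }, { n: 3 }];
const lazy = new LazyView(store);

lazy.define((doc, emit) => emit(doc.n, null));
const callsAfterDefine = lazy.mapCalls;      // nothing has run yet
lazy.query();
const callsAfterFirstQuery = lazy.mapCalls;  // one pass over the store
lazy.query();
const callsAfterSecondQuery = lazy.mapCalls; // unchanged: index reused

lazy.define((doc, emit) => emit(doc.n * 2, null));
lazy.query();
const callsAfterRedefine = lazy.mapCalls;    // a second full pass

// Throughput implied by the figures above: 130,000 documents in 30 minutes.
const docsPerSecond = 130000 / (30 * 60);
console.log(callsAfterDefine, callsAfterFirstQuery, callsAfterSecondQuery,
            callsAfterRedefine, docsPerSecond.toFixed(1));
```

In the model, the cost of a definition change is invisible until someone queries, which is exactly why the first user is the one who waits.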

I’m not really ragging on CouchDB for this. It’s difficult to see how it could be any other way, and it’s true that time spent in one place is saved in another. Still, I can’t help wondering how this would play out in production with a non-trivial document store. If someone comes to you with a new requirement for your SQL database, you write a new query and there’s the end of it. If someone needs something new out of your CouchDB document store or, Heaven forbid, you didn’t get the definition right the first time, you’re looking at noticeable downtime.

I haven’t yet checked to see whether you can mitigate the problem by using separate design documents, but that feels rather kludgey. Moreover, one thing that is clear is that CouchDB is doing some unnecessary work: if I have an existing ‘view1’ and I add a new ‘view2’, my first request for data from ‘view1’ will take half an hour. I don’t know if CouchDB is just computing the new ‘view2’ or if it’s rebuilding the unchanged ‘view1’ as well, but I do know that I’m waiting an awfully long time for that prompt to come back.
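
For reference, the separate-design-documents layout would look roughly like the sketch below. A design document is an ordinary JSON document whose `_id` starts with `_design/` and whose `views` field holds each map function as a string. The document names, view names, and the `type` field are hypothetical, and whether this split actually avoids the long wait is the untested question above.

```javascript
// Layout 1: both views live in one design document (names are hypothetical).
const combined = {
  _id: "_design/reports",
  views: {
    view1: { map: "function (doc) { if (doc.type === 'post') emit(doc._id, null); }" },
    view2: { map: "function (doc) { if (doc.type === 'comment') emit(doc._id, null); }" },
  },
};

// Layout 2: one design document per view, so that editing or adding
// view2 never touches the document that defines view1.
const separate = [
  {
    _id: "_design/reports_view1",
    views: {
      view1: { map: "function (doc) { if (doc.type === 'post') emit(doc._id, null); }" },
    },
  },
  {
    _id: "_design/reports_view2",
    views: {
      view2: { map: "function (doc) { if (doc.type === 'comment') emit(doc._id, null); }" },
    },
  },
];

// Both layouts define the same set of views; only the grouping differs.
const combinedNames = Object.keys(combined.views).sort();
const separateNames = separate.flatMap((d) => Object.keys(d.views)).sort();
console.log(combinedNames, separateNames);
```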

There’s much to like about CouchDB, but if you’re planning to use it seriously, you need to be aware of that elephant.

I also found this model frustrating, especially given how much I love JavaScript. My current thinking on NoSQL is to use SQL for lists of things and NoSQL for storing big chunks of data and doing quick lookups. Sort of like an array vs. a hash.

via rainblog
Posted on Thursday, October 29 2009.
justinday reblogged this from rainblog.