May 29 2007

A project I don’t want to do

Published by Brian at 5:13 pm under Uncategorized

I’ve had this idea kicking around for a while, and I’ve decided that I don’t want to do it. But I think it would, quite likely, be a very popular idea. So I thought I’d throw it out, and if anyone else wants to do it, they can have at it.

The basic idea is this: the vast majority of people who are using databases at this point don’t really want a database. Specifically, they don’t care about the relational calculus or SQL, and indeed go to great pains to hide the fact that these even exist, let alone take advantage of them.

What they really want is a shared persistent object store that takes care of synchronization issues.

So give it to them. Ideas on how to do this after this short commercial break.

The first thing is, when I say an “object store”, what I really mean is a tuple and blob store. A tuple, in this context, is an arbitrary ordered collection of values with associated types (int, string, float, etc.). A tuple can also contain references to other tuples as members, and so on. Note that all typing (to the extent types are checked at all) is dynamic, and each tuple has its own, unique, type. This makes it very easy to mix and match types- for example, in the program it may be an array of GeometricObject types, whose elements can be either Lines (four variables) or Points (two variables). In the tuple store, it’s just a single tuple containing references to other tuples, some of which are tuples of two floats (the points) and some of which are tuples of four floats (the lines). The advantage of this sort of object model is that just about any other object model can be layered on top of it.
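To make the Lines-and-Points example concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Ref` class, the in-memory `store` dict, and the `describe` helper are hypothetical names, not part of any existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical marker for a reference to another tuple in the store."""
    tuple_id: int


# A toy in-memory store: tuple id -> tuple of plain values or Refs.
# The store itself knows nothing about Points or Lines.
store = {
    1: (0.0, 0.0),            # what the program calls a Point: two floats
    2: (0.0, 0.0, 3.0, 4.0),  # what the program calls a Line: four floats
    3: (Ref(1), Ref(2)),      # the "array of GeometricObject": just references
}


def describe(tuple_id):
    """Layer a program-side object model on top of the untyped tuples."""
    values = store[tuple_id]
    if all(isinstance(v, float) for v in values):
        if len(values) == 2:
            return "Point"
        if len(values) == 4:
            return "Line"
    return "Unknown"


# Walk the references held by tuple 3 and classify each target.
kinds = [describe(ref.tuple_id) for ref in store[3]]
```

The store only ever sees tuples and references; deciding that a pair of floats is a Point is entirely the program’s business.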

Each tuple should have a unique identifier, which allows a program to request a specific tuple without having to figure out a search query for it. Which is good, because programmers don’t want to write search queries in other languages, to be executed somewhere else. If they did, they’d be using SQL (this is one thing SQL is really good at- doing sorting and searching for you). No, if the programmer wants to find an object in a table, they’ll write a quick for loop to walk down the array (tables are arrays, don’t you know?) looking for the right object.
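A rough sketch of what that looks like from the programmer’s side, assuming a hypothetical `TupleStore` class with `put` and `get` methods (names invented for this example):

```python
import itertools


class TupleStore:
    """Toy in-memory stand-in for the shared store (hypothetical API)."""

    def __init__(self):
        self._tuples = {}
        self._next_id = itertools.count(1)

    def put(self, values):
        """Store a tuple and hand back its unique identifier."""
        tuple_id = next(self._next_id)
        self._tuples[tuple_id] = tuple(values)
        return tuple_id

    def get(self, tuple_id):
        """Fetch a tuple directly by identifier; no query language involved."""
        return self._tuples[tuple_id]


store = TupleStore()
alice = store.put(("alice", 31))
bob = store.put(("bob", 27))
people = store.put((alice, bob))  # a "table" is just a tuple of identifiers


def find_by_name(store, table_id, name):
    """The quick for loop: walk the table looking for the right object."""
    for member_id in store.get(table_id):
        if store.get(member_id)[0] == name:
            return member_id
    return None
```

Calling `find_by_name(store, people, "bob")` returns the identifier stored in `bob`. It is a linear scan, which is exactly the trade being made: no query language, and no help with searching either.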

For synchronization, I’d steal a leaf from the hardware engineers and implement a MESI protocol. Each object backed by the store is in one of four states: Modified, Exclusive, Shared, or Invalid. An Invalid object is one where the tuple data is not stored locally, and before the object can be either read or written to, it has to be first fetched from the server. Shared objects are objects where the state is current locally, but other copies can exist in other programs- in other words, the object can be read but not written to. Exclusive objects have been locked to the local program- they can be written to, but haven’t been yet. In other words, the state locally and on the server is still the same. Modified objects are objects that have had their internal state changed locally, but these changes have not been pushed back to the server.
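The four states and what each one permits can be written down in a few lines. This is only a sketch of the rules as described above; the `State` enum and the helper function names are made up for illustration.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # changed locally, not yet pushed back to the server
    EXCLUSIVE = "E"  # locked to this program, still identical to the server
    SHARED = "S"     # current locally, other programs may hold copies
    INVALID = "I"    # no local data, must be fetched before any use


def can_read(state):
    """Anything except Invalid holds current data and may be read."""
    return state is not State.INVALID


def can_write(state):
    """Only a program holding the object alone may write to it."""
    return state in (State.MODIFIED, State.EXCLUSIVE)


def after_local_write(state):
    """A write to an Exclusive (or already Modified) object leaves it Modified."""
    if not can_write(state):
        raise PermissionError(f"cannot write an object in state {state.name}")
    return State.MODIFIED
```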

The important idea here is that communication works both ways- state changes can be initiated by both the program and the server. The program can ask the server (or tell the server) to upgrade or downgrade an object’s state- for example, a function which modifies the object’s state could upgrade a Shared object to an Exclusive object before performing the modification, by sending a message to the server. The server could then send a message to the other programs that also have Shared instances of that object, asking them to downgrade their status to Invalid, before replying “OK, you’ve got the object exclusively”. Some time after the modification to the object is made, the server can then request that the Modified object be downgraded to, say, Shared- at which point the program needs to update the state of the object on the server. Clever library writers should be able to handle this protocol automatically using decorators- but that’s a language dependent issue.
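Here is one way the decorator idea might look in Python. It is a single-process toy, assuming direct method calls stand in for network messages; `Server`, `CachedObject`, `reads`, `writes`, and `Counter` are all hypothetical names, and a real implementation would need locking, error handling, and an actual wire protocol.

```python
import functools
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Server:
    """Toy server: authoritative values plus a record of who holds copies."""

    def __init__(self):
        self.values = {}   # object id -> authoritative value
        self.holders = {}  # object id -> set of client-side copies

    def fetch(self, obj):
        """Program asks for a Shared copy; writers must flush and downgrade."""
        holders = self.holders.setdefault(obj.oid, set())
        for other in holders:
            if other is not obj:
                other.downgrade(State.SHARED)
        holders.add(obj)
        return self.values[obj.oid]

    def acquire_exclusive(self, obj):
        """Program asks for Exclusive; every other copy is invalidated first."""
        holders = self.holders.setdefault(obj.oid, set())
        for other in list(holders):
            if other is not obj:
                other.downgrade(State.INVALID)
                holders.discard(other)
        holders.add(obj)
        return self.values[obj.oid]

    def write_back(self, oid, value):
        self.values[oid] = value


class CachedObject:
    """Client-side copy of one stored object."""

    def __init__(self, server, oid):
        self.server, self.oid = server, oid
        self.state, self.value = State.INVALID, None

    def downgrade(self, new_state):
        """Server-initiated: push local changes back, then drop to new_state."""
        if self.state is State.INVALID:
            return
        if self.state is State.MODIFIED:
            self.server.write_back(self.oid, self.value)
        self.state = new_state
        if new_state is State.INVALID:
            self.value = None


def reads(method):
    """Decorator: make sure the object is at least Shared before reading."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.state is State.INVALID:
            self.value = self.server.fetch(self)
            self.state = State.SHARED
        return method(self, *args, **kwargs)
    return wrapper


def writes(method):
    """Decorator: upgrade to Exclusive before writing, mark Modified after."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.state in (State.INVALID, State.SHARED):
            self.value = self.server.acquire_exclusive(self)
            self.state = State.EXCLUSIVE
        result = method(self, *args, **kwargs)
        self.state = State.MODIFIED
        return result
    return wrapper


class Counter(CachedObject):
    @reads
    def get(self):
        return self.value

    @writes
    def increment(self):
        self.value += 1


server = Server()
server.values["hits"] = 0
a, b = Counter(server, "hits"), Counter(server, "hits")
a.get()        # a fetches the object and becomes Shared
b.get()        # b fetches the object and becomes Shared
a.increment()  # a upgrades to Exclusive, b is invalidated, a ends up Modified
```

At this point the server still holds the old value. The next `b.get()` makes the server ask `a` to downgrade to Shared, which is when `a` pushes its change back. The class author only writes `get` and `increment`; the decorators carry the protocol.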

The last feature required is absolute, brain-dead simplicity to install. If you have to read more than a page of information, and spend more than half an hour to set up and tune an installation, that’s too complex. And any maintenance requirements more complex than a nightly backup of the files are too complex. Heck, count yourself lucky if they bother to back up the files every night.

So that’s the idea: the crowd is headed off that direction anyways, you might as well follow from in front and declare yourself a leader. I’ve thought about it, and this is just not an idea I’m interested in pursuing. So here it is- I present it to you gratis.

If people are interested, I might do a post on some of the things wrong with the idea.

10 Responses to “A project I don’t want to do”

  1. Adam Rosien on 29 May 2007 at 6:24 pm

    We’ve built something similar at the company I work for, Sharpcast (http://www.sharpcast.com), though we have a different take on the synchronization protocol. CouchDB is also similar.

  2. Brian on 30 May 2007 at 2:20 pm

    Good. The project exists, and I don’t have to build it, so it’s not on my conscience. :-)

  3. Robert on 31 May 2007 at 3:29 pm

    Adam –

    Have you open sourced that project from Sharpcast?

    Nobody in particular –

    CouchDB really wants to meet Mercurial.

  4. Sandro Magi on 02 Jun 2007 at 9:37 am

    This “object store” is the approach taken by the web-calculus as implemented in the Waterken server:

    http://waterken.sourceforge.net/

    Bonus: each object assigned a GUID can now be trivially exported over a network as a URL. Each database is essentially a single-threaded, isolated event loop (like a Vat in E), and you now have a distributed object system instead of a local persistent object system.

    However, there will always be a need to query some data. This can be solved by creating and persisting indexes of the data of interest, and keeping each index in sync with changes to the data structure.

  5. Robert on 04 Jun 2007 at 1:26 pm

    Interestingly, there seems to be an undercurrent of “drop the RDBMS” in the hip Rails circles as a way of increasing performance. Here is an example.

  6. Andre Pang on 06 Jun 2007 at 7:56 am

    Sounds a lot like Apple’s Core Data, although Core Data is Cocoa-only, and thus has a limited audience. It’s the bee’s knees for Mac programmers, though!

  7. Brian on 06 Jun 2007 at 2:10 pm

    Let me take one quote from that article and rip it to shreds:

    Why does YouTube need an RDBMS? It serves a file that people can comment on.

    If that’s all the website does, then no, a RDBMS is probably not needed. But my second law of programming is that the complexity of a program increases over time (my first law is that kludges multiply).

    But once I start doing more than that, things start getting complex. Say I want to be able to view all the comments made by a given commenter. Or say I want some sort of index of what videos are about. Welcome to complexity, and it only gets worse from here.

    And, if you’re not using the relational part of the RDBMS, its performance is going to suck.

  8. Robert on 07 Jun 2007 at 5:14 am

    “But once I start doing more than that, things start getting complex. Say I want to be able to view all the comments made by a given commenter. Or say I want some sort of index of what videos are about. Welcome to complexity, and it only gets worse from here.

    “And, if you’re not using the relational part of the RDBMS, its performance is going to suck.”

    I’m not so sure that’s true. Check out Lucene: while it’s not blazing fast, it’s hard to sell it as “sucking”.

  9. Enfranchised Mind on 24 Sep 2007 at 9:58 am

    CouchDB — The Project Brian Didn’t Want to Do…

    CouchDB seems to be the implementation of the Project Brian Doesn’t Want To Do.

  10. KWB on 07 Mar 2008 at 10:20 am

    I haven’t used it yet, but Hypertable is modeled on Google’s BigTable and may come close to what you’re talking about:

    http://hypertable.org/index.html
