
I'm Sebastien Orban and I'm a hacker and a painter.

22 March 2014

Datomic

Datomic is a small revolution in the database world - at least for me. It's not like the other NoSQL stores that reduce everything to a list of keys: it's a database that tries to build a real sense of time into its own content, without forgetting that most of the time data has to be linked to be meaningful. The way it achieves this is quite logical and simple: we don't record only discrete components, but also the transactions that led to their existence. To compare it with something better known, it's a bit like Git, but for data.
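To make the "Git for data" idea a bit more concrete, here is a minimal Python sketch. It is not the Datomic API: it assumes a simplified log where each fact carries the number of the transaction that introduced it, and where a later value for the same entity and attribute simply replaces the earlier one. The `as_of` name and the sample data (including the change to "Smith" in transaction 15) are made up for the illustration.

```python
# Toy log of facts: (entity id, attribute, value, transaction number).
# Hypothetical data; a real Datomic log also records retractions explicitly.
log = [
    (42, "name", "Jane", 10),
    (42, "lastname", "Doe", 10),
    (42, "lastname", "Smith", 15),
]

def as_of(log, tx):
    """Rebuild the state as it was right after transaction `tx`."""
    state = {}
    # Replay facts in transaction order, ignoring anything newer than `tx`.
    for entity, attribute, value, fact_tx in sorted(log, key=lambda f: f[3]):
        if fact_tx <= tx:
            state[(entity, attribute)] = value
    return state

print(as_of(log, 10))  # the past is still there
print(as_of(log, 15))  # the present
```

Because nothing is overwritten in the log, asking for an older transaction number gives back the older last name.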

A discrete component is described by a unique id, some attributes with their values, and a unique transaction identifier. If we want to describe a person, we would get this in the database:

Entity id : 42 | entity attribute : name | entity value : Jane | transaction : 10

Entity id : 42 | entity attribute : lastname | entity value : Doe | transaction : 10
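As a rough illustration of these two rows, here is a small Python sketch. It is only a toy model, not Datomic's API: the `Datom` tuple and the `entity` helper are names chosen for this example.

```python
from collections import namedtuple

# Toy model only: one fact = entity id, attribute, value, transaction id.
Datom = namedtuple("Datom", ["entity", "attribute", "value", "tx"])

datoms = [
    Datom(42, "name", "Jane", 10),
    Datom(42, "lastname", "Doe", 10),
]

def entity(datoms, entity_id):
    """Collect every attribute/value pair recorded for one entity id."""
    return {d.attribute: d.value for d in datoms if d.entity == entity_id}

jane = entity(datoms, 42)
print(jane)
```

The person is never stored as one record: it is simply whatever facts share the entity id 42.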

What we can see here is that an entity id, even if it's unique, can have a lot of attributes: the entity id is what links them. The transaction has the same value in both rows since we entered the values in the same transaction; if we had entered them in two transactions, the transaction values would differ. So, to get a full entity, we just request the unique entity id and get all the related attributes... but it also means that if we want to link to another entity, we just create an attribute with a ref type and give it, as its value, the unique id we want to link to. Simple, isn't it?
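Linking through a ref can be sketched in the same toy style. Again, this is plain Python and not Datomic's API; the `friend` attribute, the second entity (43, "John"), the transaction numbers 11 and 12, and the `follow` helper are all invented for the illustration.

```python
from collections import namedtuple

# Toy model only: one fact = entity id, attribute, value, transaction id.
Datom = namedtuple("Datom", ["entity", "attribute", "value", "tx"])

datoms = [
    Datom(42, "name", "Jane", 10),
    Datom(42, "lastname", "Doe", 10),
    Datom(43, "name", "John", 11),   # entered in a different transaction
    Datom(42, "friend", 43, 12),     # ref: the value is another entity id
]

def entity(datoms, entity_id):
    """Collect every attribute/value pair recorded for one entity id."""
    return {d.attribute: d.value for d in datoms if d.entity == entity_id}

def follow(datoms, entity_id, ref_attribute):
    """Return the entity a ref attribute points to, or None if it is not set."""
    target = entity(datoms, entity_id).get(ref_attribute)
    return None if target is None else entity(datoms, target)

friend = follow(datoms, 42, "friend")
print(friend)
```

Following the link is just a second lookup by entity id, using the value stored under the ref attribute.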

But let's step back a bit: I wrote that Datomic writes transactions. In fact, Datomic by itself does not write everything, it's all in memory... but for persistence, it uses a transactor that writes the transactions to another database. What do we gain with that? True ACID compliance: "In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably" (see this article on [Wikipedia](http://en.wikipedia.org/wiki/ACID)).

ACID is cool, especially compared to other key-value databases that favor availability over consistency, but that's not all. In most of the NoSQL world, we're mostly schemaless: this is cool if you're still exploring your data, less so if you want to be able to use it reliably in the long term. Datomic is not like that: we have schemas, and they are implemented in a very logical way: the same as our entity definitions, but in another partition. By partition, I just mean a logical separation between data: here we have a schema partition and a data one, but we're by no means limited to those, even if I don't see a use case for more right now.

Other cool things are the use of Datalog as the query language (I'll come back to it in another post), the fact that you can use Java/Clojure code as database functions, and the fact that it comes from the creator of Clojure. Rich Hickey is a guy who loves simple (but not easy), predictable things - very important stuff in the database world, isn't it? It also means that we can now have live code in the database that extends and enriches it: adding other metadata to each transaction automatically (a username, for example), creating new data types and checking their consistency, and so on.

The biggest hurdle is getting acquainted with a different way of thinking, since it doesn't use SQL but a relative of Prolog as its query language: Datalog. With it, you build your result by joining together small queries that look like this: [?c :community/name "belltown"]. Since it follows the resource description {entityId id, entityattribute attribute, entityvalue value}, we can easily deduce that ?c will be the resulting id where the attribute is :community/name with the value "belltown". If we had wanted the value of an attribute for an id, we would have written [1001 :community/name ?c] or, if we only wanted the list of attributes for an id, [1001 ?c]. Easy, isn't it? For a join, we can reuse the resulting value (?c) in another query, like this: [[?c :community/name "belltown"] [?c :community/url ?d]]. Here we would get the list of ids in ?c and the list of community URLs in ?d. It's quite easy to understand, but since it's not what we (well, at least most of us) use all day long, it can take a while to get used to this way of thinking.
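To show how such clauses bind variables and join on a shared one, here is a small pattern matcher in Python. It is a sketch of the idea only, not how Datomic evaluates Datalog: the `match` and `query` helpers, the second community ("ballard"), the id 1002 and the example.com URLs are all assumptions made for the illustration.

```python
# Facts as (entity id, attribute, value) triples. Hypothetical sample data.
facts = [
    (1001, ":community/name", "belltown"),
    (1001, ":community/url", "http://example.com/belltown"),
    (1002, ":community/name", "ballard"),
    (1002, ":community/url", "http://example.com/ballard"),
]

def is_var(term):
    """A term starting with '?' is a variable, anything else is a constant."""
    return isinstance(term, str) and term.startswith("?")

def match(clause, fact, binding):
    """Unify one clause with one fact; return the extended binding or None."""
    new = dict(binding)
    # zip stops at the shorter sequence, so a clause may omit trailing positions.
    for term, value in zip(clause, fact):
        if is_var(term):
            if term in new and new[term] != value:
                return None  # variable already bound to something else
            new[term] = value
        elif term != value:
            return None  # constant does not match this fact
    return new

def query(clauses, facts):
    """Apply the clauses one after another, keeping only consistent bindings."""
    bindings = [{}]
    for clause in clauses:
        next_bindings = []
        for binding in bindings:
            for fact in facts:
                extended = match(clause, fact, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

joined = query(
    [("?c", ":community/name", "belltown"), ("?c", ":community/url", "?d")],
    facts,
)
print(joined)
```

The join needs no special syntax: because ?c appears in both clauses, only facts about the same entity id survive the second step.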

As you can see, I'm a bit enamored with Datomic - even if, to be honest, there are still huge flaws for my own use case: it's based on Java, and this comes with all the problems of running a JVM on a system, like the memory requirements and the processing power, plus the fact that there's a storage database and a "live" part, the docs that were pretty light for a long time, the cryptic Java errors and exceptions... All of this is evolving pretty fast, though! And don't tell me that we can already have that in a good RDBMS with SQL, triggers, views and so on: I know. Does that mean we should stop looking forward and trying to get better? I don't think so.