Thanks Kyle! It is evident that our docs were insufficient. Rich and I have written what we hope is clearer, more comprehensive documentation about Datomic’s transaction model. We hope that this can preempt common misconceptions and we welcome all feedback!
Formal verification is very powerful but still not full assurance. Fun fact: Testing and monitoring of Datomic has sometimes uncovered design flaws in underlying storages that formal verification missed.
To start with, you usually perform verification on a behavioral model and not on the code itself. This opens up the possibility that there are behavioral differences between the code itself and the model which wouldn't be caught.
It is worth noting here that Datomic's intra-transaction semantics are not a decision made in isolation, they emerge naturally from the information model.
Everything in a Datomic transaction happens atomically at a single point in time. Datomic transactions are totally ordered, and this ordering is visible via the time t shared by every datom in the transaction. These properties vastly simplify reasoning about time.
With this information model intermediate database states are inexpressible. Intermediate states cannot all have the same t, because they did not happen at the same time. And they cannot have different ts, as they are part the same transaction.
When we designed Datomic (circa 2010), we were concerned that many languages had better support for lists than for sets, in particular list literals and no set literals.
Clojure of course had set literals from the beginning...
An advantage of using lists is that tx data tends to be built up serially in code. Having to look at your tx data in a different (set) order would make proofreading alongside the code more difficult.
In Clojure the answer is [2 2]. A beginner might guess [2 2] or [2 3]. Both are reasonable guesses, so a beginner needs to be quite careful!
But that isn't particularly interesting, because beginners always have to be quite careful. When you are learning any technology, you are a beginner once and experienced ever after. Tool design should optimize for the experienced practitioner. Immutability removes an enormous source of complexity from programs, so where it is feasible it is often desirable.
this is a lot less of a convincing argument when you consider the fact that we’re talking about a database interaction. If we want to do the FP weenie thing, I would assume a transaction context is monadic!
Another ding on your argument is that in the metaphor you _would_ get [2 3] if you aren’t in a transaction! Of course the reasons for this are not incorrect, but I think generally “the end result of this list of actions is different if you are in a transition for a part of it, even if you are the only client operating” is a bit surprising compared to default transaction semantics in other systems.
Of course, people should learn about their tools and not assume everything works exactly the same. I think it’s incorrect to assume that the choice made here was the only one that would make sense for Datomic.
As a proponent of just such tools I would say also that "enough rope to shoot(?) yourself" is inherent in tools powerful enough to get anything done, and is not a tradeoff encountered only when reaching for high power or low ceremony.
As Jepsen confirmed, Datomic’s mechanisms for enforcing invariants work as designed. What does this mean practically for users? Consider the following transactional pseudo-data:
[
[Stu favorite-number 41]
;; maybe more stuff
[Stu favorite-number 42]
]
An operational reading of this data would be that early in the transaction I liked 41, and that later in the transaction I liked 42. Observers after the end of the transaction would hopefully see only that I liked 42, and we would have to worry about the conditions under which observers might see that 41.
This operational reading of intra-transaction semantics is typical of many databases, but it presumes the existence of multiple time points inside a transaction, which Datomic neither has nor wants — we quite like not worrying about what happened “in the middle of” a transaction. All facts in a transaction take place at the same point in time, so in Datomic this transaction states that I started liking both numbers simultaneously.
If you incorrectly read Datomic transactions as composed of multiple operations, you can of course find all kinds of “invariant anomalies”. Conversely, you can find “invariant anomalies” in SQL by incorrectly imposing Datomic’s model on SQL transactions. Such potential misreadings emphasize the need for good documentation. To that end, we have worked with Jepsen to enhance our documentation [1], tightening up casual language in the hopes of preventing misconceptions. We also added a tech note [2] addressing this particular misconception directly.
To build on this, Datomic includes a pre-commit conflict check that would prevent this particular example from committing at all: it detects that there are two incompatible assertions for the same entity/attribute pair, and rejects the transaction. We think this conflict check likely prevents many users from actually hitting this issue in production.
The issue we discuss in the report only occurs when the transaction expands to non-conflicting datoms--for instance:
[Stu favorite-number 41]
[Stu hates-all-numbers-and-has-no-favorite true]
These entity/attribute pairs are disjoint, so the conflict checker allows the transaction to commit, producing a record which is in a logically inconsistent state!
On the documentation front--Datomic users could be forgiven for thinking of the elements of transactions as "operations", since Datomic's docs called them both "operations" and "statements". ;-)
In order for user code to impose invariants over the entire transaction, it must have access to the entire transaction. Entity predicates have such access (they are passed the after db, which includes the pending transaction and all other transactions to boot). Transaction functions are unsuitable, as they have access only to the before db. [2]
Use entity predicates for arbitrary functional validations of the entire transaction.
Datomic transactions are not “operations to perform”, they are a set of novel facts to incorporate at a point in time.
Just like a git commit describes a set of modifications, do you or should you want to care about which order or how the adds, updates, and deletes occur in a single git commit? OMG no, that sounds awful.
The really unusual thing is that developers expect intra-transaction ordering to be a thing they accept from any other database. OMG, that sounds awful, how do you live like that.
It was a service ticket system where we moved from a boolean status to a string for more states (noob modeling mistake I know). The problem came up a few months later when a dev pulled every attribute, saw the old field, and wrote new code against the old field, leading to some really confused tickets.
Looking at that blog post though, it seems that using #9: Annotate your Schema could have helped us avoid this situation. Thats an interesting feature of Datomic I was unaware of.
I friend of mine hired the principles at Cognitect to kick off a Clojure/Datomic project. They were like a force of nature, really, and very quickly got things set up and helped with architectural decisions, etc. I would imagine that they have helped hundreds of customers this way.
From the blog post, I think that the answer to your question is yes: they will stop general consulting services.
Here's hoping they release a getting started project now that they've dipped out of general consulting.There must have been a standard-ish template in use.
https://docs.datomic.com/transactions/model.html