The fact that a developer can "corrupt" the database with non-atomic stuff is a hint to me that the database probably doesn't have the right referential integrity constraints in place and probably isn't normalized either. The relational model is designed from the ground up to keep the data in a consistent state, and that is one of its main value propositions.
This is true when interpreted the right way, but I don't think real world problems are always so tidy.
Let's consider Standard Toy Example #17: The Bank Account Transfer. In a simplistic model, we might have a table of bank accounts in our database, and a second table of deposits. Constraints will guarantee that we can't do things like creating a deposit for an account that doesn't exist, but they won't ensure that if we create a deposit of X in one account, we must also create a corresponding deposit of -X for some other account so we aren't inventing or destroying money.
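To make that concrete, here's a rough sketch of the simplistic schema (PostgreSQL-flavoured; the table and column names are made up for illustration):

```sql
-- Simplistic model: accounts plus individual deposits.
CREATE TABLE accounts (
    account_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    owner_name  TEXT NOT NULL
);

CREATE TABLE deposits (
    deposit_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    account_id  BIGINT NOT NULL REFERENCES accounts (account_id),
    amount      NUMERIC(12, 2) NOT NULL
);

-- The foreign key stops a deposit into a non-existent account, but nothing
-- here forces a +X deposit to be paired with a -X deposit elsewhere, so the
-- schema happily accepts money appearing out of thin air.
```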
Of course, in a more realistic implementation, you'd never represent transfers between accounts as two separate stages like that, and if there is any single place to look up deposits into specific accounts it's probably some sort of view or cache rather than the authoritative data. But rather like going from single-entry to double-entry bookkeeping, to avoid relying on something higher up our system to ensure consistency, we've had to restructure our whole data model and the schema implementing it. In a sense, this is still just normalising the data, but that's a bit like saying implementing a word processor is, in a sense, just refactoring and extending a text editor.
A related principle is to make illegal states unrepresentable. If you can possibly insert a debit and fail to insert the corresponding credit, thereby making money magically appear, the schema is wrong. This isn't just a matter of insufficient normalization; the schema simply doesn't represent what you claim it represents.
The database shouldn't be viewed as just some kind of scratch pad where you write stuff down so the application server doesn't have to remember it. If the database is the source of truth, then it is the model, and the application is simply a view.
This is a good principle to aim for but it is impossible to fulfill in many cases. There are always going to be invariants that the application may wish to enforce which cannot be enforced in the database (or, if there was some ideal schema to enforce them, the migration from the existing schema would be enormously costly). This is the whole reason we have ACID transactions in relational databases - so the application can enforce its invariants!
(Exercise for the readers: construct a set of invariants that cannot be enforced within a database schema.)
This is a good principle to aim for but it is impossible to fulfill in many cases. There are always going to be invariants that the application may wish to enforce which cannot be enforced in the database
As HN doesn't show moderations, I will just say that this is exactly the point I would have made if I'd been replying first. Relational databases are good at enforcing the kinds of relations provided by basic set manipulations. If you try really hard, you can encode some more complicated constraints, but as the complexity increases it becomes unwieldy and eventually impractical to enforce everything at schema level.
Edit: Changed "database level" to "schema level" to clarify that I'm not talking about stored procedures here.
I'm a little confused. A 'toy' bank app would have individual accounts with some amount of money in each. The problem arises when you subtract the debit from one account and error out before crediting the amount to the other account, or vice versa. Now you're in an inconsistent state.
How would you typically design a relational schema that could avoid this scenario?
The usual approach here would be to have a table of transactions instead of balances. The balances would be computed dynamically by summing the full transaction history for a particular account.
Of course this gets a bit unwieldy with large transaction histories, and has trouble with things like enforcing that you can't enter a transaction that draws a balance below zero.
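As a rough sketch (again PostgreSQL-flavoured, reusing the illustrative accounts table from above; nothing here is a real schema), the balance is derived rather than stored:

```sql
-- Per-account transaction history; the balance is never stored directly.
CREATE TABLE account_transactions (
    tx_id       BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    account_id  BIGINT NOT NULL REFERENCES accounts (account_id),
    amount      NUMERIC(12, 2) NOT NULL,  -- positive for credits, negative for debits
    created_at  TIMESTAMP NOT NULL DEFAULT now()
);

-- The current balance is computed by summing the history on demand.
SELECT COALESCE(SUM(amount), 0) AS balance
FROM account_transactions
WHERE account_id = 42;
```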
Well yeah, but that just makes the business rules even more complex. That makes it more like: reject the transaction if it would move the account balance below zero, unless the account has a flag to allow overdrafts (which lives over on the accounts table), in which case we allow the transaction and also enter another transaction for the overdraft fee.
Probably need some more rules like max overdraft amount, max number of transactions etc. I've heard some places may also have the "let's be jerks" rule to reorder transactions to hit overdraft as soon as possible to maximize fees.
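One possible shape for that overdraft rule, as a very rough sketch: this assumes PostgreSQL, the illustrative account_transactions table above, an allow_overdraft flag added to the accounts table, and a flat fee I made up; none of these come from a real system.

```sql
-- Withdrawal with an overdraft branch.
CREATE OR REPLACE FUNCTION post_withdrawal(p_account BIGINT, p_amount NUMERIC)
RETURNS VOID
LANGUAGE plpgsql
AS $$
DECLARE
    v_balance   NUMERIC;
    v_overdraft BOOLEAN;
BEGIN
    -- Lock the account row so concurrent withdrawals serialise here.
    SELECT allow_overdraft INTO v_overdraft
    FROM accounts
    WHERE account_id = p_account
    FOR UPDATE;

    SELECT COALESCE(SUM(amount), 0) INTO v_balance
    FROM account_transactions
    WHERE account_id = p_account;

    IF v_balance - p_amount < 0 AND NOT v_overdraft THEN
        RAISE EXCEPTION 'insufficient funds for account %', p_account;
    END IF;

    INSERT INTO account_transactions (account_id, amount)
    VALUES (p_account, -p_amount);

    IF v_balance - p_amount < 0 THEN
        -- Overdraft allowed: record the (made-up) fee as another transaction.
        INSERT INTO account_transactions (account_id, amount)
        VALUES (p_account, -25.00);
    END IF;
END;
$$;
```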
That sort of problem is actually relatively easy to solve in itself. In essence, you don't record the transactions on each account separately, you record the transfers between them (just as double-entry bookkeeping does). Since any transfer inherently balances, you can't be inconsistent in crediting money somewhere without debiting it somewhere else or vice versa. If you want to know what happened to an individual account, you can still construct that information by looking at the relevant sides of all the transfers into or out of that account, but your single source of truth is always the transfer records.
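A sketch of that shape, with invented names (PostgreSQL-flavoured, nothing authoritative):

```sql
-- Each row records both sides of a transfer, so it balances by construction.
CREATE TABLE transfers (
    transfer_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    from_account BIGINT NOT NULL REFERENCES accounts (account_id),
    to_account   BIGINT NOT NULL REFERENCES accounts (account_id),
    amount       NUMERIC(12, 2) NOT NULL CHECK (amount > 0),
    created_at   TIMESTAMP NOT NULL DEFAULT now(),
    CHECK (from_account <> to_account)
);

-- An individual account's history is reconstructed from both sides:
SELECT created_at, -amount AS delta FROM transfers WHERE from_account = 42
UNION ALL
SELECT created_at,  amount AS delta FROM transfers WHERE to_account = 42
ORDER BY created_at;
```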
Of course, it's still not that simple in any realistic system, because maybe you have other important constraints such as ensuring that accounts don't go into negative balances. In that case, before you can add any new transfer, you need to calculate the existing balance on the account you're taking money from, which means replaying the entire transaction log selecting for that account if you just have the naive schema described above, and then you have to create the new transfer record if and only if the constraint is satisfied. All of this has to be done atomically to make sure you can't have multiple transfers being recorded concurrently that would individually be valid but collectively leave an account overdrawn.
That takes you into the technicalities of how to process DB transactions with acceptable performance. Hopefully your database will help you if it's ACID compliant, but it still can't work miracles. For the simple two-sided transfers in this example, you can probably do a lot just using a database's built-in transaction support, but for more complicated constraints applied to more complicated schemas, at some point you have to take the practicalities of your database engine into account when designing that schema and deciding what constraints to enforce at this level.
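For instance, with the transfers sketch above on PostgreSQL, one way to make the balance check and the insert a single atomic step looks roughly like this (accounts 42 and 99 and the amount are made up):

```sql
BEGIN;

-- Lock the source account's row so concurrent transfers out of it serialise,
-- assuming every transfer takes this lock first.
SELECT account_id FROM accounts WHERE account_id = 42 FOR UPDATE;

-- Insert the transfer only if the replayed balance covers it.
INSERT INTO transfers (from_account, to_account, amount)
SELECT 42, 99, 100.00
WHERE (
    SELECT COALESCE(SUM(CASE WHEN to_account = 42 THEN amount ELSE -amount END), 0)
    FROM transfers
    WHERE 42 IN (from_account, to_account)
) >= 100.00;

COMMIT;

-- Zero rows inserted means the transfer was rejected; the application checks
-- the row count rather than trusting that it "probably worked".
```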
I think as long as you use stored procedures as atomic operations/transactions, it both simplifies code and improves performance. In your example, a transfer-money procedure should handle both the balance increment and decrement of all accounts involved in a transaction before returning. Things start to get hard to maintain if you have stored procedures that only complete a portion of the business logic and rely on the application code to "understand" what still needs to be done.
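Something like this, as a rough sketch: it assumes PostgreSQL and a simple accounts table with a balance column (which is not the transfer-log schema sketched further up), and the procedure name and amounts are invented.

```sql
-- The whole transfer is one atomic unit; callers never see a half-done state.
CREATE OR REPLACE PROCEDURE transfer_money(p_from BIGINT, p_to BIGINT, p_amount NUMERIC)
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE accounts SET balance = balance - p_amount WHERE account_id = p_from;
    UPDATE accounts SET balance = balance + p_amount WHERE account_id = p_to;
END;
$$;

-- Called as a single operation; both updates commit or roll back together.
CALL transfer_money(42, 99, 100.00);
```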