A related question - how you design a database for something as complex as a tax...

noir_lord · on July 25, 2015

The largest issue isn't actually storing the data (though that is a big issue); the biggest problem, from a system point of view, is the 'business' logic.

I've worked on some large, business-logic-heavy systems, and often what you find is that the database ends up as a dumb store and the logic is handled entirely on the application side. This is fine while the original system only speaks to the one application, but it becomes a massive problem once you want to allow other applications to access that data store. From experience, this is one of the reasons why large enterprise rewrites often fail: the specification on paper can't capture the massive complexity of edge cases in the application code, years and sometimes decades of if/else statements.

Of course, the inverse is when the database does have all the business logic and the client is dumb. While that is better, it often means you can't get the business logic out easily once the database has become obsolete (some of these systems aren't even SQL based, or use a dialect of SQL that barely looks like SQL; when you have stuff that was written in the '70s all bets are off).
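
To make "the database has the business logic" concrete, here is a minimal sketch using Python's built-in sqlite3. The table, the columns and the "taxable income" rule are all made up for illustration (this is not a real tax rule); the point is only that a CHECK constraint and a view apply to every client that connects, not just the original application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: validity rules live in the table definition.
    CREATE TABLE filing (
        taxpayer_id INTEGER NOT NULL,
        tax_year    INTEGER NOT NULL,
        income      INTEGER NOT NULL CHECK (income >= 0),
        deductions  INTEGER NOT NULL CHECK (deductions >= 0),
        PRIMARY KEY (taxpayer_id, tax_year)
    );
    -- The derived value lives in a view, so every client computes it the same way.
    CREATE VIEW filing_taxable AS
        SELECT taxpayer_id, tax_year,
               MAX(income - deductions, 0) AS taxable_income
        FROM filing;
""")


def try_insert(row):
    """Return True if the database accepted the row, False if a rule rejected it."""
    try:
        conn.execute("INSERT INTO filing VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = [
    try_insert((1, 2015, 50000, 12000)),
    try_insert((2, 2015, 8000, 12000)),
    try_insert((3, 2015, -1, 0)),  # negative income violates the CHECK constraint
]

taxable = dict(
    conn.execute(
        "SELECT taxpayer_id, taxable_income FROM filing_taxable ORDER BY taxpayer_id"
    )
)
```

The flip side is exactly the lock-in described above: the rule is now written in this engine's SQL dialect, and moving it elsewhere means translating the schema, not just copying the rows.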

Things like this are why IBM still sells zSeries with COBOL support for code that was written 30-40 years ago.

Not really sure what the solution is. I'd say some kind of business-specific, almost domain-specific language that goes beyond SQL or Java (but we tried that, it was COBOL ;) )

As for your suggestion about NoSQL, there is merit in that, but if you go that route you have all the complexity of the business logic in your application as well as much of the complexity that the database would otherwise have handled for you via SQL (government systems are mostly amenable to normalisation).

Possibly some kind of SQL/document hybrid (something like Postgres with JSONB) with a business logic layer over the top might work, but I've no idea what that would look like (possibly something like SAP's ABAP, which I haven't used, though people I know who have don't like it very much). Most of our current languages care more about pointers, bytes and functions than they do about actually modelling the system they represent.
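
For what it's worth, a rough sketch of the hybrid idea, under a lot of assumptions: relational columns for the parts you want keyed and constrained, a JSON document for the form contents, and a thin function layer on top. It uses sqlite3 with a TEXT column purely so it runs without a server; in Postgres the `lines` column would be JSONB. The table, column and function names are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE form_submission (
        taxpayer_id INTEGER NOT NULL,
        tax_year    INTEGER NOT NULL,
        form_code   TEXT    NOT NULL,
        lines       TEXT    NOT NULL,  -- JSON document (JSONB in Postgres)
        PRIMARY KEY (taxpayer_id, tax_year, form_code)
    )
""")


def save_form(taxpayer_id, tax_year, form_code, lines):
    """Store one form; the relational key is enforced, the body is free-form JSON."""
    conn.execute(
        "INSERT INTO form_submission VALUES (?, ?, ?, ?)",
        (taxpayer_id, tax_year, form_code, json.dumps(lines)),
    )


def load_form(taxpayer_id, tax_year, form_code):
    """Return the form body as a dict, or None if no such form was stored."""
    row = conn.execute(
        "SELECT lines FROM form_submission "
        "WHERE taxpayer_id = ? AND tax_year = ? AND form_code = ?",
        (taxpayer_id, tax_year, form_code),
    ).fetchone()
    return json.loads(row[0]) if row else None


def total_income(taxpayer_id, tax_year):
    """Business-logic layer: sum a hypothetical 'income' field across all forms."""
    rows = conn.execute(
        "SELECT lines FROM form_submission WHERE taxpayer_id = ? AND tax_year = ?",
        (taxpayer_id, tax_year),
    )
    return sum(json.loads(doc).get("income", 0) for (doc,) in rows)


save_form(1, 2015, "FORM_A", {"income": 40000, "employer": "Acme"})
save_form(1, 2015, "FORM_B", {"income": 500, "source": "interest"})
combined = total_income(1, 2015)
```

Note that the business logic here is back in the application, which is the trade-off already mentioned: the hybrid store only helps with the shape of the data, not with where the rules live.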

dragonwriter · on July 25, 2015

> Of course, the inverse is when the database does have all the business logic and the client is dumb. While that is better, it often means you can't get the business logic out easily once the database has become obsolete (some of these systems aren't even SQL based, or use a dialect of SQL that barely looks like SQL; when you have stuff that was written in the '70s all bets are off).

You have the same problem if the application implementation language becomes obsolete, even if the logic is not in the database, which your COBOL example illustrates.

That has nothing to do with where you put your business logic; it has to do with (one of the many reasons) why you need to have, and maintain, technology-neutral specifications documentation.

noir_lord · on July 25, 2015

> why you need to have, and maintain, technology-neutral specifications documentation.

In theory that is absolutely correct: with an unambiguous and current spec you can re-implement the software in something else without the original system. The problem there (other than trying to get programmers to keep the documentation in sync with the system...) is that a specification document written in English (or any natural language) is way too ambiguous to get away with that. So then you decide to use a constrained version of English to write your spec, and then someone says "well, why can't we get the machine to understand this?" and... COBOL again ;).

1stranger · on July 25, 2015

A good insight into this is to take a look at how e-filing works. It's been several years since I've looked at it but you could find the documentation and XML schema files online from the IRS that companies like Intuit and others use to transmit their e-filings.

From what I recall the XML corresponded one to one with the paper forms. If you think about how the paper forms work it's pretty straightforward. You go top to bottom entering data and making calculations. Sometimes there are rules (if this line is less than X, goto Y) that you would have to write somehow.
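
As a rough illustration of the "top to bottom" idea, here is a small sketch in Python. The form, its line names and its rules are entirely made up (they are not taken from any IRS schema): each line is either entered by the filer or computed from lines above it, and an "if this is less than X" rule is just a computed line with a condition in it.

```python
from typing import Callable, Dict, List, Optional, Tuple

Values = Dict[str, int]
# (line id, compute function) where None means "the filer enters this line".
Line = Tuple[str, Optional[Callable[[Values], int]]]

HYPOTHETICAL_FORM: List[Line] = [
    ("1_wages", None),
    ("2_interest", None),
    ("3_total_income", lambda v: v["1_wages"] + v["2_interest"]),
    ("4_deduction", None),
    # "Subtract line 4 from line 3; if the result is less than zero, enter 0."
    ("5_taxable", lambda v: max(v["3_total_income"] - v["4_deduction"], 0)),
]


def fill_form(form: List[Line], entered: Values) -> Values:
    """Walk the form top to bottom; computed lines may only use earlier lines."""
    values: Values = {}
    for line_id, compute in form:
        if compute is None:
            values[line_id] = entered[line_id]  # KeyError if the filer skipped it
        else:
            values[line_id] = compute(values)
    return values


result = fill_form(
    HYPOTHETICAL_FORM,
    {"1_wages": 40000, "2_interest": 500, "4_deduction": 6000},
)
```

Because the lines are evaluated strictly in order, a computed line can never depend on something further down the page, which is the same constraint the paper form has.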

As to how they store it I can only speculate. I imagine they're stored as blobs by return/form. I doubt they would be stored in some normalized form as it seems unnecessary and difficult. They don't need to query across all people's returns at once. They're probably batch processed return by return.

joshka · on July 25, 2015

XBRL [1] is how some parts of the ATO (Australian Tax Office) chose to do it, as well as a variety of other places.

[1]: https://en.wikipedia.org/wiki/XBRL

izyda · on July 25, 2015

This is interesting; it's actually used by the SEC as well [1]. I think it still does not have 100% adoption though.

[1]. http://www.sec.gov/Archives/edgar/data/1288776/0001288776140...

undergrowth54 · on July 26, 2015

Here, have a look at this data model: https://github.com/OpenSourcePolicyCenter/Tax-Calculator