Try enforcing this on teams that use ORMs like hibernate with 500 developers.
Super, duper, common issue, you will find this at every large shop at some point in its life time, usually around the time of hiring people and expanding extremely fast, and taking on some tech debt.
All functions of extreme scale, hyper growth, and yeah, not following the absolute best practices all the time, but tech debt is like any debt, you get something now, and pay later. If they continue going up and to the right they will be able to afford it.
ORMs have mostly been just painful at ever shop I've been at. I've used ActiveRecord, Squirl, Hibernate, django.db .. they're all various level of suck.
The one huge advantage of an ORM is the ability to support multiple databases, but that only really works if you can do everything using the ORM. The moment you have a function too complex that you need to write some SQL, now you need some case statements and multiple integration tests for all the database your product needs to support.
They remove some boiler plate while adding others. In one of my own projects, I just created several files (pgsql.commands, mysql.commands, etc.), a basic set of classes around them and a base set of commands that will work for all the DBs I wanted to support (so the command files had an inheritance model, albeit only one layer).
With all that being said, most ORMs I've used do have explicit transaction support. I know Squirl had a `transaction {}` block you could wrap commands around. Transactions shouldn't be an excuse. They should be off by default and explicit added around blocks of things that need to be atomic.
> Try enforcing this on teams that use ORMs like hibernate with 500 developers.
I realize this is a hyperbole (I hope) because you really shouldn't have 500 developers all on the same monolithic project (unless you're developing like...the Linux kernel). Getting your team to at least try to implement best practices does take some effort, but with things like weekly demos and code reviewed commits, it's do-able.
I used to be a huge proponent of ORMs everywhere, but I've come to realize that if you're writing your app in such a way that a developer needs to be able to do any arbitrary data fetch or transformation whenever they want, that's your real problem. The set of retrievals and transformations you want to support should be well-defined, and abstracted into a layer whose interface allows the level above it to only think in terms of models and not their backing store.
After you have that, then it doesn't even matter on the backend. The models you present to the layer above can have complex and changing relationships to the actual storage -- maybe they contain data which is derived from what's in the database, but transformed after being fetched so that none of their properties actually correspond to a column or a field in a store. In my experience -- having seen the tragedy that is a Rails project gone full ActiveRecord -- this pattern enforces an excellent separation of concerns and constrains a problem which can otherwise grow unboundedly in complexity.
I don't really agree; I think you either need to have a very thin layer between your DB facts and your domain, or else use the DB as a kind of persistence layer for a complex graph.
The latter only really works if you've got a primary implementation language and aren't integrating lots of applications / libraries written in different languages communicating with the same database. You need to go down the SOA / distributed RPC / FFI route to integrate different languages, and that has its own complexities.
Personally I prefer treating the DB as a canonical store of facts. Models with a lot of code are pretty suspect. Retrievals can be tuned to just the facts required, ma'am - you don't accidentally drag in the banana + gorilla + whole forest. Doesn't stop you building a higher-level service layer if that's what you need, either. You'll need that when you scale up anyway; chatty models won't work at that level either.
Yeah, I'm saying that as your app grows out of being a simple CRUD app into something more useful and involved, there will be less of a relationship between what you need to store things efficiently, and what you need to present them well. Your model will become more graph-like, probably. For this reason, patterns designed around AR-style models will fail to scale. I disagree that this only works in a mono-lingual environment, although you will need tooling and infrastructure to support it; a model-centric architecture typically doesn't afford the possibility of multi-lingual support.
The code doesn't go in the models, it goes in the service/arbitration layer. DB as a store of facts is obvious -- DB as a 1:1 representation of what yet-unforseen features, UIs and platforms will need is a naive and limiting assumption. You have to build your application in a way that future product needs won't be constrained by storage and modeling decisions, which is a tension that Rails apps frequently encounter.
> The one huge advantage of an ORM is the ability to support multiple databases, but that only really works if you can do everything using the ORM.
It has several advantages: support for multiple databases (which is useful, sometimes), the ability to serialize/deserialize an object graph in one go, and sometimes a decent query builder which lets you compose queries as opposed to concatenating strings.
Unfortunately, it's also terribly easy to destroy performance by using lazy collections configured the wrong way for your use case and not notice it, to the point where I strongly advocate using query builders instead.
> I realize this is a hyperbole (I hope) because you really shouldn't have 500 developers all on the same monolithic project (unless you're developing like...the Linux kernel). Getting your team to at least try to implement best practices does take some effort, but with things like weekly demos and code reviewed commits, it's do-able.
The problem is that when you come in later, that the codebase grew way too fast and the deadlines are tight, retrofitting best practices on an existing ball-of-mud can be daunting.
> It has several advantages: [1] support for multiple databases (which is useful, sometimes), [2] the ability to serialize/deserialize an object graph in one go, and [3] sometimes a decent query builder which lets you compose queries as opposed to concatenating strings.
[1] was the point of the comment you replied to and it provided a very important constraint as well
[2] rarely needed & easy to implement with recursive queries in native SQL
[3] building queries is pretty straight forward; what ORMs usually tend to bring to the table is knowledge about the schema and therefore compile-time error reporting - but this can be done in any language where one has the level of reflection, or, in worst case, by a two-stage compilation process where stage 1 generates code from the schema that can then be used by the compiler for verification in stage 2
I don't see how recursive queries are going to help you serialize a new Foo with a new field of type Bar with a new field of type FooBar, each going in a different table. That's what I mean by serializing an object graph.
As for building queries, SQL is straightforward (mostly). The problem is that it composes very badly. Any time you need to implement something like an advanced search (ie, lookup the same information, but with a number of different search criteria only known at runtime), the best you can do is concatenating partial queries and hoping you got the parentheses count right. Not to mention that a query builder will help with stuff the SQL syntax is miserable for, like IN clauses.
> I don't see how recursive queries are going to help you
You construct a graph of objects with very few queries, transform it and write it back; with knowledge of the db-schema this will outperform any ORM-based solution and give much greater flexibility.
> As for building queries [...] The problem is that it composes very badly. [...] the best you can do is concatenating partial queries and hoping you got the parentheses count right.
There is an area between full-fledged ORMs and string concatenation. In a purely functional approach queries are composed by composing functions that compile to queries. Postmodern[1][2] is a good, open source example, though in most commercial projects we just built our own wrappers for the tasks and databases at hand. This also allows for much better performing code since for a lot of tasks hooking up the db-reader to the json emitter without going through object instantiation reduces memory and cpu consumption by an order of magnitude (or two), while in the same project, some code benefits from a OOP approach (for which you just use a reader that constructs the objects on the fly).
This is why I like SQL Alchemy. It provides an ORM that acts as a veneer over a very powerful SQL expression library, and you can move back and forth between the two seamlessly, even mixing them in individual statements.
That's what I've often wondered about ORMs. First they try to abstract away all the database internals so that the developer hardly even knows what's happening. However, ultimately the abstraction always ends up having some limitations. In the end the internals need to be exposed, but since your code base is still mostly based around the ORM it can end up being a mess.
If you're team is 500 strong you definitely should be at a point where you can do things right. If not, the 'debt' is not mainly technical, it's clusterfuck of bad decision making.
Yea sure some people in the sea of 500 will be doing elegant clean things, working in the core part of the business ( usually at or near the cash register program ) but if you have reason to have 500 developers you don't simply get the luxury of having every project be written perfectly.
Landing somewhere in-between by getting things done not perfect but shippable, and doing it fast as possible is par for the course ime. The best code is always written the 2nd or 3rd time, never the first.
Especially on the case of hyper growth worrying your competitors like lyft or postmates or amazon might get something pivotal out first.
People have to learn somewhere, usually things like this they learn at scale, on the job.
Something that hit me researching how build a relational language and thinking how hard could be to remove SQL and put instead my own flavor, (super-oversimplification):
- SQL "bad"
- ORM "good"
- NoSql apis "good"
So, the thing is that when facing with SQL "everyone" try to "abstract" it more.
Or instead use NoSql, because is "easier".
Fine.
Then if exist a market demand for a better API for the databases, why the databases guys not DO IT?
Yep, I know SQL is supposely the way for it, but bear with me: I live in the FoxPro era so I know what is code "to the metal" in database without SQL (and it was fine and easy).
If the SQL layer could be optional and more bare layer is provided (you can copy the dbase ideas!) then the problem of ORM mappers could be solved far easier (I imagine!).
Most of the time, the problem is to "embed" a foreign language (SQL) into an existing one (Python, Ruby, etc.), but you can do it in a proper way such as LINQ, then you have type-checking etc.
I wish your thinking would be more widespread. I never understood how come that restful HTTP became very popular without having any universal query language around and merely providing an HTTP client with each service built, while in the DB world everybody is obsessed with SQL? Why not just provide the developer with an API/library for the underlying algorithms/datastructures?
For example I hate so much when I have to read mySQL documentation jus to find out what went wrong when I used ORDER BY and the optimization didn't kick in? I mean, if I have to keep in my head the whole optimization mechanism of the DB engine and all the steps of the filesort algorithm going on behind the scenes just to write a single wretched SQL statement, then what's the point of SQL in the first place?
And this was just one example. The truth is that the SQL abstraction is leaking. Most of the commands don't encapsulate anything at all.
it's very hard to abstract away from SQL. to be honest, many nosql have integrated concepts similar to grouping, having, joins in their API as well as they matured, because that's what data elaboration needs.
But still everyone do it. That must tell something, right?
And I'm not talking about NoSql borrowing some relational concepts, but the opposite, and more directly, the API.
I know that most folks (including the ones that downvote!) have no clue what I'm talking about, because almost nobody (recently) have experience in talking against a database without SQL. Is like if the only way to talk to a NoSql was using Json + REST. That constrain badly your mind.
The inner relational model is far more rich, and simpler, SQL was not designed to be used by developers and it show (specially when you have contrived syntax as with CTE)
And before you ask what is the problem with CTE, is exactly because the way to have that is create a contrived syntax that obscure what is goin on. SQL is too restricted, yet too broad for interfacing.
Well, data transformations are much easier to read and write in SQL than e.g. Java, what with temporary collections built up in memory, random maps, lists, etc.
CTEs aren't required very often - you generally only need them for recursive CTEs, and that's iterative retrieval analogous to pointer-chasing. It's typically a sign of a data model that's poorly suited to relational storage, e.g. trees and graphs.
I have issues with the irregularity of SQL syntax - it deeply annoys me that we have both 'where' and 'having' for doing exactly the same thing, one pre-fold and one post-fold - and I don't like my lack of access to the potential the index has for related fetches (window functions are not pleasant to work with) - but mostly my problems come not from lack of access to the relational nature, but lack of power over poor implementation / query planner / etc. details.
Agree, but that are problems at the end, when most developers have issues at the start (basics). Baffle me when i read that relational databases are "hard" and you need to teach some basics even for experienced developers. Plus, the thing here is that yeah, everyone know SQL have issues.
So they make a ORM
Then it create worse issues. Is like the Database-side and the app-side are at war (without intention!) with each other and the database-side only concern themselves for the kind of issues that only happened for specialized tasks.
And the app-side still try to interface to the database, but poorly.
----
Articulating this better, I hope: Is like the movement now to improve the syntax and semantics JS. Or like ASM.js.
So, why not have SQL 2.0 with the fixes everyone since 20 or more years ago already know? And the libraries/API made to acknowledge that databases are not primarily used by "end-users" but app developers?
But that is probably like ask why not clean C++... ;)
I know this is just ask for the moon (obviously not exist one "SQL", only several more or less alike implementations), but the trouble with ORM and databases is more than a decade old and the answer is mostly "yep, we have that problem. Move along"
I started using CTEs much more often for complex queries actually. IMHO, they're more clear and easier to read than sub-queries, especially when those sub-queries get nested 3 or 4 layers deep.
I agree with the ideas of irregularities. The WHERE vs HAVING doesn't bother me much, and I can't think of a better syntax off hand. My SQL pet peeves are that the SELECT list is at the beginning - I don't know what columns I want exactly until after I type out the JOINs, so I usually type SELECT * and then fill in later. I'd rather put it between WHERE and ORDER BY. I'd also like UPDATE to put the WHERE before the SET, so you don't risk blowing up a ton of data if you forget or miss the WHERE.
Probably a lost cause to get that in, but I would think it wouldn't be too hard to at least support those syntax changes in addition to the current standard.
Complex queries using CTEs may be easier for you to reason about when you are writing them, but are a nightmare to understand if you are not the one who wrote them. We have heaps of CTE-intensive technical debt and usually it takes longer to understand what they are doing than it would take to write them from scratch from a specification.
If you are using them to ease your understanding, chances are that you are doing the job in a very memory intensive way. It also tends to be slower because the generated intermediate results do not have indexes nor good statistics that could help the query planner to be efficient. Essentially, you are taking upon yourself the query planner job and assuming you'll do it better yourself.
> > > SQL was not designed to be used by developers
> Sure it was. Who else would it have been designed to be used by?
More accurately than the grandparent: SQL wasn't designed exclusively to be used by developers. It was designed to be accessible to analysts (domain-focused analysts, not systems analysts.)
Yes, but there is no point in blaming the database for that. It's seriously not a database issue. Particularly Postgres has well exposed internals for monitoring and killing of long running transactions (if you are not setting statement timeout).
Super, duper, common issue, you will find this at every large shop at some point in its life time, usually around the time of hiring people and expanding extremely fast, and taking on some tech debt.
All functions of extreme scale, hyper growth, and yeah, not following the absolute best practices all the time, but tech debt is like any debt, you get something now, and pay later. If they continue going up and to the right they will be able to afford it.