I was careful to choose popular, and not project opinions about SQL/NoSQL/etc. In my field, most of our data is relational and we use NoSQL for caching, queues, shared work, ETL performance, dashboards, etc. but at the end of the day for persistence, the RDBMS is where the “gold copy” data ends up.
As you mentioned previously, knowing the tool set and the domain is critical to either approach. At a certain point with technology the benefits and costs are weighted by subjective preference and project specific needs. I have weighted SQL higher than JPA by many factors because I can take my SQL knowledge to any backend project, and I’ve been a part of a lot of different tech stacks in my career.
Maybe my travels have lead me to be surrounded by many more engineers that trust the database (and their knowledge of the database) to handle the persistence without a too many layers in between.
I, personally, have never seen a JPA based project that actually worked well with large-ish datasets, high concurrency, or when non-trivial ETL functions are part of the system- and this general domain has been the majority of my career, so I may have blinded myself to THE majority being confused for MY majority.
Thanks for the response and a good look at the topic from a different point of view.
> I was careful to choose popular, and not project opinions about SQL/NoSQL/etc. In my field, most of our data is relational and we use NoSQL for caching, queues, shared work, ETL performance, dashboards, etc. but at the end of the day for persistence, the RDBMS is where the “gold copy” data ends up.
I'd worry about using an RDBMS in that situation because it's fundamentally mutability-first. I prefer to regard the user's actions as the "gold copy" and the current-state-of-the-world as a transient derived thing (i.e. event sourcing), but that doesn't really play to the strengths of an RDBMS. You also have to make global decisions about transactionality (in particular, you can't easily commit a data write without committing updates to all your secondary indices), and the much-vaunted relational integrity can be a problem because you can only represent constraints for cases where the appropriate response to a constraint violation is dropping the write on the floor. And of course you can't safely allow the ad-hoc querying that SQL is designed for.
I do think traditional RDBMS make some sense at the end of an ETL pipeline - where the secondary indices can be a big help for the ad-hoc querying/aggregation that you want to do in a reporting environment. But transactions don't make sense in that environment because it's essentially read-only (or at least single-writer), so you're still paying for a lot you're not using. I wouldn't use JPA for this, but I wouldn't really write code for this kind of environment at all - the point is to expose the data in a structured form for non-code tools.
Essentially I find mature systems outgrow SQL databases - the case where an RDBMS actually fits is the early stages where you want to run ad-hoc reports against your live datastore, you want to keep the current state of the world rather than worrying about history, having to manually fail over to a replica if master goes down is ok, updating all your indices synchronously is fine because write performance isn't an issue yet, and you can put constraints in the database because blowing up with an error page is an adequate response when the user breaks the business rules. Using JPA increases the rate at which you can iterate on the system, which is the priority for that kind of use case.
I was careful to choose popular, and not project opinions about SQL/NoSQL/etc. In my field, most of our data is relational and we use NoSQL for caching, queues, shared work, ETL performance, dashboards, etc. but at the end of the day for persistence, the RDBMS is where the “gold copy” data ends up.
As you mentioned previously, knowing the tool set and the domain is critical to either approach. At a certain point with technology the benefits and costs are weighted by subjective preference and project specific needs. I have weighted SQL higher than JPA by many factors because I can take my SQL knowledge to any backend project, and I’ve been a part of a lot of different tech stacks in my career.
Maybe my travels have lead me to be surrounded by many more engineers that trust the database (and their knowledge of the database) to handle the persistence without a too many layers in between.
I, personally, have never seen a JPA based project that actually worked well with large-ish datasets, high concurrency, or when non-trivial ETL functions are part of the system- and this general domain has been the majority of my career, so I may have blinded myself to THE majority being confused for MY majority.
Thanks for the response and a good look at the topic from a different point of view.