I'm fine with this, but SQL has some lingering shortcomings:
- Inability to express recursive/nested datasets directly (tree-like).
- General inability to express structural sharing and graphs directly.
- Inability to express associative (map-like) relationships directly.
- ...
Some of these are solved in the SQL standard, but not universally adopted. Others are solved by particular databases, but also not universally adopted. For the rest, of course you can solve it if we add the condition "some assembly required" when you get the data.
But if you have to assemble and disassemble the data at every step, it stops being a viable pipeline choice. It's as good as serializing into CSV, JSON, or whatever. In fact, JSON at least can nest.
Largely my take as well... I will say, I'm not a fan of using the DB Engine to self-injest or export data... I do prefer newline delimited json for import/export as it tends to be easier than CSV usually is.
The variations in SQL dialects/engines are pretty broad... and in some cases (mysql, ugh) straight ANSI SQL syntax may or may not work as expected. Not to mention more esoteric data like JSON or XML data columns.
One other niggle, don't do data analytics against your live database servicing your applications... use a read mirror or replica node... The types of queries data analysts tend to run are often less than ideal in terms of performance characteristics. Developers can create bad enough queries as it is, let alone a DA query locking a key table for many seconds.
It’s definitely true that making a dataset into something SQL can elegantly handle is not always easy, for example, dealing with JSON. But that’s also kind of the point. It thrives on flat, relational data. And having that relational layer makes doing things with data very easy.
- Inability to express recursive/nested datasets directly (tree-like).
- General inability to express structural sharing and graphs directly.
- Inability to express associative (map-like) relationships directly.
- ...
Some of these are solved in the SQL standard, but not universally adopted. Others are solved by particular databases, but also not universally adopted. For the rest, of course you can solve it if we add the condition "some assembly required" when you get the data.
But if you have to assemble and disassemble the data at every step, it stops being a viable pipeline choice. It's as good as serializing into CSV, JSON, or whatever. In fact, JSON at least can nest.