IMO there's a big difference between university level SQL and the ability to competently write SQL in a professional setting. In university, at least at my school, you are getting maybe at most 3 tables overall, with 3-5 columns each. Professionally, you're dealing with hundreds, maybe thousands of tables, each with between 10-1000 columns, and usually not even in 1NF either. It really takes a lot of practice writing queries professionally to get to a level of competence here.
I think it's because of how popular ORMs are. You can get pretty far with an ORM without knowing any SQL, but you can inadvertently do some really dumb things if you don't understand how SQL works.
SQL was the first language I ever learned, so perhaps my perspective is skewed, but it seems like it would be pretty hard to use an ORM without gaining an intuition into how SQL works.
A trap I often see ORM users fall into is the n+1 problem. However, that's actually a consequence of database implementations not being theoretically pure. If you had an ideal SQL database, their approach would be technically correct. They aren't wrong to think that way.
It's just that we don't have ideal SQL databases, so we have to resort to hacks to make things work in the real world. Why we need to break from the theoretical model and use those hacks can be difficult to understand if you aren't familiar with the implementation at a lower level.
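To make the n+1 pattern concrete, here's a sketch (the authors/books tables are made up for illustration): lazy loading in an ORM tends to issue one query per parent row, where a single join fetches everything in one round trip.

    -- Hypothetical schema: authors(id, name), books(id, author_id, title).
    -- What lazy loading typically produces: 1 query for the authors...
    SELECT id, name FROM authors;
    -- ...then 1 query per author, N more round trips:
    SELECT title FROM books WHERE author_id = 1;
    SELECT title FROM books WHERE author_id = 2;
    -- and so on.

    -- The single set-oriented query that does the same job:
    SELECT a.name, b.title
    FROM authors a
    LEFT JOIN books b ON b.author_id = a.id;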
"Object/Relational Mapping is the Vietnam of Computer Science. It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy." (Ted Neward)
Remember, an ORM is an object-relational mapper, not a query builder. The "simple stuff" is repetitive, extremely common, straightforward, and boring -- exactly the sort of thing you want to automate.
> Yeah, but when you add/change a new property to your most used class you don't have to change 500 odd SQL statements, it just works.
ORM doesn't have anything to do with SQL statements. ORM only operates on the results those SQL statements produce.
You're probably thinking of the active record pattern, which combines ORM with query building into an interesting monstrosity. Indeed, query builders produce SQL statements and can save you from changing 500 odd raw SQL queries.
A lot of ORMs create SQL statements too; they have an integrated query builder, plus functionality for updating, deleting, and even creating/modifying schemas (migrations).
But as always: use the right tool for the job. A hammer is a great tool, but not suitable for removing screws (most of the time).
> A lot of ORMs create SQL statements too, they have an integrated query builder.
These are technically active record libraries. One of the popular ones is literally known as ActiveRecord, but the whole suite of them are implementations of the active record pattern.
ORM is simply the process of converting relations (i.e. sets of tuples, i.e. rows and columns) into structured objects and back again. This layer of concern is independent of query generation.
ORMs are not only great for simple stuff. They are great for a lot of complicated queries too. But as always, you need to know what you’re doing. An ORM is not a magic layer that solves all of your problems.
I’m mostly using Entity Framework Core, which allows you to write quite complex queries that can be really easy to read and maintain. But if you’re careless, you can create quite slow queries, that’s clear. Then again, you can easily create slow queries with plain SQL too.
Yes, I’ve found that it’s possible to write hundreds of lines of ORM code to do some very complex and sophisticated processing that could be done in a dozen lines of plain SQL.
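As an illustration of the kind of thing I mean (my own sketch, with a made-up orders table): a window function expresses in a few lines what often sprawls across an ORM codebase.

    -- Hypothetical table: orders(id, customer_id, placed_at, total).
    -- Each customer's orders with a running revenue total:
    SELECT customer_id,
           placed_at,
           total,
           SUM(total) OVER (PARTITION BY customer_id
                            ORDER BY placed_at) AS running_total
    FROM orders;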
The ORMs I’ve used also needed every entity from the database to be duplicated in the host language, which is a horrific amount of duplication in any non-trivial project. Not just the entities but also the relationships. And then they need to be kept in sync.
And then they have these awful and subtle caching issues that mean if you fall back to SQL then you take a massive performance hit. Or if you run multiple servers against the same database you need another layer of complex caching infrastructure. Or if you update a table outside of the application you can get inconsistent results.
The whole thing is a nightmare of unnecessary complexity, apparently intended to shield developers from the much lower complexity of the database itself.
ORMs are an inefficient, leaky and slow abstraction layer on top of a fast and complete abstraction layer.
I’ll never willingly use an ORM again.
Edited to add: the whole fragile contraption is done in an effort to move slices of the database into the memory context of a remote application. To what end? So we can update some counter and push the whole lot back out? It literally makes no sense.
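Take the counter case: the whole round trip can be replaced by a single in-place statement (table and column names invented for illustration).

    -- Increment the counter where it lives, instead of pulling the row
    -- into application memory, mutating an object, and writing it back:
    UPDATE page_stats
    SET view_count = view_count + 1
    WHERE page_id = 42;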
There are two ways to achieve a single source of truth:
1. Generate ORM entities from the database (usually there is a tool for that)
2. Generate the database from ORM entities, and make changes via generated/manual migrations
You're not supposed to edit the generated side; just re-generate it whenever the model changes.
I don't think it's the syntax of SQL that causes problems (and which ORMs try to replace)... it's the complex underlying logic necessary to accurately join multiple tables, set WHERE and HAVING conditions, and aggregate records, and knowing to which dataset each should apply.
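For instance (tables invented for illustration), each clause applies to a different dataset: WHERE filters rows before aggregation, HAVING filters groups after it.

    -- customers(id, region), orders(id, customer_id, status, total).
    SELECT c.region,
           COUNT(DISTINCT c.id) AS customers,
           AVG(o.total)         AS avg_order
    FROM customers c
    JOIN orders o ON o.customer_id = c.id   -- join multiple tables
    WHERE o.status = 'shipped'              -- filters rows, pre-aggregation
    GROUP BY c.region                       -- aggregates records
    HAVING COUNT(o.id) >= 100               -- filters groups, post-aggregation
    ORDER BY avg_order DESC;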
Countless "replacements" for SQL have come around to make data accessible to non-experts, but regardless of whether you're using a BI tool, a reporting platform, an ORM, whatever, pretty soon you're going to need to understand and express the logic, and that's when you really appreciate SQL's directness and terseness.
Every single org I've found that says "we don't use ORMs because they're slow" ends up constructing its own bespoke, wheel-reinventing "ORM", or sticks all its application logic in stored procedures, which are a nightmare for imperative logic.
My memory could be a little hazy, but I don't remember any required course that dealt with SQL when I was in the CS program at a pretty highly-regarded university 25 years ago.
I took a course in which I learned quite a lot about SQL and in retrospect it was an extremely useful course to have taken.
I remember taking a databases class (either junior or senior year) that covered database design and also SQL with Oracle. We even got into Pro*C, which was some crazy Oracle-specific C pre-processor. It definitely wasn't required. My roommate took it and failed.
Mine ~15 years ago in the US had a required course that did some relational algebra and database normalization without ever touching a database (as a small part of a broader course, IIRC), and an elective where we did really simple stuff with a database (I came out of it not really understanding the difference between INNER JOIN and LEFT JOIN, for example).
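For anyone who came out of a course with that same gap, a minimal illustration (employees/departments tables invented for the example):

    -- employees(id, name, dept_id), departments(id, name).
    -- INNER JOIN: only employees whose dept_id matches a department.
    SELECT e.name AS employee, d.name AS department
    FROM employees e
    INNER JOIN departments d ON d.id = e.dept_id;

    -- LEFT JOIN: every employee, with NULL departments where nothing matches.
    SELECT e.name AS employee, d.name AS department
    FROM employees e
    LEFT JOIN departments d ON d.id = e.dept_id;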
In a Dutch CS program we had an SQL course that started with first-order logic and relational algebra and went on to project those into SQL. It also taught 3NF/4NF and BCNF, indices, R-trees, query planning and optimisation.
The one I took was elective. I am willing to admit that from a professional ROI perspective, it was one of the best uses of time in my entire life and easily worth hundreds of thousands of dollars.
I avoided learning SQL in the client-server training program that got me started in tech back in the late 90s, only to have to learn it on the job during a global ERP deployment a decade later. Should have taken the class, would have meant at least a few less sleepless nights.
Database education in CS programs in North America seems minimal. Most people I meet haven't taken any, or maybe just took a practical "how to query" course. In North America, it seems like only CMU has an active (excellent) program around this topic.
There's a woeful ignorance about what the relational data model is, how the industry arrived here, and how it is implemented. Problems or archaisms in SQL specifically become synonymous in people's heads with the relational model generally, and for a while that led down the quite problematic NoSQL road. Then the industry slingshotted back to SQL -- but from my perspective SQL itself (not the relational model) is a problem. It doesn't compose well. It doesn't handle recursive relations well. It has an awkward syntax. It conflates concepts. It has an archaic datatype model. None of this is intrinsic to the relational model, but SQL becomes a limiting factor.
Disclaimer: I work for a DB company doing awesome stuff with the relational data model, but not SQL (RelationalAI) so am ... biased. Though I have always had those biases. :-)
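To make the recursion complaint above concrete, a sketch (the employees table is invented; the WITH RECURSIVE spelling is the PostgreSQL/SQLite one, and some dialects drop the keyword): walking a simple parent/child hierarchy needs a special-case construct rather than ordinary composition.

    -- Transitive closure over a self-referencing table.
    WITH RECURSIVE reports AS (
      SELECT id, name, manager_id
      FROM employees
      WHERE manager_id IS NULL          -- start at the root
      UNION ALL
      SELECT e.id, e.name, e.manager_id
      FROM employees e
      JOIN reports r ON e.manager_id = r.id
    )
    SELECT * FROM reports;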
I'm a faculty member at the Rochester Institute of Technology. We have multiple courses for both our undergrads and grad students in databases. I primarily teach Introduction to Big Data for our graduate students. We happen to cover all the things that you mention in your second paragraph :)
I've worked with many data scientists whose typical SQL usage was to load entire tables (or at most apply some simple WHERE filtering) into Pandas / R dataframes and then do all their work there.
I think it's a combination of Pandas and R having much simpler APIs and a ton of documentation on Stack Overflow, and modern hardware just being so good that you can load a big enough chunk of your dataset in-memory on a laptop.
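A sketch of the contrast (events table invented for illustration): the same filter-and-aggregate pushed into the database instead of done after loading everything.

    -- Instead of SELECT * FROM events followed by dataframe filtering,
    -- let the database do the reduction and ship only the result:
    SELECT user_id, COUNT(*) AS sessions
    FROM events
    WHERE event_type = 'login'
      AND occurred_at >= DATE '2024-01-01'
    GROUP BY user_id;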
I mostly use SAS, and I tend to prefer plain SQL queries; where I typically depart from SQL and jump into code is for what SAS calls "By-Group processing" (example: https://support.sas.com/kb/26/013.html#)
I am not as familiar with R. Last time I worked in R (some years ago) the equivalent R code was something like this (caution: I'm no expert in writing R, so there might be a better / more intuitive way)...
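As a point of comparison, by-group-style processing can often be done directly in SQL with window functions (my own sketch; the measurements table is invented, and ROW_NUMBER() = 1 plays roughly the role of SAS's FIRST.variable flag):

    -- First observation per by-group:
    SELECT *
    FROM (
      SELECT m.*,
             ROW_NUMBER() OVER (PARTITION BY subject_id
                                ORDER BY measured_at) AS rn
      FROM measurements m
    ) t
    WHERE rn = 1;   -- keep the first row in each group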
> Have they stopped teaching this in University or something?
It wasn't part of the mandatory coursework when I went to school. I really think it should be. It's not as though my CS program was 'pure theory', they taught lots of 'vocational' stuff but not SQL.
As for why people put off learning it.. I put off learning it longer than I care to admit because I thought I could simply do without it (ORMs, BerkeleyDB, etc), and fake it if put on the spot (I always understood the very basics, insofar as a SQL query can be understood if read as declarative English.) Ever since I bit the bullet and actually learned it properly, I've been kicking myself for not learning it upfront.
I think one of the main uncertainties in higher education the last couple decades is what balance, if any, they should strike between job training for software developers, coding as an engineering discipline, and preparation for an academic career in math.
Most programs include parts of all three, but SQL per se (rather than, say, relational algebra) is pretty firmly in the professional-training-for-software-developers set, and schools that reject that aspect may not teach it.
In my experience, recent code school/boot camp grads have as much practical SQL as recent CS grads, or more; probably because those schools are almost totally focused on professionally applicable skills.
That may be, but schools which emphasize the math/theory part are then missing the opportunity to really teach the theoretical side of the relational-algebra stuff, which is really set theory and first-order logic. There's a lot of math-y stuff to dig into there.
It should be mandatory along with understanding big-O notation, data structures, and algorithms, because really... unless you're in a career which is like... 100% embedded development... you're going to be encountering databases and data modeling as part of your career.
I'd be much happier if CS programs would graduate people who did a whole semester of first-order logic, Date & Codd's foundational papers and why network & hierarchical databases are problematic, relational algebra / calculus, Datalog & friends, and then just toss in a "and this is how SQL does some of this but also mangles all of this..." at the end.
+ as a bonus, a DB implementation/internals course so people can understand what goes into query execution etc.
Because those people would then have some proper context before going off and butchering the world with ORMs and microservices...
The issue isn't usually competency, can vs. cannot. It is usually a matter of should or should not. I have seen databases thrown at problems where they decidedly should not have, and have seen the product development cycle suffer where it should not have. While some engineers and analysts relish CTE spaghetti -- solving complex logic utilizing enormous wads of SQL, others are more wary and rightly look to tackle complexity with more accountable and powerful tools when possible. I admit that tools like DBT can help address some of the shortcomings of SQL, but it still does not have the generality and conciseness of modern languages.
Anecdata point here, personally my SQL skills have lapsed because I've been incentivized to know NoSQL more. Same for frontend stuff like JS/React btw.
It amazes me when competent developers / analysts / data scientists don't know (any) SQL. Have they stopped teaching this in University or something?