No question, such a query should not be written. That's probably the reason why this odd behavior, which is even different in various DBMSes, is not causing everyday problems.
The reason the behavior doesn't cause problems is because everybody treats automatic aggregation as a voodoo where they know one recipe that works and anything different is domain of the Devil.
And, IMO, that's a very sane and reasonable way to treat it. The entire idea of automatic aggregation is flawed, and those queries should just have a `group by ()` explicit at the right place.
I agree, I think the original sin here is the fact that whether a `SELECT` is an aggregation is determined by the contents of the scalar expressions at all. I think most of this weirdness comes directly out of wanting to be able to write both `SELECT sum(x) FROM xx` and `SELECT x FROM xx` and have them work.
Not that I have a better solution offhand, in SQL grouping by a constant value is not actually the same as not writing `GROUP BY` at all since the behaviour on empty tables is different.
What's an aggregation per se? A SQL query is best thought of an arbitrary generator function. You can, using functions like UNNEST, end up emitting multiple row-tuples for each processed input row-tuple. An aggregation is just a generator function that happens to reduce over all the input rows and then emit one row-tuple when there's no more input. Query-planning engines do not special-case this. It's just a generator node like any other generator node.
Consider: using window functions, you can do partial aggregations over subsets of the input — without even necessarily partitioning the input (i.e. you can compute "running totals" and other wacky state-machine-like outputs.)
Not writing `group by` is the same as writing `group by ()`.
And yeah, the difference between that and a value is one of those really surprising things on SQL that actually make sense and should be this way. Unfortunately, there are many of those.
Would you argue that automatic scalar-ism is also flawed, and the query
SELECT a FROM aa;
should have an explicit grouping, like
SELECT a FROM aa GROUP BY id;
?
After all, when you think about it, it's really not the aggregate functions that break expectations, it's the scalars. Of the four combinations of having or not having an explicit `GROUP BY` and having or not having an aggregate function, three of them have the unsurprising behavior of returning exactly one row per grouping.
-- aggregate function and group by - one single result per grouping
SELECT sum(a) FROM aa GROUP BY a; --or
SELECT sum(a) FROM aa GROUP BY ();
-- no aggregate function, but a group by - one single result per grouping
SELECT a FROM aa GROUP BY a;
-- just aggregate function, but no group by - one single result per (single, implicit) grouping
SELECT sum(a) FROM aa;
But then, when you have neither an aggregate function nor an explicit `GROUP BY`, it breaks expectations:
-- no aggregate function, no group by - one result row per row
-- in the source set, even though there should be only one big implicit group
SELECT a FROM aa;
Therefore, I propose that the next SQL standard should introduce a new `GROUP BY INDIVIDUAL ROW` keyword, that henceforth all "scalar" queries MUST use in order to have consistent behaviour with the rest of the language.
What is flawed is not that there is an implicit grouping on all queries. It's that the implicit grouping changes, depending on a set of rules that consider stuff written in several places that are not explicitly related to it.
You are asking if it makes sense to have an implicit grouping at all; it very obviously absolutely does. And grouping by individual row is the very obvious default. But I do totally support adding that keyword expressing the default. All defaults should be expressible.
> Procedural SQL [...] Choose your investment here carefully.
I think that the demand for procedual code has dropped drastically in the past decades as the "normal" SQL can solve so many more things with window functions, recursion, and so forth. So I'd say: Yes, choose your investment wisely and stay away from procedural SQL as long as possible.
There is still a case to embed business logic at the database layer, not the application layer, as databases tend not to change as much.
I have thousands of lines of PL/SQL that originated in the days of PowerBuilder that are now serviced by .NET; a decade from now could be totally different.
SQLite appears to use a (very small) subset of PSM for triggers.
Fair point, looks suspicious on first sight. I've checked my tests briefly: It's accepted as table and column name. As mentioned in footnote [0] I haven't checked different contextes yet.
Can you tell me: what kind of identifier is it (view name, function name) and which SQL context it causes problems (select list, create/drop statement, ...) and which system has problems with it. Thx.
2016/2023 got pretty widespread support across databases for the functionality I care about, both before and after adoption - namely the JSON stuff. MSSQL also already had pretty good graph support AFAIK, although I haven't used it that much.
I find the temporal table stuff really useful and they drastically simplify a number of requirements, so it's annoying that the only non-proprietary DB that supports it is maria.
The existing implementations (Oracle DB, SQL Server, MariaDB, Big Query) come with their problems too. I was a big fan of the new features when it came out in 2011, but pratically there is an unsolved elephant in the room: It doesn't cover schema changes.
> there is an unsolved elephant in the room: It doesn't cover schema changes.
100% agreed. It's remarkable how Datomic also arrived on the scene in the same era (2012) but actually managed to solve a lot of these hard issues of immutable versioning + schema evolution via a clean EAV-based information model and an emphasis on accrete-only schema changes.
> The standard is not generally available. Most of us will never learn what is in it.
It also not aimed at users, but at implementors. Funny enough, they don't read the standard either ;) But more seriously: Some implementations are old and generally vendors prefer not changing the behavior of their product. When the standard comes later, the train has already departed. The most critical incompatibilities are in the elder features. The newer ones have a tendency the be more aligned with standard behavior (e.g. window functions are typically implemented just fine).
> They haven't gotten as far as working out variables or function composition either [0].
Part 2 SQL[0] is declarative and intentionally doesn't have variables. Part 4 SQL (aka "pl" SQL) does have variables. I personally consider Part 4 obsolete on systems that have sufficient modern Part 2 support.
> Tying SQL to your specific database is the best option for performance. Writing database-independent SQL is somethign of a fools errand, it is either trivial performance-insensitive enough that you should be using a code generator or complex enough to deserve DB-specific turning.
While this is certainly true for some cases there are also plenty of examples where the standard SQL is more concise and less error prone than product-specific alternatives. E.g. there COALESCE is the way to go rather than isnull, ifnull, nvl, or the like (typically limited to two arguments, sometimes strange treatment of different types).
There is a lot of *unnecessary* vendor-lock in in the field.
> Part 2 SQL[0] is declarative and intentionally doesn't have variables...
The issue I have with that is that making a bad decision for a reason doesn't change the fundamental correctness of the decision.
SQL semantics clearly support variables, because they are available as WITH ... clauses and subqueries. So either these elements are a mistake and should be discouraged, or the language has variables and they should be made to read like variables in almost all other languages for consistency and readability.
This is literally a cosmetic issue but a lot of thought on syntax has happened since the 1970s and there seems to be an overwhelming on-the-ground consensus that SQL's syntax is not the way to lay code out. The semantics? Great. But there is nearly nothing from SQL's syntax choices that is present in any surviving programming language since. Many of the more popular languages explicitly reject SQL syntax decisions like Python (significant whitespace), "function(arg,arg,arg)" instead of constructs like "VERB arg,arg WITH TEXT,arg TEXT arg" (everything I've met bar lisp) for functions and not replicating broken English (everything).
Variables are a different thing because the declarative nature of SQL is a useful property. But standard SQL overall does such an appalling job of handling syntax I feel confident that they have gotten this part of the syntax wrong too.
Gotten a bit off topic, just happens to be a thing I feel strongly about.
Great work as always! I still remember looking at your site when I was working on an ETL system for sensor data and realising in frustration how MariaDB was missing all the nice modern SQL functions like merge.
Your website is great and I regularly check it to see what's new in various implementations. Unfortunately it seems that many databases don't support many modern SQL features yet. Any ideas as to why?
> Unfortunately it seems that many databases don't support many modern SQL features yet. Any ideas as to why?
I'd guess the incentive structure is the opposite of what you're implying; the same reason why every cordless drill manufacturer has their own battery connector: vendor lock in fuels private planes and shareholder reports, versus being compatible means no one is forced to buy your batteries and thus profits are `$total - $forced_purchases`
This situation gets even worse in the case of a standard without any objective way of knowing one is in compliance. Having a badge on the mysql.com website saying "now featuring SQL:2023 compliance!11" sells how many more support contracts exactly?
That's a good point. Additionally, it seems the standard isn't freely available and I doubt most of the developers of existing SQL DBs partecipate in drafting new standards. It seems it is doomed to diverge even further, which begs the question whether is it relevant anymore to have the SQL standard at all
Besides portability, there is IMHO nothing against INSERT ... ON CONFLICT if it does what you need.