It's sad but this difference, although completely fundamental to SQL, is seen as...

michaelt · on May 8, 2013

Consider these two queries (untested so they might have typos):

  select order_number,sum(line_value) as order_value from order_line
  group by order_number
  having order_value > 100;

And:

  select order_number,order_value from 
    ( select order_number,sum(line_value) as order_value from order_line
      group by order_number )
  where order_value > 100;

I'd expect them to have the same explain plan and runtime characteristics. Isn't HAVING just syntactic sugar for the fact queries can be arbitrarily nested? Syntactic sugar isn't fundamental IMHO.

ScottWhigham · on May 8, 2013

I don't know all of the syntax from all languages, but I do know that in Sybase and Microsoft SQL Server, the first would generate a syntax error. The HAVING and WHERE clauses cannot reference column aliases. You could rewrite it to be:

...

having sum(line_value) > 100;

Would the performance be the same? That's up to the individual DB. To say, "Yes, they would" implies that all DBMS vendors implement query plans the same way. I think that, fundamentally, they should perform the same way but again: it's not up to you or I; it's up to the people who wrote the query execution engine.

jrochkind1 · on May 8, 2013

I believe you are correct, except possibly the other way around -- nested queries are just syntactic sugar for joins, group by, and having. :)

I'm actually not sure what I said is true (you can possibly do more with nested queries than you could do with just joins and group by/having?), but I DO think group by/having came before nested queries in rdbms implementations.

redidas · on May 8, 2013

I can help out with this! (Finally a hacker news discussion in my line of expertise!)

While you could think of nested subqueries as syntactic sugar, you might also want to think of them as a view you specify on the fly, or a "derived table" of information.

Each RDBMS optimizes a bit differently, but depending on your system subqueries may have query plan implications as well. Sometimes they'll make your query faster, other times slower. It all depends on the RDBMS, table indexes, and the operations you are doing inside the nested query.

Personally, I'd recommend using HAVING instead of using WHERE with a nested SUM. The query optimizer may create the same execution plan in the end, but the HAVING is a bit more explicit in what you are doing.

For those familiar with SQL, HAVING indicates you are filtering your query on an aggregate value, where as a WHERE indicates you are filtering records out of consideration before they are aggregated (as someone else as pointed out in another comment).

daigoba66 · on May 8, 2013

i believe you are correct

bgilroy26 · on May 8, 2013

Order of execution is a great second stop on the road to SQL, after learning what all of the words mean.