Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> maybe they are asking a bit much from SQL

But this article is thought provoking to say the least. It follows the courtroom logic of holding the defendant SQL on trial for as much as possible. And SQL is guilty of a lot of crimes.

I do hope GraphQL and similar query languages become more prevalent and standardized, as it seems SQL could really use some stiffer competition.



SQL is often conflated with the relational model (hence the term NoSQL for non-relational databases), but the article is careful to explain that the relational model is great, but SQL is a clunky syntax/standard.

Going to graph databases is certainly throwing the baby out with the bathwater. The relational model was invented to address shortcomings in the hierarchical and graph database models.


> SQL is often conflated with the relational model (hence the term NoSQL for non-relational databases)

Still not confused enough? Check this out: http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/NoSQL/


Do graph databases predate relational databases? Do you have a source for that? I thought graph databases were a fairly new thing.

I only recently started working with a graph database (a bit over two years now), and it struck me just how terrible relational databases are at relationships, compared to graph databases. I know nothing about the history of databases, but my impression is that relational DBs are basically an expansion of simpler table-based databases that can also handle relations between different entries, if awkwardly. Graph databases are a much better fit and smoother to work with.


Graph databases have been around nearly as long as databases generally, at least since the 1970s. The sole feature that makes a database a "graph database" is support for a minimal amount of recursion in queries, which was a feature before SQL even existed. There are good technical reasons graph databases have never been commercially successful.

Most databases support some form of recursion and have for decades. The reason they don't market themselves as a "graph database" is that the performance and scalability of graph data models in conventional database architectures is typically very poor, so representing your entire data model that way is not encouraged. The way indexing is implemented in many traditional database engine designs, e.g. B-trees, is pathological for some common graph-like traversals. Even databases that market themselves as "graph databases" have conventional internals and perform only marginally better for graph data models than databases that are positioned differently.

The idea of graph databases are great. Unfortunately, the existing implementations all suffer from very poor scalability due to fundamental limitations of the data structures and algorithms used internally. The core theoretical challenges are well-understood but few people are credibly working on addressing those.


Interesting. But are those old graph databases similar to modern ones like Neo4j? Neo4j has pretty amazing performance compared to doing the same thing in SQL. Although it's certainly possible that SQL database was poorly designed. Even so, a handful of developers new to graph DBs managed to easily beat its performance.

I've also heard that some modern graph databases are not true native graph databases below the surface, and therefore perform worse at graph-specific queries.

Edit: is it possible you're talking about Network Model DBs[0]? Wikipedia mentions them being around since the late 1960s, and that they could model graphs, but suggests it's more of a predecessor to graph DBs, which saw a lot of improvements until the arrival of modern commercial graph DBs in the 2000s[1].

Again, I'm not an expert on this at all, but it sounds like there have been significant improvements in graph databases since the 1970s. Much more so than in relational databases.

[0] https://en.wikipedia.org/wiki/Network_model

[1] https://en.wikipedia.org/wiki/Graph_database


I wouldn't say relational databases are poorly designed, any more than saying a hammer is poorly designed because it makes for a bad screwdriver. A hammer is still excellent at working with nails, it's hard to find a better tool to work with that. This is just about using the right tool for the job.

Back when data was simpler and not as big, relational databases were perfect, and there have been years of engineering and bug fixes that have gone into them. They are excellent at what they do, and they continue to improve.

But as technology has improved, as our disks and memory have gotten bigger, as the data we collect and want to query over has gotten bigger, and as our queries have gotten more complex, we've been running against the limitations of log(n) joins and relational database technology for some use cases. Now, not every problem is a nail. Some are screws. Some are more exotic.

That's been the reason for nosql databases in the first place, to try to address the shortcomings that arise as data gets bigger, more complex, and as queries and operations become more complex over large data.

log(n) joins are fine...until data explodes, and you're no longer doing just a handful of joins per query, but a very large number of them, maybe even unbounded, and maybe the rules for what data to traverse has soft or even no restrictions. When your data is graphy, when the questions you want to answer require traversals of this scale and nature, and when you want to make sure your traversal costs are proportional only to the graph you want to traverse (and not proportional to the total data in the database), then graph databases provide a very good tool for modeling and efficiently querying over that data.

Graph databases are relatively young, compared to relational databases. Yet their usage has been proven, especially as more graphy problems and data have grown more common.

Relational databases are still useful, and still improving, and graph databases will also continue to grow and improve side by side with them.

We even have a GQL initiative, on the language side, aimed at becoming an ISO standard that will hold an equivalent position as SQL, but for graph querying. That should speak to the value and importance of the paradigm.


The fundamental premise of the relational model is the physical/logical distinction. The relational model deliberately does not make any requirements or assumptions about how data is physically stored or structured.

The difference between relational and graph (and other NoSQL database systems) is not about particular sizes and shapes of data, it is about level of abstraction. For example assuming joins are "log(n)" makes certain assumptions about how relations and indexes are implemented which is only true for some naive implementation (like Access or MySQL).

Just as an example, materialized views is a physical-level optimization where an arbitrary complex query result is stored and kept updated, which means data can be retrieved as fast as physically possible. Of course this has a cost at insert-time, since materialized views also have to be updated - but this is a performance trade-off just like the structure of a graph database is a performance trade-off.

NoSQL databases has a tight coupling between the physical and logical structure, which makes them easier to optimize towards particular usage patterns but harder to adapt to changing requirements over time. The relational model was specifically designed for large databases used by multiple applications and changing over time.


Performance was not the selling point of relational databases in the first place. It is hard to beat the performance of a perfectly tuned hierarchical or graph database on the workloads they are designed for. At best a relational database can be equally fast. For this reason hierarchical/graph databases never went away and are still appropriate for specialized tasks.

The selling point of relational database is they can accommodate long-lived multi-purpose databases used by multiple different users and applications over time and where the data model would change and evolve over time as the world changes. The is achieved by "data independence" - the logical model is decoupled from physical storage structure and is not hard-coded to accommodate particular query patterns. So you can have ad-hoc queries - cross-cutting queries which you didn't know you would need when the schema was designed. The key insight is that relationships between entities are also just data.


You can build almost any kind of database on top of any decent database kernel -- they are similarly expressive. The "kind" of database is just a skin over the kernel. However, different kernel designs make different performance tradeoffs in several dimensions. A graph database is much more join-intensive than a relational database, for example, so it may make sense to reduce select performance to increase join performance.

Graph database technology was essentially stagnant for decades, and the design of Neo4j is from that era. The basic technology is the same across all graph databases, just incrementally modernized and micro-optimized, hence why they all have the same scalability and performance problems. The market gap is that most interesting graph data models have at least 10^12 edges and graph databases stop working well long before you get to that scale.

There was a major breakthrough on this topic around 2008 at a corporate research lab, a new way of representing graph data models that was highly scalable. By coincidence, I was working in graph database R&D at the same time elsewhere and had some exposure to the research, made several material improvements to the new approach, and even built working prototype database based on it that addressed a lot of classic scalability and performance issues. I stopped working in graph databases shortly thereafter but to this day have not seen a single implementation in the wild based on that research. It isn't the kind of thing you can build on top of an existing database kernel, the internals were necessarily pretty exotic, which is part of the problem.

tl;dr: Graph databases have been stuck in the same local architectural minima for half a century, receiving a fresh coat of paint every decade or so but not addressing any of the reasons almost no one can use them.


Sure, in the 1970 paper "A Relational Model of Data for Large Shared Data Banks" where E.F.Codd introduced the relational model, he specifically mention how it is superior to hierarchical and graph databases, which were the dominant models at the time.

The hierarchical database evolved from the flat file database by allowing nested records (i.e a record could have sets of child records, arbitrary deep). The graph model further added support for navigation links or pointers between records across the structure, hence supporting a graph model (also commonly called network model at the time).


> Do graph databases predate relational databases? Do you have a source for that? I thought graph databases were a fairly new thing.

I believe they just called them "network databases" back then.


Aren’t you just doing the same thing in reverse here? You can use GraphQL with a relational database.


Fair enough! But GraphQL returns data in a hierarchical format, so the property of relational closure is lost.


I do not understand the comparison between SQL and GraphQL. As far as I can see they’re not close to the same. And GraphQL is also missing a bunch of really necessary things.


GraphQL is very limited in expression power compared to SQL. I doubt you could do 20% with GraphQL compared to what you can do with SQL.


Are you also talking about switching from a relational model to something graph-based, or is there some way one could use GraphQL with a relational database?


GraphQL isn't a graphdb querying language. The "graph" in its name is highly misleading, and should basically be ignored.

It's an API querying language.


SQL is also an API querying language. The primary difference is that SQL is designed to query relational data (rows and columns), while GraphQL graph data (tree-like structures).

The parent is questioning how useful GraphQL is as a relational query language or if the suggestion centres around moving to the database speaking in graphs. Conceivably, the data storage could still follow what suits the relational model internally with some kind of ORM-style engine that sits in the middle to translate relations to graphs for the GraphQL query processor, but at that point you could argue that the database is a graph database.


GraphQL is stiff competition, in the sense of "stiff as a corpse". Yes, it has the words "query language" in the name, but that's where the similarities end.

In GraphQL, you can select fields of JSON objects, optionally a field may have parameters which affect the values it returns in unspecified ways. That's it. Because of this design, unlike in SQL where you are concerned with modeling the structure of your data, GraphQL also requires you to think about the API all users will use to access the data. In SQL, I can write a table defining the structure of my dataset, and then users of that table can perform arbitrary queries when they know what data they need (aggregate with a month average grouped by account ID, filter to only rows where x = 7, JOIN this to grab some data from some other table etc.).

GraphQL has no aggregates (sum, average...), no grouping, no joins, no sorting, no filtering, other than what you manually design each using parameters at schema design time. Good luck anticipating the use cases of every future consumer of your data. Miss one? Better dig back into your implementation code & implement that use case each & every time a new one comes up.

The only part of GraphQL that is standardized is the query syntax. In SQL, the actual underlying relational data model exists and the syntax of queries exists within that context, not so in GraphQL land. In SQL, I define my data structures, and users can write queries and access the data. But GraphQL throws up it's hands and says "not my problem, try one of these libraries that ask you to implement your own custom data access functionality for all your data types".

OK, so it's a rubbish query language, but even the graph part of the name is misleading. Assuming that you even have a domain that it makes sense to model with a graph of types, GraphQL provides you no tools for dealing with the backend complexity of such a design. Because the syntax is so simplified, there is no mechanism within the syntax to define rules about relationships between types. For example, imagine a simple parent/child relationship. There is no mechanism within the syntax to tell GraphQL that for parent X, parent.child = parent.child.parent . So you can't even think about writing a GraphQL query optimizer, because there isn't enough information about the structure of the data encoded into the schema or query to do so.

So in practice no GraphQL implementations that I know of have anything resembling a query optimizer - someone asks for cart.item, and then item.cart for a cart with 1000 items? Have fun re-requesting the cart from your DB 1000 times (yes you can cache the item by ID to save a DB lookup, but we shouldn't even need to hit cache here! Every programmer involved knows the data is the same, it's just dumb GraphQL has no clue about the details of the relationship).


You are focusing on the wrong aspects of the comparison here. The point is not that GraphQL is dreamy from a developer perspective (as opposed to the api end-user perspective). The popularity just demonstrates that there is a real end-user need for the query to be correctly structured out of the gate instead of being re-combined/nested/etc after it's left the database.

> The original idea of relational databases was that they would be queried directly from the client. With the rise of the web this idea died - SQL is too complex to be easily secured against adversarial input, cache invalidation for SQL queries is too hard, and there is no way to easily spawn background tasks (eg resizing images) or to communicate with the rest of the world (eg sending email). And the SQL language itself was not an appealing target for adding these capabilities.

> So instead we added the 'application layer' - a process written in a reasonable programming language that would live between the database and the client and manage their communication. And we invented ORM to patch over the weaknesses of SQL, especially the lack of compressibility.

> This move was necessary, but costly.

Your complaints are about GraphQL at the application layer, ie, the layer of concern that exists because SQL is inadequate for the burden it's been made to bear.

Imagine if GraphQL added aggregates and support for types besides what JSON limits you to, and it then just sat atop an RDMS like Postgres so that you no longer have to fuck around with it at the application layer to get it going. That would clearly be an improvement, right? Or really, just the ability to implement a language like GraphQL atop the RDMS so you can design your own QL that you don't hate.


My bet is that somepoint we are going to get very good NLP model where you feed tabular data, and then you are going query througt normal language. That would like be pretty big change and would eat pretty big share of sql market share.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: