
I'm a Go developer and I notice a lot of other Go developers instantly suggest things like GORM to noobs writing applications. Whereas I always advise against it.

I'm a big advocate of understanding your data model at the database level. Joining on too many tables is too easy to do with an ORM.

My go-to strategy for SQL is simple. Abstract your SQL as far away from your application as possible. Not just by using interfaces in your application, but by using stored procedures in your database.

In my mind SQL is just another service your application depends on. And you want that interface to be as simple as possible for your application. If you need to tweak SQL performance it should not need any input from your application. I could completely migrate to a new table relationship structure without the application realising if that was what was needed. You could even go as far as to unit test just your SQL code if you wanted, something you can't do too easily when it's in your application.

Yes, if you need to return new data then you need to update your stored proc and code. But that's so worth it in my opinion for that extra layer of abstraction.

My opinion is slightly skewed from a decently sized business perspective, but I do still follow this pattern in personal projects. When migrating applications to different tech stacks (like Java to Go, or C# to Go) this abstraction has meant the world to us.
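The kind of boundary described above can be sketched like this, using Python's sqlite3 as a stand-in (SQLite has no stored procedures, so plain SQL lives behind the interface instead, and the table and function names are invented for illustration). With a server database, each function body could shrink to a single procedure call, and the schema behind it could change without the application noticing:

```python
import sqlite3

# All SQL lives behind this narrow interface; the application never
# writes queries itself. With a real server database these functions
# would reduce to single CALLs to stored procedures.

def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn

def add_user(conn: sqlite3.Connection, name: str) -> int:
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid

def get_user_name(conn: sqlite3.Connection, user_id: int):
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None

conn = connect()
uid = add_user(conn, "alice")
print(get_user_name(conn, uid))  # alice
```

These functions are also exactly the surface you would exercise if you wanted to test just the data layer on its own.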



I am against using stored procedures. In the projects I saw where SQL code was encapsulated in stored procedures, the business logic was inconsistently split between the application and the database, making maintenance much more complicated than it should have been. I am in favour of stored procedures to avoid several roundtrips to the database, but only when performance really matters. The argument about tweaking SQL performance does not convince me. Where I work we execute performance and load tests before every release of the application and we are expected to meet performance targets that are documented in the release notes.


I’m interested in this because I have the opposite experience. Not saying any choice is wrong, but in my experience, moving that logic into the database has made all our codebases easier to reason about. The APIs don’t have to botch error handling due to “open transaction needs to be ROLLBACKed”, I don’t have to worry about bad devs doing stuff non-atomically and putting the db in a bad state, and the code is easier to read because it’s just a single query that does “what it’s supposed to do”. The API usually ends up just being a small adapter for DB functions. It also makes sense to me because the “database API” (e.g. the database schema with some methods in it - the tables themselves are usually not in the same schema) describes the valid operations on the data, and can make sure the data is consistent. It also makes adding new APIs easier.


>I don’t have to worry about bad devs doing stuff non-atomically and putting the db in a bad state

Premise: I assume we are speaking of relational databases. The fact that a developer can "corrupt" the database with non-atomic stuff is a hint to me that the database probably doesn't have the right referential integrity constraints in place and probably is not normalized either. The relational model is built out of the box to keep a consistent state of the data and this is one of its main value propositions.

Another danger that I didn't mention is that junior developers will be tempted to prefer "Turing-like code" inside stored procedures (loops, cursors, conditionals) instead of relational constructs (joins, subqueries, grouping, table variables). I have seen that several times and this is really a killer for database performance.


> The fact that a developer can "corrupt" the database with non-atomic stuff is a hint to me that the database probably doesn't have the right referential integrity constraints in place and probably is not normalized either. The relational model is built out of the box to keep a consistent state of the data and this is one of its main value propositions.

This is true when interpreted the right way, but I don't think real world problems are always so tidy.

Let's consider Standard Toy Example #17: The Bank Account Transfer. In a simplistic model, we might have a table of bank accounts in our database, and a second table of deposits. Constraints will guarantee that we can't do things like creating a deposit for an account that doesn't exist, but they won't ensure that if we create a deposit of X in one account, we must also create a corresponding deposit of -X for some other account so we aren't inventing or destroying money.
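The simplistic model can be sketched in Python with sqlite3 (schema invented for illustration): the foreign key catches deposits to nonexistent accounts, but nothing enforces that deposits balance across accounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE deposits (
    id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES accounts(id),
    amount INTEGER NOT NULL)""")
conn.execute("INSERT INTO accounts (id) VALUES (1), (2)")

# The foreign key catches deposits into accounts that don't exist...
try:
    conn.execute("INSERT INTO deposits (account_id, amount) VALUES (99, 100)")
    fk_caught = False
except sqlite3.IntegrityError:
    fk_caught = True

# ...but nothing stops a one-sided deposit that invents money.
conn.execute("INSERT INTO deposits (account_id, amount) VALUES (1, 100)")
total = conn.execute("SELECT SUM(amount) FROM deposits").fetchone()[0]
print(fk_caught, total)  # True 100 -- money appeared from nowhere
```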

Of course, in a more realistic implementation, you'd never represent transfers between accounts as two separate stages like that, and if there is any single place to look up deposits into specific accounts it's probably some sort of view or cache rather than the authoritative data. But rather like going from single-entry to double-entry bookkeeping, to avoid relying on something higher up our system to ensure consistency, we've had to restructure our whole data model and the schema implementing it. In a sense, this is still just normalising the data, but that's a bit like saying implementing a word processor is, in a sense, just refactoring and extending a text editor.


A related principle is to make illegal states unrepresentable. If you can possibly insert a debit and fail to insert the corresponding credit, thereby making money magically appear, the schema is wrong. This isn't a matter of just insufficiently normalized, it simply doesn't represent what you claim it represents.

The database shouldn't be viewed as just some kind of scratch pad where you write stuff down so the application server doesn't have to remember it. If the database is the source of truth, then it is the model, and the application is simply a view.


This is a good principle to aim for but it is impossible to fulfill in many cases. There are always going to be invariants that the application may wish to enforce which cannot be enforced in the database (or, if there was some ideal schema to enforce them, the migration from the existing schema would be enormously costly). This is the whole reason we have ACID transactions in relational databases - so the application can enforce its invariants!

(Exercise for the readers: construct a set of invariants that cannot be enforced within a database schema.)


> This is a good principle to aim for but it is impossible to fulfill in many cases. There are always going to be invariants that the application may wish to enforce which cannot be enforced in the database

As HN doesn't show moderations, I will just say that this is exactly the point I would have made if I'd been replying first. Relational databases are good at enforcing the kinds of relations provided by basic set manipulations. If you try really hard, you can encode some more complicated constraints, but as the complexity increases it becomes unwieldy and eventually impractical to enforce everything at schema level.

Edit: Changed "database level" to "schema level" to clarify that I'm not talking about stored procedures here.


I'm a little confused. A 'toy' bank app would have individual accounts with some amount of money in each. The problem arises when you subtract the debit from one account and error out before crediting the amount to the other account, or vice versa. Now you're in an inconsistent state.

How would you typically design a relational schema that could avoid this scenario?


The usual for this would be to have a table of transactions instead of balances. The balances would be computed dynamically by summing the full transaction history for a particular account.

Of course this gets a bit unwieldy with large transaction histories, and has trouble with things like enforcing that you can't enter a transaction that draws a balance below zero.
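The transactions-instead-of-balances idea can be sketched with sqlite3 (the `txns` table name is made up): the balance is never stored, only derived from the history:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (account_id INTEGER NOT NULL, amount INTEGER NOT NULL)")
conn.executemany("INSERT INTO txns VALUES (?, ?)",
                 [(1, 100), (1, -30), (1, -20), (2, 50)])

def balance(conn: sqlite3.Connection, account_id: int) -> int:
    # The balance is computed dynamically by summing the full history.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM txns WHERE account_id = ?",
        (account_id,)).fetchone()
    return row[0]

print(balance(conn, 1))  # 50
```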


In the real world, you can totally go below zero though. It’s a race condition that they turned into a revenue opportunity.


Well yeah, but that just makes the business rules even more complex. That makes it more like - reject the transaction if it would move the account balance below zero, unless the account has a flag to allow overdrafts (which is over on the accounts table), in which case we allow the transaction and also enter another transaction for the overdraft fee.

Probably need some more rules like max overdraft amount, max number of transactions etc. I've heard some places may also have the "let's be jerks" rule to reorder transactions to hit overdraft as soon as possible to maximize fees.


That sort of problem is actually relatively easy to solve in itself. In essence, you don't record the transactions on each account separately, you record the transfers between them (just as double-entry bookkeeping does). Since any transfer inherently balances, you can't be inconsistent in crediting money somewhere without debiting it somewhere else or vice versa. If you want to know what happened to an individual account, you can still construct that information by looking at the relevant sides of all the transfers into or out of that account, but your single source of truth is always the transfer records.

Of course, it's still not that simple in any realistic system, because maybe you have other important constraints such as ensuring that accounts don't go into negative balances. In that case, before you can add any new transfer, you need to calculate the existing balance on the account you're taking money from, which means replaying the entire transaction log selecting for that account if you just have the naive schema described above, and then you have to create the new transfer record if and only if the constraint is satisfied. All of this has to be done atomically to make sure you can't have multiple transfers being recorded concurrently that would individually be valid but collectively leave an account overdrawn.

That takes you into the technicalities of how to process DB transactions with acceptable performance. Hopefully your database will help you if it's ACID compliant, but it still can't work miracles. For the simple two-sided transfers in this example, you can probably do a lot just using a database's built-in transaction support, but for more complicated constraints applied to more complicated schemas, at some point you have to take the practicalities of your database engine into account when designing that schema and deciding what constraints to enforce at this level.
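A sketch of that atomic check-then-insert, using sqlite3's built-in transaction support (the schema, and the convention that account 0 means the outside world, are assumptions for illustration):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode so we can
# manage BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""CREATE TABLE transfers (
    from_account INTEGER NOT NULL,
    to_account   INTEGER NOT NULL,
    amount       INTEGER NOT NULL CHECK (amount > 0))""")

def balance(conn: sqlite3.Connection, account: int) -> int:
    # Replay the relevant sides of all transfers touching this account.
    row = conn.execute("""
        SELECT COALESCE(SUM(CASE WHEN to_account = :a THEN amount
                                 WHEN from_account = :a THEN -amount END), 0)
        FROM transfers WHERE :a IN (from_account, to_account)""",
        {"a": account}).fetchone()
    return row[0]

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    # BEGIN IMMEDIATE takes the write lock up front, so two concurrent
    # transfers cannot both read a stale balance and overdraw the account.
    try:
        conn.execute("BEGIN IMMEDIATE")
        if src != 0 and balance(conn, src) < amount:  # account 0 = external world
            conn.execute("ROLLBACK")
            return False
        conn.execute("INSERT INTO transfers VALUES (?, ?, ?)", (src, dst, amount))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise

transfer(conn, 0, 1, 100)          # fund account 1 from outside
print(transfer(conn, 1, 2, 30))    # True
print(transfer(conn, 1, 2, 500))   # False: would overdraw
print(balance(conn, 1))            # 70
```

A real engine would need this check to scale (e.g. via serializable isolation or materialized balances), but the shape of the logic is the same.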


I think as long as you use stored procedures as atomic operations/transactions, it both simplifies code and improves performance. In your example, a transfer-money procedure should handle both the balance increment and decrement of all accounts involved in a transaction before returning. Things start to get hard to maintain if you have stored procedures that only complete a portion of the business logic and rely on application code to “understand” what still needs to be done.


I think what he has experienced is something I have also seen many times (and am still seeing at my current company): some developers put their application into SQL. That means the whole business logic of the app is in SQL, and the app just executes SQL statements in order and does some front-ending.

But I don't suspect that is what the original author meant.


“open transaction need to be ROLLBACKed” isn't business logic. Business logic should rely on operations being transactional anyway.


There are security matters too though, which no-one ever seems to get right.

Stored Procedures correctly managed are a more secure option than an ORM executing arbitrary SQL code, but the reason why is misunderstood consistently (I've had experienced DBAs get this wrong and insist to me that stored procedures prevent SQL injection attacks which is completely false).

To pick an arbitrary example out of thin air:

If my application requires 10 different calls to the database with variables, then I can create 10 stored procedures. I can set the permissions for the account that the calling application is using to only have EXEC privileges and only on those 10 procedures. This means that if the credentials were to leak the damage is limited to tasks the application could conceivably have performed anyway, albeit without any application-enforced validations on the variables passed to the calls.

If I use an ORM, I have to give the SQL account the client application is using more-or-less carte-blanche access to the database, as it could conceivably read from or write to anywhere in the database through arbitrary SQL. Sure, maybe I scope the ORM to a schema and restrict on that, but it's not nearly as granular, and that fundamentally misses the point that I've opened a massive attack surface unnecessarily by creating a SQL account with generally free rein on the database that exists in the application layer.

On a recent project, mostly to see how practical it was, I built an application that used stored procedures with all of the validation being done directly inside the procedures in SQL. This had quite a few benefits:

- I only had to maintain validation in one place.

- I could rely on the formatting constraints I already needed in the database anyway.

- The validation rules were beside the data and were much faster to edit and maintain over time.

- The validation messages had to be pulled as keys, making globalisation of the application much simpler later within the client application.

Different strokes for different folks, but there are literally decades of reasoning behind why RDBMS systems are the way they are that is completely bath-watered by ORMs.


"If I use an ORM, I have to give the SQL account the client application is using more-or-less carte-blanche access to the database as it could conceivable read from or write to anywhere in the database through arbitrary SQL"

You can give table level permissions (and even specific columns if you want) to the ORM db account. From a security point of view there should be no difference.


I agree with both of you, although have never set up a DB in either style :)

One small point: the proposed stored procedure approach seems, qualitatively to me, to be less error-prone. The consumer in the sproc approach either does or doesn’t have access to specific sprocs, while managing fine-grained per-column permissions seems easy to screw up in either a too-liberal or too-restrictive way.


I have a few questions I'm curious about.

1) How are you unit testing all of this?

2) How are you source controlling all this?

3) I'm assuming you are not versioning anything but if so how are you?

4) How difficult would it be for someone else to come support this?

Number 4 has been the nightmare I've seen with excessive stored procs.


The project is end-of-life now so most of this is moot but to answer these queries:

1) SQL calls in a separate command-line application.

2) Git. It’s in a .NET Database project.

3) See 2.

4) Not overly to be honest - the organisation uses SQL procs pretty extensively as is, so this isn’t anything particularly new or out of the ordinary other than the validations are held in a different part of the stack.


Not having consistent rules about what code belongs where is a people problem, not a technology problem. In order to write maintainable software, you always need proper code classification, regardless if you use stored procedures or not.


Allow me to recommend another current HN front page article in response to this: “Discipline Doesn’t Scale”, https://www.sicpers.info/2020/10/discipline-doesnt-scale/


Again, this has nothing to do with technology. Why do we assume that you only have to follow software engineering principles when writing Java, but not when writing SQL?


I’m not sure what you’re arguing. You should have rules for what code goes where, and stored procedures are also considered an anti-pattern by most developers because they tend to split business logic between the application code and the database.

Let’s say you decide to use stored procedures with rules about what belongs in app code and what can go in a stored procedure. The “discipline doesn’t scale” perspective would say that as your company and app code get larger, your rules get harder to enforce. So devs tend to fall back to the simplest, easiest-to-follow version of the rule: stored procedures are an anti-pattern.


So what would be a solution then? I imagine, creating an RDBMS where SQL can be executed only by DBA root accounts, and normal (app) accounts can only execute stored procedures? That would enforce a pattern without requiring discipline.


Impossible to generalize, would depend entirely on what specifically you want the stored procedures to do. Though I can’t think of a good reason to abstract your queries at the database level. Even ORMs typically allow execution of raw SQL, so you could still store a straight SQL query in your app code (perhaps in a constant) instead of using a stored procedure.


I've been thinking about this a bit before seeing this discussion, and I'm starting to feel that abstracting at database level makes sense, because it turns your RDBMS from being just a dumb SQL execution engine into an application-specific API.

If you consider restricting operations on your data close to the storage layer, you can imagine wrapping DB access with an app exposing business-specific API, and route all your actual code through that API. Doesn't seem like an unreasonable design to me - particularly, if the same database is used by multiple applications. But if you do that, it may make sense to just put the API inside the RDBMS - giving you one networked software component less to manage.

(I've done work on a project using lots of stored procedures only once, and my tasks were unrelated - so I have no practical experience here. But through this discussion, I think I'm starting to understand why enterprise projects are anecdotally so in love with stored procedures.)


If you abstract at the db level then your db reading code is either unlikely to be in version control, or you need a kludge to get version control for stored procedures.


But that's a concept entirely orthogonal to system architecture and solvable with appropriate process and/or tooling.


I suppose it is in theory but how many of us work in theory rather than with a set of tools that are commonly available?

Or maybe it’s just better to roll your own version control system to support your really amazing new app architecture that stores lots of logic in database functions, what do I know? ¯\_(ツ)_/¯


Same here. Earlier this year I consulted for a company evaluating whether to modernize their current ERP or buy a new one. What I found was a large clusterfun of an application inside a MSSQL database, with Microsoft Access frontends and some glue code in the form of a .NET application on Windows 10 in between, accessing third-party APIs. Barely anything documented, thousands of stored procedures. When I asked how they roll out changes and perform tests I got this answer: “As few deployments as possible. Every time we deployed new features, there were always issues for a couple of days.”


That's an application-centric view of development, as indicated by the phrase "the application". If an application is successful, its data will likely end up being used by other new applications, and the data will survive the lifetime of the original application that generated the data. The applications themselves will be rewritten many times as technologies evolve. Think mainframe to client/server to EJB to lightweight J2EE to Spring Boot microservices.

If there is logic or metadata that is common across all those applications, there are a couple of choices. You could duplicate that logic across all the applications, and rewrite that logic every time you do a rewrite of the application, or you could keep it in one place.

If you keep it in one place, one way to do that is to have a service in front of the database that every application that uses the data calls instead of hitting the database directly. That has some disadvantages in that it requires upfront design and planning, the service in front of the database will likely be rewritten in new technologies over time, and it has performance implications. And for the developer, instead of having to deal with logic in an application and a database, now they have to deal with logic in two applications.

Another way to do it is to have common logic for the data implemented in key constraints, check constraints, triggers, stored procedures and other similar tools in the database so that any application that uses the data doesn't need to rewrite that logic, and can't intentionally or unintentionally violate the rules of the common domain. That does have its own disadvantages, and it makes things more complicated for developers who will need to be familiar with an additional set of technologies, but it is a valid use case for stored procedures.
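A sketch of pushing such rules into the database itself, using Python's sqlite3 (which supports CHECK constraints and triggers; the `orders` table and its rules are invented for illustration). No application using this database can violate the rules, intentionally or not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    status   TEXT NOT NULL DEFAULT 'new'
             CHECK (status IN ('new', 'shipped', 'cancelled'))
);
-- A trigger keeps a domain rule in the database: no editing shipped orders.
CREATE TRIGGER no_edit_shipped
BEFORE UPDATE ON orders
WHEN OLD.status = 'shipped'
BEGIN
    SELECT RAISE(ABORT, 'shipped orders are immutable');
END;
""")

conn.execute("INSERT INTO orders (quantity, status) VALUES (2, 'shipped')")

try:
    conn.execute("INSERT INTO orders (quantity) VALUES (0)")  # violates CHECK
    check_ok = False
except sqlite3.DatabaseError:
    check_ok = True

try:
    conn.execute("UPDATE orders SET quantity = 5 WHERE id = 1")  # trigger fires
    trigger_ok = False
except sqlite3.DatabaseError:
    trigger_ok = True

print(check_ok, trigger_ok)  # True True
```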


I was once dead set against stored procs (having been previously in favour of stored procs) for the same don’t-split-business-logic reasons. However I now have a more nuanced approach - stored procs shouldn’t have business logic, but can be used to insulate sensible code data models from whatever random bozo (like me last year, or some legacy decision from 15 years ago) decided how the database schema should be.

The other way I’ve seen data-transport stored procs used is to allow multiple code versions to work on a single database - where you need to support version N and N+1 on the same data (I once worked for a saas company that provided a Preview version on live data before going to prod). Before version N+1 code goes live, deploy the N+1 database schema, updated procs, new data, etc, and put in place versioned stored procs which let a version N application run on a version N+1 database, setting default values for new fields, etc. It did require some extra thought to make it work smoothly, but very rarely caused problems for customers.


How do you prevent business logic being inconsistently split (and duplicated) between the applications or libraries?


Having worked as a DBA in numerous large financial institutions, and also as a SWE in a lot of node/python/MEAN-style startups, I can assure you that there are some very significant trade-offs made with stored procs (or any business logic that is compiled to your database). It takes your one codebase and turns it into at least two separate codebases. If you have a monolith, now you’ve got a distributed system. Any stored proc codebase is guaranteed over time to become a maze of complexity and footguns. The more db features you use in this area, the worse it gets. If you’re using stored procs, then somebody is eventually going to create a table trigger. If a table trigger exists, then somebody will eventually write to that table not knowing it’s there.

The complexity it adds to release management alone makes it not worthwhile in a lot of situations.

It’s like doing full blown OOP design. Your codebase has to pass a certain (rather large) size before the complexity trade offs can even begin to make sense, and even then they often don’t.


>> Any stored proc codebase is guaranteed over time to become a maze of complexity and footguns.

I think this is a very valuable and important point and wanted to highlight it.

>> Any stored proc codebase is guaranteed over time to become a maze of complexity and footguns.

There's no absolute law that states that SP using environments will always end up as a mess, but there should be such a law, because they do!

I've worked across dozens to hundreds of enterprise environments, and the pattern is clear.

I suppose it comes from personality and culture, with the people working on db code more often at the 'get it done' end of the spectrum vs programmers, among whom you find a significant number interested in 'get it right'.

An even stronger reason is probably that db coders are more likely to be autonomous instead of in large centrally driven teams. That central hierarchical organisation is a necessary but not sufficient condition for working to tame the entropy of large complex systems, without which they evolve into "a maze of complexity and footguns".


It's certainly a valid observation that anything that large enterprise does, will end up being more complex than it necessarily needs to be.

But if I look at a lot of the SP anti-patterns I've seen over the years, some common themes start to arise. I'd say the biggest issue is that it is more complex to maintain interfaces between two discrete systems than it is to write monolithic code paths. Seeing SP_ProcedureName accompanied by SP_ProcedureName_New is very common. It's easy to get into that situation, and tedious to get out of it.

Storing business logic in the DB also presents a lot of very tempting, but ultimately costly, shortcuts to developers. Using a non-temporary table when you should have used a temporary table, because you wanted debugging to be easier, but now you've got a concurrency issue, but it'll probably be OK, so you leave it there for 15 years. I have also never seen what I would consider to be a sensible use of a trigger. You also end up with SPs invoking SPs invoking SPs... and figuring out the side effects of calling something becomes a lot more complicated than it would be in a monolith.

I don't think they're always bad, but I do think that reasonable use cases for them are rather uncommon. When you had a lot of services directly consuming one database, SP interfaces used to make a lot more sense. But that architecture is not very common any more. Even in large enterprises, you'll typically see new projects being developed as web applications, even if they're only served over the corporate network.


yeah, I don't think this is a problem with sprocs, per se.

I think this is a problem because the database code is treated differently from the application code, because so few coders understand SQL well enough to be comfortable in it.

So, as you say, the DBAs tend to be autonomous, and held to different standards. And with less ability to say "we have tech debt, we need to refactor our sprocs" because of the knock-on effect on the application code.

It's a management problem rather than a technical certainty, I think.


If it happens consistently across "dozens to hundreds" of enterprise environments as stated by the GP and with apparently zero exceptions in which they remained simple then, while maybe not a technical certainty, it should be treated as a practical certainty.

Pointing out that it's probably caused by management is not really very useful since there are so few means available for line workers to fix their managers.


This is true, and I agree, but mis-attributing the problem is still not good.

After all, if (for instance) a new IDE plugin appeared that managed sprocs consistently with code and solved this problem, then we'd still face "sprocs are bad" criticism because the problem was mis-attributed.


> If you have a monolith, now you’ve got a distributed system.

You already did, you just chose to ignore that the DB is actually a separate service. Ignoring this has meant that I've spent a not-insignificant amount of time in my last few jobs cleaning up bad ORM code because the queries performed terribly. I've cleaned up such messes as pulling in all rows and then looping/filtering on them client-side, just generally querying for too many things/too many columns, querying for data and then looping over it to do more queries (instead of figuring out how to do it in a join), etc. When people treat the database as if it's part of your monolith, because all the query logic is locally part of your codebase in your ORM, there's a temptation to be lazy and treat it that way. And it works fine in development because the database is small with only test data, but can be painfully slow in production.
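The "pull in all rows and filter client-side" mess contrasted with pushing the predicate into the query, sketched with sqlite3 (schema invented; with 200 rows both versions are instant, but in production the first one transfers the whole table over the wire):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(c, t) for c in range(100) for t in (10.0, 250.0)])

# Anti-pattern: fetch everything, filter in the application.
big_orders_slow = [row for row in conn.execute("SELECT id, customer_id, total FROM orders")
                   if row[2] > 100]

# Push the predicate into the query and let the database do the filtering
# (and use its indexes, in a real schema).
big_orders_fast = conn.execute(
    "SELECT id, customer_id, total FROM orders WHERE total > 100").fetchall()

print(len(big_orders_slow) == len(big_orders_fast))  # True
```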

I'm sure plenty of people do it properly, even with an ORM, but I've had to clean up the mess too many times now, across multiple companies and dev teams.

I'm not advocating for putting all of this logic into stored procedures, I think that's going too far in the other direction, only that you already have a distributed system if you have a database, where the boundaries do matter.


Not to nearly the same degree. Start transaction > mutate data > end transaction is not nearly as distributed as invoke SP > let the database do something (?...) > wait until it's done. The first is a distributed system in the same way that an application writing to a filesystem is a distributed system.

I'd also point out that ORM is not the only alternative to stored procedures. In addition to that, there's nothing unique about ORMs that makes them susceptible to bad code, or poor SQL understanding. Somebody who doesn't understand data structures and how the CBO works is going to write equally poor stored procedures. Removing the ORM is not the solution to your engineer not knowing how RDBMSes work. ORMs also don't suit every use case, but again, having a bias against ORMs is not the solution to failing to understand how your use case fits in with your RDBMS.


> I'd also point out that ORM is not the only alternative to stored procedures.

Absolutely. I personally use HugSQL[1], so no ORM, but I also don't use stored procedures. I guess my main point is that the boundary between application and database matters, and not putting logic in the database doesn't make that boundary go away (putting logic in the database also doesn't make it go away). But I take your point, it's not to the same degree.

> there's nothing unique about ORMs that make them susceptible to bad code, or poor SQL understanding.

I think that there is: it makes database code look like client-side application code, so there's some temptation to just blur the boundaries and mix them together. This is how you end up with code that queries for data, then does looping and filtering outside of the query, then queries some more. Sure, you can write equally bad code in stored procedures, but the boundaries are more obvious. At the very least, it's likely a different programming language from the rest of your application.

> Removing the ORM is not the solution to your engineer not know how RDBMSes work.

Sure, however, in practice, at least in my experience across multiple teams and companies, education hasn't been a reliable solution. People still abuse the ORM and treat it as if the database is magic. Removing the ORM doesn't automatically fix it, of course, but it does force people to consider the database as a separate service whose access crosses a boundary. In an ideal world, everyone already keeps this in mind and ORMs can be used as they were intended, but, at least in my experience, this just hasn't been reality. Have I just been unlucky?

If different people and teams keep making the same mistakes with the tools, maybe it's time to re-evaluate the tools and what they encourage people to do. It's not that the tools are bad in and of themselves, but that they encourage bad usage. (EDIT: I just saw another article on the front page and the title seems apt: "Discipline Doesn’t Scale")

[1] https://www.hugsql.org/


> If you have a monolith, now you’ve got a distributed system.

Just call your procedures "micro services" and you are in the clear ;)


Your typical micro-service architecture will run into a lot of the same types of problems. The cost of maintaining those inter-service dependencies easily gets quite high, so if you're going to commit to that, you really should first understand whether you're going to be deriving any benefit from it.


I have been working with Java for the past 20 years.

I have noticed that over the years, knowledge of SQL has eroded a lot in the population of Java developers. Not only that, but the design of applications seems to be lacking. Where problems could easily be solved with a little efficient SQL, people mindlessly accept huge performance losses due to the ORM, as if they just did not see any other possibility.


It makes sense .. if all you have is a hammer ..

The problem is exacerbated though by the amount of churn in styles of hammers, and newer developers are often too hammered to pick up other tools ..


I've noticed this about a few things. Regular expressions are another - they have this bad reputation as "write-only, magic one-liner hacks" that should absolutely be avoided. The proposed alternative is usually dozens of lines of manual string operations in a loop. It doesn't seem to enter the minds of people at some companies I've worked in that regular expressions are a separate language, one that you should learn.


And layers of abstraction make it worse. Devs don't know SQL because of Hibernate, and now new devs rely on the "auto-magic" of Spring Data JPA and don't know Hibernate...


I've heard quite a few times the argument that by using Hibernate, you don't need to know SQL. But Hibernate's manual is huge.


We do this in work and it's not great in my opinion.

There can be issues with package versions compiled to the db. It makes continuous integration harder. We do automated builds and testing in the application layer but we don't have anything in the db layer. It's harder to create a new environment (for a developer who does not have full db permissions). Business logic ends up in the stored procedure layer AND the application layer.


Including a real DB in continuous integration is a must for any development IMO. (And these days it's usually very easy to throw up a database using Docker -- each of our full CI runs finishes in 2 minutes and does hundreds of "create database" calls, populating real SQL databases from scratch.)

For instance:

1) You want to implement idempotency in your backend by using uniqueness constraints in your DB

2) That's important to test

3) That's not really tested unless you include a real DB

After including a real DB in our CI setup I will never move back.


I use a Docker instance for localhost development, so much easier.

There are so many problems that can occur in that communication to the database, it's invaluable to test it


> for a developer who does not have full db permissions

Are you using a commercial database (SQL Server, Oracle, or similar)? With open source databases this is a non-issue, as developers can run a complete instance locally with full permissions, completely isolated from prod/staging servers.


It looks like even the expensive commercial databases have docker images that would be okay for development purposes:

* Oracle Express: https://github.com/oracle/docker-images/tree/master/OracleDa...

* Sql Server: https://hub.docker.com/_/microsoft-mssql-server

* DB2 Community: https://hub.docker.com/r/ibmcom/db2

* Sybase: https://hub.docker.com/r/nguoianphu/docker-sybase


> but by using stored procedures in your database.

This used to be the standard when clients used to connect directly to the database. Now that the world depends on web services, things are a little different and there's less incentive to maintain stored procs as an interface to your database.

It's still a good idea though, with unparalleled performance. I suppose developers don't do it for a variety of reasons: They may lack the SQL knowledge, they may not want to maintain the extra code, they may fantasize about database portability, etc.


Well as someone with 25 years SQL experience, I generally will still use LINQ for 90% of db access code in business style applications, for very important reasons beyond those mentioned, like

- They may realise value in having a compiler provide guarantees, as a bonus immediately in a tight feedback loop while editing code as well as on compilation

- They may favour expressive languages with robust error handling, integrated IDE and version control, etc

- They may want to reduce the number of moving parts

These are all along the themes of a) delivering results faster, and b) improving the maintenance lifecycle of systems - unlike those parts of the industry that don't really plan to maintain their systems and instead just wanna do rewrites in the latest hotness which is usually a hot mess from a long term maintainability perspective.


But then LINQ is in many ways conceptually closer to directly using SQL than to using a classical ORM.


Sorry my wording could have been more explicit!

Generally, use of LINQ in the context of DB access is done hand in hand with an ORM, such as Entity Framework, NHibernate etc, and this is how I meant it.


I'm not a fan of stored procedures because:

1) It's more maintainable to keep all the logic unified in the one language and system developers are most familiar with

2) The programming tools are normally much better for app code

3) You can write unit tests for app code (does anybody write unit tests for SQL?)

4) You can usually avoid transferring too much unnecessary data or making too many unnecessary network requests; remember to avoid premature performance optimization


On the other hand, one of the highly specialized stored procedures I wrote in PostgreSQL has been successfully and flawlessly re-used/tested on multiple off-the-shelf client apps (all BI-related, developed in different languages), simply because they all support SQL.

I agree this only applies to data-model related functions. But it has proven really low maintenance, and improvements to this "API" are automatically available in all clients.

I'd bet that accessing this stored procedure through an ORM raw SQL query would provide more benefit than rewriting it from scratch in ORM's language.

PS: Also yeah people do write unit test for SQL (never enough but that's another story). https://pgtap.org/


1) One language and one system: SQL on a database that will outlast the applications

2) SQL has a limited scope, and so do SQL tools. You need advanced tools if you have "app code": it's a necessary evil.

3) At worst, you can write unit tests for app code that calls SQL queries and stored procedures.

4) "Usually" isn't enough; moving logic to the client requires moving data to the client.


The argument against it I usually see is "you can't version control SQL!"

Where I work we have a custom system which manages releasing immutable SQL snapshots between environments and they get merged to master once in prod. The only thing needed is a process, and then version control is easy.


Editing stored procedures and deploying them on a database is the same as editing code deploying it on a server. There is no difficulty in versioning it


Yeah you can just use a migration system and store your functions in text files checked into the version control. Nothing but the migrations are allowed in staging and production so everyone is forced to use it.


As a front-end dev, I've been interested in learning about such a process but don't know where to look. What kind of tooling would you use to manage your migrations in a CI environment?


I've used Liquibase in the past (not affiliated) with great results: https://www.liquibase.org


Flyway or Liquibase.


I'm not sure about other DBs, but SQL Server has (really good) tools for this (DACPACs). As an added bonus you can then use tSQLt to unit test the SQL.


> you can't version control SQL

Aren't stored procedures part of the database schema? Which can be version controlled?


On high volume applications I would avoid it since it's usually easier to horizontally scale web servers than sql servers, and it makes cache strategies more difficult depending on how and what you're querying.


It is a common misconception that more data processing in SQL puts a higher load on the database. A typical database spends 96% of its CPU time on logging, locking, latching and marshalling [1][2] rather than processing SQL. By sending less data to the middle tier and performing fewer round trips, the use of stored procedures means that the database can actually do more real work.

[1] https://dzone.com/articles/mit-prof-stonebraker-%E2%80%9C

[2] https://drive.google.com/file/d/0B7jyeB8kxFPjU0VySkF3UHhoVnM...


It depends. Regardless there is limited CPU, and so any scenario in which a stored procedure uses more CPU than a simple query will cause you to hit that saturation point sooner.

I generally have layers of caching on top of the sql server so the majority of queries will be integer equivalency or range checks, if not get by primary key queries. So I am not generally operating in a scenario where a stored procedure would reduce record scans, etc.

I generally also don't run on transactions out side of limited scenarios, since throughput is usually more important to me than data consistency.


sorry, I don't get this at all. How is using stored procs instead of an ORM going to adversely affect your cache strategy?


Imagine for example that you have per record caches. You can run a query to get ids with out joining to the table and then simply fill in any gaps in your cache with a follow up query.


In most cases, it is sufficient if your application runs against the real DB in production and sqlite for local development. If the data is too big to run on a dev computer, then you should use one DB across the board.

Portability across databases rarely buys you anything but hard to reproduce bugs.


Just a friendly reminder to anyone reading this about v1 of GORM. It doesn't properly support JOINS:

https://github.com/go-gorm/gorm/issues/1436

Note that there is some work for v2 in allowing users to create JOIN queries but without JOIN support it's effectively unusable in anything other than very small databases.


wait, what? doesn't support joins? Things like this make me doubt my fundamentals. 'Are joins really necessary?' is the question now running through my mind ;-)


It does not support joins. The expectation is that you "Preload" your data, which is a fancy way of saying load it all into memory. The performance of this is trash, so don't use that ORM; there are a few better ORMs for Go mentioned in the issue.

Yes joins are really necessary in SQL they are a foundational part of using any relational database.


Yes, they're literally the second thing one learns about SQL, really. I'm not being too serious about rethinking whether they're necessary, to be clear!

What a strange design decision though. I don't really understand why it is doing this? Why would anyone really think it's a good idea?


> I'm not being too serious about rethinking whether they're necessary, to be clear!

My apologies if I came off condescending, it's hard to impart tone of voice in text.

> I don't really understand why it is doing this?

I think it's because it makes the SQL Generation harder. I think the lib at the moment is simply a search and replace on a bunch of SQL strings.


Thankfully my team is making the switch from gorm to vanilla SQL. I joined the company a little over a year ago on another team that also uses it, and gorm turned out to be a real pain. It was the first time I had to use an ORM, and I really disliked the experience; it was a constant source of issues.


You're sadly fighting against the "but... but... it's how people DO it" mentality here.

Not to mention, as computing expands, the average capability of people is dropping. Seats need to be filled, and those seats don't all need to be high quality devs.

So ORMs are there to reduce the workload, in terms of what needs to be learned to be productive. Database tuning is becoming a lost art.

(For DEVs not getting this, on a largish DB, I can literally write a single query which can take 10 seconds in MySQL's console, or hours. Yes, hours.

This isn't a flaw, any more than doing a tree search wrong is a flaw in <insert language here> when it is slow.

And most ORMs don't come close to properly handling all edge cases here.)

Beyond all of that, there is another bonus to using stored procedures or functions. Security.

There are many tables I want to give read only access to, yet want to allow writing in very limited scenarios. A stored procedure can have different user permissions (at least under most DBMS), which means you can:

- make a table read-only for a web application user

- then, allow a stored procedure to perform INSERTs/UPDATEs, via a different user

- however, let the web user execute the stored procedure

The stored procedure thereby limiting the INSERT/UPDATE to a very specific action.

Mostly, I've used this when dealing with tables/databases which have a financial component. Said stored procedures can even perform additional checks on other tables, whatever is required to ensure that the update == OK.

Beyond that, it also allows for tiered security, in case you have someone in house, maybe a new dev, who decides they'd like to modify that financial table a bit.

And beyond that, it also allows logging to be done at the DB level.

While you are more limited here, in that you really do not want to be burdening the DB with more writes, you can definitely log all modifications to said financial table.

(You could use triggers for this last bit too, of course, but still...)
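For reference, the permission split described above looks roughly like this in Postgres (role, table, and function names are illustrative); SECURITY DEFINER makes the function run with its owner's privileges rather than the caller's:

```sql
-- Web user may read the ledger but never write it directly.
GRANT SELECT ON ledger TO web_app;

-- The function runs with its owner's (more privileged) rights.
CREATE FUNCTION post_ledger_entry(p_account int, p_amount_cents bigint)
RETURNS void
SECURITY DEFINER
LANGUAGE sql
AS $$
  INSERT INTO ledger (account_id, amount_cents)
  VALUES (p_account, p_amount_cents);
$$;

-- The web user may only write through the function.
GRANT EXECUTE ON FUNCTION post_ledger_entry(int, bigint) TO web_app;
```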


> Not to mention, as computing expands, the average capability of people is dropping. Seats need to be filled, and those seats don't all need to be high quality devs.

Also, a lot of businesses have analysts/data scientists that know SQL and have access to the DBs: stored procedures make business domain-level functionality more available to them, because they don’t have to guess what the application code is actually doing.


> ORMs are there to reduce the workload, in terms of what needs to be learned to be productive

The problem is (which you seem to recognize but most seem to miss) is that the opposite happens: the ORM lets you shoot yourself in the foot quickly, but that's about it. In the end, you lose more time using an ORM than you actually gain.


> I'm a Go developer and I notice a lot of other Go developers instantly suggest things like GORM to noobs writing applications.

And there is nothing wrong with that. It's like recommending against the use of HTTP before people learn TCP/IP, or recommending against PHP/Apache just because people aren't really writing servers.

Learning how to write a web app and learning database management are 2 separate topics. Furthermore, GORM is rather barebones compared to Entity Framework or Hibernate, due to Go's type limitations...


The thing is, RDBMS are set oriented. Manipulating sets of tuples is a very high level kind of thinking compared to manipulating individual objects and their relationships in a procedural manner.

Of course sometimes the task is to write load/edit/store logic for individual records and their relationships. There's nothing wrong with using an ORM for that.

But extending this sort of thinking to all data processing is extremely detrimental to both productivity and performance. If you're using an RDBMS you're well advised to apply a sets first approach to data handling.

That's the danger of ORMs. They encourage people to do the wrong thing.

They also add considerable complexity to a project. So after having written my own ORMs many years ago and using some of the popular ones I would stay away. The cases where they are the right tool for the job are just not worth the friction they create.


Not only teaching people to do the wrong thing, but also teaching that RDBMSs are scary, deep-level things that developers don't need to understand in order to do their job properly. Which isn't true.


There's a big difference between HTTP, Apache, and ORMs. People not understanding why they're different is what sinks companies all the time.

Using any abstraction is a gamble that it doesn't leak before you have the resources to fix it.

HTTP is a terrible comparison because in almost every domain you will either make millions or (far more likely) fail for other reasons before you need to understand how TCP/IP works.

I bet there's loads of companies that have died a slow death, releasing everything a month later than they want to, because the back-end developers aren't comfortable with whatever the stack of the day is and they were too busy learning yesterday's stack of the day instead of the intricacies of the language they're working with, or the standard library.


> It's like recommending against the use of HTTP before people learn TCI/IP...

This is a false dichotomy. SQL is not TCP/IP, because it's nearly on the same level as ORMs. Unless you normalize and understand your entities properly, it'll be difficult to write performant systems. ORMs are rightly called the Vietnam of Computer Science [1] for this reason.

[1] http://blogs.tedneward.com/post/the-vietnam-of-computer-scie...


I agree that ORMs lower the barrier of entry for creating a full stack app.

But it's not just the suggestion of GORM. It's that it is often suggested as the solution and anything else is unviable.

I think writing a web app and knowing how the database works are interlinked. It's incredibly valuable knowledge as a web developer IMO.


This is the approach I like as well, with the caveat that the stored procedures themselves should be kept as dumb and as late in the process as possible.

I have made many mistakes in my career in terms of systems that were either overly complicated or inefficient or both, and it has almost always ended up boiling down to either trusting the SQL that an ORM produces, or putting too much of the business logic in hand crafted stored procedures. I have done both, regretted both, and nowadays advocate for sticking to simple, predictable, and dumb stored procedures so that you can both have the benefit of knowing/controlling exactly what will execute in the database and having things like validation occur code side whenever possible.

In the real world, that simplification of stored procedures isn't always possible. But if it's your starting assumption, then you can fight for it until you are certain that there isn't a practical alternative to pushing the logic into the stored procedure. And if a compromise of these principles needs to happen, I would push complexity into a stored procedure before trusting the SQL produced by an ORM every time.

But where I fall on that spectrum may just be due to the severity and types of problems I have seen happen with each approach, so YMMV. And I wouldn't go so far as to call myself an expert, so also YMMV on that front. I am just a guy who has made a lot of mistakes. But if it helps others, this is my takeaway.


Came here to say exactly this. Also a gopher, who is also vaguely horrified at all the "use GORM, and a framework" advice to newbie gophers.

I also use stored procedures, and treat them as function calls from my application code. So I effectively build a "data storage API" in the database, that I then call (as you say, like any other service) from my code.

I also have views for all the data sets that will be returned to the application, so I can use them easily as a type for function returns, and enforce consistency.


I'm trying Gorm right now, and I'm running into some serious issues where it's not as straightforward as an ORM should be (mostly having to do with transparently saving / updating relationships).

I'm reluctant to go to straight SQL because then I have to:

- Write a ton of 'mapping' code between resultsets and data model

- Do a lot of diffing to determine which relations to add, update or remove

- Manage my own database migrations - challenging because I'm stuck with using sqlite

- etc

I want my tools to make things easier for me, but gorm seems to be the "now you have two problems" direction - for now. Maybe I just don't get it yet. Maybe I just don't get databases anymore either.


Depending on which DB you're using, I'd suggest trying out https://sqlc.dev instead of gorm if on PG. Mysql support is a bit flaky last time I checked.


> You could even go as far as to unit test just your SQL code if you wanted, something you can't do too easily when it's in your application.

How do you unit test SQL code? You could integration test against something like sqlite but unit testing stored procs sounds very difficult.

I worked with a big C# .net core ecommerce platform that had a big stored proc used for ingest. It was an absolute pain in the ass to debug and fix issues because testing it was so difficult. Eventually we replaced it with C# code. Access to full software dev tools (debugger, automated tests) is really something you don't want to be without on large projects...


> How do you unit test SQL code?

e.g. https://pgtap.org/documentation.html

There are similar solutions for Oracle's PL/SQL and SQL Server


There is a middle-ground... helpers.

Things like search interfaces might offer filters over multiple columns... your choices here:

1. Use an ORM

2. Write SQL for each permutation of filter

3. Write a helper that will compose the SQL based on the filters needed

And that 3rd one is the only time I stray from plain SQL towards something that looks like a lightweight abstraction over the SQL.

Similarly... stored procs: I avoid these, but there are times when they are necessary. If you need the boundary of a transaction to span multiple SQL calls, then it is safer to do so within a stored proc. Yes, you can use a transaction and rollback, but from a maintenance perspective I've seen refactors break transactions that were not fully encapsulated in a single Go (insert other language) function.

Yes keep things simple with SQL and do everything in the application... but also break those rules rather than do something daft or with side effects.
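A sketch of option 3: a small helper that composes the WHERE clause from whichever filters are set (table and column names are illustrative), while still using bound placeholders rather than splicing values in:

```go
package main

import (
	"fmt"
	"strings"
)

// buildSearch composes a query from whichever filters are provided,
// using numbered placeholders so values stay bound parameters.
func buildSearch(name string, minAge int) (string, []interface{}) {
	var conds []string
	var args []interface{}
	if name != "" {
		args = append(args, name)
		conds = append(conds, fmt.Sprintf("name = $%d", len(args)))
	}
	if minAge > 0 {
		args = append(args, minAge)
		conds = append(conds, fmt.Sprintf("age >= $%d", len(args)))
	}
	q := "SELECT id, name, age FROM users"
	if len(conds) > 0 {
		q += " WHERE " + strings.Join(conds, " AND ")
	}
	return q, args
}

func main() {
	q, args := buildSearch("alice", 30)
	fmt.Println(q, args)
}
```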


Or use something lightweight and configurable like https://github.com/a8m/rql to just handle filters.


Seeing the word "helpers" in a codebase gives me PTSD flashbacks.


Your view is the original design idea behind stored procedures.

Stored procedures as far as I know were first implemented in Sybase SQL Server. We tended to think of them as enabling RPC (remote procedure call) access to the DBMS. Sybase was also the first major relational database to offer client access through the network. Our connectivity was influenced by distributed system concepts like RPCs that were top-of-mind in the 1980s.


I've recently come up with the following approach: make it easy to generate CRUD applications without ORMs, and then modify them any way you like without extra dependencies.

Made a tool to achieve just that: https://github.com/sashabaranov/pike/


I salute your level of abstraction but I personally draw my line in the sand a bit higher. A library project for the data access layer with a stable app facing code API gives me the best bang for my buck in terms of flexibility and abstraction.

That said, it's easy for me to push out new code versions. I realize it doesn't work for everyone.


I agree, and for any serious database used in production stored procedures have so many advantages, improved security among them.


>> I'm a Go developer and I notice a lot of other Go developers instantly suggest things like GORM to noobs writing applications

I've made the opposite observation. Compared to Python, C#, Java and NodeJS, ORMs are actually not a big thing in Go. Which is good - I agree with you.


So much this. ORMs? That way lies madness. For example if you use Hibernate in Java you can end up writing stuff like this:

  interface MyRepository {
    Page<SomeObject> findEntitiesByIdAndCreatedAtAfterOrderByCreatedAtDesc(int id, Instant after, Pageable page);
  }
This abstracts away executing a query along the lines of:

  select * from entities where id = $1 and created_at >= $2 order by created_at desc

I'll take the latter any day.


Pageable also adds paging clauses (limit, offset). If you are going to compare, please be courteous and do it right.

When the app starts up, these declarations are parsed, and the app will abort if a declared method does not actually match the entities. This is a huge safety net, which you would not have when you write SQL. And when you write SQL you also have to take care to unpack the "select *" into your fields, etc. I suspect this is a much more important downside than you might realize.


How do you get the results of your query into SomeObject then?

I think it depends on the use case, as always. I use Hibernate for our platform, which is used by at most 10-20 people at once. There the ORM is a huge help, because I can easily view and edit my model and all its relations; I accept the performance hit in this case, and if something is too slow, I rewrite it in SQL.

Then we have an API that uses the platform's data, and there I drop down to plain handcrafted SQL with a little help from JDBI, and so far it has been working great and responses are fast.

So I think it is just about stepping back a bit to think and not to blindly accept either side of the argument. Each use case is different.


That's not Hibernate, that's Spring Data or some such, which is indeed cursed. This bit of Spring was absolutely designed by and for people with no critical thinking ability.

Meanwhile, JPA, the standard which Hibernate implements, has a textual query language [1] which is basically SQL but a little more objecty. For example, you can express joins through properties concisely:

  SELECT c.capital.name FROM Country AS c WHERE c.name = :name
And use constructor expressions [2] to map results into Java objects:

  SELECT NEW example.CountryAndCapital(c.name, c.capital.name)
  FROM Country AS c

[1] https://www.objectdb.com/java/jpa/query/jpql/structure

[2] https://www.objectdb.com/java/jpa/query/jpql/select#Result_C...


Thanks for the corrections, suggestions, and comments everyone!

@twic Thanks for the correction. Didn't know this wasn't a Hibernate but instead a Spring Data feature :-)

@watt You're right, I could've added an offset/limit. Wasn't the main point though (readability was).

@vitro Depends on the approach, but it won't be done /for you/, the mapping is a manual step (grab field X, etc.).

@piokoch, nice suggestion. Of course in the real world we work in teams and some people will prefer this ;-)


And nothing is going to stop you from doing this: you can pack your query (native or JPA-QL or HQL) into a @Query annotation and be done. The good thing is that it will do the mapping for you.

The advantage of using findEntitiesByIdAndCreatedAtAfterOrderByCreatedAtDesc is that you get paging support for free (which is not trivial and has to be done properly for each database engine; the simple approach of using LIMIT is typically very inefficient).

Also, most of the queries will be something simple to fetch some entity using some business identifier or primary key, not being forced to write down all those simple queries is a productivity boost.

For more complicated queries you can fallback to writing query manually.

There is also maturing Spring Data JDBC project that kind of has "best of both worlds".


While the abstraction might be a bit longer, I've found them relatively easy to write using IDE support (e.g. IntelliJ).

Pageable adds of course a bit of functionalities, as does the convenience of passing an Instant instead of a String :)


That is Java. In .NET using linq with entity framework looks almost like SQL.


Debatable. Method syntax vs query syntax is available. I mostly see method syntax in my codebase. I also see terribly performing linq queries all over the place.

I think it's neither good nor bad, it's just a tool. Tools are useful for making things, but if wielded improperly they will hurt you.

I'm sure same arguments can be made against raw SQL. It's possible to have good codebases in any paradigm, and bad codebases in any paradigm. It's really all down to the quality of implementation.

As a side note, if you want a creative way to back up a database, just try logging an EntityFramework-tracked object in Serilog with the destructuring operator. Watch as the logging framework inadvertently causes an entire object graph, traversing navigation properties endlessly, to make infinite SQL queries while your web server slowly runs out of RAM. Fun times!

Powerful tools are usually sharp.


You can provide the query, hibernate/JPA does not stop you from doing that. Autogenerated queries for simple CRUD ops, but queries supplied by developer for more complex logic is ideal.


how do you unit test your stored procedures on a CI agent?

serious question - this is nigh impossible in every db i've ever used.


Run the DB in docker, import the stored procedures as part of the setup migrations and run tests against that from your normal test suite.



