The elephant in the room here is the bugs that crop up in the interfaces between units. Depending on your problem domain, it may be easier or harder to create interfaces between all these TDD'd, isolated modules. There's certainly value in a spec as isolated component documentation, but I'd argue there's more value in specs for regressions, particularly when refactoring. This is especially the case in dynamic languages like Ruby which require tests to avoid the most basic runtime errors.
The answer to this conundrum is often: that's the job of the integration tests. Okay, but integration tests are slow, so you'll never test all the permutations of the interactions of the components.
DHH said during the keynote, "it's easy to make your tests fast when they don't test anything". Of course that's hyperbole, but there's a kernel of truth to be investigated there. When working with something as complex as an ORM like ActiveRecord, isolating the business logic and using the ORM strictly for persistence may allow for fast tests, but you still run the risk of bugs creeping in on the interface because of some assumption about or change in the way ActiveRecord works between versions or whatever.
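To make that interface risk concrete, here's a rough, invented sketch (hypothetical PriceReport class, plain Minitest): the isolated test is fast and green, but it quietly bakes in an assumption about what the real query returns.

    # Hypothetical example: PriceReport is pure logic, isolated from ActiveRecord.
    require "minitest/autorun"

    class PriceReport
      def initialize(products)   # expects something enumerable of objects with #price
        @products = products
      end

      def total
        @products.sum(&:price)
      end
    end

    class PriceReportTest < Minitest::Test
      Product = Struct.new(:price)

      def test_total
        # The stand-in "query result" is a plain Array. This test passes in
        # milliseconds, but it encodes an assumption: that whatever the real
        # query returns (an ActiveRecord::Relation, say) behaves like this
        # Array. If that assumption breaks between versions, only a slower
        # integration test will notice.
        report = PriceReport.new([Product.new(5), Product.new(7)])
        assert_equal 12, report.total
      end
    end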
That's why, as ugly and slow as Rails unit tests are, they are a simplifying compromise that strikes a balance between the theoretical ideal of a unit test and an integration test. ActiveRecord itself is this way too, in that oftentimes the business logic just isn't complex enough to warrant the complexity of separating persistence from domain logic. As much as DHH may be talking out his ass without really having ever grokked TDD, I don't think his complaints are completely without merit.
I have to agree, and add an observation that every project I've seen which emphasised highly isolated unit tests spent the majority of its test development effort on local behaviour that was easy to test, and very little effort on testing complex interactions and emergent behaviour.
Furthermore, when a complex interaction with changing behaviour in third party code caused a bug, I have always observed the resident advocates of highly isolated unit tests throw up their hands and say "oh well that's not our problem".
I'll offer up my own rule that I've worked out over the years: the first test you write, from day one, should be the end-to-end performance stress test that fully loads a production deployment of the project, making it do everything all at once and sending random junk input until it falls over. Even when your project doesn't really have any functionality yet, run that one continually in the background. This one test will find so many classes of bugs that you weren't expecting - it optimises your tests for learning surprising things as soon as possible. You then start fleshing out your test suite to cover the things you learn. With some smart design, there will be a lot of shared code between your ever-growing stress test and the suite of more targeted tests.
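To sketch what I mean (everything here is a placeholder: the target URL, the paths, the payloads), the first version can be as dumb as a loop that throws junk at a deployed instance and logs anything surprising:

    # Rough sketch of a background "junk cannon" aimed at a deployed instance.
    require "net/http"
    require "securerandom"
    require "uri"

    BASE_URL = ENV.fetch("STRESS_TARGET", "http://localhost:3000")
    PATHS = ["/", "/search", "/api/items"]   # placeholder paths

    loop do
      uri = URI.join(BASE_URL, PATHS.sample)
      uri.query = "q=#{SecureRandom.hex(rand(1..64))}" if rand < 0.5

      begin
        response = Net::HTTP.post(
          uri,
          SecureRandom.random_bytes(rand(0..4096)),
          "Content-Type" => ["application/json", "text/plain", "junk/junk"].sample
        )
        # Any 5xx is a surprise worth logging and turning into a targeted test.
        warn "#{uri} -> #{response.code}" if response.code.start_with?("5")
      rescue StandardError => e
        warn "#{uri} -> #{e.class}: #{e.message}"
      end
    end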
You might find that you get down to detailed unit tests of every part of the code... but you'll probably find that you only need to bother with unit testing obscure conditions for the complicated parts.
The core philosophical difference here is that I regard tests as a tool for learning and making the code better, not as a way to make myself feel happier. The biggest performance barrier I experience as a developer is not the seconds it takes to cycle through the tests, it's the weeks it takes to learn what I'm trying to build. I aim my tests at that target.
DHH's complaints certainly aren't without merit, but they are naively formed.
I did a conference talk called "Boundaries" on this topic, and how a particular type of disciplined functional style can mitigate it to a large extent (https://www.destroyallsoftware.com/talks/boundaries). I think that we can probably have (most of) our cake and eat it too, just as we have so many times before: microprocessors reliably perform computations; TCP reliably delivers over unreliable networks; etc. But we're not going to get there by throwing up our hands.
When you say naively formed, what does that mean exactly? A lot of your post was dedicated to correcting DHH's TDD history. I appreciate the hard work and attention to detail but I don't actually think when the idea of ultra-fast tests originated has much to do with DHH's complaints.
You also spend time in your post talking about how much you value the tight feedback loop between your tests and your code and how your tests being ultra fast helps that. That's great! That isn't really a response to DHH though, that's explaining why your method works for you. Which is a valuable contribution to the discussion, but not really a critique. You also bring up DHH's lack of theoretical computer science knowledge. I don't have the slightest idea what that has to do with the value of TDD, can you explain that?
I guess what I'm trying to get at here is that you didn't really propose a counter argument to DHH here, you said his historical knowledge is wrong, he doesn't know computer science and you get a lot of value from fast tests. The core of DHH's argument is that the metrics that are being touted to measure test suites are not metrics that actually help us write good code or code that is reliably well tested. It would be great to hear your direct responses to those arguments.
The feedback loop is my response. He positioned isolated unit testing as having drawbacks, but he never mentions (and doesn't seem to have experienced) the value of it.
Others have written responses to his claims about design. I think that he's less off-base with those; the design benefits of TDD are oversold to some extent, although they certainly do exist in my experience. It's difficult to oversell the speed benefits of isolated unit testing, though. You really have to experience it. Here's Noel Rappin commenting on my post saying exactly that, in fact: https://twitter.com/noelrap/status/461633622185746432
> The answer to this conundrum is often: that's the job of the integration tests. Okay, but integration tests are slow, so you'll never test all the permutations of the interactions of the components.
Integration tests aren't necessarily slow in any dramatic way (though they'll naturally be slower than unit tests), but in any nontrivial system you still won't test all the permutations of interactions between components because the number of such permutations will be prohibitively large even if integration tests were as fast as unit tests.
My preference is to aim for path complete integration tests, and thorough unit tests.
> When working with something as complex as an ORM like ActiveRecord, isolating the business logic and using the ORM strictly for persistence may allow for fast tests, but you still run the risk of bugs creeping in on the interface because of some assumption about or change in the way ActiveRecord works between versions or whatever.
If you isolate the domain and persistence layers rather than combining them the way Rails seems oriented toward, the persistence layer still ought to be a testable unit -- and as its job is persistence, you wouldn't isolate the database from it to test it. OTOH, it's a unit that should be more stable than the model layer in most cases, and you won't pay the higher costs for running its unit tests when you are making changes to the model layer.
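Roughly the split I have in mind, with invented class names: a thin persistence unit whose tests talk to the database, and a domain unit whose tests don't.

    require "minitest/autorun"

    # Persistence layer: its job is the database, so its tests use the database.
    # They're comparatively slow, but only need to run when this code changes.
    class PersonRepository
      def initialize(record_class)   # e.g. an ActiveRecord model, injected
        @record_class = record_class
      end

      def names_in(city)
        @record_class.where(city: city).pluck(:name)
      end
    end

    # Domain layer: pure logic, no database, millisecond tests.
    class NameFormatter
      def self.greeting(names)
        "Hello, #{names.sort.join(", ")}!"
      end
    end

    class NameFormatterTest < Minitest::Test
      def test_sorts_names_into_a_greeting
        assert_equal "Hello, Ann, Bob!", NameFormatter.greeting(["Bob", "Ann"])
      end
    end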
> That's why, as ugly and slow as Rails unit tests are, they are a simplifying compromise that strikes a balance between the theoretical ideal of a unit test and an integration test.
Why compromise between unit tests and integration tests when the two aren't exclusive and serve different purposes? Why not actually have good unit tests and good integration tests, instead of beasts that are neither fish nor fowl and are less than optimal as either?
It seems that the argument here is based on the false dichotomy that suggests if I do unit testing, I can't have integration tests, so I either need to just do integration tests or, for some reason, trade both for some sort-of-unit-ish tests that test the persistence and domain layers as if they were the same unit.
> Why compromise between unit tests and integration tests when the two aren't exclusive and serve different purposes? Why not actually have good unit tests and good integration tests, instead of beasts that are neither fish nor fowl and are less than optimal as either?
I wonder this myself. Unit tests serve a completely different purpose and should not get in the way of integration testing. They may/should help with structuring the code so that it is easier to integration test. They definitely shouldn't hinder. Then there is also the system level test, and then manual testing as well. All of these serve different purposes and can live happily together.
You've talked around my core point a lot, but you haven't addressed it directly at all. What do you do about bugs on the interfaces and interaction of the heavily modularized and unit-tested components?
> You've talked around my core point a lot, but you haven't addressed it directly at all.
I thought I addressed it quite directly.
> What do you do about bugs on the interfaces and interaction of the heavily modularized and unit-tested components?
Ideally, catch them with the unit tests -- whose entire purpose is testing interfaces -- but, where that fails, with traditional integration tests (which, when they find bugs missed by unit tests, prompt additional unit tests to isolate the offending unit and assure that they are properly squashed). That beats abandoning unit and integration testing for something that isn't quite either, which lacks the specificity and test-on-each-save utility of unit tests while also giving up the full end-to-end cycle testing of integration tests.
Your unit test guarantees the intended interface of the unit, but it doesn't guarantee the usage of said interface. Typically other unit tests will stub out this dependency. But what is your guarantee that the stub agrees with the spec'ed behavior?
Now this may range from a non-issue if the interface is obvious and straightforward and leaves little room for error, to extremely error-prone in a duck-typed language. Of course there are ways to mitigate this with the test and stubbing infrastructure, but it can be brittle as well.
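One such mitigation, sketched here with RSpec shared examples and invented class names, is a contract that both the real implementation and its fake have to pass, so a stand-in can't silently disagree with the spec'ed behavior:

    # A shared "contract": the same examples run against the real adapter and
    # the test fake, keeping the fake honest.
    RSpec.shared_examples "a mail gateway" do
      it "returns a message id string from deliver" do
        expect(gateway.deliver(to: "x@example.com", body: "hi")).to be_a(String)
      end
    end

    # Minimal working fake used by the fast unit tests elsewhere.
    class FakeMailGateway
      def deliver(to:, body:)
        "fake-#{rand(1000)}"
      end
    end

    RSpec.describe FakeMailGateway do
      let(:gateway) { FakeMailGateway.new }
      it_behaves_like "a mail gateway"   # fast, runs constantly
    end

    # The real adapter's spec includes the same contract but talks to a sandbox
    # service, so it runs less often (e.g. only in CI):
    #
    #   RSpec.describe SmtpMailGateway do
    #     let(:gateway) { SmtpMailGateway.new(sandbox_credentials) }
    #     it_behaves_like "a mail gateway"
    #   end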
This problem is why I think there is some sense of coarser-grained units with fewer stubs and some qualities of integration without going all the way up to the full-path acceptance level.
Depending on your problem domain, it may be easier or harder to create interfaces between all these TDD'd, isolated modules.
Aren't you assuming that TDD requires pervasive unit isolation?
I practice TDD often and prefer not to isolate units at least at the level of filesystem artifact.
I believe that one of the reasons test-first became TDD and gave up on the distinction between unit tests and acceptance tests is because the Smalltalk SUnit style of testing units as units didn't work so well in languages which aren't Smalltalk.
I've been hoping that you would write an intelligent response to DHH's latest articles and talk ever since I saw them, thank you for doing it.
I suspect that his claims of TDD isolation damaging the integrity of designs are mostly red-herrings too. I know they are for me personally -- in fact, the opposite is true. I'd love to see you address this issue as well. In particular, your thoughts as they relate to the technique of moving logic completely out of controllers (as you do in raptor). This technique seems to infuriate DHH in the context of rails, while to me it seems a far superior design.
I suspect that his claims of TDD isolation damaging the integrity of designs are mostly red-herrings too. I know they are for me personally -- in fact, the opposite is true.
I'd be interested to see more examples of that. In my experience, this sort of warning genuinely isn't a red-herring - I've worked on several projects that have been seriously creaking under the weight of test suites, and in which the desire to isolate units in particular has led to both poor code architecture, and ironically poor tests.
In particular, your thoughts as they relate to the technique of moving logic completely out of controllers (as you do in raptor). This technique seems to infuriate DHH in the context of rails, while to me it seems a far superior design.
Raptor's design is interesting, but I dispute that the entire "C" in "MVC" is worthless. I agree that controllers are often misused, but something like Raptor essentially pushes the logic that it's OK to have in a Rails controller into the router, and I'm not sure that would actually be scalable in practice.
In fairness, I believe there exist code bases with a fragmented, poor design and lots of unit tests. I would place the blame with the programmer's design skills, though, and not with scrupulously testable, loosely coupled modules.
I do not believe that TDD is a replacement for good design skills at all, and it's easy to TDD yourself into a poor design. Even strong TDD proponents like Uncle Bob admit as much [1]. However, I also believe that most good designs do happen to have the property of being easily unit testable, and if your code does not have that property, you should take pause.
I've experienced (and created) exactly those weighty, oppressive test suites. I think that they're probably more a symptom of us collectively learning to test than anything else. Even today, there are very few people who are experts at test isolation; five years ago there were almost none.
It's tempting to give up in situations like that, but I like to think back to "GOTO Considered Harmful". It was contentious at the time! There were people who literally thought that you couldn't build complex software systems without GOTO. It's a reminder that we always underestimate how good we can get at something, and how much freedom we'll have after adopting a constraint.
All of these arguments seem like they are overly prescriptive. Individuals should use what works best for their needs. TDD may work wonders for me today on one project, and then it could be horrible tomorrow on the next project. My co-worker might have the inverse.
Stop arguing over how other people create software; Ship Code instead.
I believe that it's a useful conversation to have -- if we are all just trying to ship code, when are we going to have the discussion of what makes code readable? What makes code reliable? What makes code quicker to deliver?
Further, as an automated test advocate (whether or not it's TDD or unit or mocks or stubs or whatever flavor you like) I want to walk into projects where I don't have to deal with changing code without having a good and fast test base to minimize the potential defects. If TDD gets us to that point, I'm all for it.
I don't care how you create software unless I have to use it or inherit it.
> Stop arguing over how other people create software; Ship Code instead.
Electrical engineering, mechanical engineering, architectural design and the medical profession, to name a few, have bodies of knowledge they are required to use.
Is it really a good idea for software developers to say "stop arguing over how other people create software; Ship Code instead" considering we don't have any industry standard bodies of knowledge?
Considering we don't have any industry standard bodies of knowledge, yes it probably is a good idea, because we don't have anyone who can lay out a definitive answer on many of our preference based arguments.
i.e. these arguments will never end if there is no definitive answer, or body to pick one.
Industry standard bodies of knowledge arise from people doing, then talking about what they did, then doing some more, then talking some more. At no point did a God of Electrical Engineering hand down tablets.
I think SWEBOK is interesting both as a demonstration of how far we have to go before we get anywhere near the standards of established engineering fields, and as a demonstration of how much we have already collectively figured out, even if most of us aren't familiar with more than a small fraction of what is out there.
> Individuals should use what works best for their needs
Maybe rather: teams should use what works best for their needs ?
EDIT: whee, downvotes. Apparently HN'ers work on teams that ship software where one developer uses kanban, another TDD, another waterfall, another scrum, and another only commits code when the 8 planets are aligned.
I'm going to take a stab at the down votes, as I could easily understand why someone might. First, consider your original comment was pedantic and offered little in the way of original insight. It would be like me replying to you and saying:
"Maybe rather: teams should use hwat works best for their needs per project?"
What does that really add?
Basically, it added little to the conversation. Heck, your edit complaining about the down votes is longer than your comment.
You said it better than me. No reason to get religious over it. When I'm the architect on a project, I do it in some cases, and other times I don't. I'm not over concerned what everyone else is doing in this regard.
> That file is going to be a plain old Ruby class: not a Rails model, controller, etc. Isolating those would lead to pain, as anyone who's tried to do it knows. I keep models very simple and allow them to integrate with the database; I keep controllers very simple and generally don't unit test them at all, although I do integration test them.
This is what I was thinking of when I read DHH's post. Mock objects and indirection and all that certainly add complexity, but if your "business logic" is all in pure functions, or at least self-contained source files that don't do IO (and don't use "frameworks"), you don't need mocks to test it.
Better to keep your glue code simple and do integration tests, while unit testing your actual logic, than to shoehorn extra indirection for mocking into complicated glue code.
I gather that DHH complained because he had been doing the latter, and found that it sucked.
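Concretely, the former approach looks something like this invented sketch: the logic is a plain object you can unit test with no doubles at all, and the glue stays thin enough to leave to integration tests.

    require "minitest/autorun"

    # Pure "business logic": no IO, no framework, no mocks needed to test it.
    class BulkDiscount
      RATE = 0.1

      def self.apply(subtotal, item_count)
        item_count >= 10 ? subtotal * (1 - RATE) : subtotal
      end
    end

    # The glue (e.g. a controller action) stays dumb and is covered by an
    # integration test instead:
    #
    #   total = BulkDiscount.apply(cart.subtotal, cart.item_count)

    class BulkDiscountTest < Minitest::Test
      def test_discount_kicks_in_at_ten_items
        assert_in_delta 90.0, BulkDiscount.apply(100.0, 10)
        assert_in_delta 100.0, BulkDiscount.apply(100.0, 9)
      end
    end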
Is it just me, or are there other people who get a stomach ache reading this sort of stuff? Why not just learn algorithms and other computer science, the "direct" approach of analysing the problem and solving it in the most efficient way? 100 lines of tests for 50 lines of code for a _catalog_? Really? I'm sorry, just reading this brings bad feelings, like those which traders describe when they feel something's wrong with the market...
Why not just learn algorithms and other computer science, the "direct" approach of analysing the problem and solving it in the most efficient way?
This doesn't really say anything; your statement amounts to "Why not do it properly, instead of using tests?" - it should be obvious why that's invalid.
In particular, unit testing protects against "minor changes with unexpected consequences." This happens all of the time. "analysing problem and solving it" will not make a bit of difference. Do you expect that people who use unit tests are instead "ignoring the problem and not solving it?"
A test ratio of 2:1 isn't a big deal. It serves as a record: "Here's a specification of what my code does. You can automatically verify that it does what it says." Think of it as part spec, part test.
> Why not just learn algorithms and other computer science, the "direct" approach of analysing the problem and solving it in the most efficient way?
---
The problem often isn't clearly defined. To a certain extent programming is an exploration of the problem until you're in a position to go 'And so...' Add to that that your knowledge of the context, and of the lower-level abstraction layer your algorithm corresponds to, is imperfect. (Imagine doing maths where a malicious demon sometimes changes the contents of variables on you according to a set of rules that you don't know.)
As such, any algorithm that someone wrote while trying to express the problem and provide a solution may or may not do what they expect it to when given in any particular language on any particular machine.
There are a couple of ways to try to address some of the difficulties with that. One is to write code at such a low level that you can be sure of the blocks you're using. Another is to try to climb the abstraction layers using a language with certain safeties built in to try to limit the context you have to be aware of.
But there are drawbacks with both of those methods, and testing is a way to deal with the inevitable oversights involved. (It's also good in getting people to more tightly define the problem in the first place for you. Acceptance tests are a wonderful thing sometimes.)
> "Second, that sentence is false. Isolation from the database, or anything else, is generally done with mocks, but mocks didn't even exist when TDD was rediscovered by Kent Beck in 1994-1995."
See, that's not how I remember it. Stubs implementing the Interfaces you wrote (because Composition over Inheritance) was the common solution. AR's idea of a Unit Test has always been wrong to my mind. They aren't Unit tests. They're Integration tests.
This only really became an issue with the rise of Rails and the fact that you couldn't really write Unit tests for an ActiveRecord.
Considering Gary provides citations for his timeline, I'm inclined to believe his account over your memory.
That said, I think you're confusing _unit_ tests and _TDD_. Unit tests imply that you test a single unit in isolation; TDD implies that you write your tests before you write your implementation. You can TDD unit tests, but you don't have to, and you can do all-acceptance TDD if you want.
Exactly. I think most people probably come to TDD through tutorials which test something like the addition of a+b and demonstrate "ah, there's a logical error! how incredible that TDD helps us identify errors early on!". I do some TDD from time to time, but outside of textbook algorithms, writing pure unit tests that involve mocks is really tough and time-consuming (unless someone has built a test driver for you to use throughout the whole development).
In any case, if you ever do curl http://localhost:port/mynewapp/page1, that's already testing, and functional testing at that. If you write that down before you start writing a piece of code, you are doing TDD, testing whether your app will return a 200 or not.
There's a big swath of time between the rise of the xUnits and the modern Mock fancy he doesn't cover (or provide citations for).
Considering the early TDD mantra was red-green-refactor, saying you could do it with all Acceptance tests is a bit of a stretch. I mean tomato, tomahto, I guess, but that's certainly not a definition I've ever come across (and it would certainly run afoul of Uncle Bob's 3 rules of TDD).
Try looking for how many times "design" is mentioned on the page, and in what context. That's how TDD was sold to me. Is your component difficult to test in isolation? Then the design is flawed. One of the big advantages of TDD early on (and it seems to hold true today) is that it helps you write testable code (in Units).
BTW, I didn't suggest XP and TDD were the same. XP2000 was a conference. You can believe me or not I guess, but back in the day all you had was your xUnit. Selenium was a twinkle in someone's eye and local databases were not the norm. It was definitely all about the Units (and I think if you read the material of the time the emphasis will be pretty obvious).
TDD and XP are not the same thing, either. I don't have my copy of "Test-Driven Development by Example" handy, so I'll just have to point you to http://en.wikipedia.org/wiki/Test-driven_development , which doesn't use unit tests in its definition.
Now, it's not that there's no connection at all: unit tests are _incredibly_ helpful when doing TDD, because you want your tests to run quickly. But it's "Test-first," not "Unit-test first." When you're doing TDD, you don't write acceptance tests second, you write both your acceptance and unit tests first.
I think there is a good point here, in that a very tight TDD loop requires fast tests. Nothing's untrue about that, and if you've worked with a set of speedy tests I'm sure you know it can be quite efficient, and indeed it can be quite a different way of working.
Ultimately I don't really think that this is particularly in conflict with what DHH was talking about. A tight TDD loop might work for some developers, but if you get to the stage where you are making substantial architectural changes to enable this, then it looks like a problem. And I've seen a lot of this first hand.
FWIW, I take your point that things like Spring aren't ideal solutions. But with a bit of care they can help offer the best of both worlds; I've got Guard and Spring working together on the Rails project I currently have open. Looking at an equivalent model (100 LOC, 200 test LOC), it runs without database isolation in less than 400ms, which is less time than it takes me to switch to my terminal. It's really great, and I don't have to worry about isolation.
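For the curious, the wiring is roughly this (assuming the guard-rspec and spring-commands-rspec gems, with a Spring-backed bin/rspec binstub):

    # Guardfile (sketch): rerun the matching spec through Spring's preloaded app
    # whenever a model or spec file changes.
    guard :rspec, cmd: "bin/rspec" do
      watch(%r{^app/models/(.+)\.rb$}) { |m| "spec/models/#{m[1]}_spec.rb" }
      watch(%r{^spec/.+_spec\.rb$})
    end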
TLDR; there are compromise solutions possible. Maybe not suited for everyone, but in some cases better than going too far with "design for testability."
The critical flaw in this post is that Gary is not refuting what DHH said.
Gary makes this claim:
> You finally get to see what's really going on. David's tests run in a few minutes, and he's fine with that.
> I'm not fine with that. A lot of other people are not fine with that.
But what DHH actually said is this:
> You might think, well, that's pretty fast for a whole suite, but I still wouldn't want to wait 80 seconds every time I make a single change to my model, and want to test that. Of course not! Why on earth would you run your entire test harness for every single line change in a particular model? If you have so little confidence in the locality of your changes, the tests are indeed telling you that the system has overly high coupling.
and this:
> These days I can run the entire test suite for our Person model — 52 cases, 111 assertions — in just under 4 seconds from start to finish. Plenty fast enough for a great feedback cycle!
Using a workflow like Gary's, there's an argument to be made that 4 seconds is not acceptable, and this is why we want single files that can run in a few milliseconds.
However, that's not the only possible way of running tests, and the difference between 4 seconds and 300ms for the feedback you're actually interested in is massively different from the difference between 300ms and "a few minutes".
For a post that calls DHH out on a strawman, this is in itself a great example of one.
Yes, I focus on my per-file runtime in the post, and I mention David's suite runtime in one sentence at the beginning. They are not meant to be compared. David's file runtime is four seconds. This is unacceptable to me. This is unacceptable to other people who replied to your tweets. This would double the length of my high-speed TDD loop, which would make those portions of my TDD process take twice as long.
Yes, it would've been clearer for me to specifically address both suite runtimes and both unit runtimes. You know what else would've been clearer? All of the 2,000 words or so that I deleted from that post while I was editing it down into its final form. This is just how writing works. I don't think that it's misleading as written.
Of course, I've already told you, on Twitter, exactly my reasons for rejecting both four-minute suites and four-second test files. They're not in the post, but you know the reasons. You know that I wasn't selectively attacking a subset of his argument, because you know that I do have an answer for test file runtime. And yet, for some reason, here we are!
(For anyone reading this later, the tweets in question are gone. Lately I've been deleting all replies, as well as trivial non-replies, for Reasons.)
On the one hand this article does at least provide citations for some of the timelines involved. On the other hand it makes me throw my hands up when I see some of the examples provided. 100 lines of code to test 50 lines of code which implement a catalog is, it seems, pretty par for the course in TDD advocacy. This could honestly be me venting my current frustrations, but I get tired of this type of advocacy, also often seen in the functional world, of proving how great something is by using the most trivial possible example, in a domain squarely in your technique's wheelhouse. 100 lines of code runs in 0.24s? What is this, I don't even. Your functional algorithm is incredibly elegant at implementing the fibonacci sequence? Awesome.
But are these things honestly the bulk of what people do? Is testing that you can put objects in a catalog, access them, and remove them most of what people do and need assurances on? Am I the only person who has spent most of their career working on software with codebases measured in the millions of lines, with significant user interfaces as well as significant technical domain knowledge embedded in them? I know that TDD says if our objects have many collaborators we are "doing it wrong" but honestly how far do you break down something without it blowing up into a million classes and a ton of code? How do you get around the fact that a button press can set off a numerical simulation (actually these are super testable and I support that), several trips to the database, multiple changes to the user interface, all in the face of dozens of constraints at all levels of the program to ultimately end up with the graphical representation of chemical pressure that the user wants? Especially when every moving piece in that calculation is supposed to be tested apparently in independent units?
I would estimate that the bulk of code that I have experienced in my career relied on multiple other objects to be effective. Do I just mock these out and pay the price of keeping those mocks in sync with the classes they are imitating? Do I then have my test be tightly coupled to the internals of the function being tested, making sure this mock is called in this way, this many times? Do I rewrite things such that every function takes its collaborators as arguments and returns the results I am looking for? What happens when each of those collaborators is itself a giant series of other stateful objects or at least provides access to a highly complex piece of state which is a pain in the ass to setup?
I ask these questions seriously, because on the one hand TDD advocates keep telling me I am not professional if I don't follow their practices, but on the other the details of how to do so in the face of real-world constraints (not 50 line data-containers) are always somehow left out of the discussion. I want to believe, I do, it just gets hard to sometimes.
I didn't say "I have a 50 line example, therefore it works in all cases". Destroy All Software is 2,208 lines of production Ruby code, and most of it is tested in exactly that way. I could've showed you charge_purchase_spec, download_policy_spec, etc. The point is that most of the application is decomposed into these little 50 line pieces that can be thought of and tested all by themselves. You look at this and see a trivial example, but the whole idea is that large applications can be built out of lots of trivial little examples and it really, actually works.
You're also misinterpreting what the catalog is. You see one word, "catalog", and decide that all it does is "put objects in a catalog, access them, and remove them". No.
The DAS Catalog class enforces a few integrity guarantees, like "there will be no gaps in serial numbers" and "titles won't be duplicated". Then it provides several querying mechanisms, like finding seasons by slug, finding episodes by slug, or finding the season for a given episode. It also has some aggregation behavior like "give me a list of all screencast titles".
None of those Catalog behaviors is very complex, and all of the tests are simple. That's the point! Most software is made up of fairly simple things like this: aggregate some stuff; find some stuff; check a compound property of something; make a few decisions. These things are easily tested in isolation. For the rest, there's integration testing, or even plain old exploratory testing. But that rest is not nearly as large as it seems to be naively.
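To give the flavour, here's a heavily trimmed sketch (not the actual DAS source; names and details simplified), one guarantee and one query:

    require "minitest/autorun"

    class Catalog
      DuplicateTitle = Class.new(StandardError)

      def initialize(screencasts = [])
        titles = screencasts.map(&:title)
        raise DuplicateTitle if titles.uniq.length != titles.length
        @screencasts = screencasts
      end

      def titles
        @screencasts.map(&:title)
      end

      def find_by_slug(slug)
        @screencasts.find { |screencast| screencast.slug == slug }
      end
    end

    class CatalogTest < Minitest::Test
      Screencast = Struct.new(:title, :slug)

      def test_rejects_duplicate_titles
        assert_raises(Catalog::DuplicateTitle) do
          Catalog.new([Screencast.new("Boundaries", "a"), Screencast.new("Boundaries", "b")])
        end
      end

      def test_finds_by_slug
        screencast = Screencast.new("Boundaries", "boundaries")
        assert_equal screencast, Catalog.new([screencast]).find_by_slug("boundaries")
      end
    end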
The last three paragraphs of your comment misunderstand what TDD is and what it provides you. I don't think that you're "not a professional" if you don't do TDD. I find value in it for a lot of code; so do many other people. Like the article says (you read it, right?), I only do TDD about 75% of the time in web apps, and more like 50% outside of web apps. As is also mentioned in the article (you did REALLY read it, right?), the best way to learn TDD's limitations is to do it 100% of the time, which has the unfortunate property of creating some zealots who haven't yet found a balance.
Hi Gary, thank you for the reply. Let me first assure you that I did read your article. I know it's just the word of some guy on the internet but a while back I got so fed up with dumb comments that I promised myself I would never comment on something without having first read it in an honest fashion.
Next, I would say that your comment is still somewhat frustrating to me. My tone may have been overly combative, but I was legitimately asking questions in the hope of getting an answer or at least pointed to one. So I hope you can understand my frustration when you dismiss three paragraphs of my writing as misunderstanding what TDD provides. That may be true, but if you're going to spend time typing those words I feel like you could at least type a few more either taking a stand on what you believe TDD to be or pointing out some resources that I could use to educate myself. As it is you've told me I'm wrong but not how or why.
As for most software being composed of stuff that is simple in the aggregate, I again disagree but this may be my ignorance. Again, most of my experience in software has come on large systems where each object has several collaborators, each of which may also have several collaborators, recursively continuing out. The two options I see for TDD are either to spend considerable effort setting up these collaborators, or to mock them out. If I mock them out, I am coupling myself to the internals of how the method is implemented, because I have to give explicit implementations of each method called in the course of the method doing its job. If the method changes, the mock has to change, and the test breaks, even if the surface behavior of the object may not change.
I may be missing something here, but this is the reality of software as I experience it. Answers that say any of the following do not help me, and honestly I consider any of these evidence that TDD is not the cure-all it is made out to be:
1) You're doing it wrong, where wrong is any case TDD has failed, essentially making it unfalsifiable
2) Your system is poorly written and doesn't work for TDD, essentially condemning the vast pool of legacy that makes the world spin to be without tests, or refactored/rewritten without adding user value
3) Your understanding is incomplete, without ever specifying what it would take to have a complete understanding or how to go about gaining this, with a similar symptom as case 1.
I appreciate that you take a more balanced approach to TDD advocacy. I should do a better job of separating you from people like "Uncle Bob" from whom I took the direct quote about not being a professional. I have also dealt with several TDD advocates throughout my career whose responses to questioning TDD have fallen into the previous 3 categories. Because of this I sometimes find it hard to separate more reasoned TDD advocacy from the former, and so my apologies if I misinterpreted the strength of your message.
The first paragraph of the Wikipedia article on TDD is a correct definition. Write a minimal test; see it fail; make it pass with a minimal code change; refactor to improve the design, keeping the tests passing. There are things that people do alongside TDD; there are common ways to perform TDD. But the red-green-refactor cycle is what "TDD" means.
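In code, one turn of the loop can be as small as this trivial, invented example:

    require "minitest/autorun"

    # RED: write a minimal failing test first.
    class SlugTest < Minitest::Test
      def test_downcases_and_dasherizes
        assert_equal "hello-world", Slug.for("Hello World")
      end
    end

    # GREEN: the smallest change that makes it pass.
    class Slug
      def self.for(title)
        title.downcase.gsub(" ", "-")
      end
    end

    # REFACTOR: with the test green, rename/extract/clean up freely,
    # rerunning the test after each small step.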
I don't know how many you mean by "several collaborators" in your composition paragraph. I'll assume 8, because that's enough collaborators that TDD will really start to hurt.
All software (not even most, but all) is composed of aggregations of trivially simple components. At the machine level, everything is built from instructions; in a fully OO language, everything is built from method calls; in some functional languages, everything is built from unary functions.
Systems are built by aggregating the trivially simple primitives provided by the environment. We have full control of that aggregation process; it has no mind of its own. We can choose to aggregate four things at a time or eight things at a time. That's not a big difference. It's the difference between 8 and 4 + 4. Same result, different decomposition.
The "4 + 4" analogy is not a straw man: splitting an eight-object interaction into two four-object interactions aggregated together is a well-defined operation on the syntax tree. IDEs even automate it, and have been for a decade or more; this is what an automated "extract method" refactoring is. Doing it well requires years of practice, but all software is made of aggregations of trivially simple components, and we have full control of the aggregation.
Mocking is not required for TDD. It's often used to do isolated unit testing, which may be done within a TDD loop. But even isolated testing can be done without mocks. I did a talk called "Boundaries" about that topic. https://www.destroyallsoftware.com/talks/boundaries
So yes, you are missing something here: first, most people doing TDD aren't mocking, or are mocking very rarely. Second, mocking is not required for isolation; you can also isolate by structuring your software in certain ways, which is what Boundaries is about.
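The shape of it, very roughly (an invented example, not the code from the talk): the decision is a pure function over plain values, the shell does the IO, and the decision's tests need no mocks.

    require "minitest/autorun"

    # Functional core: takes values, returns a value describing what to do.
    module DunningPolicy
      Reminder = Struct.new(:account_id, :severity)

      def self.call(account_id, days_overdue)
        return nil if days_overdue <= 0
        Reminder.new(account_id, days_overdue > 30 ? :final : :gentle)
      end
    end

    # Imperative shell: the only place that touches mail/DB; covered by a few
    # integration tests rather than mocks.
    #
    #   reminder = DunningPolicy.call(account.id, account.days_overdue)
    #   Mailer.send_reminder(reminder) if reminder

    class DunningPolicyTest < Minitest::Test
      def test_escalates_after_thirty_days
        assert_equal :final, DunningPolicy.call(1, 31).severity
      end

      def test_no_reminder_when_current
        assert_nil DunningPolicy.call(1, 0)
      end
    end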
Addressing each of the three responses you anticipate:
1) "Falsifiability": Nothing in software practices is falsifiable in practice. I've never seen a single piece of experimental literature that I considered sound. There's not even experimental evidence saying that "structured programming is better than willy-nilly GOTOs". And I've never even heard of a meta-analysis or an experiment being reproduced several times independently.
Sometimes you really are doing it wrong. If you try Haskell, can't figure out how to write to a file, and throw up your hands, that doesn't mean that Haskell "has failed" or "is unfalsifiable"; it means you don't know how to use it. Haskell and TDD are both particularly difficult to get your head around at first. Maybe you don't want to spend the effort. That's totally fine. This is why I don't actually know Haskell well.
2) "Your system is poorly written." This is a hugely subjective claim and you have to treat it as such. You have to silently append "by my standards of design" to the end of it. You also have to realize that everyone's standards of design are informed by the practices that they've used while doing the design.
If you primarily work on systems composed of functions with, say, ten collaborators (meaning a total of ten arguments and referenced globals/functions/etc.), then yes, I will say "your system is poorly designed". It doesn't mean that I think that you're a bad person; it does mean that I won't work with you to continue building your system in that way. Doing isolated unit testing on that system will be very difficult. Doing integrated unit testing, with or without TDD, will be less difficult. If we crank the collaborators up to 20 or 30, all programming tasks will be difficult, testing or not.
In the second part of your issue (2), you conflate TDD with testing, which is not correct. TDD is a loop of actions that produces tests. There are many other ways to produce tests.
3) "Your understanding is incomplete." If want to understand these ideas, read "TDD By Example" by Beck to learn about the TDD process, then "Growing Object-Oriented Software Guided by Tests" by Freeman and Price to learn about TDD design feedback and the careful use of mocks. Yes, it'll take time. (But less time than learning Haskell.)
If you want to understand what I mean about isolation without mocks, watch my "Boundaries" talk (I'd recommend doing that after reading the two books above). If you want to see live examples of TDD, and the trade-offs inherent in TDD being made and discussed, watch my Destroy All Software screencasts.
However, if you don't want to do the work to tease these ideas apart, then I think that you should acknowledge that you're not willing to put the effort in. This is a different path than saying "people haven't said the right things to me for me to believe it works", which is the vibe I get right now. I learned TDD by doing it, incorrectly, and painfully, over and over again. You have the advantage of being able to read a couple of books to jump past my first year of learning. That's a huge efficiency gain, but the process can't be compacted into a comment on Hacker News that transmits the better part of a decade of experience.
(Finally, somewhat tangentially, I recommend that all programmers disabuse themselves of any belief that we have experimental evidence about programming practices by reading "The Leprechauns of Software Engineering".)
You bring up some very valid concerns. I'm an advocate of TDD, but I'd love to hear some responses from TDD advocates on the disconnect you see between the TDD ideal and the reality you experience, preferably in a less defensive and accusatory tone than Gary's reply to you.
As a start, I recommend reading "Growing Object-Oriented Software, Guided by Tests". I am still not a 100% convert but I did get many of the same questions answered.
> TDD advocates keep telling me I am not professional
Well it's nice that they hold the keys to the kingdom ;) ! In your darkest hours, remember this powerful mantra: "There is no silver bullet", and proceed to systematically take down glib generalizations.
I have great respect for Gary Bernhardt. I think he is missing DHH's point here, which, as I understand it, is that:
tests can become an end in themselves
I think this is a reasonable argument on DHH's part: that we want to spend more time writing actual code, instead of going through the motions of TDD. That doesn't mean TDD is not valuable.
> Classical TDD does not involve mocking or other forms of synthetic isolation by definition. We even use the term "classical TDD" to mean "TDD without isolation".
Well, like I say in the post, mocks didn't exist back then, so they couldn't have been mocking in the sense that we are now. I wasn't there, but I believe it's true that in some cases they were doing what we'd now consider "fakes", which are simplified replacements for production dependencies that have minimal working implementations. (An in-memory database is an example.)
Fakes get you around the question of database integration, but not the question of integration in general. You'd have to create a fake of every class in the system for that, in which case you probably just re-invented mocks and/or stubs.
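For concreteness, a fake in that sense is just something like this (names invented): a minimal working implementation with real behavior, rather than a per-test script of expectations.

    # An in-memory fake: simplified, but it actually behaves like a store.
    class InMemoryUserStore
      def initialize
        @rows = {}
        @next_id = 0
      end

      def save(attrs)
        id = (@next_id += 1)
        @rows[id] = attrs.merge(id: id)
      end

      def find(id)
        @rows.fetch(id)
      end

      def find_by_email(email)
        @rows.values.find { |row| row[:email] == email }
      end
    end

    # Tests exercise code against InMemoryUserStore instead of the SQL-backed
    # store; a small contract suite run against both keeps them in agreement.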
I probably could've more explicitly called out the fact that DHH is obsessing over database integration when it's just a special case of what isolated testers are actually worried about, which is integration in general. It was already a long post, though.
Thanks for the reply and for the article. I took your quote out of context. I'm not very interested in DHH's mis/understanding of TDD. I'm more interested in my understanding of TDD. I'm trying to figure out if I'm doing TDD correctly or incorrectly when I isolate the database. Whether it's done through a mock or a fake or whatever is not that important to me.
Sorry for hijacking the topic. I've been trying TDD for years, and I never know if I'm doing it correctly. I saw this line and it made me question it yet again.
My main response is that there's no "right" way to do TDD. There is a core definition, which is the red/green/refactor loop. New tests must fail; all tests must be green to refactor. Almost everything else is someone's interpretation. And, honestly, after enough time doing it you'll start taking careful shortcuts through the loop. (Am I allowed to say that in public? ;)
Pairing in person with someone who's done it for a long time can help a lot, but even then you're getting someone's interpretation. If you worked with the DHH of three years ago, you'd get a very different interpretation (slower cycles, integration tests, no isolation) than if you worked with me (faster cycles, integration/unit mix, biased toward isolated unit tests). We were both doing TDD, though! (Given that he's said that he used to do TDD, I assume that he was following the core red/green/refactor loop.)
Of course, creating mocks or circumventing the database may create more bugs (or hide more) than just waiting the couple of minutes for the real thing.
Adding another failure point is exactly what the name says: another possibility of error or bug.
Sure, we can talk about the "one true way" of doing tests, how to make hundreds of tests run in a short time, etc. (the fact that TDD insists on creating one test for each tiny thing goes against it, btw), but yeah, I prefer spending more time solving the problem, and not testing around it.
TDD is a way of writing tests, not a prescription about when to write them. I explicitly say in the post that I only do TDD 75% of the time for web apps and more like 50% for other code.
This comports quite well with my personal experience. In late 1999 I happened to pick up XP Explained and set about to try this testing thing. At the time, there wasn't any "Unit Tests == Fast Tests" thing. What we did on our project was have two different types of tests: "Fast" tests and "SlowAndExpensive" tests, which we ran separately. But there wasn't any fundamental distinction between them. That came a lot later.
Fast tests are awesome, but hard to achieve - at least for me. TDD advocates: prove DHH wrong with easy to adopt frameworks and working software, not blog posts or books.
TDD advocates: prove DHH wrong with easy to adopt frameworks and working software, not blog posts or books.
Gary Bernhardt's about as credible on this as one can be, as he's literally recorded hours of himself building working software with TDD.
From the article:
These tests are fast enough that I can hit enter (my test-running keystroke) and have a response before I have time to think. It means that the flow of my thoughts never breaks. If you've watched Destroy All Software screencasts, you know that I'll sometimes run tests ten times per minute. All of my screencasts are recorded live after doing many takes to smooth the presentation out, so you're seeing my actual speed, not the result of editing.
I understand disagreeing, but responding to this post with "show, not tell" just makes it look like you haven't read the article.
Humorously, I already did create a software tool that does isolation automatically. It's called Dingus and it's five years old: https://github.com/garybernhardt/dingus.
It was a terrible idea. It only worked well if you applied the same discipline that you would've had to apply when doing it manually, but it lured you into a false sense of security when using it sloppily.
(Dingus is fine when used as a standard mock/stub/spy library, but the automatic isolation features are dangerous and are documented as such in the README.)
RSpec has been one of the easiest tools I've ever learned to use. Their documentation is beautiful (and their demo code is actually written as tests), and it's pretty easy to dive right in to writing tests.
Isn't this just a reflexive and ignorant response? Have you actually looked at any of the code he's shipped? I mean certainly people should be allowed to write about their experiences without having to face accusations that they've never actually done anything but write.
I'd believe the same endorsements from someone who makes their living primarily off working code, rather than selling edu. material promoting test-heavy coding.
That said, rapid "all's still well" feedback is awesome when you can get it.
> "I want my feedback to be so fast that I can't think before it shows up. If I can think, then I'll sometimes lose attention, and I don't want to lose attention."
Don't you want to think while programming? I feel like that's practically all I do -- I spend most of my time thinking about a problem and very little actually writing code.
Ironic that your reply to a piece about strawmen is nothing but a strawman. Of course he doesn't mean what you're saying; he's talking about context switching. If I have to wait 15 seconds for my tests to run, I start doing something else and my thought process around the problem is gone.
If you move on to the next problem, and then your test finishes and forces you to move back to the original problem, you've just had to do two context switches. If you never had to wait for the test result, you could have eliminated both.
No, I really don't want to think while coding 90% of the code I write. I listen to audiobooks and podcasts while coding to keep my mind occupied while I code. I want instant feedback, else I'm going to quickly check reddit or HN, and then I've lost 15 minutes to a 45 second test.
If you'd ever seen Gary code, you'd know that he is a very quick thinker (almost nonhuman-like) who requires a very fast feedback loop to keep up the pace.
That's stupid. It's an interesting discussion about software best-practices. A bunch of well-respected people are writing about what they do, what they see as the upsides and downsides of both approaches, and arguing for why their approach is best.
This is content which is appropriate to the audience, reasonably interesting, and technically useful.
Yes, this discussion of industry approaches to our craft is taking up room for far more important things, like the discussion of real estate prices in San Francisco.
While it may look to be that simple, the greater benefit of having these discussions out in the open amongst the various developer communities should not be overlooked. There are a lot of developers out there that will take what DHH says for "the golden rule" without critically thinking about the content of his assertions. (Of course this is true for a lot of vocal developers, including Gary). When we have folks like Uncle Bob and Gary Bernhardt challenging those assertions, all of us can benefit. And for those people who are not used to, or don't have the experience in the community of speaking up and challenging ideas, they can learn that it's not OK just to take someone's opinion on software development as the truth. If you don't like the conversations, ignore them and read something else.
No, you can't. If you flag articles just because you are tired of seeing some topic on the front page, your flagging abilities will be removed (it happened to me).
It actually feels like a bunch of experts in our field having an intellectual debate about an important aspect of the profession in public. If that's not relevant, I don't know what is.