> Except in very minor cases, duplication is virtually always worth fixing.
I disagree with the severity of this, and would posit that there are duplications that can't be "fixed" by an abstraction.
There are many instances I've encountered where two pieces of code coincided to look similar at a certain point in time. As the codebase evolved, so did the two pieces of code, their usage and their dependencies, until the similarity was almost gone. An early abstraction that would've grouped those coincidentally similar pieces of code would then have to stretch to cover both evolutions.
A "wrong abstraction" in that case isn't an ill-fitting abstraction where a better one was available, it's any (even the best possible) abstraction in a situation that has no fitting generalization, at all.
Agreed. Abstractions also tend to be more resistant to change, both at a technical level and at a social level.
At a technical level an abstraction will have more call sites to worry about in different contexts, the more wrong the initial abstraction the harder it will be to change.
The social level is maybe even more problematic. Abstractions seem more important than calling code and will experience more friction in code review. This change friction can also increase with the "wrongness" of the initial abstraction: the starting point makes less sense, so a reviewer needs to work harder to understand the context. If the abstraction is gnarly enough, it's possible that the reason for the abstraction is almost obscured. Even someone who knows _how_ it works might have lost the forest for the trees and push back on changes that simplify or improve it, if the change is a sufficiently large departure from the initial state. In this case you'll often see small incremental changes get accepted more easily, but that just makes the shared code a bit gnarlier for next time.
This is my beef with naively applied DDD, separation of concerns, and design patterns.
Usually what happens is the 'clean' code ideal comes first, and then the implementation is squeezed into it. This then informs the organisation (or architecture) of the rest of the codebase and your software design has become a matter of putting pegs into the right-shaped holes.
I have never found that kind of highly abstracted code easier to work with than some simple procedural alternative that is easy to delete and easy to refactor, so long as effort was put into writing it well.
Of course, the patterns have a purpose and do help when used nicely - a lot of code you write will fall into some of those patterns even without you explicitly mentioning it. It's just...doing it for the sake of it is a problem.
The pile of abstractions stacks to the moon but when you chase down what's actually happening a 60 file repo ends up holding like 12 lines of actual programming.
> At a technical level an abstraction will have more call sites to worry about in different contexts, the more wrong the initial abstraction the harder it will be to change.
As I recently called it, infrastructure and systems lose agility as they gain dependency and move down the stack.
If you have like 1 customer and they have good retries, honestly: fuck everything. Deploy master, in fact, deploy every keystroke to prod. It'll be fine.
At the same time, about 30k - 40k FTEs of our B2B customers depend on one of my Postgres instances during business hours, and about twice that during different holiday seasons. Honestly? Nothing touches the system-level settings of these database systems unless we have pondered a change for 2 weeks. And even then we will schedule an approved change over 4 weeks across applicable postgres clusters. The carnage a bad change at this level can cause is ridiculous enough that we don't risk it.
I try to think about whether two concepts are innately similar or incidentally similar. Computing compounding interest for a home equity loan and a mortgage might be innately similar: a desired change to one will probably imply a desired change to the other. Computing growth of a fruit fly population and computing compounding interest for a loan might be incidentally similar. That holds right up until you change your "computeExponentialGrowth" function to handle occasional decimations from environmental sources, and anyone looking at the code wonders what the heck that means for a loan.
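To sketch that in Python (the function name and the decimation parameter are made up, purely to illustrate the drift):

    # Shared because loans and fruit flies both "grow exponentially"...
    def compute_exponential_growth(principal, rate, periods, decimation_rate=0.0):
        value = principal
        for _ in range(periods):
            value *= (1 + rate)
            # Added for the fruit-fly model; meaningless for a mortgage,
            # but every loan call site now has to know to pass 0.0.
            value *= (1 - decimation_rate)
        return value

    loan_balance = compute_exponential_growth(250_000, 0.004, 360)
    fly_population = compute_exponential_growth(100, 0.3, 52, decimation_rate=0.1)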
If you've got your abstractions correct, then the exponential growth term and the decimation term will be partial differentials which will compose together nicely
Duplication can sometimes be useful, for instance if you have many small variations on a central process. Trying to make one process with all the edge cases baked in leads to overly-complex, hard to reason about, expensive software.
In my experience, the right way to handle this sort of situation is to create a functional mini-DSL for the process that handles all the implementation details, then create a "default" process which serves as a template. If a process needs slightly different logic, just copy the template, update the DSL to support any new logic, and update the template with the new DSL statements. This approach lets you give semantic meaning to implementation details, and you can see where all the different custom logic is at a glance by looking at all the template copies. As long as the template is only calling out to DSL actions with no internal logic of its own and process flow is correctly encapsulated in the DSL, you should never need to update templates to change behavior, only update the DSL.
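A rough Python sketch of the shape I mean (the DSL verbs and the "invoice" variant are invented for illustration):

    # The "DSL": small, single-purpose actions that hide the implementation details.
    def fetch(source):
        return list(source)                      # stand-in for real I/O

    def validate(rows, schema):
        return [r for r in rows if r is not None]

    def transform(rows, rules):
        return [f"{rules}:{r}" for r in rows]

    def publish(rows, target):
        target.extend(rows)

    # The "default" template: nothing but DSL calls, no logic of its own.
    def default_process(source, target):
        rows = fetch(source)
        rows = validate(rows, schema="standard")
        rows = transform(rows, rules="standard")
        publish(rows, target)

    # A variant: copy the template and adjust the DSL calls it makes.
    def invoice_process(source, target):
        rows = fetch(source)
        rows = validate(rows, schema="invoice")
        rows = transform(rows, rules="invoice")
        rows = transform(rows, rules="eu-vat")   # the one extra step this variant needs
        publish(rows, target)

The behavior lives in the DSL; each template reads as a plain description of its variant.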
Instead of one gigantic function with 50 parameters, you have 100 "template" functions that all make use of 60 different "helper" functions (what you're calling the DSL).
Instead of castles-of-logic abstraction, it's nuts-and-bolts or grass-roots abstractions. I've never come across a name for this development style.
But it generally works extremely well when building processes for tens/hundreds of data formats or customers or what have you.
Though it's less true today and in languages that are not C or Fortran. Even something like C++ or Java has the template method pattern, which gets you 80% of the way there. Dynamic languages like Python or Ruby tend to have pretty reasonable facilities for building DSLs, as do more modern languages like Scala and Rust.
This is generally my approach to data ingress/egress (ETL)... I'd rather have a hundred similar, small scripts for each data source than try to create one complex (monstrosity) application to handle them all.
So, to maintain your project, it's not sufficient for me to know language X -- I also have to learn a bunch of domain-specific sub-languages, which any developer on the project may have implemented in response to an arbitrary problem, which almost certainly has no kind of formal specification or stability guarantee, and which can change over time in any way without warning? No sir.
"So, to maintain your project, it's not sufficient for me to know language X -- I also have to learn a bunch of functions, and understand the business processes that are used to create the abstractions, and trust code from my coworkers with just unit tests and type checking? No sir."
It seems like you're conflating language-level concepts like syntax, semantics, grammar, etc. with program-level concepts like types, functions, names, etc. But these are two categorically different things.
I think one example where duplication > abstraction is in tests. I personally find tests that have a ton of extra helper classes/functions to do stuff like set up fixtures or do assertions to be painful to deal with. Taken to an extreme you end up with a mini test framework that obscures the actual test cases and is as hard to understand as the code in question.
I'm not against shared test fixtures or some utility functions, but IMHO, it's better to have some duplication but clearer tests.
> I personally find tests that have a ton of extra helper classes/functions to do stuff like set up fixtures or do assertions to be painful to deal with.
I think it depends on the context. For example, I typically agree, but when I was writing authz tests [0], I ended up writing a DSL so that 1) I'd be more inclined to write the thousands and thousands of tests, and 2) I'd be able to focus on the actual authz assertion and not on verbose setup.
I couldn't imagine writing those policy tests without that abstraction. I would have lost my mind with all of the repetition, and would have almost assuredly made mistakes.
Thank you for the link. This is inspiring. Do you have any resources you could link to that would explain some or all of the style for these tests? The general approach I mean.
Honestly, I don't have many resources to provide. I read a lot of policy tests via GitHub search (e.g. path:spec/policies/*/*.rb), but couldn't find anything that looked like what I wanted. I wrote the DSL as-needed in order to fully test my app's authz while migrating from Pundit to ActionPolicy (while also introducing fine-grained permissions).
It's not the prettiest when you actually look beneath the covers [0], but it does what I wanted -- providing an easy way to write exhaustive authz tests. Without the DSL, I probably wouldn't have written the tests. The PR for said migration was massive [1], and was a prerequisite to going open source [2].
I like it when you have nice, composable utility functions. Ideally each test contains a short preamble setting up the appropriate context for the test to run. The preamble elucidates what the tests are actually testing. It can also serve as documentation on how to use those functions.
There will probably be some duplication across tests, but if the utility functions are idempotent/composable, they're usually pretty easy to read/understand and equally mechanical to write/update.
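A quick pytest-style sketch of what I mean (can_approve and the builders are made-up stand-ins):

    # The function under test (trivial stand-in for illustration).
    def can_approve(user, order):
        return user["admin"] or order["total"] < 1000

    # Small, composable builders; each test's preamble documents its own context.
    def make_user(name="alice", admin=False):
        return {"name": name, "admin": admin}

    def make_order(user, total=100):
        return {"user": user["name"], "total": total, "status": "new"}

    def test_admin_can_approve_any_order():
        admin = make_user(admin=True)
        order = make_order(make_user("bob"), total=5000)
        assert can_approve(admin, order)

    def test_large_orders_need_admin_approval():
        clerk = make_user("carol")
        order = make_order(make_user("bob"), total=5000)
        assert not can_approve(clerk, order)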
I would add that you should duplicate the common, cross-cutting setup (eg. faked/mocked dependencies that don't matter), but make the test conditions themselves explicit.
You get a feel for the correct granularity the more tests you write within the codebase. If you try to be too clever in saving boilerplate, you'll cause pain for future modifications and maintainers. Sometimes fixing "clever" tests takes longer than the code change itself.
Given a long enough timeline, every abstraction turns wrong.
The answer isn't to not abstract, the answer is to tear it out when it turns wrong. That was actually the original point of the popular article that mainstreamed this view - that we shouldn't be afraid of tearing abstractions out, not that we shouldn't make them in the first place. Most people just read headlines though.
The resistance to tearing out a bad abstraction isn't just cultural: combining two different functions into one is a lossy operation, which makes splitting an abstraction harder than creating it in the first place.
While the functions are distinct the call sites are self-documenting. You know which calls are for which purpose because the names are different. After combining them to deduplicate the code, you've lost that information, and to disentangle the abstraction now requires you to infer and reintroduce that lost information.
It's not that it can't be done, but there is real friction that doesn't just exist in people's heads.
It feels like the same thread you're describing, but I guess it's pulling on the other end of it. It's thinking about how to name things in a way that makes it easier to see that the implementations might diverge later, and simplify actually doing so (by preserving more of this intentional context).
Tearing out an abstraction requires a lot of knowledge about the abstraction and how it is used throughout the system. Over time, statistically zero maintainers will have that level of knowledge about any non-trivial system.
In contrast, duplicated code may be annoying, but it is usually simple to understand and maintain by anyone with knowledge of the project and the programming language.
Leaky abstractions are way way way worse than no abstraction at all.
A good example of this is operations type stuff, like the pile of shell scripts or terraform files or whatever that get used to deploy your app. These scripts benefit greatly from a one to one relationship between the thing you're creating and the written text describing it. Not having a situation where changing one thing breaks everything else is a huge help there.
> An early abstraction that would've grouped those coincidentally similar pieces of code would then have to stretch to cover both evolutions.
This seems to be the underlying assumption behind most uses of the "duplication is cheaper than the wrong abstraction" quote, but the assumption is simply incorrect. You should almost never try to expand abstractions in this manner. If you don't treat the abstractions relating to the thing you want to change in your codebase as "the" place where you need to make your change, and instead eagerly make new abstractions and throw old ones away as required, you won't really run into this problem.
In fact, this predominant mindset where creating abstractions is strongly discouraged leads to the very problem it's based on. Because creating abstractions carries a stigma, junior developers and the like will simply modify the existing abstraction instead of creating new ones when appropriate, producing the aforementioned kind of mess where abstractions become complicated through repeated modification.
Additionally, if someone has made a "wrong" abstraction based on something silly like two pieces of code simply being similar in terms of their structure and those use cases start to drift apart, you should feel eager to simply split apart the abstraction, be it into bare implementations or two new abstractions, or any other combination. Abstractions are cheap as long as you don't give them special significance.
I think there's a middle ground here. The original quote does not mean DRY=bad, abstraction=bad. The point is there is a non-zero cost to these things. A bad abstraction can, as you say, accumulate to something terrible through inertia or inexperience. A bad abstraction, even if caught early, was probably not worthwhile - I mean, it took time just to make the original one right? This does not mean that we should be scared of abstraction in general, but in my opinion abstractions that are purely for the sake of reducing duplication should be viewed with an extra level of apprehension.
When an abstraction evolves to a point where it needs to be split into two separate implementations to meet diverging needs…
you will need to replace that abstraction with duplication.
Which is the right thing to do because that duplication is cheaper than maintaining the wrong abstraction.
I think this post makes the mistake of thinking that the only way in which duplication comes up is that it is discovered in the codebase, and we have the choice of abstracting it away or keeping it.
On the contrary, duplication can - and should - be consciously introduced to fix bad abstractions when we find them in the codebase.
> When an abstraction evolves to a point where it needs to be split into two separate implementations to meet diverging needs…
you will need to replace that abstraction with duplication.
Hard disagree. When the formerly common parts of an abstraction evolve to no longer be common, then that duplication no longer exists. There now exists two abstractions, one for each of the diverging needs. There may be some leftover commonality that can be abstracted out, but it's no longer the original abstraction.
The point is that they were never actually common in the first place, only superficially similar.
You're saying we should look for duplications, abstract them, and then every time a change needs to be made to the abstraction to suit only one of the use cases, refactor the codebase to de-abstract and re-duplicate, undoing the work we did in the name of DRY in the first place.
That is a lot more work and a lot more confusion and a lot more headache for maintainers and reviewers than copy-pasting the thing the first time, having realized that the duplication was incidental, not structural.
Let's take this line of reasoning to its extreme:
I notice that there's a section of my code that's repeated twice where we add one to a value, so I abstract it into a function called add1(x:int). Some time later, at places where add1 is used we sometimes need to actually add a value other than one, so we need to make a decision: do we refactor everything and re-duplicate, or do we stick to the DRY principle and make our abstraction more accommodating? The path of least resistance is to stick to DRY because it's a smaller and more comprehensible commit, so we add an optional arg, add1(x: int, operand?: int). Some time later one of the callers to this function needs to pass a vector instead of a single value, so we need our add1 function to have polymorphism and conditional logic in it now, and potentially more arguments. Sooner or later we have a frankenfunction that's hundreds of lines long and branches a bazillion ways and might as well be a Turing machine in itself.
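A sketch of that slide in Python (all names and parameters invented):

    # Step 1: innocent deduplication.
    def add1(x):
        return x + 1

    # Step 2: "just one optional arg".
    def add1(x, operand=1):
        return x + operand

    # Step 3: one caller passes vectors, another wants clamping...
    def add1(x, operand=1, elementwise=False, clamp=None):
        if elementwise:
            result = [xi + operand for xi in x]
            if clamp is not None:
                result = [min(r, clamp) for r in result]
        else:
            result = x + operand
            if clamp is not None:
                result = min(result, clamp)
        return result
    # ...and the name "add1" no longer describes anything the function does.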
> You're saying ... refactor the codebase to de-abstract and re-duplicate, undoing the work we did in the name of DRY in the first place.
That's the exact opposite of what I'm advocating for, but perhaps I didn't express myself well.
> Sooner or later we have a frankenfunction that's hundreds of lines long and branches a bazillion ways and might as well be a turing machine in itself.
Yeah, that's not a good abstraction, and not at all what I meant.
To some extent I agree, though I don't think DRY means to remove all similar looking lines of code and put that behind a procedure. Generic code vs abstractions are different.
Instead, any given task (which already is an abstraction) should exist in only one place. That is DRY, I would paraphrase it to mean any given abstraction should be done in one place (and combine with SRP to say further that one place should only do that one abstraction)
If one place can be updated independently of another, it argues it is not the same task to begin with. DRY'ing that code is a misnomer IMHO; instead that code is being put behind a procedure and is being made generic (and not necessarily more abstract. Abstracting hides details, putting a block of code behind a procedure with full parameterization is not hiding details, it's just a procedure [and let us hark back to the days of procedural programming and the ways that can become a mess])
DRY and SRP (single responsibility principle, AKA the DnD principle) need to be considered together.
> I don't think DRY means to remove all similar looking lines of code and put that behind a procedure.
Applying DRY primarily to structural duplication never occurred to me, and only this discussion brought this way of thinking to my attention. It's always been about semantic duplication to me. Often, but not necessarily, semantically duplicate code has structurally duplicate lines of code.
But now I think I understand why some junior programmers I consulted for denied that their system was rampant with the duplication I was seeing. To them, the code was different because it looked different; that it all was doing more or less the same thing, but with different inputs, seemed lost on them. I have a vague recollection of one of them saying something about "parameterizing" the code, and then dismissing it.
I'm going to have to dig through my notes from that gig to see if I can better clarify to myself the implications of the focus on only structural duplication and why it might lead a programmer to overlook very obvious opportunities for removing semantic duplication, or not understand how to fix it even if they do see it.
ETA: I looked waaaay back to the ur-discussion on the c2 wiki and found this: "It's okay to have mechanical, textual duplication (the equivalent of caching values: a repeatable, automatic derivation of one source file from some meta-level description), as long as the authoritative source is well known." https://wiki.c2.com/?DontRepeatYourself
"Every piece of knowledge must have a single, unambiguous, authoritative representation within a system".
What piece of knowledge does your add1 function represent? I don't think your strawman is actually DRY (which is a problem - damp strawmen can mildew).
I do agree that there are sometimes tradeoffs that can make a less DRY approach a better one, but not all deduplication is "DRY".
Every element of data, core capability, or logical object should have a single authoritative source. Having multiple db connections or ui controllers would, in fact, be a horrible nightmare, obviously.
But not all code is knowledge, some of it's just boring work that does something to some local stuff and it doesn't necessarily need to be sharable with other parts of the code base, just because the logic looks kinda similar.
Right. My quote was the original formulation of DRY as a principle, from The Pragmatic Programmer, and I think it's a good principle.
The common (mis?)understanding of DRY as primarily syntactic both overshoots and undershoots. As many here have discussed, it can lead to combining things you shouldn't (I jokingly call this "Huffman coding"), but it can also fail to recommend combining knowledge when it is represented differently. If I'm saying "there is a button here" in my HTML and in my CSS and in my JS, that's three places for that piece of knowledge even if those three places don't look anything alike. Changing the CSS to "here is what my buttons look like" and the JS to "here is how my buttons behave" would be DRYer.
In many cases you cannot see the correct abstraction without introducing the duplication back. When working with particularly messy code I often do sort of https://en.wikipedia.org/wiki/Karnaugh_map of important variable states to see what actually happens before I can refactor it.
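Roughly, in Python terms (the flags and the legacy function are stand-ins):

    from itertools import product

    # A stand-in for the tangled function under study.
    def legacy_behaviour(is_admin, is_trial, has_quota):
        if is_admin:
            return "allow"
        if is_trial and not has_quota:
            return "deny"
        return "allow" if has_quota else "queue"

    # Enumerate every state combination and record what actually happens.
    for state in product([False, True], repeat=3):
        print(*state, "->", legacy_behaviour(*state))

Dumping the full truth table back out lets you see which states actually matter before deciding how to restructure.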
This is basically introducing the duplication back.
Whether you keep the duplicated code or refactor it in a different way is another question, what matters for the "duplication is cheaper than wrong abstraction" to be true is just the fact that by introducing abstraction early you wasted time refactoring one way and back. Refactoring isn't free. So in fact leaving the duplication there would have been cheaper - Q.E.D.
It doesn't mean you should never risk it, but it does mean you should think hard before you do it.
In that situation, the correct thing to do is, when the two pieces drift away from each other, to recognize that they are no longer the same abstraction and to break the connection. That may be painful - you have to look at everywhere that abstraction is used and figure out which thing it really is, and change the code to reflect it.
But if that's going to happen, then in the early days, a little duplication was probably better.
> There are many instances I've encountered where two pieces of code coincided to look similar at a certain point in time. As the codebase evolved, so did the two pieces of code, their usage and their dependencies, until the similarity was almost gone. An early abstraction that would've grouped those coincidentally similar pieces of code would then have to stretch to cover both evolutions.
Then you split that abstraction again. It's very cheap and very quick.
Many people talk about the issue like it was an absolute in the code, but that's the wrong approach. If you end up writing 4 functions that are the same, by all means, merge it into one.
If then you need to add a parameter only this code path uses and rest doesn't care about, by all means split it back. Moving blocks of code around is cheap.
I think the key here is the oft repeated but often poorly understood maxim to favor composition ("has a") over inheritance ("is a").
If you have a mixin (or other means of composition) that you use in several places and one diverges, it's easy to remove it. If you use inheritance, it's going to be more painful.
A language that offers OOP via prototypes instead of classes like JS can (sometimes) give you the best of both worlds, but it will confuse a lot of devs who aren't familiar with that kind of OO design.
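A rough Python sketch of the contrast (the classes are invented for illustration):

    # Inheritance: Report *is a* CsvExportable; dropping that later means
    # touching the hierarchy and every subclass that relied on it.
    class CsvExportable:
        def to_csv(self):
            return ",".join(str(v) for v in self.fields())

    class Report(CsvExportable):
        def fields(self):
            return [1, 2, 3]

    # Composition: the report *has an* exporter; when this report diverges,
    # swap or remove the collaborator without disturbing anything else.
    class CsvExporter:
        def export(self, fields):
            return ",".join(str(v) for v in fields)

    class ComposedReport:
        def __init__(self, exporter=CsvExporter()):
            self.exporter = exporter

        def fields(self):
            return [1, 2, 3]

        def to_csv(self):
            return self.exporter.export(self.fields())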
Splitting the abstraction is never cheap and quick, mostly because of politics. With duplicated code you often can assign a single responsible owner to each duplication.
However, once abstracted, the code may suddenly be used by a number of different teams. You will need to get this work on their roadmap, increasing the friction to get this done. In many companies, this will also end up in endless discussions about the new approach.
The solution there would be to make the abstraction "opt-in", such that a team can elect to duplicate or abstract as desired. It also helps if the "main" abstraction is itself composed from smaller abstractions, from which downstream teams could then pick and choose rather than having to either fully abstract or fully duplicate.
This is a good point. Following Conway's Law, a team may choose to duplicate code or do things theoretically sub-optimally simply to avoid having to deal with other teams.
You're absolutely right that it's important to look beyond how two modules superficially look right now, and look instead at how they change. However, if you've always defined your abstractions based on what their consumers need rather than what their implementations have in common, then you shouldn't ever need to stretch them. They're not trying to "cover" both cases, they're trying to solve a problem that both cases have. Your two cases are not implementations of the abstraction, they are consumers of it. If one case grows to not have that problem, it just stops asking for that abstraction. If it grows to have more problems, it just asks for more abstractions. The original abstraction, if based on a common need, doesn't have to change.
That's not to say abstractions never change -- they do. But they change because your understanding of the sub-problem they're solving has changed, not because their implementations or consumers have changed.
Sometimes duplication is cheaper than the wrong abstraction.
And
Sometimes it's better to abstract away a duplication rather than let it lie.
And that's the mark of becoming a master at the craft. Being able to recognize all of these various slight permutations of state and what to do about them.
Rules of thumb really need to be told like this, or they will be misused - either by newbies who don't know any better, or by unpleasant programmers who will shove their dogmatic beliefs down your throat with the common wisdom as an excuse.
As a FYI, just as it's OK to abstract away duplication in code, it's OK to do the opposite, remove abstraction and add duplication.
So in your particular case, it could have been possible to abstract away the code at that point in time and once they diverge, remove the abstraction and duplicate, then adjust one of the duplicates (which no longer is a proper duplicate really).
> As a FYI, just as it's OK to abstract away duplication in code, it's OK to do the opposite, remove abstraction and add duplication.
> So in your particular case, it could have been possible to abstract away the code at that point in time and once they diverge, remove the abstraction and duplicate, then adjust one of the duplicates (which no longer is a proper duplicate really).
This sounds nice in theory, but the reality is that the effort required to make these two kinds of changes is not symmetric. It's about 10 times easier to get a PR approved and merged that combines similar looking code into a function than vice versa. If you have any suspicion at all that an abstraction you're making may need to be removed and duplicated in the future, you're better off just never abstracting in the first place.
It sucks pushing a change which unwinds an abstraction like that through code review. It's usually a lot easier to just never abstract it in the first place.
I buy into the same belief as you here, but I guess you could easily argue that you could create a suitably fitting abstraction earlier on with the understanding that you can "detach" them once the point comes where they're fundamentally different.
The point of abstraction is to reduce the number of concepts in play. If you're still tracking which old concept is "really" being used every time, you haven't actually abstracted over anything, you're just naming things badly.
> The point of abstraction is to reduce the number of concepts in play.
I'm not sure I agree with this. For me, the point of abstraction is to divide the concepts between the layers you introduce, effectively hiding concepts from the layers where you don't want to have to care about them. Oftentimes an abstraction adds to the total number of concepts at play, but hides them beneath/above the layers.
The problem is that there's an impetus to continue working on top of established facilities, because it's usually incrementally less work than reworking a piece of code into something else. Plus it's difficult to recognize ahead of time when something is about to become a problem, rather than fix something that's already a problem.
Also, you become a better programmer if you write duplicate code and then learn how to abstract it for cases that make sense. I also don't believe that dupe code is always a bad thing. Like everything else in software engineering, IT DEPENDS.
I think this is an example that highlights abstract vs generic code. An abstraction should have hidden those evolutions entirely, which potentially means there would have been two code paths behind the abstraction. Moving that logic to a generic procedure with full parameterization I wouldn't call an abstraction (it's code that has been made generic). Generic code is more complex. DRY is not about making everything that can be generic, generic - it's about making sure a single thing is done in one place (and only that one place, DRY & SRP go together).
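A small sketch of that distinction in Python (names invented): the abstraction hides which path runs, while the generic procedure makes every caller steer it.

    # Abstraction: callers depend on "send a receipt" and never see the divergence.
    class ReceiptSender:
        def send(self, order): ...

    class EmailReceiptSender(ReceiptSender):
        def send(self, order):
            print(f"emailing receipt for {order}")

    class PostalReceiptSender(ReceiptSender):
        def send(self, order):
            print(f"printing and mailing receipt for {order}")

    # Generic code: one procedure, fully parameterized; nothing is hidden,
    # and every caller has to know about every option.
    def send_receipt(order, by_email=True, postal_address=None):
        if by_email:
            print(f"emailing receipt for {order}")
        elif postal_address:
            print(f"mailing receipt for {order} to {postal_address}")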
My problem is that writing a function to remove duplication often brings up the question of where to put it. If it's only called inside one module, it doesn't matter really. But if not, you've created a dependency. Which is bad.
I think how much you hate that may depend on your language and the program. Some big enterprise Java monolith is a garbage dump of thousands of small files. So who cares. In C, with no namespaces and with headers to manage, you care more.
The problem is that "duplication is cheaper than the wrong abstraction" is basically an excuse that lazy devs use not to engineer their code.
The other one I hear a lot is "it's not realistic to reach 100% test coverage / type safety" when submitting code with `any` all over it and zero tests.
One of the biggest problems with deduplication is you can end up with shared code that's full of corner case handling for different situations. Then your nice shared example becomes a tangled rats nest you can't unravel.
You got a good point about code evolution. Has anyone taken a look at it from a biological perspective? Seems like such problems can occur in genetics and nature might have come up with some tricks we can use
> An early abstraction that would've grouped those coincidentally similar pieces of code would then have to stretch to cover both evolutions.
In that case, my takeaway would be that it ain't the abstraction itself that's wrong, but the unwillingness to get rid of it (or decompose it) when it no longer serves its purpose.
So... you kept modifying the two similar pieces of code until they became dissimilar. Why do you think that you wouldn't be able to modify the abstraction if you saw that it doesn't fit anymore?
I think part of the issue here is that a fair number of programmers work in shops where they have very limited agency. They are tasked with making the minimum defensible change to add a feature or fix a bug. They are not allowed to change the tests or suggest refactoring. So those things just don't occur.
I see, but wouldn't this lack of agency, or, more precisely the inability to escalate the problem to someone with more edit rights be the actual issue?
I could see this also happening in situations where such problems arise at component interface boundaries, where the pressure not to change comes not as an organizational policy but rather inability to influence external components (eg. the OS offers a poor abstraction -- the user-space program developer wouldn't be changing how OS works to rectify things). But, this is again sort of an administrative problem, because, ideally, the user-space program developer should be able to convince the OS developer to change the interface if it's found to be not a good abstraction...
But, yes, I can see how in practice that'd be a very difficult thing to do.
I guess I was trying to claim that in a lot of places there isn't such a person.
yeah - if an application company runs up against a poorly designed OS interface - they usually aren't aware, or just back away slowly. they don't have the scope or mandate to pursue it.
that kind of behavior often extends to library dependencies, and even internal interfaces that the company should ostensibly own.
it's not so much an administrative problem as a desire to limit spending and time on software and do a kind of agile mvp glue job over an unbounded number of external dependencies. that often leads to an unmaintainable hairball. but if the alternative is to hire a bunch of really experienced developers and let them 'do the right thing' for 5 years...
even if you do, there is a still a good chance that the output isn't gonna be that great. software people lost a lot of credibility with that model. we've probably swung too far in the other direction
I might have to say some unkind things here, but statements like:
instead of “duplication is cheaper than the wrong abstraction”, I would say “duplication is cheaper than confusing code littered with conditional logic”.
seem to be looking at this problem from an extremely narrow context.
The truth is that the phrase "wrong abstraction" is (more or less) unquantifiable, which makes the original phrase, as employed, sort of like a koan. It addresses the very human tendency to see patterns in noise, and our ability to "transmit" such hallucinations to other humans via natural language and other means.
The closest I can get to - given my at-best-apprentice status as a formal programmer - is the quantitative test I developed for CCS (conditional content systems), where the abstraction lies in the SNS[1], and the de-duplication mechanism is applicability[2]. Since each applicability statement carries its own overhead, there's a limit on how much "abstraction" the model can take before it's using quantitatively more keystrokes than duplication.
The test goes like this: take the flat text procedures for ALL the configurations, and add it together. Now, take the conditionalized, applicability-laden procedure that unifies those procedures, and measure its file size. If the latter is LARGER than the former, then you're using the wrong SNS/applicability model for rolling up this content.
Thing is, this is inevitable if you throw enough dissimilar configurations at a CCS, because each configuration has its own overhead, and eventually that outpaces the content itself.
You can address this in a bunch of ways - like adding a containing pseudo-product that has all the configurations inside of it - but the actual real Product Management might not let you build on the applicability like that, because the Product itself isn't sold that way. Any other abstraction isn't available to you, because in the end this is natural language, which - unlike structured language - resists first order abstractions really well. This is one of those instances where, yes, the abstraction of the SNS/Applicability is worse - quantifiably - than duplication. All that complexity would be better handled via version control fork/branch relationships - far outside of the realm of natural language.
[1] standard numbering system, a sort of numeric designator of functional systems, the primary way that content is designated as semi-independent modules.
[2] conditional "chunks" that turn on and off depending on the applicability statement
It'd be wonderful if we could measure the utility of software engineering choices by counting keystrokes or measuring file sizes or putting them in a turbo encabulator and seeing which one has more modial interaction with its magneto-reluctance. Unfortunately, reality is just too complicated, with far too many tradeoffs to be balanced. I'd recommend deep thought and discussion about the domain over looking at a graph of your codebase's sinusoidal repleneration.
> All that complexity would be better handled via version control fork/branch relationships
Holy smokes, my turbo-sarcasmo detector just broke! But yeah, that's more or less the TLDR of my point. The phrase "wrong abstraction" does some heavy lifting, but it's not a bad concept, even if largely a qualitative one. No one should use a single metric to toss ginormous architecture decisions - they're tools to inform educated judgement, not replace it.
Re: fork/branch shenanigans, no, you're right, that's not an optimal way to handle variance... in a normal programming language. In the context of natural language, it's not the same kettle of fish, because, well, lots of reasons, probably the most prominent being the "messy unidirectionality" of NL that's all mish-mashed with its extremely complex grammar vs constructed languages. Chopping up giant documents into tiny pieces a la CCS[1] systems has made this a stew of problems, but for some reason Leadership is fond of the idea. It's not unlikely that specialized on-prem LLMs are going to nuke the CCS concept from orbit in the next five years, except for those cases where the CCS is a contractual requirement for doing the work.
The saying 'duplication is cheaper than the wrong abstraction' is a gem of a saying, but like many pieces of wisdom, takes experience to fully understand.
I first saw the saying when DRY was being applied without any nuance. If a piece of code appeared in two places, it was obvious, and important, to factor it out, because that was 'good coding practice'.
The saying being discussed was pushback against that kneejerk, thoughtless application of DRY. The 'cheaper than the wrong abstraction' is pointing out that DRY isn't a 'no tradeoff' policy. By factoring out any duplication, many uses pass through the same code. If the uses don't quite match, there is a tendency for the code to get modified to fit them anyway. This, over time, makes the shared code simultaneously unfit for use, and widely used. A recipe for poor code quality and system health. Ironically, this is the outcome that DRY was called in to address.
The most important thing to note about DRY is that it's not about code -- it's about knowledge. You should not repeat knowledge -> logic, constants, etc. If the temperature is 87 and the price of the widget is 87 that is coincidence and not repetition.
There should just be one source of truth for any logic or process. If you duplicate that then bad things will eventually happen.
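A tiny Python sketch of the distinction (values invented):

    # Two facts that happen to share a value today; merging them into one
    # constant would couple the thermostat to the price list.
    MAX_SAFE_TEMPERATURE_F = 87
    WIDGET_PRICE_USD = 87

    # One fact used in two places; this is the knowledge DRY says to keep single.
    SALES_TAX_RATE = 0.0825

    def invoice_total(subtotal):
        return subtotal * (1 + SALES_TAX_RATE)

    def refund_total(subtotal):
        return subtotal * (1 + SALES_TAX_RATE)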
> You should not repeat knowledge -> logic, constants, etc.
It's easy to shoot yourself in the foot with that. I spent a few years working with binary formats, where you have lots of constants (byte offsets, flags, etc).
There were two approaches to dealing with this:
1) Some people just put the raw numbers in the code, and if it wasn't clear added a comment
2) Other people used constants for everything, defined somewhere else, often through multiple layers of abstraction.
If you follow DRY, then you should always choose 2). But in my experience, this often makes the code extremely hard to read. You often have to look through a dozen header files to find the constants used in a 5 line bit of code, and it becomes really hard to reason about the code.
And in the end you almost always still need some duplication, because you have other files where you can't use the constants (eg. in sample files, docs, external libraries, etc.)
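A sketch of the two styles for a made-up format (offsets and field names invented):

    import struct

    # Style 1: raw numbers where they're used, with a comment.
    def read_record_count_v1(data):
        return struct.unpack_from("<I", data, 0x10)[0]   # record count lives at offset 0x10

    # Style 2: everything behind named constants, possibly defined far away.
    MAGIC_LEN = 8
    VERSION_LEN = 4
    FLAGS_LEN = 4
    RECORD_COUNT_OFFSET = MAGIC_LEN + VERSION_LEN + FLAGS_LEN

    def read_record_count_v2(data):
        return struct.unpack_from("<I", data, RECORD_COUNT_OFFSET)[0]

Both read the same bytes; the argument is about which version is easier to check against a hex dump.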
I don't understand? Why do you need to know the value of the constant if the name is descriptive enough?
X = 5; // File header offset
vs.
X = FILE_HEADER_OFFSET;
I don't think there's much value in the idea that just because someone can implement something badly, it validates the whole idea. You should always use constants instead of raw values if they have some meaning. If someone, somehow, manages to make that so complicated that it's a pain it's still not the fault of the concept.
> And in the end you almost always still need some duplication, because you have other files where you can't use the constants
And those will invariably end up out of date and incorrect at some point.
When I'm looking at a hex dump, FILE_HEADER_OFFSET doesn't tell me where to start looking for the header. So I need to open a new tab in my source code editor, find the header file where FILE_HEADER_OFFSET was defined, only to figure out that it is defined as FILE_MAGIC_LENGTH + RESERVED_FLAG_LENGTH, and FILE_MAGIC_LENGTH is defined as LENGTH(FILE_MAGIC_LITERAL), and FILE_MAGIC_LITERAL is not defined in the source code, because it is provided as a compiler flag that is generated by the Configure script, and there are three conflicting definitions of RESERVED_FLAG_LENGTH, and you are not sure which one is the correct one.
Get a better editor. I can mouse over FILE_HEADER_OFFSET anywhere it's used and get the value.
But still, you're describing a bit of a wild setup. If the value of FILE_HEADER_OFFSET is 42, how did you come to calculating the value? What if MEMORY_BLOB_OFFSET is also coincidentally 42? If you have dozens of source files filled with literal magic values that's completely unmaintainable and inscrutable.
A constant doesn't have to be in a separate file if it's only used in one place.
Results may vary and depend on the code in question as well as the language you are using.
We - a former team a couple of years ago using Java - started to duplicate code in Java, because we were totally tired of interface'ing and class'ing everything away that was not DRY. It became too tedious to bloat code with them, as well as having to understand whole classes when all you got was references to other interfaces etc.
If there is a small service architecture like in Angular with TypeScript, abstracting away becomes fun and useful.
It all depends. But what I really do not miss is the pile of interfaces in Java and C#. These became so tough to grasp and entangled, that we DRY'ed this cesspool. DRY on DRY so to say.
So your issue was with the nature of the language and the size of the project more than the application of DRY?
I think I see what you're getting at, but I've certainly also seen very large Java projects that are simple at a high level and composed in such a way that they're still legible without a ton of duplication. These might be somewhat orthogonal concepts.
I think it’s down to the systems, and I think the people who favour abstraction often forget who needs to write it. Duplication isn’t just cheaper than the wrong abstraction, it’s cheaper than almost any abstraction. Not because it should be, mind you, but because duplication works for a tired Thursday afternoon programmer and abstraction doesn’t. Maybe it’s because I spent some time in management, but a key concept I worked with when I did that was how we have two modes of mental capacity. One where we have the energy and wit to do the right thing, and one where we haven’t slept for a week, and, well… it’s Thursday afternoon after a day of too many useless meetings.
I think the best way I saw it put was for a Theme-park to coin a slogan that any employee would be able to find inspiration in when dealing with a customer on that Thursday afternoon. To me most abstractions are similar to having a slogan along the lines of “Think Different”, which is an absolutely useless concept when you’re tired and dealing with an angry customer in your summer job about an hour before you clock out.
I obviously don’t think you should avoid all abstraction. The author of the article is right, theoretically at least, it’s just that this way of thinking rarely works out. Similar to you, my experience is that it tends to fail after a few years of changing needs.
These days I favour abstraction only when its use is never altered in the slightest. For everything else duplication is so much easier to handle over 5+ year periods. Of course there are many ways to deal with this. Small single purpose functions are abstractions as well, just don't build big OOP hierarchies. Because they just don't work for those Thursday afternoons.
Entirely agree. DRYing is compression. It serves a similar purpose. Like a mathematical equation, it's terse, light, elegant, easy to carry in one's pocket, yet loaded with meaning. It's not, however, zero cost. At maintenance time, it needs unpacking. DRYing is also not an exact science, nor a hard rule. It's the factorization of a specific idea (at least that's what it should be). When applied, it requires usability and cognitive considerations. That's the delicate trade-off. "Will people coming after me have an easy time figuring this out?" The newbie who comes to an existing code base and proceeds to indiscriminately DRY things up often doesn't realize that the reason they were able to do so in the first place is because, although repetitive, they could understand the original source, often without much effort. That's why repetition is cheaper than the wrong abstraction.
I'm a DevOps engineer. I totally buy duplication is better than the wrong abstraction but I'd like to nuance it: duplication is better than an abstraction used by two disparate parties (groups of people that don't talk to each other).
This is in agreement with Conway's law, which absolutely governs everything I do. I work on a DevOps team that supports several different development teams all working on different things. The code I write for those teams I often duplicate along team boundary lines. Build scripts, for example, I write and I put them in each team's git repository. These might look very similar. This allows the scripts to grow and change and evolve according to the different teams needs without the teams needing to talk to each other.
"Proper duplication" goes back to separation of concern. If you have two different concerns (using the lens of Conway's law, two very different teams) using the same code, perhaps they should not be using the same code because that is not a separation of concern. Separate the concerns by separating the code paths both concerns use.
This type of duplication is praised in more depth on wingolog[1]. I highly recommend reading it as something every engineer should read.
It's very important to know when to duplicate and when not to do so, because duplicating at the wrong time can lead to pain, but not duplicating can lead to pain also.
This. I think you’re hitting the nail on the head. The question is whether there are multiple dependencies on a given bit of code. When there are multiple dependencies, changing the code because one of them wants something means the code needs to be checked and tested against all the other places the code is being used. And it’s really really common to have inadequate understanding and inadequate test coverage, so things break, and hence people develop superstitions about code that shouldn’t be touched.
Another way of putting it is that if the code is really truly duplicated, then it doesn’t need to change at all. If it has to change, the need for change is there because the multiple parties depending on that code have slightly different needs and slightly different ideas about what they want. Abstracting the code to make deduplication happen is just a way of spackling over those differences, but it can and does often cause trouble down the road, even when it’s done well. Once abstracted for two dependencies, a third dependency or more without test coverage can make changes exponentially more dangerous and error prone.
Duplication is good when forking for separate parties (or separate dependencies), each of whom may wish to customize the code, and now they are free to do so without the risk or fear of breaking someone else. I feel like the author of the article didn’t understand the benefits of duplication.
Very sad hearing this from a DevOps engineer. While the config smear of ops is encouraged by their tools (Terrorform is a fantastic example), a DevOps engineer that does not dedicate themselves to DRY practices will erode the productivity of an organization by default. I remember how Terrable things were before our Ops team developed strong module abstractions for our infrastructure. And get them to talk to each other.
This is a very amateurish take
The author very clearly (at least at the time of writing this) has not dealt with complex code bases.
> If I were to see a confusing piece of code littered with conditional logic, I wouldn’t see it and think “oh, there’s an incorrect abstraction”, I would just think, “oh, there’s a piece of crappy code”. It’s neither an abstraction nor wrong, it’s just bad code.
This is the primary issue. The author does not recognize that poor abstractions can involve more than just a lot of conditional logic. Sometimes that conditional logic bubbles up in places far removed from where the bad abstraction was made.
A simple (real) example of this. I've seen code where "hey, these two objects share a field, let's pull out a base object and have them both inherit from it, after all, duplication is bad!". Then later on, "hey, here are two other objects with the same field, but they don't have that old base object's field, duplication is bad, so let's make a third base class".
This sort of thinking resulted in a really gnarly object graph. But further, downstream code had to do type checks and casting to compensate for this bad abstraction.
All because the original dev didn't want to duplicate a field on two otherwise unrelated objects.
And worse, you, the dev working on this code years later, are left with the choice: keep it as is, or rewrite and touch 100s of files, potentially breaking large amounts of code.
Oh, and not to mention the unit tests that accompanied such code - ironically filled to the brim with duplication around this hierarchy, making minor changes massive.
On smaller less complex code bases you rarely see this comedy/tragedy play out.
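A minimal Python sketch of the shape being described (class names invented):

    # "They both have created_at, so extract a base class!"
    class Timestamped:
        def __init__(self, created_at):
            self.created_at = created_at

    class Invoice(Timestamped): ...
    class AuditEvent(Timestamped): ...

    # Later: two more objects share a different field, so another base appears.
    class Named:
        def __init__(self, name):
            self.name = name

    class Customer(Named): ...
    class Product(Named): ...

    # Downstream code now compensates with type checks.
    def describe(obj):
        if isinstance(obj, Timestamped) and isinstance(obj, Named):
            return f"{obj.name} at {obj.created_at}"
        if isinstance(obj, Timestamped):
            return f"something at {obj.created_at}"
        if isinstance(obj, Named):
            return obj.name
        return repr(obj)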
Class inheritance is flawed because it tries to be two things at once: a shared "surface" (public members, polymorphism, etc.) and shared implementation. An abstraction is only a surface -- this could be an interface, a function declaration, or even a data model. It almost never happens that implementation-sharing and surface-sharing completely coincide, and this is why class inheritance is falling out of favor and something I completely avoid (occasionally I will use abstract classes, but I usually regret it later). This is where "favor composition over inheritance" comes from. I'd go so far to say that because they cannot be completely divorced from implementation details, base classes cannot even be called abstractions.
So if "wrong abstractions" includes shoddy base-class shenanigans, then the statement becomes almost tautological. Of course duplication is better than class inheritance -- everything is better than class inheritance. So the real statement there is "class inheritance is actually awful", which is important to understand, but a side point to this debate.
If you don't count class inheritance as abstraction, then the tradeoff between code duplication vs. abstraction becomes much more nuanced, and that's what all this discussion is about. I certainly don't agree that ignoring class inheritance is a signal that the author is amateurish. Many complex codebases have no class inheritance at all.
Inheritance is very useful in domains like game engines, where it is very common to have a base object such as "Node" that has some properties that every object in the scene graph must have, which all share the same implementation. For example they should all have a parent property and a collection of children, and ways to modify those properties. They'll also share methods such as "render" which probably must be overridden in every subclass. It's not impossible to solve this with interfaces and composition but those solutions are sub-optimal.
An example you might be more familiar with is the DOM of a web browser - every element has some basic properties and methods that all share an implementation.
Quite the opposite: game engines are one of the few places where the sub-optimality and fundamental problems with object inheritance became so overwhelming that people started abandoning their deeply ingrained CS 101 models of Dog : Animal and invented Entity-Component-System architecture, which at its extreme uses no object inheritance at all and is a deeply "relational" model. Game engines which don't do this were either mostly developed before ECS was invented/popularized (Unreal) or are specifically targeting beginners who have little more than a CS 101 understanding of OO programming (and also following Unreal's lead).
DOM elements are a better example, but just because that's how they are done doesn't mean that's how they should be done. Does a <script> element really need a "focus()" method? It has one. Does a <br> element need an "innerHTML" property? It has one. Does a <head> element need an "offsetHeight" property? It has one. If you look at the history of the development of HTML and JavaScript as a shining ideal of software engineering, you're certainly in the minority (this is all before TypeScript, which is a shining ideal of type systems!). The HTMLElement class has 134 properties, most of which make no sense for most elements. It has a long history and a lot of excuses for becoming what it is today, but I would not recommend you follow that lead in your own designs.
Not really. Composition has been the preferred technique for a long time already.
A lot of Games and GUIs that use inheritance worked in spite of that inheritance. In more complex object graphs there were always things like override boolean DoNotActuallyRender() in one or two children of the RenderableNode class to account for special behaviour.
ECS is just the nail in the coffin of inheritance in game engines. And it's not even new anymore, it has been fashionable for what, almost 15 years now?
> It's not impossible to solve this with interfaces and composition but those solutions are sub-optimal.
The growing popularity of ECS and data-oriented design in game engines suggests otherwise: keeping components separate from entities enables both performance enhancements and separations of concerns that are much more difficult to achieve with the traditional inheritance-based approach. To illustrate a bit:
> it is very common to have a base object such as "Node" that has some properties that every object in the scene graph must have, which all share the same implementation. For example they should all have a parent property and a collection of children, and ways to modify those properties.
You don't need subclasses for that; you just need a table of entity IDs (where both the things to render and the scene itself are entities) and parent IDs, which you can then recursively walk to get the entities you want to render:
WITH RECURSIVE entity_children AS (
  SELECT id, parent FROM entities
  WHERE parent = $scene_entity_id
  UNION ALL
  SELECT e.id, e.parent
  FROM entities AS e
  JOIN entity_children AS ec ON e.parent = ec.id
)
INSERT INTO scene_entities (scene, entity)
SELECT $scene_entity_id, id
FROM entity_children;
(Obviously you probably won't actually be running SQL queries in a game engine's rendering loop; this is just to illustrate the logic.)
Once you've got that list...
> They'll also share methods such as "render" which probably must be overridden in every subclass.
You don't need subclasses for that; you just need a table of entity IDs and things to render, which you can then query and send to the GPU:
INSERT INTO some_buffer_in_GPU_memory (entity, mesh, texture, position)
SELECT se.entity, em.mesh, et.texture, ep.position
FROM scene_entities AS se
JOIN entity_meshes AS em ON se.entity = em.entity
JOIN entity_textures AS et ON se.entity = et.entity
JOIN entity_positions AS ep ON se.entity = ep.entity
WHERE se.scene = $scene_entity_id;
(Again: you probably ain't actually using SQL for this; this is also overly simplified, since most modern game engines use all sorts of other stuff besides a mesh, texture, and position when rendering something. Note also that "em.mesh", "et.texture", and "ep.position" need not be actual meshes/textures/positions, but could instead be indices into buffers already on the GPU.)
The key advantage in both of these cases is that the parent/child data and the render data can live where they make the most sense, and can be processed by independently-running systems with minimal contention. This is critical for processing game logic in parallel - something which the game industry is learning the hard way with legacy engines that can't fully exploit multicore hardware.
I was looking for a word to describe my feeling about the article and "amateurish" fits the bill.
What mostly put me off was this: (for example, the same several lines of code duplicated across distant parts of the codebase dozens of times, and with inconsistent names which make the duplication hard to notice or track down)
It is a silly example, because in such a scenario there is no way you can even start writing an abstraction to handle it.
The other part is what cogman10 wrote: a wrong abstraction is not "simply a piece of code gathering if statements". A wrong abstraction is a piece of code, or a whole part of a system, where you cannot simply add an if statement and get going. A wrong abstraction might be something that actively prevents you from changing code in a meaningful way.
There is also another comment I would riff off, about DevOps and having scripts per team/domain: even if those scripts mostly look the same, you never know what a team will require. Nowadays domain-driven development is in vogue, mostly because it recognizes that separation of concerns is much more important than DRY.
To finish off, the author also assumes abstractions are born from de-duplicating code; since we are discussing "duplication is cheaper", I wanted to end with a rant. The worst abstractions I have seen in practice were born in the heads of "Astronaut Architects" who built systems top-down, making stuff up "because it should be like that". Other bad ones were written by junior devs who were high on DRY.
Better rule is the first time you duplicate, just copy/paste the code. But once you duplicate a third time, that's a pretty good signal that you have something that is common enough to be abstracted, and you can write the abstraction then.
In my experience, one is unlikely to remember how many times a piece of code has been copy-pasted, unless they are doing it in the same coding session. Everyone copy-pastes, thinking that they'll DRY it up next time, and the codebase turns into a mess.
Well, when you make a new abstraction, that same PR will presumably also include changes to some N call sites to make use of that abstraction, right? The important thing is that N>=3. Anything less than that is, more often than not, premature. And in my experience, premature and/or leaky abstractions are far (far) more harmful to codebases than copy-pasted code.
I abstract, i.e., put code into its own function, even when N=1, to make the surrounding code more readable and simpler. At N=2, I definitely abstract. Abstracting doesn't mean building class hierarchies when they are not needed; I abstract at the required level.
Premature abstraction, in my book, means abstracting in expectation of future needs. It is different than abstracting small and often. My abstractions are not premature, because they address existing needs.
I view copy-pasted code as a cancer of a codebase. It is very difficult to tell whether two similar-looking pieces of code behave the same way, which makes them practically impossible to DRY away later, whereas calling the same function leaves no room for guessing. Copy-pasted code reduces readability and understandability, and makes code longer. It never happens under my supervision.
fn a
    foo a
    foo b
    foo c
    bar d
    bar e
    baz f
    baz g
    baz h

and

fn b
    foo
    bar
    baz

fn foo
    a
    b
    c

fn bar
    d
    e

fn baz
    f
    g
    h
which of a or b is more readable and simpler? (Spoiler: it's almost always a.)
DRY isn't about repetition of text in a source file, it's about repetition of authority in your domain model. You evaluate it at an architectural level, not in PR diffs or whatever.
Of course fn a is simpler and more readable, simply because it's not 200 lines long and there are no indentations and no states to keep track of, etc. This example is not at all like real code and is just a straw man of my argument.
DRY at any architectural level corresponds naturally to repetition of text in source files.
I think your disagreements are valid, but I don't think it is fair to say this is an amateurish take or to infer the author's level of experience. Your example of unnecessary inheritance hierarchies (which I have also faced many times in real-world scenarios) may even be a symptom of exactly what the author is saying: what you might call a "bad fitting abstraction" the author would just call "bad code". The implementation details of how code gets shared (composition vs inheritance) are a subtle but still vital consideration in the cost-benefit analysis. The author is observing that it might be misleading or dangerous advice to urge developers to choose duplication just because issues with abstraction have been historically observed, which I completely agree with, and I do not consider myself an amateur. I also agree with you and other posts that the author fails to mention the (exponentially higher) costs of abstraction boundaries that also span human organizational boundaries.
i wouldn't create a base class until there are a non-trivial amount of common properties shared by several classes and i find that i am adding more such common properties. and when a class appears that doesn't have one of these common properties, then perhaps it makes more sense to move that one no-longer-common property out of the base class back into the individual classes so that again i can have all classes share the same base class.
> Not every piece of code is an abstraction of course. To me, an abstraction is a piece of code that’s expressed in high-level language so that the distracting details are abstracted away. If I were to see a confusing piece of code littered with conditional logic, I wouldn’t see it and think “oh, there’s an incorrect abstraction”, I would just think, “oh, there’s a piece of crappy code”. It’s neither an abstraction nor wrong, it’s just bad code.
The wrong abstraction isn't crappy code itself. It is a reasonable looking piece of code that will force the next person into writing crappy code to accommodate it.
Edit: I think the entire project of TensorFlow is a good example of this. They built the library around a "graph" entity, and anything you did had to be shoehorned to fit that. That worked OK for some straightforward neural networks and situations for a while. As the area evolved though, it proved very burdensome. They tried to evolve it into TensorFlow 2.0 which was more forgiving, but by that point it was too late, the ecosystem became a mess. PyTorch stole the thunder because they didn't make the wrong abstraction (though I'm not sure if "duplicating" is what helped them do that)
Abstraction is not just about hiding code - it's about reducing options. You purposefully reduce options to make the system easier to reason about. A "function" in a programming language is an abstraction over machine code. It looks like variables have scope in an isolated environment, and it looks like the braces mean something, but it's compiled down to machine instructions that have no such concept. Goto considered harmful, but compiled machine code is littered with jump instructions (of course). You can do a lot of funky tricks with machine code that the higher abstraction of a programming language doesn't let you do. When you create an abstraction you reduce options for the user of that abstraction. So abstractions tend to gather cruft over time because users want those restrictions relaxed to do their special thing.
Absolutely right. One of the most important questions to ask an abstraction is: what can I not do with this? If the answer is "nothing -- you can do everything you could before", then the abstraction is an inner platform. The entire power that abstraction brings is in "focusing" on the problems we care about solving; it must make other problems impossible (ideally ones we don't care about). It follows from the No Free Lunch theorem.
One way to make sure your abstractions are focused on solving the right problems is to always define them based on what you need, not based on what you have. The root of the abstraction vs. duplication debate comes down to this. Indeed it's unhelpful to look at two pieces of code and say "these look the same; I will abstract them!". Instead you say "wow these have really similar needs; I will define exactly what that need is and they'll both ask for it."
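For instance (a made-up sketch, with hypothetical names): two call sites might both need "retry an operation that can fail transiently", so the helper is defined around that need rather than around whichever lines happened to repeat.

    import time

    # Hypothetical helper, named after the shared need, not after the
    # duplicated text it replaces.
    def retry(operation, attempts=3, delay_seconds=0.1):
        for attempt in range(attempts):
            try:
                return operation()
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay_seconds)

    # Both callers ask for the need they share, nothing more (names invented):
    # retry(lambda: fetch_invoice(42))
    # retry(lambda: send_welcome_email(user), attempts=5)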
I resent how much we've trained developers to value concision over everything else. I can't tell you how many times I've seen people use DRY as a justification to alias stuff that's already heavily abstracted by the framework that they use, ending up with less useful interfaces. Either that, or they'll explode the cognitive load by building crazy type hierarchies and inserting opaque anti-patterns like factories and decorators and whatnot.
These are "the wrong abstractions" in the sense that they're not actually crappy code full of conditionals and are actually well-redacted and not all that hard to decipher. They're "the wrong abstractions" in the sense that there's either a way to do it that is simpler and makes fewer assumptions, or in the sense that they are worse than "no abstraction" which is to say sticking to the abstractions that have already been invented for you by people whose jobs it is to do that exact work for millions of engineers and are therefore probably way better equipped.
The most important underlying issue isn't discussed in the article:
DRY must be understood and applied correctly.
"Every piece of knowledge must have a single, unambiguous, authoritative representation within a system"
The keyword here is _knowledge_.
When we see duplication, repetition and so on, then that might be because that piece of code represents:
- data of different entities that have similar structures
- logic that just happens to be similar.
- boilerplate code
None of these things have anything to do with representing the same piece of knowledge in a program. In fact, you can easily get into trouble _especially_ if you think the first two things are violating DRY when they are not.
I agree with the article wholeheartedly, though. If your code or data _model_ is not DRY, you can get into trouble very easily. Very nasty bugs, regressions during maintenance or extension, hours spent in frustration, money lost, etc. On top of that: non-DRY code almost always _proliferates incidental complexity_, because if you don't fix it, then eventually you patch over it.
Here's the best-case scenario: even if you are aware that the code is not DRY, do everything right, and turn multiple knobs at the same time to change or extend it correctly instead of fixing it, you will do so with much more reluctance and it will be much more mentally taxing.
Non-DRY code is by definition complex: You now have more interconnected parts than you need. So really, if you make your code more DRY, you _simplify_ it.
My favorite counter-acronym to DRY is WET: write everything twice (or thrice!). Doing and then redoing it once you understand it better is the best way to learn how to apply DRY correctly.
> Non-DRY code is by definition complex: You now have more interconnected parts than you need. So really, if you make your code more DRY, you _simplify_ it.
It really depends, I think there are some assumptions here that could use clarification. The whole point of choosing duplication is to disconnect parts that shouldn’t be connected, so I don’t understand what you mean about non-DRY code being more interconnected than duplicated code. Conscious duplication (often called “forking”) allows people who depend on a piece of code to change it without breaking anyone else. When you merge two pieces of similar code, they already had two or more separate uses, and you’re adding a new connection, tying together the fates of two or more different users. From now on, if they don’t have exactly the same agenda, there will be tension and/or bugs.
If deduplication requires adding an abstraction layer, then that absolutely is adding complexity, and it happens because the code being de-duplicated was not exactly the same. Code that’s truly duplicated doesn’t need to change in order to de-duplicate. So you can delete a copy in that case and centralize the dependencies onto the remaining copy. That eliminates code but doesn’t really simplify; it has the potential to simplify future development, but it doesn’t simplify the code at the moment of deletion. With modern build systems and project structures, however, it might take a lot of work and it might add complexity to get the DRY code into the right spot where it’s visible to everyone who needs it. Another reason for duplication is to avoid having to do backflips to get the code into the right file or scope.
Well all repeated code can evolve independently too, so by that definition, how can non-DRY code exist?
This discussion is a bit too abstract and losing meaning as a result. I think you're glossing over the very real scenarios of code ("knowledge") that's similar but not exactly the same; of copies of code that are similar being merged; of code that's exactly the same being intentionally forked and "repeated" and placed nearby; and others. Repeated code definitely can be the same piece of "knowledge" and exist in multiple places without coordination. After that the copies might drift in different directions, and the DRY dogma says this is bad. This is what people mean when they talk about DRY principles: the general idea to look for and merge similar code, and to prevent multiple pieces of similar code from being able to evolve independently.
DRY in fact doesn't really come up if two pieces of code are exactly the same; then it's pretty obvious and easy to factor it out. It's a non-issue. The reason this is a concept we talk about is because bits of code are similar but not the same, or they're the same but people need them to be modifiable without risk.
I don't know about using the word "knowledge" to represent processes with dependencies that change over time; it doesn't seem like the best terminology for this discussion. Specifically, the word "knowledge" tends to imply a timeless, static quality to the code. In reality all of the issues worth discussing here, all of the problems with DRY, and all of the benefits of DRY, relate to how code changes over time.
Your point about similar but not the same code being different knowledge actually kind-of illustrates exactly why merging similar bits of code is dangerous, precisely why you shouldn’t just apply DRY principles blindly, and why sometimes rejecting the idea of DRY is appropriate in a given situation.
The point that I wanted to make above is that DRY has a precise meaning and we seem to talk past each other because you use a much looser definition of DRY than I.
With that loose definition we think in terms of arbitrary similarity and duplication. And I absolutely agree with you that it's dangerous to use abstraction to patch over this perceived similarity.
Let's look at a trivial example of applying DRY correctly:
Say we have a database schema with a column representing some URI of an entity like "person/12" which encodes ":kind/:id". It's trivially obvious that this should be a computed column, which is derived from other columns that represent the "single, unambiguous, authoritative" pieces of knowledge of these values.
(It's no coincidence that DRY applies neatly to data models. The Pragmatic Programmer, which coined the term, uses data models to explain the principle as well).
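As a rough sketch of that computed-value idea, in Python rather than SQL (names are invented): the kind and id are the authoritative knowledge, and the URI is always derived from them instead of being stored as a second, independently editable copy.

    from dataclasses import dataclass

    @dataclass
    class EntityRef:
        kind: str  # authoritative knowledge
        id: int    # authoritative knowledge

        @property
        def uri(self) -> str:
            # Derived knowledge: always computed, never stored separately.
            return f"{self.kind}/{self.id}"

    print(EntityRef("person", 12).uri)  # "person/12"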
Really what DRY teaches us is to have a separation between authoritative knowledge and derived knowledge. Typically we're talking about data representing information here, but it can just as well be a piece of logic that is used to validate user input that you better have in just one place (think of the fun that you would have otherwise...) and so on.
Of course we need derived knowledge all the time, but we want to treat that very differently. We use things like code generators, cache invalidation, materialized views, macros, temporal databases and so on in order to protect ourselves from having to coordinate derived knowledge in tandem with authoritative knowledge.
And it's not just IT related. Even game programmers and compiler writers like to use DRY code in order to reduce coordinating (and growing) memory allocations, because it's often faster to compute values on the fly from a cached array, than to fetch memoized values.
> I don’t know about using the word “knowledge” to represent processes with dependencies that change over time, it doesn’t seem like the best terminology for this discussion. Specifically, the word “knowledge” tends to imply a timeless static quality to the code. In reality all the of issues worth discussing here, all of the problems with DRY, and all of the benefits of DRY, relate to how code changes over time.
I fully agree with the latter statement here. Thinking of how code changes over time and the implied coordination, is like a razor that we can use to determine whether some piece of code is violating actual DRY and whether we should do something about it.
> My favorite counter-acronym to DRY is WET: write everything twice (or thrice!). Doing and then redoing it once you understand it better is the best way to learn how to apply DRY correctly.
I don't like it.
It's so much worse than DRY.
Imagine you have something that is duplicated. Instead of deduplicating it, you leave it.
Later at one location you make modifications, like rename a variable, introduce an optimization, whatever.
Now, if you don't explicitly remember, that duplication is hidden.
Months later you need to change or use the functionality somewhere else.
How likely is it you'll edit it just in one place or introduce another duplication?
Now let that run for a few years and you'll have this all over the code base.
Sounds bad, the only problem is it’s a straw man imaginary scenario. Write everything twice isn’t a call to leave repeated code around at all, it’s a call to learn by doing rather than believe that dogmatic principles like DRY can get you there the first time. The idea behind WET is to use the right tool for the job, and acknowledging that you probably won’t know what the right tool is until after you’ve tried doing the job. DRY is the right tool for some jobs, but not all jobs.
There are downsides to duplicating code, and there are downsides to merging code. I’ve seen examples of such downsides in both directions in practice. The main downside of DRY I was trying to point out in other comments is that multiple dependencies on code adds additional complexity and risk regardless of the quality of the abstraction. A lot of people here are arguing the quality of the abstraction is what matters, but that’s only sometimes true. And in the cases where eschewing DRY is called for, it often has nothing to do with the quality of the abstraction.
For testing I prefer DAMP. Descriptive And Meaningful Prose/Phrases. I’ve watched otherwise smart people wrestle with testing boilerplate when requirements change and I’ve had my fill for this lifetime.
Each test is a separate story. At most, tests in a suite should share setup code. Anything more than that is coupling of tests, which is a no-no. The distinction between mocks and fakes is the most common place I see this blow up in our faces. Fakes result in coupling of tests. They are difficult to write, so they get amortized across ten tests, making new requirements difficult or impossible to add without accidentally removing coverage of other requirements.
A fairly common pattern that I've seen over and over in multiple domains is this:
Given a group of "things" with a start and stop date, list all the things that are "active" during a given date range.
Someone abstracted it because we have several "things" that use this logic.
Then it had a bug because some of the things are inclusive and some are exclusive.
Then it had a bug because some of the things use dates and some timestamps.
Then it had a bug because some of the things are timezone aware and some are not.
So we started down the path of a rather simple query construction becoming a complex thing with flags for inclusive/exclusive for start and end, timezone settings ...
> So we started down the path of a rather simple query construction becoming a complex thing with flags for inclusive/exclusive for start and end, timezone settings ...
Forcing the caller to _think_ about inclusivity and timezone awareness is not a bad thing, rather the opposite. These are important decisions to be taken: the abstraction is not trivial because what it abstracts actually does have inherent complexity.
If the abstraction forces you to take the necessary decisions (inclusive? timezone?) without having to think of how to implement them, it doesn't sound like a bad abstraction. Too often these decisions are not thought about, and the expected behaviour is "whatever is implemented".
Who knows whether some things are inclusive or not? Who knows what use dates and timestamps? It seems like this should be abstracted somewhere and this knowledge codified in one single place. It sounds like your abstraction, in this case, isn't very abstract at all.
That is common for bad abstractions -- they add a layer but they don't actually encapsulate any knowledge. To use this abstraction, you shouldn't be passing any flags for inclusive/exclusive, etc -- it should know that for you.
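A minimal sketch of what that could look like (all names invented, and only the inclusive/exclusive quirk shown): the per-thing knowledge lives in one registry, so callers never pass flags.

    from dataclasses import dataclass
    from datetime import date

    @dataclass(frozen=True)
    class ActiveWindowSpec:
        end_inclusive: bool  # some things end on their stop date, some the day before

    # The knowledge about each kind of "thing" is codified once, here.
    SPECS = {
        "subscription": ActiveWindowSpec(end_inclusive=True),
        "promotion": ActiveWindowSpec(end_inclusive=False),
    }

    def is_active(kind: str, start: date, stop: date, on: date) -> bool:
        spec = SPECS[kind]
        if spec.end_inclusive:
            return start <= on <= stop
        return start <= on < stop

    # Callers never decide about inclusivity; the abstraction knows:
    print(is_active("promotion", date(2024, 1, 1), date(2024, 2, 1), date(2024, 2, 1)))  # False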
That sounds like your code just isn't properly typed.
For example in Rust the first bug would be caught by `Range` vs `RangeInclusive`.
The second bug would trivially be caught because dates and timestamps are different types.
The third is trickier, but (depending on exactly what you mean) that can be caught with static types too.
Pointing your finger in the wrong place IMO. If anything this refactoring highlighted worrying inconsistencies in your code that probably would have cropped up as bugs elsewhere.
Great example. One way to avoid these problems is having lots of tests written for the various uses of the abstracted thing so you know they’re all covered. But also, if all of these things function in different nuanced ways, is it really any benefit to have them all jammed into the same abstraction in the first place? I’ve found this comes down to personal taste. I prefer a little duplication if it means not having to “own” an abstraction that I’ll need to heavily document and hope people read the documentation for in order to not break. But some would rather own the one point of failure.
One should at minimum give things that behave differently different names, which is a common practice in data modeling.
I expect all those bugs to return again and again as different people maintain that code. At least with code deduplication they would have a clear alarm telling them their knowledge is wrong and they must pay attention. But with each query doing everything people will just assume they know it all.
The point isn’t the interface though it’s the implementation. And if many of those things are implementing the same search functionality slightly differently, you’re back to the same spot, except now your bugs are spread across multiple sites, often with duplication.
The underlying issue is just that correctness is hard I think.
Let me give an example bad abstraction that isn't due to littered conditionals, but still very bad.
One time company A had a database, and code that loaded persisted object state from it. Some of the objects could be soft deleted. Rather than check various objects for soft deletion, the team decided to check all objects for soft deletion, regardless of their type, by querying a table where objects had to be listed if they were still live (not soft deleted).
Fast forward a few years, everybody follows this pattern, and there is massive hotspotting of that central "object lifetime" table that has basically two columns (object_id, is_deleted) that becomes a latency bottleneck because absolutely everything is joining on it all the time.
Truth is, it made it convenient to code with this, because you never had two ways of checking whether an object was live, and by construction you could never make the mistake of operating on a soft deleted object or forgetting to implement lifecycle deletion.
But man was that a poor abstraction. It was probably redundant with database functionality. It gave soft deletion capabilities even to things that didn't need soft deletion. It had a significant latency cost. But everybody adding a new object type just picked it because it was the way the company had decided it would do soft deletion.
I feel you are describing an implementation that was once fine but is no longer satisfactory, rather than an abstraction, which perhaps could have been made easier to fix with a bit more abstraction: a function to do the soft deletion if possible, with a better-performing (albeit probably more complex) way of determining whether soft deletion was an option.
> Sounds like it was just what was needed at the time
The problem I've seen often in codebases is that as an abstraction or pattern grows more unwieldy, they don't take the time to update it.
They often don't get revisited until they're so bad that they can't be ignored.
Handling something with a switch or if/else is fine if there are only 2 or 3 options, but people will often just keep piling on. When it's 10 things, changing it becomes much more work, so people will continue to add to it. Then when it breaks at 20 things, someone will come in and say "Why did we write it this way in the first place? It doesn't make any sense!"
I'm often torn between pragmatically writing the simplest code possible and being proactive about abstracting early to prevent an eventual breakdown of the pattern.
How does a switch break at 20 items? Any respectable compiler or interpreter should handle that fine. If it was 32k cases, I could imagine why it would raise an error. But 20? Seriously?
Often, writing more cases into a switch statement is way easier and less boiler-plate-y than abstracting it out to subclasses or a dictionary or whatever.
I have seen a ton of time wasted due to the wrong abstraction.
Though it's a question of how much and what you duplicate.
Which means I somewhat partially agree with the article, which is more nuanced than the title implies.
One of the most common cases of bad de-duplication is de-duplicating code which happens to be mostly the same, but where nothing from a business-logic point of view makes it the same.
Or code which differs mainly in points that the language in use needs a lot of complexity to abstract over.
In my experience, a more powerful type system, like in Scala, Haskell or Rust, on one side has the benefit of making the refactoring much less bug-prone, but on the other side makes it easier to drift into "abstraction introduces too much complexity" territory. In the end, using a type system _appropriately_ is a skill, and one which some technically very skilled people are missing.
Though what I also realized is that with a strict type system, "top-down" abstractions using e.g. custom traits/interfaces/abstract classes tend to be much more likely to cause issues than composite, bottom-up abstractions that use closures to fill in the missing parts. Sadly, this kind of abstraction, while simple in the simple case, is also prone to needing some limited degree of higher-kinded typing in the less simple cases. This puts limits on how much you can practically apply it in many languages (or it accidentally becomes too complex due to missing intuitive notation for the limited higher-kinded parts needed).
Though the most important thing for many projects is to make the code easy to change. And by this I mean changing the source code, not having complicated abstractions that allow you to use the same source code in many different ways even though you only use it in one way at any point in time.
There are two situations I’ve observed where Sunk Cost Fallacy reliably doesn’t kick in. One is three line functions and unit tests. The other is duplicated code. It’s better to err on the side of mistakes that people don’t get precious about fixing later.
A lot of the arguments I have with coworkers end up being about friction and blind spots about friction. “You” think these things don’t slow you or others down later, but I have a bibliography of incidents that say you’re wrong. Wishful thinking is married to magic thinking, and they have a child named “mortgaging the future”.
I dislike code duplication. But do you know what I like even less?
Giant functions with 12 keyword arguments passed up and down a call stack, because those functions have many callers which want slightly different things.
Choosing the wrong abstraction often leads to endless kludges and special cases. Two warning signs are functions with 12+ keyword arguments, and strange class hierarchies full of callbacks that only interact with a few functions.
The problem with all programming advice is that it needs to come with George Orwell's classic advice to "Break any of these rules sooner than say anything outright barbarous."
If programming advice makes your code look obviously gross, ignore the advice.
Worse, for those 12-parameter functions (the parameters invariably booleans, or enums if you are lucky), usually only a small subset of all possible flag combinations is tested (or even meaningful). The worst part is when the flags are directly or indirectly under user (or configuration) control and the application can go into uncharted territory.
Worse still, good luck refactoring those functions when you have no idea which combinations are actually meant to be supported and what their original semantics were.
Ah, but the solution is to turn that 12 argument function into a class that does one thing (runs the function), and dependency inject all those arguments. It still totally sucks, but you can pretend you're writing "clean" code by obfuscating the parameter passing.
Abstraction tends to shift logic from procedural to structural.
Rather than 12 keyword arguments and 12 branches in 1 big function, it should be 12 small classes (in OOP) or 12 small functions (in FP) that each handle one of the branches. All organized in some way so that the logic of executing those parts is expressed in the structure of the code.
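A rough sketch of that structural style (invented names, and far fewer than 12 variants): the choice of behaviour lives in a dispatch table of small functions rather than in keyword arguments and branches.

    def export_csv(rows):
        return "\n".join(",".join(map(str, r)) for r in rows)

    def export_tsv(rows):
        return "\n".join("\t".join(map(str, r)) for r in rows)

    # One entry per variant; adding a variant adds a function, not a flag.
    EXPORTERS = {"csv": export_csv, "tsv": export_tsv}

    def export(rows, fmt):
        return EXPORTERS[fmt](rows)

    print(export([[1, 2], [3, 4]], "csv"))  # "1,2\n3,4"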
> Rather than 12 keyword arguments and 12 branches in 1 big function, it should be 12 small classes (in OOP) or 12 small functions (in FP) that each handle one of the branches.
I mean, sure, you could convert your library into 12 little classes, or a collection of purely-functional combinators. Sometimes that helps. Sometimes it makes the situation even worse.
Some of the most terrifyingly inappropriate abstractions I've seen in my career involved complex class hierarchies, or worse, things like "abstract interpretation over the free monad."
There's no substitute for asking, "Are these things I'm trying to abstract over actually similar in any fundamental way?" And "Is this code actually just horrible?"
> "Are these things I'm trying to abstract over actually similar in any fundamental way?"
I'm not sure why this is even a point; if there's no similarity of the things that are being abstracted, why would one even discuss abstraction in the first place? The point of abstraction is that there is some fundamental similarity that the abstraction addresses. `ICloudStorage` abstracts `GoogleCloudStorage` and `AwsS3Storage` because at some level, they both have the same abstract operations: read, write, delete, etc.
The point is that there may be superficial similarity. Code that happens to look the same, but represents different "pieces of knowledge" that are likely to change independently probably shouldn't be unified.
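A made-up illustration of that kind of superficial similarity:

    # These two functions are textually identical today, but they encode
    # different pieces of knowledge (a UI constraint vs. a rule imposed by a
    # payments provider) and will drift for unrelated reasons.
    def valid_display_name(name: str) -> bool:
        return 1 <= len(name) <= 20

    def valid_card_nickname(name: str) -> bool:
        return 1 <= len(name) <= 20

    # Merging them would couple a screen-layout decision to a third-party rule.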
> I don’t see how it can be said, without qualification, that duplication is cheaper than the wrong abstraction.
I mean, this statement IS qualified. The word "wrong" is doing some heavy lifting. Part of what makes an abstraction wrong is when it is expensive to use as tiny differences emerge in the requirements.
It’s also wrong when active epics contain a third implementation of the “same” pattern.
It's been a while since I've seen as much time wasted as when trying to abstract the second implementation only to be proven wrong by the third. So instead of being, for instance, 8, 8 and 16 points to implement, it ends up barely squeaking by as 8, 16 and then 16 again.
It’s one thing to fight the Rule of Three for things that might happen. It’s quite another when it will happen.
You’re lucky then. I joined a company that had a team of inexperienced engineers where every form or details page was a separate program and the render functions were several hundred lines long by themselves. When I joined they had a dozen pages that were each so buggy adding new ones was nearly infeasible and fixing bugs took most of the dev time. Duplicate code can certainly slow down the dev process and kill a startup.
I have seen much damage from duplicate code at multiple organizations. I have seen thoughtful abstractions work successfully to mitigate it, and rarely encountered the opposite. I have encountered multiple pejoratives: copypasta coders, couch developers, et al.
The key factor when de-duping some code is to know whether the code is the same because they express the same abstraction or due to coincidence.
If they are the same abstraction then they should always be the same and you're doing the right thing to de-dupe.
If they are the same due to coincidence de-duping will tie together things that should be independent. As development continues the implementations will need to diverge. That's when you get the rat's nest of conditional logic. It's a lot easier to add a parameter and conditional logic to a function than rip it out.
It's not always easy to tell if two bits of code are the same due to coincidence or not... it might come down to nuances of business considerations that the developer has no idea about (or, since we're talking about predicting the future, no one knows about).
I don't think it can be done perfectly. But it's worth considering why not to de-dupe before you do it.
I was looking for this. There are definitely two types of duplication. For example not every use of the number 16 should be replaced with a SIXTEEN constant. However if the maximum allowed password length is 16 you shouldn't be writing 16 all over you code, you should be writing MAX_PASSWORD_CODEPOINTS because your system may depend on that value being consistent.
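As a tiny sketch (the constant name comes from the comment above; the rest is invented):

    MAX_PASSWORD_CODEPOINTS = 16  # the one authoritative place for this rule

    def validate_password(password: str) -> bool:
        return len(password) <= MAX_PASSWORD_CODEPOINTS

    PASSWORD_HINT = f"Use at most {MAX_PASSWORD_CODEPOINTS} characters."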
Although I would disagree that you should never deduplicate things that are coincidentally the same. Sometimes code that is coincidentally the same can have the same bugs and require the same updates over time, so deduplicating them can reduce maintenance cost and remove bugs. However I wouldn't race to deduplicate these things. Just if they become frequent patterns or have remained the same for long enough to justify the effort to unify them.
The author is going into technicalities without much actual substance, ending with: it depends.
I think whenever we, as programmers, try to pin down a certain principle, it bites us. Hard. DRY was cool as an observation but when it got turned into a law we saw the spaghetti code.
Duplication, on the other hand, is detested almost as much as the goto statement. Let me tell you, it's not that bad. Duplicate code makes everything more flexible. It helps you to NOT bend over backwards in order to change a line of code. It allows you to NOT touch anyone else's code.
So many good things. Of course, I agree with the author's summary of the bad things that can happen with duplicated code. But there's a litmus test for that:
If you have to make changes in multiple blocks of duplicated code in order to change the behavior of something, there's a problem. DRY out the code so you only have to touch 1 place.
If, however, 2 blocks of code LOOK similar but aren't actually the same, and changing one block doesn't make the other block outdated and stale, you are good to go.
Judge and decide. It's just 2 approaches that when taken to an extreme can cause a lot of pain, but if used with common sense, nothing is simpler.
> Duplication, on the other hand, is detested almost as much as the goto statement.
Honestly, even the goto statement isn't that bad. It's pretty useful in C code. I'm not saying anyone should put it in a new language, but the amount of hate it gets is really just related to BASIC monstrosities from the 1970s, not any real-world applications of it.
> I think “the wrong abstraction” is a confused way of referring to poorly-de-duplicated code.
But I believe this is similar to a no true Scotsman fallacy. “If you just make the right abstraction, de-duplicating is fine!”
Yes, if you're good at making the right abstraction, it's not worse! Those are the cases when I definitely do the refactoring: when I know for sure that I know the right abstraction. Otherwise, I defer the decision to an older, smarter, wiser me (or a future maintainer).
I have been down both roads. I’ve seen unwieldy abstractions reduce a codebase down to a giant pile of edge cases, and I’ve seen codebases where making a single change to the design has required editing dozens of files. Where I’ve ended up over the years is to abstract the “big” things. The types that represent your domain. The pieces of the data layer that need to be exactly the same every time. After that, solve for large classes of problems. This may be an abstraction, a usage pattern, or just a function. Transaction management, logging, etc.
Know that if you try to wrap ANYTHING in an adapter “in case we want to swap it out later” that this almost never happens, and when it does the abstraction you came up with is probably inadequate. Transaction handling in one tech is different than another. Or logging context is handled via disposable scopes instead of as part of the log entry. For those cases, if someone isn’t already maintaining a good abstraction (like MassTransit) then it probably doesn’t exist.
It depends. If it's not some core area of the code, but more like a script, some code that lives at the periphery, it might be better to "duplicate" almost similar code that is hard to abstract.
I saw attempts to remove "duplication" that made the code so hairy and hard to read, as opposed to very readable. I put duplication in quotes, because code might be similar, but not 100%.
Some code is easy to deduplicate.
Some code might be hard, and if the overengineering is done to remove 2 occurrences at some code periphery, it is not worth it.
Duplication is a superpower if you can put your OCD into a box for a little bit and frame it as a temporary stepping stone.
Refactoring nightmare codebases can become trivial if you don't mind a few copies of "the same thing" being kept around to satisfy serializers and other legacy APIs. Writing mappers between nearly-equivalent types sucks really hard but it still sucks a lot less than saying things like "lets just rewrite the whole product".
Duplication is cheaper because of how most programmers write code at their job:
- write stuff as fast as possible, without having time to think about overall architecture, especially if it involves having to cooperate with other devs. It's easier to just implement something that's as quick and as simple as possible so that it can be passed off to someone else with minimum effort.
- no need to communicate the abstraction semantics - no need for documentation outlining the abstraction, reasoning, possible expansion, etc.
- it's much easier to make localized changes. A well-written abstraction will cover some logic that might be spread across multiple areas. Changing something major in the abstraction requires understanding how the abstraction affects all its applications in the same ticket. Whereas duplicated code can result in a ticket being resolved by just making a change to a specific code block, like a function.
- Things that work well aren't appreciated. If it's easy to update an abstraction for a new feature, that'll just be the expected outcome. When a change like the previous point is needed, it's much more memorable because the frustrating experience is likely to be longer and more strenuous. We also tend to remember negative experiences over positive ones.
- Abstractions require reading more code with additional levels of indirection and devs don't like reading other people's code.
- Writing things well requires effort, so bad abstractions are more likely.
- More mature projects tend to have more abstractions because of their additional complexity, so I would guess that there's a strong correlation between difficult projects and frequency of abstractions.
- Some people went absolutely nuts with writing blogs posts, and evangelizing certain techniques which were completely unnecessary in an effort to push out content. There's lots of things to write about on implementing abstractions. But little in the other direction other than don't write unnecessary abstractions.
The flip side is, duplication is bad because when you find a bug and fix it, did you fix it everywhere? How many places were there where the bug needed fixing? Are you sure you got them all? It's much easier when there's only one place that you have to fix.
And I am only half-joking about that. I don't think that effort is that visible and often goes unrewarded. I feel like a lot of managers don't directly, but indirectly use number of tickets closed as a sign of productivity which affects promotions and compensation.
Obviously YMMV, but teams that care about their code quality to such an extent are rarer than places that act as ticket factories.
The flip side of the flip side is bad because when you fix the bug in the "abstraction", the de-duplicated piece of code: how do you know you did not break something you don't know about?
Duplication is easier because once you fix that single place, you are 100% sure you fixed that place and did not break 10 other places. Maybe the answer is "by writing unit tests", but only when you actually write those unit tests do you find out whether you broke something.
Funny story time: we had an add/edit popup in the system; because the two looked the same, a dev just made them a "single thing". For something like 3 months it went: dev1 fixed something -> qa2 found bug X -> dev2 fixed something -> qa1 found bug Y -> dev3 fixed something -> qa3 again found bug X. When I got into the code base I noticed that ping-pong, because somehow I was the only sane person to check the git history, and I split things up. Something like that has happened multiple times in my career.
I'm getting shot in the foot by this right now as our team embarks on tackling some long-term tech debt.
The approach we've found that works is health checks and manually looking into cases when we think we've fixed a bug, as that will often point us to a piece of duplicated code we missed that we can bring into the fold.
I passionately disagree with this. Abstractions inherently introduce some level of opaqueness, and they're only useful in the context of making things more maintainable. Duplicated code is easier to reason about because its intent is closer to the problem it originally solved.
> for example, the same several lines of code duplicated across distant parts of the codebase dozens of times, and with inconsistent names which make the duplication hard to notice or track down
While I think there’s merit in deduplicating these situations, one pitfall is introducing coupling and tangled dependencies when DRYing.
There are ways around this of course, but I’ve come across a number of instances where deduplication has led to unnecessary coupling between modules.
The whiplash I get from reading this article is massive. One second they agree that bad abstraction (filled with conditionals) is bad but then say:
> So instead of “duplication is cheaper than the wrong abstraction”, I would say “duplication is cheaper than confusing code littered with conditional logic”. But I actually wouldn’t say that, because I don’t believe duplication is cheaper. I think it’s usually much more expensive.
(emphasis on the last sentence)
I couldn't disagree more. In fact it's an incredibly "junior dev" mindset that sees 2 pieces of similar (or _even identical_) code and is compelled to abstract it. Unless there are at a _minimum_ of 3 implementations I think it's always better to duplicate. I've watched too many "common" functions grow over time with way too many arguments, too many conditionals, and way too confusing for anyone to easily follow. The most egregious is different return values based on arguments passed in. I'm not talking "array of strings" or "null" but "array of strings" or "single string" (or worse).
Abstraction can be fun to write and it feels like you are doing something to help "future proof" (also XKCD 927 [0]) but in reality it boxes people in (especially if you try to abstract with less than 3 real implementations) and leads to overly complicated code, or worse "clever" code.
As I've grown as a dev I'm less and less inclined to write "magic" or highly abstracted code and prefer dealing with "boilerplate" that I can tweak as needed for the individual use-case. Only once I have a clear pattern of code that's been deployed and used for a good bit of time do I reach for abstraction/reusable code.
> I've watched too many "common" functions grow over time with way too many arguments, too many conditionals, and way too confusing for anyone to easily follow.
This is not the fault of the abstraction. This is the fault of (especially junior developers) treating abstractions as sacred and non-disposable, which is itself the result of a mindset in which creating abstractions is discouraged. You should almost never modify an abstraction. Don't modify abstractions to cover new use cases, and you more or less won't run into any of these issues. If you need to, create new abstractions and throw old ones away.
> Unless there are at a _minimum_ of 3 implementations I think it's always better to duplicate.
This is a silly rule to follow, except for the most inexperienced of developers, perhaps. It doesn't take long to gather enough experience to be able to recognize in most cases whether some instance of duplication is coincidental (structurally similar by happenstance, which could be "abstracted" in a macro-like manner, resulting in something quite fragile to changes) or whether you're actually encoding some piece of knowledge into an abstraction. Advice like waiting until a piece of code repeats three times encourages developers to think about abstractions in terms of structural similarity, which is exactly the opposite of how abstraction should be considered.
> This is a silly rule to follow, except for the most inexperienced of developers, perhaps.
Perhaps you'd consider me inexperienced though I don't consider myself to be so. I've learned enough times that neither I, nor my colleagues, can accurately predict the future and every time we think we know the cases that code will need to handle in the future we guess wrong more often than not.
What I'm trying to say is until you are sure a piece of code is literally the same, or has tiny differences that you can cleanly abstract, you shouldn't try to guess how future code will use the abstraction. It's the same rule of mine where I try to never proactively add functionality to a function/piece of code. You think that you are saving your future self (or peers) time, but too many times I've seen people guess wrong at what extra functionality we will need, and then that code never gets touched and/or gets migrated/updated for years before someone realizes there is no calling code that uses that functionality but we have been dragging it along this whole time.
Could you check everywhere and make sure it's not being used and thus can be removed? Maybe but I understand the desire to make as few changes as possible and preserve the functionality as it was when you first went to edit the code. Overall that's a good idea when making changes and sometimes you don't always know what params all the clients are passing to an endpoint to be sure of if something is still in use or not.
> I couldn't disagree more. In fact it's an incredibly "junior dev" mindset that sees 2 pieces of similar (or _even identical_) code and is compelled to abstract it. Unless there are at a _minimum_ of 3 implementations I think it's always better to duplicate. I've watched too many "common" functions grow over time with way too many arguments, too many conditionals, and way too confusing for anyone to easily follow. The most egregious is different return values based on arguments passed in. I'm not talking "array of strings" or "null" but "array of strings" or "single string" (or worse).
I agree with you here and tend to rather, if possible, deduplicate subsystems or sub-functions of similar looking/identical code and keep the duplicate public surfaces.
> I agree with you here and tend to rather, if possible, deduplicate subsystems or sub-functions of similar looking/identical code and keep the duplicate public surfaces.
Completely agree, take the small parts that are standalone/discrete and abstract them. I greatly prefer something like
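(a hypothetical sketch of the idea, with invented names:)

    # The genuinely common, standalone sub-function is deduplicated...
    def _normalize_address(raw: dict) -> dict:
        return {"street": raw["street"].strip().title(), "zip": raw["zip"].strip()}

    # ...while the two public surfaces stay separate and free to diverge later.
    def prepare_billing_address(raw: dict) -> dict:
        return _normalize_address(raw)

    def prepare_shipping_address(raw: dict) -> dict:
        address = _normalize_address(raw)
        address["residential"] = raw.get("residential", True)
        return address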
> As I've grown as a dev I'm less and less inclined to write "magic" or highly abstracted code and prefer dealing with "boilerplate" that I can tweak as needed for the individual use-case.
This is part of creating abstractions to benefit the reader, not the writer of the code.
I'm currently refactoring a python package that was designed to make writing ETLs very elegant (it worked!), but as a consequence, when something goes wrong, figuring out what happened involves poring through 4 different modules, class hierarchies, and trying to track variables through multiple layers of abstraction. It's a nightmare for debugging.
Simple boilerplate is repetitive and boring, but man would it be so much easier to read
> Simple boilerplate is repetitive and boring, but man would it be so much easier to read
Yep, and I'll fully admit when I first started out I hated this idea and wanted everything to be super-DRY, but I've swung back in the opposite direction (or at least to a good mean). I had a developer ask semi-recently why we had some boilerplate when the function in question was simply calling another function on the parent class: why not just call the parent function directly (it was protected; they wanted to just make it public)? I explained that yes, right now we were essentially doing a straight pass-through (this was for a CRUD layer), but that we had learned over and over that over time we need to add things like business logic, validation, or data migrations, and this way we just need to change our "intermediate" function instead of adding one later and having to change all the places that were calling the "direct" function. Same idea as with getters/setters: you don't always "need" them when you first write them, but having those hooks is invaluable down the line.
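A minimal sketch of that pass-through hook (names invented; the persistence details are elided):

    class BaseRepository:
        def _save(self, record: dict) -> dict:
            ...  # actual persistence lives in the base class
            return record

    class UserRepository(BaseRepository):
        def save_user(self, record: dict) -> dict:
            # Currently a straight pass-through; future validation, business
            # logic, or data migrations slot in here without touching callers.
            return self._save(record)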
In a world of changing requirements it can be difficult to know what the right abstraction is going to be. I am happy to accept some duplication early in the development cycle until the requirements have settled. Only then it's possible to go back and refactor (which admittedly doesn't always happen in practice).
I believe duplication should raise eyebrows but it can be justified.
I disagree with OP. You cannot abstract after two or three dupes because you don't even know what you have or need yet.
Let it breathe, let it stink for a bit - THEN make an informed decision about what to refactor and abstract. You're just jerking off otherwise, and I hate working with code that's been abstracted early for no reason.
His description of his understanding does not include any reference to the "wrong"-ness of abstractions that shouldn't exist. If I read him as-is, I should conclude that the idea is to never make any abstraction at all. It obviously cannot be that, since that would be stupid.
"Wrong" abstractions are already bastardized from their first iteration. Developers decide to code them nonetheless because they estimate that their "awkwardness" is worth it in comparison to code duplication. What they fail to realize is that, unlike code duplication, which just "is there", the awkwardness of the abstraction will compound.
Duplication is the last resort, when one has established that he couldn't find any non-wrong abstraction.
The underlying problem is that the "don't repeat yourself" principle is often in conflict with the "single responsibility principle". Structurally, this comes down to the problem of managing dependencies. Over the years, the problem of dependency management has become bigger and more difficult to tame.
The same problem holds for internal code as well as external code. Duplicating code creates one kind of dependency problem (feature drift). Shared code creates another kind of dependency problem (increased coupling). Broadly speaking, solutions which reduce coupling are going to be cheaper to maintain.
Ideally, there would be clear, well defined layers with narrow communication protocols.
When you work in a very large and complex codebase you encounter a few things that this author doesn’t seem to consider or thinks are very minor:
1. Refactoring something introduces non-negligible risk. Consider a class with many fields and multiple mutexes it uses to control concurrent access to those fields. Even just consolidating those mutexes introduces the hard-to-conclusively-find-in-testing risk of introducing deadlocks and livelocks. And that's like the base case of refactoring the class: anything involving splitting the class up, moving data fields up or down the stack, or changing the way member functions (which acquire locks) call each other is even more complicated and risky. It is just not worth refactoring this thing unless you have a very very good reason.
2. A function or object often has a many-to-many relationship in what it touches: it is called or accessed from multiple places and it calls and accesses many things. Non-trivial improvements to abstractions typically involve changes at both ends: which may be “as simple” as updating all the call sites to take a new argument or handle a different kind of error (hopefully all your call sites are structured so error handling is compatible with their abstraction!) or as complex as completely refactoring multiple levels up and down the stack to reflect better-abstracted semantics.
No, you shouldn't lazily copy-paste around such problems when they are straightforward enough. But it can be so much less work (and again, less risk of breaking things) to use composition + wrappers, or inheritance, or to copy some little chunk of code than to do things the "right" way.
3. Let’s face it, your cool new abstraction sounds right in your head, but in a complex system it may just be playing abstraction whackamole once all the bugs and edge cases you’re not initially considering get addressed. It may be impossible to fully understand the entire system from beginning to end, without which it’s hard to be confident you’re actually improving things before embarking on your epic partial rewrite, or at the very least know you’re not changing semantics around some arbitrarily-drawn box. But if you’re not even changing the semantics, see point 1.
> My understanding of the “duplication is cheaper than the wrong abstraction” idea, based on Sandi Metz’s post about it, is as follows. When a programmer refactors a piece of code to be less duplicative, that programmer replaces the duplicative code with a new, non-duplicative abstraction.
I think one of the main takeways from Sandi Metz's quote is that you should postpone creating the abstraction until after you have the duplicated code. Sometimes you will remove the duplication when you have just two implementations, sometimes you will want many more. Once you have the repeated code it's relatively easy to make the right abstraction.
As someone who has made a good life over the years by taking advantage of the security bugs (either to build my embedded empires--aka, jailbreaking--or to directly collect bounties) caused by all of the people who hate abstraction so much (or are merely so bad at doing it that they don't know how to do it well) that they vehemently argue that duplication is not merely a temporary pragmatic decision to incur potentially-dangerous architectural debt which you intend to come back and fix later but is somehow better than even trying to address it, I guess I find this discussion thread of people almost 100% tearing into this article's fundamental premise... kind of fun? ;P
So, yes, yes: please do continue to ensure you have so much boilerplate in your "flat and easy to understand" code that you eventually make a fatal mistake (potentially simply while doing a merge commit), refuse to factor your safety checks out into abstractions that prevent you from making the same mistake twice due to your refusal to "obfuscate the underlying API everyone knows how to use", and (my true favorite) litter your code with multiple implementations of the same algorithms that have very subtle differences in them (so called "parser differentials") as you insist on every single programming language in use having its own copy of the algorithm "for ergonomic reasons, as IPC/FFI would be crazy when I can just import a second one off-the-shelf".
"To me, an abstraction is a piece of code that’s expressed in high-level language so that the distracting details are abstracted away. If I were to see a confusing piece of code littered with conditional logic, I wouldn’t see it and think “oh, there’s an incorrect abstraction”, I would just think, “oh, there’s a piece of crappy code”. It’s neither an abstraction nor wrong, it’s just bad code."
Of course, if bad code is not an abstraction, then there can be no such thing as a bad abstraction!
More to the point, code littered with conditional logic might well be both good code and a good abstraction. There's a somewhat well-known article out there claiming that Netscape shot itself in the foot by deciding to rewrite the browser from scratch. As an example of how that went wrong, the author mentions the hapless developer trying to write code to work with some hardware component (the great many different dial-up modems that were out there at the time, IIRC), discovering that most of them had unique quirks that had to be respected, even when they nominally conformed to the same spec.
The thing is, you can no more apply abstraction to a program until everything is simple than you can apply compression to a file until it's down to a byte. What's really at issue here, as Fred Brooks noted many years ago, is the difficult problem of satisfying the demands of the context's essential complexity while keeping a lid on the implementation's accidental complexity.
There are a lot of ways for good code to express bad abstractions. The abstraction could be inconsistent with other parts of the system, inconsistent with the concepts it is meant to represent, inconsistent with its own observable behavior, inherently complex or hard to reason about, inconvenient to actually use, poorly suited to whatever people actually use it for...
I've seen a lot of code that is perfectly clean and "well-organized" as code but organized into absolutely awful abstractions.
None of that goes against your core point, I just think that seeing the code and its abstractions separately is an important perspective for understanding code design.
On the flip side, it's also totally possible to have bad code but a good abstraction. Some of the best abstractions I've worked with have painful implementations, and it didn't impinge on the quality of the abstraction itself! Of course, the bad code made life a lot more painful for the people responsible for implementing and maintaining the abstraction, and I'm sure it required some real skill and experience to keep that from manifesting to users of the abstraction, but they managed it.
I think the core of the difference can be found in the What exactly is meant by “the wrong abstraction”? paragraph. Admittedly, the quoted article is also a bit confusing here, but I think it's easy to resolve.
I think the wisdom of the original saying is hard to understand when you just look at any piece of code as it exists. Instead, imagine the future. You have two pieces of code that do similar things - you can centralize them (with a bunch of conditionals) to have a "single" code path, or you can allow them to stay separate (perhaps confusing new people). The wisdom of "duplication is cheaper" is to observe that it will generally be less work to allow the duplication than to maintain the circumstantial needs over time. Each time you need to "do the same thing again but a little different" you can either add more conditionals to a single piece of code, or add another instance of 'duplication' which can just deal with the concerns at hand. It's not about "crappy code" - it's about the difficulty of having one piece of functionality serve many masters over time.
IMO, in general, you will also find that if you have many 'duplicated' copies of code, it will often be easier to see the truly duplicated sub-sections that you can DRY out into a common subroutine. I find that is easier to see with duplicates than with a single piece of complex code.
Software architecture is a domain where hard and fast rules don't work.
This is all about understanding tradeoffs and nuance.
In general, I believe that abstractions should be used in moderation; de-duplication is not always an improvement, especially in the long run.
I've made this mistake a lot as I tend to be quite obsessive with so-called code "cleanliness".
It is good that novice programmers are warned about the dark side of abstractions, but ultimately they'll have to experience it by themselves to fully grasp why and how they can be detrimental.
One of the major shifts in my coding style over the past ten years has been to increase the amount of duplication. My threshold for "I should really dedupe that" increased from ~3:7 lines to ~10:50. Looking back this was driven by two main factors: testing and performance optimization.
The testing side is just that tests become awful much faster than normal code if you dedupe them. Unit tests are supposed to be simple and independent, but deduping makes them correlated and complex. You think you'll make things simpler by extracting the common setup from twenty tests into one method, but instead you've coupled the tests so they can't individually be tweaked and laid the seeds for a monster incomprehensible test object to grow from.
The performance side is that often improving performance requires removing abstraction layers so everything is in one spot, allowing irrelevant cases to be removed. Adding the abstraction layers ahead of time makes performance worse to start with, from all the jumping and "paper over one more difference" flag checking, and also makes performance improvements harder later.
If two things are supposed to behave analogously, I'm nowadays much more likely to enforce this by testing the analogy rather than by sharing the implementation.
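A minimal sketch of what "testing the analogy" can look like in practice, with hypothetical function names and a deliberately duplicated implementation (assumes pytest):

```python
import pytest

def legacy_discount(price: float, tier: str) -> float:
    return price * {"gold": 0.8, "silver": 0.9}.get(tier, 1.0)

def checkout_discount(price: float, tier: str) -> float:
    # Duplicated on purpose; free to diverge later for checkout-specific rules.
    if tier == "gold":
        return price * 0.8
    if tier == "silver":
        return price * 0.9
    return price

# The two implementations stay separate, but the analogy between them is
# enforced by a test instead of by a shared implementation.
@pytest.mark.parametrize("price", [0.0, 10.0, 99.99])
@pytest.mark.parametrize("tier", ["gold", "silver", "bronze"])
def test_discounts_agree(price, tier):
    assert legacy_discount(price, tier) == checkout_discount(price, tier)
```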
If working in a solitary codebase, this problem isn't very interesting. Do whatever makes your life easier.
If you're working on any kind of code that serves as a library to other code, don't mutate the signatures of your public methods/functions. Once that signature is released, the only changes to its output should be bug fixes. If you have a need for two very similar functions, you should use two wrapper functions with the common code in a third.
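Something like the following sketch, with made-up names, is what the two-wrappers-plus-a-third suggestion amounts to:

```python
def _render(template: str, values: dict, strict: bool) -> str:
    # Shared implementation; private, so it can change freely.
    if strict and any(v is None for v in values.values()):
        raise ValueError("missing value")
    return template.format(**values)

def render(template: str, values: dict) -> str:
    """Original public signature: stays frozen once released."""
    return _render(template, values, strict=False)

def render_strict(template: str, values: dict) -> str:
    """The new, similar need gets its own wrapper; existing callers are untouched."""
    return _render(template, values, strict=True)
```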
Oh... So that's how you end up with class names ending in FactoryFactory... Factorisation at any cost without making sure it makes sense and will keep making sense...
Once you've seen enough code, the right abstraction becomes easier to spot.
Applications are more similar than they are different. That's why we have the concept of design patterns since these occur with enough frequency that we should just give the abstraction a name instead of re-inventing it each time.
Problem today -- my observation -- is that many younger devs don't ever bother learning design patterns so we end up with 1) devs who aren't aware of common, existing patterns codified decades ago and then 2) think that the "wrong abstraction" is expensive partially because of a lack of knowledge of the "right abstraction" to use.
True enough. But in some companies, there's so much push that the wrong abstraction gets left in place as refactoring and rewriting get pushed down the priority stack and never happens. The way to circumvent this is to not declare the task done while it's still the wrong abstraction, but there's still a (present) schedule cost compared to just duplicating the code and tweaking it.
The cost of DRY (Don't Repeat Yourself)-ing up your code can be high, in that it increases the coupling of your code, and potentially lowers its cohesion.
Consider a function def foo(a: int), called from call sites C1 and C2. Eventually C1 wants something out of foo() that it doesn't offer, but, critically, something that C2 _doesn't care about or need_. The author of foo() adds a new default argument: def foo(a: int, b: int = 0), and then there is a conditional block in foo() that deals just with this new b argument.
You've now potentially broken call site C2 by exposing it to changes that it doesn't care about. Put another way: you should only deduplicate code if _all_ the call sites will _always_ change for the same reason. Otherwise, you're lowering the quality of the code by increasing coupling and lowering cohesion. Copying and pasting the code in this case makes sense, because C1 and C2 have entirely different needs out of foo(). Over time, foo() will accumulate more and more default arguments as the author stridently attempts to keep everything DRY, and the overall code base becomes more and more fragile.
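Roughly the situation being described, with the body invented just for illustration:

```python
def foo(a: int, b: int = 0) -> int:
    result = a * 2
    if b:                 # branch added only because C1 needed it
        result += b       # C2 never asked for this, but now runs through it
    return result

foo(3)         # C2: must trust that the new branch really is inert when b == 0
foo(3, b=7)    # C1: the only reason the parameter exists
```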
Your code is still DRY, and you are using polymorphism (foos of different type signatures) instead of if/else. The behavior of foo(int) doesn't change, so you don't require additional tests for foo(int); the fooInternals<X,Y,Z> aren't public, and you have now added tests for foo(int, int). You aren't paying any additional costs in terms of maintenance. You aren't increasing behavioral risk at C2 for calling foo(int). You are only paying more for foo(int, int), and those are costs that you would have to pay regardless of whether foo(int, int) literally duplicated the body of foo(int) for the common pieces or refactored the common pieces out. You save cost in maintaining both foo(int) and foo(int, int) if the common pieces need to change, as you are adding tests for the behavioral changes to both the foo(int) and foo(int, int) tests, but are only making a single change in the common code.
Also, when doing this, the abstraction is the original foo(int), not the new, additional foo(int, int). Abstraction is the assumption of some parameterized behavior via hard-coding. Here, the new, additional parameterized behavior introduced by the second b:int parameter is abstracted away in the original foo(int), not in the new foo(int, int). That doesn't make the original foo(int) abstraction wrong, because it is used in at least one call site (C2).
Only when all call sites must change to accommodate something that a new parameter allows through more than one change-set can you begin to call an abstraction wrong. Otherwise, it is a simple bug that was fixed by a single change-set.
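As a sketch of that alternative (names hypothetical; in a language with overloading both entry points would simply share the name foo, while in Python they need distinct names):

```python
def _foo_internals(a: int) -> int:
    # The shared, non-public pieces.
    return a * 2

def foo(a: int) -> int:
    # Unchanged signature and behavior for existing call sites such as C2.
    return _foo_internals(a)

def foo_with_offset(a: int, b: int) -> int:
    # The new behavior C1 needs, tested separately; foo() stays untouched.
    return _foo_internals(a) + b
```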
The elephant in the room is without strong static typing and a good type checker changing abstractions is somewhere between a significant pain in the ass and downright perilous.
In my experience when you have those things, whether you make significant changes to your API or decide to dedupe old divergent copy pastes, it’s largely just busy work — very little thought involved. The type checker says change line 135 in file foo. Okay, next.
Duplication is neither "cheaper than the wrong abstraction" nor is it "one of the most dangerous mistakes in coding".
There's a cost to abstraction. There's a cost to duplication. Our job, as engineers, is to stop applying blanket statements and instead reason about the tradeoffs. And no, they aren't static tradeoffs either, because requirements and constraints don't stay static.
For some reason programmers think that an "abstraction" is the same as just naming something. If I take a bunch of code that will only work given specific, concrete conditions and give it a name like "setup()" then I have "abstracted" it.
People who know what abstraction means, and people who use it to mean indirection or naming things, will of course never agree about how useful it is.
"Duplication is cheaper than the wrong abstraction" makes sense coming from the Rails community. Between the meta-programming, lack of static types, large amount of unit tests, etc. Rails has a tendency to lock a project into an abstraction choice & is very expensive to change. The pain is particularly intense during major version Rails upgrades. From my experience, the Rails framework got in the way & bogged down project velocity. It was difficult to move away up to ~2010 as many of the jobs were locked into Rails. There were many frustrated Rails programmers around that time. When node.js, Go, & other languages/platforms came out, there were finally full stack libraries that did not lock in abstractions as heavily as Rails. Nowdays, I use astro.js, solid.js, & target isomorphic libraries. The flexibility of Javascript with the static types of Typescript make changing abstractions significantly easier. The Javascript ecosystem spent far too long focusing on SPAs when the isomorphic MPA was low hanging fruit.
Whenever this conversation is had - it seems to completely dismiss the idea of domain. Duplication doesn't happen in a vacuum - it happens within a certain context. Some acceptable conditions for duplication include:
* If two things are semantically different within the context of a domain but require similar functionality.
* Code paths with different risk profiles.
* When new functionality is evolving with domain learnings.
Fixing a bug in one place and being sure only one place was affected and being sure that one place was really fixed is cheaper
-
than fixing a bug in one place that affects 20, where in 15 places it was a proper fix, in 5 places it will break in unforeseen ways when users do something different, and somehow an additional 2 places are totally broken because no one ever knew they were affected.
In about 1990 I got tasked with building an installation and configuration system for the hardware and software package my company built. It was an Ethernet card and a TCP/IP suite being added to the PCs of the era (that had an AT/ISA bus where you had to find a free address block, then jumper the card to have the correct address, lotsa fun.)
I wrote the first system targeted at AT&Ts Unix for the 386. After it was completely done, I was assigned to do the same for Xenix. After that was completely done, I got assigned to do SCO (Santa Cruz Operation) Unix. After that, Interactive Systems (ISC). Each system had its own architecture for installation and configuration. I didn't know in advance anything about the different systems, nor any knowledge that the other systems were on the horizon. As I was writing the second system, I was refactoring like mad to avoid duplicating code, and feeling very proud due to previously learning the horrors of duplicate code. I can't remember details, but among other things files had to be placed in a specific directory hierarchy for each system, and various files had to perform certain (different) functions on each system. When I turned to the third and fourth target systems, the refactoring just became weirder and more complicated, but I was determined to avoid duplication.
Historically it turned out we never revised these releases. With 20-20 hindsight, it's a case where the refactoring was completely pointless, and code duplicated 4 times would have been way faster to create, and easier to maintain if we had made new releases. I think part of Sandi's point is that YAGNI applies as well ... a higher level abstraction may accommodate changes that never arrive, or the changes may be so large that NO abstraction will cover it.
On the opposite end of the spectrum, in 1980 (yeah, I'm really old) as a summer-hire, I'd written in HP-Basic this very funky single-purpose very primitive data base system. When I returned 9 months later after graduating, two full time guys had made small changes to the system, but one guy had made a breaking change, the other guy got pissed off and duplicated _the entire program_ (a single file, to be sure) and made one small change. Thereafter I had to maintain two versions of the thing. Gaaack. It was the ultimate lesson in "don't duplicate code". (It was also in the days before we had a version control system or diff, so backing out and correcting the change wasn't practical.) Mel, where are you now?
- duplication of what, and how many times? Three times? Five? Forty-seven? Four hundred? One line, Five lines, or fifty?
- does the abstraction completely de-duplicate? Or does it turn N big duplications into some big common code plus N much smaller duplications?
You should almost never duplicate more than two, at most three, times. Or, should I say, never have to duplicate. If duplication is the best solution, because there isn't a suitable abstraction or we can't find it, there should be a macro system which can condense the duplication. That is to say, if you have macros, there is always an abstraction that can be found, namely syntactic abstraction. Identify what is common between the duplications, and turn that into a template. The variant parts become parameters.
If that turns out later to be worse than duplication, you can just expand it and keep the expansions or identify some other way to deduplicate them.
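Python has no macro system, so as a rough stand-in for the idea above, here is the same move done with a higher-order function (names invented): the common skeleton becomes the template, the variant parts become parameters, and "expanding" it back just means inlining the body at each call site.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def process_records(records: Iterable[T],
                    keep: Callable[[T], bool],                # variant part 1
                    transform: Callable[[T], R]) -> list[R]:  # variant part 2
    out = []
    for record in records:
        if keep(record):
            out.append(transform(record))
    return out

# Two former duplicates become two parameterizations of the one template:
evens_squared = process_records(range(10), lambda n: n % 2 == 0, lambda n: n * n)
odds_as_text  = process_records(range(10), lambda n: n % 2 == 1, str)
```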
That something is the wrong abstraction is something you can only know after the fact; at the time you build the abstraction it is - or at least it should be - a reasonable choice. And later on, as the code evolves, there are two possible outcomes: the abstraction remains a good choice, or the abstraction stops being a good choice and you have to change it. Maybe it can be saved with some refactoring, maybe it has to go completely.
But at the very least you had a working abstraction for some time and you can easily figure out all the places where this functionality is used and you have a single place to make changes when you have to make them instead of having to hunt down all the different places with slightly different implementations. Even if an abstraction breaks completely down and has to be split up into several implementations, each of those will usually have several usages which would all still be repetitions without the abstraction.
> So far so good, perhaps. But, by creating this new abstraction, the programmer signals to posterity that this new abstraction is “the way things should be” and that this new abstraction ought not to be messed with. As a result, this abstraction gets bastardized over time as maintainers of the code need to change it yet simultaneously feel compelled to preserve it.
How I've been thinking about abstractions that turn bad over time is: It was likely the correct abstraction at the time it was made, given the requirements the writer had on hand. Now that the abstraction is wrong, don't muck with it. Gather the new requirements and write a v2.
I think the vast majority of abstractions will go bad over time. To abstract is to generalize, and generalizations become invalid over time because the world evolves over time. It's sort of like trying to preserve a summary of a book that is continuously having new pages added and existing pages replaced.
I think I would want to look for an accurate "representation" and expression of the right problem, not any particular abstraction technique or mechanical refactoring.
Refactoring code to your understanding helps you understand the code but leaves the code in a different organisation to how it was, adapted for your mental model of the problem.
If programming languages were expressive enough, we could represent things how they are and replicate that base pattern to different cases or scenarios and that would be enough but unfortunately our languages are not expressive of our high level intent and invariants we want to maintain. (Such as extensibility or hookability)
In other words, get the mental model for the problem right and the abstraction will be invisible and the solution shall be obvious.
Abstraction impedance mismatch is when people introduce a design pattern or a strategy that is harder to understand than the problem that was being solved and obfuscates it.
I wish more people simply were happy with using themselves whatever set of beliefs/techiques they deemed best (abstraction, duplication, whatever), preaching nothing, and arguing less.
Which is to say, there will never be a single truth for these topics. So why not build a mindset that is ready for encountering differing opinions, diverse code?
When your job is mentoring, RCA or cleaning up after other people (hello) then these aren’t opinions and aesthetics. They’re empirical evidence and/or coping mechanisms.
Invalidating people’s coping mechanisms without proposing your own never goes over well. And sometimes even then.
When diagnosing a production issue, we don't have the luxury of entertaining five different ways to solve the same problem. And code smells slow down debugging-under-the-gun: because most bugs live in code smells, the smells draw your attention, and then often enough prove to be a false signal.
If you don’t do any of these things, then it’s challenging to have empathy for or understanding of the people who do. The people keeping the wheels on deserve the benefit of the doubt. In fact anyone who will stand up and fix problems when they arise deserves a bigger vote on how things get done. Everyone else’s opinions are theoretical rather than vocational.
I read a blog post somewhere (don't remember where) that describes the process of unfactoring (multiplying?) code as an exercise. Copy/paste the code until there's one straight code path per use case. Then examine the similarities and factor the code again. What you end up with will often be different from what you started with, and probably simpler, especially if the code had begun to drift from its original author's design.
So, "unfactor" the code and then factor it again. Let's call it... "refactoring."
My $0.02, then, is that "the wrong abstraction" assumes that you are unwilling to change it. What if we were comfortable tearing down our classes all willy nilly and replacing them with some other thing? Is it too risky? Does it hurt too many feelings?
Maybe the problem lies there, instead of in duplicate vs. abstract.
In my career as a software dev I've found one thing to be true: every paradigm I ingest that opens new windows of opportunity is great at first pass, and the more I learn, the narrower the scope in which it can be applied. (This is kinda true in life, too. Like when people say, "It's econ 101 or bio 101," etc. What seems like a statement about 'common' knowledge is actually an indication of how shallow your knowledge is!)
Specifically related to this topic is a talk by Dan Abramov called, "The Wet Codebase" - He says it better than I can sum up and has visual aids : https://www.youtube.com/watch?v=17KCHwOwgms
Other have pointed out code that is similar in function vs similar by coincidence and I think that thought alone is worth chewing on.
When I was younger I was more productive when I didn't contemplate such matters. Maybe I wrote a lot of junky code but I got a lot of working stuff done. Now my time is wasted reading clout chasers and their opinions. Reading about coding is such a bad habit when it stops you from coding.
There's an aspect of "not seeing what others are seeing" here.
> I think “the wrong abstraction” is a confused way of referring to poorly-de-duplicated code. Here’s why. [...]
> So instead of “duplication is cheaper than the wrong abstraction”, I would say “duplication is cheaper than confusing code littered with conditional logic”. But I actually wouldn’t say that, because I don’t believe duplication is cheaper. I think it’s usually much more expensive.
It seems the author is considering 'cost' to be the mechanical effort of managing the sync/desync of the DRY code. What it doesn't consider is that distinct intents can incidentally share the same implementation at a given moment. That is when it's not a good idea to DRY, because the pieces are not meant to stay in sync.
> Duplication is bad. In fact, duplication is one of the most dangerous mistakes in coding.
I have to disagree with this; the article feels lofty in its assumption that when you start to program you know what to abstract. More often, people begin abstracting due to misguided axioms like "DRY" rather than to solve a problem with a real cost-benefit trade-off. DRY as a goal in itself is fairly dangerous.
I can't count how many convoluted and confusing frameworks people have put together under this misguided perspective. It's not atypical for an abstraction born of "DRY" motivations to be more code, and more brittle, than just copying and pasting 2 lines in 15 places.
Not to say abstractions are inherently bad, but to the point of abstracting for the sake of DRY is a mistake.
The problem is with being either dogmatic or thoughtless in either direction. I've seen what you're talking about: people combine code religiously because of DRY, leading to insane pyramids of abstractions that are impossible to modify.
However, I've also seen people copy and paste everything they ever need. When that happens, those offshoots gradually evolve independently from one another, and introducing a proper abstraction becomes a huge slog. I've spent hours reading through git blame trying to piece together a phylogenetic tree of the various copies of the same code so we can ensure that the new abstraction contains all relevant features and bug fixes. I wish those developers had thought more carefully about DRY.
I think the best balance is to use these catch phrases as principles to guide your decision making, while being willing to make exceptions when they don't apply. If DRY makes you think for a second before copying a piece of code, it's done its job, even if you decide that this situation really does call for a copy.
What seems to serve me best is keep things as simple as possible. If you add abstractions, do so to make the rest of the code easier and less complex. If you must do something complicated, break it apart as pragmatically as possible and do it in the simplest way possible. Favor YAGNI (you aren't going to need it) over corporate-wide libraries that lock you in.
Keep your codebase discoverable first. Structure by feature/function not type. Favor the local developer experience first. If you cannot open, follow and run the code easily, your developers won't be able to onboard quickly. Someone else will have to continue with your mess, make it as orderly as possible. I find that docker-compose can help a lot on this front, as can developer containers.
I think mislabeling something as a duplication is where most of these issues stem from.
Humans love to pattern match, we find patterns in things that often have no real pattern. It is not uncommon in my experience to see patterns in code, label the code as not DRY, and attempt to DRY it up. If the "duplication" detected was, in fact, not a duplication but rather code that just happens to be similar, the abstraction will often go awry.
My rule-of-thumb is to prioritize maintenance over authorship. Am I writing this code in a way that makes it easier for future me or another programmer to change it, or am I optimizing for a sleek diff in my code review? I think our code can look like breadboards instead of a bespoke printed circuit board, we have compilers for that.
I think it is worth distinguishing proper opaque abstractions, that are defined by a contract, from convenience macro-like "abstractions" that are defined by their implementation.
The former are for abstracting different implementations behind an interface and/or decoupling, and require thought, planning, and careful consideration for their evolution.
The latter are purely for convenience, to save some typing, some mental overhead when understanding code (although they can increase it just as well) and to centralize minor bug fixes or common features. For long term evolution and divergence, these abstractions should simply be macro-expanded instead of trying to refit them for the new requirements.
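A small sketch of the two kinds, with hypothetical names: the Protocol is the contract-style abstraction (callers depend only on the interface), while the helper under it is the convenience kind, defined entirely by its implementation and easiest to just inline if requirements diverge.

```python
import json
from typing import Protocol

class BlobStore(Protocol):
    """Opaque, contract-defined abstraction: many implementations can live behind it."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...

def put_json(store: BlobStore, key: str, obj: dict) -> None:
    # Macro-like convenience "abstraction": saves typing at call sites and
    # centralizes one small detail; if a caller's needs diverge, expand it inline.
    store.put(key, json.dumps(obj).encode("utf-8"))
```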
It's curious there's no formal concept of "unduplication" - splitting a single abstraction originally created to avoid duplication, now littered with conditionals and spaghetti, into separate abstractions that now do something unrelated.
I wish people would have this saying in the Node.js and JavaScript community. I disagree with OP about this topic.
Abstractions are like the foundations of a building. Imagine that you're building an apartment block and your job is to build the foundations but you're unsure about how tall the building will be.
If you build it on mud, that might be fine for a one story construction but once other builders start adding additional storeys on top, it will become totally unsuitable and the whole thing will have to be rebuilt from scratch. Not only that, the costs will begin to materialize immediately because those who build on top of your foundation will make all sorts of bad decisions because of your poor judgment; they might decide to build the walls out of cheap wood instead of bricks simply because it's lighter and they don't want the building to tilt and sink into the mud... Then because wood was chosen as the material, there may be a termite infestation and builders will have to apply a special varnish on the entire surface of the building... Then the varnish will turn out to be toxic and will need to be removed. Every inch of the building will have to be polished with sandpaper and painted over... And when the next storey will need to be added, they will be forced to make it out of cardboard... Then the tenants on the top floor will want their money back and the whole building will need to be destroyed anyway; all that back and forth will have been nothing but a waste of time. You would have saved an entire decade and millions of dollars if the foundation had been laid on solid bedrock in the first place. Just one small sub-par decision which triggered an avalanche of terrible decisions.
I think duplicating code makes sense and can be a wise decision early in the project because it's essentially a refusal to lay the foundation until there is more clarity about the scope of the project. It's a lot easier to refactor and combine duplicated code into a new abstraction than it is to refactor one abstraction into a different abstraction. Not to mention that developers become very attached to abstractions (including incorrect abstractions) and it tends to upset people once they're invested in it.
To talk about “duplication is cheaper than the wrong abstraction” without invoking "dependencies" at all (and their costs) means the entire premise has been missed.
Another tell:
> Don’t try to make one thing act like two things. Instead, separate it into two things.
If abstractions were so easily split like this, then the advice wouldn't hold. But they never are. Abstractions immediately accumulate dependencies making it near impossible to split them, as we all learn after living in anything other than toy code bases.
The hallmark of a junior (i.e. someone who has not been to battle much) is making de-duplication of code a priority and not understanding the cost of dependencies.
It took me a long time and many thousands of lines of code written, read and re-written in order to understand one thing:
Code is supposed to reflect the intention.
Good code reflects that intention smoothly, like a well-written paragraph of a book reflects the events that happened in the story.
DRY makes sense semantically, when a piece of code always needs to be the same as another piece of code - that's when you isolate it into a function with a semantically meaningful name and behavior. Applying DRY without understanding and indiscriminately leads only to confusion and needless complexity.
DRY to me means having a single authoritative source. So for instance, if I need to define a person data structure then I use protobuf. I can add validation rules, and types to it. I can generate bindings for java, go, ruby, etc and they can all rely on the same person structure, with the same validations. Code is technically copied but there is still a single authoritative source.
If I need handle bank transactions, then I will create a single "microservice" that knows how to create a transaction and update the account balance. I wouldn't want that logic duplicated in multiple places.
An important context is the use case. Grossly speaking, business applications tend to have a shorter lifetime and faster cycle time than system code like, say, the Linux kernel or gcc. So the cost of refactoring in the latter case is amortized over a longer timescale; when you have rapid business needs it can often be better to just make the change in two or three places and move on, because in a few years the whole thing will be replaced.
We all know of exceptions to those examples (quick-and-dirty code that survives decades later) but I think that's the way to think about it.
>It seems to me that what’s meant by “the wrong abstraction” is “a confusing piece of code littered with conditional logic”. I don’t really see how it makes sense to call that an abstraction at all, let alone the wrong abstraction.
No, it means the wrong abstraction. Like forcing a one-size-fits-all abstraction on a few pieces of duplicated code, and not waiting for them to grow to enough cases to hint at what is the best pattern/abstraction/architecture to handle them (perhaps more than one, for different classes of cases that somebody might otherwise just shove into a single abstraction prematurely).
I have my umbrella at the ready for a downvote hailstorm: it makes perfect sense that the OP is hearing this repeated in the Rails community, as they are already enmired in the wrong abstraction ¯\_(ツ)_/¯
I think the answer here can be different depending upon the ecosystem. I confidently believe that abstraction is better instrumented and practiced in functional programming languages than those of the still-dominant object-oriented paradigm. Awkward abstractions are much easier to grow and stumble upon when the basic unit (an object) encourages private, greedy, encapsulation of data and method implementations. In functional languages, living up to DRY (don't repeat yourself) is a much more immediate and clear proposition.
I like Dan Abramov's "The Wet Codebase" (https://www.youtube.com/watch?v=17KCHwOwgms) -- I've been guilty of doing just what he says in his talk at first, removing all duplications and making the codebase DRY. But then I came to like "prefer duplication over the wrong abstraction", as Sandi Metz puts it.
Sometimes it's good to wait to have more data to make an easier and more informed decision.
> To me, an abstraction is a piece of code that’s expressed in high-level language so that the distracting details are abstracted away
That might be what an abstraction is to the author, but it's not a correct definition. Abstraction has nothing at all to do with high- or low-level languages.
- If your code has a bug, you will be better off without duplication, so that the bug must only be fixed once.
- If you will have to change the behavior of your code for product reasons, duplication is often better, because user needs are idiosyncratic. If the code is fully factored, you may have to pass in flags to indicate which behavior should be used in which case.
Learning to anticipate which of these two cases you might find yourself in in the future comes with experience.
I found this blog post low on insight and thoughtfulness. I've worked with engineers in the past who had an inflated esteem not just of their own abilities but of the nature of the business domain they were ostensibly building solutions inside. I have found that in many cases there's a level of naivete commingled with arrogance that comes from never having worked with an intrinsically complex enough problem to understand the true cost of abstraction, which is always nonzero.
Now, it is the case that there are many cases where the cost of abstraction is low enough to not be ROI negative. But there are many cases otherwise. Other commentators here have done a great job of detailing that space -- that incidental and actual repetition vary, that abstractions should exist to reduce optionality and ease of reasoning rather than simply reducing code, and those are all correct. But at a very basic level, all of those observations reflect the most critical missing factor from this post, which is context.
No software is created or operated in a vacuum. Every piece of software is created by humans to solve problems for themselves or other humans. So every piece of software is downstream of the working processes of those humans. Given that these working processes are subject to change and evolution, changes in requirements aren't edge cases but table stakes. This means that often the cost of an abstraction is not just whether it's the wrong abstraction at a point in time, but also whether it's an abstraction that is likely to erode over time given a particular working process.
With that said, a lot of this post seems like an exposition of this central point:
> If I were to see a confusing piece of code littered with conditional logic, I wouldn’t see it and think “oh, there’s an incorrect abstraction”, I would just think, “oh, there’s a piece of crappy code”. It’s neither an abstraction nor wrong, it’s just bad code.
I've seen this dismissal from many engineers over my career, and in every case, without fail, it reflected an inability to deeply read and understand the code, its history, and likely its future. To all the engineers out there reading this: thinking like this will prevent you from maturing from a junior engineer to a mid level engineer, never mind a mid level engineer to a senior engineer or engineering leader. You've been forewarned.
There is a simple merit: if some code is complicated enough to make you think twice before modifying it because you’ll need to modify all the copies (and you realize that it will be not easy) - then it is better to make this code DRY.
There are some simple pieces of code that are cheap to copy and modify later. And nothing wrong will happen if you do not apply future modifications to every copy. A code like this doesn't have to be DRY.
Typing on phone so I'll be brief. The key concept I find missing from this piece is "locality".
When dealing with a complex and/or unfamiliar codebase, locality (by which I mean "I can understand this thing here without jumping around the codebase") can make up for a lot of other deficiencies.
And imho, deduped code with an excess of if statements is actually one of the least bad things to encounter.
Many programmers believe that the more complex the better.
In my experience good code looks silly simple, such that you might think the problem was easy. And thus underrate the author ...
I have never read someone else's code at work and complained that a function is too big with too many if clauses.
However, deep call trees are really hard to comprehend. Especially if some function is called multiple times in the same call stack (unless the algorithm is recursive in a good way).
This article isn’t making a distinction between the interface provided by an abstraction and the implementation details of that abstraction, which I think causes it to come to the wrong conclusions.
A bad abstraction is an interface which causes the implementation to be more complex than necessary. Uses of the interface might still look perfectly simple, but if the abstraction is bad the overall complexity could be higher.
I dunno about these debates. Feels too subjective. Nobody can spit out a number and say "look, duplication is correlated with bugs!" So it's pure taste. Maybe when hiring, we should have a simple survey that asks "tabs or spaces?", "duplication or early abstraction?", and then we only hire people who agree with the team. (joking!)
Every time you create an abstraction to remove duplication, you're tying two pieces of code together and creating a common dependency. The more dependencies you have, the harder it is to change code, because a change in one place reverberates in many places.
To me, that's the cost. You gain a decrease in code size and verbosity at a cost of making localized changes more difficult.
I call this a distinction between "inherent sameness" and "incidental sameness".
Yes, right now, those two servers have the same number of processor cores. But who's to say that after a hardware update that will still be true?
Conversely, the fact that every processor has a certain number of cores is inherent to the way we represent a processor.
In my line of industrial automation, it's almost always cheaper to pay the cost of complexity up front, and assume that every conveyor VFD might get replaced with a different model, or with a contactor, somewhere down the line. That duplication is cheap when the line is on the integrator's shop floor. Any downtime later on, when enormous dependencies have come to rely on that line, is more costly.
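A tiny sketch of the distinction, with made-up fields: the core count of a processor is inherent sameness and belongs in the shared model; the fact that two particular servers have the same core count today is incidental, so it is stored per server rather than factored into one shared constant.

```python
from dataclasses import dataclass

@dataclass
class Processor:
    model: str
    cores: int                      # inherent: every processor has a core count

@dataclass
class Server:
    name: str
    cpu: Processor                  # each server carries its own value

web = Server("web-1", Processor("Xeon-4310", 12))
db  = Server("db-1",  Processor("Xeon-4310", 12))   # equal today, incidentally
```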
I find it's better to keep abstractions small and independent, so you can mix and match. Too big, and they risk not fitting future change well. Even if the smaller ones create a bit more work or "mini duplication", it's worth it to have that flexibility.
Did I miss where he justifies the statement "duplication is one of the most dangerous mistakes in coding"? That has not been my experience and it's the crux of the value judgement here so I'd expect him to explain why it's so bad.
This whole article is based on a bad reading of the problem.
The problem that happens when code is first duplicated is that the correct abstraction is a fundamental UNKNOWN.
If you knew the right way to de-duplicate it, you would of course always construct that abstraction, because that would always be better.
What happens in practice is that the wrong abstraction is usually chosen.
Then that incorrect abstraction isn't usually held around because of "[feeling] honor-bound to retain the existing abstraction" (if that's a direct quote from Sandi then I disagree with the quote and feel it has entirely the wrong emphasis). The problem is that it is always easier to add a new knob to the bad abstraction than it is to go back and de-dup the whole code and fix the abstraction. So the bad abstraction tends to accrete more bad abstractions on top of it until it becomes a mess because of doing the cheap, easy thing.
We should not do that. But the realities of software development are that when you are dealing with an orthogonal problem, you WILL wind up adding a knob to something that can be done in a day, rather than taking 2 weeks to refactor a different subsystem that your original problem only barely touches and isn't the primary concern of whatever business objective you are trying to deliver.
So the advice is to let it sit for awhile. Let the code accrete a few more requirements over the weeks or months ahead, and when you find yourself doing a double edit to both sides of the code and the right abstraction is clear to you then go ahead and de-duplicate it.
Note that if the problem is TRIVIAL then go ahead and de-dup it right from the start. This isn't advice for junior programmers who are faced with something as simple as dropping two hash keys into an array and then iterating over it so that it makes it easy to add a third key. This is more about having two classes which are fairly similar and extracting a whole base class and jamming all the shared code into the base with a tightly-coupled poorly-thought-out "wide" interface (using inheritance as a hammer to de-dup code). And the whole problem becomes even worse if someone external might come along and pick up that base class and start using it with the existing API and you might be locking yourself into a shitty API that you can't change without breaking backwards compatibility.
And even if you're in a "non-OO" language like Go you can still make this mistake by designing bad interfaces, it is the exact same thing.
Don't re-use via inheritance; re-use via dependency injection.
A (well-tested) software component that gets dependency-injected should be considered "final". If it makes sense to adjust it, you can still do so - there's nothing preventing you from this. But you should always be aware that logic relying on the dependency may behave differently in a way you haven't foreseen. If you just want to make a change for a single place in your software, you can easily replace the dependency with another one implementing the same interface. You could even decorate the original dependency if you want to re-use most of its code.
What I want to say is that nearly all abstraction issues come from inheritance, and in many, many cases there's no need to use inheritance at all.
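A minimal sketch of that approach, with hypothetical names: re-use via an injected dependency rather than a base class, and decorate the original implementation when one caller needs slightly different behaviour.

```python
from typing import Protocol

class Notifier(Protocol):
    def send(self, message: str) -> None: ...

class EmailNotifier:
    def send(self, message: str) -> None:
        print(f"email: {message}")

class ThrottledNotifier:
    """Decorates an existing Notifier instead of subclassing it."""
    def __init__(self, inner: Notifier, limit: int) -> None:
        self._inner, self._limit, self._sent = inner, limit, 0

    def send(self, message: str) -> None:
        if self._sent < self._limit:
            self._inner.send(message)
            self._sent += 1

class OrderService:
    def __init__(self, notifier: Notifier) -> None:   # the injected dependency
        self._notifier = notifier

    def place_order(self, item: str) -> None:
        self._notifier.send(f"order placed: {item}")

# Swap or wrap the dependency without touching OrderService:
OrderService(ThrottledNotifier(EmailNotifier(), limit=100)).place_order("book")
```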
This code golfing always boils down to “it depends”. Senior engineers by definition nod sagely and everyone else looks around nervously. It’s a tough break. Both approaches are correct and wrong. It depends.
Wrong abstractions get patched by making the abstraction configurable, and that is a slippery slope: keep adding more arguments and configuration options, and there is no end to it. Sometimes duplication is indeed cheaper.
I have a hot take on this, which I hope will resonate with at least a few people: duplication, even of blocks of up to a few long statements, rarely bothers me, because I remember all the duplications as a single instance. I have an extraordinary memory, and this makes a huge difference in how I think of and write code. Or anything, really. I save everything I've ever written, like bash history, but everything, and refer back to it and copy-paste somewhere else. I wonder if anyone else has this. This doesn't affect how I think of production code, but it hugely affects my workflow.
People are conflating abstracting with centralizing. A central piece of code is not necessarily an abstraction, so seeing such pieces as (wrong) abstractions is... wrong. They can be centralized by necessity, regardless of how poor an abstraction they seem to be.
I clicked on the "more nuanced and comprehensive post" and the real TL;DR is "I define duplication differently than everybody else, and, by that definition, claim that duplication is always bad"
> Just because a piece of duplication costs something doesn’t automatically mean that the de-duplicated version costs less. It doesn’t happen very often, but sometimes a de-duplication unavoidably results in code that’s so generalized that it’s virtually impossible to understand. In these cases the duplicated version may be the lesser of two evils.
I've never so viscerally disagreed with a link on this website. Particularly this point:
> Duplication is bad. In fact, duplication is one of the most dangerous mistakes in coding.
This to me reads insane, fanatical. One of the biggest benefits of duplication that the author fails to identify the locality of logic. When, not if, things break, there's a large benefit to having all of the logic contained to a few heavy-lifter classes that contain bespoke logic and are fit-for-purpose.
"The wrong abstraction" in this case is bending over backward to fit your data into another API just to cut down on code duplication; it is better to have code with clean, uninterrupted data flow than code that frequently needs to re-translate the data to be consumed by different APIs, then decode the results back to useful logic. The translation/decoding steps are new places to introduce bugs, and the more translation or decoding required, the more bug-prone the code will be.
A good abstraction to de-duplicate code should not add complexity to the existing call sites. If you've squinted and decided that two systems are close enough that they can be abstracted together, you're likely making one or both of those code paths much more treacherous.
As a programmer, if you don't create an abstraction you'll never be more than a 1x programmer. Abstraction is how to get more productive than simply how fast you can type.
Yes, the wrong abstraction is bad. But almost every argument whether it's for/against duplication or for/against abstraction usually starts with the hidden premise that you're stuck with whatever choice you've made and code you've written forever. The underlying issue is the fear of change and the sunk cost fallacy of already written code. If you have the wrong abstraction, you can change it. If you created too much duplication, you can remove it.
> The underlying issue is the fear of change and the sunk cost fallacy of already written code. If you have the wrong abstraction, you can change it.
It's not that trivial. Consider that the wrong abstraction is reflected into your API (common), and consider that your API has many users. You are stuck with it, or you have to convince multiple teams (or, god forbid, external customers) to migrate to a better API. This can constitute a humongous waste of SWE-hours ($$$$$) and take quarters to accomplish, assuming you can get any buy in.
I think it comes down to what your organization looks like and how many users are going to be touching your code. If your abstraction is just for yourself internally and everyone else is not allowed to touch it, then fine. You will own the tech debt if the abstraction is wrong. If your abstraction has other users at your company, or external customers, it had better be the right one or at least an unavoidable stepping stone.
> If you created too much duplication, you can remove it.
This is actually true. Refactoring duplicated logic is a lot easier than fixing bad abstractions.
It is that trivial. There is no alternative. You either have an API or you don't and you either change it or you don't. Hand wringing over the potential of making a mistake is a waste of time and effort. You will make a mistake. You will never get it perfect. You just have to deal with it.
> Refactoring duplicated logic is a lot easier than fixing bad abstractions.
Then you've just created an abstraction with all that potential to be bad sometime in the future.
> You will make a mistake. You will never get it perfect. You just have to deal with it.
These two comments sound at odds. First statement says it's easy. Second statement says it's hard.
We can agree that hard things don't get solved without iterating. But a productive response to abstraction (which is really API design) being hard is not to say "stop handwringing, just do it." Instead, you can employ various strategies such as preferring experienced people to do it, making sure they did a good job of gathering requirements and considered the risks of their approach, spending time testing customer/developer ergonomics, etc. You can also defer producing an abstraction until your system is a bit larger and the duplication is becoming too much to handle, since you have a larger sample size of potential uses for your abstraction to help you converge on the correct API.
Good abstractions can be the difference between success and failure, between organizational velocity and technical debt quagmire. Saying "we should always build abstractions" when it's difficult to build them correctly in one go sounds totally wrong on its face.
Trivial doesn't mean easy; it means unimportant. And in this case I'm referring to the whole idea of abstraction or not. You're going to do it. You should do it. Sometimes you have to do it. Discussing it is pointless. Just move on to the how.
> Saying "we should always build abstractions" when it's difficult to build them correctly in one go sounds totally wrong on its face.
If you don't do that first "one go", how do you get around to the right abstraction? You're telling me you wouldn't use any intuition, planning, or thinking to build that first wrong abstraction? It seems above you just described how to do exactly that. Just writing without any abstraction in mind at all is just a huge waste of time. The only time it's painful is when you decide that you can't change anything you've done. But doing no abstraction is just as painful -- you just pay for it differently.
Okay Mr. English, trivial may be defined as unimportant / trifling, but if you're going to label something like that then you have to consider why you're calling it unimportant. It still seems you're saying it's cheap, easy, and easily undone. It's not.
Abstraction building is _consequential_.
> If you don't do that first "one go" how do get around to right abstraction?
Right back at you: how do you get the data about how to build your abstraction without first living without one to understand what parts of the system should be abstracted? Starting with the abstraction and assuming you can fix it later is a luxury for overstaffed teams and people working on systems that nobody actually uses.
You do realize, abstractions incur mental load on your users? Have you ever tried to use internal tools built ostensibly for your use case, only to find that it's taking more time and effort than if you lived without one? It's happened to me so many times, and the consequence of adopting a shitty abstraction has burned me so many times personally that I'm gravely aware of the downsides.
Consider a streaming data tool that stupidly allowed users to put arbitrary data in a string field, which became abused for purposes it shouldn't have been used for -- now it's critical to the company and would take a YEAR of dev resources to retire, all the while being an endless source of headache and P0 SRE escalations.
Consider an ML evaluation tool that requires you to understand how a bunch of leaky abstractions nest together. When you finally figure out how it works and it does 1 thing for you (generate P/R curves), it becomes extremely difficult to modify or debug without again understanding those horrible abstractions. Once you've spent a full quarter adopting it, you wish your team had just written something custom-but-maintainable instead of trying to get shoehorned into a tool that clearly wasn't thinking about you, and whose owners have left the company, leaving you to absorb the cost.
> But doing no abstraction is just as painful -- you just pay for it differently.
It's less painful in the long run: doing no abstraction to start is often the right path. People often overrate the benefits of abstraction and underrate the clarity of repetition, especially when the repetition is trivial.
> how do you get the data about how to build your abstraction without first living without one to understand what parts of the system should be abstracted?
The minute I have any duplicated logic it gets combined as much as possible. I hate duplicating anything. Some duplication is impossible to avoid and I hate that too. I hate specifying the same logic in 2 places (say client-side and server-side for web validation). I will do anything possible to avoid that.
Good abstractions reduce mental load. I wouldn't be able to manage 20+ different applications if they didn't share a huge amount of common internal framework and have as little duplication of logic as possible. I'm actually pretty militant about ensuring every abstraction is a programmer benefit and will remove layers if they serve no purpose.
I almost always build things out as a library. If we are consuming an external service, I will make that its own library/framework to be consumed by the project it interacts with, even if it's only ever used by one project. This is always more work initially but has never failed to be the right choice.
Yes, in sum, but bad ones increase it, and it's not always immediately obvious to everyone that an abstraction is bad. A lot of that comes from experience.
> The minute I have any duplicated logic it gets combined as much as possible. I hate duplicating anything. Some duplication is impossible to avoid and I hate that too. I hate specifying the same logic in 2 places (say client-side and server-side for web validation). I will do anything possible to avoid that.
Good for you Glenn Coco! Work in any scaled software organization and you'll see that combining logic too early is full of hazard.
> I almost always build things out as a library ... This is always more work initially but has never failed to be right choice.
Maybe to you. I wonder if someone who's ever tried to use your library down the line ever thought "this was abstracted in a really suboptimal way but I have to live with it." Even if they did, I don't think you'd know. I've been that person too many times. People have got to stop building shitty libraries when they're not necessary and only serve to obfuscate the logic.
Your argument just boils down to bad programmers writing bad code. It has nothing to do with abstraction or not. You'd be just as unhappy with an entire project that's just one giant 20,000 line file.
That's why I think the whole rant about abstraction is pointing fingers in the wrong place. Everyone either wants to find the silver bullet that will produce code perfectly the first time or find some concept to blame when it isn't. If you have experience, use abstraction to accelerate your development.
> Work in any scaled software organization and you'll see that combining logic too early is full of hazard.
Don't combine logic and you need twice as many programmers to do the work: the front-end guy to do the JavaScript validation, the backend guy to do the server validation, and the project manager to ensure they're both always the same. I started this rant by saying if you don't abstract, you'll always be slow. And I stand by that; many projects with dozens of programmers are just manually performing work that could have been abstracted away at the start.
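To make the "combine it" idea concrete, here's a minimal sketch of one way to do it (all names are hypothetical, assuming a Java back end that ships its validation rules to the front end as data instead of having the front end re-implement them):
```
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical example: each validation rule is declared exactly once, as data.
record FieldRule(String field, Pattern pattern, int maxLength, String message) {

    boolean isValid(String value) {
        return value != null
                && value.length() <= maxLength
                && pattern.matcher(value).matches();
    }

    // The same rule can be serialized and shipped to the client, so the
    // front end interprets it instead of re-implementing it in JavaScript.
    Map<String, Object> toClientSpec() {
        return Map.<String, Object>of(
                "field", field,
                "pattern", pattern.pattern(),
                "maxLength", maxLength,
                "message", message);
    }
}

class SignupRules {
    static final List<FieldRule> RULES = List.of(
            new FieldRule("email", Pattern.compile("^[^@\\s]+@[^@\\s]+$"), 254,
                    "Please enter a valid email address"),
            new FieldRule("username", Pattern.compile("^[a-z0-9_]{3,20}$"), 20,
                    "3-20 lowercase letters, digits or underscores"));

    // Server-side enforcement uses the exact same rule objects.
    static List<String> validate(Map<String, String> form) {
        return RULES.stream()
                .filter(rule -> !rule.isValid(form.get(rule.field())))
                .map(FieldRule::message)
                .collect(Collectors.toList());
    }
}
```
The front end still needs a little code to interpret the exported rules, but the decision about what counts as valid lives in exactly one place.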
> Your argument just boils down to bad programmers writing bad code.
Good programmers can write bad abstractions as well, because good abstractions require sufficient understanding of what you're trying to abstract, and we all have blind spots (the unknown unknowns). Thus the caution.
> It has nothing to do with abstraction or not.
"Bad code" is one thing, "bad abstraction" is a very specific subset of that problem.
> Everyone either wants to find the silver bullet that will produce code perfectly the first time or find some concept to blame when it isn't.
I think you're missing the point. Bad abstractions, especially when depended on by many users, are 10x bugs. Nobody's saying "make all code procedural and never abstract anything" -- however it's very valid to say "problems caused by bad abstractions are super bad so let's be extra careful." I mentioned strategies for being extra careful earlier: let more experienced folks do the design, defer the abstraction until you have more information, etc.
There are also levels to everything. Of course there are no repercussions for how you structured the classes in your internal-facing webapp. Nobody cares about your codebase except you. If you're building a foundational building block of a complex system, however (e.g. the message bus for a self-driving car), you had better make the best approximation of the right answer from the get-go, because that system isn't going to be rebuilt for many years.
> Don't combine logic and you need twice as many programmers to do the work.
This doesn't ring true. Duplicated logic doesn't always mean double work.
> I started this rant by saying if you don't abstract, you'll always be slow.
Yes and my point is: if you think abstracting is always a net win, you're probably green and haven't seen the myriad cases where it bites you.
I'm anything but green. And I have, of course, made all these mistakes over my career. But if you're capable of running in 5th gear, don't take advice that says stay in 1st until you've finished the product.
> Then you've just created an abstraction with all that potential to be bad sometime in the future.
I would argue that you've then created an abstraction, but with all the hindsight allowing you to create the _correct_ abstraction (or at least a much better chance of approaching "correct").
100% agree, especially about refactoring duplicated logic. Super-duplicated code begs to be refactored, and having many examples of the same functionality helps you build an API that is robust without adding "what-if" functionality to try to futureproof code (impossible).
Migrating to better APIs is done all the time. It is not an issue worth discussing anymore. But even if you have to maintain an API, that doesn't mean you cannot change the underlying implementation.
I'm working with an API right now that is absolutely based on duplicated code. They have a system for querying items, and the API does it 3 different ways depending on the endpoint. I just found a new one the other day and I hated it -- why is this one endpoint unnecessarily different from all the rest!
I'm building a library to call this API and I've abstracted over all these differences so my callers never have to know how messed up the underlying API is -- they get a consistent experience regardless.
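Roughly, the shape of that kind of shim looks like this (hypothetical names, a sketch of the idea rather than the real library): every endpoint, however it actually pages, gets adapted to one cursor-based page fetch, and callers only ever see a plain iterable.
```
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: one page-fetching signature, many adapters behind it.
// A fetch takes an opaque cursor (null for the first page) and returns the
// items plus the next cursor (null when there are no more pages).
record Page<T>(List<T> items, String nextCursor) {}

class PagedQuery<T> implements Iterable<T> {
    private final Function<String, Page<T>> fetchPage;

    PagedQuery(Function<String, Page<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Page<T> page = fetchPage.apply(null); // first page
            private int index = 0;

            @Override
            public boolean hasNext() {
                // Move to the next page when the current one is exhausted.
                while (index >= page.items().size() && page.nextCursor() != null) {
                    page = fetchPage.apply(page.nextCursor());
                    index = 0;
                }
                return index < page.items().size();
            }

            @Override
            public T next() {
                return page.items().get(index++);
            }
        };
    }
}
```
Each messy endpoint then only needs a small adapter that maps its particular offset/cursor/token scheme onto that one Page shape, and callers just write a for-each loop without knowing which endpoint they hit.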
I'm not sure that's true. When I look at the code bases of some of the most productive coders I can think of (John Carmack and the Doom and Quake code bases, for example), they tend to be fairly conservative with how much they abstract things. There's a lot of very thoughtful data structure usage (git is another good example of this) and diligence in maintaining a coding standard (which generally has little to do with formatting), but most of the code seems to be more concrete and task-focused rather than abstract.
I think Casey Muratori has a good way of thinking about this, with his concept of "semantic compression" ( https://caseymuratori.com/blog_0015 ). To me that's a lot more valuable than the ideas you get in, say, something like Clean Code (a book whose popularity has, in my opinion, been a disaster for the industry).
I'm positive they don't have much in the way of duplicated code.
Abstraction can be as simple as a function.
It seems odd to give really good examples of abstraction and then sort of do a No true Scotsman argument on it. "It can't be abstraction because it's not terrible."
so your argument is, "you've got to write functions, therefore abstraction is always good" ?
yeah man, I don't start in main() and never write another function ever again. You got me. But I also don't aggressively police duplication, accepting that I would always rather see what's happening without file-hopping. The correct abstractions will simplify code, and they will seem obvious (even if only in retrospect). Abstractions that force me to continually re-frame the problem I am trying to solve in terms of someone else's use case are antithetical to writing the sort of code that I do.
You don't have to write functions. Have you ever seen 10,000 lines of code that were just a single function? I have. It also had nested if statements 5 levels deep to handle all sorts of logic, with lots of duplication. It was unmaintainable. Yet it did meaningful work, and it could have been refactored to a fraction of the size and still do the same work. But, to be honest, when I had to fix it I just went in and fixed it in the 20 places that needed changing, because it was impossible to follow.
You like good abstractions and you hate bad abstractions. I couldn't agree with that more.
There are useful abstractions and useless abstractions. A lot of the GoF designs are bad abstractions (if used indiscriminately) and crutches for bad languages. However, using problem-focused abstractions is a big time-saving strategy.
Wrong abstractions percolate through your system: assumptions about how your abstraction is supposed to work harden into ossification that hides the concrete implementations and their actual, generally simpler, constraints.
Basically, your refactoring work now requires understanding all the user code that relies on the wrong aspects of your abstraction, finding a way to correct it if you're lucky, and making it work exactly the same way the duplicated code would have.
And I didn't even mention implementations that drift in ways incompatible with the abstraction, a large source of errors and regrets.
The good bet for productivity is recognizable implementation patterns and duplication.
In the end, refactoring duplicated code that has had time to settle and drift in legitimate ways, to find your correct abstraction, is a blast.
I disagree and I can provide an example. I'm creating a library to interface with a REST API. The creators of this REST API obviously didn't do any abstractions and they have multiple implementations for the same exact process: paged queries of items. There's no reason for them to be different -- they're just different because, I assume, different developers built them differently at different times. Nobody looked at this and said "This is all the same so we should have one single common implementation abstracting over all the endpoints."
However, as the developer of the interface library, I can abstract over all the differences and give my consumers the exact same experience regardless of the API endpoint. And that's exactly what I did. So now they're all more productive because they don't need to know all these unimportant details. They don't even need to know that there's a REST API. In fact, this REST API replaces a previous API implemented with a completely different technology, and we are swapping the whole thing out with minimal changes because I abstracted it years ago.
Not all abstractions are wrong. Not all concrete implementations are simpler. My goal with an abstraction is to take something else and make it closer to what we need, because most technology has a wide audience with wide requirements. I'm a narrow audience with narrow requirements, so I can hide the vast complexity that I simply don't care about.
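For what it's worth, the overall shape of that library is nothing fancy (hypothetical names, just a sketch of the structure): callers depend on a narrow interface shaped around what we need, and all the endpoint quirks live in one replaceable implementation.
```
import java.util.List;

// Hypothetical domain type and a narrow interface shaped around what *we*
// need, not around whatever the remote API happens to expose.
record Item(String id, String name) {}

interface ItemCatalog {
    List<Item> findByCategory(String category);
    Item get(String id);
}

// Hypothetical minimal HTTP helper; in real code this wraps an HTTP client.
interface RestClient {
    <T> T getJson(String path, Class<T> type);
    <T> List<T> getJsonList(String path, Class<T> type);
}

// Implementation for the current REST API. All the inconsistent endpoints,
// pagination styles and auth details stay behind this wall.
class RestItemCatalog implements ItemCatalog {
    private final RestClient client;

    RestItemCatalog(RestClient client) { this.client = client; }

    @Override
    public List<Item> findByCategory(String category) {
        return client.getJsonList("/v2/items?category=" + category, Item.class);
    }

    @Override
    public Item get(String id) {
        return client.getJson("/v2/items/" + id, Item.class);
    }
}
```
When the underlying technology is swapped out, only a second ItemCatalog implementation has to be written; the calling code never changes.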
Wrong abstractions start off as right abstractions and slowly become wrong abstractions. What's the point at which a right abstraction becomes a wrong one? Am I sure I can identify that point? Can someone else? Can someone else who has no knowledge of the original assumptions that were implied during the initial abstraction?
There are two kinds of abstractions in your codebase: the ones everyone complains about and the ones that no one has ever seen.
My rule of thumb is thus: Have I repeated myself three times doing the EXACT same thing? Then CONSIDER abstracting it away. Otherwise, make as many implicit dependencies explicit as possible and keep slightly repeating yourself until you are exactly repeating yourself.
Your rule of thumb isn't great: if you have some important logic duplicated in two places and a year from now it has a bug -- are you going to remember to change it in both places? But, let's be honest, you probably would not create that code in the first place -- you'd have abstracted it automatically without even thinking about it.
These conversations generally tend to completely discount experience. Junior programmers are often terrible at abstractions -- they either do way too much or way too little. Can I give them a hard and fast rule that they can use to never make that mistake? No, I can't. It doesn't exist. The only reason I know what's good or bad is because I've done it wrong thousands of times.
That's the problem with every single one of these articles that prescribe one true solution. It's not at all that simple.
> When, not if, things break, there's a large benefit to having all of the logic contained to a few heavy-lifter classes that contain bespoke logic and are fit-for-purpose.
Things break in cycles. You'll have worked around a first wave and be happy that you didn't abstract your code too much, since you can just fix one side of your logic very locally. It also means you didn't touch the other sides that weren't directly affected but probably needed a fix in a slightly different, overall similar, way.
So you'll see your code instances all break one way or another, and you'll fix them one by one instead of hardening a central piece where you could focus your testing efforts.
Of course it's a topic that needs nuance, but if you identify a piece of code as duplicated, there will be no free lunch. Either you pay the effort of abstracting upfront, or you pay for the local fixes down the line; neither approach is fundamentally wrong. I see it as a bet that either pays off or doesn't.
I think a key distinction often lost here is that generic code and abstract code are different. Abstract code hides details; generic code allows its use in more places. When hiding details, code often also becomes more generic. Making code generic does not necessarily hide details; it can very well expose additional details.
Also seemingly not mentioned - SRP (single responsibility principle). SRP & DRY should be considered together. If a person DRY'ies up code without regard to SRP, they're making any code that can be generic, generic. A rule of thumb is generic code is 3x more expensive than non-generic code.
==============
To illustrate, here is an example (and pretend that these examples are duplicated in 20 different places that all need the account balance sum):
--------------
Example (1) - non-generic, non-abstract
```
int savingsBalance = 1;
int checkingBalance = 1;
int totalBalance = savingsBalance + checkingBalance;
```
--------------
Example (2) - generic, minimally abstract
```
int savingsBalance = 1;
int checkingBalance = 1;
int totalBalance = addBalances(savingsBalance, checkingBalance);
```
--------------
Example (3) - abstract, potentially generic:
```
int totalBalance = addBalances();
```
===============
Now consider what happens if we need to add a 'brokerage account balance' to the mix (and let's say we get that value via an API call). These examples change in the following ways:
Example (1), updated:
```
int savingsBalance = 1;
int checkingBalance = 1;
int brokerageBalance = fetchBrokerageBalance();
int totalBalance = savingsBalance + checkingBalance + brokerageBalance;
```
Example (2), updated:
```
int savingsBalance = 1;
int checkingBalance = 1;
int brokerageBalance = fetchBrokerageBalance();
int totalBalance = addBalances(savingsBalance, checkingBalance, brokerageBalance);
```
Example (3), updated & unchanged:
```
int totalBalance = addBalances();
```
Example (1) & Example (2) have similar scaling behavior here (scaling relative to complexity). This illustrates a very key difference between abstract and generic code.
Now, let's say on the other hand that whether we should include the brokerage balance is conditional. In example 1, we have the same logic to be applied in 20 different places. We can mutate example 3 to be more generic (e.g. pass in a flag - `addBalances(Flags.includeBrokerageAccount)`). At this point we can say that the abstraction is wrong and needs to be split into different methods (which is fine!). Making example 3 more generic is more complex; we incur the penalty of having generic code. Example 1 is arguably the worst to have, since we will get subtle errors if we fail to update everything. In part these design principles are there to help protect updates and make them safe (very similar to the ACID guarantees of databases, which make it so you can update data without breaking the overall database).
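A quick sketch of that fork in the road, continuing the balance example (names are hypothetical): the flag version makes the one shared method more generic and pushes a decision onto all 20 call sites, while splitting keeps each method focused.
```
class Balances {
    // Hypothetical accessors standing in for the values used in the examples above.
    int savingsBalance()        { return 1; }
    int checkingBalance()       { return 1; }
    int fetchBrokerageBalance() { return 1; } // pretend this is the API call

    // Option A: make the abstraction more generic with a flag.
    // Every one of the 20 call sites now has to decide which flag to pass,
    // and the shared method body grows a conditional.
    int addBalances(boolean includeBrokerage) {
        int total = savingsBalance() + checkingBalance();
        if (includeBrokerage) {
            total += fetchBrokerageBalance();
        }
        return total;
    }

    // Option B: split the abstraction into focused methods instead.
    // Each call site picks the method that matches its intent, and neither
    // method carries logic its callers don't need.
    int cashBalance() {
        return savingsBalance() + checkingBalance();
    }

    int totalBalanceIncludingBrokerage() {
        return cashBalance() + fetchBrokerageBalance();
    }
}
```
Both options keep the 20 call sites from drifting out of sync; they just pay for it in different places.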
Another mention, which I won't go into detail on: boilerplate code has yet different characteristics.
In sum, it's largely a question of what kind of coupling is best and how to deal with that coupling. Duplicated code is coupled without any runtime or compile time checks that it stays in sync (if you forget to update something from example 1 above, it's a bug!). Keeping code consolidated into a common procedure does not remove that coupling; it just changes the nature of the coupling and makes it more explicit. Common code between micro-services couples those micro-services together (and that can be very bad).
Thus, we need to look at a lot of things when applying DRY: we need to consider SRP, whether we are coupling services together, and whether or not we are simply making non-generic code generic.
It's hard to explain such complicated concepts super concisely. What I'm getting at is that DRY is often equated with merely making code generic and re-used, while the goal of DRY is not at all about re-use. Generic code is more complicated than non-generic code, thus if we make code generic just for the sake of using it in many places, that is likely going to make things more complex. It's a fundamental misunderstanding that DRY is simply the act of using human pattern matching to make all similar-looking code generic and re-used. Instead DRY is more about: "are we sanitizing data before sending it to the front-end? Then that should be done in one place" (see the sketch after this comment). "Where are we configuring database connections?" etc.
Further DRY should not be the only guiding factor, SRP & coupling should always be considered at the same time as DRY.
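To make the "one place" point concrete, a minimal sketch (hypothetical names): every response path funnels its outgoing fields through a single sanitizer, so a change in policy is a change in one file.
```
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: the "what is safe to send to the front end" decision
// lives in exactly one place.
final class ResponseSanitizer {

    private ResponseSanitizer() {}

    // Escape the handful of characters that matter for HTML injection.
    static String sanitize(String value) {
        return value
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Every endpoint calls this one method on its outgoing fields.
    static Map<String, String> sanitizeAll(Map<String, String> fields) {
        return fields.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        entry -> sanitize(entry.getValue())));
    }
}
```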
We need to have a kind of a footnote in the pedagogy of software engineering, and engineering in general that states to avoid advice (“wisdom”, lol) expounded by blowhards. You can identify it usually by the title - it’ll have a Grand Style that betrays arrogance.
Lots of people from GoF onwards think they qualify to preach bullshit ultimatums, thinking they have it all figured out. I don’t think any of them have any fucking clue what should actually be considered harmful, what should be the two/three “hardest things in computer science”, and other nonsensical bullshit they write. With apologies to Dijkstra who I do find to have been one of the shining lights of computer science and engineering but is often misquoted/out-contexted for that considered harmful thing. His letters do betray a higher plane of wisdom.
The more recent “what programmers need to know about {x}” as if the author has any clue is just the continuation of “I’ve learned this last week/in my last project and it’s the most important thing,” instead of the trivia that it really is, or shit that’s abstracted for us nowadays and only serves to make the author feel superior. Just fuck off with all of that nonsense.
Coincidentally, I’m going to go and read the Hamming book as it’s got tangible value having been written by someone who has done something worthwhile in their career.
It sounds more like the idea you propose is "just do whatever" and that those guys (seasoned devs and instructors) have absolutely no experience behind their advice.
There's nothing particularly nonsensical about the "two/three “hardest things in computer science” (although it was said half in jest).
The vast majority of advice like that is garbage and is trying to borrow the authority of the few good articles that come out with similarly structured titles.
Usually by people who mistake "the product is successful" for "the product is well engineered". Or who mistake their rewrite from "the worst way to solve the problem" to "the second worst way to solve the problem" <hyperbole> for "this is the best way to solve the problem".
A significant amount of things said about computer "science" and engineering is opinions, more so than most believe or are willing to admit. That doesn't mean it's all wrong, but that not everything is universally applicable just because a smart person said a thing.
I enjoy the attitude, but sometimes people like to read someone else’s view on something, or to gain insights on something they don’t know anything about.
Writing authoritatively might be the only way people can get people to read some things. I’m ok with that.
I'm a copy+paste programmer, and proud of it. It's quicker, easier, and most importantly: someone else's problem to fix, if they're the type of developer who disagrees with this coding style.
I'll keep churning out duplicated code and you guys can keep refactoring against it. We all get paid so what's the problem?