50 Algorithms Every Programmer Should Know (Second Edition) (oreilly.com)
295 points by teleforce on Jan 3, 2024 | 205 comments


Most of those you'd consume in library form. Most programmers shouldn't be writing their own lists, hash maps, b-trees, sort algorithms, etc. Or generally reinvent wheels. It's nice to understand how this stuff works and have a notion of complexity and other limitations. But, mostly we're just sticking together things that already work.

I've dabbled a bit over the years with reinventing wheels like this and it's fun. And of course I had an algorithms course when I did my CS degree, 30 years ago. But I can't say I spend a lot of time obsessing about algorithm implementations. I mostly just use them in library form, just like everybody else.


I don't think anyone is suggesting that every programmer implements their own list every time they use a list. But it's useful to have an understanding of how common data structures work, how basic algorithms operate, and what kind of things are out there.

Writing software is much easier when you know that things like B-Trees exist or how breadth-first search works. How do you know where to find things like this in your library if you don't even know they exist?


> I don't think anyone is suggesting that every programmer implements their own list every time they use a list. But it's useful to have an understanding of how common data structures work, how basic algorithms operate, and what kind of things are out there.

And, when you need to optimize or handle special cases on critical paths, you may end up needing to implement these algorithms yourself.

Case in point: I once led development of a major competitor to Dropbox. We couldn't use "off the shelf" queues and lists to manage uploads and downloads, because there were too many special cases. I also had to implement my own immutable collections (until a library was released and we switched to it).

But otherwise, 99% of the product used off-the-shelf lists and queues under the hood.


Yep, following rules like "never ever use goto" or "use design patterns and OOP everywhere" makes no sense, if (typical) software development itself is not exact. Some folks mix up software development with being super precise and mathematical, but it's not that exact. So do repeat yourself, if the alternative is not worthwhile. As a concrete example, I don't need to download an engine (such as Unreal/Unity) to make a retro-styled 3D game. All I need are software rendering algorithms, SDL2 and maybe Rust. That's it.


That’s why the title is “should know”, and not “should reimplement by themselves for production use all the time”. As the book description says, “This book will help you not only to develop the skills to select and use an algorithm to tackle problems in the real world but also to understand how it works.” Note “select”, “use”, and “understand”, not “write”.


To be honest, normally I'd agree with you--there's a core set of important algorithms and data structures that pretty much every programmer should be aware of and understand well enough to be able to figure out how to look for them should they need to use them. And while I normally have a visceral reaction against "XYZ every programmer should know" because it's almost invariably far more detail than what every programmer should know, your basic data structures and algorithms actually are important enough to fall into that category, and there's a reason that a course on the topic is essentially course #2 in a CS curriculum.

But after reading the table of contents here, I'm thoroughly convinced that this book is not about that. About halfway through, I gave up trying to count the algorithms it was covering because it seems like it randomly decides it wants to be an AI textbook instead. Its focus on AI is weird; if you've got time for Naïve Bayes classifiers, where's convex hull? Where's simplex [1]? Fast Fourier Transform? Union-find? Constraint satisfaction? Hell, where's quicksort?

Another harbinger of quality: there's a section on "Choosing between MD5 and SHA". If you have an entire section on that in a book published in 2023, something is wrong. Then again, an algorithms book needing to have a section on that topic is itself... o_O-worthy.

[1] Okay, there's a section on linear programming. A section. Which, based on the section headings, seems entirely focused on "here's how to use a linear programming solver" and not anything on the actual algorithm itself. My algorithms textbook has an entire chapter on simplex, with instructions on how to code the algorithm itself.


Yes, I haven’t looked deeper into that particular book and really only wanted to respond to the general notion.


Thanks for the comment. What books would you recommend?


As noted by sibling, CLRS (https://mitpress.mit.edu/9780262046305/introduction-to-algor...) is perhaps the reigning king of algorithms textbooks.

I haven't read it myself, but the Algorithm Design Manual (https://www.amazon.com/Algorithm-Design-Manual-Computer-Scie...) also tends to rank high on recommendation lists, and from looking at its table of contents, it does complement CLRS nicely--there looks to be a better selection of things like constraint satisfaction or computational geometry.


Skiena is great, and is very different from CLRS --- I actually enjoyed reading through Skiena, where CLRS is a reference (and I'd just use the Internet instead of CLRS at this point).


CLRS remains canon.


Honest question: what's the point of knowing trivia about data structures and algorithms if you will never implement them, or even be aware of whether you're using them or not? In the real world you pick containers based on their traits, and you aren't bothered about how they're implemented under the hood.


I wouldn’t classify it as trivia to know the big-O complexity of an algorithm or to have some idea of the worst-case behaviors you may run into (like inserting already sorted data into a binary tree). You don’t need to memorize the source code, but you should at least be able to visualize the boxes and arrows in your head.
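
To make that worst case concrete, here's a toy, deliberately unbalanced BST in Go (made-up names, not any library's implementation); feed it already-sorted keys and it degenerates into a linked list:

    package main

    import "fmt"

    // Toy unbalanced binary search tree, purely for illustration.
    type node struct {
      val         int
      left, right *node
    }

    func insert(n *node, v int) *node {
      if n == nil {
        return &node{val: v}
      }
      if v < n.val {
        n.left = insert(n.left, v)
      } else {
        n.right = insert(n.right, v)
      }
      return n
    }

    func depth(n *node) int {
      if n == nil {
        return 0
      }
      l, r := depth(n.left), depth(n.right)
      if l > r {
        return l + 1
      }
      return r + 1
    }

    func main() {
      var root *node
      for i := 0; i < 1000; i++ {
        root = insert(root, i) // sorted input: every new value goes right
      }
      // The "tree" is now effectively a linked list: depth 1000, so lookups
      // cost O(n) instead of the O(log n) a balanced tree would give you.
      fmt.Println(depth(root)) // 1000
    }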

That said, this particular book is an expensive way to line your birdcage, and it should be an O(1) decision to reject it. Shame that ORA put their name on this trash.


> I wouldn’t classify it as trivia to know the big-O complexity of an algorithm or to have some idea of the worst-case behaviors you may run into (like inserting already sorted data into a binary tree).

It's trivia, and more often than not it's completely useless trivia.

Big-O complexity refers to asymptotic complexity, which is only really reached with very large datasets that are already in the realm of an RDBMS. Until that point, performance is far more sensitive to cache locality and other implementation details. This means that all those trivia numbers don't even match the real-world numbers you get from your own performance tests.
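
For what it's worth, the "measure it yourself" point is easy to see. A rough single-run timing sketch in Go (not a rigorous benchmark; use testing.B for that) comparing two traversals that are both O(n) but differ hugely in constant factors, mostly because of cache locality:

    package main

    import (
      "container/list"
      "fmt"
      "time"
    )

    func main() {
      const n = 1_000_000

      xs := make([]int, n)
      l := list.New()
      for i := 0; i < n; i++ {
        xs[i] = i
        l.PushBack(i)
      }

      // Both loops are O(n); the contiguous slice is typically several times
      // faster in practice thanks to cache locality and no per-node pointers.
      start := time.Now()
      sum := 0
      for _, v := range xs {
        sum += v
      }
      fmt.Println("slice:", time.Since(start), sum)

      start = time.Now()
      sum = 0
      for e := l.Front(); e != nil; e = e.Next() {
        sum += e.Value.(int)
      }
      fmt.Println("list: ", time.Since(start), sum)
    }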

So what's the point of wasting time memorizing trivia that doesn't even match real-world observations?


The big-O of various data structures is typically trivia not worth deliberately memorizing, though it's also easy to remember for the most common data structures.

For all the other data structure operations, look it up. But there's a lot more to data structures than big-O, and implementing them yourself is one way to expose yourself to those details (eg cache locality or other things). Crossing abstraction layers requires an understanding of what's above and below an abstraction. It's not trivial to have that knowledge and know how to apply it effectively.

It also is not confined to RDBMS territory. It can come up when dealing with as few as 1M elements. For example, look up the Rockstar long-loading-time bug that accidentally applied an n^2 algorithm to something like a few thousand or a few hundred thousand elements.
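
(That bug was, reportedly, sscanf calling strlen over and over across a ~10 MB JSON blob, plus a quadratic de-duplication pass.) The same shape turns up anywhere each step quietly redoes work proportional to everything processed so far; a made-up Go sketch of the pattern:

    package main

    import "strings"

    // buildSlow re-copies the whole accumulated string on every iteration, so
    // the total work is 1+2+...+n copies: O(n^2), even though each line of the
    // loop body looks like constant work.
    func buildSlow(lines []string) string {
      s := ""
      for _, l := range lines {
        s += l + "\n"
      }
      return s
    }

    // buildFast appends into a growable buffer: O(n) overall.
    func buildFast(lines []string) string {
      var b strings.Builder
      for _, l := range lines {
        b.WriteString(l)
        b.WriteByte('\n')
      }
      return b.String()
    }

    func main() {
      lines := make([]string, 100_000)
      for i := range lines {
        lines[i] = "some log line"
      }
      _ = buildSlow(lines) // quadratic: noticeably slow at this size
      _ = buildFast(lines) // linear: effectively instant
    }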

But you seem to be pretty into the ignorance is a virtue mantra so not sure there’s any point in engaging further.


Big O notation is a description of algorithmic complexity, not “data structures”, and it definitely isn't trivia. One of the two biggest pain points I've run into with new devs (even experienced ones) is that they ignore time complexity and write god-awful slow code.

(The other is adding network requests, eg database calls, in loops)


You’re being a tad pedantic and more annoyingly also wrong.

Big-o notation can also describe space complexity, which is a property of the data structure itself and not of the algorithmic operations. What I said was a catch-all.

As for trivia, it 100% is when literally every good data structure documentation states the big-O of that operation (+ a lot more details that can sometimes be more relevant). It's trivia worth knowing for arrays and hash tables, but not worth memorizing for the various trees, because there are so many of them and they're used so infrequently (+ implementation decisions can matter, which is why reading docs is more important).

I don’t know what problem domains you are working with, but algorithmic complexity is not the most common cause of slow code I've seen new devs run into, because the sane generic defaults tend to be relied on for exactly that reason (+ code review by someone with experience will catch obvious problems).


> I don’t know what problem domains

The addendum about network calls in a loop makes it sound like his domain uses some kind of SDK with a bad API design. There are some notorious examples which are known to fool new and uninitiated developers who haven't read the fine print into writing code that looks perfectly reasonable on the surface, but does dumb things under the hood.

My experience with new devs is more towards obsessive concern with respect to time complexity, at least where it isn't hidden behind a bad API design, fearing an algorithm with high complexity that, when measured under its practical use, would be just fine. It can be difficult for new (and even old) developers to grasp just how fast computers actually are.


I work in multiple problem domains as I lead teams on multiple projects across orgs. My comment about network requests is because one of the most common issues I see in inexperienced engineers is writing database code locally where they add a lookup and insert on some dataset inside of a loop. Regardless of language this is unacceptably slow when a real network is involved.
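
To make the shape of that concrete (the table, columns, and Postgres-style SQL below are made up for illustration; the point is the number of round trips, not the exact queries):

    package main

    import (
      "database/sql"
      "fmt"
      "strings"
    )

    // insertSlow does a lookup and an insert per element: at least one network
    // round trip per iteration, so 10k rows means 10k+ round trips.
    func insertSlow(db *sql.DB, emails []string) error {
      for _, e := range emails {
        var exists bool
        err := db.QueryRow(
          "SELECT EXISTS(SELECT 1 FROM users WHERE email = $1)", e).Scan(&exists)
        if err != nil {
          return err
        }
        if !exists {
          if _, err := db.Exec("INSERT INTO users (email) VALUES ($1)", e); err != nil {
            return err
          }
        }
      }
      return nil
    }

    // insertBatched pushes the whole thing to the database in one statement
    // (one round trip), letting the database handle the "already exists" case.
    func insertBatched(db *sql.DB, emails []string) error {
      if len(emails) == 0 {
        return nil
      }
      args := make([]any, len(emails))
      ph := make([]string, len(emails))
      for i, e := range emails {
        args[i] = e
        ph[i] = fmt.Sprintf("($%d)", i+1)
      }
      q := "INSERT INTO users (email) VALUES " + strings.Join(ph, ", ") +
        " ON CONFLICT (email) DO NOTHING"
      _, err := db.Exec(q, args...)
      return err
    }

    func main() {
      // Wiring up a real *sql.DB needs a driver and a DSN; omitted here since
      // the point is the query shape above.
    }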


Yeah, a lot of database management systems have really bad API designs, often mirroring the underlying database API directly to the user over the network without any further consideration towards the network having different constraints.

Said database APIs are typically not unreasonable when the database is running in the same memory space as the application, where latency is unnoticeable, as historically was the case when a lot of these APIs were designed. But, as you know, the model breaks as soon as you find yourself in a high latency environment, like over a network.

So then you get some weird bulk operators bolted onto the system to work around the latency issue, but they don't fit the mental model of the rest of the API designed around the idea of single unit operations. Save catching it in the fine print, they go unnoticed. Indeed, the naive programmer who hasn't yet been burned will not be able to fathom that another developer could design such a bad API and will put faith in the idea that the underlying system will somehow automatically mitigate the problem you describe. And in small scale testing it works fine, so there is no reason to doubt that notion. That is, until it is too late...

The experienced developer has learned to stop trusting other developers and bring a heavy skepticism when using another's API. This is a blessing as it means they (usually) stop making those kinds of mistakes when they encounter a bad API, but it is a curse as it means they see no reason to fix the problem. "Why don't the junior devs just know better?!" they say. And so, the cycle repeats.


A dev being able to tell you the time complexity of their code isn’t trivia; it's a very important tool for documenting, discussing trade-offs, and generally planning for time complexity at scale.

I’ve worked and led teams in multiple problem domains. Devs not understanding how to optimize to avoid slow time complexity is definitely one of the first nuts you have to crack in a new engineer.


I don’t know, I generally haven't seen these be a problem. I think I've seen accidental n^2 complexity a couple of times at most, and it wasn't code written by juniors but rather an oversight in adding nested loops somewhere they didn't need to be. Now, I have seen juniors who couldn't figure out what was n^2 and how to fix it when given it in an interview, but slowness and big-O being related is generally rare (as you mention, poorly considered sequences of network calls are more common). Additionally, the big-O of a data structure and the big-O of the overall algorithm solving some higher-level problem tend to be fairly decoupled, except in some very specific problem domains. I'm a systems engineer who has worked in OS, embedded, mobile, and distributed systems, and I have seen big-O analysis of the high-level code documented about 0 times (and it would have been helpful about as many).

Again, not saying it’s irrelevant and it might be highly relevant in some problem domains. But in general big-O of the common data structures should be memorized with the rest being trivia that should be accessible in the documentation for the data structure (or by looking at the implementation and doing it yourself).


> A dev being able to tell you the time complexity of their code isn’t trivia

I take it that you didn't bother to read the thread you are replying to? There is no reason why you need to have the time complexity of any given function memorized should someone ask. You can – better yet, they can – read the documentation or, failing that, read the code to answer the question. If you happen to have it on hand, that's great and all, but it is only necessary if the asking party sees themselves as a trivia master.

> Devs not understanding how to optimize to avoid slow time complexity is definitely one of the first nuts you have to crack in a new engineer.

You never really crack that nut, no matter how experienced, as many problems have no known faster time complexity solution to provide the 'how'. To begin to try to crack that nut you're going down the road of a researcher, not writing day-to-day software, and even then your research is not guaranteed to yield results.


No, it is definitely trivia. You can just as easily look it up on an as-needed basis.

Whether or not developers choose to observe time complexity is orthogonal.


Look up what? If I ask a dev the time complexity of a function they’ve just written, what are you expecting them to lookup? They should be able to tell me on the spot and we can discuss if it’s optimal


How does that meme go? Oh yeah: Tell us you haven't read the thread without telling us you haven't read the thread.

But, okay, let's entertain your random tangent. Such a request is emblematic of junior managers/junior manager hopefuls misguidedly trying to wedge their knowledge into the function of the operation. In reality, you don't want to have to build a team that requires discussion about such matters. You need to define clear performance requirements upfront and let the benchmarks speak for themselves. Even if a function is O(N!), if it measures successfully within the needs of the application, that's good enough.

It is especially important to encode these requirements in benchmarks as it documents for future developers the same requirements, and ensures they stay within them as code gets modified. The reality is that you won't be around forever. Keeping a team that relies on you to ask such questions is a failing on you.

But let's accept, for the sake of discussion, that you have failed your team. One still does not need to remember that kind of information. The function they wrote contains what they need to know, and they can consult the function to ascertain that information as needed – just as you could all by yourself in this scenario. If you like keeping such trivia in active memory, good on ya, but it is never necessary.


That’s not what this thread is about. We're saying that the time complexity of various operations on more obscure but still common data structures (eg btree, trie, etc) is trivia that is typically documented by the library providing said data structure. That is distinctly different from being able to perform a big-O analysis on a given piece of code.


As a concrete example, it sometimes happens that I'm working on existing code trying to make it faster, because it just isn't up to its task at the moment. Looking over the code, I can see that it has some code that does a LOT of lookups to see if something is in a collection. And I can see that the collection is using a linked list, which is horrible for lookups. This is something I can notice just browsing through the code because I _already_ know the implementation (and performance) details of a linked list.

At this point, I can take a look at the rest of the code and see if there's some specific reason a linked list was chosen, or if it was just that the developer didn't put much thought into it. If there is a reason (order matters, etc), I can consider what other structures might _also_ suit that need, but also give faster lookups. And, because I know of a variety of data structures, how they're implemented, and what their characteristics are; coming up with some initial choices (or starting points) happens fairly rapidly.
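
A minimal Go sketch of that exact situation, assuming it turns out order wasn't actually needed (names are made up):

    package main

    import (
      "container/list"
      "fmt"
    )

    // containsList walks the whole linked list on every call: O(n) per lookup,
    // which is what hurts when the surrounding code does a LOT of lookups.
    func containsList(l *list.List, want string) bool {
      for e := l.Front(); e != nil; e = e.Next() {
        if e.Value.(string) == want {
          return true
        }
      }
      return false
    }

    func main() {
      items := []string{"alpha", "beta", "gamma"}

      l := list.New()
      set := make(map[string]struct{}, len(items))
      for _, s := range items {
        l.PushBack(s)
        set[s] = struct{}{} // map-backed set: roughly O(1) membership checks
      }

      fmt.Println(containsList(l, "beta")) // true, after walking the list
      _, ok := set["beta"]
      fmt.Println(ok) // true, via a single hash lookup
    }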

So yes, I could get all this done without knowing all that information. But it happens a LOT faster because I know it. I notice the issue faster. I have some starting points for options faster. Etc.

Not having that information would not prevent me from doing my job. But having that information makes me _better_ at my job. Faster, more efficient, safer, etc.


I've come across concrete examples like this before, and solved them, so here's a slightly different take: I'm principally an engineer. In general it doesn't even matter what topic, though obviously there are some I know a lot more about because of experience. Programming is one of them. I don't have a CS degree; I've seen terms like b-tree a ton of times but couldn't even begin to explain what one is. You get the point.

Now: the way I approach an example like this is similar to what you do, but the difference is that I'm merely aware there's a bunch of different containers out there which behave in different ways wrt lookup/add/remove/... and for the vast majority of them I have no clue about the implementation. I don't need such implementation knowledge to realize that another container might be the trick here. I also don't really need that knowledge to decide which container that eventually might be, nor to come to conclusions fast (since you're stressing that): that's a couple of minutes to look up. Moreover, the final decision stage will be performance testing anyway, because measuring under actual usage is the only way I'll ever be convinced one is effectively better than the other. That's usually what is going to take time, not picking one out of x. In the extreme it might even be that I'd first need to write a wrapper to have a configurable container type, just so people can test it under different workload types, etc.

Obviously I've been through this a couple of times and by doing so became aware of the main traits of various containers/algorithms/..., and yes, that makes me somewhat faster in the decision process. Likewise there have been certain domains in which I did dig down all the way to the very bottom, and obviously sorting out problems in those domains will be faster for me. But in the end: for me personally 'x algorithms programmers should know' is useless. Job-dependent perhaps; on the other hand, I've seen enough people struggle to come up with actually, practically useful software exactly because they kept getting lost in implementation details.


> But in the end: for me personally 'x algorithms programmers should know' is useless.

I guess for me, it boils down to something like "Given a software developer A, adding more experience to them will generally make them a _better_ software developer; and most of that is the knowledge they gained in that experience". Knowledge makes one a better software developer. And knowing more algorithms and data structures is one form that knowledge can take.


Do you just eye it up, or run a profiler to narrow down where the hot spots are?


I'm working in Java atm, and my general steps are

- Take a quick look at the symptoms and code and see if it's obvious

- Take a look at the jstacks and see if there's anything that stands out (lots of code stuck in the same place)

- Take a more thorough look at the code to see if anything stands out

- See if I can reproduce the issue locally

- From here it varies wildly

I rarely run a profiler just because it's difficult to put it into place in a production environment (which is generally the only place these issues come into play, unless you set up the conditions artificially on a lower environment). That's not to say never; I just try to identify the issue other ways first.


I think implementing some data structures like hash tables, lists and trees is the best way to get a feel for when one is suitable and what their traits actually are apart from just “has O(1) insert”. You can accomplish the same by lots of use, benchmarking, studying their source code etc too. But implementing at least to me is the quick and fun way of developing this gut feeling. It’s also not that rare to suddenly have the need to implement something you’d usually find in a library because you need some unique variation of an algorithm/data structure.

Whether this is truly fun I guess depends on where you are on the developer spectrum. I’d much rather develop low level libraries than app code so any chance I get is just a great opportunity to dig into the low level stuff.


Whatever you work in, chances are you run into situations where you need to choose between multiple solutions, optimize things or generally find out why things aren’t optimal. Your needs are always your needs, and if you are informed about how different tradeoffs compare, it helps to understand how they work.

To keep things more substantial, consider facing issues storing data at your company. Maybe the nature of your data flow causes constant slowness because your B-Tree indices are being constantly rebalanced on writes. You'd know this because you understand how B-Trees work, and you'd also know that LSM trees might be a better data structure for your needs.

And so what I mean is that understanding the big blocks you use at a high level often requires understanding data structures and algorithms, even though you don't need to implement them.


It helps to make informed decisions, for example regarding memory locality, and it helps when they don’t perform as expected or required. It’s also difficult to get a good understanding of the properties of algorithms and data structures if you’ve never looked at how they are implemented under the hood. Learning how algorithms and data structures are implemented is probably the best way to learn the general principles.

And then there are also some algorithms, like for example topological sorting, that are typically implemented as part of application-specific code, not by use of an existing stock implementation. When you know nothing about implementations, you might not notice that the code you're writing happens to actually be an instance of a well-known algorithm or data structure.


A much shorter, less amazing answer: it’s hard to know what you don’t know! This sort of background knowledge is exactly what big tech claims to look for in applicants, because it comes up in tiny little ways at unexpected times.


Depends on what you want to work on. But if you're picking data structures without understanding the implementation details, then you're not grokking the decision; you're comparing traits you happen to know about and ignoring the ones that aren't explicitly called out. Basically the one trait everyone thinks about is O(1) or not. But the design space for data structures is vast, and the CPUs they run on are complex. Heck, sometimes an abstraction is readily available just because it provides a simple general-purpose API that we've iterated on through trial and error over a long time. That doesn't mean that design is actually optimal for your problem. In many problems you might not care, but there are definitely problems where you do start to care. All your answer is saying is "I work at a very high level of abstraction", which is fine for you, but there is real value in being able to cut through layers of abstraction (performance, bug hunting, etc.), and discarding that as trivia is myopic. It may not be valuable to you, but it is valuable knowledge I've used countless times in my career.

Think about ECS in games - that requires knowledge of data structures that transcends big-O trait comparison to exploit CPU architecture for drastically better efficiency.
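
A rough Go sketch of the data-layout idea behind ECS-style storage (struct-of-arrays vs array-of-structs; this is nowhere near a real ECS, just the cache-friendliness point):

    package ecs

    // Array-of-structs: a system that only needs positions still drags the
    // velocity and health fields through the cache for every entity it touches.
    type Entity struct {
      PosX, PosY float64
      VelX, VelY float64
      Health     int
    }

    // Struct-of-arrays, ECS-style: each component kind is stored contiguously,
    // so the movement system streams through exactly the data it needs.
    type World struct {
      PosX, PosY []float64
      VelX, VelY []float64
      Health     []int
    }

    func (w *World) Move(dt float64) {
      for i := range w.PosX {
        w.PosX[i] += w.VelX[i] * dt
        w.PosY[i] += w.VelY[i] * dt
      }
    }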

You might ignore memory allocations, but the under-the-hood allocations can be important in some cases to make sure things play nicely. Or understanding cache impact and why using a linked list may not be a great idea. Or when sorting + searching may make more sense than just searching.

The more you know the better able you are to make effective decisions and people who work strictly in abstractions tend to be limited in their ability to solve problems that cross abstractions.


People will come up with their reasons, but the thing is that even if you do learn these algos once in your life, and implement all these structures, unless you use them often, you'll very likely forget them. I did a lot of them 20 years ago, but completely forgot.


> what's the point of knowing trivia about data structures and algorithms if you will never implement them, or even be aware of whether you're using them or not?

Because:

1: Sometimes you need to choose the algorithm, even if it's implemented for you. (Many languages provide many different kinds of collection types, varying on some small details that appear unimportant, until they are very important to your specific need or use case.)

2: Sometimes knowing details of the algorithm helps you avoid edge cases and other inefficient behavior.

3: Sometimes if you know the algorithm exists, you know what kind of library to search for, instead of reinventing the wheel.


It is much easier to remember the traits of data structures if you know how they are implemented.


Not really. There are two key traits one needs to memorize for general purpose programming: time and space complexity. Literally a “cheat sheet” with a list of order notations is sufficient to memorize.


Just because they are general and relevant to almost every problem doesn't mean there aren't other traits you need to be aware of that might be relevant to your problem.

It’s like the Microsoft Word problem. 10% of functionality is used by everyone (eg saving a file, opening a file, changing the font). Those are baseline “everyone should know this”. The remaining 90% is still valuable and used by distinct subsets of customers. If Microsoft started deprecating the long tail of features without careful cohort analysis they’d start losing swathes of customers.

So for the most common operations of the most common structures I'd expect memorization of complexities just through repetition of use (or deriving/guessing them from first principles, because it's easy for the most common ones). But there are many more traits, and it's important to understand them (eg what are the primary drivers of CPU time when accessing a hash map? An array? A linked list?).


What you describe are relatively niche or uncommon uses of these algorithms and data structures. Further, the specific implementation details of a given algorithm become relevant in these contexts. Different implementations of a hashtable can lead to substantially different operational performance, and no textbook understanding (including the toy implementations performed as part of a CS curriculum, for example) of what a hashtable is will lead to the kind of understanding you discuss.


Of course. That’s why good libraries will describe the specific implementation. Not just hash table but whether or not there’s open or closed addressing, how buckets are chosen (eg cuckoo hashing, Robin Hood, etc). A text book implementation is simply meant to give you a starting point understanding of the design space but you need continuing education to become more of an expert to understand when this stuff matters. But this expertise is valuable across problem domains and levels up your technical expertise. At my level all the engineers I’ve ever interacted with have a solid grounding in the fundamentals.


I think in C++ it can help to know how they're implemented under the hood just to be aware of when it's allocating heap memory


If you don't understand how they work, you may end up choosing a wrong container for the job (e.g. a list instead of a dictionary), and you will be tied to finding modules that do a specific job in a case when it would be trivial to just write something yourself.

In many cases it will be impossible for you to write certain parts of code, or you will end up mashing random libraries together, ending up with a program that works 1000x too slow, instead of just writing a few lines of code that solves the issue.

This is especially true for graphs/BFS/DFS and binary trees/search algorithms.

Also, without training your understanding of basic algorithms, you may be unable to write a lot of code that goes beyond wrapping basic libraries. Not to mention being able to write the libraries themselves.


I think you're asking a far more general question than you realize about things like professional standards, bodies of knowledge, what is or should be considered canon or foundational, how a curriculum should be constructed, how training programs should work, whether or not specific careers should require licenses, how they should be given. This doesn't apply only to software developers. If you're going to end up being a vascular surgeon, do you really need to know or understand anything about physics and chemistry? Does a contract lawyer need to know anything about legal theory or criminal law? Should a prison warden study ethics and social theory? Should a homebuilder learn about materials science and Newtonian static dynamics? A skyscraper builder? A bridge builder? Where exactly lies the divide between researcher and implementer? Engineer and hacker? Engineer and technician? Technician and hobbyist? Should someone who is good enough to make something themselves that basically works on the happy path be allowed to advertise and sell what they create to others? Should that practice be regulated at all and, if so, how? Most professions have better, widely-agreed upon answers to these questions than software development because they have existed for longer.

I would personally say there is definitely no answer to "what should all programmers know" other than at least one programming language along with its associated tooling and enough about a platform software can run on to deploy it. Beyond that, across the full spectrum of software lifecycle careers, I'd say we've started converging on somewhat of a hodgepodge depending on role and where the standards are coming from. IT technician type careers have various certifications coming from platform providers like Cisco, Redhat, and Microsoft, and industry groups like CompTIA. Corporate security gets things like CISSP. If you go to a university, majors in computer science will give you what we expect academics in the subject to know before they can run labs or teach, and software engineering majors some idea what we might expect team leaders in large software organizations to know about how more general engineering principles and practices get applied specifically to software. These tend to have a lot of commonality in terms of at least being familiar with the basics of operating systems and networking, information security, discrete math, data structures and algorithms, programming language paradigms and constructs, possibly some bit of how compilers and maybe processors work.

Then you get what large and well-known employers expect implicitly from their interview and promotion processes. This seems to have also converged largely on at least including basic common programming language constructs, data structures and algorithms, and possibly some minutiae related to specific languages, toolchains, platforms, ecosystems, and possibly larger principles related to things like distributed systems, databases, how the Internet and web work, depending on the nature of the major product lines.

No single person needs to know all of that, but if you do, you're probably reasonably well-prepared to understand the hows and whys of any well-known or largely-used software system. It definitely doesn't mean your daily life as a developer will ever involve directly using that knowledge any more than a practicing psychiatrist remembers and can recite the exact details of the Krebs cycle. But having to learn it still might help guard a bit against susceptibility to bullshit quackery from drugmakers trying to tell them a miracle cure can do something that is metabolically impossible.


I'm going to somewhat agree with your assertion, because I'm one of those developers that builds my business by simply plugging what are essentially legos together and googling stackoverflow solutions to my architecture problems.

However, a small anecdote. I recently obtained my pilot's license, and one of the things you'll quickly learn in aviation is that there is a massive amount of information that you are required to learn that you will almost certainly never need, except possibly for that one moment in 1000 hours of flying when something happens in the air, and you simply can't pull over to the side of the road.

Developers who have an understanding of the how -and- why when they are required to solve a problem vs just plugging a library in will have a much better career developing solutions and leading teams that do exceptional work. Back to my anecdote about aviation, understanding the "how -and- why" could mean life or death.


> (...) there is a massive amount of information that you are required to learn that you will almost certainly never need, except possibly for that one moment in 1000 hours of flying when something happens in the air, (...)

I don't think this logic applies to software development in general and algorithms in particular.

When you need to roll out a b-tree from scratch, just crack open a book and read up. The worst thing that can happen is that some acceptance test fails and you roll back a change. That is hardly a life-or-death scenario.

The word "soft" in "software" has meaning.


Gp's point is about how there are many things worth learning even if you can't immediately see how they will be useful, not about urgency or timing. Sure, you can pick up a book and learn about b-trees if you have to, but you might not even know that a b-tree is what you need in a particular problem if you never learnt it in the first place. Most algorithmic problems in the wild don't present themselves cleanly with a signpost saying 'this is an X algorithm problem'.


> When you need to roll out a b-tree from scratch, just crack open a book and read up.

How would you know you need to use (or implement) a B-Tree in your solution VS some other data structure?


> Developers who have an understanding of the how -and- why when they are required to solve a problem vs just plugging a library in will have a much better career developing solutions and leading teams that do exceptional work.

Will they? Or do you just like the idea of a world where that is true?

Programming is (almost) never life or death and honestly is rarely even in the same ballpark.

Most people don’t need (or want) to do exceptional work.


Realistically, do most pilots remember these things?

I also don’t understand the career claim. How does that work?


Yes, as they’re literally life-and-death in many cases. Getting a pilot license requires one to demonstrate the ability to do certain “not common but not exactly uncommon either” recovery things from memory. And different bits of those “1000 things” play a role in different scenarios. If you’re lucky and at high enough altitude to have time to look up a checklist in the POH it’s great. When you have seconds because you just took off or something screws up on final you don’t have time to go through the POH or futz around with the checklists in your avionics (if they’re even there).

The pilots that don’t remember these things when they’re needed are dead.


That is probably right for most jobs in the industry. But it is not always so. You can use these algorithms in side projects (I did so in Go, for instance, because it didn't have sets implemented and I wanted to understand the language better).

Or maybe you lust for a career in the gaming industry, which, yes, has its own engines ready to use, but you might have to do optimisations of your own. Or work with SoCs and embedded systems. Maybe you would like to work with Linux core tools one day.

I think there is a broader world than that of the common jobs in the silicon valley.


I always joke that the best thing I learned in university is the ability to learn. Most of the actual things that were taught there were pretty standard stuff (compilers, algorithms, a bit of functional programming, monads, formal methods, linear algebra and other math, etc.).

Almost none of that is stuff I use on a daily basis and I've frankly forgotten most of it. And I work with successful programmers that skipped college entirely and never learned any of that. I don't find myself building compilers that often and while I enjoy the functional programming renaissance in languages in the last ten or so years, I had to relearn a lot of that stuff as I hadn't touched any of it in 20 years. A lot of the expert systems and Bayesian belief network stuff in the nineties got obsoleted by machine learning later.

While I've forgotten most of that stuff, I remember enough to get back into it when I need to. Which has happened a couple of times. Mostly, I just hit Wikipedia, read up on a bunch of things and then figure out what tools and libraries are appropriate.

And there are a bunch of things that I was taught that didn't click until I learned it properly by doing it in practice. Like dealing with concurrency. Which, as it turns out is less about formal methods (and temporal logic, which was a pet topic of my teacher) and more about engineering practical solutions to very real problems which my teacher had never experienced because he was an academic. I knew all the lingo but hadn't really experienced any of the pain. Nothing like debugging a misbehaving system where you can see all this play out in real life.

I never cease to be amazed by the amount of stuff I have to learn on new projects and have dealt with some amazingly niche stuff over the years. That's stuff from computer science, medical stuff, material science, legal stuff, and more. If you do software for companies working in some niche field, you end up absorbing a lot of knowledge about what they do, how they do it, and why.


> I always joke that the best thing I learned in university is the ability to learn.

I hope no one laughs -- it's not a joke.


It’s explicitly the point of an undergrad degree.

Traditionally the masters degree is where you start specializing and becoming an industry expert.

The undergrad degree is intended to teach you how to learn and then how to learn within your specific specialization without becoming an expert in the specialization.


This was one of the main lessons they hammered into us at UIUC in the early 90's.


I thought the same -

I was told early on at uni that you're going to forget almost everything here or things change and you won't need this info

but hopefully you'll develop an engineering mindset


> And I work with successful programmers that skipped college entirely and never learned any of that.

I also noticed this and questioned what more I have to offer. But once in a while there is this complex problem, and then you can see that some developers run into their limits, while others (mostly those with university degrees) can go further.


> a broader world than that of the common jobs in the silicon valley.

I'd hope it was the opposite: common jobs outside the valley ought to be all the plumbing of comesouttas into goesintos, and the valley ought to be concentrating on the fewer problems where deeper skills yield much higher values?

(I left the valley in the 1980s, so this may be just rose-coloured nostalgia)


Go still doesn't have Set types. The usual thing to do is to use a map from your type to either an empty struct, or maybe a bool, if I got that correctly (I've only been using Go for a couple of months so far).

Did you implement your own Set type and primitive functions?


Yes in Go I think I’m mostly tempted to reinvent the wheel every few lines of code, for the sheer lack of stdlib abstractions I’ve grown used to.


This is funny to me, because Go is often touted as a language that has "everything you need" in the stdlib. Of course, I'm coming from the nodejs world, where before you can start working you first need to evaluate and choose: typescript or not typescript and if yes what build tool chain, which test runner, which assertion lib, which mocking lib, etc etc. Not really that long ago when you had to choose which async lib (now you just need to choose which async color).

Thankfully a lot of that stuff is in nodejs now, and there's a commitment to maintain parity with Web APIs which also helps, but there's still a lot of half-done work and path-dependencies on outside libs.


> This is funny to me, because Go is often touted as a language that has "everything you need" in the stdlib.

My usual example of something missing from the standard library is

    func max(a, b int) int {
      if a > b {
        return a
      }
      return b
    }
math.Max only operates on two float64s (due to the absence of function overloads; many standard library functions such as this predated generics), so if you want to avoid casting (and/or potential loss of precision if you're working with int64), you need to reinvent the wheel.


Go has had generic forms of the `min` and `max` functions as builtins since 1.21. https://pkg.go.dev/builtin#max


Go 1.21 is the version where it becomes noticeable that they are pushing towards (re)implementing things that were previously lacking due to missing generics support.

Like the new slices package, meant to do common operations on slices, which is brand new in 1.21.

I wouldn't be surprised to see more and more additions leveraging generics in upcoming releases.


Good to know! I admittedly haven't written much Go in the past year, and while min / max were some of the most egregious examples, I'm sure there are plenty more that still exist.


You actually don't need any of that stuff, you can just write .js files and run them in node. The node.js standard library is quite extensive.


It's much much better now, especially if you are willing to adopt the experimental modules.


I did it using maps and generics and implemented a subset of the convenience methods that python has in sets.
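
Something along those lines -- a minimal sketch with Go 1.18+ generics, not the parent's actual code -- might look like:

    package main

    import "fmt"

    // Set is just a map with empty-struct values plus a few convenience methods.
    type Set[T comparable] map[T]struct{}

    func NewSet[T comparable](items ...T) Set[T] {
      s := make(Set[T], len(items))
      for _, it := range items {
        s.Add(it)
      }
      return s
    }

    func (s Set[T]) Add(v T)    { s[v] = struct{}{} }
    func (s Set[T]) Remove(v T) { delete(s, v) }

    func (s Set[T]) Contains(v T) bool {
      _, ok := s[v]
      return ok
    }

    func (s Set[T]) Union(other Set[T]) Set[T] {
      out := make(Set[T], len(s)+len(other))
      for v := range s {
        out.Add(v)
      }
      for v := range other {
        out.Add(v)
      }
      return out
    }

    func main() {
      a := NewSet(1, 2, 3)
      b := NewSet(3, 4)
      fmt.Println(a.Contains(2), len(a.Union(b))) // true 4
    }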


Most languages implement sets using maps. The only major difference you can optimise for is the fact that in most map usage, keys are likely to exist, whereas sets are used for membership checks. Bar that, you can just as well use a map with a dummy value.


Can you give a reference for that? It sounds very unlikely to me, although I've only read/gone through two relevant standard library implementations: F# and OCaml.

Neither of them use dummy values as you suggest. They have similar implementations (both are AVL trees) but sets really do contain only one value while maps contain two. I would be surprised if many other languages did as you suggest.


Rust's HashSet is a HashMap mapping to "()", which is Rust's equivalent to "void".


That is true for the internal implementation details but HashSet provides a different API, it is not just a type alias. So you don't have to worry about semantics of what the values mean (like the bools in Go), and it has no size overhead.


Python, Ruby, Perl. Languages where hashmaps are a builtin.


Python has its own set type not based on hashmap/dict, and that's been the case for years.

The set implementation points out some of the differences, at https://github.com/python/cpython/blob/main/Objects/setobjec... :

   Unlike the dictionary implementation, the lookkey function can return
   NULL if the rich comparison returns an error.

   Use cases for sets differ considerably from dictionaries where looked-up
   keys are more likely to be present.  In contrast, sets are primarily
   about membership testing where the presence of an element is not known in
   advance.  Accordingly, the set implementation needs to optimize for both
   the found and not-found case.


Java as well: https://github.com/pengisgood/jdk-source-code/blob/master/sr...

EDIT : and Rust: https://docs.rs/hashbrown/latest/hashbrown/hash_set/index.ht... (this is used by the default implementation of HashSet).


Update: Looks like I had good reason to be surprised. Thanks everyone for the examples.


While I agree with using libraries and not reinventing the wheel, we should not ignore the fact that the latest sorting algorithms in Go, C, C++, and Rust were, if I'm not mistaken, introduced between 2020 and 2022. New algorithms and data structures are refined almost every year to adapt to new hardware and specifications.

A good engineer must understand the hardware, the data, and the algorithm.


And 99.999% of all SDEs do not need to implement these algorithms, or even understand them. And those that do should seek out that knowledge for their specific circumstances when the need arises.


99.999% of all SWEs don't know that algorithms and data structures evolve. How, then, can they choose the right library? That was my point.


They know (I hope), but it is completely irrelevant for the vast majority of the work they do. And if profiling shows it has become relevant, they can dive down into the details.


> Most programmers shouldn't be writing their own lists, hash maps, b-trees, sort algorithms, etc. Or generally reinvent wheels.

Of course one should not reinvent the wheel. But understanding the tricks and idioms used in these algorithms is useful elsewhere and in general helps you grow as a programmer.


I figured the point was learning a mental model of the solution or how it was arrived at, so this can be reused as a template for different problems.

As an aside, for non-programming mental models I found the Personal MBA by Josh Kaufman to be fantastic. Its title and marketing cheapen it a bit but the content is on point, extracting the best reusable patterns from graduate business education.


In order to consume such algorithms in any form, one should first know whether such algorithms exist at all. Moreover, they should be able to figure out which one applies best to their problem.


Depends strongly on how you define “know” - in that context I perceive it as referring to having or gaining some basic conceptual comprehension of the concept or algorithm, particularly in order to eschew reinventing the wheel in a squared shape, and thereby unnecessarily exhibiting exponential complexity.

It also depends on your field of work.

“Knowing” of them would tell one where to look and how to think.


> Most of those you'd consume in library form

It seems having great libraries is both a blessing and a curse to software engineers. On one hand, we get to build increasingly more powerful systems with these libraries without being bogged down by re-inventing the wheels -- some of which are nearly impossible to implement anyway, such as BLAS. On the other hand, it seems a waste of time to spend years studying the fundamental algorithms and data structures -- or, in a narrow view, our CS background is not necessarily a moat for us nor a foundation for us to build our expertise on. Of course, a lucky or talented few can build their careers on deepening their CS background, but many of us end up spending the majority of our time integrating various libraries and systems.


Likewise, the point of knowing most algorithms and data structures is to give the developer an idea of how expensive things are to run. And these are often the areas where people who studied CS have an edge over self-taught ppl. However, I too have not written those things since college, a long time ago.


Understanding time & space complexity is table stakes for many software engineering roles. It sounds naive to think that self-taught ppl would lack that knowledge.


One person's "naive" is another's "statistical conclusion after having worked with enough self-taught or bootcamp-educated people".

There are some good self-taught devs and a select few bootcamp-educated devs in the same sense that there are a few unwarped 2x4s available at your local Home Depot - the fact that the exceptions exist does not disprove that the majority are not worth your time.


I think for anything more complicated than (maybe) sorting, appreciating the subtleties in these algorithms goes a long way in helping you decide when to use them vs not.


Still good to know how they work. Helps with mental modeling.

“Reinventing wheels is how we get better wheels.” - Source Unknown


If they come in a library or as part of the language, that's an even better reason why you should understand how they work.

If you live in a world where a line of code assigns a variable or dereferences a pointer or emits adds and imuls, you can probably get what's happening just looking at the code.

If each line of code manipulates complex structures, sends queries to the database, interfaces with the network etc, you'd better know what you're doing and you'd better know it well.


One thing about learning algorithms from scratch is you learn how to measure code in terms of various resource costs (CPU, memory, I/O). Being able to understand the cost of a given code block is very valuable.


It doesn't seem that people actually learn those things in many cases. I encounter programmers fairly regularly who struggle to grasp CPU vs memory vs I/O trade-offs and have no mastery of tools for measuring them.

This is unfortunate because the tools can often be a cheat code. It's painful to see people fixate on one possible cause without going to fundamentals and making a proper assessment. These people all pass the traditional programming interview filters for algorithms.
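
As one concrete example of such a tool: in Go, a CPU profile is only a few lines with the standard library's runtime/pprof (minimal sketch; the file name and workload are arbitrary):

    package main

    import (
      "log"
      "os"
      "runtime/pprof"
    )

    func main() {
      f, err := os.Create("cpu.out")
      if err != nil {
        log.Fatal(err)
      }
      defer f.Close()

      if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
      }
      defer pprof.StopCPUProfile()

      // ... the workload you actually want to measure goes here ...
      sum := 0
      for i := 0; i < 50_000_000; i++ {
        sum += i % 7
      }
      _ = sum
    }

Then "go tool pprof cpu.out" shows where the time actually went, instead of guessing.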


Srsly

With technology moving faster than ever, I believe we need to cultivate more acceptability for simply using existing stuff

while rudimentarily understanding how it works, obviously ...


Sometimes the general purpose implementations just don’t cut it, depending on the data access patterns, the hardware, etc., neither do they exhibit sufficient tuning capabilities. Sometimes people find out how embarrassingly inefficient a stdlib implementation under the hood is.


Even if you’re not trying to reinvent them, understanding how wheels work is useful.


You must be able to code them manually for new jobs, leetcode style, though.


This book seems to be suffering from an identity crisis. It’s ostensibly aimed at “programmers” but the content is predominantly machine learning topics. What sort of general software engineer needs to know about GRUs and LLMs?!

Add that to the fact that it’s published by Packt - this is probably one to avoid.


It's aimed at people who will... spend money on it. Pivoted to wherever the money seems to be, trying to cash in on the hype.

If it had been written a few years ago, there would have been a blockchain chapter; now it's LLMs.


Packt’s business model is so transparent and so cynical. I hate it.


Again: Packt books are rarely good. Nothing against the author, but I have been time and again disappointed with their content quality. They go for quantity of books over even basic quality.


That's because Packt, unlike O'Reilly titles, are the shovelware of the programming textbook market. They have a pretty sorry business model. Lots of copypasta-style content slapped up into a book, then sending free copies of the book in exchange for reviews.


I wondered about that. I just looked at the table of contents, and there is far, far too much stuff in this book. Either it is 2000 pages long (it's not), or it is incredibly superficial.


They will take on literally anyone as an author. Authorship gets you kudos in the increasingly large segment of programmer influencers.


Unlike Manning, O’Reilly proper, and No Starch Press books, Packt’s aren’t even properly edited. Apress books are the worst in my experience, really zero editorial work noticeable, lots of grammatical and logical mistakes, many obvious code errors. There are a couple of better Packt books, but I don’t even bother looking inside anymore.


Just a thought: if I were a decision-maker at O'Reilly, there's no way I'd allow my brand to get diluted by Packt with these oreilly.com joint publications. Reputation is everything (or almost everything) in the publishing world, and it's why when you see O'Reilly's name on the spine, or Wiley, Addison-Wesley, Manning, or No Starch, you can expect a certain quality.


Ditto. I was asked to write a Rust for Linux book back in 2020. Rust was (and still is) not actually driving any part of the Linux kernel so I kindly refused. They very clearly stated that they are only interested in being the first publisher with a book on this supposedly green field.


I was asked to be a reviewer / editor for a book back in 2014, as a uni student with a somewhat viral GitHub library on top of a framework, but really no deeper understanding of the framework the book was supposed to be about. And what I would get for that work was a short bio in the book, a copy of the finished book, and another book of my choice.

While at the time I was flattered and almost said yes, it also felt a bit weird knowing they didn't vet people more. What did that say about the other books of theirs I had read?


I have one or two Packt books that are very good. But overall agreed.

This seems like a strange mix of "hard" algorithms like sorting or binary search, and ML things like k-means clustering. Not sure who the intended audience is.


The intended audience is the legions of people grinding Leetcode and going through our industry's ridiculous interviewing process.


They would be better served with more focused books however.


Better served pre or post interview?

I'd argue they're better served doing leetcode (given finite time) to land the job, THEN reading this book to become better at their job.


I came here to specifically look for this comment, and... well, the rest says a lot about HN ;)


It's weird how mixed quality their books are, and how many overlapping books they have. From Packt I like Sebastian Raschka's books on Python Machine Learning. Other than that from time to time they do have some good "recipes", but hardly any of their books seems worth reading cover to cover.


I think that's a case where a good author has chosen the publisher for some reason. They are good enough to be published by someone else but went with Packt possibly because it was an easy onboarding.


I completely agree. Packt as a publisher is hot garbage. That's not to say every author is bad, but even when you get a quality author with good content, you will have horrific formatting and nonsense like pages with one sentence on them, illegible screenshots, and often a complete absence of proof-reading or editing. The Packt game seems to be to simply flood search results, hence quantity over quality.


Indeed, they do very little work in selecting authors and editing the books, but they still take a big revenue cut. So they tend to attract lower-quality authors. There are a few exceptions but I agree with the general opinion of Packt.

They approached me a few years ago to write something, and Leanpub seemed a much more compelling offer.


Packt feels like Udemy to me. I know there is some good stuff, but...


Personally had an order of magnitude better experience with Udemy than Packt. Hard to find advanced content on Udemy but there’s a lot of beginner content that I’d consider rigorous for its level - can’t say the same about Packt.


It's a comparable situation, I tend to agree. I wouldn't make a 1:1 comparison but Packt is not known for publishing well-edited and written books.


Any better recommendations?


Algorithms?

- Algorithm Design Manual (3rd ed.) (by Skiena)

- Art of Computer Programming vols 1, 2, 3, 4A, 4B (editions 3,3,2,1,1) (collectively, "TAOCP") (by Knuth)

- Algorithms (by Erickson) [CC-BY-4.0]

- Algorithms (by Sedgewick & Wayne)

- Introduction to Algorithms (4th ed.) ("CLRS") (by Cormen et al.)

- How to Design Programs (by Felleisen et al.) - crossover between basic programming advice, functional programming, and some limited/basic algorithms coverage [CC-BY-NC-ND-4.0]

- Pearls of Functional Algorithm Design (by Bird)

- Purely Functional Data Structures (by Okasaki)

Publishers? Pragmatic Bookshelf, or O'Reilly (not oreilly.com-selling-Packt). Find what everyone is using as a learning/reference resource for a specific topic and use that, regardless of publisher; it's rarely if ever Packt.


I would also add James Aspnes' Notes on Data Structures and Programming Techniques (<https://cs.yale.edu/homes/aspnes/classes/223/notes.html>, CC BY-SA 4.0).


What do you think are the most worthwhile Pragmatic Bookshelf titles? I just looked at their publications list and there's a lot more of them than I had thought before I checked.


I have fond memories of the "Pickaxe" book ... I have no idea if it is still the best way of learning Ruby (I used the 2nd edition, for Ruby 1.8 I think...).

Two other books that I like from them are "Language Implementation Patterns" and "Distributed Services with Go".


From all these I feel like the best intro is Sedgewick & Wayne. CLRS is less solo-learner friendly, in my opinion. I skimmed Erickson and I liked it too.

In the "friendly+actually-useful" category, you missed "Open Data Structures" [1] ... I think it has one of the best descriptions of trees, including B-Trees which seem to be only superficially described in other books. The "Think" series is also pretty good and covers more than just algorithms [2].

TAOCP, Okasaki, Bird: these feel really academic and/or less useful for industry work. Reading HTDP wasn't as useful to me, at least: I feel like Felleisen's books describe what he "imagines" would be a good way to write software at a "real" job :-p (i.e., a non-academic setting).

Skiena: I have mixed feelings about it... It seems to have gained popularity since Steve Yegge's blog post about Google interviews, but imho many concepts in that book are better explained elsewhere. I suppose the intended purpose of the book was to serve as a reference and an index to other resources. However, it often refers to sources that are 10 years old or older, academic implementations that could be challenging to apply in a real job setting, things that are no longer available online afaict (e.g., LEDA), or hard-to-read papers, most of which lack a working implementation of the described concepts. I wonder if anybody can really vouch for it or has a personal anecdote of the book being useful or enjoyable.

--

1: https://github.com/patmorin/ods

2: https://greenteapress.com/wp/


For beginners, I can recommend Grokking Algorithms by Bhargava. Small book with lots of illustrations, which is good for building up an intuition of what's going on.


I have never been disappointed with a single book in Manning’s “Grokking” category.

It’s a great way to build up an understanding of how something works from the ground up without worrying about the specific code implementation (or using the book as a guide to write one).


Second this. It seems that anyone can publish with them, and the books are terrible most of the time, like reading clickbait articles on Medium. Terrible.


Any opinion on this specific book?


I just randomly picked a chapter discussing a topic I'm somewhat familiar with (the one on TLS handshakes), and I'm not impressed with what I'm seeing.

I don't think anybody that doesn't already know the basics of TLS would be able to make sense of it as presented there: There's an implicit assumption that an RSA public key is used for key establishment that's never mentioned, and the diagram (which is sloppily drawn too) doesn't help things either.

In the end, "a" secret key is used to encrypt something – but the core of TLS key establishment (again, only the RSA variants) is how that key is agreed upon using only the server's public key. That's arguably the entire algorithm here (the rest is protocol negotiation)!


Judging from the table of contents it seems that the first half of this book covers basic CS stuff (what is an algorithm, Big O notation, sorting and searching, data structures etc) and then discusses some machine learning basics before looking at the more recent deep learning architectures. I would not buy this book as there are plenty of (better) online resources and even better books IMHO such as the books from Bishop or Marsland.


Did you read the book? Or is this just some general judgment you are passing?

I own the book, and I enjoy it. Bought it for Xmas.


I read it superficially. And it's okayish. Again, nothing against the author.


"Generative AI and large language models (LLMs), deep learning techniques, including LSTMs, GRUs, RNNs, neural networks, unsupervised learning, Seq2Seq model, the SciPy ecosystem, Jupyter Notebooks, Keras, TensorFlow".

No, every programmer does not need to know these things. Not even close. Looking over the table of contents it is maybe 20% relevant topics and 80% soup of current hot buzzwords strung together to sound appealing and sell the latest edition of the book. And of course you can hit a button to get a monthly subscription and read all of it!

If you want some actually helpful algorithms-focused material look up the website and YouTube channel of any good university. They all have great foundational courses covering data structures and algorithms available for free online that will be much more concise and useful.


I generally agree with you. If you want classical algorithms (which are, imo, fundamental), you can't go wrong starting with Skiena's "Algorithm Design Manual" or lecture notes: https://www3.cs.stonybrook.edu/~skiena/

> the SciPy ecosystem, Jupyter Notebooks

These, though, are gems that can be dramatic quality-of-life improvements for all kinds of data analysis. I resisted for a while (writing code in a web browser? BARF), but have gradually come to have a deep affection for pandas, numpy, scipy, seaborn, etc. They're quirky, their APIs are sometimes a bit strange and unintuitive, but they make data analysis tasks go so much quicker. I'm not a data scientist; most days I'm more of a systems programmer. Being able to take log files, write a bit of Python glue to transform them into Pandas Dataframes, and then slice-and-dice-and-plot different facets of the data rapidly is a game changer.
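
As a rough illustration of that workflow (the log format, column names, and file path here are invented for the example):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical access-log lines: "2024-01-03T12:00:00 GET /api/users 200 0.123"
    cols = ["timestamp", "method", "path", "status", "latency_s"]
    df = pd.read_csv("server.log", sep=" ", names=cols, parse_dates=["timestamp"])

    # Slice and dice: p95 latency per endpoint, 5xx rate per minute.
    p95 = df.groupby("path")["latency_s"].quantile(0.95).sort_values()
    error_rate = (df.set_index("timestamp")["status"].ge(500)
                    .resample("1min").mean())

    p95.plot(kind="barh")
    error_rate.plot()
    plt.show()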


I really like David Kopec's Classic Computer Science Problems series. He has it implemented in multiple languages too.

https://classicproblems.com/


Frankly, the classic algorithms courses at universities aren't great either. The set of "classic" algorithms covered in an intro course is arbitrary and comes down to history more than pedagogy. There are some useful concepts that all programmers should know like dynamic programming or divide-and-conquer, but spending a bunch of time on relatively specialized and narrow algorithms like the simplex algorithm or FFT is not a great way to illustrate them.

The core algorithms sequence ends up feeling more like a grab bag of "historically significant" algorithms rather than a coherently structured introduction to the subject. The set of algorithms taught is not a great practical foundation for solving problems, but also aren't especially effective at teaching how to generalize the ideas and design algorithms for your own problems.

You can use a classic algorithms course as a starting point to learn more, but it's not an obviously great starting point. One grab-bag of algorithms might be better than another, but it isn't fundamentally better.


I think of a lot of the school algorithms as stepping stones and good practice. It's easy for established programmers to forget the days when a pointer was beyond mindblowing and, sure, OK, I get the idea of a function, but a function that calls a function? A function that takes as a parameter a function to call? What is this?

A linked list is useless in 2024. But it is also just about the simplest possible non-trivial data structure, and while you may not ever implement one ever again the pointer practice is good, and so I support keeping it in the curriculum, albeit I also support being very clear and up front about the fact that it is not actually useful anymore as well.
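
For what it's worth, the pedagogical core fits in a dozen lines. A rough sketch in Python, where the "pointers" are just object references:

    class Node:
        def __init__(self, value, next=None):
            self.value = value
            self.next = next   # the "pointer" to the rest of the list

    def prepend(head, value):
        return Node(value, head)

    def to_list(head):
        out = []
        while head is not None:
            out.append(head.value)
            head = head.next
        return out

    head = None
    for v in (3, 2, 1):
        head = prepend(head, v)
    print(to_list(head))   # [1, 2, 3]

The exercise is less about the structure itself and more about getting comfortable chasing references.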

Even something like a compilers course, odds are decent you're not going to ever use those exact algorithms ever again, but the general overview and practice is very valuable and the difference between those who understand those concepts and those who don't is often pretty clear to me.

You've got to bootstrap off of something and I think a lot of people fall into the unexamined trap of defeating the curriculum in detail ("well, this thing is useless, and this next thing is useless, and the next thing is useless" and therefore the entire thing is useless) and in the process cutting off any way of getting enough experience with the space in the first place to get to the point that they don't need the bootstrap anymore. There is no scenario where we sit a student down in a seat, say a few words over a couple of hours, and presto they're as fluent as any other programmer with decades of experience with the basic tools of data structure design.


> It's easy for established programmers to forget the days when a pointer was beyond mindblowing

Oh? I distinctly remember – after hearing the same story over and over that they are mind-blowing – thinking "that's it?" when I was first exposed to them. I was primed to see my life change, and all I got was a nothingburger.

But maybe it is that I still don't yet truly understand pointers, and thus have not reached the point where my mind gets blown? What have I missed?


Then for you, it was some other thing. Nobody is born knowing how to program, and nobody learns how to program by starting out on day one by architecting a cloud-native scalable solution for a real-time chat client for one hundred million users a day with five nines of reliability, infrastructure-as-code, source control, repeatable deployment, high-quality code review practices, etc.

Nowadays I have a hard time finding a task even a 4-year-grad can do right off the bat without another 6 months of intensive on-the-job training. Asking for the programming curriculum to cover only directly-relevant tasks is pretty much guaranteed to pull the ladder up well above anybody's heads. Especially with the improvements in the past 20-30 years with getting basic algorithms like search and sort reusable (I got into programming just on the tail end of when you might still be expected to write one yourself), there just isn't anything left to give a computer science freshman to implement that isn't already some combination of obsolete and provided by library in every environment they'll ever use.


> Nobody is born knowing how to program

I think we can agree that nobody is born with the language needed to express a program. But can we equate that to having no innate sense of how to program in the abstract?

One of the newer programming languages on the scene, GPT, has shown that many people who didn't think they could program are suddenly able to thanks to a syntax that is nearly identical to languages used for regular social communication. It seems people could program all along, they just lacked a way to express it.

It seems fair to say that a "cloud-native real time chat client" also requires understanding a lot of implementation details that have been built up over decades. Those particular systems are no doubt something that needs to be learned to work with. But that's really moving beyond programming in and of itself.

In a similar vein, are people born able to conceive of a story? I think we can agree that nobody is born with the natural language necessary to express it to others, but if you were raised by wolves in the forest would you still have some semblance of a story floating around in your mind? Understandably we lack the tools needed to give a definitive answer, but the best evidence suggests yes. So, why not programming?


I suspect you have not spent any time with a child lately in a teaching context, if ever. Yes, I'm perfectly comfortable saying people have no innate sense of programming, even if I simply discard babies and toddlers (which is a pretty large concession to be making). Some pick it up more easily than others but nobody is simply born with it. Unsurprisingly (for an HN poster), I'm on the higher end of talent myself in general, but I still remember when I realized programs don't need line numbers, how closures work, etc. Pointers was no big deal to me but I still wasn't born knowing about them, and while pointers to pointers posed no conceptual difficulty in what they are in some brute sense, grokking what could be done with them was a process. I don't think even Knuth learned what pointers are and immediately would have derived the Dancing Links algorithm without conscious thought.


Pointers are a really common stumbling block for a lot of people, maybe because it's a memory address stored in memory (at a different address).

If you've written a little assembly, you understand all this stuff and there are no deeper secrets.


Perhaps it helps that C was my first exposure? As it conceptualizes memory as an array, a pointer was naturally conceptualized as an array index. There was no real novelty in that.

But maybe we're talking more abstractly, outside of computing? I watched my children start to learn about lists and indices around the age of four to five. Presumably the story was similar for me. If there was a mind-blowing experience at that time, it is true that I have forgotten about it (along with most other things that happened back then).


I remember in high school I was confused by pointers because it did not make sense to allocate something and return a pointer instead of just returning the thing itself.

Then in senior year of high school I discovered Haskell and realized that it really didn't make sense and I was right to be confused :)


It's quite strange to include LSTMs/GRUs/RNNs here. I can maybe see why a programmer should be familiar with generative AI, and how a neural network works at a high level, but people not working in AI (that is, people who are at least touching a tensor library - making OpenAI API calls doesn't count) really have little reason to study RNNs, let alone GRUs.

Looking at this again - the book description is "Delve into the realm of generative AI and large language models (LLMs) while exploring modern deep learning techniques, including LSTMs, GRUs, RNNs". This doesn't sound like 50 algorithms every programmer should know at all.

A very strangely put together book. It's definitely not a generic book for the average programmer, as the title seems to suggest (just way too much machine learning for that), but even if we interpret it as a machine learning book, it's really odd to include content on the TLS handshake and dynamic programming.


Thanks, saved me some time.


This is just garbage. The publisher's business model is producing as much content as possible without care for what programmers actually need to know.


The reviews for the 1st edition, which only had 40 algorithms you should know, are pretty awful.


I'm a little tired of endless lists of things I should or must know. Tons of them have been created, and they've only added to the confusion. They only increase my burden without really contributing to my bottom line. It would be different if salary scaled more directly with what I know.

Considering we can't figure out the minimum knowledge required to be employed, I don't trust we have a good idea of what "every programmer should know" either.

This "every programmer should know" meme has simply become lazy, useless, and harmful and prevented the book from choosing a good title.


30 years into my career as a programmer, and I've never had to understand how any of these worked internally, not even once.


When last did you go through a round of job interviews?

The world seems crazy about algorithms as a must-know thing to get a job.

Built large scale projects? Built production teams that can tackle any problem? Deeply understand the process of delivering value quickly without defects?

Doesn't matter. Show me how you would do a binary search. Show me that you know the whole API for our weird stack. Show me that you know how to do a systems design on a white board of a webcrawler or twitter.

Once you've done that you can do one-liner bug fixes on this legacy system that breaks routinely because we have no idea what we are doing.


Ha, yeah, totally. Things are out of whack. I did misspeak when I said "never", because there have been times when I learned some of those algorithms for an upcoming job interview, which I always failed to get if they required that.

I generally get jobs through people I know. Going back to about 2009, every FT job I've gotten has come that way, 3 or 4 jobs. So not much in the way of technical interviews, just kind of "we know you, welcome aboard."


Shelved right next to other literary gems such as, "Complexity Theorists Hate This One Weird Trick", "You’ll Never Believe This Bayesian Network", “The 10 Best Constants” and "This Is What Happens If You Don't Check Your Borrow"...


I would really add a bit of geometry. It's mostly a blend of graphs and numerical methods, but it gives a sense of the way they can be intermixed, and adds the idea that you will have a shit sandwich: you are coping with classes of errors and can't ever really know the true result, but it's good enough to produce meaningful outputs.


What is with this "N things every X should know/eat/do/not do" titles? It is patronizing and stupid.


Easy reading (promises bullet points you can skim) plus fomo (you're missing this knowledge) plus no true Scotsman (you don't know this so you're not a real dev)

Or, to summarise, it's clickbait


Reading down further I see this is a follow-up to "40 Algorithms Every Programmer Should Know". So, is this new book the same 40 algorithms from the prior book with 10 added?

If so, perhaps the real book could be called "The One Book Writing Algorithm You Will Ever Need."


Personally I'm holding out for "The ten algorithms that every programmer that needs to know 50 algorithms should know, but are dispensable for programmers that only need to know 40". I feel it could be a real gem, but when will O'Reilly deliver??


> The One Book Writing Algorithm You Will Ever Need.*

* until we publish the next one


The gp means they’ll use the same algorithm to publish the next book.


I have been working in the industry for almost 20 years and had to implement a B-tree for the first time since college a little over a year ago. Felt like I'd finally made it as a developer.


It jumps from basic graph algorithms to unsupervised machine learning. It's a "let's put shit that sells" book.


Yeah. I really can't call RNN an algorithm every programmer should know. It may be a predecessor to deep learning (which also isn't an algorithm every programmer should know), but it's purely academic.


How is this still on the HN front page 10 hours after being posted with the most damning comment thread I've ever seen?


I'm a self-taught developer and always suffered from imposter syndrome, especially with regards to "advanced" algorithms.

So, professional devs, do you use these algorithms in solving the problems you are working on, on a regular basis?


I'm also a self taught programmer. I've been coding since I was a child, and professionally for just shy of 20 years.

I took the Stanford Algorithms 1 & 2 courses that used to be free on Coursera and certainly never regretted it. I've since worked at several large tech companies and ALMOST never needed them. From my memory I've needed to implement my own binary search and my own tree traversal (BFS) once each. Both were due to an odd data structure that made it hard to use a library. I probably could've shoe-horned it in somehow, but it was easier (or maybe more fun) in the moment to implement it myself.
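
To give a flavour of the BFS case: when the "tree" is a pile of custom objects rather than something a library understands, the whole algorithm is a queue plus a visited set. A rough sketch (the children() callback and the example accessor are made up, not from any particular library):

    from collections import deque

    def bfs(root, children, visit):
        """Visit nodes level by level; children(node) yields neighbours."""
        seen = {id(root)}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            visit(node)
            for child in children(node):
                if id(child) not in seen:
                    seen.add(id(child))
                    queue.append(child)

    # e.g. bfs(tree_root, children=lambda n: n.subnodes, visit=print)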

However, the thing that REALLY triggered my imposter syndrome wasn't algorithms, but design patterns. When I started at Microsoft I'd been programming for over 10 years and suddenly felt like people spoke an entirely different language. Luckily, it turned out most of the patterns they were referencing by name I had already been using, but just didn't know the vocabulary. I spent a lot of time studying design patterns largely just to have a shared vocabulary (and not feel dumb whenever someone said "maybe we should use a ___ pattern" in a meeting), but I also learned several new ones.


Oh, design patterns ― those were a wild ride for me back in the day!

I stumbled upon the Gang of Four's Design Patterns when I was a kid, feeling like I'd uncovered some coding Bible in a caveman moment.

So, I dove headfirst into trying to understand and apply those patterns to my Java code. Let me tell you, it was a struggle. Failed attempts left me feeling pretty darn horrible and questioning my smarts.

That whole experience still gives me a shiver down my spine.

But, fast forward a few years, and I realized that, hey, I've been using some of those patterns all along—just didn't know their fancy names back then.


Is MS still hung up on GoF design patterns, many of which existed only because of how inexpressive Java and other languages used to be 25 years ago? Or is it a more modern set they were referring to?


I'm sure it varies widely between teams, but yes almost entirely GoF in my experience.


No. Despite what the book title says, 99% of programmers won't ever need to implement any of these themselves in a professional setting. There is more useful stuff for most devs to learn than this.


Implement? Never. Unless for fun, advent of code or something, but never at work.

Knowing how they work, that the algorithm even exists and when to use them? Quite often. Very useful to know that "my problem can be modeled as this algorithm problem, and then I can use some tool to solve it". And the same for various data structures. Knowing which one to use can make some things trivial, and not knowing can make it sooo slow. But would never implement them myself.
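
For example, in Python, recognizing "this is a priority-queue problem" or "this is a keep-it-sorted problem" usually just means reaching for the standard library:

    import bisect
    import heapq

    # "I always need the cheapest pending job next" -> heap, not a re-sort each time.
    jobs = [(7, "deploy"), (2, "lint"), (5, "test")]
    heapq.heapify(jobs)
    next_job = heapq.heappop(jobs)      # (2, 'lint')

    # "I need to keep this list sorted as items arrive" -> bisect.insort.
    latencies = [0.1, 0.4, 0.9]
    bisect.insort(latencies, 0.25)      # latencies is now [0.1, 0.25, 0.4, 0.9]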


Absolutely get where you're coming from!

Knowing the ins and outs, recognizing when to pull them out of the toolbox, and understanding the underlying principles—now that's the sweet spot.

It's like having a cheat sheet for problem-solving, making things click into place without having to reinvent the wheel every time.


I may be a very grug-brained „junior programmer“ but I literally never had to implement an algorithm from scratch in a 5 year „career“. Neither did any of my colleagues, who are as a rule smarter and working on harder problems than me.


What a book. Looking at the ToC, it seems overwhelming (basics, cryptography, ML, ...). I haven't seen the page count, but it must be a very thick book...


Why not extend it to pipelining algorithms and caching algorithms while you're at it? I don’t think any programmers really know how any of that works.


We all should know machine learning algos? They're cool and I've learned a bunch in school, but this is the crux of programming now?


Has anyone ever used Shell sort?

It is more complicated than bubble sort and insertion sort, but has worse complexity than merge sort, heapsort, and quicksort, which are n lg n.


It's fairly popular in extremely resource-constrained embedded systems (microcontrollers). Like bubble and insertion sort, it doesn't need a stack, and its average-case time complexity is often better than either of them. Also, it can be tuned to a certain extent using different gap sequences.
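
For reference, a minimal sketch using the classic gap-halving sequence (better sequences like Ciura's exist, which is where the tuning comes in):

    def shellsort(a):
        """In-place Shell sort: insertion sort over progressively smaller gaps."""
        gap = len(a) // 2
        while gap > 0:
            for i in range(gap, len(a)):
                item = a[i]
                j = i
                while j >= gap and a[j - gap] > item:
                    a[j] = a[j - gap]
                    j -= gap
                a[j] = item
            gap //= 2

    data = [5, 3, 8, 1, 9, 2]
    shellsort(data)
    print(data)   # [1, 2, 3, 5, 8, 9]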


Heapsort is best- and worst-case O(n log n), does not require a stack, and the code is simple; more complicated than shellsort, but not so much that you'd notice. I feel like heapsort is always a better choice than shellsort, no?
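
A rough sketch of what "simple" means here: an in-place heapsort is one sift-down helper plus two loops, with no recursion and no extra memory:

    def heapsort(a):
        """In-place heapsort: build a max-heap, then repeatedly move the max to the end."""
        def sift_down(root, end):
            while 2 * root + 1 <= end:
                child = 2 * root + 1
                if child + 1 <= end and a[child] < a[child + 1]:
                    child += 1                      # pick the larger child
                if a[root] < a[child]:
                    a[root], a[child] = a[child], a[root]
                    root = child
                else:
                    return

        n = len(a)
        for start in range(n // 2 - 1, -1, -1):     # heapify
            sift_down(start, n - 1)
        for end in range(n - 1, 0, -1):             # extract the max repeatedly
            a[0], a[end] = a[end], a[0]
            sift_down(0, end - 1)

    data = [5, 3, 8, 1, 9, 2]
    heapsort(data)
    print(data)   # [1, 2, 3, 5, 8, 9]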

If I were to make a list of 50 algorithms that every programmer should know, I can't imagine why I would put bubblesort and shellsort on the list, but include neither heapsort nor quicksort.


I agree. Heapsort is on the optimal frontier.


I did once. I needed to do some sorting in VB5 which I don't believe had any built in sorting library (or if it does, I was unaware of it or it didn't work for my use case). I think I went with shell sort because it was easy enough to implement in VB5.


A long, long time ago, when there was no built-in sorting library and I had forgotten how to implement n log n sorts.


... with their mathematical proof.

I wonder if it does include the minimal polynomial proof required for floating-point computations of transcendental functions.


I found the 40- and 50-algorithms books from the same publisher at a library; most of the content was machine learning algorithms, though.


I would have read it if it weren't published by Packt.


The whole series "* Algorithms Every Programmer Should Know" is, how to put it, not good.


I still enjoy Julian Bucknall's Tomes of Delphi book: https://www.lulu.com/content/435417

It explains everything well and is easy to apply in other languages.


Slightly O/T but am I the only person sick of seeing the word “every“ followed by the word “programmer”?

I realise this might be because I’m a very experienced developer, and these things are not interesting to me, but it always seems anything titled this way is often incredibly dogmatic.


Sick and tired. But this is working. It seems that many developers actually want to do/use/learn whatever their peers are doing. Just look at all the tweets and Reddit posts starting with "What should I use...?" or "What is the best...?"

All aboard the hype train!


Every programmer is sick of seeing this.

Joking aside, welcome to the attention economy. There is probably another book with similar content and a less dramatic title, but unfortunately it did not make it to the HN front page due to its descriptive title. There is not much you can do; just use your experience to interpret the title correctly ("50 of the most common programming algorithms"). Trying to keep the world from changing has never worked.


I have seen that shaming sometimes works, so I wouldn't abandon it. After all, there are so few social deterrents we can use that I only say "that won't work" when I have a better suggestion.


Let's do a thought experiment and imagine that we instead titled it in the passive mode: "50 Algorithms Some Programmers May be Interested in Knowing". Are you as likely to read this?


Honestly, it's so offbeat compared to the sensationalist headlines that have been optimized over the last 20 years that it feels refreshing. I would probably read it just for that fact alone!


Controversial: why? Your LLM knows it and you can ask it.


Why learn anything?


For pleasure? Sure


What's with the book cover design being almost the same as [1]?

[1] https://www.amazon.de/-/en/gp/product/1839217715/


The same publisher for both books


Yeah, I realize that, but they are almost identical for something that's not a series.



