
Why can't debugging be tested?


It can. I love using debugging problems as an interview question.

Give the candidate some simple code (~200-300 lines is probably about right) with a handful of failing unit tests. Sit them down, walk them through running the test suite and ask them to find and fix as many bugs as they can in 30 minutes.

It pays to make the first bug trivial (like, a typo or something) and make the subsequent bugs increasingly subtle. You learn an awful lot about a candidate by watching their process.

The hard thing with this sort of question is calibration. Candidates will be able to debug way less code than you think in 30 minutes. It helps to run the code past a few coworkers before giving it to candidates, to figure out if it's too hard a test.
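To make that concrete, here's a rough sketch (entirely made up, not from any real interview) of what such an exercise could look like in Python: a tiny module with one trivial planted bug and one subtler one, plus the failing tests the candidate starts from.

    # handout.py -- the code the candidate is given (bugs are planted on purpose)
    def mean(values):
        return sum(values) / len(valuse)   # planted bug 1: trivial typo -> NameError

    def moving_average(values, window):
        out = []
        # planted bug 2 (subtler): off-by-one, the last window is never produced;
        # the range should be len(values) - window + 1
        for i in range(len(values) - window):
            out.append(mean(values[i:i + window]))
        return out

    # test_handout.py -- run with pytest; both tests fail until the bugs are fixed
    from handout import mean, moving_average

    def test_mean():
        assert mean([1, 2, 3]) == 2

    def test_moving_average_covers_every_window():
        # [1, 2, 3, 4] with window 2 should give three averages: 1.5, 2.5, 3.5
        assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]

The typo fails loudly the moment the suite runs; the off-by-one only shows up as a wrong value, which is roughly the "first bug trivial, later bugs subtler" progression described above.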

The surprising thing about this test is that performance seems to be really highly correlated with seniority. Senior engineers often don't do as well as smart grads at coding problems because we often don't write that much code any more. But senior engineers seem to do way better than juniors at debugging problems. If I could only get a candidate to do one programming problem, I'd get them to debug code. I think you get a massive amount of signal about whether you'd want to hire someone out of a debugging challenge.


Debugging works extremely well during an interview because:

1. Your evaluation can be interactive: you're watching them step through, and you can stop and ask questions about their thought process, etc. If it's a theoretical problem (i.e., not on a live computer) you can even re-calibrate on the fly if they're really quick (maybe a lucky guess) at narrowing down the issue, or just ask "what if this happened instead, what would you do?"

2. Grading the candidate is subjective, just like the rest of the interview process.

The OP was talking in the context of college-style evaluation. I don't think you can apply either of these things to grading someone in a course. (1) doesn't scale, and (2) isn't a fair (or unbiased) way to evaluate students.


Sure you can.

I’ve done “prac tests” before when I was studying CS. The test happened in a computer lab. We were given some specs and had to submit programs that implemented them. (The specs were written so that grading could happen automatically.)

Just do the same thing, except the student is given some code with some failing tests. Their grade is determined by how many of the bugs they can fix within the time allotted.
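A rough sketch of how the auto-grading side could work, assuming each submission is a directory containing the student's patched code plus the instructor's pytest suite (the layout and scoring scheme here are my own assumptions, not from the comment above):

    import subprocess
    import sys
    import xml.etree.ElementTree as ET

    def grade(submission_dir: str) -> float:
        # Run the instructor's test suite against the student's patched code
        # and have pytest write a JUnit-style XML report we can parse.
        subprocess.run(
            ["pytest", "--junitxml=report.xml", "-q"],
            cwd=submission_dir,
            capture_output=True,
        )
        root = ET.parse(f"{submission_dir}/report.xml").getroot()
        suite = root if root.tag == "testsuite" else root.find("testsuite")
        total = int(suite.get("tests", "0"))
        broken = int(suite.get("failures", "0")) + int(suite.get("errors", "0"))
        return (total - broken) / total if total else 0.0

    if __name__ == "__main__":
        # e.g. "python grade.py submissions/alice" -> "score: 70%" if 7 of 10 tests pass
        print(f"score: {grade(sys.argv[1]):.0%}")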


I can think of a couple challenges. Let's say you're debugging "the website won't load."

Often debugging is a hunch, and quickly branches. One person will go down the path of trying the same site from another PC. Another will try a different site from the same browser/PC. Another will try to ping the server.

If you're running this live, there's always a chance that someone's hunch or first try basically pinpoints the problem and eliminates a dozen checks from their consideration. If you're looking for things like "did they check DNS? did they try another site?" but they skipped that and immediately figured out Apache wasn't running, do they lose points? Or do they get 100% for their lucky (and/or experienced) guess?

If you do it on paper, the major problem is basically infinite branching for every step and possible result. How do you mark that? What's the minimum you need to do to get 100%? I think it's also unfair, because even as a seasoned vet I've been deep in troubleshooting to the point of (for example) grepping the source of nginx for a specific error message -- which I'd never have even thought of two steps earlier. I don't expect anything that complex on a school test of course, but the point is that the result of each step is often the only way to even think of the next.
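For example, the "lucky guess" above -- checking whether anything is even listening on the web server's port -- collapses a whole subtree of checks into one step. A minimal sketch (the host and port are placeholders):

    import socket

    def web_server_listening(host="www.example.com", port=443, timeout=3):
        # One TCP connect tells you whether *something* is accepting connections;
        # if Apache/nginx isn't running, this fails regardless of which other
        # branches you checked (name resolution still has to succeed to get here).
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print("listening" if web_server_listening() else "nothing answering on that port")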


Sure, testing a beginner is hard here. But testing someone who has learned how to debug means testing their knowledge of the moving parts of the system they are debugging, and that knowledge can be explained by the person being tested.

E.g. "I am going to ping an IP on the outside first, to see if an ICMP message reaches the outside server. If yes, I will check whether the DNS server responds; if no, we check the physical connection."

That would be an ideal answer. Judging less ideal answers fairly is certainly a challenge, but not impossible.
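That quoted answer maps almost directly onto code, which is part of why it's gradable. A simplified sketch (the outside IP and hostname are just examples, and it assumes a system `ping` with the Linux/macOS `-c` count flag):

    import socket
    import subprocess

    def outside_ip_pings(ip="1.1.1.1"):
        # Ping a raw IP on the outside, so DNS is not involved at all.
        return subprocess.run(["ping", "-c", "1", ip],
                              capture_output=True).returncode == 0

    def dns_responds(name="example.com"):
        try:
            socket.gethostbyname(name)
            return True
        except socket.gaierror:
            return False

    if outside_ip_pings():
        print("outside reachable ->", "DNS fine" if dns_responds() else "suspect DNS")
    else:
        print("no ICMP reply -> check the physical connection")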


And what if the firewall blocks ICMP? What if the DNS server is internal to the network and is returning a stale IP? There are way too many paths down this rabbit hole.


This is exactly my point. It's even worse in a "school test" situation, because who knows what contrived scenario the instructor has invented depending on what their focus is (e.g., DNS vs. physical net vs. routing vs. Apache config).

Maybe there are multiple DNS servers for the zone that are returning different IPs; maybe one or both of them is even in a split-zone configuration, so it returns a different IP depending on whether you're internal or external. Maybe the client has manual DNS configured, or a hosts entry that's wrong.

Each of these problems would have several more layers of troubleshooting steps and branching, and it's not even a complete list -- and this is only if the problem is DNS-centric! There are hundreds of other branches for each of the other problem categories.
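That said, the DNS-flavored scenarios are at least checkable: ask several resolvers for the same name and compare answers. A sketch (assumes the `dig` CLI is installed; the resolver addresses, especially the "internal" one, are placeholders):

    import subprocess

    NAME = "www.example.com"
    RESOLVERS = {
        "internal":   "10.0.0.53",   # placeholder for a resolver inside the network
        "cloudflare": "1.1.1.1",
        "google":     "8.8.8.8",
    }

    for label, server in RESOLVERS.items():
        answer = subprocess.run(
            ["dig", "+short", "+time=2", "+tries=1", f"@{server}", NAME],
            capture_output=True, text=True, timeout=10,
        ).stdout.split()
        print(f"{label:10s} -> {', '.join(answer) or 'no answer'}")

    # Differing answers point at split-zone config or a stale record; a wrong
    # /etc/hosts entry or manually configured client DNS still has to be checked
    # separately, since dig bypasses both.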


I mean, if it is a test, you obviously test things that you taught them first, no?

Of course in reality there can be more and weirder things, especially if you are coming into an unknown network. But we are talking about an educational context here; it would not make a lot of sense to let your students run into new, unknown issues on a test unless your goal is not to educate.


More simply, you could just ask them to brainstorm 5 things the issue might be.


One of the problems with debugging is that everyone seems to approach it differently, with some approaches working better than others in some situations. That's why a group debugging session works great when you have a tricky problem that isn't getting solved by one person.

We can definitely test for basic debugging and troubleshooting skills. But I haven't found a way to consistently evaluate people who are capable of identifying and finding solutions to complex problems. These days a lot of these problems come down to framework-level experience. With the proliferation of frameworks and tools used in modern apps, it is impossible to find someone who can solve problems involving all of them. So in a big team you want a variety of such experiences to cover a wider base.

Having said that, I have been in many situations where I have had to join a debugging session involving technologies or programming languages of which I had zero prior knowledge, and have moved things towards a solution by asking what at times seem like basic questions to help others come up with the solution.


I can personally evaluate a specific person's skill. But doing this in a formal, gradable test would be hard.


Debugging is difficult to test in a university/exam setting, but can be tested in an observational sense. Stripe does it in one of the on-site interviews.


A compounding factor is that it's much harder to debug code you're not at least reasonably familiar with.

Even if you can gain such intuition by looking at a piece of code, in a realistic scenario you more often than not don't actually know in which file or method the bug is hiding. What you have is a code base and some external symptom that is wrong; the debugging process is essentially deepening your understanding of the relevant code (in relation to what it should be doing) until you understand what is wrong.

Debugging isn't being able to use a debugger or a profiler or a syscall-tracing tool or whatever. Sure, those help, but the critical skill is being able to quickly and accurately model the behavior of a system in your head, in combination with a tacit understanding of where bugs tend to occur.


I have been in enough debugging sessions to know that it is easy to separate those who are bad from those who have the basics right. But trying to know whether a candidate is good at advanced debugging is a futile exercise. You need to be in a few intense sessions with such people to know who is good or bad at these things.

I have only been able to figure this out about people after working with them for 6-12 months. Which is why it helps to keep references to such people and have them on your teams in the future ;-)



