This is not the first time human error in a BGP routing configuration has taken a significant portion of the Internet down. Is there any kind of configuration validator that could be implemented to catch and prevent this type of error? I am fairly sure this won't be the last time we hear about human error in a BGP routing config taking the Internet down.
Or is BGP intrinsically an unsafe protocol, without built-in protections against this sort of human mistake?
BGP is a fitting name for such a distinct plane of computation. Indeed, it is the border where any remaining physical concerns are cut loose and reality melts, receding behind a shroud of gateways, giving way to the vast expanse of cyberspace. Traffic whirls past. Raw elemental ether flows with abundance in this region. Any wizard who happens to experience even brief exposure to it normally considers themselves lucky. But to be enlisted to serve as a warden of the border, whether as punishment or honor, is a responsibility most high. BGP is either the final or the primal abstraction, depending on which side of the gates you most intimately inhabit. And the task of maintaining it is a meticulous and manual art.
It's really hard because it's often not really that your configuration is invalid. It's not a syntax error. It's that BGP provides a way for networks to tell each other about how they see the world, and sometimes what you say can melt down everyone else's systems.
It's a dynamic, constantly-changing system, and the effects of your actions may not always be seen - it's not always obvious how other networks behave and will react to your route announcements. And so even trying to snapshot the current state and say "hypothetically, would this change be a bad idea?" can be hard to get right.
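To make the "snapshot and ask hypothetically" idea concrete, here is a minimal sketch in Python of what a local pre-deployment check might look like. Everything in it is an assumption for illustration: the function name `check_change`, the idea of an `allowed` prefix list, and the 10% withdrawal threshold are all hypothetical, not anything a real router or vendor tool provides. It only compares your own before/after announcements; it cannot see how other networks will react, which is exactly the hard part.

```python
import ipaddress


def check_change(current, proposed, allowed, max_withdrawn_fraction=0.1):
    """Return warnings for a proposed set of announced prefixes.

    current, proposed: prefix strings announced before and after the change.
    allowed: prefix strings this network is expected to originate (hypothetical list).
    max_withdrawn_fraction: illustrative threshold, not a recommended value.
    """
    cur = {ipaddress.ip_network(p) for p in current}
    new = {ipaddress.ip_network(p) for p in proposed}
    ok = [ipaddress.ip_network(p) for p in allowed]
    warnings = []

    # Flag newly announced prefixes that no allowed prefix covers.
    # The version check avoids the TypeError subnet_of raises on IPv4/IPv6 mixes.
    for net in sorted(new - cur, key=str):
        if not any(net.version == a.version and net.subnet_of(a) for a in ok):
            warnings.append(f"unexpected announcement: {net}")

    # Flag changes that withdraw a large share of what is currently announced.
    withdrawn = cur - new
    if cur and len(withdrawn) / len(cur) > max_withdrawn_fraction:
        warnings.append(f"withdraws {len(withdrawn)} of {len(cur)} prefixes")

    return warnings


if __name__ == "__main__":
    before = ["192.0.2.0/24", "198.51.100.0/24"]
    after = ["192.0.2.0/24", "203.0.113.0/24"]
    for warning in check_change(before, after, allowed=before):
        print(warning)
```

A check like this catches the crude cases (announcing something you shouldn't, or withdrawing most of your routes at once), but a change can pass it and still be a bad idea once the rest of the network reacts.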
Now, this particular case involved internal BGP use at Cloudflare, so not everything I said necessarily applies... but some of it still does. Even internal networks can be so complicated that they may as well be un-analyzable.
I think the problems here are pretty deep, unfortunately, and have to do with our "network of networks" design.