BeyondCorp is dead, long live BeyondCorp (mayakaczorowski.com)
137 points by tptacek on Feb 11, 2022 | 54 comments


CorpIT was one of a handful of reasons why I left Alphabet. Maybe it's all rainbows and unicorns if you're a software engineer and you do all your work on a Chromebook. As a hardware engineer it was pretty miserable. You end up with 4-5 computers, only the most-recently-used of which works. Every time you need to switch you end up re-imaging. All infosec policies are designed with no regard for non-SWE use-cases. Want to move files to or from test equipment via flash drive? Infosec policy violation. Want to use Drive to move things to your untrusted machine instead? Sorry, you can't log in to your corp gAccount from a non-corp computer. Want to have two NICs so you can use one to talk to Ethernet-connected hardware directly? Explicitly an anti-supported use case and your machine will often fail to boot, can't be remotely boot-unlocked, and will frequently drop certs.

As far as I could tell it was a huge group of bored people chasing Impact and not solving real problems or even bothering to read tickets. Most of those problems were the product of previous generations of Impact-chasers.


Google's internal security was one of the best I have experienced. Everywhere else, just the broken VPN drove me nuts. Also, one discipline I have maintained across many companies is a clear separation between work devices and personal devices. I don't do personal stuff on work devices. I had multiple work devices with high-privilege access enabled and all of them worked just fine. Of course authentication tokens/certs had short expiry (rightly so) and I would have to re-authenticate (which was quick and painless and worked 100% reliably). I wasn't a hardware engineer, but I suppose for non-trivial test hardware setups I would be required to work out of an on-premises device lab. For flashing Android mobile devices, I could easily do that from my work laptop.


This has not been my experience, but I'm not a hardware engineer and haven't tried any of the things you were frustrated with, so we could both be right. I use a Linux laptop at home and a Linux desktop in the office.

For almost 2 years of mostly remote working during the pandemic, BeyondCorp has caused me trouble maybe 4 times, and 2 of those were when my office re-opened and I had to revive my security keys that hadn't been used in over a year.

There is a VPN, but I've never had to use it. They kept my corp desktop running in the office which I could use as a jump host when I needed it. Tunneling X11 over SSH worked well, and the browser-based remote desktop worked ~ok.


I've slowly come to appreciate Google's emphasis on process/policy over engineering coherence (and real security :P), so maybe I'm describing the infuriatingly futile here, but...

- Box with Ethernet port that connects straight to CorpIT (something something service account) and client/gadget USB port that shows up as mass storage; you talk to the box via a remote server using a userspace tool that speaks FUSE. Everything staged/uploaded gets archived. You could go one step further and tie this in with goma and have the box receive securely built code, so you'd only need to archive git metadata instead of giant blobs.

- Engineering-specific Drive frontend could give you straightforward token/TOTP-based access to files, with download audit logging/data archiving (...this would be so simple for CorpIT to provide...)

- PCI card with two Ethernet ports: one special port that plugs straight into CorpIT and a data port for single devices or even a local n-port switch; Linux on the card signs the special port into CorpIT via a service account and then uses custom RPC/whatever to mirror all traffic flowing over the data port. Chances are this setup would only viably do 10 Mbps, given that all traffic would be mirrored. The card would require a wakeup (and/or firmware upload) event before the PHY appears, preventing EFI boot confusion.

- "Secure bringup services" branch of CorpIT implements processes and makes services available that focus on the fundamentally open-ended requirements of hardware design, which would ideally provide more cohesive solutions than the workarounds described above.

I reckon all those bored engineers would have an absolute field day doing stuff like the above (chasing Impact in a hardware context); it's a great pity they aren't adequately enabled to demonstrate the security implications and priority of doing so.


It sounds to me like the problem he encountered stems from a flawed assumption that IP:MAC:machine is a 1:1:1 relationship and that subnet:router:default-route:external-IP:machine is a 1:1:1:1:n relationship, which is only the case for media consumption devices and cloud servers.

And so when he tried to work with a machine without exactly one (1) IP and one (1) active NIC going through the NAT router to the Internet, it immediately failed. It's surprising to me that, of MANGA, Google seems to be on the weak side with respect to IP networking.


Could you elaborate more on the control mechanisms that supported the outcome of "your machine will often fail to boot, can't be remotely boot-unlocked" for said anti-supported use cases?


As far as I was able to tell, the system didn't know which NIC it should be routing through. So it would sit at the disk encryption screen and you wouldn't be able to remote unlock it. And you don't have access to the building to go in and do it in person. Or if you successfully cleared that, the OS also rolled the dice to let you authenticate. Rebooting was painful, typically taking 5-15 minutes. And if you tried to edit configurations to try and load the dice, puppet would overwrite them. Tickets seeking help were mostly ignored, citing that dual NIC was explicitly not supported.


That's interesting to hear that you can remote-unlock machines that are waiting for a disk encryption key. I presume the remote-unlock case is the primary one.

I've long been curious how early boot works on Google servers (and TIL workstations too, although it makes perfect sense) - primarily because I want to copy the techniques myself! :D

How is key storage and device attestation actually done?


That's a good question, now isn't it?

In reality, what we can expose is that during the early boot process, our systems reach out to another system that registers their interest, and as a user (from another trusted device: phone, laptop, etc.) you can visit the web service and click a button.

How and where keys are stored is a great big "?" that's up to the implementer to solve.


So basically you have 2FA for workstation boot. That's really quite cool.

Of course I want to know how and where the keys are stored :D so I can do the same thing myself! Arguably security systems in this class demonstrate their integrity *because* their architecture is fully open, documented and straightforwardly reproducible. It isn't science if it isn't reproducible, right? Something something computer science...

That's the idealistic view, of course. In the <insert cartoon punching fight cloud here> real world, we have The Legacy PC Problem™, where secure boot isn't, TPMs can be bus sniffed, SGX doesn't really support the hacker/tinkerer exploration necessary to power defense in depth, ME is a ginormous black box that eats authentication headers like they might as well be glue... and it doesn't matter that I am a dog residing on Mars because the MDM my 2FA device is signed into has decided I'm legit.

Hence my interest in real security. It's a giant debacle, surely there are some genuinely cool wins to be had out there that truly make a dent ._.


Like most Alphabet implementations it's 80% awesome and 20% undone. Unfortunately that means it's a bad time.


The normal non-Google way is to configure initramfs to start an SSH server on a non-default port, and then you log in (from your secure workstation with a dedicated public/private key, of course) and use systemd-tty-ask-password-agent to enter the disk encryption key.

You can then monitor whether your service is up, and if you get a notification that it's down use the hosting service's management interface to reboot the machine/VM and then do the SSH thing.
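On Debian-flavored systems, that setup can be sketched roughly like this (a hedged sketch: package name, paths, and option strings assume the dropbear-initramfs package and an encrypted root; details vary by distro and release, so check your own docs):

```shell
# Install the early-boot SSH server and authorize the key from
# your secure workstation (filename here is illustrative):
apt-get install dropbear-initramfs
cat workstation_ed25519.pub >> /etc/dropbear/initramfs/authorized_keys

# Move it to a non-default port and rebuild the initramfs:
echo 'DROPBEAR_OPTIONS="-p 2222"' >> /etc/dropbear/initramfs/dropbear.conf
update-initramfs -u

# After a reboot, from the trusted workstation:
#   ssh -p 2222 root@the-machine
# then answer the pending cryptsetup prompt:
#   systemd-tty-ask-password-agent
```

Note the unlock agent only works if the initramfs uses systemd-style password queries; older setups use a `cryptroot-unlock` helper instead.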


> Even my favourite Googler has 5 laptops at home, but only some of them work, even fewer are for work, and most of them are in the other room.

That doesn't reflect my experience as a Googler. I have one corporate-issued laptop that works great for everything: software development, flashing test devices, managing production services, and browsing the internal meme boards. I don't have to switch devices. I'm not sure what relevance personal devices have. You are pretty restricted in using personal devices for work (phones being an exception).

I think BeyondCorp ended up working fantastically well during the pandemic as basically the entire company has been working via untrusted networks. It's hard to imagine using a clunky old corporate VPN from the past anymore.


The quote is phrased awkwardly and is kind of a non sequitur, but I don't think you're actually contradicting it? It's not saying that the anonymous Googler has five work laptops. It's that out of the five laptops they have at home, some are broken, some are personal devices that can't be used for work, and one (maybe two?) is a work device used for work.


I understand that, but I don't understand how it's an example of BeyondCorp not working somehow.

Maybe the 2 devices are a PC and Mac because the employee needs tools that only work on one platform or the other. But that's orthogonal to BeyondCorp.


The poorly explained point is that your users don't have 2 trusted devices at once, so if you don't like the device the user is using, there is nothing they have that you will like better, so there is no point in having one device vouch for another, or in splitting trust/privileges between two devices for the same user.


Wait, that's what that passage is supposed to mean? It does not seem obviously correct to me. Why can I not enroll several trusted devices?


You can. There's a focus on verbosity that's hiding the core claim of "many Googlers only have one device!". I don't understand what the author's getting at though; it's a non-issue at Google and they should know that. Trivially, a clear workaround is having a help desk for this situation that doesn't require device trust.


Also many/most users with a laptop do have a second trusted device: a phone.


The second trusted device is a USB FIDO key. No second computer is needed.


The author makes some good points here: devices and CAs are the most challenging part, although I think it remains feasible for some types of companies (likely more so than at Google).

I was discussing ZT with a friend recently and we agreed that one of the problems with the USGov memo (and most ZT advocates) is referring to ZT as an "Architecture". The memo paints a picture of ZT as a destination, whereas it really should be understood as a framework, culture, and design philosophy. And that makes it, by definition, a journey. Its principles are supposed to guide your architecture design, but they are not the architecture, i.e. there can never really be a point where you can call a friend and be like "Look at this, I've finally 'built' a Zero Trust Architecture". And you can't have a consultant come in and go back a few months later telling you "Alright, here's your Zero Trust Architecture". ZT has to be continuously entangled into your dev flow, ops, policies, and day-to-day technical decision making.

I also suspect that another important missing piece (whether you look at it as a journey or a destination) is how to quantitatively MEASURE progress on Zero Trust. Having precise reference metrics would help in actually enforcing the goal of the memo or at least being able to tell that company A has a better measured ZT progress than company B.

I guess, like they say, "Zero Trust is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it."

Disclaimer: Googler but I don't work on the BeyondCorp team.


I think you are making it sound more complex than it is. ZT is an architecture where people don't need a trusted network, but still have the same or better level of security than a traditional VPN.

To measure ZT progress, count the services and security controls which rely on VPN/LAN. The closer it is to zero, the closer you are to ZT.


Don't trust everything inside the gate isn't necessarily about getting rid of the gate. Like another child comment points out, ZT isn't about getting rid of VPN. There is still a place for VPNs in an organization that embraces the ZT way.


eh no. !VPN is not the hallmark of ZT. It'd be so easy just to put services on the Internet with user/pass. Does that give you ZT?


Just a plain user/pass is not going to cut it, I said "same or better level of security" for a reason.

For example, all VPNs give you a single lockout/change-password point, so you have to have at least common authentication. And VPNs have good logging for connections, so you have to have this too. And if your VPN client had device attestation, your solution should have this too. And since services on a VPN are not nearly as vulnerable to auth-bypass bugs, your webserver should have some protection for this too (an authenticating proxy?).

I think !VPN is exactly the hallmark of ZT... as long as security is preserved. If you disagree, what do you think ZT's hallmark is?

(There is a philosophical question: if a company had a crappy VPN with no logging, and multiple VPN servers with no central auth.. and went to equally crappy user/pass webapp on internet.. does this count as "ZT transition"?)


It gives you zero network trust, which is unfortunately what a lot of companies still have. How many companies really do device attestation to connect to a VPN? I haven't been in a single one. (well, one tried, but it didn't really keep you from using a third party client)

Putting your internal apps behind an OIDC proxy instead of the VPN is a straight upgrade at that point. Especially if your provider already does some checks for you (e.g. Chrome Enterprise, requiring Cloudflare WARP app)


I think you're overselling it, but not by much. Indeed, ZTA is not a destination.


But they do want to think of it as an architecture. They want some "Architecture Group" to publish a "ZeroTrust Standard" which every team will be required to mindlessly implement so they don't have to actually understand the underlying concepts. It's like those wonderful "security karate" mandatory training courses where they require you watch a video and fill out a multiple choice "test", and after that every application you build will be totally secure by default.

I think the whole DevSecWhateverOps thing fails to account for the severe antipathy large organizations have for outside-the-box solutions. A solution that requires people leave their silos, learn new concepts, or adopt new practices is just too much for them.


The vast majority of apps at Google that are available "beyond corp" are just normal web apps behind a smart reverse proxy that takes care of everything for them.


I'm not a Googler, but this massively understates the architecture they've built. The beyondcorp "smart reverse proxy" solves authentication, but the true innovation is entirely about contextual authorization. Beyondcorp just binds that context to a human's actions for systems to consume.

You can also see this publicly in GCP's Workload Identity and ALTS primitives, which enable very sophisticated policies.


I was there when it was built. Almost all the smarts are in the proxy. If you have a typical web-based app, integration is easy, not some impossible mandate.


Sure, I said nothing to the contrary. It's a minor simplification to call it a reverse proxy. The proxy is built upon a very deep investment in infrastructure, take all of the cert signing stuff, for example.

Most BeyondCorp concepts seem simple, and they are, but they depend on a lot of existing machinery, almost all of which is non-existent in pre-existing corp networks. The average tech company is currently struggling to catch up.


My point was that it is simple from the POV of the app developer. See the comment I replied to.


Do you notice all the different components on this page? https://beyondcorp.com/ There are a lot more components and concepts than just an OAuth proxy. A web developer may think it's all very simple from their perspective, but it goes much deeper.


I know in great detail the implementation and what went into it.

My point was it is simple for an app developer to integrate with.


The last paragraph is especially salient. To me, ZT is really about recognizing that perimeter security is dead, and a modern approach to authorization requires defense-in-depth. A zero trust access proxy is just one layer (and is inherently coarse-grained). The identity provider and API gateway can provide more gates. And applications themselves should implement fine-grained authorization in a manner that is complementary but independent of upstream access controls.


I don't think perimeter security is dead at all - we are just finally recognizing we need more finely controlled access. Just because a castle has a keep in its center does not mean it no longer needs its main walls.


Perimeter security needs to die before it can be reborn.

The point of all of this is perimeter security becomes an optional, small contributor towards a secure context. It turns the entire legacy corporate network on its head.

Especially at large megacorps, it's borderline impossible to perform incremental migrations towards a zero trust network without first killing perimeter security as a concept. You can always introduce it later, but it's a different beast entirely.


what's the difference between a perimeter and a proxy? Sure, you can't give everyone in the org root on everything, but that was always true even on a physical or virtual private network.


Perimeter: services totally naive to authentication and authorization. It is assumed that no bad actors could open a connection in the first place. If you are in the office or on VPN, you’re good. Perhaps there are broad compartments like different rooms in the office or different firewall profiles loaded by the VPN.

Proxy (BeyondCorp): services remain naive but employees and offices are evicted from the trusted network. Employee requests go to the proxy, which checks authentication and authorization before passing along the trusted network to services. Services and potentially engineers/SREs who can get on the production network can still make whatever requests they like.

Zero Trust: all services on the production network authenticate all requests. Even if you are root on a production box you would need tokens/certs to get useful responses from other services on the same production network segment.
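A toy sketch of that last model in Python (the names, token format, and secret are invented for illustration): the service authenticates every request itself, deliberately ignoring where on the network it came from.

```python
import hashlib
import hmac
from typing import Optional

# Illustrative shared secret; a real deployment would use an internal
# token service issuing short-lived certs/tokens, not a static key.
SERVICE_SECRET = b"issued-by-the-internal-token-service"

def issue_token(caller: str) -> str:
    """What a token service would hand to a calling workload."""
    mac = hmac.new(SERVICE_SECRET, caller.encode(), hashlib.sha256).hexdigest()
    return caller + ":" + mac

def handle_request(source_ip: str, token: Optional[str]) -> str:
    # source_ip is deliberately unused: being on the "trusted" segment
    # (or even localhost) earns the caller nothing.
    if token is None:
        return "403: no credentials"
    caller, _, mac = token.partition(":")
    expected = hmac.new(SERVICE_SECRET, caller.encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return "403: bad credentials"
    return "200: hello " + caller

# Even requests from an internal address fail without a valid token.
assert handle_request("10.0.0.5", None) == "403: no credentials"
assert handle_request("10.0.0.5", issue_token("billing-svc")) == "200: hello billing-svc"
```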

I entered the industry at a “proxy” company that is pushing towards zero trust. It’s hard to believe that the “perimeter” model is real. But you look at something like Target getting owned by thermostats in its stores that merely needed internet access, it is clear that some enterprises do work that way.


BeyondCorp is still great for all HTTP network services.

Having an nginx loadbalancer do all the BeyondCorp stuff and forward on any authorized request is pretty straightforward, and covers a large chunk of what your employees will want to do. It easily lets you allow employees to access low-risk services from their personal, unmanaged, and untrusted phones. It also means any home-grown internal service doesn't need to do auth - it can just look at the trusted header from the loadbalancer and know which user is logged in.

You have then at least substantially lowered the remaining attack surface, which now consists of just the non-http services (shared drives, SSH, remote desktop, etc.).

For those, a VPN server allows you to authenticate the user and device, and a big dynamic set of iptables rules lets you decide which sessions can access which service.
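A hedged sketch of the HTTP half of that setup, using nginx's stock `auth_request` module (the hostnames and the `X-Authenticated-User` header name are invented for illustration, not anyone's actual config):

```nginx
server {
    listen 443 ssl;
    server_name wiki.corp.example.com;

    location / {
        # Ask the central auth service to vet every request first.
        auth_request /_auth;
        # Copy the identity the auth service established...
        auth_request_set $user $upstream_http_x_authenticated_user;
        # ...and hand it to the app as the trusted header; because the
        # proxy sets it, a client can't smuggle in its own value.
        proxy_set_header X-Authenticated-User $user;
        proxy_pass http://127.0.0.1:8080;
    }

    location = /_auth {
        internal;
        proxy_pass http://auth.internal/check;
        # The auth check only needs headers, not the request body.
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
    }
}
```

The app behind `127.0.0.1:8080` then only reads `X-Authenticated-User` and does authorization, exactly as the comment describes.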


To add on top:

It's perfectly possible to control things like SSH and other services through a zero-trust system as well.

Personally, there are things like Pomerium.io that can handle HTTP(S) services, as well as TCP services.

Even internally at Google, the machine and user are used to validate SSH sessions, and the remote system has a unique policy on who's allowed to access it.

Disc: Googler, not on BeyondCorp


Not sure why it would be so hard.

All you need is a centralized authentication server (preferably including access logging) and a simple way to use it for all applications (the simplest is to have the app listen on localhost and expose it via nginx configured to use the authentication service).

I guess for real security you also need to forbid logins from non-secure devices, which is trivial if you can trust employees and only hire security-competent engineers, and otherwise needs dedicated devices, secure boot, and remote attestation (which is conceptually easy but will take some work to do).


This is a great article and gives good context for orgs trying to dive in.


> The reality is if you say you’re “doing zero trust” to a security professional today, they’ll assume you’re naïve, and haven’t realized what you’re actually signing yourself up for.

The old-school variant has the same issues, IMO? A previous employer of mine was "everything was on the company wide VPN" i.e., no SSO, half the stuff is insecure.

Externally, you could connect to the VPN with a fairly bog-standard VPN client. But how was the trust established? Well… it trusted DigiCert's CA cert. Meaning anyone who purchased a DigiCert cert and could obtain a privileged position on the network could MitM the VPN connection.

Inside the office, the WiFi was connected to the VPN (i.e., VPN client was only required off-site) … and the office WiFi was WPA2-PSK. And of course the PSK a. was not a good PSK and b. did not rotate when employees left the company.

I also worked for a larger corp. Same idea: the internal net was the trusted net. STP packets were visible on the Ethernet ports, which, IIRC from my network training, probably meant that I could convince a switch on the network that I was also a switch - please start routing me traffic.

> They’ve tried to use the tools already available in the market themselves

I've tried to use more old-school solutions, like LDAP. LDAP tooling is horridly difficult to set up. I gave up on my first attempt; I think today I know where I went wrong. Things like pfSense & VPNs are also incredibly complex, and my understanding of the security community's consensus on IPsec is that its complexity guarantees it is insecure.

> The tools that are on the market today aren’t even doing the hard part of zero trust yet.

And I don't disagree with that, either. The tools are definitely not up to snuff. Getting working SSO, even with just a decent MFA experience, & then getting a service to do authz with it has been considerably complex. & like the author surmises … even if I ignore device security.

Now we're all remote … yet security would still like to allowlist network access by IP address, meaning one of these days I am creating a common, centralized white pages of employee IP addresses because those are getting numerous. (I have mixed feelings on this. The list is a PITA to maintain. But it does keep random Internet riff-raff from ever hitting an open SSH port… and while yes, people should use keys, all it takes is one mistake. And people make mistakes.)

ZT is IMO the right approach; the network is already compromised. Whether that's at the Internet or the port on your laptop doesn't matter. Yeah, people definitely forget the device bit. (And ought not to.) But the other approaches aren't implemented with any more diligence.


> Inside the office, the WiFi was connected to the VPN (i.e., VPN client was only required off-site) … and the office WiFi was WPA2-PSK. And of course the PSK a. was not a good PSK and b. did not rotate when employees left the company.

Yeah, I've never found network topology to be a good way to manage trust. Even ignoring employees sticking Raspberry Pis under desks or reconnecting using non-rotated credentials, it's ridiculously easy to convince apps to make internal requests expecting them to be external requests. Slack had a big security incident when its url unfurler, which runs on its internal network, started making requests to other internal services, thinking that they were external websites. The problem is that internal addresses were implicitly trusted, and people have endpoints like /quitquitquit hanging around, and that combination ends up being game over.

Ultimately, every request depends on at least two pieces of information: what user is making this request, and what downstream application is making this request. Most people only take into account the first, and thus these problems recur. (Because operators get mad when they "kubectl port-forward" in and the application rejects their debug requests because "random unauthenticated HTTP request" does not meet the security requirements. This, of course, is a good thing. For security, anyway.)

> Getting working SSO, even with just a decent MFA experience & then getting a service to authz with it has been considerably complex.

Yeah, the industry seems to have decided upon OIDC, which is significantly more complicated for both the operator and the application developer. I really like the way Google's managed auth proxy works, and I'm surprised it's not more popular. If the request goes through the proxy, it injects a signed header with the user information in it. The application simply uses a few lines of code (ok, JWKS is involved, so a lot of lines of code to keep the list of trusted public keys up to date) to verify the signature and extract the username, and then can make an authorization decision. No cookies, no redirects.
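A minimal stand-in for that pattern (using a shared-secret HMAC instead of the asymmetric JWT + JWKS machinery Google's proxy actually uses, just to keep the sketch self-contained; the header format and secret are invented):

```python
import base64
import hashlib
import hmac
from typing import Optional

# Illustrative key known only to the proxy and the app; in the real
# design the app fetches the proxy's public keys via JWKS instead.
PROXY_KEY = b"shared-secret-known-only-to-proxy-and-app"

def sign_identity(username: str) -> str:
    """What the proxy would attach as, e.g., an X-Proxy-Identity header."""
    sig = hmac.new(PROXY_KEY, username.encode(), hashlib.sha256).digest()
    return username + "." + base64.urlsafe_b64encode(sig).decode()

def verify_identity(header: str) -> Optional[str]:
    """What the app runs before making an authorization decision."""
    username, _, sig_b64 = header.rpartition(".")
    expected = hmac.new(PROXY_KEY, username.encode(), hashlib.sha256).digest()
    try:
        given = base64.urlsafe_b64decode(sig_b64)
    except Exception:
        return None
    return username if hmac.compare_digest(expected, given) else None

# No cookies, no redirects: the app only ever sees a verified username.
assert verify_identity(sign_identity("alice")) == "alice"
assert verify_identity("mallory.forged-signature") is None
```

The authorization decision then keys off the returned username; an unverifiable header is simply treated as an unauthenticated request.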

I ended up writing my own proxy that uses username + WebAuthn to authenticate and pass this information on to applications behind the proxy, and it's nicer than any auth solution I've paid 100000x more for. I can FaceID into internal status pages when I'm out drinking, impressing everyone! OK, not very many people are impressed, but they can at least see the thing I want to show them. I'm surprised there's no maintained OSS thing that works like this.


Care to share your impl, since you're surprised no one else has an OSS version?


I’m surprised that this isn’t a solved problem already because we already have a currently operating global scale system for application and device trust — DRM! Don’t think about the problem like an IT person, think about it like you’re Netflix.


Can't tell if joking. If hacking Google was as easy as breaking DRM, civilization would have collapsed already.


And device authentication deployed by IT can be broken just as easily. You really believe that the device certs deployed on a laptop are going to fare any better than the Widevine decryption key?


I guess the difference is that DRM is basically a single factor with no defense in depth while BeyondCorp uses at least three factors.


The part about device-bound credentials, TPMs, secure boot, and remote attestation is DRM. One of the biggest real-world deployments of those technologies is the Xbox.


1) DRM doesnt improve appsec, it’s only about content security

2) Netflix runs a single application, whereas organizations maintain and provide hundreds

3) Netflix doesn’t need to do any device assurance or inventory

4) Netflix has essentially one level of access to enforce, vs hundreds of roles across hundreds of applications.


1) DRM absolutely improves appsec; one of the main things it does is allow you to ensure that the client software is running your unmodified code and is protected from the OS.

2) Okay, but a web client will get you 80% of the way there.

3) Yes/No. Netflix doesn’t but that’s because Google does it for them for Widevine deployments.

4) Widevine has three, pretty complex, policy levels. No reason there couldn’t be more. Plus device authentication is just a metric used in authorization decisions. Nothing is stopping you from having arbitrarily complex authz.



