For years people have essentially made a living off FUD like "ignore the literal legal agreement and imagine all the worst case scenarios!!!" to justify absolutely farcical on-premise deployments of a lot of software, but AI is starting to ruin the grift.
There are some cases where you really can't afford to send Microsoft data for their OpenAI offering... but there are a lot more where some figurehead solidified their power by insisting the company build less secure versions of public offerings instead of letting their "gold" go to a 3rd party provider.
As AI starts to appear as a competitive advantage, and with the SOTA of self-hosted models lagging so ridiculously far behind, you're seeing that work less and less. Take Harvey.ai for example: it's a frankly non-functional product, and it still manages to spook top law firms, with tech policies that have been entrenched for decades, into paying money despite being built on OpenAI, on the simple chance they might get outcompeted otherwise.
Gah, this is just not how it works. You are probably right that e.g. patient information, private conversations, proprietary code, etc. would be safe with OpenAI. But it's not the on-prem team that needs to convince the rest of the organization to keep things on prem. Quite the opposite -- every single tech person would love to make our data someone else's problem (and get a big career boost from dealing with cloud tech instead of the dead-end that is local sysadmin!).
But you just can't. You cannot trust the scrappy startup OpenAI. You can't even trust Microsoft's normal cloud offering, because the people who actually give a fuck about the risk NEED to have granular detail of what data, readable by whom, is stored exactly where and for how long, and how can you make sure, and how do you know that access is scoped to the absolute minimum number of people, and is there a paper trail for that?
For these "figureheads": the buck, stopping, here, etc.
> because the people who actually give a fuck about the risk NEED to have granular detail of what data, readable by whom, is stored exactly where and for how long, and how can you make sure, and how do you know that access is scoped to the absolute minimum number of people, and is there a paper trail for that?
You realize that Microsoft is a publicly traded company that has multiple privacy certifications? They have to submit to detailed data ownership and consumption audits. They most definitely have data ownership and retention logs, and you can request copies of their certification audits to understand how they log/track this. I think the parent is being too optimistic, but your answer is so comically simplistic it's silly. I highly suggest you read about the world of HIPAA, PCI, and FedRAMP instead of just thinking "omg the data".
Absolutely. I am well aware of this, as are most tech people; that's what I'm saying. It's not us who are trying to convince our orgs to build a rack in the basement "to be more secure" just because we want to hear the fans running and see the lights blinking.
It's the lawyers that you need to convince. Good luck convincing any bigco lawyer that your company's data is safe on OpenAI because their legal agreement says "we don't train on API calls."
Vendor risk management is a thing, and plenty of companies that work with medical, legal, financial, or sensitive government data are, in fact, able to store that data with their vendors. This happens all the time, at practically every company.
This nonsense that you can't trust anyone with your data is completely unfounded.
You're not saying anything counter to what I said.
> You cannot trust the scrappy startup OpenAI
Not saying you do: Azure has a dedicated-capacity GPT-4/3.5 offering that you can stick in your VPC, with everything from PCI to HITRUST certs. These are the things that come up if you actually care about delivering solutions vs. jumping to deliver the right-sounding words for the figureheads, like "We'd never trust those scrappy OpenAI guys!!!!"
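To make that concrete, here's a minimal sketch of what calling such a deployment looks like from inside your own network, assuming an Azure OpenAI resource that has already been locked behind a Private Endpoint in your VNet (the endpoint URL, deployment name, and API version below are placeholders, not anything specific to the dedicated-capacity SKU):

    import os
    from openai import AzureOpenAI  # openai>=1.x SDK

    # Placeholder endpoint. With a Private Endpoint configured, this hostname
    # resolves to a private IP inside your VNet and public network access can be
    # disabled on the resource itself -- that part is Azure networking config,
    # not client code.
    client = AzureOpenAI(
        azure_endpoint="https://your-aoai-resource.openai.azure.com",
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )

    resp = client.chat.completions.create(
        model="gpt-4",  # the name of *your* deployment, not the base model
        messages=[{"role": "user", "content": "Summarize this clause..."}],
    )
    print(resp.choices[0].message.content)

The point being that the network boundary, retention settings, and access scoping live in resource configuration you can actually audit, rather than in a promise buried in a startup's terms of service.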
> Quite the opposite -- every single tech person would love to make our data someone else's problem (and get a big career boost from dealing with cloud tech instead of the dead-end that is local sysadmin!).
You're attracting the least-equipped people, who tumbled into what you just admitted is a dead-end trajectory, usually paying below-market rates as a result, and then expecting them to outperform the people paying the most money for competent security outlays and with much bigger fish on the line (Azure is working with teams that need FedRAMP, DoD certs, HIPAA compliance, and much more).
The end result is that you end up with a poorly maintained leak sieve of an infrastructure in which Azure would likely be the most secure component you have to lean on in your entire organization.
You say:
> because the people who actually give a fuck about the risk NEED to have granular detail of what data, readable by whom, is stored exactly where and for how long, and how can you make sure, and how do you know that access is scoped to the absolute minimum number of people, and is there a paper trail for that
They don't care about risk, they care about flawed perceptions of risk that don't align with reality. These are the same companies that get pwned for years through some basic social engineering, and all they ever have to show for it is audit logs that show who ac... ah wait, no one ever actually checked the logs, and it turns out they're useless because subsystems X, Y, and Z aren't even connected to them.