bwillard's comments

Officially (it's up to you whether you believe they follow their own policies), all of these companies have published statements on how long they keep data after deletion (retention that customers broadly want, to support recovery if something goes wrong).

- Google: active storage for "around 2 months from the time of deletion" and in backups "for up to 6 months": https://policies.google.com/technologies/retention?hl=en-US

- Meta: 90 days: https://www.meta.com/help/quest/609965707113909/

- Apple/iCloud: 30 days: https://support.apple.com/guide/icloud/delete-files-mm3b7fcd...

- Microsoft: 30-180 days: https://learn.microsoft.com/en-us/compliance/assurance/assur...

So if it turns out that they are storing data longer than stated, there can be consequences (GDPR, CCPA, FTC).


TBH, I'd be surprised if they kept significant amounts around for longer, for the simple reason that it costs money. Yes, drives are cheap, but the electricity to keep them online for months and years is definitely not free, and physical space is not infinite. This is also why some of their services have pretty aggressive deletion policies (like recordings in MS Teams, etc).


Howdy (I work on DTP),

I wanted to share my thinking on some of these very valid worries.

Re: Copy vs. Move: This was a conscious choice that I think has solid backing in two things: 1) In our user research for Takeout, the majority of users who use Takeout don't do it to leave Google. We suspect that the same will be true for DTP: users will want to try out a new service, or use a complementary service, instead of a replacement. 2) Users should absolutely be able to delete their data once they copy it. However, we think that separating the two is better for the user. For instance, you want to make sure the user has a chance to verify the fidelity of the data at the destination. It would be terrible if a user ported their photos to a new provider, the new provider down-sampled them, and the originals were automatically deleted.

Re: Scraping: It's true that DTP can use the APIs of companies that are 'participating' in DTP, but we don't do it by scraping their UIs. We do it like any other app developer: by asking for an API key, which that service is free to decline to give. One of the foundational principles we cover in the white paper is that the source service maintains control over who, how, and when to give the data out via their API. So if they aren't interested in their data being used via DTP, that is absolutely their choice.

Re: Economics: As with all forward-looking statements, we'll have to wait and see how it works out. But I'll give one anecdote on why I don't think this will happen. Google Takeout (which I also work on) allows users to export their data to OneDrive, Dropbox, and Box (as well as Google Drive). One of the reasons we wanted to make DTP is that we were tired of dealing with other people's APIs, because it doesn't scale well. Google should build adapters for Google, and Microsoft should build adapters for Microsoft. So with Takeout we tried the specialized-transport method, but it was a lot of work, and we went with the DTP approach specifically to try to avoid having specialized transports.

DTP is still in the early phases, and I would encourage you, and everyone else, to get involved in the project (https://github.com/google/data-transfer-project) and help shape the direction of the project.


Hey! Thanks for the response. If you don't mind, I have some questions and comments after reading through your feedback.

> We suspect that [the majority of users who use Takeout don't do it to leave Google] will be true for DTP: users will want to try out a new service, or use a complementary service, instead of a replacement.

Interesting, thanks. I think this sort of worldview makes sense from a certain perspective.

> 2) Users should absolutely be able to delete their data once they copy it.

This is an aspirational statement and not a requirement of DTP, so it's problematic from a public perception standpoint to make the claim that DTP provides the user with more control of their data when the control very much remains at the mercy of the data controller. Indeed, this project directly facilitates the opportunity for more data controllers to obtain copies of the subject's data.

> If they aren't interested in their data being used via DTP, that is absolutely their choice.

Can you clarify whether you are saying that the DTP Project will honor takedown requests from parties targeted by DTP tooling?

> Google should build adapters for Google, and Microsoft should build adapters for Microsoft.

Can you explain the business drivers that incentivize these companies to provide parity between their import and export capabilities? Does the DTP Project require parity between these capabilities?


>This is an aspirational statement and not a requirement of DTP, so it's problematic from a public perception standpoint to make the claim that DTP provides the user with more control of their data when the control very much remains at the mercy of the data controller. Indeed, this project directly facilitates the opportunity for more data controllers to obtain copies of the subject's data.

I don't really disagree with what you say, but I interpret things differently:

Without DTP, if you ask a data controller to delete your data, you have to trust that they do. There is almost no way to verify that the deletion actually happened; you more or less need to rely on the reputation of the company. Nowadays they all should have published retention statements that describe their deletion practices in more detail, which helps some and allows for some recourse if in fact they aren't following them. But in general, for the average user, it mostly comes down to trust.

With DTP, nothing gets worse, but users can now get their data into a new service more easily.

If DTP had move semantics you would still have the same problem as above: it mostly comes down to trust.

It is true that after a copy there are two copies of the data, which isn't ideal in terms of data minimization. But for the reasons I outlined previously, I think it is important to keep deletion as a separate action from copy. I do think that after a copy the option to delete the data should be presented to the user prominently, to make it as easy as possible if that is what they want to do.

So DTP isn't trying to solve every problem, but my take is that it makes some things better without making anything else significantly worse, so it's a net win.

> Can you clarify whether you are saying that the DTP Project will honor takedown requests from parties targeted by DTP tooling?

DTP doesn't really store data, so I don't think it is in scope for a traditional takedown request. But, more to the spirit of the question: yes, if a service doesn't want to grant a DTP host an API key, or revokes one, we wouldn't condone trying to work around that.

(One super-detailed note: DTP is just an open source project, and doesn't operate any production code. A Hosting Entity can download/run the code. A Hosting Entity could be a company letting users transfer data in or out, or a user running DTP locally. Each Hosting Entity is responsible for acquiring API keys for all the services they want to interact with, including agreeing to and complying with any restrictions that service might impose for access to their API.)

> Can you explain the business drivers that incentivize these companies to provide parity between their import and export capabilities? Does the DTP Project require parity between these capabilities?

This is a little bit of a bet on our part. I think Google has demonstrated, through its almost decade-long investment in Takeout, that giving users more control over their data leads to greater user trust, and that is good for business.

As for requiring parity, we cover this a bit in the white paper, but, as you say, we recognize that reciprocity is key, and we need to incentivize services to invest equally in import and export, otherwise the whole thing falls apart.

Right now the stance we are taking is that reciprocity is strongly encouraged, and we will be collecting stats/metrics to try to measure it so we can name and shame folks who aren't following it. We hope that providing transparency around different services' practices in this area will allow users to make informed decisions about where to store their data.

An interesting thought experiment in this area: if a user wants to transfer data from service A to service B, but service B doesn't allow export back out, what should service A do? Ideally you force service B to support export, but on the other hand the user should be in control, and who is service A to say no? It's almost pitting the good of an individual user against the good of the ecosystem.

We are hoping that as the project, and the larger portability ecosystem, evolve, some kind of neutral governance model emerges that can help mediate some of these issues. It is problematic for service A to decide that question, but a neutral group representing the interests of users would have more legitimacy in answering these tough questions.


Thanks for taking the time to provide these detailed follow-ups. I'm still pretty wary of this project, but you've demonstrated that at least one person on the team is thinking through some of this stuff.

> An interesting thought experiment in this area: if a user wants to transfer data from service A to service B, but service B doesn't allow export back out, what should service A do? Ideally you force service B to support export, but on the other hand the user should be in control, and who is service A to say no? It's almost pitting the good of an individual user against the good of the ecosystem.

I'll offer that the European Union's answer to this -- the GDPR -- is to put the data subject first. It would be nice to see the DTP Project align with that position.


Please define 'delete' in this context. I'm afraid that if I transfer my data, the original will never be deleted.

Now I've doubled my problem.


In this context, "delete" should probably be understood to mean "removed from production systems, and retained only to the extent required to meet legal obligations".


Howdy (I work on DTP),

I'd say suspicion is always warranted with things like this. If you know of other services you would like to see data transfer to/from, please let us know; we want this to be open to everyone, big and small, and are looking for suggestions.

FWIW, the team building this at Google is the same team that builds Takeout, so we've been trying to give users useful tools for leaving Google for a while now. We think giving users the ability to directly move data to a new service provider is the next evolution of the Takeout ethos of not locking users in.


Thanks for your effort

I'd like to be able to transfer all of my Maps data: the places I have marked, by both name and location, as well as tags, reviews, and photos.

Places I could exchange this data with include OSM, Apple Maps, Yelp, etc.


Thanks for the reply. I've had some downtime today to look over the documentation. It looks pretty solid, though I still have a way to go, and I'm actually planning on becoming active on GitHub for your project. The whole Java thing kind of irks me, but hey, I did fine with it in college.

How welcoming of contributors is the project?


Super welcoming :).

Re: Java, yeah, it's in the roadmap that the adapters should be language-agnostic. Forcing them all into one language, regardless of which it is, is kind of lame.

Check out https://github.com/google/data-transfer-project/blob/master/... for ways to stay in contact with us and start contributing.


Awesome! Thank you for taking the time to respond.

I will try to refrain from criticizing the project until I am more familiar. The next time I interact with this project will be on github, my username there is the same as it is here.


It turns out reports of the Data Liberation Front's demise have been greatly exaggerated. We are still very actively improving Google Takeout[0] as well as developing other products[1]. The thing that didn't last was running a quasi-official Twitter account, perhaps without official blessing.

It's true that the Data Liberation Front did start as a group of 20%ers, but, like the much less successful Gmail, we got turned into a real team, and now we have reorgs and TPS and everything.

[0] Check out the new My Activity service as an example of some of the new data we have added: https://takeout.google.com/settings/takeout/custom/my_activi...

[1] https://thejournal.com/articles/2016/12/09/google-unveils-gr...


> The thing that didn't last was running a quasi-official Twitter account, perhaps without official blessing.

Running a quasi-official blog without blessing didn't last either [0]. Glad to hear the team got turned into a permanent one internally, but the lack of public info on its existence doesn't fill me with confidence that this is something Google wants or prioritises at a management level.

The GP was talking about companies' engineers writing "user-friendly data-portability features. Across all product lines". Is that the case with Takeout at Google, or is it just a job given to one relatively small team of self-starters working on a project that was never spearheaded by management?

[0] http://dataliberation.blogspot.com/


I am having trouble figuring out how in the world they can make the numbers work.

- The lowest possible rates I've seen for personal loans are ~9%.

- From the article: over 50 years the "cumulative divorce rate of ever-married women approached 40 percent".

- 22% of divorces are caused by financial difficulty (https://www.institutedfa.com/Leading-Causes-Divorce/), so assume there is no money to get back there.

So for an investor to think giving out this type of loan is better than a standard personal loan the minimum rates they could offer would be 9 / .6 (stay married) / .8 (already bankrupt) = ~18.75%.

And there are all kind of unfavorable terms in this loan:

- it's a bullet loan so there is no incremental income stream (at least until the divorce)

- you have to track people's marital status to see when loans are due

So for the best qualified people you have to charge ~18.75% to come close to breaking even on the returns from a normal personal loan at which point you start having to worry about usury laws that cap the interest that can be charged.
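The back-of-the-envelope math above can be sketched in a few lines; note that the discount factors and their labels come straight from this comment, not from any actual underwriting model:

```python
# Rough break-even sketch for the hypothetical divorce loan,
# reproducing the comment's arithmetic: 9 / .6 / .8 = ~18.75%.
base_rate = 9.0              # lowest typical personal-loan rate, in percent
stay_married_factor = 0.6    # discount labeled "stay married" in the comment
solvency_factor = 0.8        # discount labeled "already bankrupt" in the comment

break_even = base_rate / stay_married_factor / solvency_factor
print(f"{break_even:.2f}%")  # prints "18.75%"
```

This is just the comment's own arithmetic made explicit; the real pricing would also have to account for the bullet-loan structure and the tracking costs mentioned above.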

Maybe their algorithm is just going to select for couples with high credit rating that they think will get divorced soon. But once it becomes known that getting approved for one of these loans is a strong signal of getting divorced it seems like people would shy away from them.


>> "But once it becomes known that getting approved for one of these loans is a strong signal of getting divorced it seems like people would shy away from them."

~100% of couples, always and everywhere, are sure they will never get divorced and that divorce is solely for other people. Centuries of actuarial results cannot convince them otherwise, so probably this won't either.


This is a very nice collection of Buffett QnAs.

It would be interesting to see these in a timeline to see how his various answers have changed (or not) over time.


It's true the salespeople do outnumber us here in Chicago, but we do have O(100) engineers in the office. We are working on lots of cool stuff: Ads, Search, and Privacy, to name a few.


You guys need to help the CJUG get some more talks! It's been a while since Google has had a role in that.


And you guys have a super-cool and mildly famous engineering site lead!


Sorry about that. Did you try the move-to-Drive option? One of the reasons we added the ability to move the finished archive to Drive was to let users with unreliable internet connections use the Drive sync tool to sync the finished archive down to their computer. I know it is a little inconvenient, but it should make it possible to download a large archive over an unreliable connection.

(I work on the Takeout team)


Does your takeout archive include the drive takeout archive?


It does; there was a lot of debate about whether it should or not. If you have strong opinions one way or the other I would love to hear them, as we are open to changing that behavior.

If you don't want that behavior, or just want to export part of your drive (or photos) collection you can expand that product in the Takeout UI and just select certain folders.


I didn't know about that option; I must have missed it at the time, or it wasn't available back then. I'll give it a go, thanks!


When you create the Takeout or when you try to download the archive?


I created it twice successfully last night. I thought maybe the servers were overloaded, so I waited until this morning to download the second one. Checked my Gmail, got the "your archive download is ready" email, clicked the link, logged into Google's site, clicked Download Archive... mine is several hundred megs in size, and it gets between 60 and 150 megs downloaded and then, again this morning, quits with the error I described.

Maybe there's some bug relating to the size of my download, but I'd think a few hundred megs shouldn't be an issue to download.


Perhaps you should choose a single category (like Gmail) of your Google content, download that, then go back and download another single category until you have it all.


It would probably work but it's only a few hundred megs I'm talking about. I logged onto my Linode and created a 588 meg file with this:

  dd if=/dev/zero of=testing bs=196k count=3k oflag=dsync 
Then I downloaded it and it went fine. So the problem is on Google's side.

I tried downloading the first archive so many times that their service finally told me I'd downloaded it too many times; then the second archive failed at around the same point. Their service is counting the failed downloads as successful, and that's bad in itself.


Just as an FYI, Takeout's zip files are limited to 2 GB (for compatibility reasons), so you'll actually get several zip files; if you want just one file, you need to pick one of the other archive formats.
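As a rough illustration of how that split plays out (the 2 GB cap is taken from the comment above; treating it as exactly 2 GiB, and the function name itself, are my assumptions):

```python
import math

ZIP_PART_LIMIT = 2 * 1024**3  # assumed 2 GiB cap per zip file


def expected_zip_parts(total_bytes: int) -> int:
    """Number of zip files an export of total_bytes would be split into."""
    return max(1, math.ceil(total_bytes / ZIP_PART_LIMIT))


# A 5 GiB export would arrive as 3 separate zip files under this cap.
print(expected_zip_parts(5 * 1024**3))  # prints 3
```

So a download that fails partway through one part only forces a retry of that part, not of the whole multi-gigabyte export.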

