Hacker News

This is incredibly dumb. Both Pendo and Snowplow are analytics providers, meaning they both have in their TOS that the customer remains the owner of the data in question and that the services exist only to facilitate analysis of that data.

Effectively this is users complaining that Gitlab wants to simplify their data analysis overhead. Presumably nothing precludes them from sending the exact same data to these companies and more on the backend. What do users expect? For Gitlab to build every single part of their stack in-house (CRM, analytics, support tooling, etc)? Because that's what this is effectively asking for.

What's next? Protesting that a company uses RDS instead of their own hand-rolled Postgres setup? Because this is the same level of stupid.



What about running third party scripts on the page, which would have access to all code on the account you’re logged in with? How do organisations audit these scripts, and how can they audit new versions of these scripts when gitlab controls the release strategy of these scripts?

You’d be moving from one vendor (possibly two, if you include the cloud provider) having theoretical access to all of your code to four vendors having potential access.


Any vendor Gitlab works with already has potential access. Just because you have a known front-end attack vector doesn’t mean you’ve gone from 1 to 4. You’ve been at N the whole time, it just hasn’t been as visible.

FWIW I agree that on-page JS on pages with source code is a terrible idea, but that’s easily fixable and doesn’t seem to be at the root of the issue.


I think part of the issue was that there are many cases where you can't send potentially sensitive information to a third party, regardless of their TOS.

I left a comment on the feedback issue about this. It's not as comprehensive as a third-party product, but you can build your own analytics in house. There are a lot of managed services (like BigQuery) that make it significantly easier to implement yourself, and you still get valuable insights from such data.
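As a minimal sketch of the in-house approach in Python (the event names and fields here are hypothetical, and a real deployment would load these rows into a warehouse such as BigQuery rather than keep them in memory):

```python
from collections import Counter
from datetime import datetime, timezone

def record_event(log, name, **props):
    """Append one usage event to a log (a stand-in for a warehouse table)."""
    log.append({
        "event": name,
        "ts": datetime.now(timezone.utc).isoformat(),
        "props": props,
    })

def feature_usage(log):
    """Count events by name -- the kind of insight an analytics vendor sells."""
    return Counter(e["event"] for e in log)

# Hypothetical usage: instrument a couple of product actions.
events = []
record_event(events, "merge_request_created", project="internal/app")
record_event(events, "merge_request_created", project="internal/lib")
record_event(events, "pipeline_run", project="internal/app")

print(feature_usage(events))
# Counter({'merge_request_created': 2, 'pipeline_run': 1})
```

The point is that the aggregation itself is trivial; what the vendors add is dashboards, funnels, and scale, which is the trade-off each company has to weigh against where the data ends up.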


> I think part of the issue was that there are many cases where you can't send potentially sensitive information to a third party, regardless of their TOS.

I don’t think this is true, provided the third party is GDPR-compliant themselves. It’s the controller-processor relationship under GDPR. Presumably, if there were no carve-out for this, AWS would not be able to exist.


I'm not talking about GDPR specifically, I'm talking about embedding a third-party script (or sending data to a third party) from a company that I have no relationship with. Many companies would find that unacceptable, especially within their source control, where all their IP is hosted.

The "can't" here isn't necessarily legal; it could be internal.


I think that’s a fair point from a security standpoint, but there are clear, well-established technical solutions for loading third-party telemetry scripts without granting them page access (e.g. SafeFrames), yet the conversation isn’t about that. The conversation (more like coordinated, unidirectional screeching) is instead an irrational moral panic which is not justified by the facts at hand.
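The sandboxing idea the comment alludes to can be sketched with a plain sandboxed iframe (a simplified stand-in for the IAB SafeFrame spec; the vendor URL and event fields are hypothetical):

```html
<!-- Host page: the vendor's script runs inside an isolated iframe, not on
     the page with the source code. Without allow-same-origin, the frame
     gets an opaque origin and cannot read the parent DOM at all. -->
<iframe id="telemetry-frame"
        src="https://vendor.example/telemetry.html"
        sandbox="allow-scripts"
        style="display:none"></iframe>
<script>
  // The host forwards only the fields it chooses via postMessage;
  // the frame never gets direct access to the page.
  const frame = document.getElementById("telemetry-frame");
  frame.addEventListener("load", () => {
    frame.contentWindow.postMessage(
      { event: "page_view", path: location.pathname },
      "*" // in production, pin this to the vendor's origin
    );
  });
</script>
```

Under this design the data-minimization question shifts from "what can the script see" to "what does the host choose to send", which is exactly the distinction the thread keeps talking past.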

I don’t see the fact that you do or don’t have a relationship with the company as relevant. The mechanism for passing the data is just an implementation detail. You don’t have control over what relationships the company has on the backend (e.g. what if they store your telemetry data in BigQuery or Snowflake, or keep your log data in Loggly?), so I don’t see how this expectation suddenly applies when the data is sent from the frontend instead.


On self-hosted instances, which is what I was talking about, you do have control over the backend. At least more than you do with the managed version. It pretty much boils down to this: can something leak sensitive information to a third party? If so, then it's a no-go.

If you have a contract with that third party, and you deem that third party to be a safe harbour for your data (yes, that includes gitlab.org, AWS, etc), then that's a different case.

If Gitlab had instead said:

1. We are going to enable telemetry on all public repositories on Gitlab

2. On self-hosted instances we will provide you with the ability to embed your own analytics, from a company of your choosing

Then the screeching (and I fully agree it is screeching) would have been less. Unfortunately, with self-hosted instances, you simply cannot allow Gitlab to leak information like that to a company you don't have a direct relationship with. I'm not sure how else to phrase or explain this concept, and it doesn't really matter whether you use SafeFrames or not.


I see your point with respect to self-hosting. I get that most of the reason for running on-prem is data security and privacy, and I can see why people might get annoyed when something they thought they were buying turns out not really to be there.

That said, while I'm not familiar enough with the details of how Gitlab supports self-hosting to comment on whether their particular setup still gives customers control over the backend, many self-hosted AWS solutions are implemented as marketplace AMIs that the end user runs in their own VPC without maintaining control over what runs inside the AMI. It's not necessarily that odd for software implemented this way to still phone home with telemetry.


They do indeed have opt-out, instance-wide telemetry. This is restricted to specific site-wide activity: number of merge requests created, active users, Gitlab version, usage of feature X, etc. It doesn’t send back any sensitive data (project names, namespaces, comment text, diffs), or give a third party the potential to access it.

You can also view all the data it sends back in the admin console, and disable it.

Again, it’s the trust aspect. Sure, gitlab could just silently implement a phone home with all your private data (even by accident). They would be put out of business if they did, for breaking their contract with us and others. Nobody would trust them.


Most companies won't allow telemetry or arbitrary code fetched from the Internet in their private network.

It is common sense. Some are even legally required to ensure that.


The main issue was that this is for internally hosted enterprise instances, so no they can't just pass along the same data on the backend because they don't control the environment the backend is running in.

With hosted Gitlab, if they want to keep me as a paying customer, they should be looking at on-premise analytics providers. If they're going to be sending data out to random third parties I don't trust, whom I can't trust because I've never heard of them and have no relationship with them, then I can't trust Gitlab either.


There is a lot of FUD whenever someone mentions a third party, because many don't know the difference between a dedicated analytics provider and Facebook.


How would you know the difference, how could you trust the difference (given the adtech industry being what it is), and how can you know things won't change over time? All that without being in a contractual relationship with that third party.


Are you in a contractual relationship with AWS for every service you use that uses AWS on the backend? How about Loggly? How about BigQuery/Snowflake? Intercom if they use that? Salesforce? Facebook and Google if they run ads? They may not do anything weird with your data now but how can you know that things won’t change over time?


GDPR.


You don't understand GDPR very well.



