Hacker News

This is incredibly dumb. Both Pendo and Snowplow are analytics providers, meaning they both have in their TOS that the customer remains the owner of the data in question and that the services exist only to facilitate analysis of that data.

Effectively this is users complaining that Gitlab wants to simplify their data analysis overhead. Presumably nothing precludes them from sending the exact same data to these companies and more on the backend. What do users expect? For Gitlab to build every single part of their stack in-house (CRM, analytics, support tooling, etc)? Because that's what this is effectively asking for.

What's next? Protesting that a company uses RDS instead of their own hand-rolled Postgres setup? Because this is the same level of stupid.



What about running third party scripts on the page, which would have access to all code on the account you’re logged in with? How do organisations audit these scripts, and how can they audit new versions of these scripts when gitlab controls the release strategy of these scripts?

You’d be moving from one vendor (possibly two, if you include the cloud provider) having theoretical access to all of your code to four vendors having potential access.


Any vendor Gitlab works with already has potential access. Just because you have a known front-end attack vector doesn’t mean you’ve gone from 1 to 4. You’ve been at N the whole time, it just hasn’t been as visible.

FWIW I agree that on-page JS on pages with source code is a terrible idea, but that’s easily fixable and doesn’t seem to be at the root of the issue.


I think part of the issue was that there are many cases where you can't send potentially sensitive information to a third party, regardless of their TOS.

I left a comment on the feedback issue about this. It's not as comprehensive as a third-party product, but you can build your own analytics in house. There are a lot of managed services (like BigQuery) that make it significantly easier to implement yourself, and you still get valuable insights from such data.
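As a minimal sketch of the in-house approach in Python (the event names and fields here are hypothetical, and a real deployment would load these rows into a warehouse such as BigQuery rather than keep them in memory):

```python
from collections import Counter
from datetime import datetime, timezone

def record_event(log, name, **props):
    """Append one usage event to a log (a stand-in for a warehouse table)."""
    log.append({
        "event": name,
        "ts": datetime.now(timezone.utc).isoformat(),
        "props": props,
    })

def feature_usage(log):
    """Count events by name -- the kind of insight an analytics vendor sells."""
    return Counter(e["event"] for e in log)

# Hypothetical usage: instrument a couple of product actions.
events = []
record_event(events, "merge_request_created", project="internal/app")
record_event(events, "merge_request_created", project="internal/lib")
record_event(events, "pipeline_run", project="internal/app")

print(feature_usage(events))
# Counter({'merge_request_created': 2, 'pipeline_run': 1})
```

The point is that the aggregation itself is trivial; what the vendors add is dashboards, funnels, and scale, which is the trade-off each company has to weigh against where the data ends up.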


> I think part of the issue was that there are many cases where you can't send potentially sensitive information to a third party, regardless of their TOS.

I don’t think this is true, provided the third party is GDPR-compliant themselves. It’s the controller-processor relationship under GDPR. Presumably, if there were no carve-out for this, AWS would not be able to exist.


I'm not talking about GDPR specifically, I'm talking about embedding a third-party script (or sending data to a third party) from a company that I have no relationship with. Many companies would find that unacceptable, especially within their source control, where all their IP is hosted.

The "can't" here isn't necessarily legal; it could be internal.


I think that’s a fair point from a security standpoint, but there are clear, well-established technical solutions for loading third-party telemetry scripts without granting them page access (e.g. SafeFrames), yet the conversation isn’t about that. The conversation (more like coordinated, unidirectional screeching) is instead an irrational moral panic which is not justified by the facts at hand.
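The sandboxing idea the comment alludes to can be sketched with a plain sandboxed iframe (a simplified stand-in for the IAB SafeFrame spec; the vendor URL and event fields are hypothetical):

```html
<!-- Host page: the vendor's script runs inside an isolated iframe, not on
     the page with the source code. Without allow-same-origin, the frame
     gets an opaque origin and cannot read the parent DOM at all. -->
<iframe id="telemetry-frame"
        src="https://vendor.example/telemetry.html"
        sandbox="allow-scripts"
        style="display:none"></iframe>
<script>
  // The host forwards only the fields it chooses via postMessage;
  // the frame never gets direct access to the page.
  const frame = document.getElementById("telemetry-frame");
  frame.addEventListener("load", () => {
    frame.contentWindow.postMessage(
      { event: "page_view", path: location.pathname },
      "*" // in production, pin this to the vendor's origin
    );
  });
</script>
```

Under this design the data-minimization question shifts from "what can the script see" to "what does the host choose to send", which is exactly the distinction the thread keeps talking past.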

I don’t see the fact that you do or don’t have a relationship with the company as relevant. The mechanism for passing the data is just an implementation detail. You don’t have control over what relationships the company has on the backend (e.g. what if they store your telemetry data in BigQuery or Snowflake, or keep your log data in Loggly?), so I don’t see how this expectation suddenly applies when the data is sent from the frontend instead.


On self-hosted instances, which is what I was talking about, you do have control over the backend. At least more than you do with the managed version. It pretty much boils down to this: can something leak sensitive information to a third party? If so, then it's a no-go.

If you have a contract with that third party, and you deem that third party to be a safe harbour for your data (yes, that includes gitlab.org, AWS, etc), then that's a different case.

If Gitlab had instead said:

1. We are going to enable telemetry on all public repositories on Gitlab

2. On self-hosted instances we will provide you with the ability to embed your own analytics, from a company of your choosing

Then the screeching (and I fully agree it is screeching) would have been less. Unfortunately, with self-hosted instances, you simply cannot allow Gitlab to leak information like that to a company you don't have a direct relationship with. I'm not sure how else to phrase or explain this concept, and it doesn't really matter whether you use SafeFrames or not.


I see your point with respect to self-hosting. I get that most of the reason for running on-prem is data security and privacy, and I can see why people might get annoyed when something they thought they were buying turns out not really to be there.

That said, while I'm not familiar enough with the details of how Gitlab supports self-hosting to comment on whether their particular setup still gives customers control over the backend, many self-hosted AWS solutions are implemented as marketplace AMIs that the end user runs in their own VPC without maintaining control over what runs inside the AMI. It's not necessarily that odd for software implemented this way to still phone home with telemetry.


They do indeed have opt-out, instance-wide telemetry. This is restricted to specific site-wide activity: number of merge requests created, active users, Gitlab version, usage of feature X, etc. It doesn’t send back any sensitive data (project names, namespaces, comment text, diffs), or give a third party the potential to access it.

You can also view all the data it sends back in the admin console, and disable it.

Again, it’s the trust aspect. Sure, gitlab could just silently implement a phone home with all your private data (even by accident). They would be put out of business if they did, for breaking their contract with us and others. Nobody would trust them.


Most companies won't allow telemetry or arbitrary code fetched from the Internet in their private network.

It is common sense. Some are even legally required to ensure that.


The main issue was that this is for internally hosted enterprise instances, so no they can't just pass along the same data on the backend because they don't control the environment the backend is running in.

With hosted Gitlab, if they want to keep me as a paying customer, they should be looking at on-premise analytics providers. If they're going to be sending data out to random third parties I don't trust, whom I can't trust because I've never heard of them and have no relationship with them, then I can't trust Gitlab either.


There is a lot of FUD whenever someone mentions a third party, because many don't know the difference between a dedicated analytics provider and Facebook.


How would you know the difference, how could you trust the difference (given the adtech industry being what it is), and how can you know things won't change over time? All that without being in a contractual relationship with that third party.


Are you in a contractual relationship with AWS for every service you use that uses AWS on the backend? How about Loggly? How about BigQuery/Snowflake? Intercom if they use that? Salesforce? Facebook and Google if they run ads? They may not do anything weird with your data now but how can you know that things won’t change over time?


GDPR.


You don't understand GDPR very well.



