They're using cache:key:files, so it will reinstall all of the packages each time yarn.lock changes. When a build is triggered and yarn.lock hasn't changed, it does the build without downloading all the packages again.
Come to think of it, builds don't always run in chronological order, so the cache could wind up with extra packages. Yarn has autoclean, but its docs say to avoid using it. npm seems to be quite OK with pruning, though: https://docs.npmjs.com/cli/v7/commands/npm-prune
I think caching two folders - one that contains the downloads and one that contains the installed packages - could be the way to go. Yarn and npm have caches to prevent downloading files. And maybe only cache the downloads on the main branch.
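Roughly something like this - a minimal sketch where the folder names, the image tag, and the yarn flags are my own assumptions, not taken from the blog post:

    build:
      image: node:16
      cache:
        key:
          files:
            - yarn.lock
        paths:
          - .yarn-cache/      # downloaded package archives
          - node_modules/     # installed packages
      script:
        - yarn install --frozen-lockfile --cache-folder .yarn-cache
        - yarn build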
> And maybe only cache the downloads on the main branch.
$CI_COMMIT_REF_SLUG resolves to the branch name when the pipeline runs. Using it as the value for the cache key, Git branches (and related MRs) use different caches. It can be one way to avoid collisions, but requires more storage with multiple caches. https://docs.gitlab.com/ee/ci/variables/predefined_variables...
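For illustration only (the path is a made-up example for a Node.js job), a per-branch cache looks like:

    cache:
      key: "$CI_COMMIT_REF_SLUG"
      paths:
        - node_modules/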
In general, I agree: the more caches and parallel execution you add, the more complex and error-prone it can get. Simulating a pipeline with runtime requirements like network & caches needs its own "staging" env for developing pipelines. That's a scenario not many have, or are willing to assign resources to. Static simulation, where you predict the building blocks from the YAML config, is something GitLab's pipeline authoring team is working on in https://gitlab.com/groups/gitlab-org/-/epics/6498
It is also a matter of insights and observability - when the critical path in the pipeline has a long max duration, where do you start analysing, and how do you prevent this scenario from happening again? Monitoring with the GitLab CI Pipeline Exporter for Prometheus is great; another way of looking into CI/CD pipelines can be tracing.
CI/CD tracing with OpenTelemetry is discussed in https://gitlab.com/gitlab-org/gitlab/-/issues/338943 to learn about user experiences and define the next steps. Imho a very hot topic, with more awareness of metrics and traces from everyone. For example, seeing the full trace for a pipeline from start to end with different spans inside, and learning that the container image pull takes a long time - that can be the entry point into deeper analysis.
Another idea is to make app instrumentation easier for developers, providing tips for e.g. adding /metrics as an HTTP endpoint using Prometheus and OpenTelemetry client libraries. That way you not only see the CI/CD infrastructure & pipelines, but also user-side application performance monitoring and beyond in distributed environments. I'm collecting ideas for blog posts in https://gitlab.com/gitlab-com/marketing/corporate_marketing/...
For someone starting with pipeline efficiency tasks, I'd recommend setting a goal - like in the blog post, X minutes down to Y - and then starting with analysis to get an idea of the blocking parts. Evaluate and test solutions for each part; e.g. a terraform apply might depend on AWS APIs, whereas a Docker pull could be switched to the Dependency Proxy in GitLab for caching.
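As a hedged sketch of the Dependency Proxy switch (the image tag and job script are just placeholders), a job only needs to prefix its image with the predefined group variable:

    build:
      image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:16
      script:
        - yarn build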
Each environment has different requirements - collect helpful resources from howtos, blog posts, docs, HN threads, etc. and also ask the community about their experience. https://forum.gitlab.com/ is a good spot too. I'd also recommend creating an example project highlighting the pipeline, allowing everyone to fork, analyse, and add suggestions.
I think it would be amazing if GitLab CI allowed sending pipeline traces to an OTLP endpoint; I could then decide via the OTel Collector where to send the trace spans, e.g. Google Cloud Trace or Jaeger.
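Something like this OpenTelemetry Collector sketch is what I have in mind - receive OTLP and fan the spans out to Jaeger; the endpoint and the exporter choice are assumptions:

    receivers:
      otlp:
        protocols:
          grpc:
          http:

    exporters:
      jaeger:
        endpoint: jaeger-collector:14250   # Jaeger gRPC collector
        tls:
          insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [jaeger]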
https://docs.gitlab.com/ee/ci/yaml/index.html#cache
https://docs.gitlab.com/ee/ci/yaml/index.html#cachekey
The key identifier can also be the job name, the branch (commit ref), or something else unique to this job or project pipeline. The example in the blog post could also use
key: yarn-cache-$CI_COMMIT_REF_SLUG
to better reflect its purpose.
GitLab 13.11 added support for multiple cache keys per job: https://about.gitlab.com/releases/2021/04/22/gitlab-13-11-re...
If you are using a monorepo, or work with submodules and different package systems, you may have Python, Ruby, and NodeJS in the same CI job. Previously, a job could only define a single cache, so all 'path' entries went into one list sharing the same global cache.
Specifying multiple keys with different path locations allows keeping caches separated and, as such, gives better performance for each specific job. Some jobs may not need NodeJS and can specify only the Ruby cache key, for example.
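A minimal sketch of separate caches in one job (the job name, keys, paths, and install commands are assumptions):

    test:
      cache:
        - key: ruby-gems-$CI_COMMIT_REF_SLUG
          paths:
            - vendor/ruby/
        - key: node-modules-$CI_COMMIT_REF_SLUG
          paths:
            - node_modules/
      script:
        - bundle install --path vendor/ruby
        - yarn install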
If you'd like to invalidate the cache every time a specific file (yarn.lock, go.sum, etc.) changes, you can explicitly configure this behavior using cache:key:files https://docs.gitlab.com/ee/ci/yaml/index.html#cachekeyfiles
This can help prevent corrupted caches, e.g. previously downloaded packages which are stale and not used by the current dependency tree. Your code may still optionally import them, and jobs fail because of the old dependency. You cannot reproduce that problem in your dev environment though, since you start with a fresh container and no caches. I have debugged these things before; it takes a while to identify job caches as the culprit. That said, I'd suggest accepting a slightly smaller performance gain and invalidating caches when dependencies change - if that makes sense for the package manager, especially with frequently changing recursive dependencies. I've seen it with Python.
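For Python/pip, a hedged sketch that ties the cache key to requirements.txt and redirects pip's download cache into the project directory (the job name and the test command are placeholders):

    variables:
      PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

    test:
      cache:
        key:
          files:
            - requirements.txt
        paths:
          - .cache/pip
      script:
        - pip install -r requirements.txt
        - pytest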
Tip for failing jobs - by default, caches are only saved when the job succeeds, meaning that a large pip install is lost even if only the user-defined unit test command failed afterwards.
To avoid that slowdown in the pipeline, you can use cache:when:always to always save the cache. https://docs.gitlab.com/ee/ci/yaml/index.html#cachewhen
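Added to the Python sketch above, it is one extra attribute:

    test:
      cache:
        key:
          files:
            - requirements.txt
        paths:
          - .cache/pip
        when: always   # save the cache even if the unit tests fail
      script:
        - pip install -r requirements.txt
        - pytest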
Exercises to learn with Python are at slide 109 https://docs.google.com/presentation/d/12ifd_w7G492FHRaS9CXA...