Credentials, Risk, and The Supply Chain: Lessons to Learn From The Codecov Breach

By Dotan Nahum April 20, 2021

It seems like there’s a data breach disclosed every day. They come in a variety of forms and from all possible industries and verticals. However, some data breaches can be more impactful than others. For example, supply chain attacks resulting in data breaches can have a cascading effect across network domains and vendor clients.

With the SolarWinds breach still fresh in everyone’s mind, here comes another reminder of just how devastating a supply chain attack on a SaaS vendor can be, not only to the vendor themselves but also to their customers.

The Codecov breach: what we know

Codecov is a leading code coverage solution with a SaaS option that uploads your code coverage information for analysis on Codecov servers. The breach was discovered around April 1st, 2021, after Codecov’s infrastructure had been tampered with as early as January 31st, 2021.


An attacker successfully accessed and modified a critical part of Codecov’s infrastructure. By doing that, they gained access to part of Codecov’s customers’ infrastructure as well. This potentially enabled the malefactors to get hold of customers’ secrets and disrupt critical systems.

Though this breach is still under investigation, it showcases a fairly sophisticated supply chain attack. In this kind of attack, the malefactors will target a 3rd party, typically one of many vendors that an organization uses. Today, it is also typically a cloud service.

In this specific case, code coverage analysis used to be local to an organization (in the old days we ran code coverage on a local Jenkins). It is now entrusted to an external, cloud-based vendor, which adds that vendor to the supply chain that delivers value to the end user.

With only limited facts at our disposal (the breach report on Codecov’s blog and a press release by Reuters), we’re going to try and analyze the breach. Our goal: to extract actionable insights anyone can apply in order to avoid being the next victim or collateral damage of such a breach.

To get the most value from our analysis, we’re going to look at:

  1. Interest, motive, and ROI: What could be in it for an attacker? This helps figure out how attackers think.
  2. Impact and forensics: An analysis of what was or could be the attack vectors as it stands right now. Let’s squeeze some more juice out of the lemon, and see what more we can learn.
  3. Lessons learned: Immediate and key takeaways for all of us. These are actions we can take to protect ourselves and our code as soon as tomorrow morning.

Interest, motive, and ROI

According to Reuters: “Codecov makes software auditing tools that allow developers to see how thoroughly their own code is being tested, a process that can give the tool access to stored credentials for various internal software accounts.”

“The tool” probably means Codecov’s uploaders for various programming languages. What they do is similar to any coverage SDK: they understand the outputs of popular coverage formats and:

  1. Upload a coverage report to a central server, locally or from a CI provider
  2. Optionally, make sure to “tuck in” some metadata if the coverage vendor product is offering some more “extras” such as drilling down to code, or pinpointing uncovered lines with their surrounding context

From the official Codecov security update we learn that this tool is indeed their bash upload script:

On Thursday, April 1, 2021, we learned that someone had gained unauthorized access to our Bash Uploader script and modified it without our permission. The actor gained access because of an error in Codecov’s Docker image creation process that allowed the actor to extract the credential required to modify our Bash Uploader script.

The blast radius was wider than the script itself. According to the official security update, these were affected too:

Codecov-actions uploader for Github, the Codecov CircleCI Orb, and the Codecov Bitrise Step (together, the “Bash Uploaders”).

We’re now in the CI realm: Codecov is not a CI provider, but it integrates with these providers via this bash script. As a result, the impact is more massive than it should have been.

So the motive here would be to side-step into the more interesting area: the CI provider. And the ROI is clear. CI providers often host jobs for integration, end-to-end testing, data access, and more, so a CI environment is a hub of sensitive access actions and a great place to look for access details, credentials, and more. Not to mention access to the source code at hand.

Impact and forensics

The official security update states the following impact because the bash uploader was also used in CI providers:

Any credentials, tokens, or keys that our customers were passing through their CI runner that would be accessible when the Bash Uploader script was executed.

Any services, datastores, and application code that could be accessed with these credentials, tokens, or keys.

The git remote information (URL of the origin repository) of repositories using the Bash Uploaders to upload coverage to Codecov in CI.

This basically means a really bad day for a whole lot of people.
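To make the first impact item concrete: any script that runs inside a CI job can read that job’s environment variables. The following is a minimal sketch, not anything Codecov or a CI provider ships, that lists the variables a third-party script could see and builds a reduced environment without them. The name patterns and the helper names `exposed_secret_names` and `scrubbed_environment` are assumptions made for illustration.

```python
import os
import re

# Hypothetical name patterns that often indicate a secret; tune them for your own CI setup.
SENSITIVE_NAME = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|PRIVATE_?KEY|CREDENTIAL)",
    re.IGNORECASE,
)


def exposed_secret_names(environ):
    """Return the sorted names of variables whose name suggests a secret."""
    return sorted(name for name in environ if SENSITIVE_NAME.search(name))


def scrubbed_environment(environ):
    """Return a copy of environ without the variables flagged above."""
    flagged = set(exposed_secret_names(environ))
    return {key: value for key, value in environ.items() if key not in flagged}


if __name__ == "__main__":
    # Only names are printed, never values.
    for name in exposed_secret_names(os.environ):
        print(f"readable by every script in this job: {name}")
```

One way to use this is to pass `scrubbed_environment(os.environ)` as the `env` argument of `subprocess.run` when invoking a third-party tool, adding back only the variables that tool genuinely needs. Matching on names is a heuristic and will miss secrets stored under innocuous names.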

Other possible attack vectors

By reading Codecov’s documentation and surveying possible additional attack vectors, we’re going to extract a few more lessons to take home.

To start, we have a few very positive points in Codecov’s design for security:

They address precisely the topic of “do you store my code or not?”. We’ve seen many vendors circle around this question. However, they are on point and give a clear answer: No, but..:

TLDR; We do not store source code. Some archived raw uploads may contain source code, which you can elect to disable.

And they’re transparent enough to spell out the edge cases, which is great and considerate toward those using the affected languages. That said, I would be happy to see a clear listing of all the affected languages other than C++:

There is only one opportunity for source code to be stored: while uploading reports. Some languages, C++ for example, produce reports that include source code in the report data. Codecov scrubs some source code out (and we plan to support this effort more) but may not find it all. These uploads, by default, are archived for 1 month. You may elect to prevent all uploads from archiving by disabling this feature.

All makes sense.

Another point Codecov are making: focus on what we do best. And they are bold enough to state it. Codecov is not a CI provider, and therefore does not need your code:

Codecov does not run your test suite. That is the job of the CI Provider. Codecov gathers coverage reports and other key data for static analysis.

Plot twist: for the question “If Codecov doesn’t store my source code, why is it visible in the UI?” we get:

At display time, Codecov uses an oauth token from your repository provider (e.g., GitHub, GitLab, BitBucket) to retrieve the code from the repository provider to display on the page with the coverage overlaid. The code is not stored anywhere and should the oauth token be revoked or access to the repo change, this page will not load and will instead show an error.

We already know that the breach happened with the Bash script uploader, thereby letting an attacker side-step into any given CI using it. But if an attacker had access to Codecov’s infrastructure in this way then, combined with our learning of how the report uploading and usage work, they could also:

  1. Look for partial source code dumps. In them, they can search for secrets and credentials to side-step to a more critical asset
  2. Look for OAuth tokens. From there they can access source code and data, then seek out secrets, search for data, and access details for side-stepping into restricted systems
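Defenders can run the same kind of search on their own code and coverage reports before an attacker does. Below is a minimal sketch of a line-based secret scan. The three patterns are illustrative assumptions and far from exhaustive, and `find_secrets` is a hypothetical helper name; real scanners use many more detectors.

```python
import re

# Illustrative patterns only; a real scanner needs many more detectors.
PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM header of a private key, e.g. "-----BEGIN RSA PRIVATE KEY-----".
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A quoted value of 8+ characters assigned to a suspicious name.
    "generic_assignment": re.compile(
        r"\b(?:password|secret|token|api_key)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(text):
    """Return (line_number, detector_name) pairs for every suspicious line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings
```

Running `find_secrets` over each file in a repository, or over a raw coverage report before it is uploaded, gives a rough picture of what an attacker holding that data could harvest.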

This is what’s dubbed a “supply chain attack”, where Codecov is a link in the chain of an unsuspecting company that took Codecov on as a 3rd party vendor.

Lessons learned

The supply chain attack took multiple forms here:

  1. Running someone else’s code on your infrastructure and remotely fetching it, in this case through bash scripts. The lesson: don’t pipe a script fetched with curl -L straight into a shell. Copy the script to your own infrastructure, triage and validate what it does and whether it can be risky for you, and only then run it from your own premises.
    Ask yourself: if someone takes over the 3rd party domain, what happens to my infrastructure?
  2. For mission-critical code, vendor your 3rd party code and check it in. This can be done easily in Go, Rust, and Node with offline Yarn. Yes, it may seem messy, but that’s the way to go, and tooling should solve any challenge or hurdle.
  3. Scan your code for secrets, sensitive data, and access credentials. Then make sure to mitigate and clean it. Assume someone will eventually leak your code or gain access to it. If it holds no secrets, the worst that can happen is that they steal your algorithms, without compromising your data or your sensitive and critical systems.
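The first lesson can be sketched as a checksum gate: record the SHA-256 digest of the script at the time you review it, keep the script in your own repository, and refuse to run any copy that no longer matches. This is a minimal sketch under those assumptions. The function names are hypothetical, and the pinned digest is whatever you recorded at review time, not a value published by any vendor.

```python
import hashlib
import hmac
import subprocess


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_script(script_bytes: bytes, pinned_sha256: str) -> bool:
    """True only if the script matches the digest recorded at review time."""
    return hmac.compare_digest(sha256_hex(script_bytes), pinned_sha256.lower())


def run_if_verified(script_path: str, pinned_sha256: str) -> None:
    """Run a vendored script with bash, but only if its digest matches."""
    with open(script_path, "rb") as handle:
        script_bytes = handle.read()
    if not verify_script(script_bytes, pinned_sha256):
        raise RuntimeError(f"checksum mismatch, refusing to run {script_path}")
    # Feed bash the exact bytes that were verified, not a re-read of the file.
    subprocess.run(["bash", "-s"], input=script_bytes, check=True)
```

A digest check only proves the script is the one you reviewed. It says nothing about whether that script is safe, so the triage step still has to happen each time you update the pinned digest.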

How Spectral can help

Spectral is building the world’s first hybrid scanner that can scan for secrets of any shape and form (over 500 detectors). And all that without sending your code anywhere or storing it anywhere.

Talk to us and we’ll get you started protecting your infrastructure and code from breaches and leaks.
