
How to choose a secret scanning solution to protect credentials in your code

May 5, 2021

How safe are your passwords? How secure are your API keys? Are you sure your CI pipeline is configured using the best security practices?

One of the easiest methods malicious actors use to infiltrate systems and abuse data is by scanning for secrets that accidentally leak into the public space. Why go through the effort of hacking when someone has left the keys to the kingdom sitting on the doormat?

For organizations, the cost can be steep. According to IBM’s Cost of a Data Breach Report 2020, the average cost of a data breach was $3.86 million.

Put simply: Can your organization afford a data breach? If maintaining continued stability and security is your responsibility, read on.

Why You Need To Scan For Secrets

The growth of collaborative development via shared code repositories has given malicious actors new and imaginative attack vectors to exploit.

Figure: Snippets from news articles that reported AWS key leaks. Source: https://www.researchgate.net/figure/Snippets-from-news-articles-that-reported-AWS-key-leaks-Simply-searching-for-AWS-key_fig1_281278649

Nissan Bites The Bucket

In early 2021, a misconfigured Bitbucket server operated by Nissan NA was breached using a default administrator password.

The Git-based code repository contained code used across Nissan’s North American operations, allowing anyone with a little computing experience to clone the repository and plant backdoors into the existing codebase to exploit at a later date.

Gone With The Solarwinds

One of the biggest data breaches in recorded history was preceded by a glaring credential leak: a weak password (“solarwinds123”) that, according to SolarWinds executives, an intern had exposed in a public GitHub repository.

SolarWinds had no secret scanning tool in its CI/CD pipeline to stop the password from reaching a public repository. The attackers who breached SolarWinds then used its Orion platform to compromise many of its high-profile clients in a classic supply-chain attack.

Amazon’s AWS Adventure

In January 2021, an Amazon cloud engineer accidentally committed almost a gigabyte worth of sensitive data to his personal GitHub repository.

Within 30 minutes, the leak was detected by automated tools run by a third-party security firm, demonstrating how quickly and easily leaked secrets can be found with the right tools in place. Without that firm’s quick detection and notification, AWS could have suffered further data leaks and service disruption.

Approaches To Secret Scanning

Today’s secret scanning solutions use one or more of the following scanning algorithms:

Entropy Checks

Entropy checking is the simplest secret detection method. It assumes that secrets are random-looking strings, whereas ordinary code follows a more predictable structure and therefore has lower entropy (less randomness per character).

An example of a high-entropy string is “Gj12_34xAaQ2p01oV”. On its own, entropy scanning often produces false positives and is not sufficient for large projects.
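To make the idea concrete, the sketch below computes Shannon entropy (bits per character) for candidate tokens and flags those above a cut-off. It is a minimal illustration, not how any specific tool works: the token pattern, the 16-character minimum length, and the 3.5-bit threshold are assumptions chosen for this example.

```python
import math
import re
from collections import Counter

# Candidate tokens: runs of 16+ characters from a base64/URL-safe-like alphabet.
# Minimum length and threshold are illustrative assumptions, not any tool's defaults.
TOKEN_RE = re.compile(r"[A-Za-z0-9_+/=-]{16,}")
ENTROPY_THRESHOLD = 3.5  # bits per character (hypothetical cut-off)


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def find_high_entropy_tokens(line: str, threshold: float = ENTROPY_THRESHOLD) -> list[str]:
    """Return the candidate tokens in a line whose entropy exceeds the threshold."""
    return [tok for tok in TOKEN_RE.findall(line) if shannon_entropy(tok) > threshold]


print(round(shannon_entropy("Gj12_34xAaQ2p01oV"), 2))            # 3.85
print(find_high_entropy_tokens('token = "Gj12_34xAaQ2p01oV"'))    # ['Gj12_34xAaQ2p01oV']
print(find_high_entropy_tokens("def calculate_total_price(items):"))  # []
```

The example string scores about 3.85 bits per character, while a long but descriptive identifier such as calculate_total_price scores about 3.3 and is not flagged. Moving the threshold trades missed secrets against false positives.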

Gitleaks is an open-source secret scanner that combines regular-expression rules with optional entropy thresholds, scans Git commit history, integrates into the CI/CD pipeline, and can write its reports in JSON, SARIF, or CSV format.

Regular Expressions (Regex)

A regular expression identifies specific patterns that may point to an exposed secret. For example, Google API keys, such as those used for the YouTube Data API v3, begin with the string “AIza” and have a fixed length of 39 characters.

Regular expressions are especially useful for detecting API keys and tokens with a fixed structure. However, they are poorly suited to secrets without a predictable format, such as passwords.
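A minimal regex-based scanner might look like the sketch below. The two rules are illustrative assumptions: the Google API key rule expects “AIza” followed by 35 more URL-safe characters (39 in total), and the AWS rule expects the AKIA prefix of long-term AWS access key IDs followed by 16 uppercase letters or digits. AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation; the Google key in the sample is fake.

```python
import re

# Illustrative rules; lookarounds stop a match from starting or ending mid-token.
RULES = {
    "google-api-key": re.compile(r"(?<![0-9A-Za-z_-])AIza[0-9A-Za-z_-]{35}(?![0-9A-Za-z_-])"),
    "aws-access-key-id": re.compile(r"(?<![0-9A-Z])AKIA[0-9A-Z]{16}(?![0-9A-Z])"),
}


def scan_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, rule_id, matched_text) for every rule match in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            for m in pattern.finditer(line):
                findings.append((lineno, rule_id, m.group(0)))
    return findings


sample = """youtube_key = "AIza0123456789abcdefghijklmnopqrstuvwxy"
aws_id = "AKIAIOSFODNN7EXAMPLE"
password = "hunter2"
"""
for finding in scan_text(sample):
    print(finding)
```

Only the first two lines are reported. The password on the last line has no fixed structure for a pattern to anchor on, which is exactly the limitation described above.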

git-secrets is an open-source tool from AWS Labs that uses regular expressions to scan code for secrets. It installs Git hooks that reject commits containing matches and can also run in a CI/CD pipeline.

AI / Machine Learning

The rise of AI and machine learning has been a game-changer for secret detection.

Unlike entropy checks and regular expressions, which are essentially educated guesses, machine learning works by training an algorithm on a large, curated data set of previously discovered secret leaks.

The machine learning algorithm evolves through additional data sets and user-generated feedback. As the algorithm is trained, false-positive reports are reduced, and previously hidden secrets may be revealed.
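The sketch below is a toy illustration of the principle, not of Spectral’s (or any vendor’s) model. A perceptron learns from labeled examples using two hand-picked features: the entropy of a value and whether its variable name contains a secret-related word. When a user marks a finding as a false positive, that feedback becomes a new labeled example and the model is retrained. The keyword list, features, and training data are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical hints that a variable holds a credential.
KEYWORDS = ("key", "secret", "token", "password")


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def features(var_name: str, value: str) -> list[float]:
    """[bias, entropy / 4 (1.0 for 16 distinct characters), keyword flag]."""
    has_keyword = any(k in var_name.lower() for k in KEYWORDS)
    return [1.0, shannon_entropy(value) / 4.0, 1.0 if has_keyword else 0.0]


def score(w: list[float], var_name: str, value: str) -> float:
    return sum(wi * xi for wi, xi in zip(w, features(var_name, value)))


def is_secret(w: list[float], var_name: str, value: str) -> bool:
    return score(w, var_name, value) > 0


def train(examples, max_epochs: int = 1000) -> list[float]:
    """Perceptron: nudge the weights toward each misclassified example until none remain."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for var_name, value, label in examples:  # label: +1 secret, -1 not a secret
            if label * score(w, var_name, value) <= 0:
                x = features(var_name, value)
                w = [wi + label * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


# Hypothetical labeled data set.
examples = [
    ("api_key", "Zx9Qw2Er7Ty4Ui1O", +1),     # random-looking value in a secret-named variable
    ("api_key", "xxxxxxxxxxxxxxxx", -1),     # placeholder, not a real key
    ("build_hash", "0123456789abcdef", -1),  # high entropy, but not a credential
    ("greeting", "aabbccdd", -1),            # ordinary low-entropy value
]
model = train(examples)
print(is_secret(model, "token_example", "aabbccddeeffgghh"))  # True: a false positive

# User feedback ("this is a test fixture") becomes a new labeled example.
examples.append(("token_example", "aabbccddeeffgghh", -1))
model = train(examples)
print(is_secret(model, "token_example", "aabbccddeeffgghh"))  # False after retraining
```

Before feedback, the model flags the test-fixture value assigned to token_example; after retraining on the corrected label, it no longer does. Real systems rely on far richer features, much larger curated data sets, and stronger models than a perceptron.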

Spectral is a powerful commercial solution that performs intelligent secret scans using AI and machine learning algorithms. Spectral easily integrates into the CI/CD pipeline and offers a clean and intuitive user interface.

How To Choose A Secret Scanning Solution

When selecting a secret scanning solution, the first thing to consider is whether the solution meets your organization’s needs and specifications.

Developer Experience

The last thing a developer wants is to be interrupted. Interrupting a developer’s workflow reduces the developer’s output and may even affect their morale if the interruption is perceived as a ‘waste of time’ (e.g., a false-positive report). A secret scanning solution should provide an intuitive UX (user experience) with minimal disruption.

CI/CD Integration

CI/CD integration is a mandatory feature for any serious secret scanning solution. It ensures that code is scanned for secrets in real time as developers commit it to a repository. Developers are notified as soon as an accidental leak is detected, so the leak can be blocked before it spreads beyond control.
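As a rough sketch of what such an integration does, the script below scans only the lines added in staged changes and returns a non-zero exit code when a pattern matches, which is what makes a Git pre-commit hook or CI step fail. The single AWS access key ID rule is an illustrative assumption; a real scanner would load a full rule set.

```python
import re
import subprocess

# Illustrative rule: "AKIA" followed by 16 uppercase letters or digits (AWS access key ID).
PATTERNS = {
    "aws-access-key-id": re.compile(r"(?<![0-9A-Z])AKIA[0-9A-Z]{16}(?![0-9A-Z])"),
}
# Hunk headers look like "@@ -12,3 +14,4 @@"; group 1 is the first line number in the new file.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def scan_diff(diff_text: str) -> list[tuple[str, int, str]]:
    """Scan only lines a unified diff adds; return (path, line_in_new_file, rule_id)."""
    findings = []
    path, lineno = "", 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[6:] if line.startswith("+++ b/") else line[4:]
            continue
        hunk = HUNK_RE.match(line)
        if hunk:
            lineno = int(hunk.group(1))
        elif line.startswith("+"):
            for rule_id, pattern in PATTERNS.items():
                if pattern.search(line[1:]):
                    findings.append((path, lineno, rule_id))
            lineno += 1
        elif line.startswith(" "):
            lineno += 1  # context line: also present in the new file
        # removed lines ("-") and headers do not advance the new-file line number
    return findings


def main() -> int:
    """Pre-commit gate: scan staged changes; a non-zero return value should fail the hook."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    findings = scan_diff(diff)
    for path, lineno, rule_id in findings:
        print(f"{path}:{lineno}: possible secret ({rule_id}); commit blocked")
    return 1 if findings else 0
```

To use it as a hook, one option is to save it as .git/hooks/pre-commit (executable, with a `#!/usr/bin/env python3` first line) and end the file with `import sys; sys.exit(main())`. In a CI job, scan_diff could instead be fed the diff against a target branch, for example the output of `git diff origin/main...HEAD`.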

Coverage

Most secret scanning tools are designed to scan for secrets in code. More advanced tools expand coverage by scanning Git commit history, Gists (shared code snippets), Git server configuration, repository wikis (shared knowledge), logs, and more. Make sure the solution you select offers coverage suited to your organization.
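Scanning history matters because a secret deleted in a later commit can still be retrieved from the commit that added it. A minimal sketch, assuming the git CLI is available and using an illustrative AWS access key ID pattern, walks the output of `git log -p --all` and records which commit and file introduced each match. It covers commit history only; Gists, server configuration, and logs would each need their own collector.

```python
import re
import subprocess

# Illustrative rule: AWS access key ID ("AKIA" + 16 uppercase letters or digits).
PATTERN = re.compile(r"(?<![0-9A-Z])AKIA[0-9A-Z]{16}(?![0-9A-Z])")


def scan_log(log_text: str) -> list[tuple[str, str]]:
    """Parse `git log -p` output; return (commit_sha, path) for each added line that matches."""
    findings = []
    commit, path = None, None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]  # "commit <sha> [(refs)]"
        elif line.startswith("+++ "):
            path = line[6:] if line.startswith("+++ b/") else line[4:]
        elif line.startswith("+") and PATTERN.search(line):
            findings.append((commit, path))
    return findings


def scan_repository_history(repo_dir: str = ".") -> list[tuple[str, str]]:
    """Inspect every commit reachable from any ref, not just the current checkout."""
    log = subprocess.run(
        ["git", "-C", repo_dir, "log", "-p", "--all", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    return scan_log(log)
```

Even if the key was later replaced with an environment variable, the commit that first added it is still reported, which a scan of the current code alone would miss.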

Accuracy

One of the most important issues when scanning for secrets is accuracy. False positives may hurt developer performance and morale as developers waste their time handling non-existent cases. At the same time, false negatives mean that secrets are leaking, and you simply have no idea how or where.

A scanning solution must use machine learning combined with user feedback to significantly reduce the number of false positives while evolving to detect other secrets that may still be lurking in the wild.
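When comparing tools, one way to put numbers on these two failure modes is to run each candidate against a sample in which you already know where the real secrets are, then compute precision (the share of reported findings that are real) and recall (the share of real secrets that were found). The finding identifiers below are hypothetical.

```python
def precision_recall(flagged: set, actual: set) -> tuple[float, float]:
    """Precision drops with false positives; recall drops with false negatives."""
    true_positives = len(flagged & actual)
    # Convention chosen here: with nothing flagged (or nothing to find), report 1.0.
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical results for one tool on a labeled sample repository.
flagged = {"app.py:12", "app.py:40", "test_data.json:3", "README.md:7"}
actual = {"app.py:12", "app.py:40", "deploy.sh:5"}
precision, recall = precision_recall(flagged, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Here two of the four findings are false positives, which halves precision, and the secret in deploy.sh is missed, which lowers recall.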

Speed & CI/CD Resource Allocation

Not all CI/CD integration solutions are built with developer experience and resource allocation in mind.

A secret scanning solution should offer a fast, smooth CI/CD integration that does not introduce artificial delays into the pipeline. Such delays often frustrate developers and slow down development as a whole.
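One common technique for keeping scans fast, not specific to any product, is incremental scanning: a file whose content has not changed since the last run is not scanned again. The sketch below keys an in-memory cache on each file’s SHA-256 digest. scan_fn is a placeholder for whatever detection routine is in use, and a real CI job would persist the cache between runs.

```python
import hashlib
from typing import Callable


class IncrementalScanner:
    """Re-scans a file only when its content differs from the last scanned version."""

    def __init__(self, scan_fn: Callable[[str], list]):
        self.scan_fn = scan_fn  # placeholder for the actual detection routine
        self.cache: dict = {}   # path -> (SHA-256 digest of content, findings)

    def scan(self, path: str, content: str) -> list:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        cached = self.cache.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged since last run: reuse previous findings
        findings = self.scan_fn(content)
        self.cache[path] = (digest, findings)
        return findings


# Hypothetical detector, used only to show the caching behavior.
def toy_detector(content: str) -> list:
    return ["possible AWS key"] if "AKIA" in content else []


scanner = IncrementalScanner(toy_detector)
scanner.scan("app.py", "print('hello')")  # scanned
scanner.scan("app.py", "print('hello')")  # unchanged: served from the cache
```

Hashing is far cheaper than running a full rule set or model over every file, so the pipeline cost grows with the size of the change rather than the size of the repository.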

Monitoring & Alerting

Secrets may leak in places beyond your control. Whether it’s Slack, Microsoft Teams, email, or public platforms such as GitHub, Gists, or Pastebin, monitoring and leak alerts should extend beyond your organization’s internal systems.

Any secret scanning solution you choose should detect and alert you to any accidental secret leak involving your staff and the external services they may use.
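Whatever the monitoring source, the alert itself should not become a new leak. A minimal sketch, assuming alerts go to a Slack incoming webhook (which accepts a JSON body with a `text` field), masks most of the secret before posting. The webhook URL, the message wording, and the four-character visible prefix are assumptions.

```python
import json
import urllib.request


def redact(secret: str, visible: int = 4) -> str:
    """Show a short prefix for triage; mask short secrets entirely."""
    if len(secret) < 2 * visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


def build_alert(source: str, location: str, secret: str) -> dict:
    """Payload for a Slack incoming webhook: a JSON object with a "text" field."""
    return {"text": f"Possible secret leak in {source} at {location}: {redact(secret)}"}


def send_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, build_alert("a public Gist", "line 3", "AKIAIOSFODNN7EXAMPLE") produces a message containing only AKIA followed by asterisks, which is enough to identify the credential type without reposting the key.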

Customization

Your organization may use its own formats for secrets, such as internal tokens with a company-specific prefix. Such formats may go undetected by default.

A secret scanning solution that allows you to train the algorithm by importing or creating custom detection rules can elegantly resolve detection issues and enhance secret discovery.
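Rule-based customization can be as simple as loading extra patterns from a configuration file. In the sketch below, the JSON rule format and the ACME- token format are both hypothetical, invented to stand in for an organization’s own secret format; real tools each define their own rule syntax.

```python
import json
import re

# Hypothetical rule file: a JSON list of {"id", "pattern", "description"} objects.
RULES_JSON = """
[
  {"id": "acme-internal-token",
   "pattern": "ACME-[0-9a-f]{24}",
   "description": "Internal service token (hypothetical format)"}
]
"""


def load_rules(rules_json: str) -> list[tuple[str, re.Pattern]]:
    """Compile each custom rule up front so an invalid pattern fails at load time."""
    return [(rule["id"], re.compile(rule["pattern"])) for rule in json.loads(rules_json)]


def apply_rules(rules: list[tuple[str, re.Pattern]], text: str) -> list[tuple[str, str]]:
    """Return (rule_id, matched_text) for every custom-rule match in text."""
    return [(rule_id, m.group(0)) for rule_id, rx in rules for m in rx.finditer(text)]


rules = load_rules(RULES_JSON)
print(apply_rules(rules, 'client = Client(token="ACME-0123456789abcdef01234567")'))
# [('acme-internal-token', 'ACME-0123456789abcdef01234567')]
```

Keeping rules in data rather than code lets security teams add or tighten patterns without redeploying the scanner.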

Code Privacy

Your company’s source code is sacred. Exposing proprietary code may result in data theft, secret leakage, security breach, ransomware, regulatory concerns, and other nasty predicaments.

For these reasons, it is important to select a secret scanning solution that does not expose your intellectual property by remotely scanning the code on a server beyond your control.

Summary

Very few secret scanning solutions combine expansive coverage, developer-first approach, machine learning detection, smooth CI/CD integration, an intuitive user experience, secure code privacy, and enhanced monitoring.

Spectral’s proprietary technology is the only solution that delivers a fast, accurate, and developer-friendly option that checks all the boxes. Furthermore, Spectral is well-suited to businesses and enterprises working with a large codebase and corporate DevSecOps in mind.