4 Reasons why Python libraries are not secure

By Eyal Katz January 5, 2023

The Don’t Repeat Yourself (DRY) Principle is one of Python’s most used software development principles. It aims to reduce the repetition of software patterns and algorithms by using package libraries and boilerplate templates to improve product release efficiency.

Although the benefits are clear, there’s a downside: the DRY Principle requires businesses to place their trust in the Python ecosystem, mainly the Python Package Index (PyPI), since it is where most open-source libraries are hosted for public use. Given PyPI’s popularity and the fact that a high proportion of organizations worldwide (30%) implicitly trust open-source repositories, we must ask: Is open source safe?

Should businesses place trust in PyPI?

PyPI is a repository of software that allows anyone to publish libraries for the Python programming language. The project has been ongoing since 2003 and, as of January 2022, contains nearly 350,000 Python packages. But since anyone can contribute their own code to the repository, we shouldn’t assume that all PyPI publishers have the best intentions.

Though most PyPI libraries are safe, malicious software can spread through the repository if left unchecked. Open-source contributors and volunteers look over most of the open-source libraries on PyPI, but some are missed, leaving room for malicious code to creep in.

This problem became apparent in 2019 when two libraries containing malicious code were removed from PyPI after they were published using a technique known as “typosquatting”:

Attackers mask malicious libraries by choosing names that are close enough to the original so that developers accidentally install the fraudulent, malicious version of the one they intended to get.

For example, the libraries jellyfish and dateutil were typosquatted into two new malicious libraries named jeIlyfish (with an uppercase I) and python3-dateutil, respectively. When installed, they behave exactly like the originals, except for attempting to steal personal data from the developer.

The malicious libraries were removed from PyPI, but in 2021, it was reported that almost half of all packages in PyPI contain problematic or potentially exploitable code. This security issue raises an alarm for organizations that use open-source libraries (and that’s most of them), because malicious code can put them at risk of data breaches or severe downtime.

Should You Be Worried about Python Cyber Security?

In layperson terms, a Python library is an open-source collection of related Python modules that contain bundles of code that can be reused across different programs. Most of these libraries reside in a public repository such as PyPI. One commonly used library from PyPI is the “requests” library. It contains several modules that work together to help you perform HTTP requests with minimal effort.

import requests
response = requests.get('https://my-app.com/api/users')

All you need to do is install the library using pip, and you can use it across any project. This is particularly helpful if you are trying to solve a problem that has already been solved. For example, if you are a machine learning engineer, you could use libraries such as scikit-learn or TensorFlow to import models such as a neural network rather than coding one from scratch.

The advantages of using PyPI are not limited to code reusability. Ease of access also attracts Python developers who can download a library from any location using pip. But the problem remains: You don’t know the contributors hosting packages on PyPI, and they don’t always get the scrutiny needed to ensure their safety.

As a result, consider some of the common security issues of open-source libraries:

Identity Hijacking: Attackers offer to take over neglected open-source libraries from owners who lack the time to look after them, then update the code with malicious files.
Typosquatting: This is similar to cybersquatting, but applied to package names. Attackers create libraries with names nearly identical to the original, so developers accidentally install the malicious library, not the original.
Dependency Vulnerabilities: Some libraries depend on other libraries to implement their functionality. When one of those dependencies has a vulnerability, it ultimately affects the parent library.

These security vulnerabilities ultimately make your application less secure as these libraries could send your data to an attacker or record your activities. Therefore, developers must understand that not all libraries are safe and must remain vigilant when installing open-source libraries. Using security verifier tools is one step towards ensuring a library is safe to install.
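To give a feel for the kind of check a verifier might perform, here is a minimal sketch that flags package names which are suspiciously close to, but not the same as, packages you already trust. The `TRUSTED_PACKAGES` set, the `looks_like_typosquat` helper, and the 0.8 similarity cutoff are assumptions made for this illustration; they are not part of pip or PyPI, and real tools rely on far richer signals.

```python
import difflib
import re
from typing import Optional

# Hypothetical allowlist: names your team has already vetted.
TRUSTED_PACKAGES = {"jellyfish", "python-dateutil", "requests", "numpy"}


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def looks_like_typosquat(name: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the trusted name that `name` closely resembles, or None.

    A name that exactly matches a trusted package is considered fine.
    """
    candidate = normalize(name)
    trusted = sorted(normalize(p) for p in TRUSTED_PACKAGES)
    if candidate in trusted:
        return None
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(candidate, trusted, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for requested in ("jeIlyfish", "python3-dateutil", "requests"):
    lookalike = looks_like_typosquat(requested)
    if lookalike:
        print(f"{requested!r} looks like {lookalike!r}; double-check before installing")
```

A check like this only catches lookalikes of names you already know, so treat it as one signal among several rather than proof that a package is safe.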

Why Aren’t Python Libraries Secure?

Attackers exploit PyPI with techniques such as spam packages, typosquatted packages that carry malware, and packages designed to steal developers’ credentials upon installation. One example is Ascii2text, a spam package that tries to mimic the popular package art while fetching a malicious script that searches for local passwords and exports them via a Discord webhook. If that scenario sounds plausible in your environment, take extra measures to protect your code and credentials before a malicious library can steal them.

But generally speaking, there are four main security concerns with libraries in PyPI. Let’s discuss each one in detail to get a better understanding of what you should look out for:

1. Exposure to External Inputs

Libraries that interact with external inputs, such as user data, web requests, or files, might not implement proper input sanitization or validation. This increases the risk of attacks like command injection, file inclusion vulnerabilities, or data corruption.

Attackers can exploit such vulnerabilities to gain access to sensitive data or perform destructive operations.
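Command injection is a convenient case to illustrate. The sketch below passes untrusted input to a child process as a single argument in a list, so no shell ever parses it. The `run_without_shell` helper is a hypothetical name, and the child process is just the current Python interpreter echoing its argument, standing in for whatever external tool your code would really call.

```python
import subprocess
import sys


def run_without_shell(user_input: str) -> str:
    """Hand `user_input` to a child process as data, never as shell syntax."""
    # Risky alternative (do not use with untrusted input):
    #     subprocess.run(f"some-tool {user_input}", shell=True)
    # With an argument list and the default shell=False, characters such as
    # ';' or '&&' have no special meaning.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The ';' is echoed back literally instead of starting a second command.
print(run_without_shell("report.txt; echo injected"))
```

Avoiding the shell does not replace validation: the receiving program still sees whatever the user typed, so it is worth checking the value against what you expect before using it.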

2. Directory Traversal Attack

A directory traversal attack allows attackers to read files on a server or directory by manipulating the path.

The figure depicted above highlights a directory traversal attack. The attacker can navigate across file paths using ../../ and access sensitive data.

The code snippet shown below highlights this attack:

# Deliberately vulnerable: the user-supplied path is opened without any validation.
file_location = input('\nType location: ')  # e.g. /Users/[user]/../my-secure-password.txt

with open(file_location, "r") as file:
    print(file.read())

This snippet is vulnerable because the user-supplied path is passed directly to open() without any checks. The root cause is a lack of input sanitization.

Developers can mitigate these attacks by maintaining a list of allowed directories and sanitizing all inputs against the allowed path list.
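One way to apply that mitigation is sketched below: resolve the requested path and accept it only if it still lies inside an allowed directory. The `/srv/app/uploads` directory and the `resolve_safe_path` helper are hypothetical names chosen for this example, so adapt them to your own layout.

```python
from pathlib import Path

# Hypothetical allowlist of directories that users may read from.
ALLOWED_DIRS = [Path("/srv/app/uploads")]


def resolve_safe_path(user_path: str, allowed_dirs=ALLOWED_DIRS) -> Path:
    """Return the resolved path if it lies inside an allowed directory, else raise."""
    for base in allowed_dirs:
        base_resolved = Path(base).resolve()
        # resolve() collapses ".." segments and follows symlinks, so the check
        # below runs on the real target rather than on the raw input string.
        candidate = (base_resolved / user_path).resolve()
        try:
            candidate.relative_to(base_resolved)
        except ValueError:
            continue  # outside this base; try the next allowed directory
        return candidate
    raise PermissionError(f"Path outside allowed directories: {user_path!r}")


for requested in ("reports/q1.txt", "../../etc/passwd", "/etc/passwd"):
    try:
        print("allowed:", resolve_safe_path(requested))
    except PermissionError as error:
        print("blocked:", error)
```

Resolving before comparing is the important design choice here: a plain string prefix check on the raw input can be bypassed with `..` segments or symlinks.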

3. Outdated Libraries

This is a prevalent issue in the open-source community because no code is 100% error-free. Maintainers publish frequent releases to address such issues.

However, some developers forget to update a library whenever a patch is released. This can leave them exposed to attacks that intentionally exploit the bugs.

Hence, it is important to:

Keep an eye out for bug fixes and patch releases, and implement them readily when available.
Browse through the release notes of the fix and update to the latest version to address the resolved changes.
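As a rough sketch of how such a check could be automated, the snippet below compares installed versions against a table of minimum patched versions. The package names and version numbers in `MINIMUM_SAFE_VERSIONS` are made up for illustration, and the version parser only handles plain numeric versions such as 1.4.2. In practice you would more likely rely on `pip list --outdated` or a dedicated audit tool.

```python
from importlib import metadata

# Hypothetical table: in practice these minimums would come from security advisories.
MINIMUM_SAFE_VERSIONS = {"example-lib": "1.4.2", "other-lib": "2.0.0"}


def parse_version(version: str) -> tuple:
    """Parse a purely numeric dotted version such as '1.4.2' into a tuple of ints.

    Simplification: pre-release or local suffixes (e.g. '1.0rc1') are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def is_older(installed: str, minimum: str) -> bool:
    """Compare two versions, padding with zeros so '2.0' equals '2.0.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def find_outdated(minimums, get_version=metadata.version):
    """Return {package: (installed, minimum)} for packages below their minimum."""
    outdated = {}
    for package, minimum in minimums.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            continue  # not installed, so there is nothing to patch
        if is_older(installed, minimum):
            outdated[package] = (installed, minimum)
    return outdated


for name, (installed, minimum) in find_outdated(MINIMUM_SAFE_VERSIONS).items():
    print(f"{name}: installed {installed}, upgrade to at least {minimum}")
```

The `get_version` parameter defaults to the standard library's `importlib.metadata.version`; accepting it as an argument simply makes the function easy to exercise without installing anything.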

4. Easy-to-Use, Hard-to-Secure

Overall, Python modules are built to be easy to use. However, this can also be their most significant weakness: the development of such libraries often prioritizes ease of development over security.

For example, libraries that simplify file handling, web requests, or database interactions might not include necessary safeguards, leaving developers vulnerable if they are not aware of best practices.
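Database access shows how this plays out. The standard-library `sqlite3` module will run any SQL string you hand it, so it is up to the developer to use placeholders. The sketch below uses a throwaway in-memory database; the `users` table and the `find_user` helper are hypothetical and exist only for this example.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user with a parameterized query."""
    # The "?" placeholder makes sqlite3 treat `username` strictly as data.
    # Building the SQL with an f-string instead would let input such as
    # "' OR '1'='1" rewrite the query and match every row.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))        # [(1, 'alice')]
print(find_user(conn, "' OR '1'='1"))  # []
```

The library makes both the safe and the unsafe version equally easy to write, which is exactly why awareness of best practices matters.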

Avoiding Traps and Coding With Confidence

Though there are several security concerns in using open-source libraries in Python, let’s not throw the baby out with the bathwater. Open-source libraries bring efficiency to the development workflow and help organizations ship releases faster. Use them wisely, and if you do find a suspicious or malicious Python library, report it to the PyPI maintainers.

Finally, if you are developing your own Python library, ensure that you write secure code to help your peers build secure applications. One easy way to ensure that you’re contributing positively to the community is by using code security monitoring tools like Spectral to verify that the code you’ve written for your library adheres to security best practices.

Meanwhile, protect yourself from potential mistakes with the SpectralOps platform, which automates the process of secret protection at build time. It provides real-time monitoring for exposed API keys, misconfigured security credentials, and more while mapping and monitoring sensitive assets such as codebases, logs, and intellectual property that may have been left exposed in public-facing repositories such as PyPI.

Learn more about Spectral and get started in seconds here.
