
4 Reasons why Python libraries are not secure

By Eyal Katz January 5, 2023

The Don’t Repeat Yourself (DRY) principle is one of the most widely followed software development principles among Python developers. It aims to reduce repeated software patterns and algorithms by relying on package libraries and boilerplate templates, which makes product releases more efficient.

Although the benefits are clear, there’s a downside: the DRY principle requires businesses to place their trust in the Python ecosystem, mainly the Python Package Index (PyPI), since that is where most open-source Python libraries are hosted for public use. Given PyPI’s popularity and the fact that 30% of organizations worldwide implicitly trust open-source repositories, we must ask: Is open source safe?

Should businesses place trust in the PyPI?

PyPI is a software repository that allows anyone to publish libraries for the Python programming language. The project has been running since 2003 and, as of January 2022, contained nearly 350,000 Python packages. But since anyone can publish code to the repository, we shouldn’t assume that every PyPI publisher has good intentions.

Though most PyPI libraries are safe, malicious software can spread through the repository if left unchecked. Packages are not reviewed before they are published, and while open-source contributors, volunteers, and security researchers catch many malicious uploads, some slip through, leaving room for malicious code to creep in.

This problem became apparent in 2019, when two libraries containing malicious code were removed from PyPI. Both had been published using a technique known as “typosquatting”:

Attackers disguise malicious libraries by giving them names close enough to the originals that developers accidentally install the fraudulent, malicious version instead of the one they intended.


For example, the libraries jellyfish and python-dateutil were typosquatted into two new malicious libraries named jeIlyfish (with an uppercase I in place of the first l) and python3-dateutil, respectively. When installed, they appeared to work like the originals while attempting to steal personal data from the developer.

The malicious libraries were removed from PyPI, but in 2021 it was reported that almost half of all packages on PyPI contain problematic or potentially exploitable code. This should alarm any organization that uses open-source libraries (and that’s most of them), because malicious or vulnerable code can put them at risk of data breaches or severe downtime.

Should You Be Worried about Python Cyber Security?

In layperson’s terms, a Python library is an open-source collection of related Python modules: bundles of code that can be reused across different programs. Most of these libraries reside in a public repository such as PyPI. One commonly used library from PyPI is “requests”, whose modules work together to let you send HTTP requests with minimal effort:

import requests
response = requests.get('https://my-app.com/api/users')

All you need to do is install the library with pip, and you can use it in any project. This is particularly helpful when you are tackling a problem that someone has already solved. For example, if you are a machine learning engineer, you can use libraries such as scikit-learn or TensorFlow to build models such as neural networks rather than coding them from scratch.

The advantages of using PyPI are not limited to code reusability. Ease of access also attracts Python developers, who can download a library from anywhere using pip. But the problem remains: you don’t know the contributors hosting packages on PyPI, and their packages don’t always get the scrutiny needed to ensure they are safe.

As a result, watch out for these common security issues in open-source libraries:

  • Identity Hijacking: Some attackers offer to take over neglected open-source libraries from maintainers who no longer have time to look after them, then push updates containing malicious code.
  • Typosquatting: Much like typosquatted domain names, attackers create libraries with names nearly identical to popular ones, so developers accidentally install the malicious library instead of the original.
  • Dependency Vulnerabilities: Many libraries depend on other libraries to implement their functionality. When one of those dependencies has a vulnerability, every library that relies on it inherits that weakness.

These security vulnerabilities ultimately make your application less secure, because a compromised library could send your data to an attacker or record your activity. Developers must understand that not all libraries are safe and remain vigilant when installing open-source libraries. Using security scanning tools is one step toward ensuring a library is safe to install.
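As a rough illustration of one check such a tool might perform, the sketch below flags package names that closely resemble, but do not exactly match, well-known packages. It assumes a small, hand-picked POPULAR_PACKAGES list (hypothetical; a real tool would compare against a far larger set) and uses the standard library’s difflib for fuzzy matching. Names are first normalized the way PEP 503 describes (lowercased, with runs of -, _ and . collapsed), so case or separator changes alone don’t count as a different name.

```python
import difflib
import re

# Hypothetical, hand-picked list for illustration; a real check would compare
# against a much larger set of popular project names.
POPULAR_PACKAGES = ["requests", "jellyfish", "python-dateutil", "numpy", "urllib3"]


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does: lowercase, runs of -, _ and . become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def lookalikes(name: str, known: list[str] = POPULAR_PACKAGES, cutoff: float = 0.8) -> list[str]:
    """Return known package names that `name` closely resembles without matching exactly."""
    candidate = normalize(name)
    by_normalized = {normalize(k): k for k in known}
    if candidate in by_normalized:
        return []  # Exact (normalized) match: this is the known package itself.
    matches = difflib.get_close_matches(candidate, list(by_normalized), n=3, cutoff=cutoff)
    return [by_normalized[m] for m in matches]


if __name__ == "__main__":
    for requested in ["jeIlyfish", "python3-dateutil", "requests", "flask"]:
        hits = lookalikes(requested)
        status = f"resembles {hits}, double-check it" if hits else "no lookalike found"
        print(f"{requested}: {status}")
```

Under these assumptions, jeIlyfish is flagged as a lookalike of jellyfish and python3-dateutil as a lookalike of python-dateutil, while requests passes. String similarity alone will miss some typosquats and flag some legitimate names, so treat a hit as a prompt to double-check, not a verdict.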

Why Aren’t Python Libraries Secure?

Attackers exploit PyPI with techniques such as spam packages, typosquatted packages containing malware, and packages designed to steal developers’ credentials upon installation. One example is Ascii2text, a spam package that mimics the popular package art while fetching a malicious script that searches for local passwords and exports them via a Discord webhook. Sound alarming? Take extra measures to protect your code and data from libraries like this one.
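Exfiltration through a hard-coded webhook can leave traces in a package’s source. The sketch below is a crude, illustrative heuristic (not how any particular scanner works) that searches the .py files of a package you have extracted but not installed for a few red flags, including Discord webhook URLs like the one Ascii2text used. The pattern list is an assumption, and the scan only reads files; it never imports or runs them.

```python
import re
from pathlib import Path

# Illustrative red flags only; real scanners use far richer, regularly updated rules,
# and attackers often obfuscate code to dodge simple patterns like these.
SUSPICIOUS_PATTERNS = {
    "Discord webhook URL": re.compile(r"discord(?:app)?\.com/api/webhooks/"),
    "dynamic code execution": re.compile(r"\b(?:exec|eval)\s*\("),
    "base64-decoded payload": re.compile(r"b64decode\s*\("),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) pairs for each suspicious line in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


def scan_source(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under `root`, e.g. a package's extracted source tree."""
    report = {}
    for path in Path(root).rglob("*.py"):
        hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling scan_source on a hypothetical folder such as "extracted/some-package" returns a mapping from file paths to flagged lines. Expect false positives (for example, a legitimate model.eval() call) and false negatives for obfuscated code, so this complements, rather than replaces, a dedicated scanning tool.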

But generally speaking, there are four main security concerns with libraries in PyPI. Let’s discuss each one in detail to get a better understanding of what you should look out for:

1. Arbitrary Code Execution

Arbitrary Code Execution is an attacker’s ability to run any Python command or code on a process or machine. Common examples are code injection and command injection, which happen when user input is passed, without sanitization, directly to a function that executes it, such as eval() or os.system().

Consider the snippet below:

user_input = input('\nProvide an input: ')
if not user_input:
    print("No input provided by user")
else:
    print("This is what you said:", eval(user_input))  # VULNERABLE: runs any expression typed

This snippet accepts user input and evaluates it as long as it is a valid Python expression. An attacker can therefore enter something like __import__('os').system('rm -rf /'), which the else block will execute. That input is especially dangerous because it tries to recursively delete everything on the machine that the process has permission to remove.

You can mitigate these attacks by never passing raw user input to eval(). If you only need to parse literal values (numbers, strings, lists, dictionaries), use ast.literal_eval() from Python’s built-in ast module. If you need more than that, use the ast module to parse and sanitize the input before evaluating it, so that only a predefined set of allowed operations can run.
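One way to apply that allowlist idea is to parse the input into an abstract syntax tree and evaluate only node types you explicitly permit. The minimal sketch below assumes the input should be plain arithmetic (numbers with +, -, * and /); safe_eval, BINARY_OPS and UNARY_OPS are illustrative names, not part of the standard library.

```python
import ast
import operator

# Allowlist of arithmetic operators the input may use; anything else is rejected.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate basic arithmetic from untrusted input without calling eval()."""
    tree = ast.parse(expression, mode="eval")  # Raises SyntaxError on invalid input.

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, etc. are rejected before anything runs.
        raise ValueError(f"Disallowed expression: {type(node).__name__}")

    return walk(tree)


if __name__ == "__main__":
    print(safe_eval("2 * (3 + 4)"))  # 14
    try:
        safe_eval("__import__('os').system('ls')")
    except ValueError as exc:
        print("Rejected:", exc)  # Rejected: Disallowed expression: Call
```

The injected payload is rejected as a Call node without ever executing. Exponentiation is deliberately left out of the allowlist because an input such as 9**9**9 can tie up the CPU; for literal values alone, ast.literal_eval() remains the simpler choice.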

2. Directory Traversal Attack

A directory traversal attack lets attackers read files outside the intended directory on a server by manipulating a file path.

Figure: Directory traversal attack

The figure above illustrates a directory traversal attack: by inserting ../ sequences into a path, the attacker climbs out of the intended directory and reaches sensitive data.

The code snippet below is vulnerable to this attack:

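file_location = input('\nType location: ')  # e.g. /Users/[user]/../my-secure-password.txt

# VULNERABLE: the path is opened exactly as typed, so ../ sequences can escape the intended folder
with open(file_location, "r") as file:
    print(file.read())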

This snippet is vulnerable because the user-supplied path is passed straight to open() with no validation. The root cause of this attack is a lack of input sanitization.

Developers can mitigate these attacks by maintaining a list of allowed directories, resolving every requested path to its absolute form, and rejecting any path that falls outside that list.
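A minimal sketch of that allowlist approach follows, assuming Python 3.9 or later (for Path.is_relative_to). The directories in ALLOWED_DIRS and the helper name safe_path are hypothetical placeholders for your own application’s layout.

```python
from pathlib import Path

# Hypothetical allowlist: the only directories this program is meant to read from.
ALLOWED_DIRS = [Path("/srv/app/public").resolve(), Path("/srv/app/reports").resolve()]


def safe_path(requested: str, allowed_dirs: list[Path] = ALLOWED_DIRS) -> Path:
    """Return the canonical path for `requested`, or raise if it escapes every allowed directory."""
    # resolve() makes the path absolute, collapses "..", and follows symlinks that
    # exist, so the check sees where the file really is rather than what was typed.
    candidate = Path(requested).resolve()
    for base in allowed_dirs:
        if candidate.is_relative_to(base):  # Python 3.9+
            return candidate
    raise PermissionError(f"Path is outside the allowed directories: {requested}")
```

In the vulnerable snippet, open(file_location, "r") would become open(safe_path(file_location), "r"), so a path like /srv/app/public/../../../etc/passwd raises PermissionError instead of being read. Checking the resolved path, rather than searching the raw string for "../", also catches symlinks that point outside the allowed directories.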

3. Outdated Libraries


This is a prevalent issue in the open-source community because no code is 100% error-free, so maintainers release updates regularly to fix bugs and security flaws.

However, some developers don’t update a library when a patch is released. This leaves them exposed to attacks that deliberately target the bugs the patch fixed, which are often publicly documented once the fix is out.

Hence, it is important to:

  1. Keep an eye out for bug fixes and patch releases, and apply them promptly when they are available.
  2. Read the release notes for each fix so you understand what changed, then update to the latest version.
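pip can already list installed packages that have newer releases (pip list --outdated). If your team also tracks a minimum patched version per package, a small script can flag anything installed below it. In the sketch below, the MINIMUM_VERSIONS values are hypothetical placeholders, and the version comparison is a deliberate simplification that only looks at the leading numeric part of a version string rather than the full PEP 440 rules.

```python
import re
from importlib import metadata

# Hypothetical minimum patched versions; in practice, take these from the
# release notes or security advisories of the packages you actually use.
MINIMUM_VERSIONS = {"requests": "2.31.0", "urllib3": "1.26.18"}


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.31.0' into (2, 31, 0), ignoring suffixes such as 'rc1' (a simplification)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_older(installed: str, minimum: str) -> bool:
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.31' and '2.31.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def outdated_packages(minimums: dict[str, str] = MINIMUM_VERSIONS) -> list[str]:
    """List installed packages that are older than their minimum patched version."""
    warnings = []
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # Not installed, so there is nothing to update.
        if is_older(installed, minimum):
            warnings.append(f"{name} {installed} is older than {minimum}; update it")
    return warnings


if __name__ == "__main__":
    for warning in outdated_packages():
        print(warning)
```

Running a check like this in CI turns “keep an eye out for patches” into a failing build rather than a reminder; for real projects, a full PEP 440 version parser is a safer basis than the simplified comparison used here.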

4. Broken Access Control

Broken access control occurs when permission checks are missing or poorly implemented, allowing attackers to bypass authentication or authorization and access data they shouldn’t. This flaw is so common and severe that OWASP ranked it first in its Top 10 list of web application security risks (2021 edition).

Some Python libraries might have poorly coded authorization flows that can be bypassed under specific edge cases. That’s why it pays to rely on well-tested libraries from trusted sources.
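To make “edge case” concrete, here is a hypothetical sketch (the roles, the PERMISSIONS table, and both function names are invented for illustration) contrasting a check that only singles out one restricted role with a deny-by-default check that grants only what a known role explicitly lists.

```python
from dataclasses import dataclass

# Hypothetical permission table: the actions each known role may perform.
PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


@dataclass
class User:
    name: str
    role: str


def is_allowed_flawed(user: User, action: str) -> bool:
    # Intended rule: "viewers may only read". Because it only singles out
    # "viewer", any unexpected role value ("", "Viewer", a typo) is treated
    # as fully privileged and may perform every action.
    return user.role != "viewer" or action == "read"


def is_allowed(user: User, action: str) -> bool:
    # Deny by default: unknown roles get an empty permission set, so only
    # actions explicitly listed for a known role are permitted.
    return action in PERMISSIONS.get(user.role, set())


if __name__ == "__main__":
    intruder = User(name="mallory", role="")  # e.g. a role left blank by a bug upstream
    print("flawed check allows delete:", is_allowed_flawed(intruder, "delete"))
    print("deny-by-default allows delete:", is_allowed(intruder, "delete"))
```

Both functions behave identically for the roles their author had in mind; they differ only on the unexpected input, which is exactly where poorly tested authorization code tends to fail.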

Avoiding Traps and Coding With Confidence

Though there are several security concerns in using open-source libraries in Python, let’s not throw the baby out with the bathwater. Open-source libraries streamline the development workflow and help organizations ship faster. Use them wisely, and if you find a Python library with malicious intent, report it to PyPI.

Finally, if you are developing your own Python library, make sure you write secure code so your peers can build secure applications. One easy way to contribute positively to the community is to use code security monitoring tools like Spectral to verify that your library’s code adheres to security best practices.

Meanwhile, protect yourself from potential mistakes with the SpectralOps platform, which automates the process of secret protection at build time. It provides real-time monitoring for exposed API keys, misconfigured security credentials, and more while mapping and monitoring sensitive assets such as codebases, logs, and intellectual property that may have been left exposed in public-facing repositories such as PyPI. 
