Analysis

Misconfigured Kafdrop Puts Companies’ Apache Kafka Completely Exposed

By Dotan Nahum December 6, 2021

This research refers to exposed data of organizations or individuals as a result of misconfigured infrastructure, not caused by the Kafdrop project itself.
Highly committed to the open-source movement and sworn contributors ourselves, we appreciate the importance of open source. This article aims to shed light on a misconfiguration that puts companies at risk and offers an immediate mitigation.

With more than 20M downloads, Kafdrop is a top UI for viewing and managing Kafka topics

Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. It is used by thousands of companies, including eight of the world’s 10 largest banks, the 10 largest global insurance companies, and eight of the world’s 10 key telecom providers.
Kafka typically processes and stores logs, financial transactions, and private user data. It also powers consumer-centric data pipelines processing actions, events, and behavior in real time.
Kafka is cloud-native, can be deployed on anything from small to large cloud-based clusters, and is highly scalable and fault-tolerant. It can be configured across the cluster. As such, tools that help manage and control Kafka clusters, their data, and their consumers are crucial for its operation. However, misconfiguring such a tool can expose the Kafka cluster to the internet, making it a perfect target for attackers, who can infiltrate the cluster, exfiltrate data, and take over cluster management.

Kafdrop exposes Kafka clusters in the wild through misconfigured UI

Through our DeepConfig research, we find a large number of misconfigured apps. In this case, we found complete Kafka clusters exposed internet-wide because of misconfigured deployments of Kafdrop, a popular open-source Kafka UI and management interface.
These clusters expose customer data, transactions, medical records, and internal system traffic: providing an inside look into the complete nervous system, all public.
We found exposed clusters from companies across a multitude of industries, including insurance, healthcare, IoT, media, and social networks.
Also exposed was real-time traffic revealing secrets, authentication tokens, and other access details that allow hackers to infiltrate the companies’ cloud activities on AWS, IBM, Oracle and others.

Findings Highlights

By adding an insecure management UI on top of secure, mission-critical Kafka clusters, operators have exposed the secure clusters to the world.
With the management UI, an attacker can delete Kafka topics and drop consumers, wreaking havoc in internal systems.
In addition, these Kafka clusters expose log and transactional data: everything from sensitive traffic records to financial data.
It is common to use Kafka as a transactional data source and queue system, so the data includes internal database records as well as sensitive app payloads.
Since Kafka is also a central data hub, it is possible to gain additional access by injecting specially crafted messages.
To act quickly, either take down your Kafdrop UIs or redeploy them behind a reverse proxy such as Nginx with an active and configured authentication module. We also recommend specific mitigation practices to prevent such vulnerabilities in the long run.

What is Kafka?

Apache Kafka is a framework implementation of a software bus using stream-processing. It is an open-source software platform developed originally at LinkedIn and now by the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems for data import/export and power large streaming ETL processes.
As an open-source project, Kafka has reached hyper adoption, used by more than 80% of all Fortune 100 companies.
Using Kafka, you can host topics (analogous but not identical to a queue) and enable consumers to pull and process messages from topics. One of the unique properties of Kafka is that it was built for scale to support any real-time and near-real-time processing workload. Therefore, topics can be partitioned for more massive workloads, while consumers manage their own offsets, to allow brokers to operate in a distributed fashion, at scale.
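
To make the partition and offset model concrete, here is a minimal in-memory sketch in Python. It is a toy stand-in, not a Kafka client: the `Topic` and `Consumer` classes are hypothetical names invented for this illustration, and `crc32` replaces the murmur2 hash that the real default partitioner applies to keyed records. It shows that records with the same key land in the same partition, and that reading only advances a position the consumer tracks itself, so the log stays intact for every other reader.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Topic:
    """Toy in-memory stand-in for a partitioned topic (hypothetical, not a real client)."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # Each partition is an append-only list of (key, value) records.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Records with the same key always map to the same partition.
        # crc32 is a stand-in for the hash a real partitioner would use.
        partition = crc32(key.encode()) % self.num_partitions
        self.partitions[partition].append((key, value))
        return partition


class Consumer:
    """Tracks its own read position (offset) for every partition."""

    def __init__(self, topic: Topic):
        self.topic = topic
        self.offsets = [0] * topic.num_partitions

    def poll(self, partition: int, max_records: int = 10) -> list:
        start = self.offsets[partition]
        records = self.topic.partitions[partition][start:start + max_records]
        # Only the consumer's offset moves; the partition itself is unchanged.
        self.offsets[partition] += len(records)
        return records


topic = Topic("payments", num_partitions=3)
p = topic.produce("user-1", "charge:10")
topic.produce("user-1", "refund:10")

first, second = Consumer(topic), Consumer(topic)
first.poll(p)                   # reads both records
second.poll(p, max_records=1)   # independent offset: reads the first record again
```

Because reading never removes anything, a tool that attaches as one more consumer can sample a topic without the existing consumers noticing any missing data.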

What is Kafdrop?

Kafdrop is a popular open-source UI and management interface. With Kafdrop, you can manage topic creation and removal, as well as understand the topology and layout of a cluster, drilling into hosts, topics, partitions, and consumers. It also allows you to sample and download live data from all topics and partitions, acting as a legitimate Kafka consumer.

Kafdrop can be deployed in the cloud as a Docker container, and it connects to and maps existing Kafka clusters automatically.

Misconfiguration Risk

Since Kafka serves as a data hub and central processing system for mission-critical data, an exposed cluster risks every facet of the organization.
A cluster exposed through Kafdrop can also be managed, which means hackers can cause damage beyond exfiltrating data, such as dropping a cluster, deleting topics, and more.

Exposed server names, cluster layout, and addresses

By understanding the topology of a cluster, a hacker can efficiently connect and impersonate a legitimate consumer, injecting or pulling data at will.

Types of Data Leakage

Data breaches continue to be a top security concern for organizations of all sizes. Such breaches regularly result in non-compliance or the leakage of trade secrets. Here are five types of data leaks we discovered from Kafka clusters exposed by a misconfigured Kafdrop UI.

Managing Kafka topics

Kafdrop lets you manage, add, and delete Kafka topics. Deleting a topic can be as surprising to consumers of that topic as deleting a database table in a traditional database, potentially causing denial of service.

Internal traffic

Here, we see a Kafka topic serving as a log streaming service, with live traffic among microservices of one of the largest news outlets in the world. The request data contains service tokens, secrets, cookies and more.

Email traffic

Since Kafka is often used as an operational queue, we found complete email traffic in topics exposed to the public. These are emails between an organization and its customers and employees and contain sensitive data, tokens, and private cookies carried as parameters within email URLs.

Medical

One medical organization exposed its complete topology for handling requests, processing, and medication inventory, as well as customer prescription transactions. Medical records are among the most sensitive of all data and can be abused by hackers for impersonation, extortion, and similar acts.

Fintech & Insurance

In a different cluster, we found insurance claims, transactions, and interactions between agents and customers. Insurance data can be used by attackers to impersonate, extort, or redirect funds elsewhere.

Secrets & configuration

Kafka topics are often used as plain semi-persistent data tables, or they contain messages whose bodies reveal sensitive data. In this case, configuration, secrets, and server addresses can be extracted from messages sitting in topics.
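
To check your own topics for this kind of leakage, message bodies can be run through a pattern scan. The sketch below is a rough illustration in Python, not a description of how any particular scanner works: the function name `find_secrets` and the three patterns are assumptions chosen for the example, and a production rule set would be far larger and tuned against false positives.

```python
import re

# Illustrative patterns only; real scanners rely on much larger rule sets.
PATTERNS = {
    # AWS access key IDs for long-term credentials start with "AKIA".
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Bearer tokens copied into logged request headers.
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
    # PEM-encoded private keys embedded in configuration payloads.
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def find_secrets(message: str) -> list:
    """Return the sorted names of all patterns that match a message body."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(message))


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = '{"aws_key": "AKIAIOSFODNN7EXAMPLE", "note": "order 42 shipped"}'
print(find_secrets(sample))  # ['aws_access_key_id']
```

Running a check like this on messages before they are produced, or on a sample of each topic, gives an early signal that secrets are flowing through the cluster.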

Remediating Kafdrop misconfiguration

The Kafdrop project suggests resolving this flaw by isolating Kafdrop from public connectivity and putting it behind an Nginx proxy. With a single access point, you can then add an authentication module to Nginx and use it as an authentication layer.
Kafdrop is based on the Spring Boot framework, which supports security as a first-class citizen and includes a wide range of built-in authentication mechanisms. As such, we at Spectral have contributed an authentication addition back into Kafdrop that uses simple username-password authentication. Though this is a fairly basic method, it is better than having no authentication at all.
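
For readers unfamiliar with the mechanism, the following Python sketch shows what a username-password check over HTTP Basic authentication involves. It is not the code contributed to Kafdrop, which is a Java application built on Spring Boot. The function name `check_basic_auth` and the sample credentials are hypothetical and exist only for this illustration.

```python
import base64
import hmac


def check_basic_auth(header, username, password):
    """Return True only if an HTTP Basic Authorization header carries the expected credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        # The header value is base64("username:password").
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 as well as bytes that are not valid UTF-8.
        return False
    user, separator, pwd = decoded.partition(":")
    if not separator:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    user_ok = hmac.compare_digest(user.encode(), username.encode())
    pwd_ok = hmac.compare_digest(pwd.encode(), password.encode())
    return user_ok and pwd_ok


header = "Basic " + base64.b64encode(b"admin:s3cret").decode()
print(check_basic_auth(header, "admin", "s3cret"))  # True
print(check_basic_auth(header, "admin", "wrong"))   # False
```

Basic authentication sends credentials in a reversible encoding, so it only offers real protection when the connection is also encrypted with TLS.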

Remediation and Mitigation Strategies

Protect your organization against the Kafdrop issue and other similar vulnerabilities by using these mitigation strategies:

Understand supply chain risk: many high-profile cyber attacks come from supply chain vulnerabilities rather than vulnerabilities in in-house applications (SolarWinds is a recent example). It’s critical to thoroughly assess and understand the risks within the supply chain of external libraries, tools, and services your organization uses.
Encrypt messages and traffic: make sure to encrypt data both at rest and in transit. This means encrypting data at rest in Kafka and configuring your applications to always encrypt when reading or writing data to and from Kafka.
Scan in depth for misconfigurations: advanced misconfiguration scanners can help detect the types of errors that result in an exposed Kafka cluster, including broken authentication, input sanitization problems, and encryption errors.
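
As a very rough first check for missing authentication on your own deployments, you can request a management UI without credentials and look at the status code. The Python sketch below makes several assumptions: the names `classify` and `probe` are hypothetical, the URL in the usage comment is a placeholder, and the mapping from status codes to verdicts is a simplification. Because `urlopen` follows redirects, a UI that redirects to a login page will still be reported as "exposed", so treat the result as a prompt for manual review rather than a verdict.

```python
import urllib.error
import urllib.request


def classify(status: int) -> str:
    """Map an HTTP status code from an unauthenticated request to a rough verdict."""
    if status in (401, 403):
        return "protected"      # the server demanded or rejected credentials
    if 200 <= status < 300:
        return "exposed"        # content was served without any credentials
    return "inconclusive"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET request and classify the response."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError and carry the status code.
        return classify(err.code)
    except OSError:
        # URLError, timeouts, and refused connections are all OSError subclasses.
        return "unreachable"


# Hypothetical usage against a host you operate:
# print(probe("http://kafdrop.internal.example:9000/"))
```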

Automating Your Mitigation Processes

Spectral’s DeepConfig can identify misconfigurations at all layers of software, including the app, infrastructure and data layers, to prevent exploits of security gaps and data breaches.
It can be used with almost any technology and detects issues relating to Elastic, MySQL, Redis, Memcache, Rails, Django, Kubernetes, CloudFormation, Terraform, Postgres, and many others.
Spectral’s DeepSecret technology is the market-leading secret scanning solution supporting more than 2,000 detectors for shapeless data, code, binary and more. DeepSecret and DeepConfig allow for building custom detectors using a simple declarative language written in YAML.
Using the combination of DeepConfig and DeepSecret eliminates the extensive time and investment in security review, pentesting, and consulting.