15-Year-Old Python Flaw Slithers into Software Worldwide

A 15-year-old flaw in the open-source Python programming language has largely gone unpatched and has made its way into hundreds of thousands of open-source and closed-source projects worldwide. Researchers warned that the result is a broadly vulnerable software supply chain that most affected organizations are unaware of.

According to analysts at the Trellix Advanced Research Center, more than 350,000 distinct open-source repositories still contain software that is open to abuse through a path traversal vulnerability tracked as CVE-2007-4559.

The vulnerable code is present in software across a wide range of industries, primarily software development, artificial intelligence/machine learning, and code development, but also fields such as security, IT management, and media, Douglas McKee, principal engineer and director of vulnerability research at Trellix, wrote in a blog post published on September 21.

Researchers observed that the Python tarfile module is widely used in frameworks developed by AWS, Facebook, Google, Intel, and Netflix, as well as in programs used for machine learning, automation, and Docker containerization. The module also ships with Python by default, so it is available to any project that uses the language.

“Today, left unchecked, this vulnerability has been unintentionally added to hundreds of thousands of open- and closed-source projects worldwide, creating a substantial software supply chain attack surface,” McKee said.

New Problem, Old Vulnerability

Trellix researchers initially believed they had found a new zero-day vulnerability in Python when, while examining an enterprise device, they discovered that Python’s tarfile module was not properly checking for path traversal, McKee wrote in the post. They quickly realized, though, that the problem was already known.

Further investigation, later assisted by GitHub, found that over 2.87 million open-source files referencing Python’s tarfile module are spread across roughly 588,000 distinct repositories. Trellix’s analysis found roughly 61% of those instances to be vulnerable, which is why researchers currently estimate that 350,000 Python repositories are affected.

In Open Source, There’s No One to Blame

The problem has been allowed to propagate unchecked through software for so long for a variety of reasons. Nonetheless, McKee said it would be disingenuous to single out the Python project, its maintainers, or any developers who use the platform for blame.

“Let’s start by being explicitly clear — there is no one party, organization, or person to blame for the current state of CVE-2007-4559, but here we are anyway,” he wrote.

Known flaws are harder to track down and fix promptly in open source projects like Python because they are run and maintained by a loosely federated group of volunteers, in this case under a nonprofit foundation, rather than by a single organization, McKee noted.

Further, “it is not uncommon for libraries or software development kits … to consider the responsibility for securely leveraging their APIs as part of the developer’s responsibility,” he said.

In fact, Python’s documentation for the tarfile module explicitly warns developers not to “extract archives from untrusted sources without prior inspection” because of the dangers of doing so.

A warning is “a positive step” in raising awareness of the problem, McKee said, but it has not stopped the vulnerability from recurring, because it is still up to developers using the code base to make sure their product is safe.

Most tutorials that teach developers how to use Python’s modules, including Python’s own documentation and well-known websites like tutorialspoint, geeksforgeeks, and askpython.com, do not clearly explain how to avoid insecure use of the tarfile module, he added, which is exacerbating the issue.

According to McKee, this gap has allowed the vulnerability to be built into the supply chain, a trend that is likely to continue for years to come without greater public awareness of the issue.

‘Incredibly Easy’ to Exploit the Flaw

Technically speaking, CVE-2007-4559 is a path traversal weakness in the Python tarfile module that enables an attacker to overwrite arbitrary files by adding the “..” sequence to file names in a TAR archive.
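To make the mechanics concrete, the following minimal sketch shows how a member name containing “..” resolves to a location outside the intended extraction directory. The function name resolves_outside and the example paths are hypothetical and chosen only for illustration; the sketch uses just the standard os.path functions and does not touch any archive.

```python
import os


def resolves_outside(dest_dir: str, member_name: str) -> bool:
    """Return True if member_name, joined onto dest_dir, lands outside dest_dir.

    Hypothetical helper for illustration only.
    """
    dest = os.path.realpath(dest_dir)
    # join() keeps ".." segments; realpath() collapses them, which is
    # effectively what happens when the file is written to disk.
    target = os.path.realpath(os.path.join(dest, member_name))
    return os.path.commonpath([dest, target]) != dest


# An ordinary relative name stays inside the extraction directory.
print(resolves_outside("/tmp/unpack", "docs/readme.txt"))       # False
# Each ".." climbs one level, so this name escapes the directory.
print(resolves_outside("/tmp/unpack", "../../etc/cron.d/job"))  # True
```

An absolute member name is caught by the same check, since os.path.join() discards the destination directory when the second argument is absolute.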

According to Trellix vulnerability researcher Charles McFarland, the actual weakness comes down to two or three lines of code that use the built-in defaults of tarfile.extractall() or an unsanitized call to tarfile.extract().

“Failure to write any safety code to sanitize the members’ files before calling tarfile.extract() or tarfile.extractall() results in a directory traversal vulnerability, enabling a bad actor access to the file system,” he wrote.
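As an illustration of the kind of “safety code” McFarland refers to, here is a minimal sketch that inspects every member before extracting anything. It is not Trellix’s fix or an official Python recipe; the function name safe_extractall is hypothetical, and the decision to reject symbolic and hard links outright is an assumption made to keep the example short.

```python
import os
import tarfile


def safe_extractall(archive_path: str, dest_dir: str) -> None:
    """Extract archive_path into dest_dir, refusing members that would escape it.

    Hypothetical helper; a sketch, not a complete hardening of tarfile.
    """
    dest = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as tar:
        # Inspect every member first so nothing is written if any one is bad.
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"blocked path traversal: {member.name!r}")
            # Assumption: links are rejected rather than validated, because a
            # link can point outside dest even when its own name looks safe.
            if member.issym() or member.islnk():
                raise ValueError(f"blocked link member: {member.name!r}")
        tar.extractall(dest)
```

Calling safe_extractall("upload.tar", "unpacked") in place of a bare tarfile.extractall() raises ValueError on a suspicious member instead of writing it. Recent Python releases also accept a filter argument on extractall(), such as filter="data", which applies comparable checks inside the standard library.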

Kasimir Schulz, a Trellix vulnerability research intern, detailed how “incredibly easy” it is to exploit CVE-2007-4559 in a third Trellix blog post, published on Wednesday. It was Schulz’s research on a different issue that revealed how extensive the Python tarfile bug is.

To exploit the issue, an attacker needs to add “..” together with the operating system’s path separator (“/” or “\”) to a file name so that the file escapes the directory it is supposed to be extracted to, according to Schulz. He pointed out that Python’s tarfile module enables developers to do just that.

According to Schulz’s post, a tarfile in Python is a collection of files together with metadata that is later used to unarchive it. The metadata in a TAR archive includes, among other things, the file name, size, and checksum, as well as information about the file’s owner at the time the file was archived.

“The tarfile module lets users add a filter that can be used to parse and modify a file’s metadata before it is added to the TAR archive,” Schulz wrote. This enables attackers to create their exploits with as little as six lines of code, he said.
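The filter Schulz describes corresponds to the filter argument of TarFile.add(), a callback that receives each member’s TarInfo object before it is written. The sketch below is a deliberately benign use of that hook, resetting ownership metadata; the names strip_owner and archive_with_filter are hypothetical, and this is not the code from Schulz’s post.

```python
import os
import tarfile


def strip_owner(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Filter callback: rewrite ownership metadata before the member is stored."""
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info  # returning None instead would drop the member entirely


def archive_with_filter(src_path: str, archive_path: str) -> None:
    """Write src_path into a new TAR archive, passing it through strip_owner."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_path, arcname=os.path.basename(src_path), filter=strip_owner)
```

Because the callback can change any TarInfo field, including the stored name, whoever builds an archive fully controls the paths it claims for its members. That is why the validation has to happen on the extracting side.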