Unmasking polyglot files: how Glasswall’s CDR technology tackles this stealthy threat

Glasswall’s security research team, led by Connor Morley, has been investigating how our CDR (Content Disarm and Reconstruction) technology can be used to combat polyglot files, as outlined in our recent Polyglot Research whitepaper. These files merge multiple formats, such as a PDF and a Word document, into a single file, creating security risks by evading analysis to conceal data or deploy malicious payloads. Our research focuses on image-based polyglots, which often evade scrutiny, and on PDF files. Here, we explore different polyglot types, their security implications, creation techniques, and how Glasswall is addressing this evolving threat.

What are polyglot files, and why are they a security risk?

The term "polyglot" typically refers to someone who speaks multiple languages and uses them to adapt to different cultures or social circles. Similarly, polyglot files exist as a single file but can function as multiple formats - displaying an image, opening as a PDF, or extracting as an archive - depending on the application used.

These files can exist because of quirks in file type specifications and the tendency of parsers and script interpreters to tolerate exceptions. An example is the PDF Reference 1.7, which predates ISO standardisation and allows the header identifier (or magic bytes) “%PDF-”, in hex “25 50 44 46 2D”, to be placed anywhere within the first 1 KB of the file. Although this allowance is not in the PDF ISO standard, most rendering systems permit the header offset for legacy support and robustness reasons. This opens the door to other file headers being introduced within the first 1 KB, or even to other files being stored entirely within the 1 KB limit. Depending on the specification of the inserted file type, the PDF can either be contained within legitimate components of the inserted file or simply appended to the end of it, resulting in a file that is both a PDF and the inserted file type.
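As a minimal sketch of the header offset allowance described above, the Python snippet below looks for the “%PDF-” marker anywhere in the first 1 KB of a byte string. The function name find_pdf_header and the HEADER_WINDOW constant are illustrative assumptions rather than part of any PDF library, and real renderers may apply the tolerance differently.

```python
PDF_MAGIC = b"%PDF-"      # hex 25 50 44 46 2D
HEADER_WINDOW = 1024      # assumed tolerance: marker must sit inside the first 1 KB


def find_pdf_header(data: bytes, window: int = HEADER_WINDOW) -> int:
    """Return the offset of the first '%PDF-' marker lying wholly inside
    the first `window` bytes, or -1 if there is none."""
    return data.find(PDF_MAGIC, 0, window)


# A PNG signature followed by padding, then a PDF header at offset 108.
sample = b"\x89PNG\r\n\x1a\n" + b"\x00" * 100 + b"%PDF-1.7\n"
print(find_pdf_header(sample))                          # 108
print(find_pdf_header(b"\x00" * 2000 + b"%PDF-1.7\n"))  # -1
```

A non-zero offset is not malicious in itself, but it does show that something other than the PDF occupies the start of the file.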

Most security systems, including Glasswall CDR, depend on accurately identifying file types to parse and inspect them effectively. Since polyglots adhere to the host file’s specifications, they are not considered corrupt or malformed. File identification systems typically classify them based on the most “obvious” file type, using indicators like file extensions, MIME types, or magic byte values. However, because polyglots can contain multiple valid magic byte entries and headers, profiling systems that focus only on the primary identifier may overlook hidden or suspicious content.
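To illustrate why looking only at the primary identifier can miss content, the sketch below scans a whole byte string for several well-known signatures instead of stopping at the first match. The SIGNATURES table and the find_signatures helper are hypothetical names chosen for this example. A naive byte scan like this will produce false positives inside compressed or binary data, so it is best read as a prompt for closer inspection and not as a verdict.

```python
# Small, illustrative signature table (assumption: not an exhaustive list).
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "gif": b"GIF89a",
    "zip_local_header": b"PK\x03\x04",
    "zip_end_record": b"PK\x05\x06",
}


def find_signatures(data: bytes) -> dict:
    """Map each signature name to every offset at which it occurs."""
    hits = {}
    for name, magic in SIGNATURES.items():
        offsets = []
        pos = data.find(magic)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(magic, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits


# Starts like a PNG, but also carries a PDF header 12 bytes in.
sample = b"\x89PNG\r\n\x1a\n" + b"junk" + b"%PDF-1.7\n"
print(find_signatures(sample))  # {'png': [0], 'pdf': [12]}
```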

Consider a file that appears to be a PNG image but is also a legitimate PDF. While the PNG itself is harmless, the embedded PDF contains auto-open commands and a malicious JavaScript payload. The security system classifies the file primarily as a PNG, analyzes it accordingly, and detects no threats. Since the PDF code is hidden within a PNG comment block (tEXt or iTXt), it is treated as metadata text and assumed to pose no immediate risk. However, when the file, without an extension, is opened in a vulnerable web browser, it is interpreted as a PDF, triggering the embedded payload.

The example above highlights the risks of polyglot files in malware deployment, but similar threats exist for both data infiltration and exfiltration. From a CDR perspective, failing to identify all embedded file types means content management policies may not be correctly applied. In this case, the embedded PDF may bypass validation and sanitization based on the active CDR profile. As a result, the system may not function as intended, potentially leaving vulnerabilities unaddressed.

What types of polyglots are there?

Polyglot files vary in complexity based on how the embedded file types are integrated into the primary host file. Some are simple, while others are highly sophisticated and only compatible with certain file types. Their feasibility depends on factors like header/magic byte offset limitations, available internal structures, and the presence of non-parsed segments, restricting which polyglot variants can exist within specific formats.

Stack

Stacks are the simplest polyglot structure and only require the file types to be “stacked” on top of one another. This is usually achieved by appending the data that makes up the sub-type file to the primary file, so that both occupy a single file. This type of polyglot is limited to sub-types that have no header offset limitation or whose structure is inverted (read bottom to top), such as ZIP files.

In ZIP files, the “header” or primary segment (the end of central directory record) is located at the end of the file rather than the beginning. As such, appending a ZIP file to most other file types produces a legitimate, supported polyglot that is both a standard file and an archive.
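The bottom-up behaviour of ZIP can be sketched with Python’s standard zipfile module, which locates the archive from the end of the data and therefore tolerates unrelated bytes in front of it. The helpers build_zip_bytes and is_stacked_zip are hypothetical names for this example, and the host bytes are a stand-in, not a valid image.

```python
import io
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_zip_bytes(name: str, payload: bytes) -> bytes:
    """Create a small ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def is_stacked_zip(data: bytes) -> bool:
    """True when the data does not start like a ZIP but still opens as one."""
    starts_like_zip = data.startswith(b"PK")
    return (not starts_like_zip) and zipfile.is_zipfile(io.BytesIO(data))


host = PNG_SIGNATURE + b"placeholder image bytes"   # stand-in host file
stacked = host + build_zip_bytes("note.txt", b"hidden")

print(is_stacked_zip(host))     # False
print(is_stacked_zip(stacked))  # True
print(zipfile.ZipFile(io.BytesIO(stacked)).read("note.txt"))  # b'hidden'
```

A type check that only reads the first bytes would report the stacked data as a PNG, while an archive tool reading from the end would extract note.txt.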

Parasite

Parasite polyglots are more complex, embedding a secondary file within the structural or ignored segments of the primary file type. This can be done in various ways, from advanced formats that allow comment sections in their structure or do not validate structural elements properly, to simpler file types that include metadata fields rarely used by most systems. PNG and GIF, for example, support text comment segments per their specifications, but these segments are seldom used.

The earlier example of a PDF embedded in a PNG uses a parasite approach. The PNG’s tEXt segment, located near the start of the file, is populated with the PDF data. If the tEXt segment follows the required character encoding and size constraints, the PNG remains intact, allowing both the image and the embedded PDF to be interpreted from the same file.

Zipper

Zippers are a more advanced form of parasite polyglots, where both file types embed each other’s data within their respective comment sections. Unlike parasites, which have a primary file and a sub-type, zippers interweave data blocks using different comment specifiers, effectively "zipping" the files together. This technique is most commonly used to merge script files that can be interpreted by multiple engines, leveraging multi-line comment blocks to encapsulate alternative script versions.

Cavities

Cavities are the most advanced type of polyglot and work by embedding sub-type files into unprocessed space within a file's structure. In executable files, for example, this kind of null-padded space is often referred to as a code cave, but the same principle can apply to other file types. Any space that can be populated with arbitrary data and is overlooked by an interpreter can be considered a cavity for polyglot injection purposes.
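As a rough illustration of the null-padded case only, the sketch below reports long runs of zero bytes in a byte string as candidate cavities. The function name find_null_runs and the 64-byte threshold are assumptions made for this example. Real cavities need not be null-filled, and whether a given run is actually ignored depends on the file format and the interpreter.

```python
import re


def find_null_runs(data: bytes, min_len: int = 64) -> list:
    """Return (offset, length) for every run of at least `min_len` zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


# One 100-byte run qualifies; the trailing 10-byte run is below the threshold.
sample = b"A" * 10 + b"\x00" * 100 + b"B" * 5 + b"\x00" * 10
print(find_null_runs(sample))  # [(10, 100)]
```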

What are some methods used to create polyglots?

Attackers often prefer stack or metadata/comment-based parasite polyglots due to their ease of implementation and the wide range of compatible file type combinations. Stack polyglots are relatively simple, requiring only the appending of a bottom-up file type, such as a ZIP archive, to an existing file. This allows the file to be read in one format from the top down (e.g., BMP, PNG, PDF) and as another from the bottom up, without overlapping content.

Parasite polyglots are more complex but still offer flexibility in execution. In contrast, zipper polyglots are mainly limited to scripting languages and are more challenging to create. Cavity polyglots, which exploit unused or ignored space in a file type, range from simple to complex to create depending on the base file type. Some file types, such as TIFF, support relocating sections and creating gaps, while others, such as WEBP or BMP, may have strict controls over gaps between segments.

Parasite polyglots can be created by injecting any compatible file type content into a legitimate component of another file. One of the most common ways to achieve this is to inject the sub-type file into the comment/metadata section of the primary file type. The PNG format discussed earlier is a prime example: sections 11.3.4.3 and 11.3.4.5 of the PNG specification outline the uncompressed ancillary text chunks, which store keyword-categorised textual information related to the host file.

The textual information is limited by the maximum chunk size of ~2.1 GB (although chunks this large are not always supported) and must be compatible with the ISO-8859-1 character set for tEXt and ISO 10646-1 for iTXt, but the chunks otherwise permit any content. The tEXt and iTXt chunks are not actively interpreted or processed by most image rendering systems, which focus instead on the IDAT chunks. As these ancillary chunks can be placed anywhere within the file, although placement outside the IDAT block sequence is recommended, they can be used to embed content at the start or end of a document.
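The chunk layout described above (a 4-byte big-endian length, a 4-byte type, the data, then a CRC-32) can be walked with a few lines of Python. The sketch below lists every tEXt and iTXt chunk with its offset, keyword, and data size. The helpers make_chunk, iter_chunks, and text_chunks are hypothetical names for this example, and the test image is a minimal 1x1 greyscale PNG built in memory.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes = b"") -> bytes:
    """Serialise one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def iter_chunks(data: bytes):
    """Yield (type, offset, data) for each chunk up to and including IEND."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        yield ctype, pos, data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"IEND":
            break


def text_chunks(data: bytes) -> list:
    """Return (type, offset, keyword, data size) for each tEXt/iTXt chunk."""
    found = []
    for ctype, offset, body in iter_chunks(data):
        if ctype in (b"tEXt", b"iTXt"):
            keyword = body.split(b"\x00", 1)[0].decode("latin-1")
            found.append((ctype.decode("ascii"), offset, keyword, len(body)))
    return found


ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
idat = make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
iend = make_chunk(b"IEND")
png = PNG_SIGNATURE + ihdr + make_chunk(b"tEXt", b"Comment\x00hello") + idat + iend

print(text_chunks(png))  # [('tEXt', 33, 'Comment', 13)]
```

An unusually large text chunk, or one whose content begins with another format's header, is a reasonable candidate for closer inspection.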

Figure: tEXt chunk definitions at the start and end of the document; PHP payload with the image data encapsulated in a multi-line comment sequence.

In the example above, we have embedded a PHP script in the PNG file using two different tEXt chunks. The first chunk is located just after the IHDR section and ends with the multi-line comment start symbol “/*”. The second chunk is located just before the IDAT block and consists of the multi-line comment end symbol “*/” and the closing of the PHP block. Essentially, this comments out all the other binary data and metadata in the file that could cause PHP interpreter issues. Running this file with php.exe produced the correct echo printout, with only minor errors after the execution of the block we specified. It is worth noting that in most cases of script injection, the interpreter will throw exceptions. However, if the primary action is achieved, the injection can be deemed a success.

But what if text segments such as iTXt and tEXt are blocked or removed? Section 5.4 of the PNG specification provides a comprehensive outline of the primary (critical) and ancillary chunk conventions. From this, we can see that ancillary chunks, which are any chunks whose name begins with a lower-case letter, are essentially ignored by general rendering systems and are only processed by specific rendering systems that look for those segment names, making them unrestricted data chunks. Because of this, we can add an ancillary chunk with any name and any content we want, provided it adheres to the requirements for correct ancillary chunk creation outlined in section 5.4. Section 14 of the specification outlines the limitations on custom ancillary chunk locations: in summary, they must not be placed in a location that corrupts IHDR data or between IDAT chunks.

In our example, a “cOMM” chunk is added at the start of the PNG document, before the IHDR, and populated with the 508 bytes that make up the sub-type PDF document. As the ancillary chunk is ignored by the general interpreter, the PNG still renders correctly and, when the file is opened in Adobe Acrobat, the sub-type PDF document is displayed correctly, achieving a true parasite polyglot file.
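The lower-case naming rule described above can be tested with a single bit check, which also suggests one possible defensive response: rebuild the PNG keeping only critical chunks plus an explicit allowlist of ancillary ones. The sketch below is an illustration under those assumptions and is not a description of how Glasswall CDR works. The names make_chunk, is_ancillary, and strip_ancillary are hypothetical, and the cOMM payload is a short stand-in rather than a real PDF.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes = b"") -> bytes:
    """Serialise one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def is_ancillary(ctype: bytes) -> bool:
    """Bit 5 of the first byte is set (lower-case letter) for ancillary chunks."""
    return bool(ctype[0] & 0x20)


def strip_ancillary(data: bytes, keep: frozenset = frozenset()) -> bytes:
    """Copy critical chunks and allowlisted ancillary chunks; drop the rest,
    along with anything that follows IEND."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        end = pos + 12 + length
        if not is_ancillary(ctype) or ctype in keep:
            out.append(data[pos:end])
        pos = end
        if ctype == b"IEND":
            break
    return b"".join(out)


ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
idat = make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
iend = make_chunk(b"IEND")
comm = make_chunk(b"cOMM", b"%PDF-1.7 stand-in for an embedded PDF")

# Custom chunk placed before IHDR, mirroring the description above.
polyglot = PNG_SIGNATURE + comm + ihdr + idat + iend
clean = strip_ancillary(polyglot)

print(b"%PDF-" in polyglot, b"%PDF-" in clean)  # True False
```

Some ancillary chunks, such as transparency or gamma information, affect how an image is displayed, which is why the sketch takes an allowlist instead of removing every ancillary chunk unconditionally.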

How is Glasswall addressing this issue?

Glasswall actively engages with cybersecurity research to mitigate these risks, in line with the zero-trust policy enforcement of its CDR technology. Investigation into both standard and exotic polyglot creation methods for image and PDF base formats yielded several techniques that can be used to create such polyglots. These polyglots can contain one or more sub-type files, which can potentially elude detection by standard security analysis and even by CDR systems themselves.

Taking the stance that polyglots, being technical and deliberate constructions, are not the standard file constructs an environment expects, and weighing the security risks of such files, Glasswall concluded that all such files should be stripped of sub-type content. To achieve this, the security research team at Glasswall identified the polyglot injection methods supported by PDF files and by the image types currently processed by the CDR engine, and devised mitigation techniques to address them all.

The focus remains on retaining end user utility whilst removing, as part of the zero-trust architecture, any potentially risky elements which may be exploited to host a polyglot sub-type file.

Several of the identified polyglot injection techniques are already resolved as part of the current Glasswall CDR process, and enhancements addressing advanced and exotic techniques are planned for implementation over the coming months. This will close the door on attackers attempting to sneak polyglot files through as images or PDFs within an estate or network.

What’s next?

Glasswall is actively researching document, media, and image-based polyglots to assess the range of techniques used. Initial findings show promising mitigation capabilities, but our team continues to explore all possible methods of polyglot creation. Through ongoing analysis and potential enhancements, Glasswall aims to eliminate polyglot threats entirely, regardless of file origin or attacker sophistication.

If you found this article interesting and would like to learn more, please refer to our Glasswall Polyglot Research whitepaper.

Connor Morley

Connor is a highly experienced security expert and researcher with a strong background in threat hunting, detection and remediation. At Glasswall, he focuses on threat analysis, research and defense development (theoretical and POC).

Riyya Ahmed

Our Senior Technical Writer and Product Marketing Manager, Riyya, is exceptional at authoring, organizing, and simplifying our product documentation. Using her keen eye for detail and wealth of experience in tech, Riyya helps our clients and partners seamlessly integrate with industry-leading Zero-Trust CDR.