Posted on 22.03.2017

Some time ago now I had an interesting conversation with a recruiter. It went a little like this:

Recruiter: I am looking for a role where we need someone with experience in an OO language: C++, Java or C#!

Me: OK, that's me, let me send you my CV.

...Some weeks later I call the recruiter back and ask what has happened...

Recruiter: yes, you do not know Java, so the company didn't want you.

At this stage I decided, with a little irony, to make a jar file that self-extracts the PDF from the jar's resources. This is quite easy to do; you just need code that looks like this:

import java.io.*;

public class Extract {

    private String path;

    public void setPath(String path) {
        this.path = path;
    }

    public void doTheExtraction() {
        FileOutputStream os;
        InputStream is = getClass().getResourceAsStream("/resources/name_of_file");

        if (path == null || path.isEmpty()) {
            // no destination given: write to stdout
            os = new FileOutputStream(FileDescriptor.out);
        } else {
            try {
                os = new FileOutputStream(new File(path));
            } catch (FileNotFoundException e) {
                e.printStackTrace();
                return;
            }
        }

        try {
            // Util.CopyStreamTo is a small stream-pumping helper;
            // on Java 9+ is.transferTo(os) does the same job
            Util.CopyStreamTo(is, os);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The magic is the getClass().getResourceAsStream() part - this allows you to reference any file packed into the jar as a stream and, well, extract it.

Having mentioned my little joke to a few people, somebody (either adi or raesene) suggested I make my CV a polyglot PDF/JAR. This is rather trickier, and I set out to work out how to do it. In particular I wanted to target something like PoC||GTFO 0x7, which has a working PDF/ZIP polyglot. So I did. To get you excited, here is the result:

Desktop screenshot of polyglot PDF file in action

I am aware of other polyglot projects for CVs, like this ISO/PDF hybrid by klange.

If you begin as I did, you will start by reading funky file formats and OMG-WTF-PDF by Julia Wolf. You may also wish to review the work of Didier Stevens. From these you will learn that the PDF format is absolutely insane and that parsers are incredibly lenient. For example, the %PDF header does not even have to appear at the very start of the file - most readers will accept it anywhere within roughly the first kilobyte.

This means that the first step towards an MBR/PDF/ZIP polyglot is quite straightforward: take the MBR you assembled from ASM, append the PDF, then append the ZIP. So far so good.
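Sketched in C, the naive concatenation step needs nothing more than a helper like the one below. The function name and the idea of copying in 4 KiB chunks are my own choices for illustration, not part of any real tool:

```c
/* Naive polyglot assembly: append whole files onto an output stream. */
#include <stdio.h>

/* Append the entire contents of `path` onto `out`.
   Returns the number of bytes copied, or -1 if the file cannot be opened. */
long append_file(FILE *out, const char *path) {
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;
    char buf[4096];
    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        fwrite(buf, 1, n, out);
        total += n;
    }
    fclose(in);
    return total;
}
```

Calling append_file for the assembled boot sector, the plain PDF, and the ZIP, in that order, reproduces the naive layout - complete with the broken ZIP offsets this post goes on to discuss.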

You can then even try it. To test the boot image, run for example:

qemu-system-x86_64 -drive file=cv.pdf,format=raw

However, if you have simply concatenated the zip, you will run into this problem, demonstrated here by pocorgtfo06.pdf:

$ unzip pocorgtfo06.pdf
Archive:  pocorgtfo06.pdf
warning [pocorgtfo06.pdf]:  10672929 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  inflating: 64k.txt                 
  inflating: acsac13_zaddach.pdf     
  inflating: burn.txt                
  inflating: davinci.tgz.dvs

That is not so nice. And if you add the JAR components following the JAR file layout, you will soon discover that, unlike the PKZIP command-line utilities, java is far less forgiving about those extra bytes. Windows is likewise more strict. This will not do.

To understand what is going on, you need to delve into the ZIP file format. As mentioned in some of the linked reading, a ZIP file is parsed by seeking to the end and locating the end-of-central-directory (EOCD) record. This points to the central directory, made up of central directory entries, which usually sit immediately before the EOCD.

The EOCD record tells the ZIP parser where the central directory starts. In turn, each central directory entry records the location of a file's local header within the archive, measured from the start of this file (or of another file, in the case of multi-part archives).

So our zip file should look something like this:

Diagram of on-disk ZIP file format

In this diagram the reference lines are drawn from the EOCD to the start of the central directory entries, and from each CDE to its file. All of these offset values are relative to the start of the file, however.

But when you prepend some arbitrary data what you actually have is this:

Diagram of on-disk ZIP file format, with prefixed data

PKZIP is clever enough to recover from the error - it retries with offsets taken relative to where it actually found the End-of-Central-Directory record - but not all ZIP processors are as smart. So how do we fix this? It is actually quite simple. Let us look first at the structures:

typedef struct _pkzip_ecrec_tag {
    uint32_t magic;
    uint16_t number_this_disk;
    uint16_t num_disk_start_cdir;
    uint16_t num_entries_central_dir_this_disk;
    uint16_t total_entries_central_dir;
    uint32_t size_central_dir;
    uint32_t offset_start_central_dir;
    uint16_t zipfile_comment_len;
} pkzip_ecrec;

typedef struct _pkzip_cdent_tag {
    uint32_t signature;
    uint16_t v_made_by;
    uint16_t v_extract_with;
    uint16_t generalpurpose;
    uint16_t compressmethod; 
    uint32_t lastmod;
    uint32_t crc32;
    uint32_t compressed_size;
    uint32_t uncompressed_size;
    uint16_t filename_len;
    uint16_t extra_field_len;
    uint16_t comment_len;
    uint16_t disk_start_num;
    uint16_t internalattribs;
    uint32_t externalattribs;
    uint32_t relativeoffset_local_hdr;
}  pkzip_cdent;

The relevant fields here are pkzip_ecrec.offset_start_central_dir and pkzip_cdent.relativeoffset_local_hdr. If we rewrite these structures, adding the size of the prefix (our MBR and PDF) to each offset so that everything is once again relative to the beginning of the file, then the offsets should be correct, right?

This works fantastically. Not only does PKZIP stop complaining, but java will treat the file as a valid jar. Writing the code to do the parsing is left as an exercise to the reader.
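If you do want a starting point, here is a minimal sketch of the fix-up routine in C. It is my own sketch, not a library function: it assumes a single-disk, non-ZIP64 archive already loaded into memory, no archive comment containing a stray EOCD magic, and it derives the prefix length from the gap between where the central directory actually sits and where the EOCD record claims it is:

```c
/* Shift ZIP offsets so they are relative to the true start of the file.
   Sketch only - assumes single-disk, non-ZIP64, buffer already in memory. */
#include <stdint.h>

#define EOCD_MAGIC  0x06054b50u   /* end of central directory record */
#define CDENT_MAGIC 0x02014b50u   /* central directory entry */

static uint32_t rd32(const uint8_t *p) {
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void wr32(uint8_t *p, uint32_t v) {
    p[0] = v; p[1] = v >> 8; p[2] = v >> 16; p[3] = v >> 24;
}

/* Fix up offsets in a ZIP image of `len` bytes that has had data prepended.
   Returns the detected prefix length, or -1 if no EOCD record was found. */
long fix_zip_offsets(uint8_t *buf, long len) {
    long eocd = -1;
    /* the EOCD sits at the very end (modulo a comment): scan backwards */
    for (long i = len - 22; i >= 0; i--)
        if (rd32(buf + i) == EOCD_MAGIC) { eocd = i; break; }
    if (eocd < 0)
        return -1;

    uint32_t cd_size = rd32(buf + eocd + 12);  /* size_central_dir */
    uint32_t cd_off  = rd32(buf + eocd + 16);  /* offset_start_central_dir */

    /* the central directory really sits just before the EOCD; the gap
       between its actual position and its recorded offset is the prefix */
    uint32_t prefix = (uint32_t)(eocd - cd_size) - cd_off;
    if (prefix == 0)
        return 0;                              /* nothing to fix */

    wr32(buf + eocd + 16, cd_off + prefix);

    /* walk every central directory entry, shifting relativeoffset_local_hdr */
    long p = eocd - cd_size;
    while (p < eocd && rd32(buf + p) == CDENT_MAGIC) {
        wr32(buf + p + 42, rd32(buf + p + 42) + prefix);
        uint16_t fn = buf[p + 28] | buf[p + 29] << 8;  /* filename_len    */
        uint16_t ef = buf[p + 30] | buf[p + 31] << 8;  /* extra_field_len */
        uint16_t cm = buf[p + 32] | buf[p + 33] << 8;  /* comment_len     */
        p += 46 + fn + ef + cm;       /* 46 = fixed part of a CDE */
    }
    return prefix;
}
```

Conveniently, the same routine degrades gracefully: if nothing was prepended, the computed prefix is zero and the buffer is left untouched.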

I shall end by mentioning one remaining problem. Whenever you try to identify the file type, say like this:

$ file cv.pdf
cv.pdf: DOS/MBR boot sector

it shows itself as an MBR boot sector, likely because the MBR magic bytes are matched before the PDF header. Fixing this is left as an exercise for the reader (while keeping the MBR working, of course).


  1. 2017-03-21: Initial post.