About Me

My photo
Bay Area, CA, United States
I'm a computer security professional, most interested in cybercrime and computer forensics. I'm also on Twitter @bond_alexander All opinions are my own unless explicitly stated.

Thursday, June 2, 2011

Reverse engineering a malicious PDF Part 2

In Part 1, I began analyzing a malicious PDF. Within the PDF, there was a fair amount of obfuscated malicious Javascript present, which I parsed through. Through many transformations and text replacement, the Javascript eventually decoded and executed the attack code, saved as the variable etppeifjeka.

The attack code was initially obfuscated with excessive exclamation marks:
but once the exclamation marks were removed, it became neat and tidy code. Unlike the malicious Javascript I analyzed last month, once the exclamation marks were removed this code even had line breaks, making it much more legible. The attack code contains several functions: nplayer, printf, geticon, and collab. The PDF contains code to read which version of Acrobat is running, and based on that chooses the exploit to launch.
Adobe has provided some documentation for the app.viewerVersion method. In this case, it's looking at the version of the EScript plugin (which provides Javascript support). The EScript plugin version number is actually the same as the version number for Acrobat itself. Thus, if Acrobat is version 9 or version 8.12 or higher, it runs geticon. If Acrobat is version 7.1, it runs printf. If Acrobat is version 6 or below version 7.11, it runs collab. The last one is oddly written, they might have been trying to write "between 9.1 and 9.2" but as it's written nplayer will be triggered if it's greater than 9.1 or less than 9.2 ... which means if it hasn't hit one of the other functions it'll hit this one.

Here's the code for the geticon function:
Back in part one, I guessed that the shcode variables were the shellcode payloads for the PDF. This function confirms my guess. First, the function grabs shcode_geticon to collect the appropriate shellcode. Then, it's appended to the end of a short NOP sled and saved as the variable garbage. The next bit of code (lines 46-53) uses a variable nopblock to extend the size of the NOP sled. By the time we get through the while loop, the variable block contains a NOP sled 262045 characters long. Since all of this is being stored in memory as the code executes, this is a heap spray. Then, an array called memory is constructed (line 54-55), containing 180 copies of block plus the shellcode. Lines 56-61 construct var buffer with 4012 copies of %0a%0a%0a%0a which are line feeds in hex. Finally, the array buffer is passed to the vulnerable geticon function. The buffer overflows, the execution hits one of the NOP sleds present and executes the shellcode.

Incidentally, this is a well known exploit. I found the exact same exploit code being shared on a security research message board two years ago. Just because all that the news talks about is the new, sophisticated malware, that doesn't mean the old stuff goes away quickly. For example, Contagio has seen samples of this from last January.

Now we've finally reached the point where the shellcode is executed. In this case, what does it do? Daniel Wesemann of the SANS Internet Storm Center provides a short Perl script to take the shellcode and dump it to hex to see if there's anything obvious, like a URL. Here's the code and the results:
In this case, it wasn't very helpful. There's no obvious URL or even anything that follows a URL pattern. The next article in Daniel Wesemann's series continues to compile and disassemble the shellcode for reversing, but I don't read assembly so it's off to another option for me. Malware Tracker has an online shellcode analysis tool, but it didn't work for the four shellcode samples in this pdf. Cruising online guides to shellcode analysis led me to a tool called libEmu. However, when I ran the code in libEmu, it hit my 10000000 step limit before execution actually completed. It looks like either I did something wrong, or I hit an infinite loop in the shellcode. Odd. The same happened with each of the four shellcodes.

Since I'm still new to malicious PDF analysis, I talked to some guys in Threat Research here and they pointed me to a tool called PDF Stream Dumper. Interestingly, I was able to execute the shellcode I copied-and-pasted into it, but it choked on the actual PDF stream that Didier Stevens' tools processed without difficulty in Part 1. This confirms the need to use multiple tools, you never know when one will fail you.

I executed the shellcode within PDF Stream Dumper. There are a couple different ways that you can do this, scDbg uses the libEmu emulation and it crashes the analysis just like libEmu does running under Linux. Running the scLog version (live, not emulated analysis) the shellcode executes. scLog notes that the shellcode loads urlmon.dll, which is an Internet Explorer library for fetching files from remote URLs. Then, the shellcode tries to access shell32.dll, but scLog kills the shellcode to prevent that. Looking at the memory dump in a hex editor, I see a call to a url: /forum.php?f=PDF (GetIcon)&key=87c1a082278ace8fdf2f63b86db29d6f&u= and a reference to a.exe

These file references imply downloading an external file, but implication isn't proof. So, let's use scLog again and let it actually load all DLLs this time and see what happens. First, some cautionary notes: this is malware we're executing and I'm turning off some of scLog's safety functions. As a result, I'm adding some safety back in. I took a snapshot of my analysis VM first so I could revert if needed. Since this shellcode looks like a downloader, I'm running Wireshark to see if anything actually gets downloaded.

It tries to access /forum.php?f=PDF (GetIcon)&key=87c1a082278ace8fdf2f63b86db29d6f&u= and download a file as a.exe to the user's temporary directory. However, since that's a local URL, it fails and the shellcode crashes. Wireshark confirms that nothing was downloaded. Interestingly, /forum.php takes parameters including where the file request is coming from (a PDF) and even which exploit is being used (GetIcon). Interestingly, it looks like this PDF was intended to be viewed online, not downloaded (or emailed) and viewed.

In summary, the geticon() function exploits a known vulnerability in Acrobat to hook urlmon.dll to download and execute additional malware to exploit the user's system. The vulnerability is known as cve-2009-0927 and there is a patch available to prevent it from affecting your system. Moral of the story: keep your system patched and be careful where you browse.

Next, I'll get to the other exploits and shellcode.

No comments:

Post a Comment