About Me

Bay Area, CA, United States
I'm a computer security professional, most interested in cybercrime and computer forensics. I'm also on Twitter @bond_alexander All opinions are my own unless explicitly stated.

Monday, June 6, 2011

Reverse engineering a malicious PDF Part 3

Welcome to my series in progress about reversing a malicious PDF. Last time I worked through the first exploit, geticon(), and its shellcode payload. Next, I'll be looking at printf(), which is triggered if the user is running Adobe Reader 7.1. Here's the code:
Again, the payload is the appropriate shellcode variable I found previously (shcode_printf). It's different code this time, but it follows the same pattern. Build the NOP sled, this time in the variable nop. Append the payload to create heapblock. Build a bigger, 261310 character NOP sled in block. Create an array mem and populate it with 1400 copies of the full NOP sled plus heapblock to create the heap spray. Attack the vulnerable function util.printf, overflow the buffer and Adobe Reader hits the NOP sled and executes the shellcode.

This is an older exploit, CVE-2008-2992 made public in November of 2008. A patch was available at the same time the vulnerability was published.

Now, the fun part. What does the shellcode do? Like last time, let's look at the hex and see if there's any obvious URLs.
No such luck. Back to scLog to execute the shellcode. Executing the shellcode shows it loads shell32 to get the Temp path and loads urlmon to try (but fail) to download a file to Temp. Just like before, it tries to access /forum.php?f=PDF and passes along the exploit used (printf). Again, the file would have been saved as a.exeSimple enough.

Onward to exploit 3, collab().
First, we have a function fix_it, which takes two variables: yarsp and len. It enlarges the string to twice len, then cuts it down to half len. Once again, the shellcode is taken from shcode_collab and stored as var shellcode. Variables cc and addr are set to hex numbers, and sc_len is set to twice the length of the shellcode (338). These are used to calculate the new variable len (equal to 4093910). All of this is leading up to the good stuff, beginning with var yarsp. This variable is defined with a few NOP codes, which is then run through fix_it with len. This extends yarsp to a real NOP sled, 2096955 characters long. The variable count2 is defined and a for loop is used to generate mem_array, which is the heap spray. Next, the var overflow is created and extended to 65536 characters. Finally, overflow is passed to the vulnerable Collab.collectEmailInfo() method to trigger the exploit. This is another old exploit, discovered and patched in 2007 (CVE-2007-5659)

So far, so good. Now, onto the shellcode. I run it through scLog just like the others ... and just like the others it loads shell32 to find the temp directory, uses URLDownloadtoFile to try to access /forum.php?f=PDF (Collab)&key=... and save it to temp as a.exe.

The first three exploits all followed the same basic pattern: create a big NOP sled, attach shellcode, replicate it a few hundred times into a heap spray, overflow the buffer and let it go. All the shellcodes had essentially the same function as well, download a trojan EXE to the temp directory. As a result, I'll leave the last exploit and shellcode as an exercise to the reader. :)

Thursday, June 2, 2011

Reverse engineering a malicious PDF Part 2

In Part 1, I began analyzing a malicious PDF. Within the PDF, there was a fair amount of obfuscated malicious Javascript present, which I parsed through. Through many transformations and text replacement, the Javascript eventually decoded and executed the attack code, saved as the variable etppeifjeka.

The attack code was initially obfuscated with excessive exclamation marks:
but once the exclamation marks were removed, it became neat and tidy code. Unlike the malicious Javascript I analyzed last month, once the exclamation marks were removed this code even had line breaks, making it much more legible. The attack code contains several functions: nplayer, printf, geticon, and collab. The PDF contains code to read which version of Acrobat is running, and based on that chooses the exploit to launch.
Adobe has provided some documentation for the app.viewerVersion method. In this case, it's looking at the version of the EScript plugin (which provides Javascript support). The EScript plugin version number is actually the same as the version number for Acrobat itself. Thus, if Acrobat is version 9 or version 8.12 or higher, it runs geticon. If Acrobat is version 7.1, it runs printf. If Acrobat is version 6 or below version 7.11, it runs collab. The last one is oddly written, they might have been trying to write "between 9.1 and 9.2" but as it's written nplayer will be triggered if it's greater than 9.1 or less than 9.2 ... which means if it hasn't hit one of the other functions it'll hit this one.

Here's the code for the geticon function:
Back in part one, I guessed that the shcode variables were the shellcode payloads for the PDF. This function confirms my guess. First, the function grabs shcode_geticon to collect the appropriate shellcode. Then, it's appended to the end of a short NOP sled and saved as the variable garbage. The next bit of code (lines 46-53) uses a variable nopblock to extend the size of the NOP sled. By the time we get through the while loop, the variable block contains a NOP sled 262045 characters long. Since all of this is being stored in memory as the code executes, this is a heap spray. Then, an array called memory is constructed (line 54-55), containing 180 copies of block plus the shellcode. Lines 56-61 construct var buffer with 4012 copies of %0a%0a%0a%0a which are line feeds in hex. Finally, the array buffer is passed to the vulnerable geticon function. The buffer overflows, the execution hits one of the NOP sleds present and executes the shellcode.

Incidentally, this is a well known exploit. I found the exact same exploit code being shared on a security research message board two years ago. Just because all that the news talks about is the new, sophisticated malware, that doesn't mean the old stuff goes away quickly. For example, Contagio has seen samples of this from last January.

Now we've finally reached the point where the shellcode is executed. In this case, what does it do? Daniel Wesemann of the SANS Internet Storm Center provides a short Perl script to take the shellcode and dump it to hex to see if there's anything obvious, like a URL. Here's the code and the results:
In this case, it wasn't very helpful. There's no obvious URL or even anything that follows a URL pattern. The next article in Daniel Wesemann's series continues to compile and disassemble the shellcode for reversing, but I don't read assembly so it's off to another option for me. Malware Tracker has an online shellcode analysis tool, but it didn't work for the four shellcode samples in this pdf. Cruising online guides to shellcode analysis led me to a tool called libEmu. However, when I ran the code in libEmu, it hit my 10000000 step limit before execution actually completed. It looks like either I did something wrong, or I hit an infinite loop in the shellcode. Odd. The same happened with each of the four shellcodes.

Since I'm still new to malicious PDF analysis, I talked to some guys in Threat Research here and they pointed me to a tool called PDF Stream Dumper. Interestingly, I was able to execute the shellcode I copied-and-pasted into it, but it choked on the actual PDF stream that Didier Stevens' tools processed without difficulty in Part 1. This confirms the need to use multiple tools, you never know when one will fail you.

I executed the shellcode within PDF Stream Dumper. There are a couple different ways that you can do this, scDbg uses the libEmu emulation and it crashes the analysis just like libEmu does running under Linux. Running the scLog version (live, not emulated analysis) the shellcode executes. scLog notes that the shellcode loads urlmon.dll, which is an Internet Explorer library for fetching files from remote URLs. Then, the shellcode tries to access shell32.dll, but scLog kills the shellcode to prevent that. Looking at the memory dump in a hex editor, I see a call to a url: /forum.php?f=PDF (GetIcon)&key=87c1a082278ace8fdf2f63b86db29d6f&u= and a reference to a.exe

These file references imply downloading an external file, but implication isn't proof. So, let's use scLog again and let it actually load all DLLs this time and see what happens. First, some cautionary notes: this is malware we're executing and I'm turning off some of scLog's safety functions. As a result, I'm adding some safety back in. I took a snapshot of my analysis VM first so I could revert if needed. Since this shellcode looks like a downloader, I'm running Wireshark to see if anything actually gets downloaded.

It tries to access /forum.php?f=PDF (GetIcon)&key=87c1a082278ace8fdf2f63b86db29d6f&u= and download a file as a.exe to the user's temporary directory. However, since that's a local URL, it fails and the shellcode crashes. Wireshark confirms that nothing was downloaded. Interestingly, /forum.php takes parameters including where the file request is coming from (a PDF) and even which exploit is being used (GetIcon). Interestingly, it looks like this PDF was intended to be viewed online, not downloaded (or emailed) and viewed.

In summary, the geticon() function exploits a known vulnerability in Acrobat to hook urlmon.dll to download and execute additional malware to exploit the user's system. The vulnerability is known as cve-2009-0927 and there is a patch available to prevent it from affecting your system. Moral of the story: keep your system patched and be careful where you browse.

Next, I'll get to the other exploits and shellcode.