About Me

Bay Area, CA, United States
I'm a computer security professional, most interested in cybercrime and computer forensics. I'm also on Twitter @bond_alexander. All opinions are my own unless explicitly stated.

Thursday, May 26, 2011

Reverse engineering a malicious PDF Part 1

One of the projects I work on is a malicious Javascript scanner. It also scans PDFs, since the malicious part of a PDF is usually encoded Javascript. To test the scanner, we regularly collect malicious PDFs and run them through it to see whether they're detected. Of course, to determine whether a file is really malicious, sometimes you need to go in by hand and see what's going on. Didier Stevens wrote a chapter on analyzing malicious PDFs, and I'll be using it as a reference as I go through a malicious PDF here. I recommend reading it alongside this article. Didier is far better at this than I am, so I won't try to re-explain structural concepts that he covers far better. The PDF I'll be looking at is named 4469.pdf. It was downloaded "in the wild" from a website listed on the Malware Domain List.

Didier provides several Python scripts that are useful for analyzing PDFs. The first is pdfid.py, which examines the PDF for indicators of possible maliciousness, such as the presence of Javascript, automatic actions, and the page count (most malicious PDFs are only one page). Here's what the results look like for 4469.pdf.
In this case, the PDF is only one page, contains Javascript, and contains code that will run when the PDF is opened (OpenAction). That's potentially suspicious, so let's keep investigating. We know the Javascript is where the malicious activity will happen, so let's look at that first, using Didier's pdf-parser.py.
pdf-parser.py located Javascript only in indirect object 2 0 of the PDF. However, indirect object 2 0 references indirect object 11 0 as both an OpenAction and Javascript. In a moment we'll see why pdf-parser.py didn't identify indirect object 11 0 as containing Javascript. For now, we can see that the PDF invokes Javascript when the file is opened, which we'd expect from a malicious PDF, and we expect indirect object 11 to contain our payload.

Using pdf-parser.py again, I can parse out indirect object 11, which is a stream object compressed with the Flate method. Interestingly, this is exactly the same situation as in Didier's example script, so it seems this is a common way to obfuscate malicious code in PDFs. Since it's common, Didier provides a method to decompress the script: pdf-parser.py --object 11 --filter --raw 4469.pdf ... and voila, we have malicious code:
It keeps going like that for another couple pages.
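The decompression step itself is just zlib. Here's a minimal Node.js sketch of pulling one /FlateDecode stream out of an object and inflating it, which is roughly what --filter does in this case. It makes a few assumptions. /Length must be a direct integer rather than an indirect reference. Flate must be the only filter. The object is made up so the sketch can run on its own. For real samples, stick with pdf-parser.py.

```javascript
const zlib = require('zlib');

// Decode one stream object that uses /FlateDecode. Assumes /Length is a direct
// integer (not an indirect "12 0 R" reference) and that Flate is the only filter.
function inflateStreamObject(objBytes) {
  // latin1 maps each byte to one character, so string offsets equal byte offsets.
  const text = objBytes.toString('latin1');
  const len = /\/Length\s+(\d+)\s*(?:\/|>>)/.exec(text);
  if (!len) throw new Error('no direct /Length found');
  let start = text.indexOf('stream');
  if (start < 0) throw new Error('no stream keyword found');
  start += 'stream'.length;
  // The keyword is followed by CRLF or LF before the data begins.
  if (text[start] === '\r') start++;
  if (text[start] === '\n') start++;
  const data = objBytes.subarray(start, start + Number(len[1]));
  return zlib.inflateSync(data).toString('latin1');
}

// Made-up test object; not taken from 4469.pdf.
const js = Buffer.from("app.alert('hello');", 'latin1');
const packed = zlib.deflateSync(js);
const obj = Buffer.concat([
  Buffer.from(`11 0 obj\n<< /Length ${packed.length} /Filter /FlateDecode >>\nstream\n`, 'latin1'),
  packed,
  Buffer.from('\nendstream\nendobj\n', 'latin1'),
]);
console.log(inflateStreamObject(obj)); // app.alert('hello');
```

Using /Length to cut out the data, rather than searching for endstream, avoids guessing whether a trailing newline belongs to the compressed bytes.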

Just like in the malicious Javascript I took a look at last month, the functions and variables all have random names: function hddd(fff), var fpziycpii, etc. There's also plenty of junk characters and excessive transformations to make analysis more annoying. Here's one example towards the end of the script:
for(yrauyiyqouoi=0;yrauyiyqouoi<gmgdouaeyd;yrauyiyqouoi++){var dsfsg = yrauyiyqouoi+1;xrywreom+='var oynaoyoyaia'+dsfsg+' = oynaoyoyaia'+yrauyiyqouoi+';';this[fuquoudieeel](xrywreom);}
There's even a section where every letter is interspersed with a bunch of exclamation marks. It's all messy, but nothing we can't eventually analyze. Let's start at the top.

The first piece of code is function hddd, which takes the parameter fff and replaces every ** in it with %u. Four separate strings are processed by this function, so each of them is really a %u-encoded Unicode string. They're stored as the variables shcode_geticon, shcode_newplayer, shcode_printf, and shcode_collab. Based on the names, these strings are likely the shellcode payloads, but we'll see when we get there.
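Here's a small sketch of that decoding step. It's handy when you want to look at the shellcode bytes without running anything. The helper names are mine, not the malware's. The sample string is made up rather than taken from 4469.pdf. The %uXXXX-to-bytes step assumes the usual little-endian x86 layout, where each 16-bit code unit sits in memory low byte first.

```javascript
// Hypothetical stand-in for hddd(): put the '%u' prefixes back.
function restorePercentU(encoded) {
  return encoded.replace(/\*\*/g, '%u');
}

// Turn %uXXXX escapes into raw bytes. Each escape is one 16-bit code unit;
// on little-endian x86 it sits in memory low byte first.
function percentUToBytes(s) {
  const bytes = [];
  const re = /%u([0-9a-fA-F]{4})/g;
  let m;
  while ((m = re.exec(s)) !== null) {
    const unit = parseInt(m[1], 16);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

// Made-up input, not taken from 4469.pdf.
const restored = restorePercentU('**9090**90eb');
console.log(restored); // %u9090%u90eb
const hex = percentUToBytes(restored).map((b) => b.toString(16).padStart(2, '0'));
console.log(hex.join(' ')); // 90 90 eb 90
```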

Next we have:
var fpziycpii = 'e';
var uinsenagexo = 'l';
var fuquoudieeel = fpziycpii+'va'+uinsenagexo;
var ioafyyad = this[fuquoudieeel];
rtaoyuupaue = "ioa!fyya!d!('!t!hi!s![!fuqu!ou!d!ie!ee!l](o!y!na!o!yoy!aia!'+!gmg!do!u!a!e!yd!+!')!;!'!)!;".replace(/[!]/g, '');
Stepping through it, the variable fuquoudieeel takes the first two variables and combines them to get "eval", so ioafyyad is this[eval]. Next, rtaoyuupaue is a string that has the replace function executed on it. In this case, the replace function just removes all the extra exclamation points that are in there, yielding:
ioafyyad('this[fuquoudieeel](oynaoyoyaia'+gmgdouaeyd+');');
If we substitute in the known variables, we get:
this[eval]('this[eval](oynaoyoyaia'+gmgdouaeyd+');');
That's an improvement, but there's still work to do. The variable gmgdouaeyd is later defined as 1100, so we get oynaoyoyaia1100, a variable that isn't defined yet. There's a section towards the end with oynaoyoyaia0 = eiuaopyj;, but obviously that's not the same variable. It may be a typo, or it may be junk code... we'll see. For now, let's move on.
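Both string operations in the fpziycpii/rtaoyuupaue snippet can be replayed without calling eval. That's a safer way to check a deobfuscation step than running the code. This sketch copies the variable names and the replace call from the snippet and prints the results.

```javascript
// Rebuild the function name the same way the script does.
const fpziycpii = 'e';
const uinsenagexo = 'l';
const fuquoudieeel = fpziycpii + 'va' + uinsenagexo;

// Strip the padding exactly as the script does, but only print the result.
const raw = "ioa!fyya!d!('!t!hi!s![!fuqu!ou!d!ie!ee!l](o!y!na!o!yoy!aia!'+!gmg!do!u!a!e!yd!+!')!;!'!)!;";
const rtaoyuupaue = raw.replace(/[!]/g, '');

console.log(fuquoudieeel); // eval
console.log(rtaoyuupaue);
// ioafyyad('this[fuquoudieeel](oynaoyoyaia'+gmgdouaeyd+');');
```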

Next we have another function:

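function iuoyzemuyyi(ieuohhrk){
var iuioathlpau = '!';
var unetoptou = '';
// walk the input one character at a time
for(var xqqauiae = 0; xqqauiae < ieuohhrk.length; xqqauiae++){
var yaomwteez = ieuohhrk.charAt(xqqauiae);
// drop '!' and copy every other character
if(yaomwteez == iuioathlpau) {  } else { unetoptou+=yaomwteez; }
}
return unetoptou;
}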
This function is a longer, more complicated way of removing the exclamation marks from a string. Like the previous step, it's applied to another code section that's stored as a string and obfuscated with five exclamation marks. The string is deobfuscated and stored as the variable etppeifjeka. That's a long section, and it looks like part of the payload, so we'll get to it in part 2. For now, let's skip past it and see how it's used.
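To convince yourself that iuoyzemuyyi does the same job as the earlier .replace(/[!]/g, ''), you can run both side by side on a harmless input. The function names here are mine. The input is made up.

```javascript
// Readable equivalent of iuoyzemuyyi(): keep every character except '!'.
function stripBangsLoop(s) {
  let out = '';
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    if (ch !== '!') out += ch;
  }
  return out;
}

// The one-liner the earlier section used for the same job.
function stripBangsRegex(s) {
  return s.replace(/[!]/g, '');
}

const padded = 'v!!!!!a!!!!!r!!!!! x = 1;'; // made-up input
console.log(stripBangsLoop(padded)); // var x = 1;
console.log(stripBangsLoop(padded) === stripBangsRegex(padded)); // true
```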

The last section is this:
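eiuaopyj = ''+etppeifjeka+'';
var gmgdouaeyd = 1100;
var xrywreom = '';
oynaoyoyaia0 = eiuaopyj;
// build and eval a chain of variable declarations
for(yrauyiyqouoi=0;yrauyiyqouoi<gmgdouaeyd;yrauyiyqouoi++){
var dsfsg = yrauyiyqouoi+1;
xrywreom+='var oynaoyoyaia'+dsfsg+' = oynaoyoyaia'+yrauyiyqouoi+';';
this[fuquoudieeel](xrywreom);
}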
This section is odd, to say the least. The for loop builds up a string in the variable xrywreom. The loop counts from 0 to 1099. On each pass it appends a declaration of a variable oynaoyoyaiaX, where X is one more than the loop counter, and sets it equal to the variable numbered one lower. The output looks like this:
var oynaoyoyaia1 = oynaoyoyaia0;var oynaoyoyaia2 = oynaoyoyaia1;var oynaoyoyaia3 = oynaoyoyaia2;var oynaoyoyaia4 = oynaoyoyaia3;var oynaoyoyaia5 = oynaoyoyaia4;
It goes up to var oynaoyoyaia1100 = oynaoyoyaia1099;. On each pass, the loop also runs this[fuquoudieeel](xrywreom);, which executes the code built so far. The end result is 1100 variables, all equal to eiuaopyj (the variable holding the code section we haven't examined yet). Let's go back to the earlier section where we saw a reference to oynaoyoyaia. With gmgdouaeyd = 1100 substituted in, it reads:
this[eval]('this[eval](oynaoyoyaia1100);');
which evaluates back to etppeifjeka, the probable payload.
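The variable chain is easier to reason about if you rebuild it without eval. This sketch generates the same declaration text as the loop and then follows the chain with a Map. The helper names are hypothetical, and a placeholder string stands in for etppeifjeka. One detail in the original is worth noticing. xrywreom is appended to with +=, so each of the 1100 eval calls re-runs every declaration built so far (1 + 2 + ... + 1100 = 605,550 declarations in total). The end state is still the same as evaluating the final string once.

```javascript
// Rebuild the declaration string the loop produces, for a given count.
function buildChain(count) {
  let code = '';
  for (let i = 0; i < count; i++) {
    code += 'var oynaoyoyaia' + (i + 1) + ' = oynaoyoyaia' + i + ';';
  }
  return code;
}

// Follow the chain with a Map instead of eval: every variable ends up
// holding whatever oynaoyoyaia0 held.
function resolveChain(count, initialValue) {
  const vars = new Map([['oynaoyoyaia0', initialValue]]);
  for (let i = 0; i < count; i++) {
    vars.set('oynaoyoyaia' + (i + 1), vars.get('oynaoyoyaia' + i));
  }
  return vars.get('oynaoyoyaia' + count);
}

console.log(buildChain(3));
// var oynaoyoyaia1 = oynaoyoyaia0;var oynaoyoyaia2 = oynaoyoyaia1;var oynaoyoyaia3 = oynaoyoyaia2;
console.log(resolveChain(1100, '<payload string>')); // <payload string>
```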

After the for loop comes the last line of Javascript in this PDF: ioafyyad(rtaoyuupaue); As we've already discovered, ioafyyad is this[eval], and rtaoyuupaue works out to this[eval]('this[eval](oynaoyoyaia1100);');. So this line ends up evaluating oynaoyoyaia1100, which holds the same string as etppeifjeka. That is the line of code that actually triggers the exploit.
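A common trick for this last step is to swap eval for a function that just records its argument. Here's a minimal sketch of that idea applied to rtaoyuupaue, with its exclamation marks already removed. traceEval is a hypothetical helper name. new Function still executes the string. That makes this reasonable only after you've read the code and confirmed that the only call it makes is the one being replaced.

```javascript
// rtaoyuupaue with its exclamation marks already stripped.
const stage1 = "ioafyyad('this[fuquoudieeel](oynaoyoyaia'+gmgdouaeyd+');');";

// Run the string with ioafyyad swapped for a logger, so whatever would have
// been handed to eval() is captured as text instead of executed.
function traceEval(code, gmgdouaeyd) {
  const captured = [];
  const logOnly = (s) => { captured.push(s); };
  const fn = new Function('ioafyyad', 'gmgdouaeyd', code);
  fn(logOnly, gmgdouaeyd);
  return captured;
}

console.log(traceEval(stage1, 1100));
// [ 'this[fuquoudieeel](oynaoyoyaia1100);' ]
```

The captured string is the second-stage call. With fuquoudieeel being "eval", it hands the contents of oynaoyoyaia1100 to eval.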

All that's left to do is deobfuscate the exploit itself and see what it does.

Thursday, May 5, 2011

Firefox 4 Browser Forensics, Part 5

We're nearing the end of my series on Firefox 4 forensics (see the earlier posts in the series for the full list). Media coverage has finally started to make people aware of how much their online behavior is tracked, and the addition of "Private Browsing" modes to all major browsers is making browser anti-forensics easier than ever. This means we'll probably encounter it in our investigations.

First, I'll cover actions that prevent artifacts from being created: turning "Remember History" off and using "Private Browsing" mode. Then I'll cover some methods of destroying artifacts that have already been created. I won't be covering third-party products.

Preventative antiforensics

To test "Private Browsing" mode, I activated private browsing, searched for Nmap, downloaded the latest version, and then closed Firefox. First, I wanted to see whether the page was listed in the browser history, so I opened places.sqlite and queried: "select * from moz_places where url like '%nmap%';" No result. Searching typed_urls for 'input' also came up empty. There were no cookies from the domain, and nothing was in the download history either. However, several things do appear in the browser cache with the appropriate timestamps and fetch counts: the Google search for "nmap", many Nmap images, and the pages http://nmap.org and http://nmap.org/download.html. This, plus the creation time of the downloaded file, tells us exactly what the user did and when.
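One way to make this check systematic is to diff the URLs in the cache against the URLs in moz_places. Anything cached but missing from history deserves a closer look. This is a sketch with made-up inputs. It assumes you've already exported both URL lists with your own tools. Expect noise: the cache also holds images and other sub-resources that never show up as history entries.

```javascript
// Return URLs that appear in the cache but not in browsing history.
// Both lists are assumed to be exported already (by your own tooling).
function cacheOnlyUrls(cacheUrls, historyUrls) {
  const inHistory = new Set(historyUrls);
  return [...new Set(cacheUrls)].filter((url) => !inHistory.has(url));
}

// Made-up inputs modeled on the Private Browsing test above.
const cachedUrls = [
  'http://nmap.org/',
  'http://nmap.org/download.html',
  'http://www.example.com/',
];
const visitedUrls = ['http://www.example.com/'];
console.log(cacheOnlyUrls(cachedUrls, visitedUrls)); // the two nmap.org URLs
```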

Turning off browsing history is easy. It's front-and-center on the "Privacy" tab in Options. The default is "Remember history", but there are custom history settings as well as simply turning it off. To test the artifacts, I turned off browsing history, googled "metasploit", and downloaded the latest version. As expected, nothing appears in moz_places (browsing history). Nothing shows up in the cache or the download history either. Oddly enough, then, turning off browsing history protects privacy better than "Private Browsing" mode. That means the only possibilities for detection are outside of Firefox, such as using the operating system to track who was logged in when the downloaded file was created and who executed it.

Note, this is accurate at the time of writing, for the current version of Firefox (4.0.1). Once this is made known, it's entirely possible that any of these behaviors will change. You should always run your own testing to confirm behavior before trusting it in a case.

Evidence destruction

But what if the target of our investigation didn't know in advance that he needed to cover his tracks? Firefox has several options to remove recorded data, from the selective to the blunt.

The most selective way to remove data is through the history pane. If you open the history pane and right-click on a history item, you can select "forget this site". Let's imagine this is a "violation of policy" case: browsing porn at work. I browsed to www.pornhub.com and started a video streaming to get a good cache. Opening up the history, it looks like Pornhub connected to several other porn sites. If our suspect didn't forget all of the relevant sites, there would still be evidence of their illicit browsing. In this case, however, I'm going to make sure to forget all of them. After "forgetting" all the sites, there are no traces left in places.sqlite. The gap in id numbers is evidence that sites were forgotten, but nothing indicates what was formerly there. Interestingly, "forget this site" completely destroys the cache, but removes only the selected site(s) from the browsing history. This is a clear sign of evidence destruction. The deleted cache files could likely be recovered from unallocated space or from backups (such as Volume Shadow Copy).

If any of the databases are deleted, Firefox will automatically create a new, empty copy the next time it runs. Normally, the databases have a modified date matching the last browsing event but a creation date matching when Firefox was originally installed. The creation date isn't modified even when Firefox is upgraded or the history is "forgotten" through the browser options. Therefore, if a database's creation date is more recent than the creation date of core Firefox files (such as firefox.exe), it's a clear indication that the original database was deleted around the creation date of the existing one. The deleted file may be recoverable through standard means.
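Here's a sketch of that comparison with made-up timestamps. You'd collect the real ones from your forensic tool's file listing. The slack parameter is my own addition. It allows for the databases being created on first run, a little after install. Also note that a profile created later for a different user account would trip this check too, so confirm whose profile you're looking at.

```javascript
// Flag profile databases whose creation time is later than the Firefox
// binary's, beyond some slack. Timestamps are Date objects you've collected.
function recreatedDatabases(exeCreated, dbCreated, slackMs = 0) {
  return Object.entries(dbCreated)
    .filter(([, created]) => created - exeCreated > slackMs)
    .map(([name]) => name);
}

// Hypothetical timestamps for illustration only.
const exeCreated = new Date('2011-03-22T10:00:00Z');
const dbs = {
  'places.sqlite': new Date('2011-03-22T10:05:00Z'),  // created on first run
  'cookies.sqlite': new Date('2011-05-01T18:30:00Z'), // created much later
};
const oneDay = 24 * 60 * 60 * 1000;
console.log(recreatedDatabases(exeCreated, dbs, oneDay)); // [ 'cookies.sqlite' ]
```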

Directly modifying the databases would be somewhat more difficult to detect. The databases are modified constantly through regular browsing, so the timestamps wouldn't be a clue. However, as with "forgetting this site", there will be a gap in the normally sequential ID numbers that could indicate something was deleted. Examining the last_visit_date of the sites surrounding the gap might allow you to determine when the missing sites were visited. If backups of the databases exist, they might have the missing data. Also, the cache isn't nearly as user-friendly to edit as a sqlite database, so if the cache isn't cleared, it could provide a clue about what was lost. Even if the cache had been cleared, the deleted files might be recoverable through standard methods.
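The gap check can be sketched as follows. It assumes the moz_places rows have already been exported as { id, url, last_visit_date } objects; the rows below are made up. Firefox stores last_visit_date as PRTime, which is microseconds since the Unix epoch, so dividing by 1000 gives a JavaScript Date. Treat the bracketing times as a rough guide. An id reflects when a place was first added, while last_visit_date changes on every revisit. Also, unless the table was created with AUTOINCREMENT, SQLite can reuse the ids of rows deleted from the end of the table, so deleting the most recent entries may leave no gap at all.

```javascript
// PRTime (microseconds since the Unix epoch) to Date; NULL stays null,
// e.g. for places that were never visited.
function prtimeToDate(us) {
  return us == null ? null : new Date(Math.floor(us / 1000));
}

// Find gaps in the id sequence and report the visit times of the rows on
// either side of each gap.
function findIdGaps(rows) {
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  const gaps = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const next = sorted[i];
    if (next.id - prev.id > 1) {
      gaps.push({
        firstMissing: prev.id + 1,
        lastMissing: next.id - 1,
        visitBefore: prtimeToDate(prev.last_visit_date),
        visitAfter: prtimeToDate(next.last_visit_date),
      });
    }
  }
  return gaps;
}

// Made-up rows: ids 3-5 are missing.
const rows = [
  { id: 1, url: 'http://a.example/', last_visit_date: 1304600000000000 },
  { id: 2, url: 'http://b.example/', last_visit_date: 1304600060000000 },
  { id: 6, url: 'http://c.example/', last_visit_date: 1304603600000000 },
];
for (const g of findIdGaps(rows)) {
  console.log(`ids ${g.firstMissing}-${g.lastMissing} missing; neighbors visited`,
    g.visitBefore?.toISOString(), 'and', g.visitAfter?.toISOString());
}
```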

This isn't meant to be a complete overview of all possible methods of antiforensics with Firefox, just a quick highlight of some possibly relevant issues and how to detect and overcome them. This is the end of my Firefox 4 forensics series. I hope it'll be a useful reference for your investigations. If any of this information turns out to be incorrect or changes in future versions, please let me know and I'll edit the appropriate post.