About Me

Bay Area, CA, United States
I'm a computer security professional, most interested in cybercrime and computer forensics. I'm also on Twitter @bond_alexander. All opinions are my own unless explicitly stated.

Thursday, October 20, 2011

Status update

Wow, I can't believe I haven't updated this blog since July. A lot has been going on since then, and I've been too busy to keep up the blog. While I hope to have more time and material in the near future, I'm starting a new role at a new company at the end of the month and I don't yet know their stance on personal blogging. Once I've had a chance to get settled and learn their policy, hopefully I'll be back here posting regularly about what's going on.

In the meantime, I've been learning more about incident response. In particular, Harlan Carvey has written some great articles on his blog; I highly recommend them. They're aimed at high-level overviews of IR rather than step-by-step instructions, but that's exactly what I need right now: the basics of how to approach it. Besides, the details of IR vary dramatically with your organization's situation and needs, so a high-level approach is really the only way to teach it.

Also, Brian Baskin's DerbyCon talk "How to Get Fired After a Security Incident" is now available online. It's a great presentation about common mistakes made in forensics and incident response.

The short version of both Harlan's and Brian's message: prepare yourself before you discover your breach.

Monday, July 11, 2011

Sabotage, Stuxnet and the future of cyber attacks

Last year, before LulzSec and Sony's epic fail, the big topic in computer security was the uniquely sophisticated and targeted malware known as Stuxnet. I blogged about it back in September. Now, Kim Zetter of Wired Magazine's Threat Level blog just posted a great overview of the effort to reverse engineer Stuxnet. If you haven't read it yet, you should now. Not only does it present a lot of great info on Stuxnet, it also gives some good insight into malware reverse engineering in general. The rest of this post will presume you've read it.

Most of the article is excellent, well researched and well written. However, I take serious issue with one of Ralph Langner's quotes towards the end of the article. Here's the excerpt:
They will likely have no second chance to unleash their weapon now. Langner has called Stuxnet a one-shot weapon. Once it was discovered, the attackers would never be able to use it or a similar ploy again without Iran growing immediately suspicious of malfunctioning equipment.

“The attackers had to bet on the assumption that the victim had no clue about cybersecurity, and that no independent third party would successfully analyze the weapon and make results public early, thereby giving the victim a chance to defuse the weapon in time,” Langner said.
In an ideal world, Langner would be completely correct, but in practical terms he's wrong. I have great respect for Langner, his expertise and his work, but it seems that almost daily I'm reading about people falling for the same attacks over and over again. As just one example, Stuxnet spread from network to network through infected USB drives. This isn't a new attack: back in 2008 the Department of Defense was hit by a major attack spread through USB, and that virus, though successful, was a re-use of a virus from 2007. One would hope that the US Government takes information security seriously, but just this year DHS tested how many employees would pick up an infected USB drive and plug it into a secure system. The result: 60%. If there was a company or government logo on the drive, it was up to 90%. Old and well-known attacks work, even on high-value targets that really ought to know better. Similarly, although 0-day exploits are highly valued for malware and hacking attempts, the majority of malware out there succeeds using exploits for which patches are already available.

However, let's give Iran's Atomic Energy Organization the benefit of the doubt. Let's presume that since Stuxnet, they're keeping up with every critical security patch for every piece of software they run -- an impressive feat! Even that can't keep them safe: new exploits are discovered daily. To get a sense of the scale of the problem, take a look at the Exploit Database, and remember that those are only the exploits discovered by responsible security researchers, not criminals. To further complicate matters, Stuxnet has a compartmentalized structure. From the article: "[Stuxnet] contained multiple components, all compartmentalized into different locations to make it easy to swap out functions and modify the malware as needed." It seems apparent that the authors of Stuxnet could simply swap in new 0-day attacks and continue as before. In fact, earlier this year a security researcher discovered a serious bug in Siemens' industrial control software and wrote proof-of-concept malware to exploit it. He claims that Siemens didn't take aggressive enough action to patch the vulnerability.

Frankly, the recent hacking of Lockheed, Sony, Oak Ridge National Labs, Sony, InfraGard, Sony, RSA, Sony, HBGary, Sony, and assorted government contractors proves that any network can be penetrated. The only difference now is that people are more aware that attacks on the sophistication level of Stuxnet are possible. This gives incident responders a better chance to identify and react to malware and breaches, and it's what Zetter was referring to when she wrote that "the attackers would never be able to use it or a similar ploy again without Iran growing immediately suspicious of malfunctioning equipment." The difficulty is, equipment malfunctions. Software has bugs and hardware fails, particularly when you're a country dealing with jury-rigged equipment smuggled in under trade embargoes. For any given failure, a cyber attack is the least likely cause; that's why Iran's centrifuge failure rate could increase dramatically for months before a cause was found.

To make matters worse, I find it highly unlikely that Iran has sufficient personnel with the skills necessary for incident response and advanced malware reverse-engineering. Even the US government is having problems recruiting and retaining people with those skills; it's hard to imagine that Iran has an easier time with this problem.

Quite frankly, in my opinion the only limiting factor on cyber attacks against physical infrastructure is the will and the resources to carry them out. It's only a matter of time before another powerful and skilled group decides they want to execute a similar attack.

Monday, June 6, 2011

Reverse engineering a malicious PDF Part 3

Welcome to my series in progress about reversing a malicious PDF. Last time I worked through the first exploit, geticon(), and its shellcode payload. Next, I'll be looking at printf(), which is triggered if the user is running Adobe Reader 7.1.
Again, the payload is the appropriate shellcode variable I found previously (shcode_printf). The code is different this time, but it follows the same pattern: build the NOP sled, this time in the variable nop; append the payload to create heapblock; build a bigger, 261310-character NOP sled in block; then create an array mem and populate it with 1400 copies of the full NOP sled plus heapblock to form the heap spray. Finally, attack the vulnerable util.printf function: the buffer overflows, Adobe Reader lands in a NOP sled, and the shellcode executes.
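To make the pattern easier to see, here's the same construction modeled in Python (my own sketch; the real exploit is Javascript, and the filler value, payload and short-sled lengths here are illustrative):

# Python model of the printf() heap spray described above. Variable names
# follow the deobfuscated code; the big-sled size and copy count come from
# the walkthrough, everything else is illustrative.
shcode_printf = "\u4141" * 100       # placeholder payload, not the real shellcode

nop = "\u0c0c" * 16                  # short NOP sled (illustrative length and value)
heapblock = nop + shcode_printf      # sled + payload
block = "\u0c0c" * 261310            # the bigger, 261310-character NOP sled

# 1400 copies of big sled + payload. This genuinely allocates hundreds of
# MB of memory, which is exactly the point of a heap spray.
mem = [block + heapblock for _ in range(1400)]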


This is an older exploit, CVE-2008-2992, made public in November 2008. A patch was available at the same time the vulnerability was published.


Now, the fun part. What does the shellcode do? Like last time, let's look at the hex and see if there are any obvious URLs.
No such luck. Back to scLog, then. Executing the shellcode shows that it loads shell32 to get the Temp path and loads urlmon to try (but fail) to download a file to Temp. Just like before, it tries to access /forum.php?f=PDF and passes along the exploit used (printf). Again, the file would have been saved as a.exe. Simple enough.

Onward to exploit 3, collab().
First, we have a function fix_it, which takes two variables: yarsp and len. It enlarges the string to twice len, then cuts it down to half of len. Once again, the shellcode is taken from shcode_collab and stored as var shellcode. The variables cc and addr are set to hex numbers, and sc_len is set to twice the length of the shellcode (338). These are used to calculate the new variable len (equal to 4093910). All of this leads up to the good stuff, beginning with var yarsp. This variable is defined with a few NOP codes, then run through fix_it with len, which extends yarsp into a real NOP sled, 2096955 characters long. The variable count2 is defined, and a for loop generates mem_array, which is the heap spray. Next, the var overflow is created and extended to 65536 characters. Finally, overflow is passed to the vulnerable Collab.collectEmailInfo() method to trigger the exploit. This is another old exploit, discovered and patched in 2007 (CVE-2007-5659).

So far, so good. Now, onto the shellcode. I run it through scLog just like the others ... and just like the others, it loads shell32 to find the temp directory and uses URLDownloadToFile to try to access /forum.php?f=PDF (Collab)&key=... and save the result to temp as a.exe.

The first three exploits all followed the same basic pattern: create a big NOP sled, attach shellcode, replicate it a few hundred times into a heap spray, overflow the buffer and let it go. All the shellcodes had essentially the same function as well: download a trojan EXE to the temp directory. As a result, I'll leave the last exploit and shellcode as an exercise for the reader. :)

Thursday, June 2, 2011

Reverse engineering a malicious PDF Part 2

In Part 1, I began analyzing a malicious PDF. Within the PDF, there was a fair amount of obfuscated malicious Javascript, which I parsed through. Through many transformations and text replacements, the Javascript eventually decoded and executed the attack code, saved as the variable etppeifjeka.


The attack code was initially obfuscated with excessive exclamation marks, but once the exclamation marks were removed it became neat and tidy code. Unlike the malicious Javascript I analyzed last month, this code even had line breaks once the exclamation marks were gone, making it much more legible. The attack code contains several functions: nplayer, printf, geticon, and collab. The PDF contains code to read which version of Acrobat is running, and based on that it chooses which exploit to launch.
Adobe has provided some documentation for the app.viewerVersion method. In this case, it's looking at the version of the EScript plugin (which provides Javascript support); the EScript plugin version number is actually the same as the version number of Acrobat itself. Thus, if Acrobat is version 9, or version 8.12 or higher, it runs geticon. If Acrobat is version 7.1, it runs printf. If Acrobat is version 6 or below version 7.11, it runs collab. The last one is oddly written: they might have been trying to write "between 9.1 and 9.2", but as written, nplayer will be triggered if the version is greater than 9.1 or less than 9.2 ... which is true of every version, so if execution hasn't hit one of the other functions it'll hit this one.
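To see why, here's the version check modeled in Python (my reconstruction from the deobfuscated code, not the original Javascript; app.viewerVersion returns the version as a number):

# Python model of the exploit dispatch described above.
def pick_exploit(version):
    # version comes from app.viewerVersion(), e.g. 9.0, 8.12, 7.1 ...
    if version >= 9 or version >= 8.12:      # "version 9, or 8.12 or higher"
        return "geticon"
    if version == 7.1:
        return "printf"
    if version == 6 or version < 7.11:
        return "collab"
    # Probably meant "between 9.1 and 9.2", but (> 9.1 or < 9.2) is true
    # for every number, so anything that falls through lands here.
    if version > 9.1 or version < 9.2:
        return "nplayer"

print(pick_exploit(8.0))   # -> nplayer: misses every other branch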


Here's the code for the geticon function:
Back in part one, I guessed that the shcode variables were the shellcode payloads for the PDF. This function confirms my guess. First, the function grabs shcode_geticon to collect the appropriate shellcode. Then it's appended to the end of a short NOP sled and saved as the variable garbage. The next bit of code (lines 46-53) uses a variable nopblock to extend the size of the NOP sled; by the time we get through the while loop, the variable block contains a NOP sled 262045 characters long. Since all of this is being stored in memory as the code executes, this is a heap spray. Then an array called memory is constructed (lines 54-55), containing 180 copies of block plus the shellcode. Lines 56-61 construct var buffer with 4012 copies of %0a%0a%0a%0a, which are line feeds in hex. Finally, buffer is passed to the vulnerable geticon function. The buffer overflows, execution hits one of the NOP sleds, and the shellcode runs.


Incidentally, this is a well known exploit. I found the exact same exploit code being shared on a security research message board two years ago. Just because the news only talks about new, sophisticated malware doesn't mean the old stuff goes away quickly. For example, Contagio has seen samples of this from just last January.


Now we've finally reached the point where the shellcode is executed. In this case, what does it do? Daniel Wesemann of the SANS Internet Storm Center provides a short Perl script to take the shellcode and dump it to hex, to see if there's anything obvious, like a URL.
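A rough Python equivalent of that approach (my own sketch, not Wesemann's Perl script) starts from the %u-escaped shellcode string pulled out of the PDF's Javascript:

# Sketch: decode %u-escaped shellcode and hex-dump it with an ASCII column.
import re, struct

def hexdump_shellcode(escaped):
    # Each %uXXYY escape packs two bytes, low byte first.
    words = re.findall(r'%u([0-9a-fA-F]{4})', escaped)
    data = b''.join(struct.pack('<H', int(w, 16)) for w in words)
    for i in range(0, len(data), 16):
        chunk = data[i:i+16]
        hexpart = ' '.join('%02x' % b for b in chunk)
        text = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        print('%08x  %-47s  %s' % (i, hexpart, text))

hexdump_shellcode('%u9090%u9090%ue8fc')   # illustrative bytes, not the real payload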
In this case, it wasn't very helpful. There's no obvious URL or even anything that follows a URL pattern. The next article in Daniel Wesemann's series continues to compile and disassemble the shellcode for reversing, but I don't read assembly so it's off to another option for me. Malware Tracker has an online shellcode analysis tool, but it didn't work for the four shellcode samples in this pdf. Cruising online guides to shellcode analysis led me to a tool called libEmu. However, when I ran the code in libEmu, it hit my 10000000 step limit before execution actually completed. It looks like either I did something wrong, or I hit an infinite loop in the shellcode. Odd. The same happened with each of the four shellcodes.


Since I'm still new to malicious PDF analysis, I talked to some guys in Threat Research here, and they pointed me to a tool called PDF Stream Dumper. Interestingly, I was able to execute the shellcode I copied-and-pasted into it, but it choked on the actual PDF stream that Didier Stevens' tools processed without difficulty in Part 1. This confirms the need to use multiple tools: you never know when one will fail you.


I executed the shellcode within PDF Stream Dumper. There are a couple of different ways you can do this: scDbg uses the libemu emulation, and it crashes the analysis just like libemu does running under Linux. Running the scLog version (live, not emulated, analysis), the shellcode executes. scLog notes that the shellcode loads urlmon.dll, an Internet Explorer library for fetching files from remote URLs. Then the shellcode tries to access shell32.dll, but scLog kills the shellcode to prevent that. Looking at the memory dump in a hex editor, I see a call to a URL, /forum.php?f=PDF (GetIcon)&key=87c1a082278ace8fdf2f63b86db29d6f&u=, and a reference to a.exe.


These file references imply downloading an external file, but implication isn't proof. So let's use scLog again, let it actually load all DLLs this time, and see what happens. First, some cautionary notes: this is malware we're executing, and I'm turning off some of scLog's safety functions, so I'm adding some safety back in. I took a snapshot of my analysis VM first so I could revert if needed, and since this shellcode looks like a downloader, I'm running Wireshark to see if anything actually gets downloaded.


It tries to access /forum.php?f=PDF (GetIcon)&key=87c1a082278ace8fdf2f63b86db29d6f&u= and download a file as a.exe to the user's temporary directory. However, since that's a relative URL with no host to resolve against, it fails and the shellcode crashes. Wireshark confirms that nothing was downloaded. Interestingly, /forum.php takes parameters including where the file request is coming from (a PDF) and even which exploit is being used (GetIcon). It looks like this PDF was intended to be viewed online, not downloaded (or emailed) and viewed.


In summary, the geticon() function exploits a known vulnerability in Acrobat to hook urlmon.dll and download and execute additional malware on the user's system. The vulnerability is known as CVE-2009-0927, and there is a patch available to prevent it from affecting your system. Moral of the story: keep your system patched and be careful where you browse.


Next, I'll get to the other exploits and shellcode.

Thursday, May 26, 2011

Reverse engineering a malicious PDF Part 1

One of the projects I work on is a malicious Javascript scanner. It also scans PDFs, since the malicious part of a PDF is usually encoded Javascript. To test the scanner, we regularly collect malicious PDFs and run them against it to see if they're detected. Of course, in order to determine whether a file is really malicious, sometimes you need to go in by hand and see what's going on. To this end, Didier Stevens wrote a chapter on analyzing malicious PDFs; I'll be using it as a reference as I go through a malicious PDF here, and I recommend reading it alongside this article. Didier is far better at this than I am, so I won't try to explain the structural concepts he explains far better. The PDF I'll be looking at is named 4469.pdf. It was downloaded "in the wild" from a website listed on the Malware Domain List.


Didier provides several python scripts that are useful for analyzing PDFs; the first is pdfid.py. It examines the PDF for indicators of possible maliciousness, such as the presence of Javascript, automatic actions, and document length (most malicious PDFs are only one page). Here's what the results look like for 4469.pdf.
In this case, the PDF is only one page, contains Javascript, and contains code that will launch when the PDF is opened (OpenAction). This is potentially suspicious, so let's keep investigating. We know that the Javascript is where the malicious activity will happen, so let's look at that first, using Didier's pdf-parser.py.
Pdf-parser.py located Javascript only in indirect object 2 0 of the PDF. However, indirect object 2 0 references indirect object 11 0 as an OpenAction and Javascript. In a moment we'll see why pdf-parser.py didn't identify indirect object 11 0 as containing Javascript. For now, we see that the PDF invokes Javascript when the file is opened, which we expect from a malicious PDF, and we expect that indirect object 11 contains our payload.


Using pdf-parser.py again, I can parse out indirect object 11, which is a stream object compressed with the Flate method. Interestingly, this is exactly the same situation as in Didier's example script, so it seems this is a common way to obfuscate malicious code in PDFs. Since it's common, Didier provides a method to uncompress the script: pdf-parser.py --object 11 --filter --raw 4469.pdf ... and voila, we have malicious code:
It keeps going like that for another couple pages.

Just like in the malicious Javascript I took a look at last month, the functions and variables all have random names: function hddd(fff), var fpziycpii, etc. There are also plenty of junk characters and excessive transformations to make analysis more annoying. Here's one example from towards the end of the script:
for(yrauyiyqouoi=0;yrauyiyqouoi<gmgdouaeyd;yrauyiyqouoi++){var dsfsg = yrauyiyqouoi+1;xrywreom+='var oynaoyoyaia'+dsfsg+' = oynaoyoyaia'+yrauyiyqouoi+';';this[fuquoudieeel](xrywreom);}
There's even a section where every letter is interspersed with a bunch of exclamation marks. It's all messy, but nothing we can't eventually analyze. Let's start at the top.

The first code introduced is function hddd, which takes the parameter fff and replaces the ** with %u. There are four separate strings processed by this function, which means each string is actually a unicode-encoded string. These are stored as variables: shcode_geticon, shcode_newplayer, shcode_printf, and shcode_collab. Based on the names, these strings are likely the shellcode payloads, but we'll see when we get there.
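In Python terms, hddd boils down to a one-line replace (a sketch; the original is Javascript, and the input here is illustrative):

# What hddd(fff) amounts to: turn '**' back into '%u' escapes.
def hddd(fff):
    return fff.replace('**', '%u')

shcode_geticon = hddd('**9090**9090**e8fc')   # -> '%u9090%u9090%ue8fc'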

Next we have:
var fpziycpii = 'e';var uinsenagexo = 'l';var fuquoudieeel = fpziycpii+'va'+uinsenagexo;var ioafyyad = this[fuquoudieeel];rtaoyuupaue = "ioa!fyya!d!('!t!hi!s![!fuqu!ou!d!ie!ee!l](o!y!na!o!yoy!aia!'+!gmg!do!u!a!e!yd!+!')!;!'!)!;".replace(/[!]/g, '');
Stepping through it, the variable fuquoudieeel combines the first two variables to get "eval", so ioafyyad is this[eval]. Next, rtaoyuupaue is a string that has the replace function executed on it. In this case, the replace function just removes all the extra exclamation points that are in there, yielding:

ioafyyad('this[fuquoudieeel](oynaoyoyaia'+gmgdouaeyd+');');

If we substitute in the known variables, we get:

this[eval]('this[eval](oynaoyoyaia'+gmgdouaeyd+');');

That's an improvement, but there's still work to do. The variable gmgdouaeyd is later defined as 1100, so we get oynaoyoyaia1100, a variable which isn't defined yet. There's a section towards the end with oynaoyoyaia0 = eiuaopyj; but obviously that's not the same variable. It may be a typo, or it may be junk code ... we'll see. For now, let's move on.


Next we have another function:

function iuoyzemuyyi(ieuohhrk)
{
    var iuioathlpau = '!';
    var unetoptou = '';
    for(xqqauiae=0;xqqauiae<ieuohhrk.length;xqqauiae++)
    {
        var yaomwteez = ieuohhrk.charAt(xqqauiae);
        if(yaomwteez == iuioathlpau) { } else { unetoptou += yaomwteez; }
    }
    return unetoptou;
}
This function is a longer, more complicated way of removing the exclamation marks from a string. Like the last one, it's applied to another code section stored as a string and obfuscated with exclamation marks (five at a time, this time). The string is un-obfuscated and stored as the variable etppeifjeka. That's a long section, and it looks like it's part of the payload, so we'll get to it in part 2. For now, let's skip past it and see how it's used.
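Functionally, the whole loop is just a one-line replace (Python sketch):

# The entire iuoyzemuyyi loop boils down to:
def iuoyzemuyyi(ieuohhrk):
    return ieuohhrk.replace('!', '')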

The last section is this:
eiuaopyj = ''+etppeifjeka+'';
var gmgdouaeyd = 1100;
var xrywreom = '';
oynaoyoyaia0 = eiuaopyj;
for(yrauyiyqouoi=0;yrauyiyqouoi<gmgdouaeyd;yrauyiyqouoi++)
{
    var dsfsg = yrauyiyqouoi+1;
    xrywreom+='var oynaoyoyaia'+dsfsg+' = oynaoyoyaia'+yrauyiyqouoi+';';
    this[fuquoudieeel](xrywreom);
}
ioafyyad(rtaoyuupaue);
This section is odd, to say the least. The for loop constructs a string which is stored in the variable xrywreom. The loop runs 1100 times and builds a section of code that declares a series of variables oynaoyoyaiaX, where X is the current loop counter plus one, and each variable is set equal to the previous one. The output looks like this:
var oynaoyoyaia1 = oynaoyoyaia0;var oynaoyoyaia2 = oynaoyoyaia1;var oynaoyoyaia3 = oynaoyoyaia2;var oynaoyoyaia4 = oynaoyoyaia3;var oynaoyoyaia5 = oynaoyoyaia4;
It goes up to var oynaoyoyaia1100 = oynaoyoyaia1099;. On each step, the loop runs this[fuquoudieeel](xrywreom);, which executes the code stored in the variable. This creates 1100 variables and sets them all equal to eiuaopyj (the variable holding the obfuscated section we haven't examined yet). Let's go back to the earlier section where we saw a reference to oynaoyoyaia. We had deobfuscated it to this point:
this[eval]('this[eval](oynaoyoyaia1100);');
which evaluates back to etppeifjeka, the probable payload.

After the for loop comes the last line of Javascript in this PDF: ioafyyad(rtaoyuupaue); As we've already discovered, ioafyyad is this[eval] and rtaoyuupaue is the string that evaluates oynaoyoyaia1100, so this is the line of code that actually triggers the exploit.

All that's left to do is deobfuscate the exploit itself and see what it does.

Thursday, May 5, 2011

Firefox 4 Browser Forensics, Part 5

We're nearing the end of my series on Firefox 4 forensics (click here for the full list). Media coverage has finally started to make people aware of how much their online behavior is tracked, and the addition of "Private Browsing" modes in all major browsers is making browser anti-forensics easier than ever. This means we'll probably encounter anti-forensics in our investigations.

First, I'll cover actions that prevent the creation of artifacts: turning "Remember History" off and using "Private Browsing" mode. Then I'll cover some methods of destroying artifacts that have already been created. I won't be covering third-party products.

Preventative antiforensics

To test "Private Browsing" mode, I activated private browsing, searched for Nmap and downloaded the latest version and then closed Firefox. First, I wanted to see if the page was listed in the browser history, so I opened places.sqlite and queried: "select * from moz_places where url like '%nmap%';" No result.  Same with searching for 'input' in typed_urls, no cookies from the domain and nothing in the download history either. However, the google search for "nmap", many nmap images, the websites http://nmap.org, and http://nmap.org/download.html all appear in the browser cache with the appropriate timestamps and fetch count. This, plus having the creation time of the downloaded file, tells us exactly what the user did and when.

Turning off browsing history is pretty easy: it's front-and-center on the "Privacy" tab in options. The default is "Remember history", but there are custom history settings as well as just "off". To test the artifacts, I turned off browsing history, googled "metasploit", and downloaded the latest version. As expected, nothing appears in moz_places (browsing history). Nothing shows up in the cache or the download history either, so oddly enough, turning off browsing history protects privacy better than "Private Browsing" mode does. That means the only possibilities for detection are outside of Firefox, such as using the operating system to track who was logged in when the downloaded file was created and who executed it.

Note, this is accurate at the time of writing, for the current version of Firefox (4.0.1). Once this is made known, it's entirely possible that any of these behaviors will change. You should always run your own testing to confirm behavior before trusting it in a case.

Evidence destruction

But what if the target of our investigation didn't know in advance that he needed to cover his tracks? Firefox has several options to remove recorded data, from the selective to the blunt.

The most selective way to remove data is through the history pane. If you open the history pane and right-click on a history item, you can select "forget this site". Let's imagine this is a "violation of policy" case: browsing porn at work. I browsed to www.pornhub.com and started a video streaming to build up a good cache. Opening up the history, it looks like Pornhub connected to several other porn sites, so if our suspect didn't make sure to forget all of the relevant sites, there would still be evidence of the illicit browsing. In this case, however, I'm going to make sure to forget all of them. After "forgetting" all the sites, there are no traces left in places.sqlite. There's evidence that sites were forgotten because of the gap in id numbers, but no indication of what was formerly there. Interestingly, using "forget this site" completely destroys the cache, but only removes the selected site(s) from the browsing history. This is a clear sign of evidence destruction, and the deleted cache files could likely be recovered from unallocated space or from backups (such as Volume Shadow Copy).

If any of the databases are deleted, Firefox will automatically create a new, empty copy the next time it's run. Normally, a database will have a modified date of the last browsing event but a creation date of when Firefox was originally installed; the creation date is not modified even when Firefox is upgraded or the history is "forgotten" through the browser options. Therefore, if the creation date of a database is more recent than the creation date of core Firefox files (such as firefox.exe), it's a clear indication that the database was deleted around the creation date of the existing one. It may be recoverable through standard means.
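That check is easy to script; here's a sketch with illustrative paths (on Windows, os.path.getctime returns the creation time):

# Sketch: flag profile databases created after Firefox itself was installed.
import datetime, glob, os

profile = r'C:\Users\suspect\AppData\Roaming\Mozilla\Firefox\Profiles\xxxxxxxx.default'  # illustrative
firefox_exe = r'C:\Program Files\Mozilla Firefox\firefox.exe'

install_time = os.path.getctime(firefox_exe)
for db in glob.glob(os.path.join(profile, '*.sqlite')):
    created = os.path.getctime(db)
    if created > install_time:
        print(db, 'created', datetime.datetime.fromtimestamp(created),
              '-- possibly deleted and recreated')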

Directly modifying the databases would be somewhat more difficult to detect. The databases are modified constantly through regular browsing, so the timestamps wouldn't be a clue. However, like "forgetting a site", deleting records leaves a gap in the normally sequential ID numbers that could indicate something was removed, and examining the last_visit_date of the sites surrounding the gap might allow you to determine when the missing sites were visited. If backups of the databases exist, they might have the missing data. Also, the cache isn't nearly as user-friendly to edit as a sqlite database, so if the cache isn't cleared it could provide a clue to what was lost. Even if the cache has been cleared, the deleted files might be recoverable through standard methods.
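Spotting those gaps is scriptable too. A sketch using Python's built-in sqlite3 module (assuming the last_visit_date column mentioned above):

# Sketch: find gaps in the moz_places id sequence and show the neighbors.
import sqlite3

con = sqlite3.connect('places.sqlite')
ids = [r[0] for r in con.execute('select id from moz_places order by id')]
for prev, cur in zip(ids, ids[1:]):
    if cur - prev > 1:
        neighbors = con.execute(
            'select id, url, last_visit_date from moz_places '
            'where id in (?, ?)', (prev, cur)).fetchall()
        print('gap of', cur - prev - 1, 'between ids', prev, 'and', cur, ':', neighbors)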

This isn't meant to be a complete overview of all possible methods of antiforensics with Firefox, just a quick highlight of some possibly relevant issues and how to detect and overcome them. This is the end of my Firefox 4 forensics series; I hope it'll be a useful reference for your investigations. If any of this information turns out to be incorrect or changes in future versions, please let me know and I'll edit the appropriate post.

Friday, April 29, 2011

Firefox 4 Browser Forensics, Part 4

When I started this series, I had no idea it would go on this long. There are more forensic artifacts in FF4 than I thought. It seems like every time I turn around I find another database to mine for artifacts. For example, I was just about to start tearing apart cookies.sqlite when I saw there was a search.sqlite. It didn't turn out to have any significant artifacts, just the listing of search engines in the quick search box. Much more interesting are browser cookies.

Cookies
Browser cookies are a notable and well-known source of browsing history. In IE, each cookie is a separate text file, but in Firefox, they're stored in yet another sqlite database: cookies.sqlite. This database has just one table, moz_cookies, with several columns:
  • id: an index
  • name: the variable being stored
  • value: the value of "name"
  • host: the website the cookie is for
  • path: the path the cookie is valid for
  • expiry: when the cookie should be purged
  • lastAccessed: when the website last accessed the cookie in PRTime
  • isSecure: is HTTPS required to access the cookie?
  • isHttpOnly: can the cookie only be accessed by HTTP, or can other methods (e.g. Javascript) access it?
  • baseDomain: The site's base domain, without www or other subdomain
  • creationTime: when the cookie was created in PRTime

When you're drawing conclusions from cookies, remember to take into account the browser's cookie control settings as well as the cookie timestamps. Like browsing history, cookies can tell you when the user viewed a website, and they can be a source of usernames and passwords for insecure websites. However, if the user clears cookies regularly, the amount of data will be limited. A cookie also doesn't prove the user visited the site directly, as many advertising cookies (such as doubleclick.net's) will be downloaded while visiting related sites.
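Since both timestamps are PRTime (microseconds since the Unix epoch), pulling readable dates out of cookies.sqlite is one division away. A sketch with Python's built-in sqlite3:

# Sketch: dump cookie hosts with human-readable PRTime timestamps.
import datetime, sqlite3

def prtime(t):
    return datetime.datetime.utcfromtimestamp(t / 1000000)

con = sqlite3.connect('cookies.sqlite')
for host, created, accessed in con.execute(
        'select host, creationTime, lastAccessed from moz_cookies'):
    print(host, 'created', prtime(created), 'last accessed', prtime(accessed))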

Saved Sessions
One useful aspect of Firefox is that it will automatically save the currently open browser tabs so you can reopen them the next session. The browser tabs are saved in sessionstore.js in Javascript object notation. These aren't just the raw URLs; possible fields include page title, referrer, formdata, and cookies. Any GET variables in the URL are preserved as well. As a result, if the user was logged into a website, they may still be logged in when the saved session is restored. If sessionstore.js has been damaged or deleted, it may be recoverable from the backup, sessionstore.bak. A quick test shows that sessionstore.bak isn't always a duplicate of sessionstore.js: opening a new browser session overwrote sessionstore.js but not sessionstore.bak, so you may be able to recover two different browser sessions under some conditions.

Bookmark backups
In the profile folder there's a folder called bookmarkbackups, which contains a series of JSON files storing the last 10 backups of the user's bookmarks. The filenames are in the format of bookmarks-YYYY-MM-DD. These backup files include the bookmarks the user explicitly makes, but also Firefox's "Smart Bookmarks" which include items of forensic value like "Most Visited", "Recently Bookmarked", and "Recently Tagged". These backups also include timestamps in PRTime for when the bookmark was added and last modified. As always in forensics, backups like these can provide valuable insight into detecting antiforensics (such as deleting bookmarks) and in behavior over time.

Downloads
downloads.sqlite is where Firefox 3+ stores information relating to downloaded files. There's just one table, moz_downloads, but it has quite a few useful artifacts.
  • id: an index
  • name: the local filename of the download
  • source: the remote filename and path being downloaded
  • target: where it's being downloaded to
  • tempPath: if the file is complete, it will be blank. If not, it's where the incomplete file is being stored before moving to target
  • startTime: Time the download started, in PRTime
  • endTime: Time the download finished, also PRTime
  • state: state of download, encoded as an integer 
  • referrer: the page containing the link to the file
  • entityID: a value used for resuming downloads
  • currBytes: Number of bytes downloaded.
  • maxBytes: Total file size.
  • mimeType: MIME file type.
  • preferredApplication: From the download dialogue box, if the user clicks run, this stores the program that will open the downloaded file. If the user clicks save, this will be blank.
  • preferredAction: Action to take when download is complete. Default is 0, just save the file.
  • autoResume: Can the download be resumed if broken?
extensions.ini records what Firefox extensions are installed, which may be useful if, for example, hacker tools like Hackbar or anonymizers like FoxyProxy are installed.

Form history
formhistory.sqlite is another good source of artifacts, although unfortunately it doesn't track which website the form was used on.

  • id: Another numeric index
  • fieldname: The field that contained value. This may be an HTML field in a website's form, or it may be a Firefox field, like searchbar-history.
  • value: What was typed into fieldname. In addition to search history, this is a good source for email addresses and usernames connected to the user.
  • timesUsed: How many times the user has typed value into fieldname.
  • firstUsed: The first time the user typed value into fieldname, in PRTime as always.
  • lastUsed: The last time the user typed value into fieldname, in PRTime as always.
  • guid: A global id for the formhistory, in case syncing is enabled.


Cache
The cache exists in two folders; for Windows 7 they are Users/[user]/AppData/Local/Mozilla/Firefox/Profiles/[profile name]/Cache and Users/[user]/AppData/Local/Mozilla/Firefox/Profiles/[profile name]/OfflineCache. The offline cache is used when Firefox is in offline mode, but it's the standard cache that is more likely to be useful for forensics. The cache can be viewed through the browser by navigating to about:cache, or it can be viewed directly on the file system. On the filesystem, the actual cache files are stored in a set of directories and subdirectories with names in hex. The files can be located by using the _CACHE_MAP_ and _CACHE_00X_ files. A full writeup on how the cache works has been done by Symantec's response team here; as far as I can tell, the cache scheme hasn't changed since then. This layout is rather inconvenient to navigate by hand, so an automated tool like Firefox's cache browser is definitely the way to proceed here.

Looking at about:cache, it appears Firefox stores useful information like the full URL cached (including GET parameters), cached file size, number of times the cached file was accessed, last time the file was modified, time the cached file expires, a full hex dump of the file, and the full HTTP request issued to access it. Since these are individual files on the hard drive, the standard Modified-Accessed-Created timestamps can provide additional information.

Alright, we're finally through the forensic artifacts available in Firefox 4! There's just one entry left: anti-forensics in Firefox 4.

Friday, April 22, 2011

Firefox 4 Browser Forensics, Part 3

In Part 1, we covered typed URLs and the bookmark structure of Firefox 4. In Part 2, we started delving into the moz_places table, ending with an introduction to frecency.


As I noted, frecency is a number generated by Firefox that combines different measures of user behavior with a URL, including bookmarking it and the frequency and recency of visits. The result is a single number that tries to quantify how interested the user is in the URL. This means that with a simple SQL query (select * from moz_places order by frecency desc limit 20, for example), we can get a snapshot of the user's current sites of interest. This could be particularly useful for an "acceptable use policy" investigation, where the issue may turn on how much of a user's browsing history is personal vs. work-related. If the FarmVille app is among a user's top 20 sites sorted by frecency, a violation should be pretty easy to prove. Since frecency is based on other known factors, you can go through the user's full behavior to see what exactly they did to get there (type of visit, etc). That information is stored in moz_inputhistory (which we already covered) and moz_historyvisits.


moz_historyvisits
As the names imply, moz_places records the URLs that a user visits, and moz_historyvisits records the details of each visit to those URLs. Since it records each visit to each page in the domain, there will be a lot of records ... which makes a shorthand like frecency very useful. The columns are id (the index), from_visit (the referrer), place_id (the connection back to the URL entry in moz_places), visit_date (a timestamp in PRTime again), visit_type, and session.


Let's step through what this means by actually analyzing my browsing behavior. First, I ran select * from moz_places order by frecency desc limit 20 to grab the top 20 websites from moz_places to look for something interesting. I'm arbitrarily picking http://www.wired.com as my URL of interest. moz_places.id for this url is 485, so I run select * from moz_historyvisits where place_id=485 which yields:



Seven visits. Now, for these results we can ignore the first column (id). Next is from_visit, which gives the id of the page I visited before visiting the current page. For the last three visits it's 0, indicating that I opened a new tab and went straight to Wired. For the others, it refers to the immediately prior id number, which is what you would expect if I was browsing in only one tab at the time; if the numbers were not consecutive, that would imply multi-tabbed browsing. In the first entry, I came to Wired after visiting the entry at moz_historyvisits.id=941. Looking up that entry, I see it was a visit to place_id=484, which is http://wired.com. Incidentally, that's true for the first four entries. To explain why, note that the visit type for all four www.wired.com entries is 5, indicating a permanent redirect (HTTP 301), while for the respective http://wired.com entries the visit type is 2, indicating I typed the URL. So, at those times I typed wired.com in the address bar; Firefox logged the visit and took me to http://wired.com, which redirected me to http://www.wired.com and logged another visit. For the last three visits to Wired, I typed www.wired.com directly. Since those were the more recent, and since I logged 7 visits to www.wired.com and only 4 to wired.com, www.wired.com was the one that came out on top in frecency (8800 vs. 4813) instead of the non-www version, despite the fact that a typed URL gets a frecency bonus over a redirect URL. Also note that just because I came to www.wired.com via a redirect doesn't mean I didn't intend to go there; seeing what site took me there is essential to determining what my intentions were.
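Following those from_visit chains by hand gets tedious fast. Here's a sketch (Python's built-in sqlite3) that walks a visit back to where the browsing started:

# Sketch: walk a visit's from_visit chain back to its origin.
import sqlite3

con = sqlite3.connect('places.sqlite')

def visit_chain(visit_id):
    chain = []
    while visit_id:   # from_visit is 0 when there was no referring visit
        row = con.execute(
            'select v.from_visit, v.visit_date, v.visit_type, p.url '
            'from moz_historyvisits v '
            'join moz_places p on p.id = v.place_id '
            'where v.id = ?', (visit_id,)).fetchone()
        if row is None:
            break
        visit_id, visit_date, visit_type, url = row
        chain.append((visit_date, visit_type, url))
    return chain[::-1]   # oldest first

print(visit_chain(945))   # 945 is an illustrative moz_historyvisits.id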


Obviously this is a trivial example, but it should be pretty apparent how it could be applied to a more interesting investigation.


More tables
There is a set of tables called moz_annos, moz_anno_attributes and moz_items_annos, but as far as I can tell they are used by Firefox extensions and are unlikely to be useful for forensics.


That's everything of interest out of places.sqlite, but there are other databases that hold useful information, like signons.sqlite. This is where the logins and passwords are stored. If the user hasn't specified a master password for his Firefox profile, the usernames and passwords will be clearly available through Options > Security > Saved Passwords. If there is a master password set and you don't have it, you won't be able to get to the list of sites and passwords directly. Nothing stops you from accessing signons.sqlite directly, though.


signons.sqlite contains two tables, moz_logins and moz_disabledHosts. moz_disabledHosts is a list of websites which the user doesn't want Firefox to ask to remember logins. moz_logins is where the sites, usernames and passwords are stored. The columns are id, hostname (the login page), httpRealm, formSubmitUrl (the actual page the username and password get sent to), usernameField (the identifier of the username field), passwordField (the identifier of the password field), encryptedUsername, encryptedPassword, guid (global ID, for login syncing), encType, timeCreated, timeLastUsed, timePasswordChanged, timesUsed.


Setting a master password in Firefox does not change any of the data here; it just blocks you from viewing the decrypted passwords in Firefox. You can still get lots of useful browsing information from the file, everything except what the usernames and passwords are. Documentation on signons.sqlite is scanty, but from what I can tell, all the timestamp fields are new in Firefox 4. Strangely, these timestamps aren't encoded in PRTime (microseconds since the Unix epoch) like the rest of the timestamps; instead it's milliseconds since the epoch, so we get all the timestamps with:


sqlite> select hostname, datetime(timeCreated/1000,'unixepoch'),datetime(timeLastUsed/1000,'unixepoch'),datetime(timePasswordChanged/1000,'unixepoch') from moz_logins;


For my test system, the results are:


This system has credentials saved for two Facebook accounts. For the first, the login was saved on March 17 and last used on March 31. For the second, it was only used once, on April 4. Neither has had its password changed. This can be useful if you need to prove that a user actually logged into a site, rather than just visiting it. If a user clears their browsing history but not saved passwords (which is the default behavior), this data will still be there as well. It's also yet another way to prove that a user actually intended to go to a site repeatedly, instead of landing there through an accidental click or redirect.


The usernames and passwords are encrypted, but the master password and decryption keys are stored in key3.db (more info here). Since you have the decryption keys, decrypting the passwords should be possible if needed. There are tools out there to do that, so I'll leave that as an exercise for the reader. :)


In Part 4, I'll go into cookies and cached files.

Monday, April 18, 2011

Firefox 4 Browser Forensics, Part 2

In Part 1, we covered typed URLs and the bookmark structure of Firefox 4. Now, I'm going to start digging into moz_places.

moz_places
moz_places is the main table for the browser history. Its columns are id, url, title, rev_host, visit_count, hidden, typed, favicon_id, and frecency [sic].

The first 20 entries of moz_places in my test system.

As we've mentioned before, id is a simple numerical index that connects the links in moz_places to their references in the other tables. The Mozilla Wiki has a handy graphic for reference.


After id is url, which is the full URL of the visited page. As you can see from my test results above, it's the exact page visited, not just the domain. That's why we see my click train from http://www.phishtank.com/ to http://www.phishtank.com/login.php and then to http://www.phishtank.com/phish_archive.php. Next is title, which is the stored title of the page.

The Mozilla Wiki notes:
Bookmarks are basically pointers to that history entry, with a few properties to determine their placement in the bookmarks folder hierarchy. The design is such that most properties of a bookmark are derived from the moz_history entry, and not the bookmark.
This is problematic as soon as the same URI is bookmarked more than once. Eg: If you change a bookmark's title, then the title changes anywhere you've bookmarked that URI.
This is no longer the case. As we already discovered, the moz_bookmarks table has its own title field where the bookmark title is stored. Changing the bookmark title no longer overwrites the title in moz_places.

rev_host is the fully qualified domain name reversed, ending with a period. So, the host for "http://www.phishtank.com/phish_detail.php" gets converted to "moc.knathsihp.www." Firefox uses this so it can index the field and quickly search for subdomains of a primary domain, and an investigator may want to use it for the same purpose. See the Firefox source code. Thanks to FirefoxForensics for this.
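Here's that trick from Python's built-in sqlite3 module (a sketch; the reversal is just string slicing):

# Sketch: find every visited subdomain of a target domain via rev_host.
import sqlite3

target = 'phishtank.com'
pattern = target[::-1] + '.%'   # 'moc.knathsihp.%' matches the domain and all subdomains

con = sqlite3.connect('places.sqlite')
for url, visits in con.execute(
        'select url, visit_count from moz_places where rev_host like ?',
        (pattern,)):
    print(visits, url)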

visit_count is another field very useful for an investigation. This is an integer that increments for every time the user visits the site. Since the places.sqlite is associated with the user logged into the computer, this only reflects the number of times that particular user account visited the site. Some additional notes from FirefoxForensics:
This counter is incremented when the associated places.sqlite::moz_historyvisits::visit_type is not 0, 4 (embedded), or 7 (download).

Research by the author shows that reloading a page by clicking the 'Reload current page' button, pressing CTRL-R or following a self-referring URL does not increment visit_count.

If the start-up option to 'Show my windows and tabs from last time' is selected, the visit_count of those pages will NOT be incremented when the browser is started and they are loaded.

If the start-up option to 'Show my home page' is selected, the visit_count of the home page WILL be incremented each time the browser is started and they are loaded.

hidden
According to the Mozilla devs, "Hidden uri's are ones that the user didn't specifically navigate to. Those tend to include embedded pages and images. It might include something else, but I'm not 100% sure." At first glance one might want to exclude these from analysis because the user didn't specifically navigate to them, but since they include images, that would be a mistake. If the user navigated to the page containing an image, the image's entry here would still give you valuable info, like how many times the image has been loaded.

typed is different from the input field in moz_inputhistory. If the user started typing and then clicked the quick result, it'll be listed under moz_inputhistory.input. If the user typed the full URL, it'll be counted under moz_places.typed.

To give an example, the first time I went to phishtank.com, I typed "www.phishtank.com" into the address bar and it took me to the site. That generated this entry in the database:

The next time I visited, I just typed 'phishtank' before Firefox recognized the URL and I went to the site. That got stored in moz_inputhistory.

favicon_id is an id number that connects to the image in moz_favicons that serves as the favicon (the little graphic in the title bar) for the page.

The last field in moz_places is frecency. That's not a typo for "frequency"; it's a combination of "frequency" and "recency". As you'd expect from the name, it's a value generated from both how often the user has visited a place and how recently they visited it. The full algorithm is here. 0 is the default score, and the higher the number, the higher the page appears in autocomplete results. As the algorithm link shows, it draws on a number of elements of user behavior to determine the frecency. This means that you can use it as a shorthand for determining whether a user was interested in a URL, and then investigate the underlying behaviors (how often they visited, whether they typed the address vs. being redirected vs. following links, whether they bookmarked it, etc.) to articulate what exactly they did that shows interest in the link.

Frecency turns out to be a pretty complicated topic, so I'll save it (and signons.sqlite) for part 3. Part 4 covers cookies, cache, downloads and more.

Wednesday, April 13, 2011

Firefox 4 Browser Forensics, Part 1

Firefox 4 was released almost a month ago now, and was in open beta for a significant time before that. However, a cursory Google search doesn't reveal much documentation of its features of forensic interest, or of its changes since the last version.

For this analysis, I'm using Firefox 4.0 (the current version as of writing) running on a Windows 7 VM. To examine the sqlite databases, I'm using the sqlite3 command line tool. Sqlite3 runs under Linux and Windows, and you can feed it SQL commands to script common actions, such as searching history for keywords or for browsing during a time range of interest.
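The same thing works from Python's built-in sqlite3 module, which makes it easy to fold into other scripts. Here's a sketch of a keyword search of the places.sqlite history database (described below) restricted to a time window; the search term and dates are illustrative, and last_visit_date is PRTime, microseconds since the Unix epoch:

# Sketch: search browser history for a keyword within a time window.
import sqlite3

con = sqlite3.connect('places.sqlite')
keyword = '%nmap%'                 # illustrative search term
start = 1301616000 * 1000000       # 2011-04-01 00:00 UTC, as PRTime
end = 1304208000 * 1000000         # 2011-05-01 00:00 UTC, as PRTime

rows = con.execute(
    "select url, title, visit_count, "
    "datetime(last_visit_date/1000000, 'unixepoch') "
    "from moz_places "
    "where url like ? and last_visit_date between ? and ?",
    (keyword, start, end))
for row in rows:
    print(row)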

Just like Firefox 3, Firefox 4 stores the browser history in an SQLite database. For Windows Vista/7, it's located at C:\Users\[user]\AppData\Roaming\Mozilla\Firefox\Profiles\[profile name]\places.sqlite. This database contains the tables moz_anno_attributes, moz_annos, moz_bookmarks, moz_bookmarks_roots, moz_favicons, moz_historyvisits, moz_inputhistory, moz_items_annos, moz_keywords, and moz_places. These tables seem unchanged from version 3, as documented on MozillaZine.

moz_inputhistory
When a person begins typing into the address bar, Firefox searches through its browser history to try to connect what the user is typing with the websites he has visited recently. It outputs the suggestions, with the webpage title and full URL, below the address bar. When the user clicks a suggestion, the text they typed to find that URL is stored in the moz_inputhistory table. moz_inputhistory is of particular interest to forensics because it can show that the person using the user account typed something into the address bar to access the site, rather than following a link or getting forced there by a pop-up.

The columns for this table are place_id, input, and use_count (place_id and input together form the primary key). Here's an example of the contents of the table, taken from my test system.



Note that none of these are full links. What you see here is what you get when someone starts typing a URL or keyword into the address bar and then clicks on one of the autocomplete terms. It does not seem to record full URLs. So, we get "ars t" and "arstechn" from two of the ways I accessed Ars Technica. It does not store search terms from the search box.

place_id refers to the destination of the input. This number can be converted into the actual website visited via the moz_places table. So, for example, I see that when I typed "ars t" into my address bar, I clicked on the website with id 197. The command > select * from moz_places where id=197; returns the website URL (http://arstechnica.com/) and the website title (Ars Technica) as well as other information that we'll cover when we get to the moz_places table.

The last column, use_count, is an odd one. David Koepi looked at Firefox 3.6 and suggested that use_count refers to the number of times the same text was typed into the address bar. On my system in Firefox 4, that doesn't seem to be the case, as most of the values in use_count are decimals (see screenshot above). Running some tests, it seems to be related to the number of times used ... for example, typing "sans.o" into the address bar the first time adds it to moz_inputhistory and sets use_count to 1. So far so good, but here's where it gets odd: typing it in four more times changes it to 1.9, then 2.71, 3.439, and 4.0951. As you can see from the other links in my history, many of them end up with values less than 1. Whether this is a bug or a deliberate weighting scheme, I wouldn't rely on this value for any reason until it's better understood.
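For what it's worth, those numbers aren't random: each value is exactly the previous value times 0.9, plus 1. That would be consistent with a decaying-weight counter rather than a simple tally (my inference from the observed values, not confirmed against Firefox's source), and decay without fresh use would also explain the entries below 1. A quick check:

# Check: the observed use_count sequence matches value = value*0.9 + 1.
value = 1.0
for _ in range(4):
    value = value * 0.9 + 1
    print(round(value, 4))    # prints 1.9, 2.71, 3.439, 4.0951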

This is why it's important to do your own testing, instead of just believing what you read on the Internet.

moz_bookmarks and moz_bookmarks_roots
Bookmarks are also valuable for an investigator to examine because, like input history, they imply that the user took an active action with the site, in this case marking it for later re-visit. The fields here are id, type, fk, parent, position, title, keyword_id, folder_type, dateAdded, lastModified, and guid. Comparing to the documentation by David Koepi and firefoxforensics, it appears guid has been added for FF4. According to the Mozilla wiki, guid was added in order to have a unique global identifier, rather than the simple incrementing index of id. This means that guid, not id, would be the proper link between bookmarks across computers using the new Sync service in FF4. Note, I haven't tested this.

type is an integer referring to the type of the bookmark. Type 1 is a normal bookmark, 2 is a tag (folders are really just a type of tag), and type 3 is a separator, which has no useful info. fk is a foreign key, which links to the id in moz_places (where you can find the actual URL). parent refers to the id of the containing object (folder), which can tell you where the user categorized the bookmark. position refers to the listing order of the bookmarks and is not likely to be interesting. title is the display name of the bookmark, and keyword_id is the index number of the keyword (in moz_keywords) associated with the bookmark; it may be useful for finding associated bookmarks. folder_type is not likely to be useful, but dateAdded and lastModified are. These values are exactly what they sound like, stored as PRTime.

moz_bookmarks_roots defines the root folders in the Firefox bookmark structure, and is not of forensic interest.

In Part 2 I'll get into moz_places and more browsing history. Part 3 has moz_historyvisits and signons.sqlite. Part 4 covers cookies, cache, downloads and more.

Friday, April 8, 2011

Reversing malicious Javascript, part 2

This is part 2 in my adventures reverse-engineering a malicious Javascript I found on my computer. Last time I unraveled some annoying string manipulation that eventually ran return eval(String.fromCharCode()) on a long series of Unicode-encoded characters.

I finally decoded that long string of Unicode and, as expected, it was more Javascript. I was surprised at how much there was ... several hundred lines of code, which is unusually large for malicious Javascript. Bigger doesn't mean more sophisticated, however!

There are two Javascript functions that are relevant here. The first is called 'zazo', which tries to exploit a vulnerability in the Java Development Kit to load a malicious Java applet that the attacker controls. The payload is gone by now, so I can't tell what it would have done. This is a known vulnerability, CVE-2010-0886, and is described in detail here.

The second function is the reason this malicious script is so long. It includes a version of PluginDetect, a script by Eric Gerds that is designed to determine which versions of many common browser plugins are in use. In this case, the malware author has it probe for Java and Adobe Reader versions, although the script only uses the Adobe Reader version. Although this plugin detector is being used maliciously, it's not inherently malicious, and there's no reason to think Eric has any connection to the malware author.

After determining what version of Adobe Reader is being used, the script makes a simple decision: if Reader is older than 8.0, it serves up one malicious PDF; if Reader is between 8 and 9.3.1, it serves up an alternate malicious PDF. If you don't have Reader, or if your copy of Reader is up to date, you're safe from this script. Reader version 9.3.1 was released specifically to patch the vulnerability described in CVE-2010-0188, which affected Reader versions 8 to 9.3, so we can be fairly sure that's what this PDF was exploiting.

Interestingly, in part 1 we discovered that this bit of malware knew that it was 2011 and would need to be modified to function in a different year. Despite that, the three bugs it attempts to exploit are old. The Java bug had a patch available a year ago. Of the Reader exploits, one was patched last November and the other only affected versions older than 2006! These are not anywhere close to 0-day attacks. This is exactly why it's so important to keep your software up to date.

Also, Java is very frequently exploited. If you don't absolutely need it, get rid of it. Then hackers will have one less way to attack you and you'll have one less program to keep updated.