About Me

Bay Area, CA, United States
I'm a computer security professional, most interested in cybercrime and computer forensics. I'm also on Twitter @bond_alexander All opinions are my own unless explicitly stated.

Wednesday, April 6, 2011

Reverse engineering a malicious javascript part 1

My antivirus program flagged a malicious JavaScript file a few days ago. At some point in my web browsing, a webpage quietly served up a malicious script in addition to the regular content. It was saved to my browser's cache and quarantined by my antivirus. Being the curious person that I am, I thought I'd try my hand at understanding how it works. Of course, as is typical of malicious scripts, it was obfuscated. Instead of looking like nice JavaScript:

<script type="text/javascript">
document.write("<h1>This is a heading</h1>");
document.write("<p>This is a paragraph.</p>");
document.write("<p>This is another paragraph.</p>");
</script>

the malicious script is a mess, deliberately difficult to read:

The sequence of numbers keeps going for the rest of the script.

Malware authors use tricks like this to keep people like me from understanding how the script works, and to make it more difficult for antivirus software to detect the page. If the antivirus can't see through the obfuscation, then once it starts detecting this page, all the malware author needs to do is obfuscate it differently so that the page no longer matches the signature. For more information on reverse engineering malware, take a look at this BlackHat presentation (pdf).

The curious thing about obfuscation is that it's designed to be difficult for people to understand yet simple for computers to understand. Luckily for me, that means we can use a JavaScript engine to translate it all back for us. Didier Stevens has modified Mozilla's Spidermonkey for exactly this purpose. All I need to do is extract the JavaScript from the rest of the page so I can feed it to the engine. Since this is pretty simple, though, I'm going to do this by hand.

Since the code has no line breaks or anything else useful, I fed it into Eclipse to clean it up and grab the javascript.

Cleaning it up in Eclipse makes the initial part of the script make a lot more sense. Take a look:

If you know a little Javascript, you can already get an idea of what's going on. We've got a hidden textarea with some text in it. Right now it's meaningless, but this is going to be modified by the Javascript to pull the script together. The applet section makes a reference to a Java applet that would've been housed on the same webserver as this malicious webpage. Since I found this file in my cache, the applet isn't available for me to examine.

Right now it's the content in the script tags that we're going to look at. This is the part of the script that pulls together all the obfuscated components of the script and tells the browser how to execute them to infect itself with whatever piece of badness the author wants to hit me with.

Let's work through this step by step.
var date = new Date();
var f = date.getFullYear()-2009;
First, the script gets the date, pulls the year out, subtracts 2009, and saves the result to the variable f. This means the script only works during 2011, but the lifetime of an attack like this is measured in days at most, so that's not a significant limitation. All this is a complicated way of defining f = 2.
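Here's a minimal sketch of that arithmetic. The helper name keyForYear is mine, not the script's, and I pass the year in as a parameter instead of reading it from new Date() so the result doesn't depend on when you run it:

```javascript
// Hypothetical helper mirroring the script's first two lines.
// The real script reads the year from new Date(); here it is a parameter.
function keyForYear(year) {
  return year - 2009;
}

console.log(keyForYear(2011)); // 2, the multiplier the rest of the script relies on
console.log(keyForYear(2012)); // 3, so the script would decode to garbage a year later
```

Run in any other year, f is no longer 2 and everything that depends on it falls apart.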

Next, we have:
zni = '2011val'.replace(date.getFullYear(),'');
var e = new Function('axlzg','return e'+zni)();
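zni is another variable. Here, we take the string '2011val' and then delete the current year, so zni = 'val'.

Then we define e. The Function constructor builds a function out of strings: 'axlzg' is just the name of a parameter that never gets used, and the function body is 'return e' with the value of zni appended, which works out to return eval. The trailing () calls that new function right away, so e ends up holding eval itself, the JavaScript function that evaluates a string as if it were code.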
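If you want to check this yourself, here's a small sketch that reproduces those two lines. The only change I've assumed is hardcoding the year to 2011 instead of calling date.getFullYear(), so it behaves the same no matter when it's run:

```javascript
// Year hardcoded to 2011; the original script calls date.getFullYear().
const year = 2011;

// replace() converts the number to the string '2011' and removes it.
const zni = '2011val'.replace(year, '');

// Builds function(axlzg) { return eval } and calls it immediately.
const e = new Function('axlzg', 'return e' + zni)();

console.log(zni);        // val
console.log(e === eval); // true
console.log(e('1 + 1')); // 2
```

Nothing here touches the malicious payload. It only shows that e is another name for eval.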

Moving on:
var content = '';
There's another uninformatively-named variable here, xzjc, but it's pretty obvious what it does. xzjc grabs the content of the text area, so xzjc = 'tring.from2011har2011ode'. The script also defines a variable 'content', which is a blank string. We're getting somewhere now!
var fnxes=e('S'+xzjc.split(date.getFullYear()).join('C'));
This one's a little more complicated. It's another text-manipulation exercise that translates things further. Like math, we need to start from inside the parentheses and work outwards.

First, we take xzjc from above and split it into separate strings using the current year as the split point, yielding 'tring.from', 'har' and 'ode'. Then we re-join the fragments using a "separator" of 'C', which gives 'tring.fromCharCode', and put 'S' in front. Now we have 'String.fromCharCode', the name of a JavaScript function that takes character codes and turns them into a string. This result is run through the function "e" (really eval), which converts the string back to code, so fnxes ends up being String.fromCharCode itself.
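Here's the same string surgery as a sketch you can run on its own. I've assumed the year is 2011 and typed in the textarea contents by hand, and I call eval directly where the script uses its alias e:

```javascript
// Assumptions: year fixed at 2011, textarea contents copied in by hand.
const year = 2011;
const xzjc = 'tring.from2011har2011ode';

// split() converts the number to '2011' and cuts the string there.
const pieces = xzjc.split(year); // ['tring.from', 'har', 'ode']

// Glue the pieces back together with 'C', then prepend 'S'.
const name = 'S' + pieces.join('C');

// The script calls e(name); e is eval, so this is equivalent.
const fnxes = eval(name);

console.log(name);                          // String.fromCharCode
console.log(fnxes === String.fromCharCode); // true
```

The string being evaluated here is a known, harmless function name, so running it is safe.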

The author is bothering with all this because String.fromCharCode() is a common function that takes a set of character codes (in this case numbers) and converts them back to letters. For example, "51*f" is 51*2 = 102, which is the Unicode character code for f. Malware authors often use it to obfuscate their code (as we'll soon see), so it's an indicator that antivirus companies will trigger on. In this script, the malware author has to obfuscate their obfuscation method to try to evade the antivirus signature. I found this script because it triggered my antivirus, so even all this obfuscation failed.
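To get a feel for the encoding, here's a sketch of how a list like that could be produced. I don't know what tool the author actually used, so the encode function below is purely my own illustration: it divides each character code by f and writes it as a multiplication.

```javascript
// Hypothetical encoder, not taken from the malicious script.
// Turns each character into "<code / f>*f" so that multiplying by f
// at run time restores the original character code.
function encode(text, f) {
  return Array.from(text, ch => (ch.charCodeAt(0) / f) + '*f').join(',');
}

console.log(encode('func', 2)); // 51*f,58.5*f,55*f,49.5*f
```

That output matches the first four values in the script, which is a decent hint about what the payload starts with.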

Let's look at the last couple lines of this script.
content = fnxes(51*f,58.5*f,55*f,49.5*f,58*f,52.5*f,55.5*f,
        50*f,........ );

There are actually a lot more numbers in there than I'm showing; I'm just cropping them out for simplicity's sake. The script finally gives the variable "content" a real value. It takes each of these numbers and multiplies it by f, which we already learned is 2. Then it runs fnxes (which is really String.fromCharCode) on the results. Now I'm going to turn to Spidermonkey to translate all this crap into real code; it would just be too annoying to do by hand.
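The decoding step can also be done without executing anything from the malicious page. The sketch below is my own helper, not part of the script: it treats the number list as plain text, multiplies each value by f, and hands the results to String.fromCharCode. It only includes the eight values shown above, since the rest are cropped out.

```javascript
// Only the eight values visible in the post; the rest are elided there.
const visible = '51*f,58.5*f,55*f,49.5*f,58*f,52.5*f,55.5*f,50*f';
const f = 2;

// Hypothetical decoder: parses the numbers as text instead of running them.
function decode(list, multiplier) {
  // parseFloat reads the leading number and ignores the trailing "*f".
  const codes = list.split(',').map(term => parseFloat(term) * multiplier);
  return String.fromCharCode(...codes);
}

console.log(decode(visible, f));
```

Because the list is never passed to eval, nothing from the payload actually runs, which is exactly what you want when poking at malware.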

So, after we multiply the numbers by 2 and then turn them back into a string, we get the payload. Unfortunately the payload itself is pretty long and complicated, so that'll have to wait for part 2 so I can have time to figure out what's going on.
