If you tell enough stories, perhaps the moral will show up.

2008-06-17

Grepping the IE cache

I had to do an investigation the other week. I'm not an investigator and so naturally I screwed up. Here's what I learned.

The complaint was that some abusive hotmail-sent mail had arrived quoting the outside address of our firewall. After a bit of to-ing and fro-ing, I was allowed to see the headers, and that told me a good deal:

  • Hotmail does indeed quote an originating IP in the header. Who knew?

  • The earliest relay in a hotmail relay list is a name like bay99fd.bay99.hotmail.msn.com. Any hotmail user knows that the bay appears in the URL on the hotmail home page and throughout the user interface. And for any particular account, that bay number is fixed.

  • Timezones were going to be a problem. We were in local DST, the victim's mail infrastructure was in their DST and four hours behind, his MUA was working in another zone still and a lot of the Hotmail infrastructure is on Pacific time. Still, given headers, I could convert everything to UTC easily enough.
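A minimal sketch of that conversion in Python, assuming each header date carries a numeric UTC offset as RFC 2822 dates normally do. The function name and the sample dates are made up for illustration; they are not taken from the real headers.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def header_date_to_utc(value):
    """Parse an RFC 2822 date string and return an aware datetime in UTC."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # A "-0000" offset means the sender did not state a zone; don't guess.
        raise ValueError("no usable UTC offset in: " + value)
    return parsed.astimezone(timezone.utc)


# Invented sample dates: three hops stamped in three different local zones.
samples = [
    "Tue, 03 Jun 2008 17:02:11 +0100",  # UK summer time
    "Tue, 03 Jun 2008 12:02:15 -0400",  # US Eastern daylight time
    "Tue, 03 Jun 2008 09:02:13 -0700",  # US Pacific daylight time
]
for sample in samples:
    print(header_date_to_utc(sample).isoformat(), "<-", sample)
```

Once everything is on one clock, hops that looked hours apart turn out to be seconds apart, and the times can be lined up against the proxy logs.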

OK. Time to see if we can knock this out in a single step and get back to proper work. The log appliance has been gathering proxy logs all year. We're a pretty relaxed site and I've not been asked to report on usage of a named site before, so I have to code up a report with wildcards for client IP, domain-name and the page name. A bit of experimenting gives me a report of access to that Hotmail bay.

Now this is the first place a real investigator would have done it differently: First step should have been a summary report of all the users of the bay over the last three months. That might have been enough to get HR off my back. As it was, I spent a week dipping in and out of the proxy logs data to look at alternatives as the mails emerged from the complaining firm.
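The summary report described above could be sketched like this in Python. Everything about the log format is an assumption: the appliance's real export is not shown here, so the sketch pretends each line is "timestamp client-IP URL" separated by whitespace. The host name, the addresses and the function name are all invented.

```python
from collections import Counter
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def summarise_clients(lines, host_pattern):
    """Count requests per client IP for hosts matching a shell-style wildcard.

    Assumes (hypothetically) one request per line: timestamp, client IP, URL.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _timestamp, client_ip, url = fields[:3]
        host = urlsplit(url).hostname or ""  # hostname is already lower-cased
        if fnmatchcase(host, host_pattern.lower()):
            counts[client_ip] += 1
    return counts


# Invented sample lines in the assumed format.
sample_log = [
    "2008-06-03T16:01:00Z 10.0.0.5 http://mail.bay99.example/cgi-bin/compose",
    "2008-06-03T16:02:00Z 10.0.0.5 http://mail.bay99.example/cgi-bin/premail",
    "2008-06-03T16:03:00Z 10.0.0.9 http://mail.bay99.example/cgi-bin/inbox",
    "2008-06-03T16:04:00Z 10.0.0.7 http://www.example.org/news",
]
for client, hits in summarise_clients(sample_log, "*.bay99.*").most_common():
    print(client, hits)
```

Run over three months of logs, a table like this shows at a glance how many workstations ever touched the bay, which is the question to settle before anything else.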

That initial set of headers fingered a single user. I could only see two users of the bay, and at the right time only one of them was active. And guess what? Within the two-minute precision of the log upload batch, he used pages on the bay called "compose" and "premail". A bit of experimenting with my own hotmail showed that that is the characteristic signature of sending Hotmail.

This is the second point I did the wrong thing. I've got a budget for investigations and I should have used it. For UKL 1,000 + expenses and VAT, Kroll Ontrack (it used to be Vogon) will send a midnight engineer to take a swearable image of a workstation hard disk, leaving you with a handy USB disk copy for your own investigation and the user none the wiser. I was focussed on our local, more rough and ready process, which was a bit too public for HR. It wasn't a total screw-up though. I'd only looked at proxy logs through a read only interface -- I knew enough not to touch the workstation, and so the purity of its evidential status was preserved, even though the Internet cache timeout was ebbing away.

Part of the delay was at the far end. HR can't and won't do anything on a complaint like this without the offending text, and the complainer was a bit coy. HR's reason is good: it might not be offensive in our context. Still, I thought it was a bit silly -- the headers showed that the hotmail address was obviously a real name, and not the name of our user, and he is, or was, a regulated person.

In the middle of that argument, I got a second set of headers for a much more recent mail. Same accounts, same bay, same user matches.

It all went a bit off course at that point. What I got next were not proper headers with that incriminating source IP and lovely times plainly referred to UTC. It was the nesting of headers in the body of a reply/forward dialogue, and the "on" times there are converted into the time of whoever received the mail. By that time, I was so focussed on matching the time to activity on the proxies that I set to work trying to infer the timezone of each recipient and reconstruct the offender's side of the dialogue. A proper investigator would have realised that this exercise was difficult enough to make the results uncertain, and insisted on headers or nothing. As it was, I made mistakes and spent a lot of time wondering how the original mail could have been sent when our target definitely was busy and wasn't on Hotmail. I went as far as trying to rope in the other user of the bay as an accomplice -- that didn't work either. Looking at the times again, I can see my mistake: It wasn't five PM, but seven, and the mail was sent from home.

I'm not privy to the discussion that went on in the business. It's called reputational risk and I guess we were asking the board to trade a reputational compromise with a non-customer against possibly losing an expensively-hired fund manager and telling his customers that their money had been in the hands of a stupid person with weak morals. Glad I don't have to make that choice, but they did the right thing and I was told off to get the dirt.

The Kroll visit was simplicity itself, mainly because I didn't have to stay up all night -- the HR guy did that!

Lunchtime next day I got an urgent package with a 40GB USB hard disk which mounted first time on my non-build laptop. That was another mistake -- if I'd used a Linux laptop, or a regedit fix, I could have controlled the mount to be read only. It didn't really matter as the forensic copy is on Kroll's servers -- the supplied disk is just a playpen. The idea is that you hunt around any way you like, but any defence witnesses or advisers can still work with a guaranteed untouched copy.

This is important -- a lesson I learnt long ago. Never give in to the temptation to take a quick look at a workstation via the admin shares or however. Unless you are collecting them automatically, don't even look at the event logs. Right at the beginning of any question, figure out -- ask -- if there's any possibility that anyone will be held to account for what you uncover. Consider whether (for example) you could work from restores, or with a reporting tool. Tell your interlocutor that if it's possibly going to get as far as swearing evidence, you are less likely to be overturned if you work throughout with a trained investigator.

If you really have no choice, make sure you get a cryptographically secure hash of each and every file you access. Make it clear in your notes that you obtained the hash before looking at the file. Print the hash out, note the time it was obtained, sign it and date it. Make sure that the file you keep will generate the hash you print. That way you can swear that it was that way when you found it.
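A minimal sketch of that hash-then-look routine in Python. SHA-256 is my choice of algorithm, and the function names and the layout of the printed line are illustrative only, not any kind of forensic standard.

```python
import hashlib
from datetime import datetime, timezone


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks so big files are fine."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_line(path):
    """Build one line for the notes: UTC time taken, hash, then the path."""
    taken_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{taken_at}  sha256={hash_file(path)}  {path}"
```

Calling print(evidence_line(path)) for each file before opening it gives you the lines to print, sign and date; running hash_file again on your kept copy should reproduce the same digest. Bear in mind that merely reading a file can update its last-access time on a writable mount, which is one more reason to work from a read-only copy.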

However. I had a scratch copy of a workstation disk and I could do what I liked. There are tools for this sort of thing and I ain't got aught of 'em. Not necessary at my level. You can download an excellent Windows grep from the FSF and anything else is overkill. Remember to put the GNUWin32\bin directory in the path.

With the disk mounted, you'll find the IE cache is at (name changed to incriminate the innocent)

\Documents and Settings\umacf24\Local Settings\Temporary Internet Files\Content.IE5

Make your way there in the command line and issue a carefully chosen grep:

grep -irl madeupname@hotmail.com *

will search for the address in all the cached files in that directory. That matters because hotmail puts the logged on account on every page, so you can see right away whether the user has actually been active on that account -- the one thing the proxy logs can't give you. Those options mean: -i case insensitive, -l list the matching files (as the content isn't much use, as text) and -r recurses down the directories.
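If grep isn't to hand, the same search can be sketched in a few lines of Python. This is a rough stand-in for grep -irl, not a replacement: the function name is made up, and like grep it only finds the address where it is stored as plain single-byte text.

```python
import os


def files_containing(root, needle):
    """Return paths under root whose raw bytes contain needle, ignoring case."""
    wanted = needle.lower().encode("ascii")
    matches = []
    for folder, _subdirs, names in os.walk(root):  # recurse, like -r
        for name in names:
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()  # cached pages are small enough
            except OSError:
                continue  # unreadable file: skip it and carry on
            if wanted in data.lower():  # ASCII case-insensitive, like -i
                matches.append(path)  # names only, like -l
    return sorted(matches)
```

From the Content.IE5 directory, files_containing(".", "madeupname@hotmail.com") returns the list of cached files that mention the account.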

I was surprised to find that the IE cache went back a lot further than I expected. It looks as though the "retain for n days" setting only takes effect if space is tight -- this man's cache went back months.

Now the beauty of the IE cache compared to Firefox is that there's no complicated database format. The files cached are the files downloaded. Names are modified, and there's a directory structure to avoid having one huge folder, but the pages can be displayed in the browser. I have an account which doesn't have Internet access, so using that account, I just started IE and browsed to the appropriate files. It was one of my happier moments to see a hotmail folder listing -- looking a bit dodgy, admittedly -- listing times and subjects of the complained-of emails. Access to compose pages actually gave me content of mails which the complaining party had been reluctant to reveal. I gather that the colour prints of those pages were particularly unsettling when the confrontation occurred.

I can't write a story like this without a few lessons.

  • Serious investigation would have been overkill. We didn't need deleted files, we didn't need to search for concealed media or executable content. It was just those emails.

  • Think. Of course he was doing it from home.

  • Ask for what you need. I needed headers.

  • Don't be afraid to search a PC. I've bought an imaging machine so we can do our own. I could have got those unarguable Hotmail reconstructions much earlier and saved a lot of time.

  • You want to keep proxy logs for ever. The depth of context is invaluable when you need to do a lot of learning about what your users get up to.

  • Remember that users can't protect themselves. Using gmail over SSL would have made this offence effectively uninvestigatable without bugging his PC. But who knows that?

Good luck to you
