
Bringing Back The Dead

A little background: I finally carved out some time to work on puppet again. This time I was taking a break from the server side and focusing on making my desktop puppet controlled. When you combine the number of computers I use with the number of times I need to upgrade Ubuntu, you end up with a lot of stuff to manage. That sounded like a problem for puppet (plus it would mean I could finally integrate all those “Perfect Desktop” articles I’ve read).

I did what I thought was a sensible thing. I got an external drive (I specifically got one that supported eSATA, which turned out to be a very good idea given how disk intensive this process was) and put a completely fresh install of Jaunty on it. Once that was done and fully updated, I copied the partition to a backup so I could restore it easily for testing. (Please note: when I do this for a server I normally use a combination of Xen and LVM snapshots. Since a lot of the desktop setup has to do with my video card and sound, I didn’t want a hardware abstraction layer in the middle.)

I started down my merry way, adding new apt repositories. I even have a pattern working so that if you don’t include some of the repositories in the config, it makes sure they are not on the machine (I haven’t always been diligent about having puppet clean up after itself). About 4 hours in, I realised that I needed some notes I had on my real install. No problem:

mkdir /tmp/real
sudo mount /dev/sdb8 /tmp/real
sudo mount /dev/sdb7 /tmp/real/home

Now I can easily look stuff up and continue working on puppet. I work through a bit of repository stuff (including how to have it add a PGP key for a new repo to the system). Things are finally taking shape. Then I notice that a run is taking a lot longer than usual. I take a moment to look at the output and I see “File older than 7 days – Tidying”. No big deal. My puppet config is set up to keep /tmp tidy. It found some old files in /tmp and it is dutifully cleaning them up for me. How very, very helpful.

If you’ve been paying attention, you already know what happened. For everyone else: those old files being tidied were the contents of my real workstation partition, which I had mounted under /tmp/real. Puppet was basically performing rm -fR on my drive. My REAL DRIVE!

I killed puppet. Stared blankly at the screen. I hate this sort of thing. The terrible thing about losing data is that you never know what you lost until later when you need something and you realise it was on the hard drive that died.

I shut the computer down. I tried to boot to the real drive, thinking that it had just wiped my home partition. Nope, it got /boot as well. So I shut it off. My next step was pretty typical: I exclaimed loudly in a variety of non-repeatable phrases impugning the quality and character of my computer. I then booted XP and put in some solid time playing Plants vs Zombies (which I love!).

The Next Day

Lessons Learned:

  • Keep a backup of your home directory – even if it is huge and a pain to work with (since your home directory at home actually has stuff that isn’t under git/svn)
  • When mounting a file system – put it in the right place – like /mnt or /media. (I got into the bad habit of /tmp from spending a lot of time manipulating the debian installer).
  • When you are doing puppet style system automation – don’t do it in a place where it can delete data you care about. (Normally LVM keeps me safe)
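The second lesson can be turned into a small guard. This is only a sketch: the function name is_safe_mountpoint and the list of scratch directories are my own assumptions for illustration, not something from the setup described here.

```shell
# Hypothetical helper: refuse mount points that live under a scratch
# directory which a cleanup job (such as a tidy rule) may sweep.
is_safe_mountpoint() {
    case "$1" in
        /tmp|/tmp/*|/var/tmp|/var/tmp/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example use (not run here, it needs root and a real device):
#   mp=/mnt/real
#   is_safe_mountpoint "$mp" && sudo mkdir -p "$mp" && sudo mount -o ro /dev/sdb8 "$mp"
```

Mounting with -o ro is a second line of defence: if all you need is to read some notes, a read-only mount gives a cleanup job nothing to delete.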

Now time to figure out what to do.

How can I recover (undelete) deleted files from my ext3 partition?

Q: How can I recover (undelete) deleted files from my ext3 partition?
Actually, you can’t! This is what one of the developers, Andreas Dilger, said about it:

In order to ensure that ext3 can safely resume an unlink after a crash, it actually zeros out the block pointers in the inode, whereas ext2 just marks these blocks as unused in the block bitmaps and marks the inode as “deleted” and leaves the block pointers alone.

Your only hope is to “grep” for parts of your files that have been deleted and hope for the best.

OK, that sounds bad. Very, very bad. Not to worry: I also found “HOWTO recover deleted files on an ext3 file system”. The author even has code you can download (a tool called ext3grep). This guy is seriously smart. Basically, he wrote a tool that reconstructs files from the journal.

The most important thing is to make sure that you don’t remount the drive or allow the journal to be modified until after you do the restore. In my case, shutting down and playing video games kept the system in a preserved state – making recovery possible.

The Nitty and the Gritty

Start by downloading the code, then run configure and make, and cd into src.

Since I was so careless the first time, I tried to be better this time, so I took a dd image of the two partitions.

dd if=/dev/sda8 bs=4096 of=sda8.backup
dd if=/dev/sda7 bs=4096 of=sda7.backup

I had to use dd because it copies the raw blocks, so the journal and other filesystem info are preserved. The bad part was that sda7 was a 400GB partition (with only 200GB of data). It took several hours to get that.
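After waiting hours for an image, it is worth confirming that it really is a bit-for-bit copy. Here is a minimal sketch of that check. A scratch file stands in for the partition so it can be tried without root; the file names are made up for the example.

```shell
# Sketch: image a source with dd, then confirm the copy is bit-identical.
# A scratch file stands in for /dev/sda8 so this runs without root.
workdir=$(mktemp -d)
src="$workdir/fake-partition"
img="$workdir/fake-partition.backup"

# Create 1 MiB of stand-in data (256 blocks of 4096 bytes).
dd if=/dev/urandom of="$src" bs=4096 count=256 2>/dev/null

# Same style of copy as above: block size 4096, whole source.
dd if="$src" of="$img" bs=4096 2>/dev/null

# cmp exits 0 only when both files are byte-for-byte identical.
if cmp -s "$src" "$img"; then
    echo "image matches source"
else
    echo "image differs from source" >&2
fi
```

On a real partition the same cmp call works against the device and the image, as long as nothing has written to the device in between.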

Now time to start the analysis.

./ext3grep /dev/sda8 --dump-names > sda8.txt 2>&1 &
./ext3grep /dev/sda7 --dump-names > sda7.txt 2>&1 &

This has ext3grep parse the journal and produce a list of all the file names it knows about. It took hours on sda7. The good news is that ext3grep caches what it finds to disk, so once you have done it once, the other steps go much, much faster.

Since the problem happened the day before, I didn’t remember exactly when it happened. In the article, Carlo warns that if you restore too much, you’ll end up with files that were legitimately deleted, and they will be hard linked to other files, meaning you’ll have a mess.

So how do you figure out when you did something stupid? Turns out ext3grep has an option for that: a histogram of deletion times. It needs a time window expressed as Unix timestamps. I knew the date, so I just did

date -d "Thu May 28 08:00:00 2009" "+%s"
date "+%s"

That gives me a start time before the incident (which was on Friday) and the current time as the end. That narrows down the search. Using those two numbers you can ask ext3grep for a histogram.

./ext3grep /dev/sda8 --histogram=dtime --after=1243529748 --before=1243702540
./ext3grep /dev/sda7 --histogram=dtime --after=1243529748 --before=1243702540
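The numbers passed to --after and --before are plain Unix timestamps, so they can be converted in both directions to double-check the window. A minimal sketch, assuming GNU date (the -d option is a GNU extension); the helper names to_epoch and from_epoch are made up for the example.

```shell
# Sketch, assuming GNU date. TZ=UTC and LC_ALL=C pin the time zone and
# locale so the results do not depend on the machine running this.
to_epoch()   { TZ=UTC LC_ALL=C date -d "$1" "+%s"; }
from_epoch() { TZ=UTC LC_ALL=C date -d "@$1" "+%a %b %e %H:%M:%S %Y"; }

to_epoch "2009-05-28 16:55:48"    # prints 1243529748
from_epoch 1243529748             # prints Thu May 28 16:55:48 2009
```

ext3grep prints the same timestamp as 11:55:48 in its output because it reports local time, which on my machine was five hours behind UTC.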

Here is the ext3grep histogram output for the smaller drive:

Running ext3grep version 0.10.1
Only show/process deleted entries if they are deleted on or after Thu May 28 11:55:48 2009 and before Sat May 30 11:55:40 2009.

WARNING: I don't know what EXT3_FEATURE_COMPAT_EXT_ATTR is.
Number of groups: 670
Minimum / maximum journal block: 10912258 / 10945573
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from 1242606010 = Sun May 17 19:20:10 2009
Journal transaction 82476 wraps around, some data blocks might have been lost of this transaction.
Number of descriptors in journal: 27906; min / max sequence numbers: 79083 / 88361

Only show/process deleted entries if they are deleted on or after 1243529748 and before 1243702540.
Only showing deleted entries.
Thu May 28 11:55:48 2009  1243529748        0
Thu May 28 12:24:36 2009  1243531476        0
Thu May 28 12:53:24 2009  1243533204        0
Thu May 28 13:22:12 2009  1243534932        0
Thu May 28 13:51:00 2009  1243536660        0
Thu May 28 14:19:48 2009  1243538388        0
Thu May 28 14:48:36 2009  1243540116        0
Thu May 28 15:17:24 2009  1243541844        0
Thu May 28 15:46:12 2009  1243543572        0
Thu May 28 16:15:00 2009  1243545300        0
Thu May 28 16:43:48 2009  1243547028        2
Thu May 28 17:12:36 2009  1243548756        0
Thu May 28 17:41:24 2009  1243550484        0
Thu May 28 18:10:12 2009  1243552212        0
Thu May 28 18:39:00 2009  1243553940        0
Thu May 28 19:07:48 2009  1243555668        0
Thu May 28 19:36:36 2009  1243557396        0
Thu May 28 20:05:24 2009  1243559124        0
Thu May 28 20:34:12 2009  1243560852        0
Thu May 28 21:03:00 2009  1243562580        0
Thu May 28 21:31:48 2009  1243564308        0
Thu May 28 22:00:36 2009  1243566036        0
Thu May 28 22:29:24 2009  1243567764        0
Thu May 28 22:58:12 2009  1243569492        0
Thu May 28 23:27:00 2009  1243571220        0
Thu May 28 23:55:48 2009  1243572948        0
Fri May 29 00:24:36 2009  1243574676        0
Fri May 29 00:53:24 2009  1243576404        0
Fri May 29 01:22:12 2009  1243578132        0
Fri May 29 01:51:00 2009  1243579860        0
Fri May 29 02:19:48 2009  1243581588        0
Fri May 29 02:48:36 2009  1243583316        0
Fri May 29 03:17:24 2009  1243585044        0
Fri May 29 03:46:12 2009  1243586772        0
Fri May 29 04:15:00 2009  1243588500        0
Fri May 29 04:43:48 2009  1243590228        0
Fri May 29 05:12:36 2009  1243591956        0
Fri May 29 05:41:24 2009  1243593684        0
Fri May 29 06:10:12 2009  1243595412        0
Fri May 29 06:39:00 2009  1243597140        0
Fri May 29 07:07:48 2009  1243598868        0
Fri May 29 07:36:36 2009  1243600596        0
Fri May 29 08:05:24 2009  1243602324        0
Fri May 29 08:34:12 2009  1243604052        0
Fri May 29 09:03:00 2009  1243605780        0
Fri May 29 09:31:48 2009  1243607508        0
Fri May 29 10:00:36 2009  1243609236        0
Fri May 29 10:29:24 2009  1243610964     2891 ====================================================================================================
Fri May 29 10:58:12 2009  1243612692        0
Fri May 29 11:27:00 2009  1243614420        0
Fri May 29 11:55:48 2009  1243616148        0
Fri May 29 12:24:36 2009  1243617876        0
Fri May 29 12:53:24 2009  1243619604        0
Fri May 29 13:22:12 2009  1243621332        0
Fri May 29 13:51:00 2009  1243623060        0
Fri May 29 14:19:48 2009  1243624788        0
Fri May 29 14:48:36 2009  1243626516        0
Fri May 29 15:17:24 2009  1243628244        0
Fri May 29 15:46:12 2009  1243629972        0
Fri May 29 16:15:00 2009  1243631700        0
Fri May 29 16:43:48 2009  1243633428        0
Fri May 29 17:12:36 2009  1243635156        0
Fri May 29 17:41:24 2009  1243636884        0
Fri May 29 18:10:12 2009  1243638612        0
Fri May 29 18:39:00 2009  1243640340        0
Fri May 29 19:07:48 2009  1243642068        0
Fri May 29 19:36:36 2009  1243643796        0
Fri May 29 20:05:24 2009  1243645524        0
Fri May 29 20:34:12 2009  1243647252        0
Fri May 29 21:03:00 2009  1243648980        0
Fri May 29 21:31:48 2009  1243650708        0
Fri May 29 22:00:36 2009  1243652436        0
Fri May 29 22:29:24 2009  1243654164        0
Fri May 29 22:58:12 2009  1243655892        0
Fri May 29 23:27:00 2009  1243657620        0
Fri May 29 23:55:48 2009  1243659348        0
Sat May 30 00:24:36 2009  1243661076        0
Sat May 30 00:53:24 2009  1243662804        0
Sat May 30 01:22:12 2009  1243664532        0
Sat May 30 01:51:00 2009  1243666260        0
Sat May 30 02:19:48 2009  1243667988        0
Sat May 30 02:48:36 2009  1243669716        0
Sat May 30 03:17:24 2009  1243671444        0
Sat May 30 03:46:12 2009  1243673172        0
Sat May 30 04:15:00 2009  1243674900        0
Sat May 30 04:43:48 2009  1243676628        0
Sat May 30 05:12:36 2009  1243678356        0
Sat May 30 05:41:24 2009  1243680084        0
Sat May 30 06:10:12 2009  1243681812        0
Sat May 30 06:39:00 2009  1243683540        0
Sat May 30 07:07:48 2009  1243685268        0
Sat May 30 07:36:36 2009  1243686996        0
Sat May 30 08:05:24 2009  1243688724        0
Sat May 30 08:34:12 2009  1243690452        0
Sat May 30 09:03:00 2009  1243692180        0
Sat May 30 09:31:48 2009  1243693908        0
Sat May 30 10:00:36 2009  1243695636        0
Sat May 30 10:29:24 2009  1243697364        0
Sat May 30 10:58:12 2009  1243699092        0
Sat May 30 11:27:00 2009  1243700820        0
Sat May 30 11:55:48 2009  1243702548
Totals:
1243529748 - 1243702539     2893

If you look, you can see the mass deletion in the bucket that starts at 10:29 on Friday (2891 entries). There was a similar spike on the large drive. Now I knew when the event happened, which gave me a safe starting point for the restore.
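Spotting the spike by eye works for one histogram, but it can also be picked out mechanically. The sketch below assumes rows shaped like the output above (weekday, month, day, time, year, epoch, count, optional bar); the function name peak_bucket is made up for the example.

```shell
# Sketch: print the epoch, count and date of the busiest histogram bucket.
# Rows without a numeric epoch in field 6 and count in field 7 are ignored,
# which skips the banner lines and the totals.
peak_bucket() {
    awk 'NF >= 7 && $6 ~ /^[0-9]+$/ && $7 ~ /^[0-9]+$/ {
             if ($7 + 0 > max) {
                 max = $7 + 0
                 epoch = $6
                 when = $1 " " $2 " " $3 " " $4 " " $5
             }
         }
         END { if (max > 0) print epoch, max, when }'
}

# Three rows taken from the histogram above (bar shortened) as sample input.
peak_bucket <<'EOF'
Thu May 28 16:43:48 2009  1243547028        2
Fri May 29 10:29:24 2009  1243610964     2891 ====================
Fri May 29 10:58:12 2009  1243612692        0
EOF
# prints: 1243610964 2891 Fri May 29 10:29:24 2009
```

In practice the histogram command's output would be piped straight into peak_bucket.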

./ext3grep /dev/sda8 --restore-all --after=1243529748
./ext3grep /dev/sda7 --restore-all --after=1243529748

I have no idea how long that took since I was at a concert while it was working. One note: it restores everything into a RESTORED_FILES directory under the current directory. I ended up having to put that directory on another disk so it would have room for the 143GB of data it restored. Also, I ran both restores at the same time into the same directory. That could have been a terrible mistake, since the restored files are relative to the root of each partition, meaning that my home directory ended up in RESTORED_FILES/ instead of RESTORED_FILES/home. Since mine was the only directory being restored it wasn’t a big deal, but with a more complex file hierarchy it could have been a disaster.
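One way to avoid mixing two restores is to run each one from its own working directory, so each gets a separate RESTORED_FILES. This is a sketch of that idea, not what I actually did; run_in_dir is a made-up helper, and the /mnt/scratch and /path/to paths in the comment are placeholders.

```shell
# Hypothetical helper: run a command from inside its own working directory,
# creating the directory first. The subshell keeps the caller's cwd unchanged.
run_in_dir() {
    workdir=$1
    shift
    mkdir -p "$workdir" && ( cd "$workdir" && "$@" )
}

# Stand-in demo: "mkdir RESTORED_FILES" plays the role of the restore tool.
scratch=$(mktemp -d)
run_in_dir "$scratch/sda8" mkdir RESTORED_FILES
run_in_dir "$scratch/sda7" mkdir RESTORED_FILES
ls "$scratch"

# With the real tool it might look like this (absolute path to the binary,
# since a relative ./ext3grep would no longer resolve after the cd):
#   run_in_dir /mnt/scratch/sda8 /path/to/ext3grep /dev/sda8 --restore-all --after=1243529748
```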

Now I have all my files. I mounted the drive and used rsync to copy everything back over. Then I used e2fsck to check the file system. Now I just need to boot to confirm that it’s alive!

Aftermath

First boot no good. I ended up mounting the drive using a live cd. So far it looks like the /lib directory didn’t get files copied over properly – lots and lots of dangling symlinks. The first sign of the problem was not being able to chroot into the drive under the live cd.

It kept saying something about not being able to run /bin/bash (since all libs were gone).
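Dangling symlinks like these can be listed before attempting a boot. A minimal sketch using only standard find and test; the function name dangling_links and the demo file names are made up.

```shell
# Sketch: list symlinks whose targets no longer exist.
# "test -e" follows the link, so it fails for a dangling one.
dangling_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# Demo in a scratch directory: one good link, one dangling link.
demo=$(mktemp -d)
touch "$demo/libreal.so.1"
ln -s libreal.so.1 "$demo/libreal.so"    # target exists
ln -s libgone.so.1 "$demo/libgone.so"    # target missing
dangling_links "$demo"                   # prints only the libgone.so link
```

One caveat when pointing this at a drive mounted from a live CD: absolute symlinks are resolved against the running system's root, not the mounted one, so the result is only reliable for relative links unless you check from inside a chroot.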

After some more copying and such I got things back in order. Apparently the rsync messed up some of the symlinks – I ended up doing a full copy over the top since the RESTORED_FILES had a lot of the missing files. My home directory was owned by root :( Fortunately, that was easily (though not quickly) fixed.

So I’m finally writing this from my real OS and home directory on my workstation. Just in time for Monday to get some actual work done.


A Painless Introduction To Finite State Machines

http://lamsonproject.org/docs/introduction_to_finite_state_machines.html


Baguette take 2

photo.jpg

Max tries out tgetown baby pool

photo.jpg

The beatles visit austin

photo.jpg

River tree grains

Press sandwich with butternut squash soup and humming bird cake

photo.jpg

Back in Texas

photo.jpg

Steak n shake for the indecisive

photo.jpg

I’m in awe

This is in the utility room at my aunt’s house

photo.jpg

Big Dave’s Reuben

photo.jpg

    You are currently browsing the Economy Size Geek weblog archives for May, 2009.