Clone a laptop drive
August 11, 2007 10:05 AM

What is the easiest way to make a bit-perfect copy of a potentially failing laptop hard drive? The copy target would be a new 2.5" drive that I'd then install in the same laptop.

I'd like to clone my laptop hard drive onto a replacement drive. The copy target would be a new laptop drive. Whatever mechanism I use to copy data would ideally a) not be derailed by bit errors (the current drive IS failing, after all) and b) be able to enumerate which files are corrupt, if any.

I'm running Windows XP, but would be willing to boot into a Linux CD distribution, etc... I'm a pretty sophisticated user, just not an I.T. guy with copies of Ghost lying around (and don't recommend Norton Ghost, please; it's over-complicated and I hate it).
posted by killdevil to Computers & Internet (37 answers total) 7 users marked this as a favorite
 
for unix i used rsync (a standard tool). however, i don't know whether what you are describing will work for windows. see here - the suggestion to use dd instead makes sense.
posted by andrew cooke at 10:14 AM on August 11, 2007


Response by poster: Yes, I was sort of thinking about trying dd. I could boot into Ubuntu and give it a go... does anybody know definitively whether or not dd will properly image a WinNT boot drive? From what I understand it will create a bit-perfect copy, but then there's the dark magic of bootloader and so on to consider.

Are there any unix mavens out there in the audience with the appropriate dd command syntax for me?
posted by killdevil at 10:40 AM on August 11, 2007


I used Acronis True Image when my old laptop's hard drive was failing. I copied everything onto an external and it completely replicated on the hard drive of the new laptop. I don't think it will tell you what files, if any, are corrupt though.

Upon further research, they also have another product that looks like it might provide more support for migration of possibly corrupted file systems, MigrateEasy.
posted by ml98tu at 10:44 AM on August 11, 2007 [1 favorite]


Boot a Unixy LiveCD, install dd_rescue (on Ubuntu, say, apt-get install ddrescue; that package contains dd_rescue) and from a console do:

sudo -s
dd_rescue /dev/olddrive /dev/newdrive


The disks will most likely be /dev/sd* or /dev/hd*: use dmesg |grep '[sh]d[a-z]' to find them. If you're not sure which is which (hint: letter ordering should match BIOS detection/channel order), do file -s /dev/whatever on each; if it's a fresh disk you should just get "data"; your existing disk should show up as "x86 boot sector".

If you're still unsure, install smartmontools and run smartctl -a /dev/whatever for more details like the serial numbers, age and so forth. You can also run smartctl -t short /dev/whatever to start a SMART self-test on either disk (replace short with long if you want to be more thorough), then use smartctl -a to see the results.

Pay special attention to the dmesg output; specifically, the number of sectors ("hda: 117210240 sectors (60011 MB), CHS=16383/255/63, UDMA(100)"). If the new disk has fewer sectors than the old one, you'll probably need something more like Ghost, which can resize partitions and filesystems.
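The sector comparison is easy to get wrong by eye. Here is a minimal Python sketch that pulls the counts out of dmesg text and checks that a raw clone will fit. Assumptions: the regular expression only recognises the IDE-style "hda: N sectors" line quoted above (SATA/SCSI "sd" disks word it differently), sectors are 512 bytes, and the hdc line in the sample is invented for illustration.

```python
import re

# Matches IDE-style dmesg lines such as
#   hda: 117210240 sectors (60011 MB), CHS=16383/255/63, UDMA(100)
# Other drivers word this line differently, so this pattern is an
# assumption, not a general dmesg parser.
SECTORS_RE = re.compile(r"^(?P<dev>[sh]d[a-z]): (?P<sectors>\d+) sectors")
SECTOR_SIZE = 512  # bytes per sector, assumed


def sector_counts(dmesg_text):
    """Return {device: sector_count} for every matching dmesg line."""
    counts = {}
    for line in dmesg_text.splitlines():
        match = SECTORS_RE.match(line.strip())
        if match:
            counts[match.group("dev")] = int(match.group("sectors"))
    return counts


def clone_fits(counts, source, target):
    """A raw whole-disk clone fits only if the target has at least as
    many sectors as the source."""
    return counts[target] >= counts[source]


# Hypothetical dmesg excerpt: hda is the old disk, hdc the new one.
sample = """\
hda: 117210240 sectors (60011 MB), CHS=16383/255/63, UDMA(100)
hdc: 156301488 sectors (80026 MB), CHS=16383/255/63, UDMA(100)
"""
counts = sector_counts(sample)
print(counts["hda"] * SECTOR_SIZE)         # 60011642880 bytes
print(clone_fits(counts, "hda", "hdc"))    # True
```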

When you've got your new disk booted up (or you've booted another Windows install with it connected), go to My Computer -> C:\ -> Properties -> Tools -> Check Now, with "Automatically fix" and "Scan for.. bad sectors" checked, to verify the filesystem is intact.
posted by Freaky at 10:49 AM on August 11, 2007


(And yes, dd(_rescue) will copy boot loaders too; that's just the data at the start of the drive).
posted by Freaky at 10:50 AM on August 11, 2007


Re bootloading:

Once you make the copy (can't help you with dd syntax, sorry), you will have to initialize the boot sector again. The best way to do this is to boot with a WinXP disc into the recovery console, then run fixboot and then probably bootcfg /rebuild.
posted by Partial Law at 10:51 AM on August 11, 2007


The easy way to do this is to buy a drive upgrade kit, which will have a USB drive caddy and cloning software. The one I've linked from Newegg will handle drives larger than 120 GB (not all of these enclosures will) and the EZ Gig II software CD was plug and play. I've used mine to replace several IDE drives in Winbook, IBM and Dell laptops, without a hitch.

Or you could use an Ubuntu or Knoppix Live CD and do the dd thing. dd will do its damnedest to make a perfect copy, but if your drive is failing so badly that it can't automatically substitute sectors from its spare sector pool, you may have irretrievably lost data. Making a bit-perfect copy of a hosed drive may not be worth the time, because dd can be single-minded enough in trying to recover data that I've seen it eat a failing drive completely when the user didn't set some sane pass limits in its options. The alternative is to make a possibly corrupted copy with a single-pass utility, swap drives right away, and then do a Windows XP repair install on the new drive from your original media, which will rebuild your HAL and reset your System Restore points. That's pretty much a guaranteed way to goodness, whereas trying to rescue from a failing drive can go south pretty badly if the gods don't smile upon you...
posted by paulsc at 10:54 AM on August 11, 2007 [1 favorite]


There's some info about copying drives in this old thread.
posted by DarkForest at 10:54 AM on August 11, 2007


To use dd proper, replace dd_rescue with:

dd if=/dev/olddrive of=/dev/newdrive conv=noerror,sync

However, this copies in 512-byte chunks. You can use bs=128k or so to make the block size bigger, but then if a transfer fails you lose that entire block; dd_rescue will use the larger block size and automagically drop to a smaller one to save as much data as possible without being generally slow.
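To make the block-size trade-off concrete, here is a simplified Python model of conv=noerror,sync against a fake drive with one bad sector. It is a sketch, not dd's actual code: make_fake_drive and dd_noerror_sync are hypothetical names, and the fake drive assumes that any read touching a bad sector fails as a whole.

```python
SECTOR = 512


def make_fake_drive(data, bad_sectors):
    """Hypothetical stand-in for a failing drive: any read that touches
    a sector in bad_sectors fails as a whole."""
    def read_block(offset, length):
        first, last = offset // SECTOR, (offset + length - 1) // SECTOR
        if any(s in bad_sectors for s in range(first, last + 1)):
            raise IOError("unreadable sector")
        return data[offset:offset + length]
    return read_block


def dd_noerror_sync(read_block, disk_size, bs):
    """Simplified model of `dd conv=noerror,sync`: keep going after a read
    error and write zeros in place of the failed block, so everything
    after it still lands at the right offset."""
    out = bytearray()
    lost = 0
    for offset in range(0, disk_size, bs):
        length = min(bs, disk_size - offset)
        try:
            out += read_block(offset, length)
        except IOError:
            out += bytes(length)  # zero padding for the unreadable block
            lost += length
    return bytes(out), lost


data = bytes(range(256)) * (SECTOR * 512 // 256)  # 512 sectors of test data
drive = make_fake_drive(data, bad_sectors={300})
copy_small, lost_small = dd_noerror_sync(drive, len(data), bs=512)
copy_big, lost_big = dd_noerror_sync(drive, len(data), bs=128 * 1024)
print(lost_small, lost_big)  # 512 131072
```

One bad sector costs 512 bytes at bs=512 but a whole 128 KiB at bs=128k, which is exactly the gap dd_rescue's adaptive block size is meant to close.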

If you want to use FreeBSD, grab FreeSBIE, which ships with dd_rescue and smartmontools by default; use the same instructions as above, but the disks will be called /dev/ad[0-9], and you need to get the number of sectors using diskinfo /dev/ad[number].
posted by Freaky at 11:02 AM on August 11, 2007 [1 favorite]


I generally use dd for this sort of thing, along with a USB/SATA/IDE adapter.
posted by jjb at 12:38 PM on August 11, 2007


Also consider PartImage, the open source alternative to Norton Ghost.
posted by gmarceau at 12:59 PM on August 11, 2007


Also have a look at Helix, which is a forensics CD. As well as being bootable, it has a live Windows startup and an option to run dd for you from there.
posted by stuartmm at 1:25 PM on August 11, 2007


Acronis Acronis Acronis! It's cheap. I've used it dozens of times for what you're describing.
posted by tcv at 3:00 PM on August 11, 2007


Nthing Acronis True Image. It's worth it for the trouble it saves.
posted by mendel at 4:32 PM on August 11, 2007


Gnu ddrescue (apt-get install gddrescue on Ubuntu) is an improved dd_rescue. I wrote up the procedure for copying a damaged DVD with it here and added a few corrections here.

The procedure for using it to clone a failing hard disk is the same except for filenames. Use the dodgy hard drive's /dev entry instead of /dev/dvd, the new one's /dev entry instead of ./dvd, put ddr.log on a flash drive for safekeeping, and leave out --block-size=2048; the default block size of 512 is correct for hard drives.

All that stuff should work fine from an Ubuntu live CD, including the apt-get installs if you have a working network connection.

I suggest you remove the failing drive from the laptop and mount it in an external USB2 enclosure before you start. Install your virgin replacement drive in the laptop, make sure no USB devices are plugged in, boot the live CD and use sudo fdisk -l to list your available hard disks. The only one you see should be the one you've just mounted inside the machine. On Ubuntu 7.04, it will probably be /dev/sda, and if it's a virgin drive it won't have a partition table, so there will be no /dev/sdaN subdevices.

Plug in your failing drive's USB enclosure, and another sudo fdisk -l should now show you the /dev entries for both drives.

Do the same again to identify your USB flash drive.

Before you hit Enter on the ddrescue command line, check it twice to make sure the device you're reading from comes first. It will quite happily clone your virgin drive onto your failing drive if you ask it to. If you're the least bit unsure about which drive is which, or you find yourself confused about what's a whole drive and what's a partition, post back here for clarification before going ahead.

The advantage of doing the copy in the external->internal direction is that you can then more easily apply physical coercion to the failing drive, if necessary, in an attempt to get the last few blocks off it. Copy as much as you can just with ddrescue and no heroics, using the raw device method I wrote up for DVD. If ddrescue then says your copy is error-free, so much the better. If not, run another pass of ddrescue with --max-retries=20, and try holding the dodgy drive upside down and/or gently jiggling it as ddrescue tries to recover the bad blocks; if that doesn't work, put it in a baggie in your freezer for half an hour before trying again (smoosh out as much air as possible, leave the USB cable sticking out of the baggie and seal the bag around it with a rubber band; you want as little condensation as possible on the drive's electronics).

This all assumes that your replacement drive has at least as many sectors on it as your original did. Get the next size up, just to be sure. You can always resize the last partition, or make more, to use up the extra space.

Since you'll be making a block-for-block copy of your entire drive, including the partition table (unlike Ghost, which works file-by-file), there will be no need to faff around rebuilding bootloaders and whatnot after a successful copy; the cloned drive will Just Work.

If you don't already have an Ubuntu live CD to play with, I suggest you use the Trinity Rescue Kit instead. It's a much smaller download than Ubuntu, the current version has ddrescue installed already, you don't need to modprobe raw or make any /dev/raw device nodes to get the raw command to work, and you'll be running as root so you won't need sudo.

Finding which files are corrupt, if you end up with errors, is possible but very fiddly. See whether ddrescue will get you a clean copy first, before you even worry about it. I think you'll find it will, provided the original drive isn't too far gone - it's very thorough, and it tries very hard.
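If you do end up with errors, the first fiddly step is turning a bad disk offset into something filesystem-relative. This Python sketch assumes a single NTFS partition starting at sector 63 and a hypothetical bad sector at disk sector 300. It reads the cluster size from the NTFS boot sector (bytes per sector at offset 0x0B, sectors per cluster at 0x0D) and works out which cluster the bad sector falls in. Finding the file that owns that cluster is a separate step; ntfsprogs includes a tool, ntfscluster, aimed at that, but check its manual for the exact options.

```python
import struct

DISK_SECTOR = 512  # sector size used for partition-table offsets (assumed)


def ntfs_geometry(boot_sector):
    """Read bytes-per-sector and sectors-per-cluster from an NTFS boot
    sector (the first sector of the partition)."""
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector = struct.unpack_from("<H", boot_sector, 0x0B)[0]
    sectors_per_cluster = boot_sector[0x0D]
    if not 1 <= sectors_per_cluster <= 128:
        raise ValueError("unexpected cluster size encoding")
    return bytes_per_sector, sectors_per_cluster


def cluster_for_disk_offset(disk_offset, partition_start, geometry):
    """Map a byte offset in the whole-disk image (e.g. a bad area from the
    ddrescue log) to a cluster number inside the NTFS partition."""
    bytes_per_sector, sectors_per_cluster = geometry
    part_offset = disk_offset - partition_start * DISK_SECTOR
    if part_offset < 0:
        raise ValueError("offset lies before the partition")
    return part_offset // (bytes_per_sector * sectors_per_cluster)


# Hypothetical boot sector: 512-byte sectors, 8 sectors (4 KiB) per cluster.
boot = bytearray(512)
boot[3:11] = b"NTFS    "
struct.pack_into("<H", boot, 0x0B, 512)
boot[0x0D] = 8

geometry = ntfs_geometry(bytes(boot))
print(cluster_for_disk_offset(300 * 512, 63, geometry))  # 29
```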
posted by flabdablet at 7:18 AM on August 12, 2007 [1 favorite]


In case it isn't obvious, you can leave out all the mplayer steps I wrote up for the DVD rescue, too.
posted by flabdablet at 7:26 AM on August 12, 2007


Also, you won't get bit errors. The source drive has ECC that will detect those, and fix them if it can; any blocks that can't be corrected, ddrescue will simply leave unwritten on the destination drive.
posted by flabdablet at 7:32 AM on August 12, 2007


Response by poster: Flabdablet, I drop-shipped a 160 gig replacement drive from NewEgg. The failing drive is 80 gigs... is there anything I'll need to do to accommodate the increased size?

I'd like to end up with usable free space, so will I need to resize partitions or some such?
posted by killdevil at 9:02 AM on August 12, 2007


If you copy a partition table built for an 80GB drive (along with everything else) onto your 160GB drive, you'll end up with 80GB of unpartitioned space at the end of the drive. This will cause you no trouble at all, and your new 160GB drive will continue to function as if it were the original 80GB that its image came from.

If the partition table has room for extra partition entries (which it will, if it was automatically partitioned by Windows XP Setup) you can easily create another partition (or several) to make use of that space.

Alternatively, you can make the last existing partition bigger, then extend the filesystem size on that partition to let it use the extra room. Don't use the XP disk management tools to do this. I hear Vista's disk management tools are much more robust, but personally I prefer using reliable Linux command line tools for all this stuff. If your Windows partitions are formatted NTFS, Trinity Rescue Kit has all the needed tools.

Once you've got your disk rescue done, post back here (or email me) if you need partitioning/resizing help.

If you're going to use ddrescue, I'd be interested to hear about your experiences with it.
posted by flabdablet at 4:31 PM on August 12, 2007


"... If not, run another pass of ddrescue with --max-retries=20 ..."

Here, flabdablet and I disagree. In my experience, a drive with actual media problems (oxide coating flaking off platters), that has hard errors enough to have completely exhausted its internal sector spare pool, can be pushed to failure by ddrescue. Successive retries at reading failed sectors by ddrescue actually seem to increase the rate of failure of already marginal drives, in my personal experiments.

If you examine your SMART statistics (assuming your drive supports SMART, as most drives made since the mid-90s do) and find that the spare sector pool is exhausted, I wouldn't bother with ddrescue at all. Long runs of repetitive re-seeks and reads can make sectors adjacent to those already reported bad fail as well.
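As a rough illustration of that check, here is a Python sketch that reads the attribute table printed by smartctl -a. Assumptions: the usual "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE" layout, the common attribute names Reallocated_Sector_Ct and Current_Pending_Sector (vendors vary), and a "within 10 of threshold" margin that is a hypothetical rule of thumb, not something from smartmontools or the drive maker. The sample rows are invented.

```python
def parse_smart_attributes(smartctl_text):
    """Collect rows of the smartctl attribute table, keyed by name."""
    attrs = {}
    for line in smartctl_text.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[fields[1]] = {
                "value": int(fields[3]),   # normalised value, counts down
                "thresh": int(fields[5]),  # failure threshold
                "raw": fields[9],
            }
    return attrs


def spare_pool_warnings(attrs, margin=10):
    """Hypothetical rule of thumb, not a vendor specification."""
    warnings = []
    realloc = attrs.get("Reallocated_Sector_Ct")
    if realloc and realloc["value"] - realloc["thresh"] <= margin:
        warnings.append("Reallocated_Sector_Ct is close to its failure threshold")
    pending = attrs.get("Current_Pending_Sector")
    if pending and pending["raw"].split()[0] != "0":
        warnings.append("sectors are waiting to be reallocated")
    return warnings


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   040   040   036    Pre-fail  Always       -       1890
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       12
"""
for warning in spare_pool_warnings(parse_smart_attributes(sample)):
    print(warning)
```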

Better, by far, to grab what is known good, while it's good, and rebuild Windows via a repair install, than to fail a dicey drive completely, mid-recovery.
posted by paulsc at 8:44 PM on August 12, 2007


I respect paulsc's wisdom in all things, and I'm offering this as a complement to what he says above, rather than a disagreement with it.

In my experience, Windows doesn't handle bad sectors well. What I've often seen happen is that Windows has trouble reading a particular sector, declares it bad, adds it to the NTFS bad sectors list and then refuses to use it ever again.

This was reasonable behavior when drives didn't do their own spare sector replacement. These days, they do. Trouble is, a drive will typically not spare-out a sector it can't read successfully; it will put it on a sectors-pending-reallocation list, then spare it out the next time it's written to.

If bad sectors occur inside files that Windows doesn't ever write to - like installed program files, or parts of Windows itself - then Windows will never spare them out. The drive could well be capable of dealing with bad sectors as it's designed to do and running well for years, but Windows never gives it the opportunity.

It also sometimes happens that a drive will get a good read of a failing sector after several retries (a process that can take several seconds). In this case, the drive will typically mark the sector "pending reallocation", but it will not report a read error to Windows; and since Windows doesn't bother measuring how long its reads take, it will have no clue that anything is wrong. If this happens to a sector inside one of Windows's critical read-only files, the symptom is that Windows runs dog-slow and very clicky and noisy but otherwise OK.

I fully agree with paulsc that you shouldn't attempt heroic rescue methods on a drive whose SMART log shows that its pool of spare sectors is full or nearly full. My point is that most drives that show up in Windows as having "bad sectors" are nowhere near that close to dead; many of them can be restored to full and reliable operation if you can get just one good read of each bad sector, then rewrite that sector to make the drive reallocate it.

This could be done in-place with ddrescue (or with the commercially available SpinRite, which appears to do the same job but with more hype), but that's living closer to the edge than I like to. Far better to make as complete an image of the drive as possible onto a replacement drive before you start messing with what's left.

Gnu ddrescue does exactly this; if you run a first pass with --max-retries=0 (the default, if --max-retries isn't specified), the sequence of operations is
  1. Run a copy pass over the entire disk, using reasonably large reads (128KiB at a time, by default) and noting in the logfile where the drive reports bad reads.
  2. Work through the logfile, splitting each of the error areas in half and copying the good halves, until the error areas can't be split into a good half and a bad half any more.
At this point, ddrescue hasn't hammered on the drive any harder than normal operation would. In fact, since most of the access will have been sequential, it's probably done less hammering than just running the drive in an OS for the same amount of time.
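The two steps above can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described, not GNU ddrescue's actual code (the real program has further phases, and details differ between versions); rescue_no_retries and the fake drive are hypothetical. It also counts reads, to show that the splitting phase only touches the failed area.

```python
SECTOR = 512


def make_fake_drive(data, bad_sectors):
    """Hypothetical failing drive: reads touching a bad sector fail."""
    def read(offset, length):
        first, last = offset // SECTOR, (offset + length - 1) // SECTOR
        if any(s in bad_sectors for s in range(first, last + 1)):
            raise IOError("unreadable sector")
        return data[offset:offset + length]
    return read


def rescue_no_retries(read, disk_size, cluster=128 * 1024):
    """Simplified model of a retries=0 run: one sequential pass with large
    reads, then bisect each failed area, keeping the halves that read
    cleanly, until the failures are single sectors."""
    image = bytearray(disk_size)  # areas never read stay zero-filled
    failed = []                   # (offset, length) areas still to split
    reads = 0

    # Step 1: sequential copy in large chunks, noting where reads fail.
    for off in range(0, disk_size, cluster):
        length = min(cluster, disk_size - off)
        reads += 1
        try:
            image[off:off + length] = read(off, length)
        except IOError:
            failed.append((off, length))

    # Step 2: split failed areas in half until they can't be split.
    bad = []
    while failed:
        off, length = failed.pop()
        if length <= SECTOR:
            bad.append((off, length))  # a single bad sector: leave it
            continue
        half = (length // 2) // SECTOR * SECTOR or SECTOR
        for sub_off, sub_len in ((off, half), (off + half, length - half)):
            reads += 1
            try:
                image[sub_off:sub_off + sub_len] = read(sub_off, sub_len)
            except IOError:
                failed.append((sub_off, sub_len))
    return bytes(image), sorted(bad), reads


data = bytes(range(256)) * (SECTOR * 512 // 256)  # 512 sectors of test data
drive = make_fake_drive(data, bad_sectors={300})
image, bad, reads = rescue_no_retries(drive, len(data))
print(bad)  # [(153600, 512)]
```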

If you then run another pass with --max-retries=20, then all ddrescue will be doing during that pass is attempting to read bad sectors and it will indeed hammer the drive. But by this stage, you already have a copy of as much stuff as you can get off the drive without physical coercion, and if you're going to replace the drive anyway, it's not going to hurt - and often can help - to try the upside-down, jiggling and cold tricks.

The rationale behind these is that bad sectors don't necessarily mean that the medium is physically flaking off the disk. They can be caused by the interaction between aging drive electronics and marginally OK spots on the media surface, by aging positioning mechanisms, and by assorted other things. Turning failing drives upside down, or gently jiggling them while ddrescue is doing its thing, can address mis-positioning issues. Cooling them can do that too, as well as affect the sensitivity of the read-head amplifiers and the flying height of the heads, and so allow them to get good reads of sectors they couldn't read while hot.

If these tricks work, you can get a 100% OK image onto your new drive, and you're on your merry way without needing any further recovery tools. Windows may well report that the drive still has bad sectors, but it tells lies; it's just that you've copied the NTFS bad sector list along with everything else.

If physical coercion doesn't work, and your cloned drive doesn't run without needing a Windows repair install first, you haven't really lost much.

If the SMART logs suggest that a drive has indeed developed internal dandruff, and the drive contains critical data that you have no backup of, then I fully agree with paulsc - you should not beat it about the head and shoulders with a large max-retries. But it should be quite safe to proceed as follows:
  1. Use ddrescue with --max-retries=0 and raw device access to get as many sectors off the failing drive as you can without heroics; use a file on a large target drive, rather than a whole device, to hold the image. This will typically cause less stress on the drive than attempting any kind of file-by-file copy.
  2. If there are not horrendous numbers of bad sectors, use ddrescue or dd to copy that image file to a replacement disk drive and see if it will work for you, with or without a Windows repair install. Use whatever automated file, filesystem or partition recovery tools you like, under whatever OS you want, on the replacement drive. If they mess it up, as they often can, re-copy the replacement drive from the image file and start again.
  3. Send the failed drive off to a professional data recovery service, with a list of files you've been unable to get hold of by messing with your copies of it.
Gnu ddrescue is like any other power tool. It's very good at what it does. But you need to understand what it does, and you need to understand some things about the machines it's doing those things to, in order to use it safely.
posted by flabdablet at 10:09 PM on August 12, 2007 [1 favorite]


If GNU ddrescue is anything like FreeBSD's recoverdisk(1), a high retry count shouldn't cause any more work for the disk than a non-retrying run followed by a run with a >0 retry count:

The recoverdisk utility reads data from the special file until all blocks could be successfully read. It starts reading in multiples of the sector size. Whenever a block fails, it is put to the end of the working queue and will be read again, possibly with a smaller read size.

That means, it'll first try to read everything, making notes of where it's failed. When it's done it will then go over the list of failures and retry each one; i.e. it'll only start thrashing the drive once it's already retrieved most of the data. Indeed, from ddrescue.info:

2) Read the non-damaged parts of the input file, skipping the damaged areas, until the requested size is reached, or until interrupted by the user.

3) Try to read the damaged areas, splitting them into smaller pieces and reading the non-damaged pieces, until the hardware block size is reached, or until interrupted by the user.

4) Try to read the damaged hardware blocks until the specified number of retries is reached, or until interrupted by the user.


Which seems to say much the same thing; it'll read the bulk of readable data and only when it's done that will it start retrying over faulty areas.
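For comparison, the recoverdisk behaviour quoted above (a failed block goes to the end of the work queue and is retried later, possibly with a smaller read size) can be sketched as a Python work queue. The read sizes and the max_tries cap are hypothetical; the cap only exists so the sketch terminates, whereas recoverdisk keeps going until everything has been read or it is interrupted. The fake drive has one permanently bad sector and one flaky sector that reads on its third attempt.

```python
from collections import deque

SECTOR = 512


def make_fake_drive(data, bad, flaky):
    """Hypothetical drive: sectors in `bad` never read; each sector in
    `flaky` fails the given number of times, then reads fine."""
    remaining = dict(flaky)

    def read(offset, length):
        sectors = range(offset // SECTOR, (offset + length - 1) // SECTOR + 1)
        if any(s in bad for s in sectors):
            raise IOError("bad sector")
        for s in sectors:
            if remaining.get(s, 0) > 0:
                remaining[s] -= 1
                raise IOError("flaky sector")
        return data[offset:offset + length]
    return read


def recover_with_queue(read, disk_size, read_sizes=(65536, 4096, SECTOR),
                       max_tries=3):
    """Work-queue model: failed blocks go to the back of the queue and are
    retried later with the next smaller read size."""
    image = bytearray(disk_size)
    queue = deque([(0, disk_size, 0)])  # (offset, length, read-size index)
    tries = {}
    unreadable = []
    while queue:
        off, length, level = queue.popleft()
        size = read_sizes[level]
        for chunk in range(off, off + length, size):
            n = min(size, off + length - chunk)
            try:
                image[chunk:chunk + n] = read(chunk, n)
            except IOError:
                if level + 1 < len(read_sizes):
                    queue.append((chunk, n, level + 1))  # smaller reads later
                else:
                    tries[chunk] = tries.get(chunk, 0) + 1
                    if tries[chunk] < max_tries:
                        queue.append((chunk, n, level))  # same size, later
                    else:
                        unreadable.append(chunk)
    return bytes(image), sorted(unreadable)


data = bytes(range(256)) * (SECTOR * 512 // 256)
drive = make_fake_drive(data, bad={300}, flaky={100: 2})
image, unreadable = recover_with_queue(drive, len(data))
print(unreadable)  # [153600]
```

Because the queue is first-in first-out, the whole disk gets one large-read pass before any failed block is revisited, which is Freaky's point about ordering.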

As an aside, I find it somewhat amusing that the whole of recoverdisk (280 lines, including, I shit you not, "THE BEER-WARE LICENSE") is 10 times smaller than the COPYING file GNU ddrescue ships with (and nearly 20x smaller than ddrescue itself).
posted by Freaky at 8:46 AM on August 13, 2007


Response by poster: Okay, folks -- for all of you who are still keeping track, my 160 gig replacement drive gets here tomorrow. I'm going to try using ddrescue, and to this end have burned a copy of the Trinity Rescue Kit.

Thanks for the incredible responses in this thread. I will let you all know how it goes.
posted by killdevil at 4:42 PM on August 13, 2007


Response by poster: Oh yeah -- how do I "run another pass with ddrescue" as is suggested a couple of times above?

Presumably I need to use the same command line arguments (substituting --max-retries=20 for --max-retries=0), referencing the same logfile, the same source disk and the same destination disk, and ddrescue then proceeds to thrash the disk only where it couldn't copy blocks over the first time?
posted by killdevil at 4:50 PM on August 13, 2007


Correct.

Freaky, I've got very little BSD experience. If I were to find myself using a BSD box and recoverdisk, how would I go about bypassing the kernel cache and getting unbuffered access to the underlying disk device?
posted by flabdablet at 5:24 PM on August 13, 2007


Also, does recoverdisk maintain an external logfile like ddrescue does, allowing for easy use in multiple passes with different retry and/or device settings, and is there a Linux port available?

I like small effective tools.
posted by flabdablet at 5:26 PM on August 13, 2007


Response by poster: Oh shit, I killed it.

I got about 65% of the way through the copy with only a few errors... and then the drive seems to have committed seppuku.

So I have 65 percent of the data, presumably. What syntax can I use to mount the image file I was copying off the drive?
posted by killdevil at 12:02 AM on August 14, 2007


Response by poster: Yeah, so the image file with 65% of the data says it's busy (i.e. it has an asterisk after it). I can't figure out how to make it unbusy, which is apparently a prerequisite to mounting it.
posted by killdevil at 12:03 AM on August 14, 2007


Response by poster: I'm going to bed.
posted by killdevil at 12:03 AM on August 14, 2007


Don't panic. This might be something as simple as dust in a USB connector.

When you say "the drive seems to have committed seppuku": what exactly happened? Did you suddenly hit a huge run of errors, or what?

Can you put a copy of your ddrescue log file somewhere we could look at it?

Did you use ctrl-C to interrupt ddrescue before attempting to mount its output file?

What did smartctl -a say about your failing drive before you started, and what does it say now?
posted by flabdablet at 3:26 AM on August 14, 2007


The syntax for a safe (read-only) mount of your image file, for Trinity Rescue Kit, is

mount -o ro,loop /path/to/image/file /mnt0

But don't give up on the dodgy drive just yet. It may well come good after cooling down overnight.

It would be helpful to know exactly what you did, and in what order.
posted by flabdablet at 3:31 AM on August 14, 2007


flabdablet: You don't need to do anything special to bypass caching in FreeBSD when you're poking at the devices using /dev -- there's no buffer cache like Linux, that's all handled at filesystem (vnode) level.

And yes, recoverdisk supports logging: recoverdisk -r log -w log /dev/in /dev/out

It doesn't have different retry settings; it just keeps going until it's done. If a block read fails, it logs it and moves on to come back to it later; ddrescue does this too, so I'm not convinced your --max-retries=0 followed by --max-retries=20 really helps at all: it will first read the entire disk with no retries, skipping unreadable bits, then only when it's done a single scan of the disk will it return to the failing blocks; pretty much exactly what you get with your 2 runs, unless the docs or my reading of them are inaccurate.

There's no Linux port I'm aware of; it's only just been promoted to the default base FreeBSD install in time for the release of FreeBSD 7 (after lounging in the source tree for a year; make -C /usr/src/tools/tools/recoverdisk install). It probably wouldn't be difficult to port; it's mostly a case of mapping the #include and ioctl() bits to the equivalent Linux ones and copying in the FreeBSD linked list headers.
posted by Freaky at 10:39 AM on August 14, 2007


Freaky, does that mean that there's no real difference between a character device and a block device with BSD, apart from the allowable lengths for reads and writes? Do BSD block devices have special I/O buffer alignment rules like Linux raw devices do?

The behavior you've described for recoverdisk sounds like what ddrescue does if you give it --max-retries=-1 (infinity).

The point of doing two separate invocations of ddrescue is that on the first one, with --max-retries=0, you know it's not going to hammer your source drive - it's going to make one pass, reading what it can, then split the error areas until it can't get any more good reads, then stop. Seems to me that this is unlikely to stress a failing drive unto death (despite what seems to be happening to killdevil) and examining the logfile at the end of the run will tell you just how bad a drive you're dealing with. There's even a --no-split option if you'd rather minimize failed-block reads at the expense of less-precise error location in the log.

The ddrescue logfile format is also very simple and clear and easy to parse/generate, and assorted generous people have written all kinds of useful extras that do so: see this discussion, which also addresses part (b) of killdevil's original question.
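As a taste of how simple that format is to work with, here is a minimal Python sketch that lists the areas a run didn't recover. Assumptions: comment lines start with '#', the first non-comment line is the current position/status line, and every later line is "pos size status" with hex numbers. Only '+' (finished) is treated as rescued, since the other status characters have varied between ddrescue versions. The sample log is invented.

```python
def parse_ddrescue_log(text):
    """Return (pos, size, status) tuples from a GNU ddrescue logfile."""
    entries = []
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not seen_status_line:
            seen_status_line = True  # current_pos / current_status line
            continue
        pos, size, status = line.split()[:3]
        entries.append((int(pos, 0), int(size, 0), status))
    return entries


def not_rescued(entries):
    """Everything not marked '+' (finished) is data we don't have yet."""
    return [(pos, size) for pos, size, status in entries if status != "+"]


sample_log = """\
# Rescue Logfile. Created by GNU ddrescue
# current_pos  current_status
0x00025800     +
#      pos        size  status
0x00000000  0x00025800  +
0x00025800  0x00000200  -
0x00025A00  0x0001A600  +
"""
entries = parse_ddrescue_log(sample_log)
for pos, size in not_rescued(entries):
    print("bytes %d-%d (sector %d) missing" % (pos, pos + size - 1, pos // 512))
# bytes 153600-154111 (sector 300) missing
```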

Completely by coincidence, just yesterday I had occasion to use these tools to fix a Dell Inspiron 1501 laptop. It was refusing to boot Windows, complaining that \WINDOWS\system32\config\system was missing or corrupt.

When I tried to smartctl -a the drive, it reported that SMART wasn't supported (thanks a heap, Fujitsu!) so I used TRK to do a no-retries pass of ddrescue to an image file on a network drive; turned out there was just one bad block.

The chances of one bad block out of 60GB occurring in the registry file holding HKEY_LOCAL_MACHINE\SYSTEM seemed pretty remote to me, but there you have it.

So, I made a copy of the logfile, and ran ddrescue again like so:
ddrescue --complete-only --max-retries=1 /dev/zero /dev/raw/raw1 copy.of.logfile
This caused ddrescue to "recover" /dev/zero onto the drive with the bad block, writing only those bytes marked bad in the logfile. The effect is to write zero blocks over all those found to be bad on the first pass, causing the drive to reallocate them.
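The net effect of that trick is just "write zeros over every range the logfile marks bad". Here is a minimal Python sketch of that effect, demonstrated on an in-memory buffer rather than a drive. zero_fill is a hypothetical helper, and the (pos, size) pairs are assumed to come from the logfile's bad entries. Pointed at a real device it destroys whatever was in those ranges, which is the point.

```python
import io


def zero_fill(device, ranges):
    """Write zeros over each (pos, size) range of a seekable, writable
    file-like object.  On a real drive, writing a sector that is pending
    reallocation gives the drive its chance to spare it out."""
    for pos, size in ranges:
        device.seek(pos)
        device.write(bytes(size))
    device.flush()


disk = io.BytesIO(b"\xff" * 4096)  # in-memory stand-in for the drive
zero_fill(disk, [(1024, 512)])     # one hypothetical bad sector
image = disk.getvalue()
print(image[1024:1536] == bytes(512), image[:1024] == b"\xff" * 1024)  # True True
```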

I was then able to mount the NTFS partition on the failing drive again, loop-mount the image I made six months ago of the Windows partition on that laptop, and replace the failed file.

Windows then booted fine, and all I had to do was redo a drive letter assignment change I'd made since the backup was taken (no other hardware changes had happened on the machine since the backup, so I'm pretty confident that it should now be in good shape).

It really is nice, when doing this kind of work, to have a large amount of NAS available to play with. In this particular instance, the NAS was a Windows XP Home box with a 90GB partition allocated for first-level backups. That partition is formatted NTFS and has compression turned on; turns out that my 60GB ddrescue image took up only 14GB on the NAS disk. NTFS compression definitely has a place in the scheme of things.

The freedom to read and write files inside an NTFS filesystem on a Linux loop device connected to a partition image inside a disk image file on a compressed NTFS filesystem on a remote machine running Windows just boggles my mind. The fact that this is all fairly straightforward to achieve, with a suite of free tools that runs entirely from RAM on the target machine, boggles it even more. Open-source tools ROCK.
posted by flabdablet at 8:45 PM on August 14, 2007


flabdablet: No, there simply aren't any block devices in FreeBSD, everything's a character device. IO does need to be aligned, because you're pretty much directly talking to the underlying devices.

"The point of doing two separate invocations of ddrescue is that on the first one, with --max-retries=0, you know it's not going to hammer your source drive - it's going to make one pass, reading what it can, then split the error areas until it can't get any more good reads ..."

But you suggested one run with --max-retries=0 and another with --max-retries=20; it seems to me that just doing --max-retries=20 in the first place would do precisely the same as these two runs. See stages 2, 3 and 4 of the algorithm in info ddrescue:

2: serial large-IO reads, skip all fails;
3: go back over failed areas to find the actual failing sectors;
4: read the failed sectors until max retries is reached.

That is, with --max-retries=0 you run stage 2 and 3, and then with --max-retries=20 you run stage 4; if you wanted to run stage 4 anyway, why not just run --max-retries=20 in the first place? :)

recoverdisk doesn't have a clear distinction between stage 3 and 4, since it's just a generalization of "find a bad block, put it on the end of the work queue and try to do it in smaller chunks". A --max-retries for it wouldn't be difficult, but may be trickier if you wanted it to mean precisely the same thing (since the retry count it maintains internally inherits from previous, potentially larger reads, a trivially implemented --retry-count=0 would therefore imply --no-split).

You know, it's tempting to modify geom_zero to simulate a failing drive to play with some of this stuff; we already have dummynet to simulate unreliable network interfaces. Anyone want a project for their CS degree? ;)
posted by Freaky at 9:51 AM on August 15, 2007


Perhaps I was not explicit enough about taking stock and deciding what to do next between the retries=0 run and the retries=20 run.

That GEOM stuff looks interesting. Looks to me like GEOM_cache could easily give a BSD device semantics similar to those of a traditional block device, complete with write-order indeterminacy.

I really must spend some time getting stuck into the BSD family - I'm sure this would make me feel more comfortable trying to sort Mac stuff out, for a start.
posted by flabdablet at 3:34 AM on August 16, 2007


Response by poster: Postscript: no dice with further extraction of data from the destroyed drive. Trying to spin it up in order to mount it again resulted in lots of discordant screeching, clacking and clicking, but no actual access to any data (mount -t ntfs basically hung forever).

Nor was I able to do anything useful, at first, with the 65% of the drive I dumped out as an image file (using ddrescue) to an external hard drive. None of the command line tools included with the Trinity Rescue Kit were able to identify the partial drive image as such, to say nothing of actually mounting either of the two NTFS partitions that were (possibly) entombed within.

I ran "strings partialimage.img | grep '<a string that was inside a crucial file I needed>' " on the file and got lots of valid output, proving that in fact my data was intact inside the image.
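A Python equivalent of that strings | grep check that also reports where each hit is might look like this sketch; knowing the byte offset (and so the sector) tells you whether a hit lies inside the part of the disk that was actually rescued. find_in_image is a hypothetical helper; it reads the image in chunks so a multi-gigabyte file never has to fit in memory, and keeps a small overlap so matches straddling a chunk boundary aren't missed.

```python
import os
import tempfile

SECTOR = 512


def find_in_image(path, needle, chunk_size=1 << 20):
    """Yield every byte offset at which `needle` occurs in the file."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    offset = 0  # file offset of the start of buf
    buf = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            start = 0
            while True:
                i = buf.find(needle, start)
                if i == -1:
                    break
                yield offset + i
                start = i + 1
            # Keep a tail too short to hold a full match, in case the
            # needle straddles this chunk and the next one.
            keep = buf[-overlap:] if overlap else b""
            offset += len(buf) - len(keep)
            buf = keep


# Demo on a small temporary file; a tiny chunk_size forces one match to
# straddle a chunk boundary.
data = b"x" * 1000 + b"needle" + b"y" * 20 + b"needle"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    hits = list(find_in_image(tmp.name, b"needle", chunk_size=1003))
finally:
    os.unlink(tmp.name)
print([(off, off // SECTOR) for off in hits])  # [(1000, 1), (1026, 2)]
```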

Not knowing how to proceed on the Unixy side of things, I booted into Windows and started playing with data recovery and data forensics programs that purported to identify and recover files from completely hosed NTFS partitions. Relatively quickly, I found two programs that together allowed me to recover the entire filesystem (nearly all of my work data got copied over in the 65% of the disk that dumped out before the HD failed completely).

So I am happy, and disaster has been averted. Thanks again for all your help, folks.
posted by killdevil at 8:56 PM on August 22, 2007


Glad you finally had a success; sorry the process was nerve-wracking. Discordant screeching from a disk drive is generally a Bad Sign - sounds like your drive was closer to death than any I've encountered myself. I'm sure paulsc has a few similar war stories, though :-)

Also, I'm sorry I steered you wrong with my loop-mount command. I'll fix that now for posterity.

The command I gave would have been the correct way to mount an image of a partition, not an image of a whole device.

Mounting a filesystem contained in a partition image that's buried inside a whole-drive image requires knowing how far into the drive image the partition image starts, then using losetup with the appropriate value for the -o (offset) option to make the partition image accessible as a loop device; then you can use mount to mount the filesystem contained in that device.
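If sfdisk isn't to hand, the offset can also be read straight out of the MBR at the start of the drive image. This Python sketch assumes a classic MBR (four 16-byte primary entries at byte 446, boot signature 0x55 0xAA at byte 510, start and length as little-endian 32-bit sector numbers) and 512-byte sectors; it doesn't follow logical partitions inside an extended partition. The one-partition image is built on the spot for illustration.

```python
import os
import struct
import tempfile

SECTOR = 512
# boot flag, CHS start, type, CHS end, LBA start, sector count
ENTRY = struct.Struct("<B3sB3sII")


def mbr_partitions(image_path):
    """List (number, type, start_sector, byte_offset) for each primary
    partition in the MBR of a whole-disk image."""
    with open(image_path, "rb") as f:
        mbr = f.read(SECTOR)
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    parts = []
    for i in range(4):
        _boot, _chs1, ptype, _chs2, start, _count = ENTRY.unpack_from(mbr, 446 + 16 * i)
        if ptype != 0:
            parts.append((i + 1, ptype, start, start * SECTOR))
    return parts


# Hypothetical one-partition image: NTFS (type 0x07) starting at sector 63.
mbr = bytearray(SECTOR)
ENTRY.pack_into(mbr, 446, 0x80, b"\0\0\0", 0x07, b"\0\0\0", 63, 1000)
mbr[510:512] = b"\x55\xaa"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(mbr)
try:
    for number, ptype, start, offset in mbr_partitions(tmp.name):
        print("partition %d type 0x%02x: losetup -o %d" % (number, ptype, offset))
finally:
    os.unlink(tmp.name)
# partition 1 type 0x07: losetup -o 32256
```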

In the typical Windows case there will only be one partition on a given disk device, and it will start at disk block 63. losetup wants its offset as a byte offset, not a block offset, so you'd let the shell multiply that block number by 512:

losetup -o $((63*512)) /dev/loop1 /path/to/image/file
mount -o ro /dev/loop1 /mnt0


For a drive image with more than one partition, the first step is to find out where the start of the partition you want is:

sfdisk --dump /path/to/image/file

would show you the starting blocknumber for each entry in the image file's partition table, and you'd just use that instead of 63 in the losetup command line.
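Turning that dump into losetup offsets is a one-regex job. This Python sketch assumes the dump says "unit: sectors", that sectors are 512 bytes, and that partition lines look like "name : start= N, size= M, ..." (empty entries, which older sfdisk versions print, are skipped). The sample dump is invented.

```python
import re

SECTOR = 512  # assumes "unit: sectors" with 512-byte sectors
PART_RE = re.compile(
    r"^(?P<dev>\S+)\s*:\s*start=\s*(?P<start>\d+),\s*size=\s*(?P<size>\d+)")


def losetup_offsets(dump_text):
    """Map each non-empty partition in `sfdisk --dump` output to the byte
    offset that losetup -o needs."""
    offsets = {}
    for line in dump_text.splitlines():
        match = PART_RE.match(line.strip())
        if match and int(match.group("size")) > 0:
            offsets[match.group("dev")] = int(match.group("start")) * SECTOR
    return offsets


sample_dump = """\
# partition table of disk.img
unit: sectors

disk.img1 : start=       63, size=156296322, Id= 7, bootable
disk.img2 : start=        0, size=        0, Id= 0
disk.img3 : start=        0, size=        0, Id= 0
disk.img4 : start=        0, size=        0, Id= 0
"""
for dev, offset in losetup_offsets(sample_dump).items():
    print("losetup -o %d /dev/loop1 /path/to/image/file   # %s" % (offset, dev))
```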

Note that because mount has been told explicitly which loop device to use, there's no need for mount -o ro,loop in this case; mount -o ro (read-only) is sufficient. If you were actually going to use the image as a fully live filesystem as opposed to a source of recovered data, you could even dispense with -o ro and mount it writeable. But for data recovery from possibly-corrupted filesystems, you definitely want to go read-only. The loop device maps to the portion of your image file between the specified offset and the end of the file, not the end of the embedded partition. It would be sad for filesystem corruption on partition 1 to cause inadvertent scribbling into partition 2.
posted by flabdablet at 10:40 PM on August 22, 2007

