How best to respond to a bad sector?
May 20, 2008 7:35 PM

Inconsistent information about bad sectors on a hard drive -- what's going on, and what should I do?

I am using an Ubuntu LiveCD to install Ubuntu on my friend's ThinkPad. When I try to run gparted in order to partition the drive, I get this error:
---------------------------
ntfsresize v2.0.0 (libntfs 10:0:0)
Device name : /dev/sda1
NTFS volume version: 3.1
Cluster size : 4096 bytes
Current volume size: 94034993664 bytes (94035 MB)
Current device size: 94035239424 bytes (94036 MB)
Checking for bad sectors ...
Bad cluster: 0x1300db4 - 0x1300db4 (1)
ERROR: This software has detected that the disk has at least 1 bad sector.
---------------------------------
I have run "chkdsk /f /r" and no errors were found. I also tried SeaTools (the Seagate hard disk diagnostic tool), which found 1 error and repaired it. Further scans with chkdsk and SeaTools no longer detect errors.

But when I boot the LiveCD again and run gparted, I still get the same error message about a bad sector from ntfsresize.

My questions:
- Which program is right, ntfsresize or chkdsk and SeaTools?
- Is one bad sector really something to be worried about? Should I tell my friend to consider replacing the hard drive and give up on installing Ubuntu for now?
- Will reformatting the drive fix the bad sector?
- If possible/advisable, how should I go about partitioning the drive without using gparted? I know I have to use ntfsresize with the --bad-sectors option, but after that I am not sure what to do.

I've dual-booted Ubuntu and XP on my own computer for the past two and a half years, but I never ran into a problem like this. (And yes, I checked the Ubuntu forums, where I found a lot of relevant threads with mostly conflicting and inconclusive responses.)
posted by puffin to Computers & Internet (14 answers total)
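The ntfsresize log in the question names the bad spot by NTFS cluster number, in hex. With the 4096-byte cluster size the log reports, that converts to a byte offset inside the partition. The minimal sketch below also assumes 512-byte logical sectors, which the log does not state, and takes the partition's starting sector as a parameter because the log does not show it; the value 63 in the example is hypothetical.

```python
# Convert an NTFS cluster number (as printed by ntfsresize) to byte and sector
# positions. Assumes 512-byte logical sectors; the partition start LBA is an
# input because the ntfsresize log does not report it.

SECTOR_SIZE = 512  # assumption: typical logical sector size for drives of this era


def cluster_to_location(cluster: int, cluster_size: int, partition_start_lba: int) -> dict:
    """Return the byte offset within the volume and the sector range on the disk."""
    byte_offset = cluster * cluster_size               # offset from the start of the NTFS volume
    sectors_per_cluster = cluster_size // SECTOR_SIZE  # 4096 // 512 = 8
    first_sector_in_volume = cluster * sectors_per_cluster
    first_lba = partition_start_lba + first_sector_in_volume  # absolute sector on the disk
    return {
        "byte_offset": byte_offset,
        "first_lba": first_lba,
        "last_lba": first_lba + sectors_per_cluster - 1,
    }


if __name__ == "__main__":
    # Cluster 0x1300db4 and cluster size 4096 come from the log.
    # partition_start_lba=63 is a hypothetical value (common on XP-era disks).
    print(cluster_to_location(0x1300DB4, 4096, partition_start_lba=63))
```

Cluster 0x1300db4 is cluster 19,926,452, which puts the bad spot roughly 81.6 GB into the 94 GB volume.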
 
Linux is generally a lot less tolerant of flaky hardware. Your drive is on its way out; toss it.

If you still have data on the Windows partition, now is the time to start copying it out.
posted by b1tr0t at 7:42 PM on May 20, 2008


- chkdsk only checks that the filesystem is clean. It can discover bad sectors, but as far as I know that really isn't its forte.
- SeaTools is best for this. It's also pretty definitive.
- Not necessarily a bad thing. It could be if it's a trend.
- If it's an isolated case, then yes, a format will remap the bad sector. Or rather, zeroing out the drive with SeaTools will, or something like dd if=/dev/urandom of=/dev/harddrive
- I don't know what ntfsresize does.
posted by gjc at 7:44 PM on May 20, 2008
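A minimal Python sketch of the same idea as gjc's dd suggestion, writing zeros rather than random data. Rewriting every sector matters because a drive generally remaps a sector it already knows is failing when that sector is next written. The function name and chunk size are illustrative choices, and this destroys everything at the target path, so point it only at a disk (for example a device node such as /dev/sdX, a placeholder here) whose data you no longer need.

```python
import os


def overwrite_with_zeros(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Overwrite every byte of `path` with zeros and return the number of bytes written.

    WARNING: destroys all data at `path`. Pointed at a whole-disk device it wipes
    the disk, much as dd if=/dev/zero of=/dev/sdX would.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        total = os.lseek(fd, 0, os.SEEK_END)  # size of the file or block device
        os.lseek(fd, 0, os.SEEK_SET)
        zeros = bytes(chunk_size)
        written = 0
        while written < total:
            n = min(chunk_size, total - written)
            written += os.write(fd, zeros[:n])  # os.write may write fewer bytes; loop covers it
        os.fsync(fd)  # push the writes through to the device
        return written
    finally:
        os.close(fd)
```

Writing zeros is simply the cheapest pattern; from the drive's point of view any write to a pending sector gives it the chance to reallocate that sector.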


Bin the drive before you lose important data from it, because that's what will happen if you keep on using it.
posted by polyglot at 8:03 PM on May 20, 2008


As others have said, back up the drive and toss it.

The canonical tool for this sort of thing on Windows is Steve Gibson's Spinrite. It costs, but it's worth the price.
posted by stavrosthewonderchicken at 9:00 PM on May 20, 2008


Your drive may have weeks, days or minutes to live. Get your important stuff off today!
posted by OlderThanTOS at 10:53 PM on May 20, 2008


Before I got too carried away, I'd check the drive's S.M.A.R.T. registers for statistics, using a tool like Stellar Smart. If the drive has plenty of spare sectors left in its spare sector pool, I wouldn't worry too much. All hard drives, even brand new from the factory, have bad sectors. These are generally just imperfect areas of the platter, or more particularly of the oxide coating on the platter(s); they are automatically mapped out of further reading and writing and replaced from the spare sector pool that the drive's low-level format initially allocated.

It's only when a drive has started accumulating lots of bad sectors fairly suddenly, or when you've finally exhausted the spare sector pool, that it is time to worry and replace the drive. These symptoms are usually caused by a head crash that has physically scratched a platter or its oxide coating in one area, or by stress eventually causing the oxide coating to flake in one area.

Repeated attempts to format or recover data from the damaged area can actually make the drive fail faster, because the head is moved into that area of the platter, often intensively, trying to read data or repair the sector map. If you can just mark out the suspect sector and never go there again, the drive may work for a long while without further degradation. This happy result is, of course, more likely if the bad area isn't in the middle of a platter, where the head has to traverse it repeatedly on its way to tracks further in or out, even after it has been marked out. Head servos are good at jumping small defect areas, but not perfect, and if the flaking spans several radially contiguous sectors, the drive will keep failing in operation no matter what clever remapping schemes are used, simply because the head servo can't jump the heads out of the way fast enough to get around the damage.
posted by paulsc at 11:47 PM on May 20, 2008
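paulsc's advice (read the S.M.A.R.T. data, then worry only about sudden accumulation or an exhausted spare pool) can be sketched as below. This swaps in smartctl from the smartmontools package instead of Stellar Smart, assuming it is installed on the LiveCD, and parses the attribute table that smartctl -A prints. The five-new-remaps-per-check cut-off is an arbitrary illustrative number, not a vendor figure, and the sample table is made up rather than taken from this drive.

```python
from typing import Dict, List, Optional


def _to_int(token: str) -> Optional[int]:
    """Return the token as an int, or None for placeholders such as '---'."""
    return int(token) if token.isdigit() else None


def parse_smart_attributes(text: str) -> Dict[str, Dict[str, Optional[int]]]:
    """Parse the attribute table printed by smartctl -A /dev/sdX."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric attribute ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[fields[1]] = {
            "value": _to_int(fields[3]),   # normalized value (higher is healthier)
            "thresh": _to_int(fields[5]),  # vendor failure threshold
            "raw": _to_int(fields[9]),     # first token of RAW_VALUE
        }
    return attrs


def assess_reallocations(history: List[int], value: int, thresh: int,
                         max_new_per_check: int = 5) -> str:
    """Apply the rule of thumb to successive Reallocated_Sector_Ct readings.

    history: raw reallocated-sector counts from earlier checks plus the latest, oldest first.
    max_new_per_check: hypothetical cut-off for "fairly sudden" growth.
    """
    # SMART convention: an attribute has failed once its normalized value reaches THRESH.
    if thresh > 0 and value <= thresh:
        return "replace: spare sector pool reported as exhausted"
    steps = [later - earlier for earlier, later in zip(history, history[1:])]
    if any(step > max_new_per_check for step in steps):
        return "replace: bad sectors accumulating quickly"
    return "monitor: isolated remaps; keep backing up"


# Illustrative sample only; these numbers are not from the drive in the question.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    realloc = parse_smart_attributes(SAMPLE)["Reallocated_Sector_Ct"]
    print(assess_reallocations([0, 1, realloc["raw"]], realloc["value"], realloc["thresh"]))
```

A single remap with a healthy normalized value lands in the "monitor" branch, which matches paulsc's view that one bad sector on its own is not cause for alarm.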


What's likely happened here is that NTFS has at some point noticed the bad sector on your drive, and added the cluster containing it to its own Bad Clusters list. NTFS will only ever grow that list, never shrink it; so even though you've now run SeaTools against your drive and made it reallocate that sector, effectively making it a good sector again, NTFS has still got at least one bad cluster in its list and ntfsresize won't mess with it unless you give it that --bad-sectors option.

I've successfully used the NOBADS command built into DFSee to reset an NTFS bad cluster list before. This tool has a totally bizarre user interface, but having found NOBADS, it did what it said on the tin.

Alternatively, you can do the resize by hand using ntfsresize with the --bad-sectors option, and repartition your disk using cfdisk (menu-driven, reasonably intuitive, runs in a terminal, and like all partitioning tools capable of causing great distress to the unwary).
posted by flabdablet at 3:52 AM on May 21, 2008 [1 favorite]
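The manual route starts with ntfsresize, run once with --no-action as a test that changes nothing on disk, then for real. The sketch below only builds the two command lines; /dev/sda1 comes from the question, while the 50G target size is hypothetical (check the ntfsresize man page for how size suffixes are interpreted). As I understand it, ntfsresize shrinks only the filesystem and leaves the partition table alone, so afterwards cfdisk is used to recreate sda1 with the same start and a size no smaller than the new filesystem, and to add Ubuntu partitions in the freed space.

```python
import shlex
from typing import List


def ntfsresize_commands(device: str, new_size: str) -> List[List[str]]:
    """Return the dry-run and real ntfsresize invocations, in the order to run them.

    --bad-sectors lets ntfsresize proceed despite NTFS's bad cluster list;
    --no-action makes the first run a test that writes nothing to disk.
    """
    real_run = ["ntfsresize", "--bad-sectors", "--size", new_size, device]
    dry_run = real_run[:1] + ["--no-action"] + real_run[1:]
    return [dry_run, real_run]


if __name__ == "__main__":
    # "50G" is a hypothetical target size; pick one that leaves room for the NTFS data.
    for argv in ntfsresize_commands("/dev/sda1", "50G"):
        print(shlex.join(argv))
```

Only move on to the real run if the dry run finishes without complaint.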


Drives (almost) never go just a little bit bad.

I'd tell your friend "Sorry, Ubuntu is really great, but it refuses to use your disc because it thinks it's going to fail. For now, either use Windows or buy a new disc and we'll put Ubuntu on it. Ubuntu could be wrong about your disc failing in weeks or months, but just in case it's not, you should make rolling copies of your data while you continue to use Windows."
posted by cmiller at 5:00 AM on May 21, 2008


Along the lines of what flabdablet mentioned: I can't prove it, but I think there are bad sectors and then there are Bad Sectors. The first is a sort of soft error; the second is where the underlying hardware is starting to fail. One scenario where a "bad sector" may appear without really being legitimate is when you have a hunk of data on the drive that was written once and never touched since. When you go to copy or check the disk, that spot of data is so old and different from more recent data that the drive has trouble reading it. The underlying drive is still good, but from the frame of reference of the OS or checking tool, that sector is bad.

I've had numerous drives that had "freak outs" like this. In about half the cases, zeroing out the drive with DBAN or the SeaTools types of things has rendered the drive completely operable again. This happens a lot with older servers with hardware RAID. The RAID controller starts to notice one drive's "health" statistics going bad and fails it. If you tell the controller to rebuild it, it begins to work fine with no further trouble.

The lesson here is that ALL drives are untrustworthy, and that backups are the only way to assure that your data is safe.
posted by gjc at 6:52 AM on May 21, 2008


Thanks for all the responses. I'm currently running Spinrite on the computer and waiting to see what it comes up with. My friend regularly backs up all his data, and he will do it again this afternoon once Spinrite is done. If Spinrite doesn't find any problems, I think I will try partitioning using ntfsresize and cfdisk as flabdablet suggested.
posted by puffin at 7:25 AM on May 21, 2008


I expect Spinrite will be a waste of time, since any errors it's going to fix have already been fixed by SeaTools. All Spinrite does is make heroic efforts to read the contents of any failing sector before rewriting it in-place. The drive already knows that the sector is bad (if you look in a SMART log beforehand, you see it has at least one sector "pending reallocation") and will reallocate it as soon as it's rewritten.

SeaTools relies on exactly the same drive behaviour to perform its "repair", but makes no attempt to recover the data from the failed sector before rewriting it; it just uses zeroes.

Neither Spinrite nor SeaTools deals with any filesystem on the drive; they just work with the underlying raw sectors. So neither of them is going to cause NTFS to let go of the idea that one of its clusters is bad. The only way that NTFS will ever believe the disk is 100% good is if you reset its bad cluster list.

DFSee's NOBADS command really can do this, and you're allowed to run it free for evaluation. I'd do that.
posted by flabdablet at 7:40 PM on May 21, 2008
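The difference between the two tools, as flabdablet describes it, comes down to what gets written back to the failing sector. A minimal sketch with hypothetical read and write callbacks standing in for raw sector I/O; the retry count is an illustrative parameter, not how either product is actually configured, and 512-byte sectors are assumed.

```python
from typing import Callable, Optional

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def repair_sector(read: Callable[[], Optional[bytes]],
                  write: Callable[[bytes], None],
                  retries: int) -> bytes:
    """Rewrite a failing sector in place so the drive can reallocate it.

    retries > 0 approximates the "heroic read" approach described for Spinrite;
    retries == 0 approximates SeaTools writing zeros straight away.
    `read` returns the sector's bytes, or None on a read error.
    """
    data = None
    for _ in range(retries):
        data = read()
        if data is not None:
            break
    if data is None:
        data = bytes(SECTOR_SIZE)  # nothing recovered: fill with zeros
    write(data)  # the write is what gives the drive its chance to remap the sector
    return data
```

Either way the drive sees a write and can reallocate; neither path touches the NTFS bad cluster list, which is why the ntfsresize error survives both.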


Oh, and about drives going a little bit bad: in my experience, paulsc and gjc are right and cmiller is wrong. I don't believe a drive that's had a few bad-sector remaps after leaving the factory is any less reliable than one whose bad-sector remaps all happened before you bought it. Back them all up and don't sweat it.
posted by flabdablet at 7:45 PM on May 21, 2008


I think that part of the problem with "bad sectors" is that the drive and the filesystem (and by extension, we the users) can't know what the real reason for the error was. Bad sectors can indicate a lot of different things, and "platter cancer" is only one of those things.

A helpful metaphor might be to look at a hard drive as one of those dry-erase whiteboards. In a perfect world, you write and erase and everything stays how you want it. But if you come back one day and things aren't there, you don't know why. Maybe the surface of the board has deteriorated and makes things disappear; that's a hardware issue, and you replace the board. But maybe it's just blank and you don't know why the stuff is gone, and if you write something else there, it stays just fine. That would be some kind of software issue.
posted by gjc at 6:04 AM on May 22, 2008


Ah, but the drive knows what went wrong, and we the users have SMART interrogation tools to find out what it knows and judge how concerned we need to be about that.

Knowing the reason for the error is not really the issue here, anyway. The issue here is a deficiency in NTFS, or perhaps in chkdsk, that makes it not remove bad clusters from the bad cluster list even after a surface scan has found them to be readable. This is quite annoying behaviour, given that hard drives have been doing automatic, transparent reallocation of bad sectors on next write for at least a decade.
posted by flabdablet at 6:50 AM on May 22, 2008
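The point flabdablet makes across this thread is that there are two separate records of bad sectors: the drive's internal remap tables and NTFS's own bad cluster list. A toy Python model (hypothetical classes, not real firmware or NTFS code, with sector numbers simplified to be relative to the volume and 8 sectors per cluster assumed) shows why rewriting the sector fixes the first record without touching the second.

```python
class Drive:
    """Toy model of a drive's internal remapping; not real firmware behaviour in detail."""

    def __init__(self) -> None:
        self.pending = set()      # sectors the drive could not read
        self.reallocated = set()  # sectors now served from the spare pool

    def read_fails(self, sector: int) -> None:
        self.pending.add(sector)

    def write(self, sector: int) -> None:
        # Writing a pending sector triggers reallocation, after which it reads fine.
        if sector in self.pending:
            self.pending.discard(sector)
            self.reallocated.add(sector)

    def is_readable(self, sector: int) -> bool:
        return sector not in self.pending


class NtfsVolume:
    """Toy model of NTFS's bad cluster list: clusters are added, never removed."""

    def __init__(self, sectors_per_cluster: int = 8) -> None:
        self.sectors_per_cluster = sectors_per_cluster
        self.bad_clusters = set()

    def mark_bad(self, sector: int) -> None:
        self.bad_clusters.add(sector // self.sectors_per_cluster)


def ntfsresize_would_refuse(vol: NtfsVolume, bad_sectors_option: bool) -> bool:
    """Model: ntfsresize stops on a non-empty bad cluster list unless --bad-sectors is given."""
    return bool(vol.bad_clusters) and not bad_sectors_option


if __name__ == "__main__":
    drive, vol = Drive(), NtfsVolume()
    drive.read_fails(1000)  # the sector goes bad...
    vol.mark_bad(1000)      # ...and NTFS records its cluster (1000 // 8 = 125)
    drive.write(1000)       # a SeaTools-style rewrite lets the drive remap it
    print(drive.is_readable(1000), vol.bad_clusters, ntfsresize_would_refuse(vol, False))
```

After the rewrite the drive reports the sector as readable, yet the volume still lists cluster 125 as bad, which is the situation puffin ran into.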

