Observations of a failing disk

linux-raid.vger.kernel.org archive mirror

* Observations of a failing disk
@ 2006-11-27 22:25 Richard Scobie
  2006-11-28  0:29 ` dean gaudet
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Scobie @ 2006-11-27 22:25 UTC (permalink / raw)
  To: Linux RAID Mailing List

I have a machine running Fedora Core 5, kernel 2.6.17-1.2187_FC5smp,
with a pair of software RAID 1 arrays (WD 500GB RE2 drives) striped
together as RAID 0. Every 14 days, a "repair"
(echo repair > /sys/block/mdX/md/sync_action) is run on one of the
arrays, in the hope of picking up and fixing dead sectors.
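
A minimal sketch of the trigger step described above, assuming the array is md5 (the name used later in this message). The function name start_repair and the SYSFS_ROOT override are hypothetical; the override exists only so the function can be tried against a scratch directory. Scheduling (cron or similar) is left out.

```shell
#!/bin/sh
# Start a "repair" pass on one md array unless a sync action is already running.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

start_repair() {
    md="$1"
    action_file="$SYSFS_ROOT/block/$md/md/sync_action"
    if [ ! -w "$action_file" ]; then
        echo "$md: $action_file is not writable" >&2
        return 1
    fi
    current=$(cat "$action_file")
    # Do not interrupt a check, repair, or resync that is already in progress.
    if [ "$current" != "idle" ]; then
        echo "$md: busy ($current), not starting repair" >&2
        return 1
    fi
    echo repair > "$action_file"
}
```

Run as root, for example: start_repair md5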

Over the weekend smartd emailed to say that there were "10 Currently 
unreadable (pending) sectors" on /dev/sdc1.

mdstat showed this drive was still active in the array, and that was 
still the case 4 hours later, when smartd reported that there were now 
20 pending sectors.

After running a repair on the array (confirmed by following the resync 
progress in mdstat), smartd reported 21 pending sectors.

Thinking I had nothing to lose, I failed and removed the drive and ran a

dd if=/dev/zero of=/dev/sdc1 bs=1048576

and as can be seen from the periodic smartd reports while this was 
running, these sectors were fixed and smartctl -a now shows 0 pending 
sectors.

Nov 28 08:07:30 bozo smartd[3066]: Device: /dev/sdc, 21 Currently unreadable (pending) sectors
Nov 28 08:37:31 bozo smartd[3066]: Device: /dev/sdc, 11 Currently unreadable (pending) sectors
Nov 28 09:07:31 bozo smartd[3066]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Nov 28 09:37:31 bozo smartd[3066]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Nov 28 10:07:31 bozo smartd[3066]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
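
To follow the trend in reports like these, the pending-sector count can be pulled out of the syslog lines. A small sketch, assuming each report occupies a single line in the log file and that the log is fed in on standard input (the log path varies by distribution and is not stated here); pending_history is a hypothetical name.

```shell
# Print "<time> <pending count>" for each smartd pending-sector report
# about the given device. Reads syslog lines on stdin.
pending_history() {
    dev="$1"
    awk -v dev="$dev" '
        /Currently unreadable \(pending\) sectors/ {
            # Find "Device: <dev>," and take the number that follows it.
            for (i = 1; i < NF; i++) {
                if ($i == "Device:" && $(i + 1) == (dev ",")) {
                    print $3, $(i + 2)
                }
            }
        }'
}
```

For example: pending_history /dev/sdc < /path/to/syslog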

Curiously, Reallocated_Sector_Ct shows only 2, rather than the 21 I 
would have expected.
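
The two counters being compared here can be read side by side from the SMART attribute table. A sketch, assuming the usual smartctl -A column layout, in which the attribute name is the second field and the raw value is the tenth, and assuming the drive reports its pending count under the name Current_Pending_Sector (attribute names can vary by vendor); sector_counts is a hypothetical helper.

```shell
# Filter "smartctl -A" output (on stdin) down to the reallocated and
# pending sector raw values, printed as NAME=RAW_VALUE.
sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
        print $2 "=" $10
    }'
}
```

For example: smartctl -A /dev/sdc | sector_counts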

Anyway, my biggest concern is why

echo repair > /sys/block/md5/md/sync_action

appeared to have no effect at all, when I understand that it should 
re-write unreadable sectors?
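
One way to confirm that a repair pass actually ran to completion, rather than watching mdstat by hand, is to poll sync_action until it returns to idle and then print the array's mismatch_cnt. This is only a sketch: the presence of mismatch_cnt on this particular kernel is an assumption, wait_for_repair and SYSFS_ROOT are hypothetical names, and the value printed is reported without interpretation.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Block until the array's sync_action is idle, then print mismatch_cnt.
# $1 = md device name (e.g. md5), $2 = poll interval in seconds (default 60).
wait_for_repair() {
    md="$1"
    interval="${2:-60}"
    md_dir="$SYSFS_ROOT/block/$md/md"
    # sync_action reads "idle" once no check, repair, or resync is running.
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep "$interval"
    done
    echo "$md: mismatch_cnt=$(cat "$md_dir/mismatch_cnt")"
}
```

For example, after starting the repair: wait_for_repair md5 60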

Regards,

Richard


* Re: Observations of a failing disk
  2006-11-27 22:25 Observations of a failing disk Richard Scobie
@ 2006-11-28  0:29 ` dean gaudet
  2006-11-28  0:52   ` Richard Scobie
  0 siblings, 1 reply; 3+ messages in thread
From: dean gaudet @ 2006-11-28  0:29 UTC (permalink / raw)
  To: Richard Scobie; +Cc: Linux RAID Mailing List

On Tue, 28 Nov 2006, Richard Scobie wrote:

> Anyway, my biggest concern is why
> 
> echo repair > /sys/block/md5/md/sync_action
> 
> appeared to have no effect at all, when I understand that it should re-write
> unreadable sectors?

i've had the same thing happen on a seagate 7200.8 pata 400GB... and went 
through the same sequence of operations you described, and the dd fixed 
it.

one theory was that i lucked out and the pending sectors were in the 
unused part of the disk near the md superblock... but since that's in 
general only about 90KB of disk i was kind of skeptical.  it's certainly 
possible, but seems unlikely.

another theory is that a pending sector doesn't always result in a read 
error -- i.e. it depends on temperature?  but the question is, why wouldn't 
the disk try rewriting it if it does get a successful read?

i wish hard drives were a little less voodoo.

-dean


* Re: Observations of a failing disk
  2006-11-28  0:29 ` dean gaudet
@ 2006-11-28  0:52   ` Richard Scobie
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Scobie @ 2006-11-28  0:52 UTC (permalink / raw)
  To: Linux RAID Mailing List

dean gaudet wrote:

> one theory was that i lucked out and the pending sectors were in the 
> unused part of the disk near the md superblock... but since that's in 
> general only about 90KB of disk i was kind of skeptical.  it's certainly 
> possible, but seems unlikely.

I can discount this one in my case, as sectors were being repaired 
progressively across the disk, as can be seen in the smartd reports.

I guess the moral of the story is that you can't have too many checks.

Regards,

Richard

