* Observations of a failing disk
From: Richard Scobie @ 2006-11-27 22:25 UTC
To: Linux RAID Mailing List
I have a machine running Fedora Core 5, kernel 2.6.17-1.2187_FC5smp, with a
pair of software RAID 1 arrays (WD 500GB RE2 drives) striped together as
RAID 0. Every 14 days, a "repair" (echo repair >
/sys/block/mdX/md/sync_action) is run on one of the arrays, in the hope of
finding and fixing dead sectors.
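A minimal sketch of that fortnightly repair as a shell function, assuming POSIX sh and the md sysfs layout named above. The helper names, the "only start if idle" check, and the default md5 path are illustrative assumptions rather than part of the original setup. mismatch_cnt is the counter md exposes next to sync_action; it reports how many sectors the last check/repair pass found inconsistent between mirrors, which is not the same thing as a count of read errors.

```shell
#!/bin/sh
# Hypothetical helper: start an md "repair" pass, but only if the array is idle.
# $1 is the array's md sysfs directory; the md5 default is an assumption.
start_md_repair() {
    md_dir=${1:-/sys/block/md5/md}
    state=$(cat "$md_dir/sync_action") || return 1
    if [ "$state" != "idle" ]; then
        echo "array busy ($state); not starting repair" >&2
        return 1
    fi
    # Same effect as: echo repair > /sys/block/mdX/md/sync_action
    echo repair > "$md_dir/sync_action"
}

# Hypothetical helper: print the mismatch count from the last check/repair.
md_mismatch_count() {
    md_dir=${1:-/sys/block/md5/md}
    cat "$md_dir/mismatch_cnt"
}
```

From a periodic cron job this would be called as, for example, `start_md_repair /sys/block/md5/md`, with `md_mismatch_count` read once /proc/mdstat shows the pass has finished.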
Over the weekend smartd emailed to say that there were "10 Currently
unreadable (pending) sectors" on /dev/sdc1.
/proc/mdstat showed that this drive was still active in the array, and it
remained so 4 hours later, when smartd reported that there were now 20
pending sectors.
After running a repair on the array (confirmed by following the resync
progress in mdstat), smartd reported 21 pending sectors.
Thinking I had nothing to lose, I failed and removed the drive and ran a
dd if=/dev/zero of=/dev/sdc1 bs=1048576
and, as the periodic smartd reports logged while it was running show, the
pending sectors were fixed; smartctl -a now shows 0 pending sectors.
Nov 28 08:07:30 bozo smartd[3066]: Device: /dev/sdc, 21 Currently unreadable (pending) sectors
Nov 28 08:37:31 bozo smartd[3066]: Device: /dev/sdc, 11 Currently unreadable (pending) sectors
Nov 28 09:07:31 bozo smartd[3066]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Nov 28 09:37:31 bozo smartd[3066]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Nov 28 10:07:31 bozo smartd[3066]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
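A trend like the one above can be pulled out of syslog with a small awk filter. This is a sketch that assumes the smartd message format shown above and reads the lines on stdin; pending_trend is a hypothetical helper name, and /var/log/messages as the syslog destination is an assumption about this system.

```shell
#!/bin/sh
# Hypothetical helper: print "time pending_count" for one device from
# smartd syslog lines of the form shown above, read on stdin.
pending_trend() {
    dev=$1
    awk -v dev="$dev" '
        index($0, "Device: " dev ",") && /Currently unreadable \(pending\) sectors/ {
            # The count is the field just before the word "Currently";
            # $3 is the syslog timestamp.
            for (i = 1; i <= NF; i++)
                if ($i == "Currently") { print $3, $(i - 1); break }
        }'
}
```

For example: `grep smartd /var/log/messages | pending_trend /dev/sdc`.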
Curiously, Reallocated_Sector_Ct shows only 2, rather than the 21 I
would have expected.
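To compare the two counters directly, their raw values can be read from smartctl's attribute table. This sketch assumes the default `smartctl -A` layout, where RAW_VALUE is the tenth column; smart_raw is a hypothetical helper name. One commonly cited explanation for the gap is that a pending sector which rewrites successfully can be cleared without being reallocated.

```shell
#!/bin/sh
# Hypothetical helper: print the raw value of one SMART attribute from
# `smartctl -A` output on stdin. Assumes RAW_VALUE is column 10.
# Returns non-zero if the attribute is not present.
smart_raw() {
    attr=$1
    awk -v a="$attr" '$2 == a { print $10; found = 1 } END { exit !found }'
}
```

For example: `smartctl -A /dev/sdc | smart_raw Current_Pending_Sector`.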
Anyway, my biggest concern: why did
echo repair > /sys/block/md5/md/sync_action
appear to have no effect at all, when I understand that it should
rewrite unreadable sectors?
Regards,
Richard
* Re: Observations of a failing disk
From: dean gaudet @ 2006-11-28 0:29 UTC
To: Richard Scobie; +Cc: Linux RAID Mailing List
On Tue, 28 Nov 2006, Richard Scobie wrote:
> Anyway, my biggest concern is why
>
> echo repair > /sys/block/md5/md/sync_action
>
> appeared to have no effect at all, when I understand that it should re-write
> unreadable sectors?
i've had the same thing happen on a Seagate 7200.8 PATA 400GB drive... and
went through the same sequence of operations you described, and the dd
fixed it.
one theory was that i lucked out and the pending sectors were in the
unused area of the disk near the md superblock... but since that area is
in general only about 90KB, i was kind of skeptical. it's certainly
possible, but seems unlikely.
another theory is that a pending sector doesn't always produce a read
error, e.g. depending on temperature? but then the question is why the
disk wouldn't try rewriting it when it does get a successful read.
i wish hard drives were a little less voodoo.
-dean
* Re: Observations of a failing disk
From: Richard Scobie @ 2006-11-28 0:52 UTC
To: Linux RAID Mailing List
dean gaudet wrote:
> one theory was that i lucked out and the pending sectors in the unused
> disk near the md superblock... but since that's in general only about 90KB
> of disk i was kind of skeptical. it's certainly possible, but seems
> unlikely.
I can rule this one out in my case, as the sectors were repaired
progressively as dd worked across the disk, as can be seen in the smartd
reports.
The moral of the story, I guess, is that you can't have too many checks.
Regards,
Richard