* drive failed, need help with interpretation / recovery
@ 2008-04-09 17:05 Christian Pernegger
2008-04-09 20:53 ` Richard Scobie
0 siblings, 1 reply; 2+ messages in thread
From: Christian Pernegger @ 2008-04-09 17:05 UTC
To: Linux RAID
Found an e-mail from mdadm in my inbox and this in the logs:
Apr 8 04:44:50 jesus kernel: ata3.00: exception Emask 0x0 SAct 0x1
SErr 0x0 action 0x2 frozen
Apr 8 04:44:50 jesus kernel: ata3.00: cmd
60/00:00:00:6c:ef/01:00:2c:00:00/40 tag 0 cdb 0x0 data 131072 in
Apr 8 04:44:50 jesus kernel: res
40/00:00:00:00:02/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 8 04:44:51 jesus kernel: ata3: soft resetting port
Apr 8 04:45:01 jesus kernel: ata3: softreset failed (timeout)
Apr 8 04:45:01 jesus kernel: ata3: hard resetting port
Apr 8 04:45:11 jesus kernel: ata3: softreset failed (timeout)
Apr 8 04:45:11 jesus kernel: ata3: hard resetting port
Apr 8 04:45:46 jesus kernel: ata3: softreset failed (timeout)
Apr 8 04:45:46 jesus kernel: ata3: hard resetting port
Apr 8 04:45:51 jesus kernel: ata3: softreset failed (timeout)
Apr 8 04:45:51 jesus kernel: ata3: reset failed, giving up
Apr 8 04:45:51 jesus kernel: ata3.00: disabled
Apr 8 04:45:51 jesus kernel: ata3: EH complete
Apr 8 04:45:51 jesus kernel: sd 3:0:0:0: [sdd] Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 8 04:45:51 jesus kernel: end_request: I/O error, dev sdd, sector 753888256
Apr 8 04:45:51 jesus kernel: sd 3:0:0:0: [sdd] Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 8 04:45:51 jesus kernel: end_request: I/O error, dev sdd, sector 753888256
Apr 8 04:45:51 jesus kernel: raid5: Disk failure on sdd, disabling
device. Operation continuing on 3 devices
Apr 8 04:45:51 jesus kernel: RAID5 conf printout:
Apr 8 04:45:51 jesus kernel: --- rd:4 wd:3
Apr 8 04:45:51 jesus kernel: disk 0, o:1, dev:sdb
Apr 8 04:45:51 jesus kernel: disk 1, o:1, dev:sdc
Apr 8 04:45:51 jesus kernel: disk 2, o:0, dev:sdd
Apr 8 04:45:51 jesus kernel: disk 3, o:1, dev:sde
Apr 8 04:45:51 jesus kernel: RAID5 conf printout:
Apr 8 04:45:51 jesus kernel: --- rd:4 wd:3
Apr 8 04:45:51 jesus kernel: disk 0, o:1, dev:sdb
Apr 8 04:45:51 jesus kernel: disk 1, o:1, dev:sdc
Apr 8 04:45:51 jesus kernel: disk 3, o:1, dev:sde
---
Apr 9 17:46:08 jesus kernel: md: unbind<sdd>
Apr 9 17:46:08 jesus kernel: md: export_rdev(sdd)
Apr 9 17:47:24 jesus kernel: sd 3:0:0:0: [sdd] Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 9 17:47:24 jesus kernel: end_request: I/O error, dev sdd, sector 976773152
Apr 9 17:47:24 jesus kernel: Buffer I/O error on device sdd, logical
block 122096644
Apr 9 17:47:25 jesus kernel: sd 3:0:0:0: [sdd] Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 9 17:47:25 jesus kernel: end_request: I/O error, dev sdd, sector 976773152
Apr 9 17:47:25 jesus kernel: Buffer I/O error on device sdd, logical
block 122096644
[... lots more ...]
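
The "RAID5 conf printout" blocks in the log above list one line per member, and the member that dropped out (sdd) is the one shown with o:0. As a rough sketch, a log excerpt like this could be scanned for such members as follows. It assumes that o:0 marks a non-operational member, which is how the printout above reads; the function name offline_members is made up for illustration.

```python
import re

# Matches member lines of the kernel's "RAID5 conf printout", e.g.
#   "disk 2, o:0, dev:sdd"
MEMBER_RE = re.compile(r"disk (\d+), o:(\d), dev:(\w+)")


def offline_members(log_text):
    """Return the sorted device names that appear with o:0 in any printout."""
    offline = set()
    for _slot, operational, dev in MEMBER_RE.findall(log_text):
        # Assumption: o:1 means the member is operational, o:0 that it is not.
        if operational == "0":
            offline.add(dev)
    return sorted(offline)
```

Feeding it the text of the kernel log (for instance the contents of a saved syslog file) yields the devices md considered failed at some point in that excerpt.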
The first part is what was originally there. Here's what I did:
I --remove'd the drive, which went fine. Any further attempts to
access the drive, be it for a simple --(re-)add, --zero-superblock or
badblocks -w failed with the above errors.
At that point I meant to shut down the machine to replace the drive but
restarted it by mistake instead. Lo and behold, the drive is back and
working.
Re-adding it to the array went flawlessly and only took a few seconds
of recovery. (Might well be that there were no writes in the last few
days.)
BUT considering I had already tried to zero the superblock and run a
destructive badblocks test: can I be sure that none of these commands
went through and that the data and superblock on the intermittent disk
are OK? I started a "check" just to be sure; no errors yet, but I don't
know whether it will pick up all errors, i.e. those in the superblock or
other non-payload areas.
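
One way to look at the superblocks directly, rather than relying on the "check", is to compare what mdadm --examine reports for each member. The sketch below only parses text that has already been captured from that command and compares the "Events" field across members. It assumes that members of a healthy, in-sync array report the same event count; the helper names are hypothetical, and it does not prove that the rest of the superblock is intact.

```python
import re

# "Events : <value>" line as printed by `mdadm --examine <device>`.
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\S+)", re.MULTILINE)


def event_counts(examine_outputs):
    """Map each device to its Events value (a string), or None if absent."""
    counts = {}
    for dev, text in examine_outputs.items():
        match = EVENTS_RE.search(text)
        counts[dev] = match.group(1) if match else None
    return counts


def members_agree(examine_outputs):
    """True if every device reported an Events value and all values match."""
    values = set(event_counts(examine_outputs).values())
    return len(values) == 1 and None not in values
```

Here examine_outputs would be a dict such as {"/dev/sdb": "...", "/dev/sdd": "..."} holding the command output for each member, collected as root. A device with no readable superblock shows up as None, which members_agree treats as a disagreement.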
Should I
- fail the disk again manually, wipe it and force a full resync, with
the added risk of another disk going on holiday or
- let the "check" run its course and leave the disk as-is if
mismatch_cnt remains 0?
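
For the second option, the state of the "check" and the resulting mismatch_cnt can be read from the array's md directory in sysfs. A minimal sketch, assuming the array is md0 (the thread does not name it), so that the directory would be /sys/block/md0/md:

```python
from pathlib import Path


def check_status(md_dir):
    """Return (sync_action, mismatch_cnt) read from an md sysfs directory."""
    md = Path(md_dir)
    action = (md / "sync_action").read_text().strip()
    mismatches = int((md / "mismatch_cnt").read_text().strip())
    return action, mismatches


def check_is_clean(md_dir):
    """True once no sync action is running and no mismatches were counted."""
    action, mismatches = check_status(md_dir)
    return action == "idle" and mismatches == 0
```

Calling check_is_clean("/sys/block/md0/md") while the check is still running returns False, because sync_action reads "check" rather than "idle". Note that mismatch_cnt reflects data/parity consistency only, so a zero here says nothing about the superblock.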
As for the failure itself, maybe the dreaded
WD5000YS-drops-out-of-RAIDs-intermittently bug has finally bitten me
... I'm guessing I should exchange the disk just to be on the safe
side?
Thanks,
C.
* Re: drive failed, need help with interpretation / recovery
From: Richard Scobie @ 2008-04-09 20:53 UTC
To: Linux RAID
Christian Pernegger wrote:
> As for the failure itself, maybe the dreaded
> WD5000YS-drops-out-of-RAIDs-intermittently bug has finally bitten me
> ... I'm guessing I should exchange the disk just to be on the safe
> side?
Are you able to elaborate on this? I have been running a 4 x
WD5000YS md RAID 5 for a year or so now without any trouble.
Regards,
Richard