linux-raid.vger.kernel.org archive mirror
From: Joe Landman <landman@scalableinformatics.com>
To: Andrew Dunn <andrew.g.dunn@gmail.com>
Cc: Richard Scobie <richard@sauce.co.nz>,
	Roger Heflin <rogerheflin@gmail.com>,
	robin@robinhill.me.uk,
	linux-raid list <linux-raid@vger.kernel.org>,
	nfbrown@novell.com
Subject: Re: RAID 6 Failure follow up
Date: Sun, 08 Nov 2009 13:34:05 -0500
Message-ID: <4AF70F1D.4010604@scalableinformatics.com>
In-Reply-To: <4AF70C61.5030301@gmail.com>

Andrew Dunn wrote:
> New data now, I got this from dmesg when it went down again. Hopefully
> there is some significance to you guys.
> 
>> [14269.650381] sd 10:0:3:0: rejecting I/O to offline device
>> [14269.650453] sd 10:0:3:0: rejecting I/O to offline device
>> [14269.650524] sd 10:0:3:0: rejecting I/O to offline device
>> [14269.650595] sd 10:0:3:0: rejecting I/O to offline device
>> [14269.650672] sd 10:0:3:0: [sdh] Unhandled error code
>> [14269.650675] sd 10:0:3:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>> [14269.650680] end_request: I/O error, dev sdh, sector 1435085631
>> [14269.650749] raid5:md0: read error not correctable (sector 1435085568 on sdh1).
>> [14269.650753] raid5: Disk failure on sdh1, disabling device.
>> [14269.650754] raid5: Operation continuing on 7 devices.
>> [14269.650886] raid5:md0: read error not correctable (sector 1435085576 on sdh1).
>> [14269.650890] raid5:md0: read error not correctable (sector 1435085584 on sdh1).
>> [14269.650894] raid5:md0: read error not correctable (sector 1435085592 on sdh1).

[...]

I am not convinced this is a drive failure (yet). You have sdh, sdi,
sdj, sdk, sdl, and sdm all reporting errors or error recovery.
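
To check that pattern against the full log rather than by eye, a minimal
shell sketch like the one below tallies how often each sdX disk is named
on lines that mention an error, an offline device, or a failure. The
function name count_disk_errors and the keyword list are my own
assumptions (nothing from the kernel or mdadm), and it assumes a grep
that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# count_disk_errors (hypothetical helper): read kernel log text on stdin
# and print "<disk> <mentions>" for every sdX disk named on a line that
# contains "error", "offline" or "fail" (case-sensitive), most-mentioned
# disk first.
count_disk_errors() {
  grep -E 'error|offline|fail' |   # keep only lines that look like trouble
    grep -oE 'sd[a-z]+[0-9]*' |    # pull out sdh, sdh1, ... (bare "sd" is skipped)
    sed 's/[0-9]*$//' |            # fold partitions into their disk: sdh1 -> sdh
    sort | uniq -c |               # count mentions per disk
    awk '{ print $2, $1 }' |       # reorder to "disk count"
    sort -k2,2nr -k1,1             # highest count first, then by name
}
```

Usage: dmesg | count_disk_errors. If one disk dominates, that points at
the drive; if several disks show up in similar numbers, that points back
at something they share.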

This looks more like a physical backplane failure (is this on an
expander system? We have seen this happen before), a failing cable to
the SATA card (also seen before), or a power supply that can't handle
all the drives in constant operation (we have seen that as well).

Driver issues are possible, but the kernel is following its normal
failure code paths, so unless the driver is tickling the device-removal
code on its own ...

SMART could be taking the drive offline and leaving it unresponsive.
Something else could be doing that as well (vibration, power quality, ...).

What does

	hdparm -I /dev/sdh

tell us?

If that shows nothing, we will need to use sdparm to get some information.

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
