Re: RAID 6 Failure follow up

From: Andrew Dunn <andrew.g.dunn@gmail.com>
To: Roger Heflin <rogerheflin@gmail.com>, robin@robinhill.me.uk
Cc: linux-raid list <linux-raid@vger.kernel.org>
Subject: Re: RAID 6 Failure follow up
Date: Sun, 08 Nov 2009 09:30:21 -0500
Message-ID: <4AF6D5FD.2010602@gmail.com>
In-Reply-To: <4AF6D461.3050109@gmail.com>

storrgie@ALEXANDRIA:~$ dmesg | grep sdi
[   31.019358] sd 11:0:0:0: [sdi] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[   31.032233] sd 11:0:0:0: [sdi] Write Protect is off
[   31.032235] sd 11:0:0:0: [sdi] Mode Sense: 73 00 00 08
[   31.037483] sd 11:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   31.066991]  sdi:
[   31.075719]  sdi1
[   31.124713] sd 11:0:0:0: [sdi] Attached SCSI disk
[   31.147407] md: bind<sdi1>
[   31.712366] raid5: device sdi1 operational as raid disk 4
[   31.713153]  disk 4, o:1, dev:sdi1
[   33.112975]  disk 4, o:1, dev:sdi1
[  297.528544] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor]
[  297.528573] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through information available
[  297.591382] sd 11:0:0:0: [sdi] Sense Key : Recovered Error [current] [descriptor]
[  297.591407] sd 11:0:0:0: [sdi] Add. Sense: ATA pass through information available

I don't see anything glaring.

You should be able to force an assembly anyway (using the --force flag)
but I'd make sure you know exactly what the issue is first, otherwise
this is likely to happen again.

Do you think the controller is dropping out? I have 4 drives on one
controller (AOC-USAS-L8i) and 5 drives on the other (same make/model),
but I think they are connected sequentially, as in sd[efghi] should be
on one controller and sd[jklm] on the other. Is there an easy way to
verify?
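
One way to check which controller each disk hangs off is to resolve the
/sys/block/sdX symlinks and look at the PCI address that sits just before
the hostN component. Below is a rough sketch of that idea, not a tested
tool: the function names controller_of and group_disks are made up for
illustration, and it assumes the usual sysfs layout on Linux where the
resolved path looks like .../<pci address>/hostN/.../block/sdX.

```python
import os
import re
from collections import defaultdict

# Matches a PCI address such as 0000:03:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")
SCSI_HOST = re.compile(r"^host\d+$")


def controller_of(sysfs_path):
    """Return the PCI address closest to the hostN component, or None.

    sysfs_path is the fully resolved path of a /sys/block/<disk> symlink.
    """
    last_pci = None
    for part in sysfs_path.split("/"):
        if PCI_ADDR.match(part):
            last_pci = part
        elif SCSI_HOST.match(part):
            return last_pci
    return None


def group_disks(block_dir="/sys/block"):
    """Group sd* disks by the PCI address of the controller they sit behind."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(block_dir)):
        if not name.startswith("sd"):
            continue
        real = os.path.realpath(os.path.join(block_dir, name))
        groups[controller_of(real)].append(name)
    return dict(groups)
```

Calling group_disks() on the machine and printing the result should list
the disks per PCI address; if sd[efghi] and sd[jklm] end up under two
different addresses, the guess about the cabling holds. The addresses can
then be matched against lspci output to see which card is which.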

Roger Heflin wrote:
> Andrew Dunn wrote:
>> This is kind of interesting:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --assemble --force /dev/md0
>> mdadm: no devices found for /dev/md0
>>
>> All of the devices are there in /dev, so I wanted to examine them:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sde1
>> /dev/sde1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host ALEXANDRIA)
>>   Creation Time : Fri Nov  6 07:06:34 2009
>>      Raid Level : raid6
>>   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
>>      Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
>>    Raid Devices : 9
>>   Total Devices : 9
>> Preferred Minor : 0
>>
>>     Update Time : Sun Nov  8 08:57:04 2009
>>           State : clean
>>  Active Devices : 5
>> Working Devices : 5
>>  Failed Devices : 4
>>   Spare Devices : 0
>>        Checksum : 4ff41c5f - correct
>>          Events : 43
>>
>>      Chunk Size : 1024K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     0       8       65        0      active sync   /dev/sde1
>>
>>    0     0       8       65        0      active sync   /dev/sde1
>>    1     1       8       81        1      active sync   /dev/sdf1
>>    2     2       8       97        2      active sync   /dev/sdg1
>>    3     3       8      113        3      active sync   /dev/sdh1
>>    4     4       0        0        4      faulty removed
>>    5     5       0        0        5      faulty removed
>>    6     6       0        0        6      faulty removed
>>    7     7       0        0        7      faulty removed
>>    8     8       8      193        8      active sync   /dev/sdm1
>>
>> First raid device shows the failures....
>>
>> One of the 'removed' devices:
>>
>> storrgie@ALEXANDRIA:~$ sudo mdadm --examine /dev/sdi1
>> /dev/sdi1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : 397e0b3f:34cbe4cc:613e2239:070da8c8 (local to host ALEXANDRIA)
>>   Creation Time : Fri Nov  6 07:06:34 2009
>>      Raid Level : raid6
>>   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
>>      Array Size : 6837318656 (6520.58 GiB 7001.41 GB)
>>    Raid Devices : 9
>>   Total Devices : 9
>> Preferred Minor : 0
>>
>>     Update Time : Sun Nov  8 08:53:30 2009
>>           State : active
>>  Active Devices : 9
>> Working Devices : 9
>>  Failed Devices : 0
>>   Spare Devices : 0
>>        Checksum : 4ff41b2f - correct
>>          Events : 21
>>
>>      Chunk Size : 1024K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     4       8      129        4      active sync   /dev/sdi1
>>
>>    0     0       8       65        0      active sync   /dev/sde1
>>    1     1       8       81        1      active sync   /dev/sdf1
>>    2     2       8       97        2      active sync   /dev/sdg1
>>    3     3       8      113        3      active sync   /dev/sdh1
>>    4     4       8      129        4      active sync   /dev/sdi1
>>    5     5       8      145        5      active sync   /dev/sdj1
>>    6     6       8      161        6      active sync   /dev/sdk1
>>    7     7       8      177        7      active sync   /dev/sdl1
>>    8     8       8      193        8      active sync   /dev/sdm1
>>
>
>
> Did you check dmesg and see if there were errors on those disks?
>
>
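
The two --examine outputs quoted above differ mainly in Events (43 on
sde1, 21 on sdi1) and Update Time, which is how a dropped member shows up
in its superblock. A small sketch for pulling those fields out of saved
mdadm --examine text and flagging members that lag behind follows. The
helper names parse_examine and stale_members are hypothetical, and it
only assumes the "Field : value" layout shown in the output above.

```python
import re

# Superblock fields of interest, as printed by `mdadm --examine`.
FIELD = re.compile(r"^(Events|Update Time|State)\s*:\s*(.+)$")


def parse_examine(text):
    """Extract Events, Update Time and State from `mdadm --examine` text.

    Leading '>' quote markers (as in a quoted email) are ignored.
    """
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()
        m = FIELD.match(line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


def stale_members(event_counts):
    """Return the devices whose event count is below the highest one seen."""
    if not event_counts:
        return []
    newest = max(event_counts.values())
    return sorted(dev for dev, n in event_counts.items() if n < newest)


# Event counts taken from the two outputs quoted in this message.
counts = {"/dev/sde1": 43, "/dev/sdi1": 21}
print(stale_members(counts))
```

Running the same comparison over all nine members would show whether
sd[ijkl]1 all stopped at the same event count, which would point at
something they share (controller, cable, power) rather than at four
separate disk failures.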

-- 
Andrew Dunn
http://agdunn.net
