linux-raid.vger.kernel.org archive mirror
* Degraded Raid6
       [not found] <S1754985Ab2LLT4x/20121212195656Z+425@vger.kernel.org>
@ 2012-12-12 20:49 ` Bernd Waage
  2012-12-13  8:54   ` Mikael Abrahamsson
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Bernd Waage @ 2012-12-12 20:49 UTC (permalink / raw)
  To: linux-raid

Hello all,

I run an 8-disk RAID 6 from which, sporadically, 2 drives dropped out; I could re-add them if I zeroed the superblock beforehand. Recently, upon re-adding those 2 drives (after the zero-superblock), a third drive dropped out after 5-10 minutes of syncing. I then did a zero-superblock on the third drive and tried to re-add it - which failed.

I'm pretty much at my wits' end and stumbled upon this list. Perhaps one of you can help me out. I'm running an Ubuntu 12.04 box with kernel 3.3.8, so I should not be affected by the kernel bug that popped up some time ago.

I append the output of mdadm --detail as well as mdadm --examine below.


berndman@berndman:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
  Used Dev Size : 1953509376 (1863.01 GiB 2000.39 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent

    Update Time : Fri Nov 30 07:24:57 2012
          State : active, FAILED, Not Started
 Active Devices : 5
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 3

         Layout : left-symmetric
     Chunk Size : 4096K

           Name : berndman-System:0
           UUID : ef105344:6158fb94:c37cc33b:00ac2d04
         Events : 25340

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       9       8       49        1      active sync   /dev/sdd1
       2       0        0        2      removed
       3       0        0        3      removed
       4       8      129        4      active sync   /dev/sdi1
       8       8       33        5      active sync   /dev/sdc1
       6       8       17        6      active sync   /dev/sdb1
       7       0        0        7      removed

      10       8      113        -      spare   /dev/sdh1
      11       8        1        -      spare   /dev/sda1
      12       8       65        -      spare   /dev/sde1
berndman@berndman:~$

berndman@berndman:~$ sudo mdadm --examine /dev/sdf1 /dev/sdd1 /dev/sdi1 /dev/sdc1 /dev/sdb1 /dev/sdh1 /dev/sda1 /dev/sde1
[sudo] password for berndman:
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 4792cd00:92d53de8:4d2d7438:dc86e0fd

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : dc7a3261 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

   Device Role : Active device 0
   Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 6670d0fb:447d937f:be88f731:f6161042

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : ac85a3c6 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

   Device Role : Active device 1
   Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 67054bcd:7e9c5380:452a8756:14dd629e

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : ffa2bfa1 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

   Device Role : Active device 4
   Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 8722a93e:a2397924:1d28b78b:e786004e

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : f9f42193 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

   Device Role : Active device 5
   Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b1426d37:1ebaa27c:f1eea681:1a55b574

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : 678657ff - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

   Device Role : Active device 6
   Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b80ab95e:7271c8a4:87980189:25169231

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : 7b2ede44 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 4096K

   Device Role : spare
   Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : a002afcf:af02cbfc:fa09bff8:252116c7

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : 496946da - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

   Device Role : spare
   Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : f3ff2451:d3c11122:c1e8a54a:0481d299

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : 14c941f6 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

   Device Role : spare
   Array State : AA..AAA. ('A' == active, '.' == missing)




regards,
Bernd

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Degraded Raid6
  2012-12-12 20:49 ` Degraded Raid6 Bernd Waage
@ 2012-12-13  8:54   ` Mikael Abrahamsson
  2012-12-13  9:52   ` Robin Hill
  2012-12-13 18:51   ` Roy Sigurd Karlsbakk
  2 siblings, 0 replies; 6+ messages in thread
From: Mikael Abrahamsson @ 2012-12-13  8:54 UTC (permalink / raw)
  To: Bernd Waage; +Cc: linux-raid

On Wed, 12 Dec 2012, Bernd Waage wrote:

> I run an 8-disk raid 6 on which sporadically 2 drives dropped out, that 
> I could just re-add when I zeroed the superblock beforehand. Recently, 
> upon re-adding those 2 drives (after the zero-superblock) a third drive 
> dropped out after 5-10 minutes of syncing. I then did a zero-superblock 
> on the third drive and tried to re-add it - which failed.

At this point, your only option is to re-create the raid superblocks via 
the --create command. Your RAID6 can survive 2 failed drives; it was 
already running with 2 bad drives when you zeroed the superblock on a 
third, so you now have a non-working RAID6.

What did you expect to happen when you zeroed the third drive?

I would imagine that the third drive dropped out due to a read error? And 
why did the other two drives drop out in the first place?

Anyhow, you now have two options. You can re-create the RAID6 with two 
missing drives, which means the read error (if that's what happened) is 
still there somewhere, and try to work around it - perhaps by dd_rescue'ing 
the failing drive to a non-defective one, losing the data in the bad 
blocks. Or you can decide that very little has changed since the first two 
drives dropped out, and use one of those instead of the drive with the 
read errors.
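
Mikael mentions dd_rescue; GNU ddrescue is a similar tool, used here purely
as an illustration. This sketch only prints the command - sdX/sdY are
placeholder device names that must be replaced before anything is run:

```shell
# Placeholders for the failing disk and its known-good replacement.
FAILING=/dev/sdX
REPLACEMENT=/dev/sdY
# -f allows writing to a block device; -n skips the slow scraping pass,
# leaving unreadable areas zero-filled in the copy. Remove the leading
# 'echo' only once the device names have been verified.
echo ddrescue -f -n "$FAILING" "$REPLACEMENT" rescue.log
```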

Please read up carefully on how mdadm versions differ in their default 
data offset, chunk size, etc. Re-creating with --create --assume-clean is 
a big, dangerous hammer that can wipe all your data: mess up the order of 
the drives, forget --assume-clean, or write to the array in this state, 
and you may destroy things.

If you value your data, be patient and make sure you understand all the 
ramifications of what you're doing. It would be a good idea to back up 
all the drives before continuing (if you can afford to).
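
For the backup itself, imaging each member disk is one approach. The paths
here are hypothetical, and the command is printed as a dry run since sdX
and the target location are placeholders:

```shell
DISK=/dev/sdX           # one array member at a time
IMAGE=/backup/sdX.img   # needs ~2 TB of free space per member disk
# conv=noerror,sync keeps dd going past read errors, padding bad blocks
# so offsets stay aligned. Drop the leading 'echo' to actually run it.
echo dd if="$DISK" of="$IMAGE" bs=1M conv=noerror,sync
```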

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Degraded Raid6
  2012-12-12 20:49 ` Degraded Raid6 Bernd Waage
  2012-12-13  8:54   ` Mikael Abrahamsson
@ 2012-12-13  9:52   ` Robin Hill
  2012-12-13 18:51   ` Roy Sigurd Karlsbakk
  2 siblings, 0 replies; 6+ messages in thread
From: Robin Hill @ 2012-12-13  9:52 UTC (permalink / raw)
  To: Bernd Waage; +Cc: linux-raid


On Wed Dec 12, 2012 at 09:49:07PM +0100, Bernd Waage wrote:

> Hello all,
> 
> I run an 8-disk raid 6 on which sporadically 2 drives dropped out,
> that I could just re-add when I zeroed the superblock beforehand.
> Recently, upon re-adding those 2 drives (after the zero-superblock) a
> third drive dropped out after 5-10 minutes of syncing. I then did a
> zero-superblock on the third drive and tried to re-add it - which
> failed.
> 
Firstly, drives sporadically dropping out of the array should _never_
just be ignored. You have a problem with your setup which needs fixing.
If the drives are actually okay (run SMART and full badblocks tests on
them) then it's probably a controller issue. I used to have a similar
issue on one of my servers and fixed it by moving the drives off the
onboard SATA controller and onto a proper SAS/SATA controller card.
Alternatively, it may be the cables, the power supply, or input power
fluctuations.

> I'm pretty much at my wits' end and stumbled upon this list. Perhaps
> someone of you guys can help me out. I'm running an ubuntu 12.04 box
> with kernel 3.3.8, so I should not be affected by the kernel-bug that
> popped up some time ago.
> 
> I append the output of mdadm --detail as well as mdadm --examine...
> 
They're all using the same data offset anyway, which is good. You do
need to check the mdadm version though, as versions 3.2.4 and above use a
different default data offset (as do versions prior to 3.0).
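
A quick way to check both points (sdf1 is just an example member; the
commands are echoed rather than executed, since neither mdadm nor that
device may be present on the machine reading this):

```shell
# Print the checks to run on the affected box: the mdadm version in use,
# and the data offset actually recorded on one member's superblock.
echo "mdadm --version"
echo "mdadm --examine /dev/sdf1 | grep 'Data Offset'"
```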

I'd also recommend checking the drives before proceeding - full SMART
tests and read-only badblocks tests on each drive should find any
issues (if there are any then you'll need to get replacements and clone
the old ones).
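
The per-drive checks might look like this (sdX is a placeholder, and the
commands are printed rather than run, since a long SMART self-test takes
hours and should be started deliberately):

```shell
# Long offline SMART self-test; results appear later in 'smartctl -a'.
echo "smartctl -t long /dev/sdX"
# Read-only surface scan: -s shows progress, -v reports errors verbosely.
echo "badblocks -sv /dev/sdX"
```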

You'll then need to recreate the array, using exactly the same
parameters as for the original array. From the looks of it, that should
be:
    mdadm -C /dev/md0 -l 6 -e 1.2 -n 8 -c 4096 /dev/sdf1 /dev/sdd1 \
        missing missing /dev/sdi1 /dev/sdc1 /dev/sdb1 missing

One of those "missing" values should be replaced with the drive that was
originally in that slot, but you've not provided that information. The
output from dmesg should show which drives failed when, and where they
were in the array. If your rebuild was using the drives in the same order
as before the first failure, then any of them will be okay to use, as they
should all hold the correct information (though you'd be better off
avoiding the one with the read error); otherwise you'll have to use the
last one that failed.
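
As a sanity check on those --create parameters: a RAID6 across n devices
stores data on n-2 of them, so the Array Size in the --examine output
should be exactly 6 times the Used Dev Size here. A small script to
confirm, with the sector counts copied from the superblocks quoted above:

```python
# Geometry check using the sector counts from 'mdadm --examine'.
raid_devices = 8
used_dev_size = 3907018752    # sectors per member device
array_size = 23442112512      # sectors across the whole array
# RAID6 dedicates two devices' worth of space to parity.
data_devices = raid_devices - 2
assert array_size == data_devices * used_dev_size
print("geometry consistent:", data_devices, "data devices")
```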

Of course, the easiest option would be to start from scratch, test all
the drives, create a new array, and restore the data from backup. I'm
guessing you don't have a backup though.

Good luck,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: Degraded Raid6
  2012-12-12 20:49 ` Degraded Raid6 Bernd Waage
  2012-12-13  8:54   ` Mikael Abrahamsson
  2012-12-13  9:52   ` Robin Hill
@ 2012-12-13 18:51   ` Roy Sigurd Karlsbakk
  2012-12-13 19:17     ` EJ Vincent
  2 siblings, 1 reply; 6+ messages in thread
From: Roy Sigurd Karlsbakk @ 2012-12-13 18:51 UTC (permalink / raw)
  To: Bernd Waage; +Cc: linux-raid

> I run an 8-disk raid 6 on which sporadically 2 drives dropped out,
> that I could just re-add when I zeroed the superblock beforehand.
> Recently, upon re-adding those 2 drives (after the zero-superblock) a
> third drive dropped out after 5-10 minutes of syncing. I then did a
> zero-superblock on the third drive and tried to re-add it - which
> failed.
...
> 10 8 113 - spare /dev/sdh1
> 11 8 1 - spare /dev/sda1
> 12 8 65 - spare /dev/sde1

If you had two drives failing, why didn't you let these spares take over the job?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
[In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.]
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Degraded Raid6
  2012-12-13 18:51   ` Roy Sigurd Karlsbakk
@ 2012-12-13 19:17     ` EJ Vincent
  2012-12-13 21:30       ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 6+ messages in thread
From: EJ Vincent @ 2012-12-13 19:17 UTC (permalink / raw)
  Cc: linux-raid

On 12/13/2012 1:51 PM, Roy Sigurd Karlsbakk wrote:
>> I run an 8-disk raid 6 on which sporadically 2 drives dropped out,
>> that I could just re-add when I zeroed the superblock beforehand.
>> Recently, upon re-adding those 2 drives (after the zero-superblock) a
>> third drive dropped out after 5-10 minutes of syncing. I then did a
>> zero-superblock on the third drive and tried to re-add it - which
>> failed.
> ...
>> 10 8 113 - spare /dev/sdh1
>> 11 8 1 - spare /dev/sda1
>> 12 8 65 - spare /dev/sde1
> If you had two drives failing, why didn't you let these spares take over the job?

Roy,

AFAIK, they were mistakenly labeled as spares *after* the failure 
occurred.  sdh, sda, and sde were original members of his array.

-EJ



* Re: Degraded Raid6
  2012-12-13 19:17     ` EJ Vincent
@ 2012-12-13 21:30       ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 6+ messages in thread
From: Roy Sigurd Karlsbakk @ 2012-12-13 21:30 UTC (permalink / raw)
  To: EJ Vincent; +Cc: linux-raid

> AFAIK, they were mistakenly labeled as spares, *after* the failure
> occurred. sdh, sda, and sde were original members of his array.

sda may well have been the boot device; even so, I would not have unplugged a third device from a broken RAID-6.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
[In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.]


end of thread, other threads:[~2012-12-13 21:30 UTC | newest]
