* Degraded Raid6
       [not found] <S1754985Ab2LLT4x/20121212195656Z+425@vger.kernel.org>
@ 2012-12-12 20:49 ` Bernd Waage
  2012-12-13  8:54   ` Mikael Abrahamsson
                      ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Bernd Waage @ 2012-12-12 20:49 UTC (permalink / raw)
  To: linux-raid

Hello all,

I run an 8-disk RAID 6 from which two drives sporadically dropped out;
I could re-add them as long as I zeroed the superblock beforehand.
Recently, while re-adding those two drives (after the zero-superblock),
a third drive dropped out after 5-10 minutes of syncing. I then did a
zero-superblock on the third drive and tried to re-add it - which
failed.

I'm pretty much at my wits' end and stumbled upon this list. Perhaps
one of you can help me out. I'm running an Ubuntu 12.04 box with kernel
3.3.8, so I should not be affected by the kernel bug that popped up
some time ago.

I append the output of mdadm --detail as well as mdadm --examine...

berndman@berndman:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
  Used Dev Size : 1953509376 (1863.01 GiB 2000.39 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent

    Update Time : Fri Nov 30 07:24:57 2012
          State : active, FAILED, Not Started
 Active Devices : 5
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 3

         Layout : left-symmetric
     Chunk Size : 4096K

           Name : berndman-System:0
           UUID : ef105344:6158fb94:c37cc33b:00ac2d04
         Events : 25340

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       9       8       49        1      active sync   /dev/sdd1
       2       0        0        2      removed
       3       0        0        3      removed
       4       8      129        4      active sync   /dev/sdi1
       8       8       33        5      active sync   /dev/sdc1
       6       8       17        6      active sync   /dev/sdb1
       7       0        0        7      removed

      10       8      113        -      spare   /dev/sdh1
      11       8        1        -      spare   /dev/sda1
      12       8       65        -      spare   /dev/sde1

berndman@berndman:~$ sudo mdadm --examine /dev/sdf1 /dev/sdd1 /dev/sdi1 /dev/sdc1 /dev/sdb1 /dev/sdh1 /dev/sda1 /dev/sde1
[sudo] password for berndman:
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 4792cd00:92d53de8:4d2d7438:dc86e0fd

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : dc7a3261 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

    Device Role : Active device 0
    Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 6670d0fb:447d937f:be88f731:f6161042

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : ac85a3c6 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

    Device Role : Active device 1
    Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 67054bcd:7e9c5380:452a8756:14dd629e

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : ffa2bfa1 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

    Device Role : Active device 4
    Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 8722a93e:a2397924:1d28b78b:e786004e

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : f9f42193 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

    Device Role : Active device 5
    Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b1426d37:1ebaa27c:f1eea681:1a55b574

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : 678657ff - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

    Device Role : Active device 6
    Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b80ab95e:7271c8a4:87980189:25169231

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : 7b2ede44 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 4096K

    Device Role : spare
    Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : a002afcf:af02cbfc:fa09bff8:252116c7

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : 496946da - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

    Device Role : spare
    Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
           Name : berndman-System:0
  Creation Time : Thu Aug  4 21:32:45 2011
     Raid Level : raid6
   Raid Devices : 8

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : f3ff2451:d3c11122:c1e8a54a:0481d299

    Update Time : Fri Nov 30 07:24:57 2012
       Checksum : 14c941f6 - correct
         Events : 25340

         Layout : left-symmetric
     Chunk Size : 4096K

    Device Role : spare
    Array State : AA..AAA. ('A' == active, '.' == missing)

regards,
Bernd

^ permalink raw reply	[flat|nested] 6+ messages in thread
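
For readers following along, the drop-out/re-add cycle Bernd describes maps
roughly onto the following mdadm invocations; /dev/sdX1 is a placeholder and
this is a reconstruction of the general procedure, not his exact commands:

    # remove the dropped member (if md still lists it), wipe its metadata, add it back
    mdadm /dev/md0 --remove /dev/sdX1
    mdadm --zero-superblock /dev/sdX1
    mdadm /dev/md0 --add /dev/sdX1
    # watch the resync progress
    cat /proc/mdstat

Zeroing the superblock turns the disk into a brand-new spare and forces a full
resync; a plain "mdadm /dev/md0 --re-add /dev/sdX1" (without zeroing) lets md
try to reuse the existing metadata and is normally the safer first attempt.
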
* Re: Degraded Raid6
  2012-12-12 20:49 ` Degraded Raid6 Bernd Waage
@ 2012-12-13  8:54   ` Mikael Abrahamsson
  2012-12-13  9:52   ` Robin Hill
  2012-12-13 18:51   ` Roy Sigurd Karlsbakk
  2 siblings, 0 replies; 6+ messages in thread
From: Mikael Abrahamsson @ 2012-12-13  8:54 UTC (permalink / raw)
  To: Bernd Waage; +Cc: linux-raid

On Wed, 12 Dec 2012, Bernd Waage wrote:

> I run an 8-disk RAID 6 from which two drives sporadically dropped out;
> I could re-add them as long as I zeroed the superblock beforehand.
> Recently, while re-adding those two drives (after the zero-superblock),
> a third drive dropped out after 5-10 minutes of syncing. I then did a
> zero-superblock on the third drive and tried to re-add it - which
> failed.

At this point, your only option is to re-create the raid superblocks via
the --create command. Since you took your RAID6 (which can handle two
drives going bad) down to two bad drives and then zeroed the superblock
on a third drive, you now have a non-working RAID6. What did you expect
to happen when you zeroed the third drive?

I would imagine that the third drive dropped out due to a read error?
Why did the other two drives drop out?

Anyhow, you now have two options. You can re-create the raid6 with two
missing drives, meaning you still have the read error somewhere (if
that's what happened), and try to work around that (perhaps dd_rescue
the drive to a non-defective drive, losing the information on the bad
blocks); or you can decide that very little has changed since the first
two drives dropped out, and use one of those instead of the drive with
the read errors.

Please read up carefully on how mdadm versions differ in their defaults
for data offset, chunk size etc. Re-creating with --create
--assume-clean is a big, dangerous hammer that might wipe all your data.
Mess up the order of the drives, forget --assume-clean, or write to the
array in this state, and you might destroy things.

If you value your data, be patient and make sure you understand all the
ramifications of what you're doing. It might be a good idea to back up
all drives before continuing (if you can afford to do so).

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 6+ messages in thread
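
As a rough sketch of the precautions Mikael describes - capturing the existing
metadata and cloning the suspect drive before any --create attempt - something
like the following could be used (device names /dev/sd_bad and /dev/sd_new are
placeholders; dd_rescue and GNU ddrescue take slightly different options, so
check the manual of whichever tool is actually installed):

    # save the current superblock information from every array member
    # (the character class matches the members listed above: sda-sdf, sdh, sdi)
    for d in /dev/sd[abcdefhi]1; do
        mdadm --examine "$d" > "examine-${d##*/}.txt"
    done

    # clone the drive suspected of read errors onto a known-good replacement
    dd_rescue /dev/sd_bad /dev/sd_new
    # GNU ddrescue equivalent, with a map file to allow resuming:
    # ddrescue -f /dev/sd_bad /dev/sd_new rescue.map

Only once such copies and --examine dumps exist is it relatively safe to start
experimenting with --create --assume-clean.
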
* Re: Degraded Raid6
  2012-12-12 20:49 ` Degraded Raid6 Bernd Waage
  2012-12-13  8:54   ` Mikael Abrahamsson
@ 2012-12-13  9:52   ` Robin Hill
  2012-12-13 18:51   ` Roy Sigurd Karlsbakk
  2 siblings, 0 replies; 6+ messages in thread
From: Robin Hill @ 2012-12-13  9:52 UTC (permalink / raw)
  To: Bernd Waage; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2958 bytes --]

On Wed Dec 12, 2012 at 09:49:07PM +0100, Bernd Waage wrote:

> Hello all,
>
> I run an 8-disk RAID 6 from which two drives sporadically dropped out;
> I could re-add them as long as I zeroed the superblock beforehand.
> Recently, while re-adding those two drives (after the zero-superblock),
> a third drive dropped out after 5-10 minutes of syncing. I then did a
> zero-superblock on the third drive and tried to re-add it - which
> failed.
>
Firstly, drives sporadically dropping out of the array should _never_
just be ignored. You have a problem with your setup which needs fixing.
If the drives are actually okay (run SMART and full badblocks tests on
them) then it's probably a controller issue. I used to have a similar
issue on one of my servers and fixed it by moving the drives off the
onboard SATA controller and onto a proper SAS/SATA controller card.
Alternatively, it may be the cables, the power supply, or input power
fluctuations.

> I'm pretty much at my wits' end and stumbled upon this list. Perhaps
> one of you can help me out. I'm running an Ubuntu 12.04 box with kernel
> 3.3.8, so I should not be affected by the kernel bug that popped up
> some time ago.
>
> I append the output of mdadm --detail as well as mdadm --examine...
>
They're all using the same data offset anyway, which is good. You do
need to check the mdadm version though, as versions 3.2.4 and above use
a different data offset (as do versions prior to 3.0). I'd also
recommend checking the drives before proceeding - full SMART tests and
read-only badblocks tests on each drive should find any issues (if
there are any, you'll need to get replacements and clone the old ones).

You'll then need to recreate the array, using exactly the same
parameters as for the original array. From the looks of it, that should
be:

    mdadm -C /dev/md0 -l 6 -e 1.2 -n 8 -c 4096 /dev/sdf1 /dev/sdd1 \
        missing missing /dev/sdi1 /dev/sdc1 /dev/sdb1 missing

One of those "missing" values should be replaced with the drive that was
originally in that slot, but you've not provided that information. The
output from dmesg should show which drives failed when, and where they
were in the array. If your rebuild was using the drives in the same
order as they were before the first failure, then any drive will be okay
to use, as they should all have the correct information (though you'd be
better off avoiding the one with the read error); otherwise you'll have
to use the last one that failed.

Of course, the easiest option would be to start from scratch, test all
the drives, create a new array, and restore the data from backup. I'm
guessing you don't have a backup though.

Good luck,
    Robin
-- 
     ___
    ( ' }     |   Robin Hill        <robin@robinhill.me.uk>   |
   / / )      |   Little Jim says ....                        |
  // !!       |   "He fallen in de water !!"                  |

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread
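
The checks Robin suggests, plus a read-only sanity pass after any re-create,
might look roughly like this (the /dev/sdX drive name and the ext4 filesystem
type are assumptions - adapt them to the actual setup):

    # per-drive health checks: long SMART self-test, then a read-only surface scan
    smartctl -t long /dev/sdX
    smartctl -a /dev/sdX        # review once the self-test has finished
    badblocks -sv /dev/sdX      # read-only by default; -n or -w would be destructive

    # after re-creating the array, verify before writing anything to it
    mdadm -D /dev/md0
    fsck.ext4 -n /dev/md0       # -n = report only, change nothing
    mount -o ro /dev/md0 /mnt

If the read-only checks look badly wrong (fsck reporting massive corruption,
garbage in the mounted filesystem), the drive order or one of the creation
parameters was probably wrong, and the array should be stopped and re-created
with corrected options rather than "repaired" in place.
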
* Re: Degraded Raid6
  2012-12-12 20:49 ` Degraded Raid6 Bernd Waage
  2012-12-13  8:54   ` Mikael Abrahamsson
  2012-12-13  9:52   ` Robin Hill
@ 2012-12-13 18:51   ` Roy Sigurd Karlsbakk
  2012-12-13 19:17     ` EJ Vincent
  2 siblings, 1 reply; 6+ messages in thread
From: Roy Sigurd Karlsbakk @ 2012-12-13 18:51 UTC (permalink / raw)
  To: Bernd Waage; +Cc: linux-raid

> I run an 8-disk RAID 6 from which two drives sporadically dropped out;
> I could re-add them as long as I zeroed the superblock beforehand.
> Recently, while re-adding those two drives (after the zero-superblock),
> a third drive dropped out after 5-10 minutes of syncing. I then did a
> zero-superblock on the third drive and tried to re-add it - which
> failed.
...
>       10       8      113        -      spare   /dev/sdh1
>       11       8        1        -      spare   /dev/sda1
>       12       8       65        -      spare   /dev/sde1

If you had two drives failing, why didn't you let these spares take over
the job?

Vennlige hilsener / Best regards

roy
-- 
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for every pedagogue to avoid
excessive use of idioms of xenotypic etymology. In most cases, adequate
and relevant synonyms exist in Norwegian.

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Degraded Raid6
  2012-12-13 18:51   ` Roy Sigurd Karlsbakk
@ 2012-12-13 19:17     ` EJ Vincent
  2012-12-13 21:30       ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 6+ messages in thread
From: EJ Vincent @ 2012-12-13 19:17 UTC (permalink / raw)
  Cc: linux-raid

On 12/13/2012 1:51 PM, Roy Sigurd Karlsbakk wrote:
>> I run an 8-disk RAID 6 from which two drives sporadically dropped out;
>> I could re-add them as long as I zeroed the superblock beforehand.
>> Recently, while re-adding those two drives (after the zero-superblock),
>> a third drive dropped out after 5-10 minutes of syncing. I then did a
>> zero-superblock on the third drive and tried to re-add it - which
>> failed.
> ...
>>       10       8      113        -      spare   /dev/sdh1
>>       11       8        1        -      spare   /dev/sda1
>>       12       8       65        -      spare   /dev/sde1
> If you had two drives failing, why didn't you let these spares take over
> the job?

Roy,

AFAIK, they were mistakenly labeled as spares *after* the failure
occurred. sdh, sda, and sde were original members of his array.

-EJ

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Degraded Raid6
  2012-12-13 19:17     ` EJ Vincent
@ 2012-12-13 21:30       ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 6+ messages in thread
From: Roy Sigurd Karlsbakk @ 2012-12-13 21:30 UTC (permalink / raw)
  To: EJ Vincent; +Cc: linux-raid

> AFAIK, they were mistakenly labeled as spares *after* the failure
> occurred. sdh, sda, and sde were original members of his array.

sda was possibly the boot device, but I still would not have unplugged a
third device from a broken RAID-6.

Vennlige hilsener / Best regards

roy
-- 
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for every pedagogue to avoid
excessive use of idioms of xenotypic etymology. In most cases, adequate
and relevant synonyms exist in Norwegian.

^ permalink raw reply	[flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-12-13 21:30 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <S1754985Ab2LLT4x/20121212195656Z+425@vger.kernel.org>
2012-12-12 20:49 ` Degraded Raid6 Bernd Waage
2012-12-13 8:54 ` Mikael Abrahamsson
2012-12-13 9:52 ` Robin Hill
2012-12-13 18:51 ` Roy Sigurd Karlsbakk
2012-12-13 19:17 ` EJ Vincent
2012-12-13 21:30 ` Roy Sigurd Karlsbakk