* Degraded Raid6
From: Bernd Waage @ 2012-12-12 20:49 UTC
To: linux-raid
Hello all,
I run an 8-disk RAID 6 from which two drives sporadically dropped out; I could always re-add them once I had zeroed their superblocks first. Recently, while re-adding those two drives (after the zero-superblock), a third drive dropped out after 5-10 minutes of syncing. I then did a zero-superblock on the third drive and tried to re-add it, which failed.
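For reference, the sequence I used for each dropped drive was roughly the following (from memory; /dev/sdX1 stands in for whichever member had dropped out):

  sudo mdadm /dev/md0 --remove /dev/sdX1     # take the failed member out
  sudo mdadm --zero-superblock /dev/sdX1     # wipe its md metadata
  sudo mdadm /dev/md0 --add /dev/sdX1        # add it back; a resync starts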
I'm pretty much at my wits' end and stumbled upon this list; perhaps one of you can help me out. I'm running an Ubuntu 12.04 box with kernel 3.3.8, so I should not be affected by the kernel bug that popped up some time ago.
I append the output of mdadm --detail as well as mdadm --examine...
berndman@berndman:~$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Used Dev Size : 1953509376 (1863.01 GiB 2000.39 GB)
Raid Devices : 8
Total Devices : 8
Persistence : Superblock is persistent
Update Time : Fri Nov 30 07:24:57 2012
State : active, FAILED, Not Started
Active Devices : 5
Working Devices : 8
Failed Devices : 0
Spare Devices : 3
Layout : left-symmetric
Chunk Size : 4096K
Name : berndman-System:0
UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Events : 25340
    Number   Major   Minor   RaidDevice   State
       0       8       81        0        active sync   /dev/sdf1
       9       8       49        1        active sync   /dev/sdd1
       2       0        0        2        removed
       3       0        0        3        removed
       4       8      129        4        active sync   /dev/sdi1
       8       8       33        5        active sync   /dev/sdc1
       6       8       17        6        active sync   /dev/sdb1
       7       0        0        7        removed
      10       8      113        -        spare         /dev/sdh1
      11       8        1        -        spare         /dev/sda1
      12       8       65        -        spare         /dev/sde1
berndman@berndman:~$ sudo mdadm --examine /dev/sdf1 /dev/sdd1 /dev/sdi1 /dev/sdc1 /dev/sdb1 /dev/sdh1 /dev/sda1 /dev/sde1
[sudo] password for berndman:
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Name : berndman-System:0
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 4792cd00:92d53de8:4d2d7438:dc86e0fd
Update Time : Fri Nov 30 07:24:57 2012
Checksum : dc7a3261 - correct
Events : 25340
Layout : left-symmetric
Chunk Size : 4096K
Device Role : Active device 0
Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Name : berndman-System:0
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 6670d0fb:447d937f:be88f731:f6161042
Update Time : Fri Nov 30 07:24:57 2012
Checksum : ac85a3c6 - correct
Events : 25340
Layout : left-symmetric
Chunk Size : 4096K
Device Role : Active device 1
Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Name : berndman-System:0
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 67054bcd:7e9c5380:452a8756:14dd629e
Update Time : Fri Nov 30 07:24:57 2012
Checksum : ffa2bfa1 - correct
Events : 25340
Layout : left-symmetric
Chunk Size : 4096K
Device Role : Active device 4
Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Name : berndman-System:0
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 8722a93e:a2397924:1d28b78b:e786004e
Update Time : Fri Nov 30 07:24:57 2012
Checksum : f9f42193 - correct
Events : 25340
Layout : left-symmetric
Chunk Size : 4096K
Device Role : Active device 5
Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Name : berndman-System:0
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : b1426d37:1ebaa27c:f1eea681:1a55b574
Update Time : Fri Nov 30 07:24:57 2012
Checksum : 678657ff - correct
Events : 25340
Layout : left-symmetric
Chunk Size : 4096K
Device Role : Active device 6
Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Name : berndman-System:0
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : b80ab95e:7271c8a4:87980189:25169231
Update Time : Fri Nov 30 07:24:57 2012
Checksum : 7b2ede44 - correct
Events : 0
Layout : left-symmetric
Chunk Size : 4096K
Device Role : spare
Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Name : berndman-System:0
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : a002afcf:af02cbfc:fa09bff8:252116c7
Update Time : Fri Nov 30 07:24:57 2012
Checksum : 496946da - correct
Events : 25340
Layout : left-symmetric
Chunk Size : 4096K
Device Role : spare
Array State : AA..AAA. ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef105344:6158fb94:c37cc33b:00ac2d04
Name : berndman-System:0
Creation Time : Thu Aug 4 21:32:45 2011
Raid Level : raid6
Raid Devices : 8
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 23442112512 (11178.07 GiB 12002.36 GB)
Used Dev Size : 3907018752 (1863.01 GiB 2000.39 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : f3ff2451:d3c11122:c1e8a54a:0481d299
Update Time : Fri Nov 30 07:24:57 2012
Checksum : 14c941f6 - correct
Events : 25340
Layout : left-symmetric
Chunk Size : 4096K
Device Role : spare
Array State : AA..AAA. ('A' == active, '.' == missing)
regards,
Bernd
* Re: Degraded Raid6
From: Mikael Abrahamsson @ 2012-12-13 8:54 UTC
To: Bernd Waage; +Cc: linux-raid
On Wed, 12 Dec 2012, Bernd Waage wrote:
> I run an 8-disk RAID 6 from which two drives sporadically dropped out;
> I could always re-add them once I had zeroed their superblocks first.
> Recently, while re-adding those two drives (after the zero-superblock),
> a third drive dropped out after 5-10 minutes of syncing. I then did a
> zero-superblock on the third drive and tried to re-add it, which failed.
At this point, your only option is to re-create the RAID superblocks via
the --create command. Your RAID6 (which can tolerate two failed drives)
was already running with two bad drives, and you then zeroed the
superblock on a third, so you now have a non-working RAID6.
What did you expect to happen when you zeroed the third drive?
I would imagine the third drive dropped out due to a read error? And why
did the other two drives drop out in the first place?
Anyhow, you now have two options. You can re-create the RAID6 with two
missing drives, which means you still have the read error somewhere (if
that's what happened) and will have to work around it (perhaps dd_rescue
the drive onto a non-defective one, losing the information in the bad
blocks), or you can decide that very little has changed since the first
two drives dropped out, and use one of those instead of the drive with
the read errors.
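If you go the rescue-copy route, GNU ddrescue (a close cousin of dd_rescue) is usually driven along these lines; this is only a sketch, and /dev/sdOLD, /dev/sdNEW and the map file path are placeholders:

  ddrescue -f -n  /dev/sdOLD /dev/sdNEW /root/sdOLD.map   # fast pass, skip the bad areas
  ddrescue -f -r3 /dev/sdOLD /dev/sdNEW /root/sdOLD.map   # go back and retry the bad areas

The -f is needed because the target is a block device, and the map file lets you stop and resume without losing track of which sectors were already copied.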
Please read up carefully on how mdadm versions differ in their defaults
for data offset, chunk size, etc. Recreating with --create
--assume-clean is a big, dangerous hammer that can wipe all your data:
mess up the order of the drives, forget --assume-clean, or write to the
array in this state, and you may destroy things.
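As a concrete sanity check before any --create, it can help to record what is on the disks now and what your mdadm would do by default (all read-only; the device list is the one from your --examine above):

  mdadm --version
  mdadm --examine /dev/sdf1 /dev/sdd1 /dev/sdi1 /dev/sdc1 /dev/sdb1 \
                  /dev/sdh1 /dev/sda1 /dev/sde1 \
      | grep -E '^/dev/|Data Offset|Chunk Size|Device Role|Events'
  cat /proc/mdstat

If the data offset your mdadm would pick by default differs from the 2048 sectors shown in your existing superblocks, a recreated array will not line up with your data.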
If you value your data, be patient and make sure you understand all the
ramifications of what you're doing. It would be a good idea to back up
all the drives before continuing (if you can afford to do so).
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Degraded Raid6
From: Robin Hill @ 2012-12-13 9:52 UTC
To: Bernd Waage; +Cc: linux-raid
On Wed Dec 12, 2012 at 09:49:07PM +0100, Bernd Waage wrote:
> Hello all,
>
> I run an 8-disk RAID 6 from which two drives sporadically dropped out;
> I could always re-add them once I had zeroed their superblocks first.
> Recently, while re-adding those two drives (after the zero-superblock),
> a third drive dropped out after 5-10 minutes of syncing. I then did a
> zero-superblock on the third drive and tried to re-add it, which failed.
>
Firstly, drives sporadically dropping out of the array should _never_
just be ignored. You have a problem with your setup which needs fixing.
If the drives are actually okay (run SMART and full badblocks tests on
them), then it's probably a controller issue. I used to have a similar
issue on one of my servers and fixed it by moving the drives off the
onboard SATA controller and onto a proper SAS/SATA controller card.
Alternatively, it may be the cables, the power supply, or input power
fluctuations.
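A quick, rough way to tell whether the link or controller is resetting underneath md is to look at the kernel log around the time a drive drops out (the exact messages vary by driver, so treat this pattern as a starting point):

  dmesg | grep -iE 'ata[0-9]+|link reset|hard resetting|failed command|i/o error'

Repeated link resets or "failed command" lines against the same ata port tend to point at cabling or the controller rather than the disk itself.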
> I'm pretty much at my wits' end and stumbled upon this list; perhaps
> one of you can help me out. I'm running an Ubuntu 12.04 box with
> kernel 3.3.8, so I should not be affected by the kernel bug that
> popped up some time ago.
>
> I append the output of mdadm --detail as well as mdadm --examine...
>
They're all using the same data offset anyway, which is good. You do
need to check your mdadm version though, as versions 3.2.4 and above use
a different default data offset (as do versions prior to 3.0).
I'd also recommend checking the drives before proceeding - full SMART
tests and read-only badblocks tests on each drive should find any
issues (if there are any, you'll need to get replacement drives and
clone the old ones onto them).
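For example (all read-only; badblocks without -w or -n stays in its non-destructive read mode, and /dev/sdX is a placeholder for each member disk):

  smartctl -t long /dev/sdX    # kick off a long self-test
  smartctl -a /dev/sdX         # afterwards: self-test log, reallocated/pending sectors
  badblocks -sv /dev/sdX       # read-only surface scan of the whole disk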
You'll then need to recreate the array, using exactly the same
parameters as for the original array. From the looks of it, that should
be:
mdadm -C /dev/md0 -l 6 -e 1.2 -n 8 -c 4096 /dev/sdf1 /dev/sdd1 \
    missing missing /dev/sdi1 /dev/sdc1 /dev/sdb1 missing
One of those "missing" values should be replaced with the drive that
originally was in that slot, but you've not provided that information.
The output from dmesg should show which drives failed when, and where
they were in the array. If your rebuild was using the drives in the same
order as they were before the first failure then any drive will be okay
to use as they should all have the correct information (though you'd be
better avoiding the one with the read error), otherwise you'll have to
use the last one that failed.
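Once it assembles, I'd check everything read-only before letting anything write to it. Assuming an ext3/ext4 filesystem sits directly on the array (adjust if you use LVM or something else), something along these lines:

  mdadm --detail /dev/md0       # confirm level, chunk size and device order
  fsck.ext4 -n /dev/md0         # read-only check, makes no changes
  mount -o ro /dev/md0 /mnt     # mount read-only and spot-check some files

If the filesystem looks wrong, stop the array and reconsider the drive order rather than letting fsck repair anything.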
Of course, the easiest option would be to start from scratch, test all
the drives, create a new array, and restore the data from backup. I'm
guessing you don't have a backup though.
Good luck,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: Degraded Raid6
From: Roy Sigurd Karlsbakk @ 2012-12-13 18:51 UTC
To: Bernd Waage; +Cc: linux-raid
> I run an 8-disk RAID 6 from which two drives sporadically dropped out;
> I could always re-add them once I had zeroed their superblocks first.
> Recently, while re-adding those two drives (after the zero-superblock),
> a third drive dropped out after 5-10 minutes of syncing. I then did a
> zero-superblock on the third drive and tried to re-add it, which failed.
...
> 10 8 113 - spare /dev/sdh1
> 11 8 1 - spare /dev/sda1
> 12 8 65 - spare /dev/sde1
If you had two drives failing, why didn't you let these spares take over the job?
Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
* Re: Degraded Raid6
From: EJ Vincent @ 2012-12-13 19:17 UTC
Cc: linux-raid
On 12/13/2012 1:51 PM, Roy Sigurd Karlsbakk wrote:
>> I run an 8-disk RAID 6 from which two drives sporadically dropped out;
>> I could always re-add them once I had zeroed their superblocks first.
>> Recently, while re-adding those two drives (after the zero-superblock),
>> a third drive dropped out after 5-10 minutes of syncing. I then did a
>> zero-superblock on the third drive and tried to re-add it, which failed.
> ...
>> 10 8 113 - spare /dev/sdh1
>> 11 8 1 - spare /dev/sda1
>> 12 8 65 - spare /dev/sde1
> If you had two drives failing, why didn't you let these spares take over the job?
Roy,
AFAIK, they were mistakenly labeled as spares *after* the failure
occurred; sdh, sda, and sde were original members of his array.
-EJ
* Re: Degraded Raid6
From: Roy Sigurd Karlsbakk @ 2012-12-13 21:30 UTC
To: EJ Vincent; +Cc: linux-raid
> AFAIK, they were mistakenly labeled as spares *after* the failure
> occurred; sdh, sda, and sde were original members of his array.
sda was possibly the boot device, but I still would not have unplugged a third device from a broken RAID-6.
Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.