From: Clement Parisot <clement.parisot@inria.fr>
To: linux-raid@vger.kernel.org
Subject: Reconstruct a RAID 6 that has failed in a non typical manner
Date: Thu, 29 Oct 2015 16:59:41 +0100 (CET) [thread overview]
Message-ID: <1874721715.14008052.1446134381481.JavaMail.zimbra@inria.fr> (raw)
In-Reply-To: <404650428.13997384.1446132658661.JavaMail.zimbra@inria.fr>
Hi everyone,
we've got a problem with our old RAID 6.
root@ftalc2.nancy.grid5000.fr(physical):~# uname -a
Linux ftalc2.nancy.grid5000.fr 2.6.32-5-amd64 #1 SMP Mon Sep 23 22:14:43 UTC 2013 x86_64 GNU/Linux
root@ftalc2.nancy.grid5000.fr(physical):~# cat /etc/debian_version
6.0.8
root@ftalc2.nancy.grid5000.fr(physical):~# mdadm -V
mdadm - v3.1.4 - 31st August 2010
After an electrical maintenance, two of our HDDs went into a failed state. An alert was sent saying that everything was reconstructing.
g5kadmin@ftalc2.nancy.grid5000.fr(physical):~$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid6 sda[0] sdp[15] sdo[14] sdn[13] sdm[12] sdl[11] sdk[18] sdj[9] sdi[8] sdh[16] sdg[6] sdf[5] sde[4] sdd[17] sdc[2] sdb[1](F)
13666978304 blocks super 1.2 level 6, 128k chunk, algorithm 2 [16/15] [U_UUUUUUUUUUUUUU]
[>....................] resync = 0.0% (916936/976212736) finish=16851.9min speed=964K/sec
md1 : active raid1 sdq2[0] sdr2[2]
312276856 blocks super 1.2 [2/2] [UU]
[===>.................] resync = 18.4% (57566208/312276856) finish=83.2min speed=50956K/sec
md0 : active raid1 sdq1[0] sdr1[2]
291828 blocks super 1.2 [2/2] [UU]
unused devices: <none>
The md1 reconstruction worked, but md2 failed because a third HDD seems to be broken. A new disk was successfully added to replace one of the failed ones.
All of the disks of md2 then changed to the Spare state. We rebooted the server, but that made things worse.
The mdadm --detail command shows that 13 disks are left in the array and 3 are removed.
/dev/md2:
Version : 1.2
Creation Time : Tue Oct 2 16:28:23 2012
Raid Level : raid6
Used Dev Size : 976212736 (930.99 GiB 999.64 GB)
Raid Devices : 16
Total Devices : 13
Persistence : Superblock is persistent
Update Time : Wed Oct 28 13:46:13 2015
State : active, FAILED, Not Started
Active Devices : 13
Working Devices : 13
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : ftalc2.nancy.grid5000.fr:2 (local to host ftalc2.nancy.grid5000.fr)
UUID : 2d0b91e8:a0b10f4c:3fa285f9:3198a918
Events : 5834052
Number Major Minor RaidDevice State
0 0 0 0 removed
1 0 0 1 removed
2 8 16 2 active sync /dev/sdb
17 8 32 3 active sync /dev/sdc
4 8 48 4 active sync /dev/sdd
5 8 64 5 active sync /dev/sde
6 0 0 6 removed
16 8 96 7 active sync /dev/sdg
8 8 112 8 active sync /dev/sdh
9 8 128 9 active sync /dev/sdi
18 8 144 10 active sync /dev/sdj
11 8 160 11 active sync /dev/sdk
13 8 192 13 active sync /dev/sdm
14 8 208 14 active sync /dev/sdn
As you can see, the RAID is in the "active, FAILED, Not Started" state. We tried to add the new disk and to re-add the previously removed disks, as they appear to have no errors.
Two of the three removed disks should still contain the data, and we want to recover it.
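(For reference, this is roughly how we checked that the re-added members look healthy. It is only a sketch with a dry-run guard, so the commands below are printed rather than executed; set MDADM=mdadm as root to really read the superblocks.)

```shell
#!/bin/sh
# Dry run: commands are printed, not executed.
# Set MDADM=mdadm (as root) to actually read the superblocks.
MDADM="echo mdadm"

# Dump the md superblock of each disk we re-added; the Events counter
# and the "Array State" line show how far out of date each member is.
for d in /dev/sda /dev/sdf; do
    $MDADM --examine "$d"
done
```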
But there is a problem: the devices /dev/sda and /dev/sdf can't be re-added:
mdadm: failed to add /dev/sda to /dev/md/2: Device or resource busy
mdadm: failed to add /dev/sdf to /dev/md/2: Device or resource busy
mdadm: /dev/md/2 assembled from 13 drives and 1 spare - not enough to start the array.
I tried the procedure from the RAID_Recovery wiki:
mdadm --assemble --force /dev/md2 /dev/sda /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp
but it failed:
mdadm: failed to add /dev/sdg to /dev/md2: Device or resource busy
mdadm: failed to RUN_ARRAY /dev/md2: Input/output error
mdadm: Not enough devices to start the array.
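We have not tried it yet, but our working theory for the "busy" errors is that the partially assembled array has to be stopped before its members can be touched again. A sketch of the sequence we are considering, again with a dry-run guard (set MDADM=mdadm as root to run it for real; whether --stop is the right first step here is our assumption):

```shell
#!/bin/sh
# Dry run: commands are printed, not executed.
# Set MDADM=mdadm (as root) to run them for real.
MDADM="echo mdadm"

# 1. Stop the half-assembled array so the kernel releases its members;
#    we assume this is what causes the "Device or resource busy" errors.
$MDADM --stop /dev/md2

# 2. Retry the forced assembly with the same 15 devices as before.
$MDADM --assemble --force /dev/md2 \
    /dev/sda /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh \
    /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp
```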
Any help or tips on how to better diagnose or solve this situation would be highly appreciated :-)
Thanks in advance,
Best regards,
Clément and Marc
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Thread overview: 8+ messages
[not found] <404650428.13997384.1446132658661.JavaMail.zimbra@inria.fr>
2015-10-29 15:59 ` Clement Parisot [this message]
2015-10-30 18:31 ` Reconstruct a RAID 6 that has failed in a non typical manner Phil Turmel
2015-11-05 10:35 ` Clement Parisot
2015-11-05 13:34 ` Phil Turmel
2015-11-17 12:30 ` Marc Pinhede
2015-11-17 13:25 ` Phil Turmel
2015-12-21 3:40 ` NeilBrown
2015-12-21 12:20 ` Phil Turmel