From: NeilBrown
Subject: Re: RAID6 fails to assemble after unclean shutdown
Date: Wed, 25 Apr 2012 21:01:45 +1000
In-Reply-To: <20120425103536.GA9978@nsrc.org>
To: Brian Candler
Cc: linux-raid@vger.kernel.org

On Wed, 25 Apr 2012 11:35:36 +0100 Brian Candler wrote:

> I have a storage box (currently under test) which has two 12-drive RAID6
> arrays, /dev/md/data1 and /dev/md/data2.
>
> The box crashed for an unrelated reason, and when I brought it back up, only
> one of the arrays assembled:
>
>   root@storage1:~# cat /proc/mdstat
>   Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>   md126 : active raid6 sdj[8] sdk[9] sdd[2] sde[3] sdi[7] sdm[11] sdg[5] sdc[1] sdb[0] sdl[10] sdh[6] sdf[4]
>         29302650880 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
>
>   md127 : inactive sdq[3](S) sdx[10](S) sdu[6](S) sdt[5](S) sds[4](S) sdv[8](S) sdp[2](S) sdy[11](S) sdo[1](S) sdn[0](S) sdw[9](S) sdr[7](S)
>         35163186720 blocks super 1.2
>
>   unused devices: <none>
>
> So it looks like 12 of the disks have all become spares (S)!

The '(S)' is a bit misleading there.  When the array is 'inactive', everything
claims to be spare.
Once the array is actually started it would all become more sensible.

>
> An attempt to manually assemble the array failed:
>
>   root@storage1:~# mdadm --stop /dev/md127
>   mdadm: stopped /dev/md127
>   root@storage1:~# mdadm --assemble /dev/md/disk2 /dev/sd{n..y}
>   mdadm: /dev/md/disk2 assembled from 4 drives - not enough to start the array.

Adding "--verbose" here would help a lot.
Possibly adding "--force" would make it all work.

>
> Since this is currently a test system I just forcibly recreated the
> array, but I'm a bit worried about how I would handle this problem when I go
> into production.
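
For what it is worth, the first thing to try in that situation is a forced
assembly rather than a re-create.  A minimal sketch, assuming the same
/dev/sd{n..y} member names as above:

  # stop the half-assembled, inactive array first
  mdadm --stop /dev/md127
  # ask mdadm to report what it sees, and to accept slightly stale members
  mdadm --assemble --verbose --force /dev/md/disk2 /dev/sd{n..y}

"--force" lets mdadm start the array even when the event counts on some
members are a little out of date, which is the usual result of an unclean
shutdown, and unlike "--create" it does not overwrite the existing metadata.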
>
> Here is how I recreated the array:
>
>   root@storage1:~# mdadm --create /dev/md/disk2 -n 12 -c 1024 -l raid6 /dev/sd{n..y}
>   mdadm: /dev/sdn appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdo appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdp appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdq appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdr appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sds appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdt appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdu appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdv appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdw appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdx appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   mdadm: /dev/sdy appears to be part of a raid array:
>       level=raid6 devices=12 ctime=Mon Mar 19 11:52:55 2012
>   Continue creating array? y
>   mdadm: Defaulting to version 1.2 metadata
>   mdadm: array /dev/md/disk2 started.
>
> So it seems like all the disks were known to be part of an array, but mdadm
> was still unable to assemble more than 4.

I would need to see the "--examine" output of each disk (from before you
re-created the array) to be able to explain that.

>
> Platform: Ubuntu 11.10 server x86_64, stock kernel:
>
>   Linux storage1 3.0.0-16-server #29-Ubuntu SMP Tue Feb 14 13:08:12 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Unfortunately I saw the same problem once before on a different test system,
> and also had to forcibly rebuild the array.
>
> So my questions are:
>
> * Have I built the RAID array correctly in the first place? Are there some
>   options I could have given to mdadm to make it more robust?

Yes, you have built the array correctly.

>
> * What should I have done when presented with an array which would not
>   assemble, to attempt to recover without losing data?

--verbose and maybe --force.

>
> * Any ideas why mdadm only thought 4 of the drives were usable?

Presumably something went wrong during shutdown.  However, without more
details (--examine) I cannot guess.
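
For next time: before re-creating anything, capture the existing metadata so
the failure can actually be diagnosed.  A minimal sketch, assuming the same
/dev/sd{n..y} members (the output file names are just examples):

  # record the superblock of every member device
  mdadm --examine /dev/sd{n..y} > md127-examine.txt
  # and keep the kernel log from around the failed assembly attempt
  dmesg > md127-dmesg.txt

The "Events" and "Array State" lines in the --examine output normally show
which devices had fallen out of sync, and so why only a few were considered
usable.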

NeilBrown