From: NeilBrown
Subject: Re: degraded raid 6 (1 bad drive) showing up inactive, only spares
Date: Sun, 10 Jun 2012 08:09:13 +1000
Message-ID: <20120610080913.445d3cea@notabene.brown>
References: <20120607222933.14ec3cd5@notabene.brown> <20120608071412.5408516f@notabene.brown>
To: Martin Ziler
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On Sat, 9 Jun 2012 20:14:12 +0200 Martin Ziler wrote:

>
> On 07.06.2012 at 23:14, NeilBrown wrote:
>
> > On Thu, 7 Jun 2012 18:49:49 +0200 Martin Ziler wrote:
> >
> >> 2012/6/7 NeilBrown
> >>
> >>> On Thu, 7 Jun 2012 13:55:32 +0200 Martin Ziler
> >>> <martin.ziler@googlemail.com> wrote:
> >>>
> >>>> Hello everybody,
> >>>>
> >>>> I am running a 9-disk raid6 without hot spares. I already had one drive
> >>>> go bad, which I could replace and continue using the array without any
> >>>> degraded raid messages. Recently I had another drive going bad according
> >>>> to the smart info. As it wasn't quite dead, I left the array as it was
> >>>> without really using it all that much, waiting for a replacement drive I
> >>>> had ordered. When I booted the machine up in order to replace the drive
> >>>> I was greeted by an inactive array with all devices showing up as spares.
> >>>>
> >>>> md0 : inactive sdh2[0](S) sdi2[7](S) sde2[6](S) sdd2[5](S) sdf2[1](S)
> >>>>       sdg2[2](S) sdc1[9](S) sdb2[3](S)
> >>>>       15579088439 blocks super 1.2
> >>>>
> >>>> mdadm --examine confirms that. I already searched the web quite a bit
> >>>> and found this mailing list. Maybe someone here can give me some input.
> >>>> Normally a degraded raid should still be active, so I am quite surprised
> >>>> that my array goes inactive with only one drive missing. I appended the
> >>>> info mdadm --examine puts out for all the drives; however, the first two
> >>>> should probably suffice, as only /dev/sdk differs from the rest. The
> >>>> faulty drive - sdk - is still recognized as a raid6 member, whereas all
> >>>> the others show up as spares. With lots of bad sectors, sdk isn't
> >>>> accessible anymore.
> >>>
> >>> You must be running 3.2.1 or 3.3 (I think).
> >>>
> >>> You've been bitten by a rather nasty bug.
> >>>
> >>> You can get your data back, but it will require a bit of care, so don't
> >>> rush it.
> >>>
> >>> The metadata on almost all the devices has been seriously corrupted. The
> >>> only way to repair it is to recreate the array.
> >>> Doing this just writes new metadata and assembles the array. It doesn't
> >>> touch the data, so if we get the --create command right, all your data
> >>> will be available again.
> >>> If we get it wrong, you won't be able to see your data, but we can easily
> >>> stop the array and create it again with different parameters until we get
> >>> it right.
> >>>
> >>> The first thing to do is to get a newer kernel. I would recommend the
> >>> latest in the 3.3.y series.
> >>>
> >>> Then you need to:
> >>> - make sure you have a version of mdadm which sets the data offset to 1M
> >>>   (2048 sectors). I think 3.2.3 or earlier does that - don't upgrade to
> >>>   3.2.5.
> >>> - find the chunk size - it looks like it is 4M, as sdk2 isn't corrupt.
> >>> - find the order of devices. This should be in your kernel logs in a
> >>>   "RAID conf printout". Hopefully device names haven't changed.
> >>>
> >>> Then (with the new kernel running):
> >>>
> >>>   mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 /dev/sdb2 /dev/sdc2 /dev/sdd2 \
> >>>         /dev/sde2 /dev/sdf2 /dev/sdg2 /dev/sdh2 /dev/sdi2 missing \
> >>>         --assume-clean
> >>>
> >>> Make double-sure you add that --assume-clean.
> >>>
> >>> Note the last device is 'missing'. That corresponds to sdk2 (which we
> >>> know is device 8 - the last of 9 (0..8)). It has failed, so it is not
> >>> part of the array any more. For the others I just guessed the order. You
> >>> should try to verify it before you proceed (see "RAID conf printout" in
> >>> the kernel logs).
> >>>
> >>> After the 'create', use "mdadm -E" to look at one device and make sure
> >>> the Data Offset, Avail Dev Size and Array Size are the same as we saw
> >>> on sdk2.
> >>> If they are, try "fsck -n /dev/md0". That assumes ext3 or ext4. If you
> >>> had something else on the array some other command might be needed.
> >>>
> >>> If that looks bad, "mdadm -S /dev/md0" and try again with a different
> >>> order.
> >>> If it looks good, "echo check > /sys/block/md0/md/sync_action" and watch
> >>> "mismatch_cnt" in the same directory. If it stays low (a few hundred at
> >>> most) all is good. If it goes up to thousands something is wrong - try
> >>> another order.
> >>>
> >>> Once you have the array working again,
> >>>   "echo repair > /sys/block/md0/md/sync_action"
> >>> then add your new device to be rebuilt.
> >>>
> >>> Good luck.
> >>> Please ask if you are unsure about anything.
> >>>
> >>> NeilBrown
> >>>
> >>
> >> Hello Neil,
> >>
> >> thank you very much for this detailed input. My last reply didn't make it
> >> into the mailing list due to the format of my mail client (OS X Mail). My
> >> kernel (Ubuntu) was 3.2.0; I upgraded to 3.3.8. The mdadm version was fine.
> >>
> >> I searched the log files I have and was unable to find anything concerning
> >> my array. Maybe that sort of thing isn't logged on Ubuntu. I did find some
> >> mails concerning a degraded raid that do not correlate with my current
> >> breakage. I received the following 2 messages:
> >>
> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >> [raid4] [raid10]
> >> md0 : active (auto-read-only) raid6 sdi2[1] sdh2[0] sdg2[8] sdc1[9]
> >> sdd2[5] sdb2[3] sdf2[7] sde2[6]
> >>       13586485248 blocks super 1.2 level 6, 4096k chunk, algorithm 2
> >>       [9/8] [UU_UUUUUU]
> >>
> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >> [raid4] [raid10]
> >> md0 : active (auto-read-only) raid6 sdj2[2] sdg2[8] sdd2[5] sde2[6]
> >> sdb2[3] sdf2[7] sdc1[9]
> >>       13586485248 blocks super 1.2 level 6, 4096k chunk, algorithm 2
> >>       [9/7] [__UUUUUUU]
> >>
> >> I conclude that my setup must have been sdh2 [0], sdi2 [1], sdj2 [2],
> >> sdb2 [3], sdd2 [5], sde2 [6], sdf2 [7], sdg2 [8], sdc1 [9]
> >
> > Unfortunately these numbers are not the roles of the devices in the array.
> > They are the order in which the devices were added to the array.
> > So 0-8 are very likely roles 0-8 in the array. '9' is then the first spare,
> > and it stays as '9' even when it becomes active. So as there is no '4', it
> > does look likely that 'sdc1' should come between 'sdb2' and 'sdd2'.
> >
> > NeilBrown
> >
> >
> >> sdc1 is the replacement for my first drive that went bad. It's somewhat
> >> strange that it is now listed as device 9 and not 4, isn't it? I reckon
> >> that I have to rebuild in that order, notwithstanding.
> >>
> >> regards,
> >> Martin
> >
>
> Hello Neil,
>
> I tracked the cables in my case and tried some permutations:
>
> mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 /dev/sdh2 /dev/sdi2 /dev/sdj2 /dev/sdb2 /dev/sdc1 /dev/sdd2 /dev/sde2 /dev/sdf2 missing --assume-clean
> mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 /dev/sdj2 /dev/sdb2 /dev/sdc1 /dev/sdd2 /dev/sde2 /dev/sdf2 missing /dev/sdh2 /dev/sdi2 --assume-clean
> mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 /dev/sdj2 /dev/sdb2 /dev/sdc1 /dev/sdd2 /dev/sde2 /dev/sdh2 missing /dev/sdf2 /dev/sdi2 --assume-clean
> mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 /dev/sdj2 /dev/sdb2 /dev/sdc1 /dev/sdd2 /dev/sde2 /dev/sdi2 missing /dev/sdf2 /dev/sdh2 --assume-clean
> mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 /dev/sdi2 /dev/sdh2 /dev/sdj2 /dev/sdb2 /dev/sdc1 /dev/sdd2 /dev/sde2 /dev/sdf2 missing --assume-clean
>
> The first ones resulted in metadata that looked fine, but the fsck output
> did not look good at all:
>
> e2fsck 1.42 (29-Nov-2011)
> fsck.ext4: Superblock invalid, trying backup blocks...
> fsck.ext4: Bad magic number in super-block while trying to open /dev/md0
>
> The superblock is unreadable or does not describe a valid ext2
> filesystem. If the device is valid and contains an ext2 filesystem
> (and not swap or ufs etc.), then the superblock is corrupt, and you
> could try running e2fsck with an alternate superblock:
>     e2fsck -b 8193
>
> The last one resulted in this fsck output:
>
> e2fsck 1.42 (29-Nov-2011)
> fsck.ext4: Group descriptors look bad... trying backup blocks...
> fsck.ext4: Bad magic number in super-block when using the backup blocks
> fsck.ext4: going back to the original superblock
> fsck.ext4: Group descriptors look bad... trying backup blocks...
> fsck.ext4: Bad magic number in super-block when using the backup blocks
> fsck.ext4: going back to the original superblock
> Read error - block 3823364034 (Invalid argument). Ignore error? no
>
> Superblock has a corrupt journal (inode 8).
> Clear? no
>
> fsck.ext4: Illegal inode number while checking the ext3 journal for /dev/md0
>
> /dev/md0: ********** WARNING: Filesystem still has errors **********
>
> If I interpret that correctly, the ext4 filesystem is now recognized. Do
> you think I should now go on with
> "echo check > /sys/block/md0/md/sync_action"?
>

The "echo check ...." is read-only and so harmless - you can do it any time
you like. To stop it if it is showing lots of mismatches, just "echo idle" to
the same file.

However that e2fsck output doesn't look good. It does find a superblock, but
then when it goes to look for the "group descriptors" they are bad.
Also:
    "Read error - block 3823364034 (Invalid argument)."
suggests that the filesystem thinks the array is bigger than it is.

This probably suggests that the first device is the correct one, but other
devices are still in the wrong order.

I suggest some more permutations. It shouldn't be too hard to write a script
to try them all... might take a little while though.
The following script, if run with

  sh permute.sh --prefix "mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 --assume-clean" \
      /dev/sdh2 /dev/sdi2 /dev/sdj2 /dev/sdb2 /dev/sdc1 /dev/sdd2 /dev/sde2 /dev/sdf2 missing

will output all possible "mdadm --create" commands with the different
permutations. Don't know if you want to try it or not. There are only
362880 possibilities :-)  Change the 'echo' to 'eval', then add
'fsck -n /dev/md0' and 'mdadm -S /dev/md0' and collect the output for
examination the next morning.

NeilBrown

#!/bin/sh
# permute.sh: print every permutation of the given arguments, each line
# prefixed with the string passed via --prefix.

case $1 in
 --prefix ) prefix=$2; shift 2 ;;
 * ) prefix=
esac

# Only one argument left: the permutation is complete, print it.
if [ $# -eq 1 ]
then
    echo $prefix $1
    exit 0
fi

# Otherwise pick each remaining argument in turn as the next element
# and recurse on the rest.
early=
while [ $# -ge 1 ]
do
    a=$1
    shift
    sh permute.sh --prefix "$prefix $a" $early $*
    early="$early $a"
done
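
A wrapper along those lines might look something like the sketch below. It is
untested and only illustrates the idea: it assumes permute.sh is in the
current directory, the file name 'orders.log' is just an example, and '--run'
is appended so mdadm does not stop to ask for confirmation when it sees
metadata left over from the previous attempt.

  #!/bin/sh
  # try-orders.sh (illustrative sketch, not tested): run every candidate
  # "mdadm --create", fsck the result read-only, log a short summary,
  # then stop the array before trying the next order.

  sh permute.sh --prefix "mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 --assume-clean" \
      /dev/sdh2 /dev/sdi2 /dev/sdj2 /dev/sdb2 /dev/sdc1 /dev/sdd2 /dev/sde2 /dev/sdf2 missing |
  while read cmd
  do
      echo "=== $cmd" >> orders.log
      # --run suppresses the "Continue creating array?" question mdadm asks
      # once the devices carry metadata from an earlier create.
      eval "$cmd --run" > /dev/null 2>&1 || continue
      # -n keeps fsck strictly read-only; keep only the first lines so the
      # log stays small enough to read through the next morning.
      fsck -n /dev/md0 2>&1 | head -n 20 >> orders.log
      mdadm -S /dev/md0 > /dev/null 2>&1
  done

The eval / "fsck -n" / "mdadm -S" sequence is just the one described above;
the only additions are the log file and the '--run' flag. With 362880 orders
to try, it would still be worth pruning the list (for example by keeping the
first device fixed, since that one already looks right) before leaving it to
run overnight.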