From mboxrd@z Thu Jan 1 00:00:00 1970
From: NeilBrown
Subject: Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
Date: Wed, 23 Feb 2011 12:53:38 +1100
Message-ID: <20110223125338.2179dd78@notabene.brown>
References: <20110220162509.2eb85a03@notabene.brown> <20110221115303.4862e093@notabene.brown>
Sender: linux-raid-owner@vger.kernel.org
To: Claude Nobs
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On Wed, 23 Feb 2011 01:56:13 +0100 Claude Nobs wrote:

> bernstein@server:~/mdadm$ sudo ./mdadm -Afvv /dev/md2 /dev/sda1
> /dev/md0 /dev/md1 /dev/sdc1
> mdadm: looking for devices for /dev/md2
> mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 4.
> mdadm: /dev/md0 is identified as a member of /dev/md2, slot 3.
> mdadm: /dev/md1 is identified as a member of /dev/md2, slot 2.
> mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 0.
> mdadm: forcing event count in /dev/md1(2) from 133603 upto 133609

This is normal - mdadm is just letting you know that it is including in the
array a device that looks a bit old - we expected this.

> mdadm: Cannot open /dev/sdc1: Device or resource busy

This is odd. I cannot explain this at all. When this message is printed
mdadm should give up and not continue. Yet it seems that it did continue
because the array is started and is reshaping.

> bernstein@server:~/mdadm$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md2 : active raid5 md1[3] md0[4] sda1[5] sdc1[0]
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [U_UUU]
>       [==>..................]  reshape = 12.8% (125839952/976760640)
> finish=825.1min speed=17186K/sec

This looks OK.
125839952 corresponds to a "reshape Pos'n" of 503359808, which is slightly
after where we would expect it to start, which is what we would expect.
There won't be any info in the logs to tell us exactly where it started,
which is a shame, but it probably started at the right place.

>
> this is not strictly a raid/mdadm question, but do you know a simple
> way to check everything went ok? i think that an e2fsck (ext4 fs) and
> checksumming some random files located behind the interruption point
> should verify all went ok. plus just to be sure i'd like to check
> files located at the interruption point. is the offset to the
> interruption point into the md device simply the reshape pos'n (e.g.
> 502815488K) ?

No - just the things you suggest.

The Reshape pos'n is the address in the array where reshape was up to.
You could try using 'debugfs' to have a look at the context of those blocks.
Remember to divide this number (which is in K) by 4 to get an ext4fs block
number (assuming 4K blocks).

Use:
   testb BLOCKNUMBER COUNT

to see if the blocks were even allocated.
Then
   icheck BLOCKNUM
on a few of the blocks to see what inode was using them.
Then
   ncheck INODE
to find a path to that inode number.

Feel free to report your results - particularly if you find anything helpful.

NeilBrown
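
For reference, the arithmetic in this message can be sketched in a few lines
of Python. This is only an illustration, not something mdadm provides: the
function names are made up here, the factor of 4 assumes the 4 data disks of
a 5-device RAID5, and the block-number conversion assumes a 4K-block ext4
filesystem sitting directly on /dev/md2 with no partition table or LVM
offset in between.

```python
# Illustrative sketch only; helper names are hypothetical, not part of mdadm.

def reshape_position_k(per_device_k: int, data_disks: int) -> int:
    """Array address (in K) for a per-device progress figure from /proc/mdstat.

    Assumes progress on each device maps to data_disks times as much
    array address space (4 data disks for a 5-device RAID5).
    """
    return per_device_k * data_disks


def ext4_block(array_offset_k: int, fs_block_k: int = 4) -> int:
    """ext4 block number for an array offset in K (assumes 4K blocks and
    a filesystem that starts at offset 0 of the md device)."""
    return array_offset_k // fs_block_k


if __name__ == "__main__":
    pos = reshape_position_k(125839952, 4)
    print(pos)                    # 503359808, the figure quoted in the reply
    print(ext4_block(502815488))  # 125703872, block at the reported Reshape pos'n
```

The second number is the kind of value one would hand to testb and icheck
in debugfs, subject to the assumptions above.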