From: NeilBrown
Subject: Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)
Date: Mon, 9 Sep 2013 12:39:48 +1000
Message-ID: <20130909123948.415c6c53@notabene.brown>
References: <20130826155202.7a11dff5@notabene.brown> <20130902113534.34f434f3@notabene.brown>
To: Andreas Baer
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On Thu, 5 Sep 2013 17:22:26 +0200 Andreas Baer wrote:

> On 9/2/13, NeilBrown wrote:
> > On Thu, 29 Aug 2013 11:55:09 +0200 Andreas Baer
> > wrote:
> >
> >> On 8/26/13, NeilBrown wrote:
> >> > On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer
> >> > wrote:
> >> >
> >> >> Short description:
> >> >> I've discovered a problem during re-assembly of a clean RAID. mdadm
> >> >> throws one disk out because this disk apparently shows another disk as
> >> >> failed. After assembly, the RAID starts to recover on the existing spare disk.
> >> >>
> >> >> In detail:
> >> >> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
> >> >> active disks and 1 spare disk (disk size: 1 TB), fully synced and
> >> >> clean.
> >> >> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during that
> >> >> one disk is thrown out.
> >> >>
> >> >> Manual assembly command for /dev/md0, relevant partitions are
> >> >> /dev/sd[b-i]1:
> >> >> # mdadm --assemble --scan -vvv
> >> >> mdadm: looking for devices for /dev/md0
> >> >> mdadm: no RAID superblock on /dev/sdi
> >> >> mdadm: no RAID superblock on /dev/sdh
> >> >> mdadm: no RAID superblock on /dev/sdg
> >> >> mdadm: no RAID superblock on /dev/sdf
> >> >> mdadm: no RAID superblock on /dev/sde
> >> >> mdadm: no RAID superblock on /dev/sdd
> >> >> mdadm: no RAID superblock on /dev/sdc
> >> >> mdadm: no RAID superblock on /dev/sdb
> >> >> mdadm: no RAID superblock on /dev/sda1
> >> >> mdadm: no RAID superblock on /dev/sda
> >> >> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
> >> >> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
> >> >> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
> >> >> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
> >> >> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
> >> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
> >> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
> >> >> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
> >> >> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
> >> >> mdadm: no uptodate device for slot 0 of /dev/md0
> >> >> mdadm: added /dev/sdd1 to /dev/md0 as 2
> >> >> mdadm: added /dev/sde1 to /dev/md0 as 3
> >> >> mdadm: added /dev/sdf1 to /dev/md0 as 4
> >> >> mdadm: added /dev/sdg1 to /dev/md0 as 5
> >> >> mdadm: added /dev/sdh1 to /dev/md0 as 6
> >> >> mdadm: added /dev/sdi1 to /dev/md0 as 7
> >> >> mdadm: added /dev/sdc1 to /dev/md0 as 1
> >> >> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
> >> >>
> >> >> I finally made a test by modifying the mdadm V3.2.5 sources to not write
> >> >> any data to any superblock and to simply exit() somewhere in the
> >> >> middle of the assembly process, to be able to reproduce this behavior
> >> >> without any RAID re-creation/synchronization.
> >> >> So using mdadm V2.6.4, /dev/md0 assembles without problems, and if I
> >> >> switch to mdadm V3.2.5 it shows the same messages as above.
> >> >>
> >> >> The real problem:
> >> >> I have more than a single machine receiving a similar software update,
> >> >> so I need to find a solution or workaround for this problem. By the
> >> >> way, from another test without an existing spare disk, there seems to
> >> >> be no 'throwing out' problem when switching from V2.6.4 to V3.2.5.
> >> >>
> >> >> It would also be a great help if someone could explain the reasoning
> >> >> behind the relevant code fragment for rejecting a device, e.g. why is
> >> >> only the 'most_recent' device important?
> >> >>
> >> >> /* If this device thinks that 'most_recent' has failed, then
> >> >>  * we must reject this device.
> >> >>  */
> >> >> if (j != most_recent &&
> >> >>     content->array.raid_disks > 0 &&
> >> >>     devices[most_recent].i.disk.raid_disk >= 0 &&
> >> >>     devmap[j * content->array.raid_disks +
> >> >>            devices[most_recent].i.disk.raid_disk] == 0) {
> >> >>     if (verbose > -1)
> >> >>         fprintf(stderr, Name ": ignoring %s as it reports %s as failed\n",
> >> >>                 devices[j].devname, devices[most_recent].devname);
> >> >>     best[i] = -1;
> >> >>     continue;
> >> >> }
> >> >>
> >> >> I also attached some files showing some details about the related
> >> >> superblocks before and after assembly, as well as about the RAID status
> >> >> itself.
> >> >
> >> >
> >> > Thanks for the thorough report.
> >> > I think this issue has been fixed in 3.3-rc1.
> >> > You can fix it for 3.2.5 by applying the following patch:
> >> >
> >> > diff --git a/Assemble.c b/Assemble.c
> >> > index 227d66f..bc65c29 100644
> >> > --- a/Assemble.c
> >> > +++ b/Assemble.c
> >> > @@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
> >> >  		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> >> >  		if (most_recent < devcnt) {
> >> >  			if (devices[devcnt].i.events
> >> > -			    > devices[most_recent].i.events)
> >> > +			    > devices[most_recent].i.events &&
> >> > +			    devices[devcnt].i.disk.state == 6)
> >> >  				most_recent = devcnt;
> >> >  		}
> >> >  		if (content->array.level == LEVEL_MULTIPATH)
> >> >
> >> > The "most recent" device is important as we need to choose one to
> >> > compare all others against. The problem is that the code in 3.2.5 can
> >> > sometimes choose a spare, which isn't such a good idea.
> >> >
> >> > The "most recent" is also important because when a collection of devices
> >> > is given to the kernel, it will give priority to some information which
> >> > is on the last device passed in. So we make sure that the last device
> >> > given to the kernel is the "most recent".
> >> >
> >> > Please let me know if the patch fixes your problem.
> >> >
> >> > NeilBrown
> >>
> >> First of all, thanks for your very helpful 'most recent disk'
> >> explanation.
> >>
> >> Sadly, the patch didn't fix my problem, because the event counters are
> >> really equal on all disks (including the spare), and the first disk that
> >> is checked is the spare disk, so there is no reason to set another disk
> >> as 'most recent disk'. But I improved your patch a little bit by
> >> providing more output, and also created my own solution, which needs
> >> review because I'm not sure if it can be done like that.
> >>
> >> Patch 1: Your solution with more output
> >> Diff: mdadm-3.2.5-noassemble-patch1.diff
> >> Assembly: mdadm-3.2.5-noassemble-patch1.txt
> >>
> >> Patch 2: My proposed solution
> >> Diff: mdadm-3.2.5-noassemble-patch2.diff
> >> Assembly: mdadm-3.2.5-noassemble-patch2.txt
> >
> >
> > Thanks for the testing and suggestions. I see what I missed now.
> > Can you check if this patch works, please?
> >
> > Thanks.
> > NeilBrown
> >
> > diff --git a/Assemble.c b/Assemble.c
> > index 227d66f..9131917 100644
> > --- a/Assemble.c
> > +++ b/Assemble.c
> > @@ -215,7 +215,7 @@ int Assemble(struct supertype *st, char *mddev,
> >  	unsigned int okcnt, sparecnt, rebuilding_cnt;
> >  	unsigned int req_cnt;
> >  	int i;
> > -	int most_recent = 0;
> > +	int most_recent = -1;
> >  	int chosen_drive;
> >  	int change = 0;
> >  	int inargv = 0;
> > @@ -847,8 +847,9 @@ int Assemble(struct supertype *st, char *mddev,
> >  		devices[devcnt].i = *content;
> >  		devices[devcnt].i.disk.major = major(stb.st_rdev);
> >  		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> > -		if (most_recent < devcnt) {
> > -			if (devices[devcnt].i.events
> > +		if (devices[devcnt].i.disk_state == 6) {
> > +			if (most_recent < 0 ||
> > +			    devices[devcnt].i.events
> >  			    > devices[most_recent].i.events)
> >  				most_recent = devcnt;
> >  		}

Your patch seems to work without issues.

There is only a small typo:
+		if (devices[devcnt].i.disk_state == 6) {
should be:
+		if (devices[devcnt].i.disk.state == 6) {

I attached the patch that I'm finally using to this mail.
Thank you very much for your help.

Great. Thanks for the confirmation. This fix is in 3.3.
NeilBrown