Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.de>
To: Andreas Baer <synthetic.gods@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)
Date: Mon, 9 Sep 2013 12:39:48 +1000	[thread overview]
Message-ID: <20130909123948.415c6c53@notabene.brown> (raw)
In-Reply-To: <CAE65G7=qqcUVSHTJOczLCqAAjYgLtf02s-RpKzvD24NT2-q5ZA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 8300 bytes --]

On Thu, 5 Sep 2013 17:22:26 +0200 Andreas Baer <synthetic.gods@gmail.com>
wrote:

> On 9/2/13, NeilBrown <neilb@suse.de> wrote:
> > On Thu, 29 Aug 2013 11:55:09 +0200 Andreas Baer <synthetic.gods@gmail.com>
> > wrote:
> >
> >> On 8/26/13, NeilBrown <neilb@suse.de> wrote:
> >> > On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer
> >> > <synthetic.gods@gmail.com>
> >> > wrote:
> >> >
> >> >> Short description:
> >> >> I've discovered a problem during re-assembly of a clean RAID. mdadm
> >> >> throws one disk out because this disk apparently shows another disk as
> >> >> failed. After assembly, RAID starts to recover on existing spare disk.
> >> >>
> >> >> In detail:
> >> >> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
> >> >> active disks and 1 spare disk (disk size: 1 TB), fully synced and
> >> >> clean.
> >> >> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during that
> >> >> one disk is thrown out.
> >> >>
> >> >> Manual assembly command for /dev/md0, relevant partitions are
> >> >> /dev/sd[b-i]1:
> >> >> # mdadm --assemble --scan -vvv
> >> >> mdadm: looking for devices for /dev/md0
> >> >> mdadm: no RAID superblock on /dev/sdi
> >> >> mdadm: no RAID superblock on /dev/sdh
> >> >> mdadm: no RAID superblock on /dev/sdg
> >> >> mdadm: no RAID superblock on /dev/sdf
> >> >> mdadm: no RAID superblock on /dev/sde
> >> >> mdadm: no RAID superblock on /dev/sdd
> >> >> mdadm: no RAID superblock on /dev/sdc
> >> >> mdadm: no RAID superblock on /dev/sdb
> >> >> mdadm: no RAID superblock on /dev/sda1
> >> >> mdadm: no RAID superblock on /dev/sda
> >> >> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
> >> >> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
> >> >> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
> >> >> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
> >> >> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
> >> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
> >> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
> >> >> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
> >> >> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
> >> >> mdadm: no uptodate device for slot 0 of /dev/md0
> >> >> mdadm: added /dev/sdd1 to /dev/md0 as 2
> >> >> mdadm: added /dev/sde1 to /dev/md0 as 3
> >> >> mdadm: added /dev/sdf1 to /dev/md0 as 4
> >> >> mdadm: added /dev/sdg1 to /dev/md0 as 5
> >> >> mdadm: added /dev/sdh1 to /dev/md0 as 6
> >> >> mdadm: added /dev/sdi1 to /dev/md0 as 7
> >> >> mdadm: added /dev/sdc1 to /dev/md0 as 1
> >> >> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
> >> >>
> >> >> I finally made a test by modifying mdadm V3.2.5 sources to not write
> >> >> any data to any superblock and to simply exit() somewhere in the
> >> >> middle of assembly process to be able to reproduce this behavior
> >> >> without any RAID re-creation/synchronization.
> >> >> So using mdadm V2.6.4 /dev/md0 assembles without problems and if I
> >> >> switch to mdadm V3.2.5 it shows the same messages as above.
> >> >>
> >> >> The real problem:
> >> >> I have more than a single machine receiving a similar software update
> >> >> so I need to find a solution or workaround around this problem. By the
> >> >> way, from another test without an existing spare disk, there seems to
> >> >> be no 'throwing out'-problem when switching from V2.6.4 to V3.2.5.
> >> >>
> >> >> It would also be a great help if someone could explain the reason
> >> >> behind the relevant code fragment for rejecting a device, e.g. why is
> >> >> only the 'most_recent' device important?
> >> >>
> >> >> /* If this device thinks that 'most_recent' has failed, then
> >> >>   * we must reject this device.
> >> >>   */
> >> >> if (j != most_recent &&
> >> >>     content->array.raid_disks > 0 &&
> >> >>     devices[most_recent].i.disk.raid_disk >= 0 &&
> >> >>     devmap[j * content->array.raid_disks +
> >> >> devices[most_recent].i.disk.raid_disk] == 0) {
> >> >>     if (verbose > -1)
> >> >>         fprintf(stderr, Name ": ignoring %s as it reports %s as
> >> >> failed\n",
> >> >>             devices[j].devname, devices[most_recent].devname);
> >> >>     best[i] = -1;
> >> >>     continue;
> >> >> }
> >> >>
> >> >> I also attached some files showing some details about related
> >> >> superblocks before and after assembly as well as about RAID status
> >> >> itself.
> >> >
> >> >
> >> > Thanks for the thorough report.  I think this issue has been fixed in
> >> > 3.3-rc1
> >> > You can fix it for 3.2.5 by applying the following patch:
> >> >
> >> > diff --git a/Assemble.c b/Assemble.c
> >> > index 227d66f..bc65c29 100644
> >> > --- a/Assemble.c
> >> > +++ b/Assemble.c
> >> > @@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
> >> >  		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> >> >  		if (most_recent < devcnt) {
> >> >  			if (devices[devcnt].i.events
> >> > -			    > devices[most_recent].i.events)
> >> > +			    > devices[most_recent].i.events &&
> >> > +			    devices[devcnt].i.disk.state == 6)
> >> >  				most_recent = devcnt;
> >> >  		}
> >> >  		if (content->array.level == LEVEL_MULTIPATH)
> >> >
> >> > The "most recent" device is important as we need to choose one to
> >> > compare
> >> > all
> >> > others again.  The problem is that the code in 3.2.5 can sometimes
> >> > choose a
> >> > spare, which isn't such a good idea.
> >> >
> >> > The "most recent" is also important because when a collection of devices
> >> > is given to the kernel it will give priority to some information which is
> >> > on the
> >> > last device passed in.  So we make sure that the last device given to
> >> > the kernel is the "most recent".
> >> >
> >> > Please let me know if the patch fixes your problem.
> >> >
> >> > NeilBrown
> >>
> >> First of all, thanks for your very helpful 'most recent disk'
> >> explanation.
> >>
> >> Sadly, the patch didn't fix my problem because the event counters are
> >> really equal on all disks (inclusive spare) and the first disk that is
> >> checked is the spare disk so there is no reason to set another disk as
> >> 'most recent disk', but I improved your patch a little bit by
> >> providing more output and created also an own solution, but that needs
> >> review because I'm not sure if it can be done like that.
> >>
> >> Patch 1: Your solution with more output
> >> Diff: mdadm-3.2.5-noassemble-patch1.diff
> >> Assembly: mdadm-3.2.5-noassemble-patch1.txt
> >>
> >> Patch 2: My proposed solution
> >> Diff: mdadm-3.2.5-noassemble-patch2.diff
> >> Assembly: mdadm-3.2.5-noassemble-patch2.txt
> >
> >
> > Thanks for the testing and suggestions.  I see what I missed now.
> > Can you check if this patch works please?
> >
> > Thanks.
> > NeilBrown
> >
> > diff --git a/Assemble.c b/Assemble.c
> > index 227d66f..9131917 100644
> > --- a/Assemble.c
> > +++ b/Assemble.c
> > @@ -215,7 +215,7 @@ int Assemble(struct supertype *st, char *mddev,
> >  	unsigned int okcnt, sparecnt, rebuilding_cnt;
> >  	unsigned int req_cnt;
> >  	int i;
> > -	int most_recent = 0;
> > +	int most_recent = -1;
> >  	int chosen_drive;
> >  	int change = 0;
> >  	int inargv = 0;
> > @@ -847,8 +847,9 @@ int Assemble(struct supertype *st, char *mddev,
> >  		devices[devcnt].i = *content;
> >  		devices[devcnt].i.disk.major = major(stb.st_rdev);
> >  		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> > -		if (most_recent < devcnt) {
> > -			if (devices[devcnt].i.events
> > +		if (devices[devcnt].i.disk_state == 6) {
> > +			if (most_recent < 0 ||
> > +			    devices[devcnt].i.events
> >  			    > devices[most_recent].i.events)
> >  				most_recent = devcnt;
> >  		}
> 
> Your patch seems to work without issues.
> 
> There is only a small typo:
> +		if (devices[devcnt].i.disk_state == 6) {
> should be:
> +		if (devices[devcnt].i.disk.state == 6) {
> 
> I attached the patch that I'm finally using to this mail.
> Thank you very much for your help.

Great.  Thanks for the confirmation.

This fix is in 3.3.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

     prev parent reply	other threads:[~2013-09-09  2:39 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-22 13:20 Update to mdadm V3.2.5 => RAID starts to recover (reproducible) Andreas Baer
2013-08-26  5:52 ` NeilBrown
2013-08-29  9:55   ` Andreas Baer
2013-09-02  1:35     ` NeilBrown
2013-09-05 15:22       ` Andreas Baer
2013-09-09  2:39         ` NeilBrown [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130909123948.415c6c53@notabene.brown \
    --to=neilb@suse.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=synthetic.gods@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.