From: Neil Brown <neilb@suse.de>
To: Guy Martin <gmsoft@tuxicoman.be>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid 5 to raid 6 reshape failure after reboot
Date: Thu, 29 Oct 2009 15:55:10 +1100
Message-ID: <19177.8238.461124.535782@notabene.brown>
In-Reply-To: message from Guy Martin on Thursday October 22
On Thursday October 22, gmsoft@tuxicoman.be wrote:
>
> Hi Neil,
>
> Thanks, this new mdadm does fix the assemble issue.
>
> However, I performed an additional test and it didn't go so well.
> I failed one drive during the reshape and tried to remove and add it
> back.
> I wasn't able to remove the drive because the mdadm process running in
> the background was keeping the partition open. I then decided to stop
> the array and restart it but without luck.
> I've performed this test with today's devel-3.1 branch.
>
> Is this supposed to work, or must no drive fail during the reshape?
Thanks for reporting this.

I hadn't tested, or even thought through, that scenario.
I have tested that a degraded array can be reshaped, but not that a
reshaping array can get degraded.

md will certainly not allow you to add the device back - that will
have to wait for the reshape to finish.... I guess it could be managed
but it would be rather complex.... maybe.

However it should handle the failure properly but it doesn't.
In particular, the reshape process is aborted and restarted where it
was up to, but in the process of doing that it 'escapes' from the
controlling mdadm process that was managing the backup.  So the
reshape gets way ahead of the backup and, as you discovered, the backup
file is no longer useful for restarting the reshaped array.
You can fix this by changing the
	mddev->resync_max = MaxSector;
near the end of md_do_sync to
	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery))
		mddev->resync_max = MaxSector;
But doing that with the current mdadm isn't a good solution as it
could be backing up the wrong data (as mdadm will trust the device
that has been marked as faulty).
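
The guard above can be modelled in user space. This is only a minimal
sketch, not the kernel code: struct mddev_model, test_bit_model() and
finish_sync() are hypothetical stand-ins, and the bit number chosen for
MD_RECOVERY_INTR is a placeholder. It shows just the one behaviour under
discussion: when the sync pass was interrupted, resync_max keeps its
current value instead of being reset to MaxSector.

```c
#include <assert.h>

/* Stand-ins for kernel definitions; names and values are illustrative. */
typedef unsigned long long sector_t;
#define MaxSector (~(sector_t)0)
#define MD_RECOVERY_INTR 3	/* placeholder bit number, an assumption */

struct mddev_model {
	unsigned long recovery;	/* recovery flag bits */
	sector_t resync_max;	/* sync may not proceed past this sector */
};

/* Hypothetical user-space replacement for the kernel's test_bit(). */
static int test_bit_model(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

/* Models only the tail of a sync pass, with the proposed guard applied. */
static void finish_sync(struct mddev_model *mddev)
{
	/* Lift the limit only if the pass was not interrupted. */
	if (!test_bit_model(MD_RECOVERY_INTR, &mddev->recovery))
		mddev->resync_max = MaxSector;
}
```

In this model an interrupted pass leaves resync_max at whatever value
was last set, while an uninterrupted pass lifts the limit as before.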
It looks like I have some fixing to do....
Thanks!
NeilBrown
>
> Here are the commands that I've been issuing :
> [array currently reshaping]
> mdadm --fail /dev/md0 /dev/sdb1
> mdadm -r /dev/md0 /dev/sdb1 -> device busy
> mdadm -S /dev/md0 -> array stopped
> mdadm --assemble /dev/md0 /dev/sd[bdef]1 --backup-file backup -v
>
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 2.
> mdadm:/dev/md0 has an active reshape - checking if critical section needs to be restored
> mdadm: backup-metadata found on backup but is not needed
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
>
> Guy
>
>
> > Ahhh... I wondered a bit about that as I was adding the fprintf there,
> > but it was along the lines of "this cannot happen", not "this is where
> > the bug might be" :-)
> >
> > I see now what is happening. I need to update the mtime every time I
> > write the backup metadata (of course!). I never tripped on this
> > because I never let a reshape run for more than a few minutes.
> >
> > I have checked in a patch which updates the mtime properly, so it
> > should now work for you.
> >
> > Thanks for helping make mdadm even better!
> >
> > NeilBrown
> >
> >
Thread overview: 11+ messages
2009-10-18 16:10 Raid 5 to raid 6 reshape failure after reboot Guy Martin
2009-10-18 20:14 ` NeilBrown
2009-10-19 13:53 ` Guy Martin
2009-10-19 20:05 ` NeilBrown
2009-10-20 5:54 ` NeilBrown
2009-10-20 8:37 ` Guy Martin
2009-10-21 23:44 ` Neil Brown
2009-10-22 9:29 ` Guy Martin
2009-10-29 4:55 ` Neil Brown [this message]
2009-10-22 14:20 ` Guy Martin
2009-10-29 3:32 ` Neil Brown