From: NeilBrown <neilb@suse.de>
To: Claude Nobs <claudenobs@blunet.cc>
Cc: linux-raid@vger.kernel.org
Subject: Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
Date: Mon, 21 Feb 2011 11:53:03 +1100
Message-ID: <20110221115303.4862e093@notabene.brown>
In-Reply-To: <AANLkTi=-guMf-8YJDMvq9ybyY9Fppi+W0pqhH2Of=mKd@mail.gmail.com>
On Sun, 20 Feb 2011 15:44:35 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:
> > They are the 'Number' column in the --detail output below. This is /dev/md1
> > - I can tell from the --examine outputs, but it is a bit confusing. Newer
> > versions of mdadm make this a little less confusing. If you look for
> > patterns of U and u in the 'Array State' line, the U is 'this device', the
> > 'u' is some other devices.
>
> Actually this is running a stock Ubunutu 10.10 server kernel. But as
> it is from my memory it could very well have been :
>
> 2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]
>
I'm quite sure it would have been '[U_UUU]' as you say.
When I say "Newer versions" I mean of mdadm, not the kernel.
What does
   mdadm -V
show? Version 3.0 or later gives less confusing output for "mdadm --examine"
on 1.x metadata.
> > Just to go through some of the numbers...
> >
> > Chunk size is 64K. Reshape was 4->5, so 3 -> 4 data disks.
> > So old stripes have 192K, new stripes have 256K.
> >
> > The 'good' disks think reshape has reached 502815488K which is
> > 1964123 new stripes. (2618830.66 old stripes)
> > md1 thinks reshape has only reached 489510400K which is 1912150
> > new stripes (2549533.33 old stripes).
>
> i think you mixed up sdd1 with md1 here? (the numbers above for md1
> are for sdd1. md1 would be : reshape has reached 502809856K which
> would be 1964101 new stripes. so the difference between the good disks
> and md1 would be 22 stripes.)
Yes, I got them mixed up. But the net result is the same - the 'new' stripe
numbers haven't got close to overwriting the 'old' stripe numbers.
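
For anyone who wants to re-check the arithmetic, here is a rough sketch in
plain shell. It only uses figures already quoted in this thread (64K chunks,
3 -> 4 data disks, and the three reshape positions). The variable names are
made up for illustration and are not anything mdadm itself uses.

```shell
#!/bin/sh
# Sketch only: all figures are the ones quoted in this thread.
chunk_k=64
old_stripe_k=$((chunk_k * 3))   # 3 data disks before the grow: 192K
new_stripe_k=$((chunk_k * 4))   # 4 data disks after the grow:  256K

good_k=502815488   # reshape position on the 'good' disks, in K
md1_k=502809856    # reshape position recorded on md1, in K
sdd1_k=489510400   # reshape position recorded on sdd1, in K

# Reshape positions expressed as whole new-layout stripes.
good_new=$((good_k / new_stripe_k))
md1_new=$((md1_k / new_stripe_k))
sdd1_new=$((sdd1_k / new_stripe_k))

# How far each stale device is behind the good disks.
md1_behind=$((good_new - md1_new))
sdd1_behind=$((good_new - sdd1_new))
md1_behind_k=$((md1_behind * new_stripe_k))

printf 'stripe sizes: old %sK, new %sK\n' "$old_stripe_k" "$new_stripe_k"
printf 'good disks:   %s new stripes\n' "$good_new"
printf 'md1:          %s new stripes, %s behind (%sK)\n' \
    "$md1_new" "$md1_behind" "$md1_behind_k"
printf 'sdd1:         %s new stripes, %s behind\n' "$sdd1_new" "$sdd1_behind"
```

The differences come out as the 22 stripes (about 5.5M) for md1 and the
51973 stripes for sdd1 discussed above.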
>
> >
> > So of the 51973 stripes that have been reshaped since the last metadata
> > update on sdd1, some will have been done on sdd1, but some not, and we don't
> > really know how many. But it is perfectly safe to repeat those stripes
> > as all writes to that region will have been suspended (and you probably
> > weren't writing anyway).
>
> jep there was nothing writing to the array. so now i am a little
> confused, if you meant sdd1 (which failed first is 51973 stripes
> behind) this would imply that at least so many stripes of data are
> kept of the old (3 data disks) configuration as well as the new one?
> if continuing from there is possible then the array would no longer be
> degraded right? so i think you meant md1 (22 stripes behind), as
> keeping 5.5M of data from the old and new config seems more
> reasonable. however this is just a guess :-)
Yes, it probably is possible to re-assemble the array to include sdd1 and not
have a degraded array, and still have all your data safe - providing you are
sure that nothing at all changed on the array (e.g. maybe it was unmounted?).
I'm not sure I'd recommend it though.... I cannot see anything that would go
wrong, but it is somewhat unknown territory.
Up to you...
If you:
   % git clone -b master git://neil.brown.name/mdadm
   % cd mdadm
   % make
   % sudo bash
   # ./mdadm -S /dev/md2
   # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1
It should restart your array - degraded - and repeat the last stages of
reshape just in case.
Alternatively, before you run 'make' you could edit Assemble.c, find:
	while (force && !enough(content->array.level, content->array.raid_disks,
				content->array.layout, 1,
				avail, okcnt)) {
around line 818, and change the '1,' to '0,', then run make, mdadm -S, and
then
   # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1 /dev/sdd1
It should assemble the array non-degraded and repeat all of the reshape since
sdd1 fell out of the array.
As you have a backup, this is probably safe because even if it goes bad you
can restore from backups - not that I expect it to go bad but ....
> >
> > Thanks for the excellent problem report.
> >
> > NeilBrown
>
> Well i thank you for providing such an elaborate and friendly answer!
> this is actually my first mailing list post and considering how many
> questions get ignored (don't know about this list though) i just hoped
> someone would at least answer with a one liner... i never expected
> this. so thanks again.
All part of the service... :-)
NeilBrown