From: Neil Brown <neilb@suse.de>
To: Brett Russ <bruss@netezza.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: non-fresh data unavailable bug
Date: Mon, 18 Jan 2010 16:32:51 +1300
Message-ID: <20100118163251.28b6a84e@notabene>
In-Reply-To: <hiq226$n6t$1@ger.gmane.org>
On Fri, 15 Jan 2010 10:36:39 -0500
Brett Russ <bruss@netezza.com> wrote:
> On 01/14/2010 02:24 PM, Michael Evans wrote:
> > On Thu, Jan 14, 2010 at 7:10 AM, Brett Russ <bruss@netezza.com> wrote:
> >> Slightly related to my last message here Re: non-fresh behavior, we have seen
> >> cases where the following happens:
> >> * healthy 2 disk raid1 (disks A & B) incurs a problem with disk B
> >> * disk B is removed, unit is now degraded
> >> * replacement disk C is added; recovery from A to C begins
> >> * during recovery, disk A incurs a brief lapse in connectivity. At this
> >> point C is still up yet only has a partial copy of the data.
> >> * a subsequent assemble operation on the raid1 results in disk A being
> >> kicked out as non-fresh, yet C is allowed in.
> >
> > I believe the desired and logical behavior here is to refuse running
> > an incomplete array unless explicitly forced to do so. Incremental
> > assembly might be what you're seeing.
>
> This brings up a good point. I didn't mention that the assemble in the
> last step above was forced. Thus, the "bug" I'm reporting is that under
> duress, mdadm/md chose to assemble the array with a partially recovered
> (but "newer") member instead of the older member which was the recovery
> *source* for the newer member.
>
> What I think should happen is members that are *destinations* for
> recovery should *never* receive a higher event count, timestamp, or any
> other marking than the recovery sources. By definition they are
> incomplete and can't be trusted, thus they should never trump a complete
> member during assemble. I would assume the code already does this but
> perhaps there is a hole.
>
> One other piece of information that may be relevant--we're using 2
> member RAID1 units with one member marked write-mostly. At this time, I
> don't have the specifics for which member (A or B) was the write-mostly
> member in the example above, but I can find that out.
>
> > I very much recommend running it read-only until you can determine which
> > assembly pattern produces the most viable results.
>
> Good tip. We were able to manually recover the array in the case
> outlined above, now we're looking back to fixing the kernel to prevent
> it happening again.
>
Thanks for the report. It sounds like a real problem.
I'm travelling at the moment so reproducing it would be a challenge.
If you are able to, can you report the output of
   mdadm -E /dev/list-of-devices
at the key points in the process, and also add "-v" to any
   mdadm --assemble
command you use, and report the output?
Thanks,
NeilBrown
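
One way to collect the requested `mdadm -E` output at each key point (after disk B is removed, after disk C is added, after disk A drops out, and just before the assemble) is a small logging helper. This is only a sketch under assumptions: the function names, the labels, the log path, and the device names are all hypothetical, and it would need to run as root on the affected machine. It uses only the `mdadm -E` invocation named above.

```python
import subprocess
from datetime import datetime, timezone


def examine_cmd(devices):
    """Build the `mdadm -E` command line for the given member devices."""
    return ["mdadm", "-E", *devices]


def capture(label, devices, log_path, run=subprocess.run):
    """Append the examine output for one key point to a log file.

    `label` is free text describing the point in the process.
    `run` is injectable so the helper can be exercised without mdadm.
    """
    cmd = examine_cmd(devices)
    result = run(cmd, capture_output=True, text=True)
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a") as log:
        log.write(f"=== {label} ({stamp}) ===\n")
        log.write("$ " + " ".join(cmd) + "\n")
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical member devices; substitute the real ones.
    capture("recovery A -> C in progress", ["/dev/sda1", "/dev/sdc1"],
            "md-examine.log")
```

Calling `capture()` with a different label at each step gives one file in which the superblock state of each member can be compared across the whole sequence.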