From: Brett Russ <bruss@netezza.com>
To: linux-raid@vger.kernel.org
Subject: Re: non-fresh data unavailable bug
Date: Fri, 15 Jan 2010 10:36:39 -0500
Message-ID: <hiq226$n6t$1@ger.gmane.org>
In-Reply-To: <4877c76c1001141124m1d3fcfcdmffc3df97d1b504a@mail.gmail.com>
On 01/14/2010 02:24 PM, Michael Evans wrote:
> On Thu, Jan 14, 2010 at 7:10 AM, Brett Russ<bruss@netezza.com> wrote:
>> Slightly related to my last message here Re:non-fresh behavior, we have seen
>> cases where the following happens:
>> * healthy 2 disk raid1 (disks A & B) incurs a problem with disk B
>> * disk B is removed, unit is now degraded
>> * replacement disk C is added; recovery from A to C begins
>> * during recovery, disk A incurs a brief lapse in connectivity. At this
>> point C is still up yet only has a partial copy of the data.
>> * a subsequent assemble operation on the raid1 results in disk A being
>> kicked out as non-fresh, yet C is allowed in.
>
> I believe the desired and logical behavior here is to refuse running
> an incomplete array unless explicitly forced to do so. Incremental
> assembly might be what you're seeing.

This brings up a good point. I didn't mention that the assemble in the
last step above was forced. Thus, the "bug" I'm reporting is that under
duress, mdadm/md chose to assemble the array with a partially recovered
(but "newer") member instead of the older member which was the recovery
*source* for the newer member.

What I think should happen is members that are *destinations* for
recovery should *never* receive a higher event count, timestamp, or any
other marking than the recovery sources. By definition they are
incomplete and can't be trusted, thus they should never trump a complete
member during assemble. I would assume the code already does this but
perhaps there is a hole.
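
As a rough illustration of the rule proposed above (this is not the actual
md/mdadm code), the sketch below compares a naive "highest event count wins"
choice with one that never lets a recovery destination outrank a complete
member. The Member class, its fields, the pick_reference() helper, and the
event counts are all hypothetical, chosen only to mirror the disk A / disk C
case described in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    """Hypothetical, simplified view of one RAID1 member's metadata."""
    name: str
    events: int       # event counter as read from the member (made-up values)
    recovering: bool  # True if the member is an incomplete recovery destination


def pick_reference(members: List[Member]) -> Optional[Member]:
    """Pick the member whose event count defines "fresh" at assemble time.

    Complete members are always preferred. A recovery destination is only
    considered when no complete member is available at all.
    """
    complete = [m for m in members if not m.recovering]
    pool = complete or members
    return max(pool, key=lambda m: m.events, default=None)


# Disk A: complete recovery source that briefly dropped out (older events).
disk_a = Member("A", events=100, recovering=False)
# Disk C: partially recovered destination that stayed up (newer events).
disk_c = Member("C", events=104, recovering=True)

# Naive choice: highest event count wins, so the incomplete disk C is picked.
naive = max([disk_a, disk_c], key=lambda m: m.events)

# Proposed rule: the complete disk A is picked despite its lower count.
chosen = pick_reference([disk_a, disk_c])

print(naive.name, chosen.name)
```

With these made-up values the naive comparison selects C, while the
source-preferring rule selects A, which is the outcome argued for above.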

One other piece of information that may be relevant--we're using 2
member RAID1 units with one member marked write-mostly. At this time, I
don't have the specifics for which member (A or B) was the write-mostly
member in the example above, but I can find that out.

> I very much recommend running it read-only until you can determine which
> assembly pattern produces the most viable results.

Good tip. We were able to manually recover the array in the case
outlined above; now we're looking at fixing the kernel to prevent
it from happening again.
Thanks,
Brett