linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Martin Wilck <mwilck@arcor.de>
Cc: Albert Pauw <albert.pauw@gmail.com>, linux-raid@vger.kernel.org
Subject: Re: Bugreport ddf rebuild problems
Date: Tue, 6 Aug 2013 10:16:33 +1000	[thread overview]
Message-ID: <20130806101633.4b8f2374@notabene.brown> (raw)
In-Reply-To: <5200180C.6060604@arcor.de>

[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]

On Mon, 05 Aug 2013 23:24:28 +0200 Martin Wilck <mwilck@arcor.de> wrote:

> Hi Albert, Neil,
> 
> I just submitted a new patch series; patch 3/5 integrates your 2nd case
> as a new unit test and 4/5 should fix it.
> 
> However @Neil: I am not yet entirely happy with this solution. AFAICS
> there is a possible race condition here, if a disk fails and mdadm -CR
> is called to create a new array before the metadata reflecting the
> failure is written to disk. If a disk failure happens in one array,
> mdmon will call reconcile_failed() to propagate the failure to other
> already known arrays in the same container, by writing "faulty" to the
> sysfs state attribute. It can't do that for a new container though.
> 
> I thought that process_update() may need to check the kernel state of
> array members against meta data state when a new VD configuration record
> is received, but that's impossible because we can't call open() on the
> respective sysfs files. It could be done in prepare_update(), but that
> would require major changes, I wanted to ask you first.
> 
> Another option would be changing manage_new(). But we don't seem to have
> a suitable metadata handler method to pass the meta data state to the
> manager....
> 
> Ideas?

Thanks for the patches - I applied them all.

Is there a race here?  When "mdadm -C" looks at the metadata the device will
either be an active member of another array, or it will be marked faulty.
Either way mdadm won't use it.

If the first array was created to use only (say) half of each device and the
second array was created with a size to fit in the other half of the device
then it might get interesting.
"mdadm -C" might see that everything looks good, create the array using the
second half of that drive that has just failed, and give that info to mdmon.

I suspect that ddf_open_new (which currently looks like it is just a stub)
needs to help out here.
When manage_new() gets told about a new array it will collect relevant info
from sysfs and call ->open_new() to make sure it matches the metadata.
ddf_open_new should check that all the devices in the array are recorded as
working in the metadata.  If any are failed, it can write 'faulty' to the
relevant state_fd.

Possibly the same thing can be done generically in manage_new() as you
suggested.  After the new array has been passed over to the monitor thread,
manage_new() could check if any devices should be failed much like
reconcile_failed() does and just fail them.

Does that make any sense?  Did I miss something?

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2013-08-06  0:16 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-01 19:30 Bugreport ddf rebuild problems Albert Pauw
2013-08-01 19:09 ` Martin Wilck
     [not found]   ` <CAGkViCFT0+qm9YAnAJM7JGgVg0RTJi8=HAYDTMs-mfhXinqdcg@mail.gmail.com>
2013-08-01 21:13     ` Martin Wilck
2013-08-01 22:09       ` Martin Wilck
2013-08-01 22:37         ` Martin Wilck
2013-08-03  9:43           ` Albert Pauw
2013-08-04  9:47             ` Albert Pauw
2013-08-05 16:55               ` Albert Pauw
2013-08-05 21:24                 ` Martin Wilck
2013-08-06  0:16                   ` NeilBrown [this message]
2013-08-06 21:26                     ` Martin Wilck
2013-08-06 21:37                       ` Patches related to current discussion mwilck
2013-08-06 21:38                       ` [PATCH 6/9] tests/10ddf-fail-spare: more sophisticated result checks mwilck
2013-08-06 21:38                       ` [PATCH 7/9] tests/10ddf-fail-create-race: test handling of fail/create race mwilck
2013-08-06 21:38                       ` [PATCH 8/9] DDF: ddf_open_new: check device status for new subarray mwilck
2013-08-06 21:38                       ` [PATCH 9/9] Create: set array status to frozen until monitoring starts mwilck
2013-08-08  0:44                         ` NeilBrown
2013-08-08  7:31                           ` Martin Wilck
2013-08-07 18:07                       ` Bugreport ddf rebuild problems Albert Pauw
2013-08-08  0:40                       ` NeilBrown
     [not found] <CAGkViCHPvbmcehFvACBKVFFCw+DdnjqvK2uNGmvKrFki+n9n-Q@mail.gmail.com>
2013-08-05  6:21 ` NeilBrown
2013-08-05  7:17   ` Albert Pauw

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130806101633.4b8f2374@notabene.brown \
    --to=neilb@suse.de \
    --cc=albert.pauw@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=mwilck@arcor.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).