From: Neil Brown <neilb@suse.de>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Ciechanowski, Ed" <ed.ciechanowski@intel.com>,
"Labun, Marcin" <Marcin.Labun@intel.com>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: [GIT PATCH 0/2] external-metadata recovery checkpointing for 2.6.33
Date: Wed, 16 Dec 2009 16:16:13 +1100
Message-ID: <20091216161613.226a6a38@notabene.brown>
In-Reply-To: <e9c3a7c20912151003y942a4aex803e1e6722f23f31@mail.gmail.com>
On Tue, 15 Dec 2009 11:03:06 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> On Mon, Dec 14, 2009 at 9:19 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > On second thought, if we get to activate_spare() it's already too
> > late. Moving this to mdadm at assembly time (prior to setting
> > readonly) is a better approach.
> >
>
> Problem. slot_store() in the array inactive case currently does:
>
>     /* assume it is working */
>     clear_bit(Faulty, &rdev->flags);
>     clear_bit(WriteMostly, &rdev->flags);
>     set_bit(In_sync, &rdev->flags);
>     sysfs_notify_dirent(rdev->sysfs_state);
>
> i.e. sets the disk insync even if we specified a recovery_start <
> MaxSector. If userspace can guarantee that the array stays inactive
> then it can write to 'recovery_start' after 'slot' and catch attempts
> to cold_add() out-of-sync disks on pre-2.6.33 kernels, but that gives
> a window of invalid configuration. The other fix is to remove the
> set_bit(In_sync), and then for the pre-2.6.33 case userspace would
> need to disallow adding out-of-sync disks and force them through the
> hot_add() case. This is how mdadm/mdmon currently operates, but that
> is a surprising ABI quirk when switching to/from 2.6.33. A third
> option is to allow recovery_start_store to be modified while the array
> is read only. Although not my favorite, because it requires tricky
> mdmon logic to catch activate_spare() attempts before the monitor
> thread starts touching the array, it has the benefit of not changing
> any old behavior and no window of invalid configuration. Thoughts??
I'm tempted to wait a bit longer and see if you find a solution,
as you seem to be progressing quite well :-) But I won't.
I imagine there are two cases:
 1/ assembling an array from devices, some of which might be partially
    recovered,
 2/ re-adding a device to an array which is already active.
In the first case, mdadm would:
- add the disk (write to new_dev)
- set the slot - this sets 'In_sync'
- set the recovery_start - this clears 'In_sync' as required.
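The assembly-time ordering above can be illustrated with a toy userspace model. This is only a sketch: the struct, the model_* function names and MODEL_MAX_SECTOR are hypothetical stand-ins rather than kernel code, and treating a write of MaxSector as "fully in sync" is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's MaxSector ("fully recovered"). */
#define MODEL_MAX_SECTOR UINT64_MAX

/* Toy userspace model of the per-device state discussed above. */
struct model_rdev {
    bool in_sync;             /* models the In_sync flag */
    bool faulty;              /* models the Faulty flag */
    int raid_disk;            /* slot number, -1 when unassigned */
    uint64_t recovery_offset; /* sectors known to be recovered */
};

/* A freshly added device: no slot, nothing recovered, not in sync. */
static struct model_rdev model_new_dev(void)
{
    struct model_rdev rdev = { false, false, -1, 0 };
    return rdev;
}

/* Models slot_store() on an inactive array: it assumes the disk works. */
static void model_slot_store(struct model_rdev *rdev, int slot)
{
    rdev->raid_disk = slot;
    rdev->faulty = false;
    rdev->in_sync = true;
}

/* Models a write to recovery_start: In_sync only holds at MaxSector. */
static void model_recovery_start_store(struct model_rdev *rdev, uint64_t start)
{
    rdev->recovery_offset = start;
    rdev->in_sync = (start == MODEL_MAX_SECTOR);
}

/* Assembly-time order from the text: slot first, then recovery_start. */
static struct model_rdev model_assemble(int slot, uint64_t recovery_start)
{
    struct model_rdev rdev = model_new_dev();

    model_slot_store(&rdev, slot);
    model_recovery_start_store(&rdev, recovery_start);
    return rdev;
}
```

In this model, model_assemble(1, 4096) leaves the device in slot 1 with In_sync clear, whereas calling model_slot_store() alone leaves In_sync set, which is the window Dan describes.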
In the second case either mdadm or mdmon would:
 - write 'frozen' to sync_action, which would inhibit any call
   to remove_and_add_spares
 - add the disk
 - set recovery_start
 - set the slot
 - write 'recover' to sync_action
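As a rough sketch of how userspace might drive the second sequence, the code below builds the ordered list of sysfs writes and applies them one at a time. The helper names, the struct and the buffer sizes are hypothetical. The attribute layout (new_dev taking "major:minor", per-device attributes under dev-NAME/ inside the array's md directory) is an assumption about the md sysfs interface, not something stated in this thread.

```c
#include <stdio.h>
#include <string.h>

#define READD_STEPS 5

/* One sysfs write: an attribute path relative to the md/ directory, and a value. */
struct readd_step {
    char attr[64];
    char value[32];
};

/* Copy attr/value into a step; returns -1 if either would not fit. */
static int set_step(struct readd_step *s, const char *attr, const char *value)
{
    if (strlen(attr) >= sizeof(s->attr) || strlen(value) >= sizeof(s->value))
        return -1;
    strcpy(s->attr, attr);
    strcpy(s->value, value);
    return 0;
}

/*
 * Fill 'steps' with the writes for re-adding a device to an active array,
 * in the order given above. dev_name (e.g. "sdc") and dev_num
 * ("major:minor") are caller-supplied, hypothetical inputs.
 */
static int plan_readd(struct readd_step steps[READD_STEPS],
                      const char *dev_name, const char *dev_num,
                      unsigned long long recovery_start, int slot)
{
    char attr[64];
    char value[32];
    int n;

    if (set_step(&steps[0], "sync_action", "frozen") != 0)
        return -1;
    if (set_step(&steps[1], "new_dev", dev_num) != 0)
        return -1;

    /* recovery_start is written before slot on an active array. */
    n = snprintf(attr, sizeof(attr), "dev-%s/recovery_start", dev_name);
    if (n < 0 || (size_t)n >= sizeof(attr))
        return -1;
    snprintf(value, sizeof(value), "%llu", recovery_start);
    if (set_step(&steps[2], attr, value) != 0)
        return -1;

    n = snprintf(attr, sizeof(attr), "dev-%s/slot", dev_name);
    if (n < 0 || (size_t)n >= sizeof(attr))
        return -1;
    snprintf(value, sizeof(value), "%d", slot);
    if (set_step(&steps[3], attr, value) != 0)
        return -1;

    return set_step(&steps[4], "sync_action", "recover");
}

/* Write one step under md_dir (e.g. "/sys/block/md0/md"); -1 on any error. */
static int apply_step(const char *md_dir, const struct readd_step *s)
{
    char path[256];
    FILE *f;
    int n = snprintf(path, sizeof(path), "%s/%s", md_dir, s->attr);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(s->value, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A caller would run plan_readd() once and then apply_step() for each entry in order, stopping at the first failure. If a step after the 'frozen' write fails, the caller would presumably still need to unfreeze sync_action; that error path is left out of this sketch.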
It is unfortunate that the setting of 'slot' and 'recovery_start'
must be in different orders in the different cases, but maybe that
isn't a tragedy.
Possibly I could change slot_store in the pers==NULL case to not
set In_sync if recovery_offset were not MaxSector, but
I'm not sure it is worth the effort.
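For illustration only, the possible slot_store change could be modelled in userspace as below. The sketch_* names and SKETCH_MAX_SECTOR are hypothetical, and the model assumes a device with no recorded checkpoint starts with recovery_offset equal to MaxSector; none of this is the actual kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's MaxSector. */
#define SKETCH_MAX_SECTOR UINT64_MAX

struct sketch_rdev {
    bool in_sync;             /* models the In_sync flag */
    int raid_disk;            /* slot number, -1 when unassigned */
    uint64_t recovery_offset; /* MaxSector means no partial recovery recorded */
};

/* Assumed starting state: no slot, no checkpoint recorded, not yet in sync. */
static struct sketch_rdev sketch_new_dev(void)
{
    struct sketch_rdev rdev = { false, -1, SKETCH_MAX_SECTOR };
    return rdev;
}

/* Hypothetical variant of slot_store() for the pers == NULL case. */
static void sketch_slot_store(struct sketch_rdev *rdev, int slot)
{
    rdev->raid_disk = slot;
    /* Only assume the disk is in sync if no partial recovery was recorded. */
    if (rdev->recovery_offset == SKETCH_MAX_SECTOR)
        rdev->in_sync = true;
}

/* Models a write to recovery_start below MaxSector clearing In_sync. */
static void sketch_recovery_start_store(struct sketch_rdev *rdev, uint64_t start)
{
    rdev->recovery_offset = start;
    if (start != SKETCH_MAX_SECTOR)
        rdev->in_sync = false;
}
```

With this variant, writing 'slot' then 'recovery_start' and writing them in the opposite order both leave In_sync clear for a partially recovered device, while a device that only has its slot set is still treated as in sync, as it is today.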
Does that answer your concerns?
NeilBrown