From: Doug Ledford <dledford@redhat.com>
To: Christian Gatzemeier <c.gatzemeier@tu-bs.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: safe segmenting of conflicting changes, and hot-plugging between alternative versions
Date: Mon, 26 Apr 2010 13:11:03 -0400
Message-ID: <4BD5C927.6030608@redhat.com>
In-Reply-To: <loom.20100423T213339-797@post.gmane.org>
On 04/23/2010 05:04 PM, Christian Gatzemeier wrote:
> Phillip Susi <psusi <at> cfl.rr.com> writes:
>
>> when mdadm
>> --incremental sees the second disk claims the first disk is failed, but
>> it is active and working fine in the running array, it should realize
>> that the superblock on the second disk is wrong, and correct it, which
>> would leave the second disk as failed, removed, and neither use the out
>> of sync data on the disk, nor overwrite it with a copy from the first.
>
> "Correcting the superblocks" of conflicting members, would translate into having
> a defined way to mark those members as composing a segment that contains a known
> alternative version of the array. The earliest an alternative version can be
> detected, and thus be known and marked as such, is on an incident when a
> conflicting segment comes up while another segment of the array is already
> running degraded. (To simply support segments consisting of single raid member
> devices, it may be enough if a superblock marking itself as failed would mean it
> contains conflicting changes. Multi-member segments would require segment IDs.)
>
> IMHO all segments with alternative versions can be marked as known on such
> incidents. However, whether the segments containing alternative versions
> continue to be assembled normally when they come up after the incident, as
> before, or get ignored in favor of the arbitrary first segment of the
> incident, should be configurable.
>
> For users that don't need or want to be able to switch between versions of an
> array by simply switching disks in a hot-pluggable manner, and for those
> concerned about a failure mode that may exist and make disks available in an
> alternating manner without them noticing it until an incident, I
> suggested "AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS".
>
> In order to manage segments with alternative versions in a hot-plug manner
> however, all segments need to continue to show up under their real array ID, if
> they are connected first or one at a time. (KNOWN_ALTERNATIVE_VERSIONS need to
> be assembled if they come up.) If the segments were transformed into
> separate arrays, the system would no longer recognize the segment as part of
> the array, and would fail to boot from or open it correctly. And you wouldn't
> be able to switch between versions by switching the disks that are connected.
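The segment-marking policy proposed in the quoted text can be modeled roughly
as below. This is a minimal Python sketch under assumptions: `Member`,
`segment_id`, `known_alternative`, `ArrayAssembly` and
`assemble_known_alternatives` are hypothetical names, not existing mdadm
fields or options; `assemble_known_alternatives=False` stands in for the
proposed "AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS" setting. Only
the segment that arrives second is marked, so the first segment of the
incident stays the preferred one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    device: str
    segment_id: int                  # hypothetical: which divergent segment the disk belongs to
    known_alternative: bool = False  # hypothetical superblock flag set at a conflict incident


class ArrayAssembly:
    """Toy model of incremental assembly for one array ID (illustrative only)."""

    def __init__(self, assemble_known_alternatives: bool):
        # False ~ the proposed "AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS"
        self.assemble_known_alternatives = assemble_known_alternatives
        self.running_segment: Optional[int] = None
        self.members: List[Member] = []

    def member_appears(self, m: Member) -> str:
        if self.running_segment is None:
            # Nothing running yet: this segment was connected first or on its own,
            # so it is assembled under the real array ID unless policy says otherwise.
            if m.known_alternative and not self.assemble_known_alternatives:
                return "ignored"
            self.running_segment = m.segment_id
            self.members.append(m)
            return "started"
        if m.segment_id == self.running_segment:
            self.members.append(m)
            return "added"
        # A conflicting segment arrived while another one runs degraded (the
        # incident): record it instead of merging out-of-sync data.
        m.known_alternative = True
        return "marked-alternative"
```

With the flag set to True, a marked segment that is plugged in alone still
comes up under the original array ID, which is what hot-plugging between
versions needs; with it set to False, it is left alone.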
Actually, I have a feature request that I haven't gotten around to yet
for something similar to this: the ability to pause a raid1 array,
causing a member of the array to stop all updates while the rest of the
array operates as normal. You then do your system updates, do your
testing, and if you decide it was a bad update, then you revert the
paused state of the array and you are back to the state you had prior to
the update. The basic guidelines that I've worked out for how this must
be done are as follows:
1) Use mdadm to mark a constituent device of an array as a paused member
(add an internal write intent bitmap if no bitmap currently exists and
use bitmap to track changed areas of array).
2) Reboot; the pause becomes effective on the next assembly (this is
because you want to make sure the pause takes effect at a point in time
when the filesystem is clean; pausing the array while live would be bad).
3) Perform updates, do testing.
4) Either unpause the array, keeping the current setup (in which case the
unpause is immediate and you start syncing the current array data to the
paused array member), or unpause --revert, in which case the unpause
behaves just like the pause did and waits until the next reboot to become
effective, for the obvious reason that we can't revert filesystem state
on a live filesystem.
5) If we added a bitmap where none existed before, remove it.
Done.
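The pause/unpause steps above can be sketched as a toy model of a mirror.
Everything here is illustrative and hypothetical, not mdadm's interface:
`assemble()` stands in for the reboot/reassembly, a Python set of chunk
numbers stands in for the write-intent bitmap, and `CHUNK` is an arbitrary
chunk size. Step 5 (dropping a temporarily added bitmap) is not modeled.

```python
CHUNK = 4  # blocks covered by one bitmap bit (illustrative value)


class PausableMirror:
    """Toy RAID1 model of the proposed pause/unpause feature (hypothetical)."""

    def __init__(self, nblocks: int, nmembers: int = 2):
        self.nblocks = nblocks
        self.disks = [[0] * nblocks for _ in range(nmembers)]
        self.paused = None           # member currently frozen, if any
        self.pending_pause = None    # step 1: takes effect at next assembly
        self.pending_revert = False  # step 4 (--revert): takes effect at next assembly
        self.dirty = set()           # stand-in bitmap: chunks written while paused

    def _active(self) -> int:
        return next(i for i in range(len(self.disks)) if i != self.paused)

    def _copy_chunk(self, chunk: int, src: int, dst: int) -> None:
        for b in range(chunk * CHUNK, min((chunk + 1) * CHUNK, self.nblocks)):
            self.disks[dst][b] = self.disks[src][b]

    def request_pause(self, member: int) -> None:
        self.pending_pause = member

    def assemble(self) -> None:
        """Models a reboot: deferred pause/revert requests become effective here."""
        if self.pending_revert:
            # Revert: the paused member's old data wins for every dirty chunk.
            for c in self.dirty:
                for d in range(len(self.disks)):
                    if d != self.paused:
                        self._copy_chunk(c, self.paused, d)
            self.paused, self.pending_revert = None, False
            self.dirty.clear()
        if self.pending_pause is not None:
            self.paused, self.pending_pause = self.pending_pause, None
            self.dirty.clear()

    def write(self, block: int, value: int) -> None:
        for i, disk in enumerate(self.disks):
            if i != self.paused:  # the paused member receives no updates
                disk[block] = value
        if self.paused is not None:
            self.dirty.add(block // CHUNK)

    def read(self, block: int) -> int:
        return self.disks[self._active()][block]

    def unpause(self) -> None:
        """Keep current data: immediate, resync only dirty chunks to the paused member."""
        src = self._active()
        for c in self.dirty:
            self._copy_chunk(c, src, self.paused)
        self.paused = None
        self.dirty.clear()

    def unpause_revert(self) -> None:
        self.pending_revert = True
```

The bitmap is what keeps both exits cheap: whichever direction the data
flows, only chunks touched while paused need to be copied.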
However, this is fairly orthogonal to the original problem you
mentioned, specifically that mounting two members of a raid1 array
independently can trick them into thinking they are in sync when they
aren't. The simplest solution to that problem would be to add a
generation count to each device's data in each superblock such that if
device B is failed from the array, then the subsequent update to the
superblock on device A would record not only that device B was failed,
but what the generation count was when device B was failed. On
subsequent reassembly, if device B reappears, and the generation count
on device B does not match the recorded generation count for device B's
failure incident, then refuse to reassemble the devices into the same
array as this would indicate that the arrays have changed independent of
each other. That would probably require a superblock version update
to start storing that for each failed device, unless Neil could find
some place to stash the data in the current superblock layouts.
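The generation-count check described above can be sketched as follows. This
is a minimal Python sketch under assumptions: `Superblock`, `failed_at`,
`mark_failed`, `write_update` and `may_rejoin` are hypothetical names, with
`failed_at` playing the role of the new per-failed-device superblock field.
The survivor records its own generation at the moment it fails the peer,
which equals the peer's generation because the two were in sync up to that
point.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Superblock:
    generation: int = 0
    # hypothetical new field: failed peer -> generation at the failure incident
    failed_at: Dict[str, int] = field(default_factory=dict)


def mark_failed(survivor: Superblock, failed_name: str) -> None:
    """Survivor fails a peer: remember the generation, then bump its own."""
    survivor.failed_at[failed_name] = survivor.generation
    survivor.generation += 1


def write_update(sb: Superblock) -> None:
    """Any later superblock update on a running member."""
    sb.generation += 1


def may_rejoin(survivor: Superblock, name: str, returning: Superblock) -> bool:
    recorded = survivor.failed_at.get(name)
    if recorded is None:
        return True  # never failed by this survivor: normal assembly rules apply
    # Unchanged since the failure: merely stale, safe to re-add and resync.
    # Anything else means it was updated independently: refuse to reassemble.
    return returning.generation == recorded
```

The second case in the usage below shows why comparing the two devices'
generation counts directly is not enough: after a split, both halves can
reach the same count.

```python
a, b = Superblock(10), Superblock(10)
mark_failed(a, "sdb1")            # B pulled; A keeps running
write_update(a)
may_rejoin(a, "sdb1", b)          # True: B is only stale

a2, b2 = Superblock(10), Superblock(10)
mark_failed(a2, "sdb1")           # each half booted on its own
mark_failed(b2, "sda1")
a2.generation == b2.generation    # True: a naive check is fooled
may_rejoin(a2, "sdb1", b2)        # False: independent changes detected
```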
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband