* [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
@ 2006-11-06 23:47 Montrose, Ernest
2006-11-16 9:10 ` Philipp Reisner
2006-11-18 11:00 ` Philipp Reisner
0 siblings, 2 replies; 9+ messages in thread
From: Montrose, Ernest @ 2006-11-06 23:47 UTC (permalink / raw)
To: Graham, Simon, drbd-dev
[-- Attachment #1: Type: text/plain, Size: 514 bytes --]
When running Primary/Primary, if the Heartbeat connection goes down, we
always end up in split brain when we recover. Simon had an idea, which I
have implemented. He is on vacation, so this may not reflect his exact idea.
Essentially, with this change we do not create a new current UUID on a
node unless I/O is seen. This prevents the split brain when both
nodes are primary but only one node originates I/O and the other never
does. The other node is only a stand-by in that case.
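As a rough user-space illustration of the idea (not the kernel patch itself), the deferral can be modelled as below. All type and function names are made up for this sketch, the "fresh" value stands in for a random UUID, and moving the old current UUID into the bitmap slot is a simplifying assumption about what starting a new generation does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these are NOT the DRBD kernel structures. */
enum toy_role { ROLE_SECONDARY, ROLE_PRIMARY };
enum toy_conn { CONN_STANDALONE, CONN_CONNECTED };

struct toy_dev {
    enum toy_role role;
    enum toy_conn conn;
    uint64_t current_uuid;  /* identifies the current data generation */
    uint64_t bitmap_uuid;   /* 0 means no new generation was started yet */
};

/* Start a new data generation: remember the old current UUID in the
 * bitmap slot and install a fresh current UUID (assumed behaviour). */
static void toy_uuid_new_current(struct toy_dev *d, uint64_t fresh)
{
    assert(fresh != 0 && fresh != d->current_uuid);
    d->bitmap_uuid = d->current_uuid;
    d->current_uuid = fresh;
}

/* Called for every write request. Returns true if this write was the
 * one that started a new generation; later writes leave the UUIDs alone. */
static bool toy_on_write(struct toy_dev *d, uint64_t fresh)
{
    if (d->role == ROLE_PRIMARY &&
        d->bitmap_uuid == 0 &&
        d->conn == CONN_STANDALONE) {
        toy_uuid_new_current(d, fresh);
        return true;
    }
    return false;
}
```

In this model a primary that never sees a write keeps its old current UUID, so after reconnect only the node that actually wrote has moved on to a new generation.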
Take a look and let me know.
EM--
[-- Attachment #2: anti_spli_brain.patch --]
[-- Type: application/octet-stream, Size: 1175 bytes --]
Index: trunk/drbd/drbd_actlog.c
===================================================================
--- trunk/drbd/drbd_actlog.c (revision 6152)
+++ trunk/drbd/drbd_actlog.c (working copy)
@@ -259,6 +259,14 @@
 		spin_unlock_irq(&mdev->al_lock);
 		wake_up(&mdev->al_wait);
 	}
+
+	if (mdev->state.role == Primary &&
+	    mdev->bc->md.uuid[Bitmap] == 0 &&
+	    mdev->state.conn == StandAlone) {
+		/* Only do it if we have not yet done it... */
+		drbd_uuid_new_current(mdev);
+	}
+
 }
 
 void drbd_al_complete_io(struct Drbd_Conf *mdev, sector_t sector)
Index: trunk/drbd/drbd_main.c
===================================================================
--- trunk/drbd/drbd_main.c (revision 6152)
+++ trunk/drbd/drbd_main.c (working copy)
@@ -900,11 +900,6 @@
 			mdev->p_uuid = NULL;
 		}
 		if (inc_local(mdev)) {
-			if (ns.role == Primary && mdev->bc->md.uuid[Bitmap] == 0 ) {
-				/* Only do it if we have not yet done it... */
-				INFO("Creating new current UUID\n");
-				drbd_uuid_new_current(mdev);
-			}
 			if (ns.peer == Primary ) {
 				/* Note: The condition ns.peer == Primary implies
 				   that we are connected. Otherwise it would
* Re: [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
2006-11-06 23:47 [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch Montrose, Ernest
@ 2006-11-16 9:10 ` Philipp Reisner
2006-11-18 11:00 ` Philipp Reisner
1 sibling, 0 replies; 9+ messages in thread
From: Philipp Reisner @ 2006-11-16 9:10 UTC (permalink / raw)
To: drbd-dev; +Cc: Montrose, Ernest
On Tuesday, 7 November 2006 00:47, Montrose, Ernest wrote:
> When running Primary/Primary if the Heartbeat connection goes down when
> we recover we always split brain. Simon had an idea which I have
> implemented. He is on vacation so this may not reflect his exact idea.
>
> Essentially with this change, we do not create a new current UUID on the
> node unless I/O is seen. This prevent Split-Brain mitigation when both
> nodes are primary but only one node is originating I/O and never the
> other. He is only stand-by in that case.
>
> Take a look and let me know.
Hi Ernest,
I understand your reasoning, and I see the patch, which I guess does
what you expect of it.
I do not want to do it that way, for the following reasons:
* It is only applicable if you are using a 1-node filesystem
on a primary-primary DRBD cluster.
* I do not want users to do this, because with this setup it is
easily possible to mount the FS on both nodes concurrently.
I want to protect them from themselves ;)
* Users with a 1-node filesystem should use DRBD with the
primary and secondary roles.
* I would rather fix DRBD's split-brain recovery methods to deal
with a cluster crash of a primary-primary cluster (actually, this
is item 41 in the ROADMAP file).
I have a few hours today, so I will work on this today...
-Phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :
* Re: [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
2006-11-06 23:47 [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch Montrose, Ernest
2006-11-16 9:10 ` Philipp Reisner
@ 2006-11-18 11:00 ` Philipp Reisner
1 sibling, 0 replies; 9+ messages in thread
From: Philipp Reisner @ 2006-11-18 11:00 UTC (permalink / raw)
To: drbd-dev; +Cc: Montrose, Ernest
On Tuesday, 7 November 2006 00:47, Montrose, Ernest wrote:
> When running Primary/Primary if the Heartbeat connection goes down when
> we recover we always split brain. Simon had an idea which I have
> implemented. He is on vacation so this may not reflect his exact idea.
>
> Essentially with this change, we do not create a new current UUID on the
> node unless I/O is seen. This prevent Split-Brain mitigation when both
> nodes are primary but only one node is originating I/O and never the
> other. He is only stand-by in that case.
>
> Take a look and let me know.
>
Hi Ernest and Simon,
I found a good example of why I do not like this approach
(P = primary, M = FS mounted):
N1/P   ---  N2/P/M   Both primary; FS mounted on N2 and completely idle.
N1/P   - -  N2/P/M   Network breaks (UUIDs still unchanged on both sides).
N1/P/M - -  N2/P/M   User mounts FS on N1 (and modifies data; new UUID on N1).
N1/P   - -  N2/P/M   User unmounts FS on N1.
N1/P   ->-  N2/P/M   Network gets repaired. Sync from N1 to N2.
With the patch you sent, we would get a resync from N1 to N2, instantly
corrupting all the cached information that the FS on N2 might have of
the data!
I understand your test scenario, so I introduced this solution to your
problem:
Implemented a new after-sb-0pri (after split brain, 0 primaries) policy:
"discard-zero-changes"
Auto sync from the node that modified
blocks during the split-brain situation, but only
if the target did not touch a single block.
If both nodes touched their data, this policy
falls back to disconnect.
And a new after-sb-1pri & after-sb-2pri policy:
"violently-as0p" Always take the decision of the "after-sb-0pri"
algorithm, even if that causes an erratic change
of the primary's view of the data.
This is only useful if you use a 1-node FS (i.e.
not OCFS2 or GFS) with the allow-two-primaries
flag, _AND_ you really know what you are doing.
This is DANGEROUS and MAY CRASH YOUR MACHINE if you
have a FS mounted on the primary node.
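The decision rule described for "discard-zero-changes" can be sketched as a small function. The names are hypothetical, the inputs are assumed to be the number of blocks each side changed during the split brain, and the case where neither side changed anything is not covered by the description, so the sketch reports it separately instead of guessing.

```c
#include <assert.h>
#include <stdint.h>

enum asb_decision {
    ASB_DISCONNECT,       /* no automatic resolution, operator must decide */
    ASB_SYNC_FROM_SELF,   /* peer changed nothing: peer becomes sync target */
    ASB_SYNC_FROM_PEER,   /* we changed nothing: we become sync target */
    ASB_NOTHING_CHANGED   /* assumption: case not described in the mail */
};

/* "discard-zero-changes": resync automatically only if the would-be
 * target did not touch a single block; otherwise fall back to disconnect. */
static enum asb_decision asb_discard_zero_changes(uint64_t changed_self,
                                                  uint64_t changed_peer)
{
    if (changed_self == 0 && changed_peer == 0)
        return ASB_NOTHING_CHANGED;
    if (changed_peer == 0)
        return ASB_SYNC_FROM_SELF;
    if (changed_self == 0)
        return ASB_SYNC_FROM_PEER;
    return ASB_DISCONNECT;
}

/* "violently-as0p": with one or two primaries, simply reuse the 0pri
 * decision. This sketch assumes the 0pri policy is discard-zero-changes. */
static enum asb_decision asb_violently_as0p(uint64_t changed_self,
                                            uint64_t changed_peer)
{
    return asb_discard_zero_changes(changed_self, changed_peer);
}
```

The second function makes the danger visible: it can return ASB_SYNC_FROM_PEER for a node that is primary, which is exactly the "erratic change of the primary's view of the data" the warning refers to.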
Now you need to configure it like this:
after-sb-0pri discard-zero-changes;
after-sb-1pri violently-as0p;
after-sb-2pri violently-as0p;
And you can do the tests with the behaviour you expect, while other
users are free to select another behaviour.
-Phil
* RE: [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
@ 2006-11-16 12:52 Graham, Simon
2006-11-17 14:04 ` Lars Ellenberg
[not found] ` <5e77099e0611180419s77b9e3f5u172d853634174bd8@mail.gmail.com>
0 siblings, 2 replies; 9+ messages in thread
From: Graham, Simon @ 2006-11-16 12:52 UTC (permalink / raw)
To: Philipp Reisner, drbd-dev; +Cc: Montrose, Ernest
Not sure I agree that the current behavior is protecting users from themselves -- it only causes the split-brain if you lose the n/w and during 'normal' operation and there is nothing that protects against mounting a 1-node fs on both nodes of a primary-primary DRBD cluster.
Running primary-secondary doesn't work if you are in a situation where it is not possible to switch primaryness when failing over; a good example of that is if you want to run a Xen virtual machine on top of a DRBD partition and support live migration of the VM (the problem is that Xen doesn't provide the means to execute a script to change primaryness at the required point in the migration). Of course you could argue that this is a Xen bug _but_ pragmatically, the proposed patch to delay updating the UUID until an actual write occurs preserves (I believe) correctness in DRBD and works without introducing new features into Xen.
Recovering from split-brain automatically is of course something that is incredibly valuable but I think it can be treated orthogonally to the proposed fix.
Simon
* Re: [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
2006-11-16 12:52 Graham, Simon
@ 2006-11-17 14:04 ` Lars Ellenberg
[not found] ` <5e77099e0611180419s77b9e3f5u172d853634174bd8@mail.gmail.com>
1 sibling, 0 replies; 9+ messages in thread
From: Lars Ellenberg @ 2006-11-17 14:04 UTC (permalink / raw)
To: drbd-dev
/ 2006-11-16 07:52:14 -0500
\ Graham, Simon:
> Not sure I agree that the current behavior is protecting users from
> themselves -- it only causes the split-brain if you lose the n/w and
> during 'normal' operation and there is nothing that protects against
> mounting a 1-node fs on both nodes of a primary-primary DRBD cluster.
>
> Running primary-secondary doesn't work if you are in a situation where
> it is not possible to switch primaryness when failing over; a good
> example of that is if you want to run a Xen virtual machine on top of
> a DRBD partition and support live migration of the VM (the problem is
> that Xen doesn't provide the means to execute a script to change
> primaryness at the required point in the migration). Of course you
> could argue that this is a Xen bug _but_ pragmatically, the proposed
> patch to delay updating the UUID until an actual write occurs
> preserves (I believe) correctness in DRBD and works without
> introducing new features into Xen.
>
> Recovering from split-brain automatically is of course something that
> is incredibly valuable but I think it can be treated orthogonally to
> the proposed fix.
I agree here.
But see below why I still think Philipp is "right", too :)
But I think the provided patch (doing it only in al_begin_io) is wrong.
actually it needs to be done as soon as the bitmap is touched,
so it needs to be done in "set_out_of_sync", which may be called in the
cleanup code after connection loss, too, and typically will be, on the
actually active node.
when there is a journalled file system mounted, even if it had been
idle, there are periodic updates for the journal/superblock, so it would
be deferred only a few seconds on the actually active node.
on the "Primary" but inactive node, it would indeed defer this uuid update,
thus preventing the "split brain"...
one alternative would be to update the uuids where it is done now,
but only if we have been opened RW (we have that information anyways
somewhere), and do it again (unless already done) as soon as we are
opened rw. that would be correct, I think, and easy.
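a rough user-space sketch of that alternative, with made-up names (this is not the kernel code, and the "fresh" argument stands in for a random uuid): bump the uuid at connection loss only if someone has the device open read-write, and otherwise catch up on the first read-write open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; field names are invented for this sketch. */
struct rw_dev {
    bool     primary;
    bool     connected;
    unsigned rw_openers;    /* how many times the device is open read-write */
    uint64_t current_uuid;
    uint64_t bitmap_uuid;   /* 0 until a new generation has been started */
};

/* Start a new generation if, and only if, all conditions hold and it
 * has not been done already. */
static void rw_maybe_new_uuid(struct rw_dev *d, uint64_t fresh)
{
    if (d->primary && !d->connected &&
        d->rw_openers > 0 && d->bitmap_uuid == 0) {
        assert(fresh != 0);
        d->bitmap_uuid = d->current_uuid;
        d->current_uuid = fresh;
    }
}

/* Hook 1: the place where the uuid is updated today. */
static void rw_on_connection_loss(struct rw_dev *d, uint64_t fresh)
{
    d->connected = false;
    rw_maybe_new_uuid(d, fresh);
}

/* Hook 2: every open; only a read-write open can trigger the update. */
static void rw_on_open(struct rw_dev *d, bool read_write, uint64_t fresh)
{
    if (read_write)
        d->rw_openers++;
    rw_maybe_new_uuid(d, fresh);
}
```

in this model a primary that nobody has opened read-write keeps its uuid across the connection loss, while the active node gets its new uuid immediately.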
Why we could leave the code as is, anyways:
we can leave it like it is right now, because the "after split brain
recovery" strategy "discard least changes" would do the same thing:
your assumption was that the "inactive" node makes no changes.
zero is less than anything else...
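as a sketch of that argument (hypothetical names; how a tie is resolved is not discussed here, so the sketch just reports it): the side with more changed blocks becomes sync source, and the side with fewer changes has them discarded.

```c
#include <assert.h>
#include <stdint.h>

enum dlc_source { DLC_SELF, DLC_PEER, DLC_TIE };

/* "discard least changes": the node that changed more blocks during the
 * split brain is the sync source; the other node's changes are dropped. */
static enum dlc_source dlc_pick_sync_source(uint64_t changed_self,
                                            uint64_t changed_peer)
{
    if (changed_self > changed_peer)
        return DLC_SELF;
    if (changed_peer > changed_self)
        return DLC_PEER;
    return DLC_TIE;  /* assumption: tie handling is left open in this sketch */
}
```

an idle stand-by node reports zero changed blocks, so any active peer with at least one changed block wins, which gives the same outcome the patch was after.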
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
[parent not found: <5e77099e0611180419s77b9e3f5u172d853634174bd8@mail.gmail.com>]
* [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
[not found] ` <5e77099e0611180419s77b9e3f5u172d853634174bd8@mail.gmail.com>
@ 2006-11-18 12:20 ` Sudhakar Mekathotti
2006-11-20 12:39 ` Lars Ellenberg
0 siblings, 1 reply; 9+ messages in thread
From: Sudhakar Mekathotti @ 2006-11-18 12:20 UTC (permalink / raw)
To: drbd-dev
[-- Attachment #1: Type: text/plain, Size: 2389 bytes --]
On 11/16/06, Graham, Simon <Simon.Graham@stratus.com> wrote:
>
> Not sure I agree that the current behavior is protecting users from
> themselves -- it only causes the split-brain if you lose the n/w and during
> 'normal' operation and there is nothing that protects against mounting a
> 1-node fs on both nodes of a primary-primary DRBD cluster.
>
> Running primary-secondary doesn't work if you are in a situation where it
> is not possible to switch primaryness when failing over; a good example of
> that is if you want to run a Xen virtual machine on top of a DRBD partition
> and support live migration of the VM (the problem is that Xen doesn't
> provide the means to execute a script to change primaryness at the required
> point in the migration). Of course you could argue that this is a Xen bug
> _but_ pragmatically, the proposed patch to delay updating the UUID until an
> actual write occurs preserves (I believe) correctness in DRBD and works
> without introducing new features into Xen.
>
> Recovering from split-brain automatically is of course something that is
> incredibly valuable but I think it can be treated orthogonally to the
> proposed fix.
I think from a technical perspective, automatically recovering from
split-brain is nice to have. But from a user perspective, I would in almost
all cases refrain from using that feature as I would like to make double
sure my data is consistent and makes 'business sense' before electing which
disk to be primary.
[-- Attachment #2: Type: text/html, Size: 3241 bytes --]
* Re: [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
2006-11-18 12:20 ` Sudhakar Mekathotti
@ 2006-11-20 12:39 ` Lars Ellenberg
0 siblings, 0 replies; 9+ messages in thread
From: Lars Ellenberg @ 2006-11-20 12:39 UTC (permalink / raw)
To: drbd-dev
/ 2006-11-18 12:20:56 +0000
\ Sudhakar Mekathotti:
> On 11/16/06, Graham, Simon <Simon.Graham@stratus.com> wrote:
> >
> >Not sure I agree that the current behavior is protecting users from
> >themselves -- it only causes the split-brain if you lose the n/w and during
> >'normal' operation and there is nothing that protects against mounting a
> >1-node fs on both nodes of a primary-primary DRBD cluster.
> >
> >Running primary-secondary doesn't work if you are in a situation where it
> >is not possible to switch primaryness when failing over; a good example of
> >that is if you want to run a Xen virtual machine on top of a DRBD partition
> >and support live migration of the VM (the problem is that Xen doesn't
> >provide the means to execute a script to change primaryness at the required
> >point in the migration). Of course you could argue that this is a Xen bug
> >_but_ pragmatically, the proposed patch to delay updating the UUID until an
> >actual write occurs preserves (I believe) correctness in DRBD and works
> >without introducing new features into Xen.
> >
> >Recovering from split-brain automatically is of course something that is
> >incredibly valuable but I think it can be treated orthogonally to the
> >proposed fix.
>
>
> I think from a technical perspective, automatically recovering from
> split-brain is nice to have. But from a user perspective, I would in almost
> all cases refrain from using that feature as I would like to make double
> sure my data is consistent and makes 'business sense' before electing which
> disk to be primary.
and that is the reason why all the "recovery strategies" are configurable.
different deployments, different requirements, different settings.
one possible (and default) strategy is "disconnect",
i.e. refuse to talk to each other until operator intervenes.
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
* RE: [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
@ 2006-11-20 13:38 Montrose, Ernest
2006-11-20 13:53 ` Philipp Reisner
0 siblings, 1 reply; 9+ messages in thread
From: Montrose, Ernest @ 2006-11-20 13:38 UTC (permalink / raw)
To: Philipp Reisner, drbd-dev
Phil,
Thanks! I will retest our scenario with this new configuration.
Hopefully this will yield the desired results for our specific
configuration. Thanks a lot.
EM--
* Re: [Drbd-dev] DRBD8: Split-brain false positive on Primary/primary potential patch
2006-11-20 13:38 Montrose, Ernest
@ 2006-11-20 13:53 ` Philipp Reisner
0 siblings, 0 replies; 9+ messages in thread
From: Philipp Reisner @ 2006-11-20 13:53 UTC (permalink / raw)
To: drbd-dev; +Cc: Montrose, Ernest
On Monday, 20 November 2006 14:38, Montrose, Ernest wrote:
> Phil,
> Thanks! I will retest our scenario with this new configuration.
> Hopefully this will yield the desired results for our specific
> configuration. Thanks a lot.
>
While I was on the topic, I realized that there are more cases
where the user-assigned roles might be in conflict with the
sync direction that gets negotiated during connect.
Therefore I introduced the "rr-conflict" setting.
For your setup you should use "rr-conflict violently;"
-Phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :