* [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.
@ 2006-11-27 22:28 Montrose, Ernest
2006-11-30 15:11 ` Philipp Reisner
0 siblings, 1 reply; 7+ messages in thread
From: Montrose, Ernest @ 2006-11-27 22:28 UTC (permalink / raw)
To: drbd-dev
Hi all,
I am seeing a problem where both nodes get stuck in the WFBitMapS state
even across a reboot.
Dmesg also reveals this on both nodes:
Drbd0: [drbd0_receiver/4800] sock_sendmsg time expired, ko = 4294967256
.......
Drbd0: Split-brain detected, manually solved. Sync from this node
Both nodes report this split-brain message. My configuration for
device 0 has the following new settings in the net section:
rr-conflict violently;
after-sb-1pri violently-as0p;
after-sb-2pri violently-as0p;
Thanks for any help.
EM--
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.
2006-11-27 22:28 [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot Montrose, Ernest
@ 2006-11-30 15:11 ` Philipp Reisner
0 siblings, 0 replies; 7+ messages in thread
From: Philipp Reisner @ 2006-11-30 15:11 UTC (permalink / raw)
To: drbd-dev; +Cc: Montrose, Ernest
On Monday, 27 November 2006 23:28, Montrose, Ernest wrote:
> Hi all,
> I am seeing a problem where both nodes get stuck in the WFBitMaps state
> even across a reboot.
> Dmesg also reveals this on both nodes:
> Drbd0: [drbd0_receiver/4800] sock_sendmsg time expired, ko = 4294967256
> .......
> Drbd0: Split-brain detected, manuall solved. Sync from this node
> Both nodes reports this split brain message. My configuration for
> device 0 has the following new
> Settings NET:
> rr-conflict violently;
> after-sb-1pri violently-as0p;
> after-sb-2pri violently-as0p;
>
> Thanks for any help.
>
Hi Ernest,
Could you please give us a more detailed description of how we can
reproduce this here?
-Phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria http://www.linbit.com :
* RE: [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.
@ 2006-11-30 15:38 Montrose, Ernest
2006-11-30 19:59 ` Lars Ellenberg
0 siblings, 1 reply; 7+ messages in thread
From: Montrose, Ernest @ 2006-11-30 15:38 UTC (permalink / raw)
To: Philipp Reisner, drbd-dev
Phil,
This involves Xen VMs. I create one VM and put an I/O load on it
(something that keeps reading and writing). I then go to the host and do
an ifdown on the heartbeat interface in an attempt to force a split-brain
situation, followed by an ifup. Every now and then (not every time) the
problem occurs. When it does, it survives a reboot, and I have not yet
figured out how to get out of it.
I will try to find a more automatic way to reproduce it.
Thanks,
EM--
* Re: [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.
2006-11-30 15:38 Montrose, Ernest
@ 2006-11-30 19:59 ` Lars Ellenberg
2006-12-01 9:43 ` Philipp Reisner
0 siblings, 1 reply; 7+ messages in thread
From: Lars Ellenberg @ 2006-11-30 19:59 UTC (permalink / raw)
To: drbd-dev
/ 2006-11-30 10:38:33 -0500
\ Montrose, Ernest:
> Phil,
> This involves Xen Vm's. I would create one vm, I would then put an i/o
> load on there (Something that keeps reading and writing). I would then
> go to the host and do an ifdown on the heartbeat interface in an attempt
> to force a split brain situation. I would then do an ifup. And every now
> and then this
> would happen (not all the time). When it happens, it survives a reboot.
> I actually have not figured out how to get out of it.
>
> I will try to find a more automatic way to reproduce it.
drbd_receiver.c, drbd_asb_recover_0p
| ch_peer = mdev->p_uuid[UUID_SIZE];
| ch_self = drbd_bm_total_weight(mdev); ### <==
this ch_self may be different
from the one we communicated before, right?
| switch ( mdev->net_conf->after_sb_0p ) {
| ...
|
| case DiscardZeroChg:
so, if we communicated ch_self == 0, but now ch_self is > 0,
and ch_peer is 0 (inactive peer sees this reversed), then
| if( ch_peer == 0 && ch_self == 0) {
inactive peer does this, and may decide he is the source;
| rv=test_bit(DISCARD_CONCURRENT,&mdev->flags) ? -1 : 1;
| break;
| } else {
active peer does this branch,
and decides he is the source.
| if ( ch_peer == 0 ) { rv = 1; break; }
| if ( ch_self == 0 ) { rv = -1; break; }
| }
| if( mdev->net_conf->after_sb_0p == DiscardZeroChg ) break;
doh. have to think about that...
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
* RE: [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.
@ 2006-11-30 20:12 Montrose, Ernest
0 siblings, 0 replies; 7+ messages in thread
From: Montrose, Ernest @ 2006-11-30 20:12 UTC (permalink / raw)
To: Lars Ellenberg, drbd-dev
Lars,
Interesting... Actually, I am currently investigating a situation where,
starting from an initial creation state, two of my 4 devices sync while
the other two get stuck in the Inconsistent/Inconsistent disk state and,
of course, never sync.
Your analysis might hold for that case too. It only happens when my net
configuration is:
rr-conflict violently;
after-sb-0pri discard-zero-changes;
after-sb-1pri violently-as0p;
after-sb-2pri violently-as0p;
This happens automatically when we first install and enable DRBD anew.
The configuration above works fine if you install without these settings,
let the devices sync, and then add the settings later and reboot.
EM--
* Re: [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.
2006-11-30 19:59 ` Lars Ellenberg
@ 2006-12-01 9:43 ` Philipp Reisner
0 siblings, 0 replies; 7+ messages in thread
From: Philipp Reisner @ 2006-12-01 9:43 UTC (permalink / raw)
To: drbd-dev
[...]
Lars, that's right, this is the reason for the race condition.
I think this patch fixes it. With this patch it should no longer
be possible to get both nodes into the WFBitMapS state.
It is in SVN with revision 2607.
Ernest, could you repeat your tests with this revision? Thanks!
-Phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria http://www.linbit.com :
[-- Attachment #2: both_in_WFBitMaps_fix.diff --]
[-- Type: text/x-diff, Size: 1264 bytes --]
Index: drbd_receiver.c
===================================================================
--- drbd_receiver.c (revision 2603)
+++ drbd_receiver.c (working copy)
@@ -1668,7 +1668,7 @@
peer = mdev->p_uuid[Bitmap] & 1;
ch_peer = mdev->p_uuid[UUID_SIZE];
- ch_self = drbd_bm_total_weight(mdev);
+ ch_self = mdev->comm_bm_set;
switch ( mdev->net_conf->after_sb_0p ) {
case Consensus:
Index: drbd_main.c
===================================================================
--- drbd_main.c (revision 2603)
+++ drbd_main.c (working copy)
@@ -1288,7 +1288,8 @@
: 0;
}
- p.uuid[UUID_SIZE] = cpu_to_be64(drbd_bm_total_weight(mdev));
+ mdev->comm_bm_set = drbd_bm_total_weight(mdev);
+ p.uuid[UUID_SIZE] = cpu_to_be64(mdev->comm_bm_set);
uuid_flags |= mdev->net_conf->want_lose ? 1 : 0;
uuid_flags |= test_bit(CRASHED_PRIMARY, &mdev->flags) ? 2 : 0;
p.uuid[UUID_FLAGS] = cpu_to_be64(uuid_flags);
Index: drbd_int.h
===================================================================
--- drbd_int.h (revision 2603)
+++ drbd_int.h (working copy)
@@ -853,6 +853,7 @@
unsigned int peer_seq;
spinlock_t peer_seq_lock;
int minor;
+ unsigned long comm_bm_set; // communicated number of set bits.
};
static inline drbd_dev *minor_to_mdev(int minor)
* RE: [Drbd-dev] DRBD8: Stuck in WFBitMapS state even across reboot.
@ 2006-12-01 12:06 Montrose, Ernest
0 siblings, 0 replies; 7+ messages in thread
From: Montrose, Ernest @ 2006-12-01 12:06 UTC (permalink / raw)
To: Philipp Reisner, drbd-dev
Phil,
Thanks, will test and report back.
EM--