* [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
@ 2007-11-18 23:11 Montrose, Ernest
2007-11-27 10:36 ` Philipp Reisner
2007-11-27 13:06 ` Montrose, Ernest
0 siblings, 2 replies; 9+ messages in thread
From: Montrose, Ernest @ 2007-11-18 23:11 UTC (permalink / raw)
To: drbd-dev
Hi all,
There is a problem that manifests itself this way:
Consider 2 nodes, A and B. A issues a disconnect to r2, and B gets into drbd_receiver.c: drbd_disconnect(). While B is disconnecting,
it gets a "disconnect" request for r2. This hangs the receiver.
I am thinking that we should just not allow the state transition to "Disconnecting" if we are already doing so. We could redefine "StandAlone" to mean
less than or equal to "TearDown" in some cases. I include a patch to show this.
Thanks,
EM--
[-- Attachment #2: drbdsetup_hang.patch --]
[-- Type: text/plain, Size: 494 bytes --]
Index: drbd/drbd_main.c
===================================================================
--- drbd/drbd_main.c (revision 20723)
+++ drbd/drbd_main.c (working copy)
@@ -589,7 +589,7 @@
         if( (ns.conn == StartingSyncT || ns.conn == StartingSyncS ) &&
             os.conn > Connected) rv=SS_ResyncRunning;
-        if( ns.conn == Disconnecting && os.conn == StandAlone)
+        if( ns.conn == Disconnecting && os.conn <= TearDown )
                 rv=SS_AlreadyStandAlone;
         if( ns.disk > Attaching && os.disk == Diskless)
* Re: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
From: Philipp Reisner @ 2007-11-27 10:36 UTC (permalink / raw)
To: drbd-dev; +Cc: Montrose, Ernest
On Monday 19 November 2007 00:11:36 Montrose, Ernest wrote:
> Hi all,
> There is a problem that manifests itself this way:
>
> Consider 2 nodes, A and B. A issues a disconnect to r2, and B gets into
> drbd_receiver.c: drbd_disconnect(). While B is disconnecting, it gets a
> "disconnect" request for r2. This hangs the receiver.
>
> I am thinking that we should just not allow the state transition to
> "Disconnecting" if we are already doing so. We could redefine "StandAlone"
> to mean less than or equal to "TearDown" in some cases. I include a patch
> to show this.
>
Hi Ernest,
I tried hard to reproduce and understand this. I tried various
instrumentations, but I cannot reproduce it.
I assumed that it "hangs" in the drbd_state_lock() function, but
I could not find it by experiment or by drawing timing diagrams.
Could you provide some logs of this event?
Thanks!
The best I get:
Node1:
[42951592.560000] drbd0: state_locked
[42951592.560000] drbd0: state_unlocked
[42951592.560000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[42951592.560000] drbd0: state_locked
[42951592.560000] drbd0: state_unlocked
[42951592.560000] drbd0: Writing meta data super block now.
[42951592.560000] drbd0: sock was shut down by peer
[42951592.560000] drbd0: short read expecting header on sock: r=0
[42951592.560000] drbd0: sock_recvmsg returned -104
[42951592.560000] drbd0: asender terminated
[42951592.560000] drbd0: tl_clear()
[42951592.560000] drbd0: Connection closed
[42951592.560000] drbd0: conn( Disconnecting -> StandAlone )
[42951592.560000] drbd0: receiver terminated
Node2:
[42951603.570000] drbd0: state_locked
[42951603.570000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
[42951603.570000] drbd0: Writing meta data super block now.
[42951603.570000] drbd0: state_unlocked
[42951603.570000] drbd0: conn( TearDown -> Disconnecting )
[42951603.570000] drbd0: asender terminated
[42951603.570000] drbd0: tl_clear()
[42951603.570000] drbd0: Connection closed
[42951603.570000] drbd0: conn( Disconnecting -> StandAlone )
[42951603.570000] drbd0: receiver terminated
Of course the state transition TearDown -> Disconnecting is not right, but
I cannot reproduce a hang of the receiver...
-phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria http://www.linbit.com :
* RE: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
From: Montrose, Ernest @ 2007-11-27 13:06 UTC (permalink / raw)
To: Philipp Reisner, drbd-dev
Phil,
I looked at my notes... To reproduce this, you can fake the condition this
way:
* Issue a disconnect on node0 for r5.
* Locally on node1 we will get into drbd_receiver.c:drbd_disconnect().
While there (put a small delay in drbd_disconnect() or something),
issue a "drbdsetup /dev/drbd5 disconnect".
This last drbdsetup will time out with "No response from the DRBD
driver! Is the module loaded?"
But the driver will be waiting forever in
drbd_nl.c:drbd_nl_disconnect().
Hope that helps. If not, I'll back out my fix and send you the exact
instrumentation to reproduce it.
EM--
* Re: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
From: Philipp Reisner @ 2007-11-27 14:52 UTC (permalink / raw)
To: drbd-dev; +Cc: Montrose, Ernest
On Tuesday 27 November 2007 14:06:46 Montrose, Ernest wrote:
> Phil,
> I looked at my notes...To reproduce this you can fake the condition this
> way:
> * Issue a disconnect on node0 for r5.
> * Locally on node1 we will get into drbd_receiver.c:drbd_disconnect()
> and while there in drbd_disconnect() (Put a small delay there or
> something); issue a "drbdsetup /dev/drbd5 disconnect".
>
> This last drbdsetup will time out with " No response from the DRBD
> driver! Is the module loaded?"
> But the driver will be waiting forever in
> drbd_nl.c:drbd_nl_disconnect().
>
Yes. This is what I tested. I had a delay in drbd_disconnect().
I did not manage to get it into trouble.
BTW, while looking at the patch, I would have done it like this:
@@ -589,7 +589,8 @@ STATIC int is_valid_state_transition(drbd_dev* mdev,drbd_state_t ns,drbd_state_t
         if( (ns.conn == StartingSyncT || ns.conn == StartingSyncS ) &&
             os.conn > Connected) rv=SS_ResyncRunning;
-        if( ns.conn == Disconnecting && os.conn == StandAlone)
+        if ( ns.conn == Disconnecting &&
+             ( os.conn == StandAlone || os.conn == TearDown ) )
                 rv=SS_AlreadyStandAlone;
         if( ns.disk > Attaching && os.disk == Diskless)
-Phil
* RE: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
From: Montrose, Ernest @ 2007-11-27 15:06 UTC (permalink / raw)
To: Philipp Reisner, drbd-dev
Phil,
Interesting...With a delay at the end of drbd_disconnect() it happens
every time for me. What I did is that I delay for 30 seconds and
quickly issue the disconnect in that window.
I added this at the very end of drbd_disconnect:
if(os.conn == TearDown && ns.conn == Unconnected && mdev->minor ==11)
{
        INFO("drbd_disconnect: ##5# EM-- Done but waiting 30 seconds######\n");
        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(HZ * 30);
        INFO("drbd_disconnect: ##5# EM-- Done ##### waiting 30 seconds######\n");
}
Notice mdev->minor == 11; you can change the 11 to the minor of whatever
device you are disconnecting. Once you see the message "Done but waiting
30 seconds", issue the local disconnect. Put the instrumented
driver on one side (the side that will do the last disconnect).
BTW, I agree that your spin on the patch is less intrusive. I will test
that and let you know.
EM--
* RE: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
From: Montrose, Ernest @ 2007-11-27 21:51 UTC (permalink / raw)
To: Philipp Reisner, drbd-dev
Phil,
Your modification to the original patch will actually break it. The
reason is that we can get into "Disconnecting" from anywhere. Below are
some logs with the problem happening.
On Node0:
# drbdsetup /dev/drbd16 disconnect
Nov 27 16:38:20 node1 kernel: drbd16: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Nov 27 16:38:20 node1 kernel: drbd16: Creating new current UUID
Nov 27 16:38:20 node1 kernel: drbd16: short read expecting header on sock: r=-512
Nov 27 16:38:20 node1 kernel: drbd16: asender terminated
Nov 27 16:38:20 node1 kernel: drbd16: tl_clear()
Nov 27 16:38:20 node1 kernel: drbd16: Connection closed
Nov 27 16:38:20 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:20 node1 kernel: drbd16: conn( Disconnecting -> StandAlone )
Nov 27 16:38:20 node1 kernel: drbd16: receiver terminated
Nov 27 16:38:23 node1 kernel: drbd16: conn( StandAlone -> Unconnected )
Nov 27 16:38:23 node1 kernel: drbd16: receiver (re)started
Nov 27 16:38:23 node1 kernel: drbd16: conn( Unconnected -> WFConnection )
Nov 27 16:38:26 node1 kernel: drbd16: conn( WFConnection -> WFReportParams )
Nov 27 16:38:26 node1 kernel: drbd16: Handshake successful: DRBD Network Protocol version 86
Nov 27 16:38:26 node1 kernel: drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Nov 27 16:38:26 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node1 kernel: drbd16: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Nov 27 16:38:26 node1 kernel: drbd16: Began resync as SyncSource (will sync 4 KB [1 bits set]).
Nov 27 16:38:26 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node1 kernel: drbd16: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Nov 27 16:38:26 node1 kernel: drbd16: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Nov 27 16:38:27 node1 kernel: drbd16: Writing meta data super block now.
======On Node1============
Nov 27 16:38:20 node0 kernel: drbd16: peer( Primary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Nov 27 16:38:20 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:20 node0 kernel: drbd16: meta connection shut down by peer.
Nov 27 16:38:20 node0 kernel: drbd16: asender terminated
Nov 27 16:38:20 node0 kernel: drbd16: tl_clear()
Nov 27 16:38:20 node0 kernel: drbd16: Connection closed
Nov 27 16:38:20 node0 kernel: drbd16: conn( TearDown -> Unconnected )
Nov 27 16:38:20 node0 kernel: drbd16: drbd_disconnect: ##5# EM-- Done but waiting 30 seconds######
====Issue disconnect here=====
# drbdsetup /dev/drbd16 disconnect
No response from the DRBD driver! Is the module loaded?
Nov 27 16:38:26 node0 kernel: drbd16: conn( Unconnected -> Disconnecting )
Nov 27 16:38:26 node0 kernel: drbd16: drbd_nl_disconnect: EM-- Start wait_event_interruptible for mdev->state.conn==StandAlone ****
Nov 27 16:38:26 node0 kernel: drbd16: drbd_disconnect: ##5# EM-- Done ##### waiting 30 seconds######
Nov 27 16:38:26 node0 kernel: drbd16: receiver terminated
Nov 27 16:38:26 node0 kernel: drbd16: receiver (re)started
Nov 27 16:38:26 node0 kernel: drbd16: ASSERT( mdev->state.conn >= Unconnected ) in /sandbox/emontros/devel/trunk/platform/drbd/src/drbd/drbd_receiver.c:715
Nov 27 16:38:26 node0 kernel: drbd16: conn( Disconnecting -> WFConnection )
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFConnection -> WFReportParams )
Nov 27 16:38:26 node0 kernel: drbd16: Handshake successful: DRBD Network Protocol version 86
Nov 27 16:38:26 node0 kernel: drbd16: receive_state: EM-- ....
Nov 27 16:38:26 node0 kernel: drbd16: receive_state: EM-- ....calling sync_handshake
Nov 27 16:38:26 node0 kernel: drbd16: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Nov 27 16:38:26 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFBitMapT -> WFSyncUUID )
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Nov 27 16:38:26 node0 kernel: drbd16: Began resync as SyncTarget (will sync 4 KB [1 bits set]).
Nov 27 16:38:26 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:27 node0 kernel: drbd16: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Nov 27 16:38:27 node0 kernel: drbd16: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Nov 27 16:38:27 node0 kernel: drbd16: Writing meta data super block now.
===Done logging====
Notice that on node1 we never transition to StandAlone after the
disconnect. That is why drbd_nl_disconnect(), which waits for
mdev->state.conn == StandAlone, waits forever.
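The difference between the two variants of the check can be sketched in a few lines of standalone C. This is only an illustration: the enum is a simplified stand-in whose ordering is assumed from the comparisons used in this thread (StandAlone and Disconnecting below Unconnected, Unconnected below TearDown), and the names refuse_original, refuse_range and refuse_two are hypothetical, not DRBD functions.

```c
#include <stdbool.h>

/* Simplified connection states. The ordering is an assumption taken from
 * the comparisons in this thread, not a copy of drbd.h. */
enum conn_state {
    StandAlone,
    Disconnecting,
    Unconnected,
    Timeout,
    BrokenPipe,
    NetworkFailure,
    ProtocolError,
    TearDown,
    WFConnection,
    Connected
};

/* Original check: a new "Disconnecting" request is refused only when the
 * old state is StandAlone. */
bool refuse_original(enum conn_state os)
{
    return os == StandAlone;
}

/* First patch: refuse for every old state up to and including TearDown. */
bool refuse_range(enum conn_state os)
{
    return os <= TearDown;
}

/* Second variant: refuse only for StandAlone and TearDown. */
bool refuse_two(enum conn_state os)
{
    return os == StandAlone || os == TearDown;
}
```

Under this assumed ordering, a disconnect request that arrives while the old state is Unconnected (the state node1 is in during the 30-second window in the logs above) is refused only by the "<= TearDown" form.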
* Re: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
From: Lars Ellenberg @ 2007-11-28 0:26 UTC (permalink / raw)
To: drbd-dev
On Tue, Nov 27, 2007 at 04:51:21PM -0500, Montrose, Ernest wrote:
> Phil,
> Your modification to the original patch will break it actually. The
> reason is that we can get into "disconnecting" anywhere.
and that should not be the case.
unfortunately it is, because it is forced everywhere.
grep -n force_state *[ch] | tr -s '\t ' ' ' | grep ': '
drbd_actlog.c:852: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_main.c:1883: drbd_force_state(mdev, NS(conn,BrokenPipe));
drbd_main.c:1885: drbd_force_state(mdev, NS(conn,Timeout));
drbd_nl.c:1025: drbd_force_state(mdev,NS(disk,Diskless));
drbd_receiver.c:589: if(rv != size) drbd_force_state(mdev,NS(conn,BrokenPipe));
drbd_receiver.c:672: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:1957: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2016: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2023: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2035: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2143: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2225: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2259: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2294: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2447: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:2648: drbd_force_state(mdev,NS(conn,ProtocolError));
drbd_receiver.c:2654: drbd_force_state(mdev,NS(conn,ProtocolError));
drbd_receiver.c:3099: drbd_force_state(mdev,NS(conn,Disconnecting));
drbd_receiver.c:3430: drbd_force_state(mdev,NS(conn,NetworkFailure));
drbd_worker.c:738: drbd_force_state(mdev,NS(conn,NetworkFailure));
drbd_worker.c:1008: drbd_force_state(mdev,NS(conn,NetworkFailure));
I don't think "Disconnecting" should ever be forced.
NetworkFailure and the like are possible "forced" transitions.
"state engine" should then do "the right thing".
most locations above where there is "force -> Disconnecting",
in fact there should be either no "force", or force -> ProtocolError.
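
A minimal sketch of the distinction drawn here, assuming that a forced state change skips the validity check that a requested change goes through. The types and the names transition_allowed, request_state and force_state are hypothetical; the rule inside transition_allowed is the "<= TearDown" check proposed earlier in the thread, and the enum is a simplified stand-in, not the one from drbd.h.

```c
#include <stdbool.h>

/* Simplified connection states; ordering assumed from this thread. */
enum conn_state {
    StandAlone,
    Disconnecting,
    Unconnected,
    NetworkFailure,
    ProtocolError,
    TearDown,
    WFConnection,
    Connected
};

/* Hypothetical device with only the field this sketch needs. */
struct device {
    enum conn_state conn;
};

/* Hypothetical validity rule: do not start a new disconnect while the
 * connection is already at or below TearDown. */
bool transition_allowed(enum conn_state os, enum conn_state ns)
{
    if (ns == Disconnecting && os <= TearDown)
        return false;
    return true;
}

/* A requested change is applied only if the rule allows it. */
bool request_state(struct device *d, enum conn_state ns)
{
    if (!transition_allowed(d->conn, ns))
        return false;
    d->conn = ns;
    return true;
}

/* A forced change is applied unconditionally, so no rule can stop it. */
void force_state(struct device *d, enum conn_state ns)
{
    d->conn = ns;
}
```

In this model a check added to the validity rule protects only the requested path; every call site that forces "Disconnecting" still gets there regardless of the old state.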
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
* Re: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
From: Philipp Reisner @ 2007-11-28 13:09 UTC (permalink / raw)
To: drbd-dev; +Cc: Montrose, Ernest
On Tuesday 27 November 2007 22:51:21 Montrose, Ernest wrote:
> Phil,
> Phil,
> Your modification to the original patch will break it actually. The
> reason is that we can get into "disconnecting" anywhere. Below I have
> some logs with the problem happening.
Hi Ernest.
You are using some ancient code!
I removed the line breaks from your logs:
On Node0:
# drbdsetup /dev/drbd16 disconnect
Nov 27 16:38:20 node1 kernel: drbd16: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Nov 27 16:38:20 node1 kernel: drbd16: Creating new current UUID
Nov 27 16:38:20 node1 kernel: drbd16: short read expecting header on sock: r=-512
Nov 27 16:38:20 node1 kernel: drbd16: asender terminated
Nov 27 16:38:20 node1 kernel: drbd16: tl_clear()
Nov 27 16:38:20 node1 kernel: drbd16: Connection closed
Nov 27 16:38:20 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:20 node1 kernel: drbd16: conn( Disconnecting -> StandAlone )
Nov 27 16:38:20 node1 kernel: drbd16: receiver terminated
[
Nov 27 16:38:23 node1 kernel: drbd16: conn( StandAlone -> Unconnected )
Nov 27 16:38:23 node1 kernel: drbd16: receiver (re)started
Nov 27 16:38:23 node1 kernel: drbd16: conn( Unconnected -> WFConnection )
Nov 27 16:38:26 node1 kernel: drbd16: conn( WFConnection -> WFReportParams )
Nov 27 16:38:26 node1 kernel: drbd16: Handshake successful: DRBD Network Protocol version 86
Nov 27 16:38:26 node1 kernel: drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Nov 27 16:38:26 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node1 kernel: drbd16: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Nov 27 16:38:26 node1 kernel: drbd16: Began resync as SyncSource (will sync 4 KB [1 bits set]).
Nov 27 16:38:26 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node1 kernel: drbd16: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Nov 27 16:38:26 node1 kernel: drbd16: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Nov 27 16:38:27 node1 kernel: drbd16: Writing meta data super block now.
]
======On Node1============
Nov 27 16:38:20 node0 kernel: drbd16: peer( Primary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Nov 27 16:38:20 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:20 node0 kernel: drbd16: meta connection shut down by peer.
Nov 27 16:38:20 node0 kernel: drbd16: asender terminated
Nov 27 16:38:20 node0 kernel: drbd16: tl_clear()
Nov 27 16:38:20 node0 kernel: drbd16: Connection closed
Nov 27 16:38:20 node0 kernel: drbd16: conn( TearDown -> Unconnected )
Nov 27 16:38:20 node0 kernel: drbd16: drbd_disconnect: ##5# EM-- Done but waiting 30 seconds######
====Issue disconnect here=====
# drbdsetup /dev/drbd16 disconnect
No response from the DRBD driver! Is the module loaded?
Nov 27 16:38:26 node0 kernel: drbd16: conn( Unconnected -> Disconnecting )
Nov 27 16:38:26 node0 kernel: drbd16: drbd_nl_disconnect: EM-- Start wait_event_interruptible for mdev->state.conn==StandAlone ****
Nov 27 16:38:26 node0 kernel: drbd16: drbd_disconnect: ##5# EM-- Done ##### waiting 30 seconds######
Nov 27 16:38:26 node0 kernel: drbd16: receiver terminated
Nov 27 16:38:26 node0 kernel: drbd16: receiver (re)started
Nov 27 16:38:26 node0 kernel: drbd16: ASSERT( mdev->state.conn >= Unconnected ) in /sandbox/emontros/devel/trunk/platform/drbd/src/drbd/drbd_receiver.c:715
Nov 27 16:38:26 node0 kernel: drbd16: conn( Disconnecting -> WFConnection ) <=<<==<<<===<<<<====<<<<<=====<<<<<<======<<<<<<<=======
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFConnection -> WFReportParams )
Nov 27 16:38:26 node0 kernel: drbd16: Handshake successful: DRBD Network Protocol version 86
Nov 27 16:38:26 node0 kernel: drbd16: receive_state: EM-- ....
Nov 27 16:38:26 node0 kernel: drbd16: receive_state: EM-- ....calling sync_handshake
Nov 27 16:38:26 node0 kernel: drbd16: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Nov 27 16:38:26 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFBitMapT -> WFSyncUUID )
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Nov 27 16:38:26 node0 kernel: drbd16: Began resync as SyncTarget (will sync 4 KB [1 bits set]).
Nov 27 16:38:26 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:27 node0 kernel: drbd16: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Nov 27 16:38:27 node0 kernel: drbd16: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Nov 27 16:38:27 node0 kernel: drbd16: Writing meta data super block now.
Look for the line marked with "<=<<==<<<===<<<<====<<<<<=====<<<<<<======<<<<<<<=======".
This state transition fails (silently) in the current code. See the attached commit
from Aug 31. In the meantime we did two releases (8.0.6 and 8.0.7). Why is this
patch not in your code base?
In the current code, the problem you describe simply does not exist.
Here are the logs from my machines:
node0:
[42951113.120000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[42951113.120000] drbd0: sock was shut down by peer
[42951113.120000] drbd0: short read expecting header on sock: r=0
[42951113.120000] drbd0: meta connection shut down by peer.
[42951113.120000] drbd0: asender terminated
[42951113.120000] drbd0: tl_clear()
[42951113.120000] drbd0: Connection closed
[42951113.120000] drbd0: Writing meta data super block now.
[42951113.120000] drbd0: conn( Disconnecting -> StandAlone )
[42951113.120000] drbd0: receiver terminated
[42951113.960000] drbd0: conn( StandAlone -> Unconnected )
[42951113.960000] drbd0: receiver (re)started
[42951113.960000] drbd0: conn( Unconnected -> WFConnection )
node1:
[42951105.980000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
[42951105.980000] drbd0: Writing meta data super block now.
[42951105.980000] drbd0: asender terminated
[42951105.980000] drbd0: tl_clear()
[42951105.980000] drbd0: Connection closed
[42951105.980000] drbd0: conn( TearDown -> Unconnected )
[42951105.980000] drbd0: Entering sleep!
[42951107.160000] drbd0: conn( Unconnected -> Disconnecting )
[42951115.990000] drbd0: Leaving sleep!
[42951115.990000] drbd0: receiver terminated
[42951115.990000] drbd0: receiver (re)started
! ** !! Notice here. No state transition to WFConnection !! ** !
[42951115.990000] drbd0: tl_clear()
[42951115.990000] drbd0: Connection closed
[42951115.990000] drbd0: conn( Disconnecting -> StandAlone )
[42951115.990000] drbd0: Entering sleep!
[42951126.000000] drbd0: Leaving sleep!
[42951126.000000] drbd0: receiver terminated
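
The difference between the two sets of logs can be sketched as follows. This is a simplified model, not the driver code: receiver_restart and its guard parameter are hypothetical, SS_Success = 1 is an assumption, and the only rule modelled is the one from the attached commit (no transition to WFConnection when the old state is below Unconnected).

```c
/* Simplified connection states; ordering assumed from this thread. */
enum conn_state {
    StandAlone,
    Disconnecting,
    Unconnected,
    TearDown,
    WFConnection,
    Connected
};

/* SS_NoNetConfig = -13 is taken from the attached commit;
 * SS_Success = 1 is an assumption for this sketch. */
enum { SS_Success = 1, SS_NoNetConfig = -13 };

/* With the guard enabled, WFConnection is refused when the old state is
 * below Unconnected, i.e. when there is no network configuration. */
int check_transition(enum conn_state os, enum conn_state ns, int guard)
{
    if (guard && ns == WFConnection && os < Unconnected)
        return SS_NoNetConfig;
    return SS_Success;
}

/* Hypothetical model of the restarted receiver: it tries to enter
 * WFConnection; if that is refused, the teardown completes and the
 * connection ends up in StandAlone. */
enum conn_state receiver_restart(enum conn_state os, int guard)
{
    if (check_transition(os, WFConnection, guard) < SS_Success)
        return StandAlone;
    return WFConnection;
}
```

Without the guard, a receiver restarted in Disconnecting moves on to WFConnection, so a waiter for StandAlone never wakes up; with the guard it ends in StandAlone, which matches the node1 log above.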
-Phil
[-- Attachment #2: baec52c8af1da0b23a38978e358c24066ff08ef7.diff --]
[-- Type: text/x-diff, Size: 4006 bytes --]
commit baec52c8af1da0b23a38978e358c24066ff08ef7
Author: Philipp Reisner <philipp.reisner@linbit.com>
Date: Fri Aug 31 21:24:39 2007 +0000
r3043: When you let DRBD connect to a listening TCP port and close
that, then allow it to connect a second time, you trigger
a workaround from the DRBD-0.7 days. It is printed to the
syslog as "My msock connect got accepted onto peer's sock!".
Then the receiver sleeps for connect_int/2.
When you dropped the network config during this time with
"drbdadm disconnect", you hit an OOPS.
The root of this bug was that the
if(drbd_request_state(mdev,NS(conn,WFConnection)) < SS_Success ) ...
statement in drbd_connect() elevated the connection state from
StandAlone to WFConnection. Later we dereference mdev->net_conf
and OOPS...
I fixed this by allowing that state change only when the connection
state before was >= Unconnected (i.e. we had a network config).
diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
index 571cec0..b8d4751 100644
--- a/drbd/drbd_main.c
+++ b/drbd/drbd_main.c
@@ -595,6 +595,9 @@ STATIC int is_valid_state_transition(drbd_dev* mdev,drbd_state_t ns,drbd_state_t
if( ns.disk > Attaching && os.disk == Diskless)
rv=SS_IsDiskLess;
+ if ( ns.conn == WFConnection && os.conn < Unconnected )
+ rv=SS_NoNetConfig;
+
return rv;
}
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index dfa5b22..86ff8b4 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -706,16 +706,18 @@ STATIC Drbd_Packet_Cmd drbd_recv_fp(drbd_dev *mdev,struct socket *sock)
* 0 oops, did not work out, please try again
* -1 peer talks different language,
* no point in trying again, please go standalone.
+ * -2 We do not have a network config...
*/
int drbd_connect(drbd_dev *mdev)
{
struct socket *s, *sock,*msock;
int try,h;
- D_ASSERT(mdev->state.conn >= Unconnected);
D_ASSERT(!mdev->data.socket);
- if(drbd_request_state(mdev,NS(conn,WFConnection)) < SS_Success ) return 0;
+ if (_drbd_request_state(mdev,NS(conn,WFConnection),0) < SS_Success )
+ return -2;
+
clear_bit(DISCARD_CONCURRENT, &mdev->flags);
sock = NULL;
@@ -2688,6 +2690,7 @@ STATIC void drbd_disconnect(drbd_dev *mdev)
int rv=SS_UnknownError;
D_ASSERT(mdev->state.conn < Connected);
+ if (mdev->state.conn == StandAlone) return;
/* FIXME verify that:
* the state change magic prevents us from becoming >= Connected again
* while we are still cleaning up.
@@ -3102,7 +3105,7 @@ int drbdd_init(struct Drbd_thread *thi)
drbd_disconnect(mdev);
schedule_timeout(HZ);
}
- if( h < 0 ) {
+ if( h == -1 ) {
WARN("Discarding network configuration.\n");
drbd_force_state(mdev,NS(conn,Disconnecting));
}
diff --git a/drbd/drbd_strings.c b/drbd/drbd_strings.c
index 6afbc69..30a6c80 100644
--- a/drbd/drbd_strings.c
+++ b/drbd/drbd_strings.c
@@ -80,7 +80,8 @@ static const char *drbd_state_sw_errors[] = {
[-SS_CW_FailedByPeer] = "State changed was refused by peer node",
[-SS_IsDiskLess] =
"Device is diskless, the requesed operation requires a disk",
- [-SS_DeviceInUse] = "Device is held open by someone"
+ [-SS_DeviceInUse] = "Device is held open by someone",
+ [-SS_NoNetConfig] = "Have no net/connection configuration"
};
const char* conns_to_name(drbd_conns_t s) {
@@ -100,7 +101,7 @@ const char* disks_to_name(drbd_disks_t s) {
}
const char* set_st_err_name(set_st_err_t err) {
- return err < SS_DeviceInUse ? "TOO_SMALL" :
+ return err < SS_NoNetConfig ? "TOO_SMALL" :
err > SS_TwoPrimaries ? "TOO_LARGE"
: drbd_state_sw_errors[-err];
}
diff --git a/drbd/linux/drbd.h b/drbd/linux/drbd.h
index b46e754..6d5259f 100644
--- a/drbd/linux/drbd.h
+++ b/drbd/linux/drbd.h
@@ -207,7 +207,8 @@ typedef enum {
SS_AlreadyStandAlone=-9,
SS_CW_FailedByPeer=-10,
SS_IsDiskLess=-11,
- SS_DeviceInUse=-12
+ SS_DeviceInUse=-12,
+ SS_NoNetConfig=-13
} set_st_err_t;
/* from drbd_strings.c */
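The drbd_strings.c hunk above changes the lower bound in set_st_err_name() because the message table is indexed by the negated error code, so every newly added (more negative) code has to move that bound as well. The following is only a minimal sketch of that idea, not the DRBD source: the function name err_name() and its "lowest" parameter are hypothetical, SS_TwoPrimaries = -1 is an assumed value, and only strings visible in the patch are filled in.

```c
#include <assert.h>
#include <stddef.h>

/* Error codes as numbered in the patch above (-9 .. -13).
 * SS_TwoPrimaries = -1 is an ASSUMPTION: the patch only shows that it
 * is the upper bound of the error range, not its value. */
typedef enum {
	SS_TwoPrimaries      = -1,
	SS_AlreadyStandAlone = -9,
	SS_CW_FailedByPeer   = -10,
	SS_IsDiskLess        = -11,
	SS_DeviceInUse       = -12,
	SS_NoNetConfig       = -13
} set_st_err_t;

/* Table indexed by the negated code. Only some strings from the patch
 * are filled in; every other slot stays NULL in this sketch. */
static const char *state_err_strings[] = {
	[-SS_CW_FailedByPeer] = "State changed was refused by peer node",
	[-SS_DeviceInUse]     = "Device is held open by someone",
	[-SS_NoNetConfig]     = "Have no net/connection configuration"
};

/* Hypothetical helper: "lowest" is the most negative code the caller
 * believes the table covers (the value the patch had to update). */
static const char *err_name(int err, int lowest)
{
	size_t n = sizeof(state_err_strings) / sizeof(state_err_strings[0]);

	if (err < lowest)
		return "TOO_SMALL";
	if (err > SS_TwoPrimaries)
		return "TOO_LARGE";
	assert((size_t)-err < n);	/* index must stay inside the table */
	return state_err_strings[-err] ? state_err_strings[-err]
				       : "(not modelled)";
}
```

With the old bound, err_name(SS_NoNetConfig, SS_DeviceInUse) reports "TOO_SMALL" even though the table has an entry for it; with the new bound, err_name(SS_NoNetConfig, SS_NoNetConfig) returns the new message.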
^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
2007-11-27 21:51 ` Montrose, Ernest
2007-11-28 0:26 ` Lars Ellenberg
2007-11-28 13:09 ` Philipp Reisner
@ 2007-11-28 14:08 ` Montrose, Ernest
2 siblings, 0 replies; 9+ messages in thread
From: Montrose, Ernest @ 2007-11-28 14:08 UTC (permalink / raw)
To: Philipp Reisner, drbd-dev
Phil,
Aha!! OK I apologize for this. We are using ancient code indeed. I
will update and retest. We were so busy chasing the other issues that
we kept putting the merge on the back burner. Sorry..
EM--
-----Original Message-----
From: Philipp Reisner [mailto:philipp.reisner@linbit.com]
Sent: Wednesday, November 28, 2007 8:10 AM
To: drbd-dev@linbit.com
Cc: Montrose, Ernest
Subject: Re: [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver
On Tuesday 27 November 2007 22:51:21 Montrose, Ernest wrote:
> Phil,
> Your modification to the original patch will break it actually. The
> reason is that we can get into "disconnecting" anywhere. Below I have
> some logs with the problem happening.
Hi Ernest.
You are using some ancient code!
I removed the line breaks from your logs:
On Node0:
# drbdsetup /dev/drbd16 disconnect
Nov 27 16:38:20 node1 kernel: drbd16: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Nov 27 16:38:20 node1 kernel: drbd16: Creating new current UUID
Nov 27 16:38:20 node1 kernel: drbd16: short read expecting header on sock: r=-512
Nov 27 16:38:20 node1 kernel: drbd16: asender terminated
Nov 27 16:38:20 node1 kernel: drbd16: tl_clear()
Nov 27 16:38:20 node1 kernel: drbd16: Connection closed
Nov 27 16:38:20 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:20 node1 kernel: drbd16: conn( Disconnecting -> StandAlone )
Nov 27 16:38:20 node1 kernel: drbd16: receiver terminated
[
Nov 27 16:38:23 node1 kernel: drbd16: conn( StandAlone -> Unconnected )
Nov 27 16:38:23 node1 kernel: drbd16: receiver (re)started
Nov 27 16:38:23 node1 kernel: drbd16: conn( Unconnected -> WFConnection )
Nov 27 16:38:26 node1 kernel: drbd16: conn( WFConnection -> WFReportParams )
Nov 27 16:38:26 node1 kernel: drbd16: Handshake successful: DRBD Network Protocol version 86
Nov 27 16:38:26 node1 kernel: drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Nov 27 16:38:26 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node1 kernel: drbd16: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Nov 27 16:38:26 node1 kernel: drbd16: Began resync as SyncSource (will sync 4 KB [1 bits set]).
Nov 27 16:38:26 node1 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node1 kernel: drbd16: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Nov 27 16:38:26 node1 kernel: drbd16: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Nov 27 16:38:27 node1 kernel: drbd16: Writing meta data super block now.
]
======On Node1============
Nov 27 16:38:20 node0 kernel: drbd16: peer( Primary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Nov 27 16:38:20 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:20 node0 kernel: drbd16: meta connection shut down by peer.
Nov 27 16:38:20 node0 kernel: drbd16: asender terminated
Nov 27 16:38:20 node0 kernel: drbd16: tl_clear()
Nov 27 16:38:20 node0 kernel: drbd16: Connection closed
Nov 27 16:38:20 node0 kernel: drbd16: conn( TearDown -> Unconnected )
Nov 27 16:38:20 node0 kernel: drbd16: drbd_disconnect: ##5# EM-- Done but waiting 30 seconds######
====Issue disconnect here=====
# drbdsetup /dev/drbd16 disconnect
No response from the DRBD driver! Is the module loaded?
Nov 27 16:38:26 node0 kernel: drbd16: conn( Unconnected -> Disconnecting )
Nov 27 16:38:26 node0 kernel: drbd16: drbd_nl_disconnect: EM-- Start wait_event_interruptible for mdev->state.conn==StandAlone ****
Nov 27 16:38:26 node0 kernel: drbd16: drbd_disconnect: ##5# EM-- Done ##### waiting 30 seconds######
Nov 27 16:38:26 node0 kernel: drbd16: receiver terminated
Nov 27 16:38:26 node0 kernel: drbd16: receiver (re)started
Nov 27 16:38:26 node0 kernel: drbd16: ASSERT( mdev->state.conn >= Unconnected ) in /sandbox/emontros/devel/trunk/platform/drbd/src/drbd/drbd_receiver.c:715
Nov 27 16:38:26 node0 kernel: drbd16: conn( Disconnecting -> WFConnection ) <=<<==<<<===<<<<====<<<<<=====<<<<<<======<<<<<<<=======
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFConnection -> WFReportParams )
Nov 27 16:38:26 node0 kernel: drbd16: Handshake successful: DRBD Network Protocol version 86
Nov 27 16:38:26 node0 kernel: drbd16: receive_state: EM-- ....
Nov 27 16:38:26 node0 kernel: drbd16: receive_state: EM-- ....calling sync_handshake
Nov 27 16:38:26 node0 kernel: drbd16: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Nov 27 16:38:26 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFBitMapT -> WFSyncUUID )
Nov 27 16:38:26 node0 kernel: drbd16: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Nov 27 16:38:26 node0 kernel: drbd16: Began resync as SyncTarget (will sync 4 KB [1 bits set]).
Nov 27 16:38:26 node0 kernel: drbd16: Writing meta data super block now.
Nov 27 16:38:27 node0 kernel: drbd16: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Nov 27 16:38:27 node0 kernel: drbd16: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Nov 27 16:38:27 node0 kernel: drbd16: Writing meta data super block now.
Look for the line marked with "<=<<==<<<===<<<<====<<<<<=====<<<<<<======<<<<<<<=======".
This state transition fails (silently) in the current code. See the
attached commit from Aug 31. In the meantime we did two releases (8.0.6
and 8.0.7). Why is this patch not in your code base?
In the current code the problem you describe simply does not exist.
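To make the difference between the two code bases concrete, here is a rough, self-contained model of the check that the Aug 31 commit adds to is_valid_state_transition(). It is a sketch under assumptions, not the DRBD implementation: the enum lists only some connection states, in an order inferred from the comparisons used in this thread (StandAlone < Disconnecting < Unconnected < TearDown < WFConnection), SS_Success = 1 is an assumed value, and check_transition(), receiver_restart() and the with_fix flag are hypothetical names.

```c
#include <assert.h>

/* Subset of the connection states, ordered as the comparisons in this
 * thread imply. States between the listed ones are omitted. */
enum conn_state {
	StandAlone,
	Disconnecting,
	Unconnected,
	TearDown,
	WFConnection,
	WFReportParams,
	Connected
};

enum {
	SS_Success     = 1,	/* ASSUMED value; the patch only compares against it */
	SS_NoNetConfig = -13	/* value taken from the patch */
};

/* with_fix == 0 models the old code, with_fix != 0 the patched code. */
static int check_transition(enum conn_state os, enum conn_state ns,
			    int with_fix)
{
	assert(os <= Connected && ns <= Connected);

	/* The added rule: no way into WFConnection without a net config,
	 * i.e. from StandAlone or Disconnecting. */
	if (with_fix && ns == WFConnection && os < Unconnected)
		return SS_NoNetConfig;
	return SS_Success;
}

/* What a restarted receiver ends up with when it asks for WFConnection. */
static enum conn_state receiver_restart(enum conn_state cur, int with_fix)
{
	if (check_transition(cur, WFConnection, with_fix) >= SS_Success)
		return WFConnection;
	return cur;	/* request refused, state unchanged */
}
```

In this model receiver_restart(Disconnecting, 0) yields WFConnection, which corresponds to the marked line in the old logs, while receiver_restart(Disconnecting, 1) stays in Disconnecting.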
Here are the logs from my machines:
node0:
[42951113.120000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[42951113.120000] drbd0: sock was shut down by peer
[42951113.120000] drbd0: short read expecting header on sock: r=0
[42951113.120000] drbd0: meta connection shut down by peer.
[42951113.120000] drbd0: asender terminated
[42951113.120000] drbd0: tl_clear()
[42951113.120000] drbd0: Connection closed
[42951113.120000] drbd0: Writing meta data super block now.
[42951113.120000] drbd0: conn( Disconnecting -> StandAlone )
[42951113.120000] drbd0: receiver terminated
[42951113.960000] drbd0: conn( StandAlone -> Unconnected )
[42951113.960000] drbd0: receiver (re)started
[42951113.960000] drbd0: conn( Unconnected -> WFConnection )
node1:
[42951105.980000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
[42951105.980000] drbd0: Writing meta data super block now.
[42951105.980000] drbd0: asender terminated
[42951105.980000] drbd0: tl_clear()
[42951105.980000] drbd0: Connection closed
[42951105.980000] drbd0: conn( TearDown -> Unconnected )
[42951105.980000] drbd0: Entering sleep!
[42951107.160000] drbd0: conn( Unconnected -> Disconnecting )
[42951115.990000] drbd0: Leaving sleep!
[42951115.990000] drbd0: receiver terminated
[42951115.990000] drbd0: receiver (re)started
! ** !! Notice here. No state transition to WFConnection !! ** !
[42951115.990000] drbd0: tl_clear()
[42951115.990000] drbd0: Connection closed
[42951115.990000] drbd0: conn( Disconnecting -> StandAlone )
[42951115.990000] drbd0: Entering sleep!
[42951126.000000] drbd0: Leaving sleep!
[42951126.000000] drbd0: receiver terminated
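The node1 log above goes from "receiver (re)started" straight to the clean-up messages because drbd_connect() now reports the missing network config with its own return code, and the receiver loop only discards the configuration on -1. A small sketch of that dispatch follows, with hypothetical names (on_connect_result(), the action enum, the with_fix flag); treating any positive return value as "connected" is an assumption, since the patch shows only the 0, -1 and -2 cases.

```c
#include <assert.h>

/* What the receiver loop does next; names are illustrative only. */
enum receiver_action {
	RUN_PROTOCOL,	/* connection established (assumed: h > 0) */
	RETRY_CONNECT,	/* h == 0: did not work out, try again */
	DISCARD_CONFIG,	/* h == -1: peer talks a different language */
	CLEAN_UP_ONLY	/* h == -2: no network config, nothing to discard */
};

/* with_fix == 0 models the old "if (h < 0)" test,
 * with_fix != 0 the patched "if (h == -1)" test. */
static enum receiver_action on_connect_result(int h, int with_fix)
{
	assert(h >= -2);	/* only the documented codes are modelled */

	if (h > 0)
		return RUN_PROTOCOL;
	if (h == 0)
		return RETRY_CONNECT;
	if (!with_fix)
		return DISCARD_CONFIG;	/* old code: every negative value */
	return h == -1 ? DISCARD_CONFIG : CLEAN_UP_ONLY;
}
```

Under these assumptions, on_connect_result(-2, 1) selects CLEAN_UP_ONLY, so the "Discarding network configuration." warning is no longer printed for a device that has no network config left.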
-Phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria http://www.linbit.com :
^ permalink raw reply [flat|nested] 9+ messages in thread
Thread overview: 9+ messages
2007-11-18 23:11 [Drbd-dev] DRBD8: disconnecting while already disconnecting can hang the receiver Montrose, Ernest
2007-11-27 10:36 ` Philipp Reisner
2007-11-27 13:06 ` Montrose, Ernest
2007-11-27 14:52 ` Philipp Reisner
2007-11-27 15:06 ` Montrose, Ernest
2007-11-27 21:51 ` Montrose, Ernest
2007-11-28 0:26 ` Lars Ellenberg
2007-11-28 13:09 ` Philipp Reisner
2007-11-28 14:08 ` Montrose, Ernest