Distributed Replicated Block Device (DRBD) development
 help / color / mirror / Atom feed
* [Drbd-dev] DRBD8: Resync stalled at 100% due to race condition
       [not found] <BD7042533C2F8943A6A4257A9E31C454F47906@EXNA.corp.stratus.com>
@ 2007-05-24 12:41 ` Montrose, Ernest
  2007-06-01 14:19   ` Philipp Reisner
  2007-06-04  9:08   ` Philipp Reisner
  0 siblings, 2 replies; 5+ messages in thread
From: Montrose, Ernest @ 2007-05-24 12:41 UTC (permalink / raw)
  To: drbd-dev

[-- Attachment #1: Type: text/plain, Size: 4243 bytes --]

Hi all,
We are seeing a problem where a resync hangs on the SyncSource at the
end.  The SyncTarget finished OK and shows Connected. The signature on
the SyncSource is:
 
 ns:364900 nr:13389364 dw:13754324 dr:800327 al:165 bm:959 lo:0 pe:0
ua:0 ap:0
    [===================>] sync'ed:100.0% (0/0)K
    stalled

What I think is happening is that there is a race condition where
drbd_resync_finished() races receive_state() in this manner:
1)  The resync finished on drbd0 and we enter drbd_resync_finished().
But before it can set the stated to Connected, drbd15 which is a higher
priority device starts syncing. This puts drbd0 in PausedSyncS from
SyncSource.
 
2) Drbd_resync_finished() for drbd0 now tries to go to Connected from
PausedSyncS.  The logs below prints this transition but the transition
was not actually commited since we print before we actually assign the
new values. 
 
3) At this precise moment, another thread calls receive_state().
Receive_state() is entered with a state of "PausedSyncS".  It skips the
drbd_sync_handshake() because of that.  However, before reaching the end
of receive_state() the drbd0 state change from drbd_request_finished()
is commited for real. So the state is Connected for drbd0.
 
4) By the time receive_state calls _drbd_set_state()  we have
os=Connected and ns=PausedSyncS because receive_state() had cached the
state in nconn early without the req_lock held.  The mdev value had
changed while the cached value is being used. 
 
5) drbd0 now transitions from Connected to PausedSyncS.
 
6) Because of 5 above we transition from PausedSyncS to SyncSource but
the other side is connected and Uptodate ...So we're stuck.
 
I think this is the result of receive_state() being a bit racy.  I
include a patch that may at least help illustrate the issue if not fix
it as I am not sure the req_lock can be held this early without causing
a deadlock or other perfomance issues.
 
Here are the logs for completeness:
 
So on syncsource:
May  8 03:50:12 bruce kernel: drbd0: Began resync as SyncSource (will
sync 840 KB [210 bits set]).
May  8 03:50:12 bruce kernel: drbd0: Writing meta data super block now.
May  8 03:50:13 bruce kernel: drbd0: conn( SyncSource -> PausedSyncS )
aftr_isp( 0 -> 1 )
May  8 03:50:13 bruce kernel: drbd0: Resync suspended
May  8 03:50:13 bruce kernel: drbd0: Resync done (total 1 sec; paused 0
sec; 840 K/sec)
May  8 03:50:13 bruce kernel: drbd0:  uuid[History_start] now
1DCA5DA245F96038
May  8 03:50:13 bruce kernel: drbd0:  uuid[Bitmap] now 0000000000000000
May  8 03:50:13 bruce kernel: drbd0: conn( PausedSyncS -> Connected )
pdsk( Inconsistent -> UpToDate )
May  8 03:50:13 bruce kernel: drbd0: conn( Connected -> PausedSyncS )
peer_isp( 0 -> 1 )
May  8 03:50:13 bruce kernel: drbd0: Writing meta data super block now.
May  8 03:50:13 bruce kernel: drbd0: peer_isp( 1 -> 0 )
May  8 03:50:13 bruce kernel: drbd0: conn( PausedSyncS -> SyncSource )
aftr_isp( 1 -> 0 )
May  8 03:50:13 bruce kernel: drbd0: Syncer continues.


On the syncTarget:
May  8 03:50:13 dick kernel: drbd0: conn( WFSyncUUID -> SyncTarget )
disk( UpToDate -> Inconsistent )
May  8 03:50:13 dick kernel: drbd0: Began resync as SyncTarget (will
sync 840 KB [210 bits set]).
May  8 03:50:13 dick kernel: drbd0: Writing meta data super block now.
May  8 03:50:14 dick kernel: drbd0: Resync done (total 1 sec; paused 0
sec; 840 K/sec)
May  8 03:50:14 dick kernel: drbd0:  uuid[Bitmap] now A10989D4FAD4C4BB
May  8 03:50:14 dick kernel: drbd0:  uuid[History_start] now
B2E9873EF8AC35CB
May  8 03:50:14 dick kernel: drbd0:  uuid[History_end] now
161E8961AB2B5473
May  8 03:50:14 dick kernel: drbd0:  uuid[History_start] now
A10989D4FAD4C4BB
May  8 03:50:14 dick kernel: drbd0:  uuid[Bitmap] now 1DCA5DA245F96038
May  8 03:50:14 dick kernel: drbd0:  uuid[Current] now 8B19AE5BFEDF83D4
May  8 03:50:14 dick kernel: drbd0:  uuid[History_start] now
1DCA5DA245F96038
May  8 03:50:14 dick kernel: drbd0:  uuid[Bitmap] now 0000000000000000
May  8 03:50:14 dick kernel: drbd0: conn( SyncTarget -> Connected )
disk( Inconsistent -> UpToDate )
May  8 03:50:14 dick kernel: drbd0: Writing meta data super block now.

EM--

[-- Attachment #2: resync_hangs_2290.patch --]
[-- Type: application/octet-stream, Size: 882 bytes --]

Index: drbd/drbd_receiver.c
===================================================================
--- drbd/drbd_receiver.c	(revision 14387)
+++ drbd/drbd_receiver.c	(working copy)
@@ -2398,6 +2398,8 @@
 	if (drbd_recv(mdev, h->payload, h->length) != h->length)
 		return FALSE;
 
+       spin_lock_irq(&mdev->req_lock);
+   
 	nconn = mdev->state.conn;
 	if (nconn == WFReportParams ) nconn = Connected;
 
@@ -2409,7 +2411,11 @@
 		nconn=drbd_sync_handshake(mdev,peer_state.role,peer_state.disk);
 		dec_local(mdev);
 
-		if(nconn == conn_mask) return FALSE;
+		if(nconn == conn_mask) {
+                 spin_unlock_irq(&mdev->req_lock);
+                 return FALSE;
+              }
+              
 	}
 
 	if (mdev->state.conn > WFReportParams ) {
@@ -2424,7 +2430,6 @@
 		}
 	}
 
-	spin_lock_irq(&mdev->req_lock);
 	os = mdev->state;
 	ns.i = mdev->state.i;
 	ns.conn = nconn;

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Drbd-dev] DRBD8: Resync stalled at 100% due to race condition
  2007-05-24 12:41 ` Montrose, Ernest
@ 2007-06-01 14:19   ` Philipp Reisner
  2007-06-04  9:08   ` Philipp Reisner
  1 sibling, 0 replies; 5+ messages in thread
From: Philipp Reisner @ 2007-06-01 14:19 UTC (permalink / raw)
  To: drbd-dev

On Thursday 24 May 2007 14:41:53 Montrose, Ernest wrote:
> Hi all,
> We are seeing a problem where a resync hangs on the SyncSource at the
> end.  The SyncTarget finished OK and shows Connected. The signature on
> the SyncSource is:
[...]

Hi Ernest,

I will try to go trough this early next week (hopefully Monday).

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [Drbd-dev] DRBD8: Resync stalled at 100% due to race condition
@ 2007-06-01 14:27 Ernest Montrose
  0 siblings, 0 replies; 5+ messages in thread
From: Ernest Montrose @ 2007-06-01 14:27 UTC (permalink / raw)
  To: philipp.reisner, drbd-dev

Phil,
Thanks! But be aware that I had sent multiple messages about this since we are having
email issues here. I don't know which email you've seen.  But at least one of the emails
have a proposed patch that would simply acquire the req_lok earlier.  But that would
deadlock, so ignore that patch. Just an FYI.

EM--
-----Original Message-----
From: drbd-dev-bounces@linbit.com [mailto:drbd-dev-bounces@linbit.com] On Behalf Of Philipp Reisner
Sent: Friday, June 01, 2007 10:19 AM
To: drbd-dev@linbit.com
Subject: Re: [Drbd-dev] DRBD8: Resync stalled at 100% due to race condition
 
On Thursday 24 May 2007 14:41:53 Montrose, Ernest wrote:
> Hi all,
> We are seeing a problem where a resync hangs on the SyncSource at the
> end.  The SyncTarget finished OK and shows Connected. The signature on
> the SyncSource is:
[...]
 
Hi Ernest,
 
I will try to go trough this early next week (hopefully Monday).
 
-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :
_______________________________________________
drbd-dev mailing list
drbd-dev@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-dev


       
____________________________________________________________________________________
Take the Internet to Go: Yahoo!Go puts the Internet in your pocket: mail, news, photos & more. 
http://mobile.yahoo.com/go?refer=1GNXIC


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Drbd-dev] DRBD8: Resync stalled at 100% due to race condition
  2007-05-24 12:41 ` Montrose, Ernest
  2007-06-01 14:19   ` Philipp Reisner
@ 2007-06-04  9:08   ` Philipp Reisner
  1 sibling, 0 replies; 5+ messages in thread
From: Philipp Reisner @ 2007-06-04  9:08 UTC (permalink / raw)
  To: drbd-dev; +Cc: Montrose, Ernest

[-- Attachment #1: Type: text/plain, Size: 2331 bytes --]

On Thursday 24 May 2007 14:41:53 Montrose, Ernest wrote:
> Hi all,
> We are seeing a problem where a resync hangs on the SyncSource at the
> end.  The SyncTarget finished OK and shows Connected. The signature on
> the SyncSource is:
>
[...]

I think you are right in saying that receive_state() is wrong, but
I have an other interpretation of the logs.

> What I think is happening is that there is a race condition where
> drbd_resync_finished() races receive_state() in this manner:
> 1)  The resync finished on drbd0 and we enter drbd_resync_finished().
> But before it can set the stated to Connected, drbd15 which is a higher
> priority device starts syncing. This puts drbd0 in PausedSyncS from
> SyncSource.

Right.

> 2) Drbd_resync_finished() for drbd0 now tries to go to Connected from
> PausedSyncS.  The logs below prints this transition but the transition
> was not actually commited since we print before we actually assign the
> new values.

No, the log line "conn (PausedSyncS -> Connected)" is done by
_drbd_set_state() (with the PSC macros) which runs under the req_lock, 
and there happens the assignment "mdev->state.i = ns.i;". 
The log is okay.

I think we have a race betwen receive_state() assigning the 
connection state to nconn=PausedSyncS, then the resync finishes
before we reach the call to spin_lock() (mdev->state.conn = Connected).
Now when receive_state() finally continues, it assigns (the now
obsolete value of) nconn to mdev->state.conn again by calling
_drbd_set_state().

> I include a patch that may at least help illustrate the issue if not fix
> it as I am not sure the req_lock can be held this early without causing
> a deadlock or other perfomance issues.
>

I considered the approach of making drbd_sync_handshake() to run
under the req_lock and to have a simple retry mechanism in receive_state().

Since 8.0.x is not the stable branch I decided to go with that small
patch. (I did not do any testing of this, since I guess it is rather
hard to hit this exact timing...)

Ernest, thanks for pointing this out!
As soon as you agree, I will commit this patch...

-phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :

[-- Attachment #2: fix_rsync_hangs.diff --]
[-- Type: text/x-diff, Size: 2121 bytes --]

Index: drbd/drbd_receiver.c
===================================================================
--- drbd/drbd_receiver.c	(revision 2903)
+++ drbd/drbd_receiver.c	(working copy)
@@ -2390,7 +2390,7 @@
 STATIC int receive_state(drbd_dev *mdev, Drbd_Header *h)
 {
 	Drbd_State_Packet *p = (Drbd_State_Packet*)h;
-	drbd_conns_t nconn;
+	drbd_conns_t nconn,oconn;
 	drbd_state_t os,ns,peer_state;
 	int rv;
 
@@ -2398,12 +2398,16 @@
 	if (drbd_recv(mdev, h->payload, h->length) != h->length)
 		return FALSE;
 
-	nconn = mdev->state.conn;
+	peer_state.i = be32_to_cpu(p->state);
+
+	spin_lock_irq(&mdev->req_lock);
+ retry:
+	oconn = nconn = mdev->state.conn;
+	spin_unlock_irq(&mdev->req_lock);
+
 	if (nconn == WFReportParams ) nconn = Connected;
 
-	peer_state.i = be32_to_cpu(p->state);
-
-	if (mdev->p_uuid && mdev->state.conn <= Connected &&
+	if (mdev->p_uuid && oconn <= Connected &&
 	    inc_local_if_state(mdev,Negotiating) &&
 	    peer_state.disk >= Negotiating) {
 		nconn=drbd_sync_handshake(mdev,peer_state.role,peer_state.disk);
@@ -2412,19 +2416,8 @@
 		if(nconn == conn_mask) return FALSE;
 	}
 
-	if (mdev->state.conn > WFReportParams ) {
-		if( nconn > Connected && peer_state.conn <= Connected) {
-			// we want resync, peer has not yet decided to sync...
-			drbd_send_uuids(mdev);
-			drbd_send_state(mdev);
-		}
-		else if (nconn == Connected && peer_state.disk == Negotiating) {
-			// peer is waiting for us to respond...
-			drbd_send_state(mdev);
-		}
-	}
-
 	spin_lock_irq(&mdev->req_lock);
+	if( mdev->state.conn != oconn ) goto retry;
 	os = mdev->state;
 	ns.i = mdev->state.i;
 	ns.conn = nconn;
@@ -2446,6 +2439,18 @@
 		return FALSE;
 	}
 
+	if (oconn > WFReportParams ) {
+		if( nconn > Connected && peer_state.conn <= Connected) {
+			// we want resync, peer has not yet decided to sync...
+			drbd_send_uuids(mdev);
+			drbd_send_state(mdev);
+		}
+		else if (nconn == Connected && peer_state.disk == Negotiating) {
+			// peer is waiting for us to respond...
+			drbd_send_state(mdev);
+		}
+	}
+
 	mdev->net_conf->want_lose = 0;
 
 	/* FIXME assertion for (gencounts do not diverge) */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [Drbd-dev] DRBD8: Resync stalled at 100% due to race condition
@ 2007-06-04 18:15 Montrose, Ernest
  0 siblings, 0 replies; 5+ messages in thread
From: Montrose, Ernest @ 2007-06-04 18:15 UTC (permalink / raw)
  To: Philipp Reisner, drbd-dev

Phil,
Thanks for the patch.  I tested the change and it appears to be fine so
far.
I am pretty confident this fixes it based on what I have tried before
when 
I tried to stage the problem. Please check it in.

EM-- 

-----Original Message-----
From: drbd-dev-bounces@linbit.com [mailto:drbd-dev-bounces@linbit.com]
On Behalf Of Philipp Reisner
Sent: Monday, June 04, 2007 5:09 AM
To: drbd-dev@linbit.com
Cc: Montrose, Ernest
Subject: Re: [Drbd-dev] DRBD8: Resync stalled at 100% due to race
condition

On Thursday 24 May 2007 14:41:53 Montrose, Ernest wrote:
> Hi all,
> We are seeing a problem where a resync hangs on the SyncSource at the
> end.  The SyncTarget finished OK and shows Connected. The signature on
> the SyncSource is:
>
[...]

I think you are right in saying that receive_state() is wrong, but
I have an other interpretation of the logs.

> What I think is happening is that there is a race condition where
> drbd_resync_finished() races receive_state() in this manner:
> 1)  The resync finished on drbd0 and we enter drbd_resync_finished().
> But before it can set the stated to Connected, drbd15 which is a
higher
> priority device starts syncing. This puts drbd0 in PausedSyncS from
> SyncSource.

Right.

> 2) Drbd_resync_finished() for drbd0 now tries to go to Connected from
> PausedSyncS.  The logs below prints this transition but the transition
> was not actually commited since we print before we actually assign the
> new values.

No, the log line "conn (PausedSyncS -> Connected)" is done by
_drbd_set_state() (with the PSC macros) which runs under the req_lock, 
and there happens the assignment "mdev->state.i = ns.i;". 
The log is okay.

I think we have a race betwen receive_state() assigning the 
connection state to nconn=PausedSyncS, then the resync finishes
before we reach the call to spin_lock() (mdev->state.conn = Connected).
Now when receive_state() finally continues, it assigns (the now
obsolete value of) nconn to mdev->state.conn again by calling
_drbd_set_state().

> I include a patch that may at least help illustrate the issue if not
fix
> it as I am not sure the req_lock can be held this early without
causing
> a deadlock or other perfomance issues.
>

I considered the approach of making drbd_sync_handshake() to run
under the req_lock and to have a simple retry mechanism in
receive_state().

Since 8.0.x is not the stable branch I decided to go with that small
patch. (I did not do any testing of this, since I guess it is rather
hard to hit this exact timing...)

Ernest, thanks for pointing this out!
As soon as you agree, I will commit this patch...

-phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-06-04 18:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-01 14:27 [Drbd-dev] DRBD8: Resync stalled at 100% due to race condition Ernest Montrose
  -- strict thread matches above, loose matches on Subject: below --
2007-06-04 18:15 Montrose, Ernest
     [not found] <BD7042533C2F8943A6A4257A9E31C454F47906@EXNA.corp.stratus.com>
2007-05-24 12:41 ` Montrose, Ernest
2007-06-01 14:19   ` Philipp Reisner
2007-06-04  9:08   ` Philipp Reisner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox