Distributed Replicated Block Device (DRBD) development
* [Drbd-dev] DRBD8: Deadlock in PausedSyncS
@ 2007-07-30 22:31 Montrose, Ernest
  2007-08-02  9:44 ` Philipp Reisner
  0 siblings, 1 reply; 4+ messages in thread
From: Montrose, Ernest @ 2007-07-30 22:31 UTC (permalink / raw)
  To: drbd-dev



Hi all,
We are seeing a deadlock when a pause-sync request arrives while a node
is attaching.  Below is an explanation of what I think is occurring:

Consider two nodes, X Primary and Y Secondary.
1. X becomes Secondary/Diskless.
2. Y becomes Primary.
3. X tries to Attach and sends its states/uuids to Y.
4. While Y is in receive_state() doing a drbd_sync_handshake(), it
   receives a Paused Sync request.  This is where the trouble starts.
5. aftr_isp changes from 0->1 on Y and after_state_ch() is called.  This
   triggers a drbd_send_state() from Y.
6. X receives states from Y but no uuids, runs drbd_sync_handshake()
   with the old uuids, and we deadlock with PausedSyncS on both sides.

I am not sure how best to fix this.  Perhaps we should not call
drbd_send_state() in after_state_ch() for a sync request from the peer
if the peer is diskless.  Or, if we do send states, we should send the
uuids as well.  The attached patch will at least serve as an
illustration of the issue, if not the correct fix.
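
To make the first option concrete, here is a minimal standalone sketch
of the send condition with the extra peer-disk check.  The types and the
function name should_send_state() are hypothetical, simplified stand-ins
and not the real DRBD definitions; the sketch also assumes that the
device state already equals the new state (ns) when the check runs.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for DRBD's state fields.  The real
 * state structure has more members and more enum values. */
enum conn_state { StandAlone, Unconnected, Connected, PausedSyncS };
enum disk_state { Diskless, Negotiating, UpToDate };

struct state {
    enum conn_state conn;
    enum disk_state pdsk;      /* the peer's disk state */
    unsigned aftr_isp : 1;     /* paused because of a sync-after dependency */
    unsigned user_isp : 1;     /* paused on user request */
};

/* Returns true when a pause/continue change should be announced to the
 * peer.  os is the old state, ns the new state. */
static bool should_send_state(struct state os, struct state ns)
{
    bool isp_changed = os.aftr_isp != ns.aftr_isp ||
                       os.user_isp != ns.user_isp;

    /* The last clause is the proposed addition: stay quiet while the
     * peer has no disk, so it cannot act on a state packet that arrives
     * without matching uuids. */
    return ns.conn >= Connected && isp_changed && ns.pdsk != Diskless;
}
```

With a diskless peer the predicate is false even when aftr_isp flips,
which is the behaviour the patch aims for in step 5 above.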
 
Thank you!
 
EM--


[-- Attachment #2: pausedsync.patch --]
[-- Type: application/octet-stream, Size: 478 bytes --]

Index: drbd/drbd_main.c
===================================================================
--- drbd/drbd_main.c	(revision 16853)
+++ drbd/drbd_main.c	(working copy)
@@ -953,7 +953,8 @@
 	/* We want to pause/continue resync, tell peer. */
 	if ( ns.conn >= Connected && 
 	     (( os.aftr_isp != ns.aftr_isp ) ||
-	      ( os.user_isp != ns.user_isp )) ) {
+	      ( os.user_isp != ns.user_isp )) &&
+            mdev->state.pdsk != Diskless ) {
 		drbd_send_state(mdev);
 	}
 

* RE: [Drbd-dev] DRBD8: Deadlock in PausedSyncS
@ 2007-08-02 11:58 Montrose, Ernest
  0 siblings, 0 replies; 4+ messages in thread
From: Montrose, Ernest @ 2007-08-02 11:58 UTC (permalink / raw)
  To: Philipp Reisner, drbd-dev

Phil,
Yeah, I agree.  There are a few more situations other than this one.
I will try to actually test a few of those with this patch and let you
know if I find anything weird.  Thanks!

EM--
-----Original Message-----
From: Philipp Reisner [mailto:philipp.reisner@linbit.com] 
Sent: Thursday, August 02, 2007 5:45 AM
To: drbd-dev@linbit.com
Cc: Montrose, Ernest
Subject: Re: [Drbd-dev] DRBD8: Deadlock in PausedSyncS

On Tuesday 31 July 2007 00:31:19 Montrose, Ernest wrote:
> Hi all,
> We are seeing a deadlock when a pause-sync request arrives while a node
> is attaching.  Below is an explanation of what I think is occurring:
>
> Consider two nodes, X Primary and Y Secondary.
> 1. X becomes Secondary/Diskless.
> 2. Y becomes Primary.
> 3. X tries to Attach and sends its states/uuids to Y.
> 4. While Y is in receive_state() doing a drbd_sync_handshake(), it
>    receives a Paused Sync request.  This is where the trouble starts.
> 5. aftr_isp changes from 0->1 on Y and after_state_ch() is called.  This
>    triggers a drbd_send_state() from Y.
> 6. X receives states from Y but no uuids, runs drbd_sync_handshake()
>    with the old uuids, and we deadlock with PausedSyncS on both sides.
>
> I am not sure how best to fix this.  Perhaps we should not call
> drbd_send_state() in after_state_ch() for a sync request from the peer
> if the peer is diskless.  Or, if we do send states, we should send the
> uuids as well.  The attached patch will at least serve as an
> illustration of the issue, if not the correct fix.
>


Hi Ernest,

I think your patch was correct. But this is not the only case of the
problem; it is a whole class of such problems.

While the node that gets the new disk is in disk=Negotiating, it
will run drbd_sync_handshake() on each state packet that comes in.

We need to avoid this. Calling pause-sync is just one possible way
to cause the transmission of a state packet.

Instead of fixing every place where we send a state packet so that it
is not sent if the peer has no disk, I decided to fix the receiving
side.

http://lists.linbit.com/pipermail/drbd-cvs/2007-August/001613.html
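
As an illustration only, the idea of fixing the receiving side can be
modelled in a few lines.  This is not the code from the commit linked
above; the struct, the peer_uuids_fresh flag and all function names are
hypothetical.  The sketch assumes the receiver simply remembers whether
uuids have arrived since the attach began, and skips the handshake for
any state packet that arrives without them.

```c
#include <stdbool.h>

/* Hypothetical, simplified disk states; not the real DRBD enum. */
enum disk_state { Diskless, Attaching, Negotiating, UpToDate };

struct node {
    enum disk_state disk;
    bool peer_uuids_fresh;   /* assumed flag: uuids arrived since attach began */
    int handshakes_run;      /* counts handshakes, for illustration only */
};

/* The local node starts attaching a disk and waits for the peer's answer. */
static void begin_attach(struct node *n)
{
    n->disk = Negotiating;
    n->peer_uuids_fresh = false;
}

/* A uuid packet from the peer arrived. */
static void receive_uuids(struct node *n)
{
    n->peer_uuids_fresh = true;
}

/* A state packet from the peer arrived.  Returns true if a sync
 * handshake was run for it. */
static bool receive_state(struct node *n)
{
    if (n->disk != Negotiating)
        return false;
    if (!n->peer_uuids_fresh)
        return false;        /* state without new uuids: do not handshake */

    n->handshakes_run++;     /* stands in for drbd_sync_handshake() */
    n->peer_uuids_fresh = false;
    return true;
}
```

In this model a stray state packet, such as the one triggered by
pause-sync, no longer starts a handshake with old uuids, regardless of
which code path on the peer sent it.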

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :


Thread overview: 4+ messages
2007-07-30 22:31 [Drbd-dev] DRBD8: Deadlock in PausedSyncS Montrose, Ernest
2007-08-02  9:44 ` Philipp Reisner
2007-08-02 12:07   ` Oren Nechushtan
  -- strict thread matches above, loose matches on Subject: below --
2007-08-02 11:58 Montrose, Ernest
