From: Philipp Reisner <philipp.reisner@linbit.com>
To: drbd-dev@lists.linbit.com
Cc: "Montrose, Ernest" <Ernest.Montrose@stratus.com>
Subject: Re: [Drbd-dev] DRBD8: Deadlock in PausedSyncS
Date: Thu, 2 Aug 2007 11:44:49 +0200
Message-ID: <200708021144.49764.philipp.reisner@linbit.com>
In-Reply-To: <BD7042533C2F8943A6A4257A9E31C454F4799A@EXNA.corp.stratus.com>

On Tuesday 31 July 2007 00:31:19 Montrose, Ernest wrote:
> Hi all,
> We are seeing a problem where we deadlock if a pause-sync request
> happens while attaching. Below is an explanation of what I think is
> occurring:
>
> Consider two nodes: X (Primary) and Y (Secondary).
> 1. X becomes Secondary/Diskless.
> 2. Y becomes Primary.
> 3. X tries to attach and sends its state and UUIDs to Y.
> 4. While Y is in receive_state() doing a drbd_sync_handshake(), it
>    receives a pause-sync request. This is where the trouble starts.
> 5. aftr_isp is changed from 0->1 on Y and after_state_ch() is called.
>    This triggers a drbd_send_state() from Y.
> 6. X receives the state from Y, but no UUIDs, and runs
>    drbd_sync_handshake() with the old UUIDs; we deadlock with
>    PausedSyncS on both sides.
>
> I am not sure how best to fix this. Perhaps we should not call
> drbd_send_state() in after_state_ch() for a sync request from the peer
> if the peer's disk is diskless. Or, if we do send the state, we should
> send the UUIDs as well. The attached patch will at least serve as an
> illustration of the issue, if not the correct fix.
>


Hi Ernest,

I think your patch was correct. But in reality this is not a single
case; it is a whole class of such problems.

While the node that gets the new disk is in disk=Negotiating, it 
will run drbd_sync_handshake() on each state packet that comes in.

We need to avoid this. Calling pause-sync is just one of several ways
to cause the transmission of a state packet.
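To make the failure concrete, here is a toy C model of the attaching node (X in Ernest's report). It is a sketch under stated assumptions, not DRBD code: `struct attach_node`, `toy_sync_handshake()` and the other names are hypothetical, and the handshake is reduced to recording which peer UUID it decided on. The naive receiver runs the handshake on every state packet while Negotiating, so a state packet triggered by pause-sync makes it decide on whatever stale peer UUID it still holds.

```c
#include <stdint.h>

/* Hypothetical toy model of the attaching node; none of these names
 * come from DRBD. */
enum disk_state { DISK_DISKLESS, DISK_NEGOTIATING, DISK_UP_TO_DATE };

struct attach_node {
    enum disk_state disk;    /* local disk state */
    uint64_t peer_uuid;      /* peer UUID as last received (may be stale) */
    int handshakes;          /* how many times the handshake ran */
    uint64_t handshake_uuid; /* peer UUID the last handshake decided on */
};

/* Stand-in for drbd_sync_handshake(): it can only use whatever peer
 * UUID the node currently holds. */
static void toy_sync_handshake(struct attach_node *n)
{
    n->handshakes++;
    n->handshake_uuid = n->peer_uuid;
}

/* A UUID packet from the peer replaces the stored peer UUID. */
static void toy_receive_uuids(struct attach_node *n, uint64_t uuid)
{
    n->peer_uuid = uuid;
}

/* Naive receiver: while Negotiating, every state packet triggers a
 * handshake, whether or not fresh UUIDs arrived first. */
static void toy_receive_state_naive(struct attach_node *n)
{
    if (n->disk == DISK_NEGOTIATING)
        toy_sync_handshake(n);
}
```

Starting from a stale peer UUID of 0x1, a state packet caused by pause-sync makes the first handshake decide on 0x1; only a later handshake, after the fresh UUIDs arrive, sees the peer's current UUID.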

Instead of changing every place where we send a state packet so that
it is not sent when the peer has no disk, I decided to fix the
receiving side.

http://lists.linbit.com/pipermail/drbd-cvs/2007-August/001613.html
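One way a receiving-side guard could look, again as a hypothetical toy model rather than the code in the commit linked above: the node remembers whether it has received UUIDs during the current negotiation and runs the handshake on a state packet only if it has, consuming that flag so later state packets (such as one caused by pause-sync) do not repeat the handshake. The `uuids_fresh` flag and all function names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; names are illustrative, not DRBD's. */
enum disk_state { DISK_DISKLESS, DISK_NEGOTIATING, DISK_UP_TO_DATE };

struct attach_node {
    enum disk_state disk;    /* local disk state */
    uint64_t peer_uuid;      /* peer UUID as last received */
    bool uuids_fresh;        /* UUIDs arrived and no handshake used them yet */
    int handshakes;          /* how many times the handshake ran */
    uint64_t handshake_uuid; /* peer UUID the last handshake decided on */
};

/* Stand-in for drbd_sync_handshake(). */
static void toy_sync_handshake(struct attach_node *n)
{
    n->handshakes++;
    n->handshake_uuid = n->peer_uuid;
}

/* Receiving UUIDs marks them as fresh for the next handshake. */
static void toy_receive_uuids(struct attach_node *n, uint64_t uuid)
{
    n->peer_uuid = uuid;
    n->uuids_fresh = true;
}

/* Guarded receiver: a state packet triggers the handshake only while
 * Negotiating AND only if fresh UUIDs are available; the flag is then
 * consumed so further state packets do not handshake again. */
static void toy_receive_state_guarded(struct attach_node *n)
{
    if (n->disk != DISK_NEGOTIATING || !n->uuids_fresh)
        return; /* e.g. a state packet caused by pause-sync */
    toy_sync_handshake(n);
    n->uuids_fresh = false;
}
```

With this guard, a state packet that arrives before the UUIDs is ignored for handshake purposes, the handshake runs once on the fresh UUID, and any further state packets leave the result alone.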

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :
