From: Philipp Reisner <philipp.reisner@linbit.com>
To: drbd-dev@lists.linbit.com
Cc: "Montrose, Ernest" <Ernest.Montrose@stratus.com>
Subject: Re: [Drbd-dev] DRBD8: nodes deadlock in PausedSync{ST]
Date: Wed, 31 Oct 2007 14:46:39 +0100
Message-ID: <200710311446.42512.philipp.reisner@linbit.com>
In-Reply-To: <BD7042533C2F8943A6A4257A9E31C454F47A25@EXNA.corp.stratus.com>
On Monday 29 October 2007 23:33:29 Montrose, Ernest wrote:
> Hi all,
> I have been struggling with a problem here where the nodes enter
> PausedSync[T|S] and stay there.
> This happens when one node comes up from a fresh attach, connect
> sequence. I think the issue happens this way. Say we have two volumes
> drbd5 and drbd16 and we attempt to connect both of them at roughly the
> same time. Furthermore, drbd5 and 16 will require syncing, say as sync
> target. What I observe is this:
> * drbd16 is connecting and drbd5 is syncing. So 16 is paused isp=1
> * drbd16 enters receive_state() but before acquiring the req_lock
> that thread loses the CPU to drbd5 that is finishing syncing. After_isp
> is cleared on 16 giving drbd16 the green light to continue syncing. So
> far so good.
> * Now drbd16 resumes with the old peer_isp=1
> * So now we are paused forever.
>
> So I think receive_state() is just racy but I could be wrong. I am
> really not sure how to fix this but I include a patch here that may help
> to at least illustrate the problem. It seems to close the window for this
> particular race somewhat.
>
Hi Ernest,
Your patch fixes the issue.
I spent an hour or so understanding the exact timing, and drew
diagrams of it... it is fixed with that change. No race left, I think.
I committed it to my git tree.
-Phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria http://www.linbit.com :
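
The sequence Ernest describes in the quoted message can be read as a stale read followed by a write-back. The following is a minimal, single-threaded sketch of that reading, not the DRBD code and not the committed patch. The flag names aftr_isp and peer_isp echo the email; the Device class, both functions, and the assumption that the other volume's completion clears both flags are hypothetical and exist only for illustration.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Device:
    # Flag names borrowed from the email; the rest of this model is hypothetical.
    aftr_isp: bool = True   # paused until another device finishes syncing
    peer_isp: bool = True   # the peer is reported as paused
    lock: threading.Lock = field(default_factory=threading.Lock)

    def paused(self) -> bool:
        return self.aftr_isp or self.peer_isp


def other_sync_finished(dev: Device) -> None:
    # Stand-in for the other volume finishing its resync.
    # Assumption: this clears both pause flags on the waiting device.
    with dev.lock:
        dev.aftr_isp = False
        dev.peer_isp = False


def receive_state(dev: Device, read_under_lock: bool) -> None:
    # Toy handler: the interleaving is fixed by call order, so the
    # outcome is deterministic even though no real threads are used.
    if read_under_lock:
        # The other thread runs before we take the lock, so the value
        # read inside the critical section is current.
        other_sync_finished(dev)
        with dev.lock:
            seen_peer_isp = dev.peer_isp
            dev.peer_isp = seen_peer_isp
    else:
        seen_peer_isp = dev.peer_isp      # read before taking the lock
        other_sync_finished(dev)          # the thread "loses the CPU" here
        with dev.lock:
            dev.peer_isp = seen_peer_isp  # stale value written back


racy, fixed = Device(), Device()
receive_state(racy, read_under_lock=False)
receive_state(fixed, read_under_lock=True)
print(racy.paused(), fixed.paused())
```

In this toy model the racy variant ends with aftr_isp cleared but peer_isp set again, so the device stays paused, while the variant that reads under the lock ends unpaused.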