Distributed Replicated Block Device (DRBD) development
From: Lars Ellenberg <lars.ellenberg@linbit.com>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] Please test with CONFIG_PROVE_LOCKING=y
Date: Thu, 25 Apr 2019 12:56:41 +0200
Message-ID: <20190425105641.GA919@soda.linbit>
In-Reply-To: <eec310e0-e0ca-6b18-7496-e9e3b6b29b63@i-love.sakura.ne.jp>

On Thu, Apr 25, 2019 at 06:30:05PM +0900, Tetsuo Handa wrote:
> I found that simply doing
> 
> # mount /dev/drbd0 /mnt/
> 
> on the primary side causes a lockdep splat on the peer side.
> 

> [   23.039882] ========================================================
> [   23.039906] WARNING: possible irq lock inversion dependency detected
> [   23.039931] 5.0.0 #891 Tainted: G           O
> [   23.039950] --------------------------------------------------------
> [   23.039975] drbd_r_r0/8237 just changed the state of lock:
> [   23.039997] 000000007cc227b6 (&(&connection->epoch_lock)->rlock){+.+.}, at: receive_Data+0x36b/0x1ca0 [drbd]
> [   23.040049] but this lock was taken by another, SOFTIRQ-safe lock in the past:
> [   23.040115]  (&(&resource->req_lock)->rlock){..-.}
> [   23.040117]
> 
> and interrupts could create inverse lock ordering between them.
> 
> [   23.040176]
> other info that might help us debug this:
> [   23.040200]  Possible interrupt unsafe locking scenario:
> 
> [   23.040225]        CPU0                    CPU1
> [   23.040243]        ----                    ----
> [   23.040260]   lock(&(&connection->epoch_lock)->rlock);
> [   23.040281]                                local_irq_disable();
> [   23.040303]                                lock(&(&resource->req_lock)->rlock);
> [   23.040330]                                lock(&(&connection->epoch_lock)->rlock);
> [   23.040359]   <Interrupt>
> [   23.040370]     lock(&(&resource->req_lock)->rlock);
> [   23.040389]
>  *** DEADLOCK ***


Yes.
We already know.
"impossible odds"...

But it needs fixing.
Certainly NOT by converting every epoch_lock acquisition to irqsave.
I introduced the problem with commit
f4acb16f drbd: fix lifetime of "need to apply activity log" metadata flag

I "just" need to come up with a way to check what I am checking there
without taking the epoch lock.
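
One general way to make such a check without a lock is to keep the
checked state in an atomic variable. What follows is a minimal userspace
sketch of that idea using C11 atomics. It is not the DRBD fix:
`struct epoch_state`, `pending_writes` and the three helpers are
hypothetical names, and the assumption that the check reduces to a
single counter may not hold for the real code. In kernel code the
analogous primitives would be `atomic_t` with `atomic_inc()`,
`atomic_dec()` and `atomic_read()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for state that is currently read under epoch_lock. */
struct epoch_state {
	atomic_int pending_writes;	/* hypothetical counter, not a DRBD field */
};

/* Writer side: called where the epoch is updated. */
static void epoch_note_write(struct epoch_state *e)
{
	atomic_fetch_add_explicit(&e->pending_writes, 1, memory_order_release);
}

/* Writer side: called when a previously noted write has completed. */
static void epoch_write_done(struct epoch_state *e)
{
	int prev = atomic_fetch_sub_explicit(&e->pending_writes, 1,
					     memory_order_release);
	assert(prev > 0);	/* catch an unbalanced "done" */
}

/*
 * Reader side: takes no lock at all, so it cannot take part in a
 * lock-order inversion, whatever locks the caller already holds.
 */
static bool epoch_has_pending_writes(struct epoch_state *e)
{
	return atomic_load_explicit(&e->pending_writes,
				    memory_order_acquire) > 0;
}
```

The reader only sees a snapshot; whether a possibly stale answer is
acceptable depends on what the caller does with it.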


> Although making below change seems to solve the lockdep splat,
> I can't check the correctness because I don't know how drbd works.
> Please test with CONFIG_PROVE_LOCKING=y and fix.

See above.
Thanks.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT
