From: Ming Lei <ming.lei@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
ming.lei@redhat.com
Subject: Re: [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep
Date: Mon, 21 Oct 2024 10:20:30 +0800
Message-ID: <ZxW6bl-k692d6H62@fedora>
In-Reply-To: <46493f6f-850e-459e-a4be-116deb5d3ca0@acm.org>

On Fri, Oct 18, 2024 at 09:57:12AM -0700, Bart Van Assche wrote:
>
> On 10/17/24 6:35 PM, Ming Lei wrote:
> > Recently we got several deadlock report[1][2][3] caused by blk_mq_freeze_queue
> > and blk_enter_queue().
> >
> > Turns out the two are just like one rwsem, so model them as rwsem for
> > supporting lockdep:
> >
> > 1) model blk_mq_freeze_queue() as down_write_trylock()
> > - it is exclusive lock, so dependency with blk_enter_queue() is covered
> > - it is trylock because blk_mq_freeze_queue() are allowed to run concurrently
> >
> > 2) model blk_enter_queue() as down_read()
> > - it is shared lock, so concurrent blk_enter_queue() are allowed
> > - it is read lock, so dependency with blk_mq_freeze_queue() is modeled
> > - blk_queue_exit() is often called from other contexts(such as irq), and
> > it can't be annotated as rwsem_release(), so simply do it in
> > blk_enter_queue(), this way still covered cases as many as possible
> >
> > NVMe is the only subsystem which may call blk_mq_freeze_queue() and
> > blk_mq_unfreeze_queue() from different context, so it is the only
> > exception for the modeling. Add one tagset flag to exclude it from
> > the lockdep support.
> >
> > With lockdep support, such kind of reports may be reported asap and
> > needn't wait until the real deadlock is triggered.
> >
> > For example, the following lockdep report can be triggered in the
> > report[3].
>
> Hi Ming,
>
> Thank you for having reported this issue and for having proposed a
> patch. I think the following is missing from the patch description:
> (a) An analysis of which code causes the inconsistent nested lock order.
> (b) A discussion of potential alternatives.

I think you may have misunderstood this patch: it only enables lockdep
for freezing & entering the queue, so that lockdep can be used for early
verification of patches & the kernel.

The basic idea is to model .q_usage_counter as an rwsem:

- treat freeze_queue as down_write_trylock()
- treat enter_queue() as down_read()

We needn't care about the relation to other locks in this patch if the
following hold:

- .q_usage_counter can be treated as a lock, and
- the modeling in this patch is correct.

Then the lock order can be verified by lockdep.

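To illustrate why this modeling is enough for order checking, here is a
minimal userspace sketch. It is not kernel code and not part of the patch:
it is a toy order checker that only mimics the idea behind lockdep. The
names toy_acquire(), toy_release(), the two path_*() helpers and the lock
class names are hypothetical, the toy does not distinguish readers from
writers, and the second path (entering the queue while holding limits_lock)
is a made-up example rather than the code path from any of the reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for this toy model; not kernel identifiers. */
enum lock_class { Q_USAGE_COUNTER, LIMITS_LOCK, NR_CLASSES };

#define MAX_DEPTH 8

/* order_seen[a][b] is true once class b was acquired while a was held. */
static bool order_seen[NR_CLASSES][NR_CLASSES];
static enum lock_class held[MAX_DEPTH];
static int depth;

/*
 * Record an acquisition; return false if it inverts an order seen earlier.
 * Assumption of this toy: a trylock cannot block, so it is not checked
 * against the locks already held, but it is still pushed on the held
 * stack so that later acquisitions are ordered after it.
 */
static bool toy_acquire(enum lock_class c, bool trylock)
{
	bool ok = true;

	if (!trylock) {
		for (int i = 0; i < depth; i++) {
			if (held[i] == c)
				continue;	/* ignore same-class nesting */
			if (order_seen[c][held[i]])
				ok = false;	/* opposite order was seen */
			order_seen[held[i]][c] = true;
		}
	}
	assert(depth < MAX_DEPTH);
	held[depth++] = c;
	return ok;
}

/* Drop the most recently acquired lock. */
static void toy_release(void)
{
	assert(depth > 0);
	depth--;
}

/* Freeze the queue (write trylock), then take limits_lock. */
static bool path_freeze_then_limits(void)
{
	bool ok = toy_acquire(Q_USAGE_COUNTER, true);

	ok = toy_acquire(LIMITS_LOCK, false) && ok;
	toy_release();
	toy_release();
	return ok;
}

/* Take limits_lock, then enter the queue (read lock on the same class). */
static bool path_limits_then_enter(void)
{
	bool ok = toy_acquire(LIMITS_LOCK, false);

	ok = toy_acquire(Q_USAGE_COUNTER, false) && ok;
	toy_release();
	toy_release();
	return ok;
}
```

Calling path_freeze_then_limits() first records the order
"q_usage_counter, then limits_lock"; a later path_limits_then_enter() is
then flagged as an inversion without any real deadlock having to happen.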
>
> It seems unavoidable to me that some code freezes request queue(s)
> before the limits are updated. Additionally, there is code that freezes
> queues first (sd_revalidate_disk()), executes commands and next updates
> limits. Hence the inconsistent order of freezing queues and obtaining
> limits_lock.

This is just one specific issue that can be reported with lockdep
support, but this patch only focuses on how to model freeze/enter queue
as a lock, so please move the discussion of this specific lock issue to
a separate thread.

>
> The alternative (entirely untested) solution below has the following
> advantages:
> * No additional information has to be provided to lockdep about the
> locking order.
Can you explain a bit more? I think this patch doesn't provide
`additional` info to lockdep APIs.
> * No new flags are required (SKIP_FREEZE_LOCKDEP).

So far that isn't possible: the nvme subsystem freezes the queue in one
context and unfreezes it from another context, and this has already
caused a lot of trouble.

Removing SKIP_FREEZE_LOCKDEP requires refactoring the nvme error handling
code path, which is not a short-term thing.

> * No exceptions are necessary for blk_queue_exit() nor for the NVMe
> driver.

I don't think that is possible, since a lock wouldn't be used in this way.

More importantly, we already cover enough cases even without taking
blk_queue_exit() into account.

Thanks,
Ming