From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep
Date: Mon, 21 Oct 2024 19:17:39 +0800 [thread overview]
Message-ID: <ZxY4U2nTkRhV3NWL@fedora> (raw)
In-Reply-To: <4f4cdf6a-e1d3-4e0c-bb57-9cbe767ac112@kernel.dk>
On Sat, Oct 19, 2024 at 04:46:13PM -0600, Jens Axboe wrote:
> On 10/17/24 7:35 PM, Ming Lei wrote:
> > Recently we got several deadlock report[1][2][3] caused by blk_mq_freeze_queue
> > and blk_enter_queue().
> >
> > Turns out the two are just like one rwsem, so model them as rwsem for
> > supporting lockdep:
> >
> > 1) model blk_mq_freeze_queue() as down_write_trylock()
> > - it is exclusive lock, so dependency with blk_enter_queue() is covered
> > - it is trylock because blk_mq_freeze_queue() are allowed to run concurrently
> >
> > 2) model blk_enter_queue() as down_read()
> > - it is shared lock, so concurrent blk_enter_queue() are allowed
> > - it is read lock, so dependency with blk_mq_freeze_queue() is modeled
> > - blk_queue_exit() is often called from other contexts(such as irq), and
> > it can't be annotated as rwsem_release(), so simply do it in
> > blk_enter_queue(), this way still covered cases as many as possible
> >
> > NVMe is the only subsystem which may call blk_mq_freeze_queue() and
> > blk_mq_unfreeze_queue() from different context, so it is the only
> > exception for the modeling. Add one tagset flag to exclude it from
> > the lockdep support.
> >
> > With lockdep support, such kind of reports may be reported asap and
> > needn't wait until the real deadlock is triggered.
>
> I think this is a great idea. We've had way too many issues in this
> area, getting lockdep to grok it (and report issues) is the ideal way to
> avoid that, and even find issues we haven't come across yet.
So far, one main false positive is that the modeling becomes not
correct when calling blk_queue_start_drain() with setting disk state as
GD_DEAD or queue as QUEUE_FLAG_DYING, since __bio_queue_enter() or
blk_queue_enter() can return immediately in this situation.
[ 281.645392] ======================================================
[ 281.647189] WARNING: possible circular locking dependency detected
[ 281.648770] 6.11.0_nbd+ #405 Not tainted
[ 281.649171] ------------------------------------------------------
[ 281.649668] nvme/10551 is trying to acquire lock:
[ 281.650100] ffff938a5717e3e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x1d6/0x4d0
[ 281.650771]
but task is already holding lock:
[ 281.651442] ffff938a12206c48 (q->q_usage_counter){++++}-{0:0}, at: blk_queue_start_drain+0x12/0x40
[ 281.652085]
which lock already depends on the new lock.
[ 281.653061]
the existing dependency chain (in reverse order) is:
[ 281.653820]
-> #1 (q->q_usage_counter){++++}-{0:0}:
[ 281.654525] blk_try_enter_queue+0xc7/0x230
[ 281.654951] __submit_bio+0xa7/0x190
[ 281.655339] submit_bio_noacct_nocheck+0x1b2/0x400
[ 281.655779] __block_write_full_folio+0x1e7/0x400
[ 281.656212] write_cache_pages+0x62/0xb0
[ 281.656608] blkdev_writepages+0x56/0x90
[ 281.657007] do_writepages+0x76/0x270
[ 281.657389] __writeback_single_inode+0x5b/0x4c0
[ 281.657813] writeback_sb_inodes+0x22e/0x550
[ 281.658220] __writeback_inodes_wb+0x4c/0xf0
[ 281.658617] wb_writeback+0x193/0x3f0
[ 281.658995] wb_workfn+0x343/0x530
[ 281.659353] process_one_work+0x212/0x700
[ 281.659739] worker_thread+0x1ce/0x380
[ 281.660118] kthread+0xd2/0x110
[ 281.660460] ret_from_fork+0x31/0x50
[ 281.660818] ret_from_fork_asm+0x1a/0x30
[ 281.661192]
-> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
[ 281.661876] __lock_acquire+0x15c0/0x23e0
[ 281.662254] lock_acquire+0xd8/0x300
[ 281.662603] __flush_work+0x1f2/0x4d0
[ 281.662954] wb_shutdown+0xa1/0xd0
[ 281.663285] bdi_unregister+0x92/0x250
[ 281.663632] del_gendisk+0x37b/0x3a0
[ 281.664017] nvme_mpath_shutdown_disk+0x58/0x60 [nvme_core]
[ 281.664453] nvme_ns_remove+0x17f/0x210 [nvme_core]
[ 281.664854] nvme_remove_namespaces+0xf7/0x150 [nvme_core]
[ 281.665304] nvme_do_delete_ctrl+0x71/0x90 [nvme_core]
[ 281.665728] nvme_delete_ctrl_sync+0x3f/0x50 [nvme_core]
[ 281.666159] nvme_sysfs_delete+0x38/0x50 [nvme_core]
[ 281.666569] kernfs_fop_write_iter+0x15c/0x210
[ 281.666953] vfs_write+0x2a7/0x540
[ 281.667281] ksys_write+0x75/0x100
[ 281.667607] do_syscall_64+0x95/0x180
[ 281.667948] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 281.668352]
other info that might help us debug this:
[ 281.669122] Possible unsafe locking scenario:
[ 281.669671] CPU0 CPU1
[ 281.670019] ---- ----
[ 281.670358] lock(q->q_usage_counter);
[ 281.670676] lock((work_completion)(&(&wb->dwork)->work));
[ 281.671186] lock(q->q_usage_counter);
[ 281.671628] lock((work_completion)(&(&wb->dwork)->work));
[ 281.672056]
*** DEADLOCK ***
thanks,
Ming
next prev parent reply other threads:[~2024-10-21 11:17 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-18 1:35 [PATCH] block: model freeze & enter queue as rwsem for supporting lockdep Ming Lei
2024-10-18 16:57 ` Bart Van Assche
2024-10-21 2:20 ` Ming Lei
2024-10-22 6:26 ` Christoph Hellwig
2024-10-18 18:45 ` kernel test robot
2024-10-19 22:46 ` Jens Axboe
2024-10-21 11:17 ` Ming Lei [this message]
2024-10-22 6:18 ` Christoph Hellwig
2024-10-22 7:19 ` Peter Zijlstra
2024-10-22 7:21 ` Christoph Hellwig
2024-10-23 3:22 ` Ming Lei
2024-10-23 6:07 ` Christoph Hellwig
2024-10-22 15:05 ` Bart Van Assche
2024-10-23 7:59 ` Ming Lei
2024-10-23 18:05 ` Bart Van Assche
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZxY4U2nTkRhV3NWL@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.