From: Ming Lei <ming.lei@redhat.com>
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
linux-block@vger.kernel.org,
"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH V3 08/20] block: don't allow to switch elevator if updating nr_hw_queues is in-progress
Date: Wed, 30 Apr 2025 08:54:59 +0800 [thread overview]
Message-ID: <aBF044-d-hsx75I6@fedora> (raw)
In-Reply-To: <5e81717f-1b64-4302-b321-c12aee697a0b@linux.ibm.com>
On Tue, Apr 29, 2025 at 03:52:51PM +0530, Nilay Shroff wrote:
>
>
> On 4/29/25 8:13 AM, Ming Lei wrote:
> >> I couldn't recreate it on my setup using above blktests.
> > It is reproduced in my test vm every time after killing the nested variant:
> >
> > [ 74.257200] ======================================================
> > [ 74.259369] WARNING: possible circular locking dependency detected
> > [ 74.260772] 6.15.0-rc3_ublk+ #547 Not tainted
> > [ 74.261950] ------------------------------------------------------
> > [ 74.263281] check/5077 is trying to acquire lock:
> > [ 74.264492] ffff888105f1fd18 (kn->active#119){++++}-{0:0}, at: __kernfs_remove+0x213/0x680
> > [ 74.266006]
> > but task is already holding lock:
> > [ 74.267998] ffff88828a661e20 (&q->q_usage_counter(queue)#14){++++}-{0:0}, at: del_gendisk+0xe5/0x180
> > [ 74.269631]
> > which lock already depends on the new lock.
> >
> > [ 74.272645]
> > the existing dependency chain (in reverse order) is:
> > [ 74.274804]
> > -> #3 (&q->q_usage_counter(queue)#14){++++}-{0:0}:
> > [ 74.277009] blk_queue_enter+0x4c2/0x630
> > [ 74.278218] blk_mq_alloc_request+0x479/0xa00
> > [ 74.279539] scsi_execute_cmd+0x151/0xba0
> > [ 74.281078] sr_check_events+0x1bc/0xa40
> > [ 74.283012] cdrom_check_events+0x5c/0x120
> > [ 74.284892] disk_check_events+0xbe/0x390
> > [ 74.286181] disk_check_media_change+0xf1/0x220
> > [ 74.287455] sr_block_open+0xce/0x230
> > [ 74.288528] blkdev_get_whole+0x8d/0x200
> > [ 74.289702] bdev_open+0x614/0xc60
> > [ 74.290882] blkdev_open+0x1f6/0x360
> > [ 74.292215] do_dentry_open+0x491/0x1820
> > [ 74.293309] vfs_open+0x7a/0x440
> > [ 74.294384] path_openat+0x1b7e/0x2ce0
> > [ 74.295507] do_filp_open+0x1c5/0x450
> > [ 74.296616] do_sys_openat2+0xef/0x180
> > [ 74.297667] __x64_sys_openat+0x10e/0x210
> > [ 74.298768] do_syscall_64+0x92/0x180
> > [ 74.299800] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 74.300971]
> > -> #2 (&disk->open_mutex){+.+.}-{4:4}:
> > [ 74.302700] __mutex_lock+0x19c/0x1990
> > [ 74.303682] bdev_open+0x6cd/0xc60
> > [ 74.304613] bdev_file_open_by_dev+0xc4/0x140
> > [ 74.306008] disk_scan_partitions+0x191/0x290
> > [ 74.307716] __add_disk_fwnode+0xd2a/0x1140
> > [ 74.309394] add_disk_fwnode+0x10e/0x220
> > [ 74.311039] nvme_alloc_ns+0x1833/0x2c30
> > [ 74.312669] nvme_scan_ns+0x5a0/0x6f0
> > [ 74.314151] async_run_entry_fn+0x94/0x540
> > [ 74.315719] process_one_work+0x86a/0x14a0
> > [ 74.317287] worker_thread+0x5bb/0xf90
> > [ 74.318228] kthread+0x371/0x720
> > [ 74.319085] ret_from_fork+0x31/0x70
> > [ 74.319941] ret_from_fork_asm+0x1a/0x30
> > [ 74.320808]
> > -> #1 (&set->update_nr_hwq_sema){.+.+}-{4:4}:
> > [ 74.322311] down_read+0x8e/0x470
> > [ 74.323135] elv_iosched_store+0x17a/0x210
> > [ 74.324036] queue_attr_store+0x234/0x340
> > [ 74.324881] kernfs_fop_write_iter+0x39b/0x5a0
> > [ 74.325771] vfs_write+0x5df/0xec0
> > [ 74.326514] ksys_write+0xff/0x200
> > [ 74.327262] do_syscall_64+0x92/0x180
> > [ 74.328018] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 74.328963]
> > -> #0 (kn->active#119){++++}-{0:0}:
> > [ 74.330433] __lock_acquire+0x145f/0x2260
> > [ 74.331329] lock_acquire+0x163/0x300
> > [ 74.332221] kernfs_drain+0x39d/0x450
> > [ 74.333002] __kernfs_remove+0x213/0x680
> > [ 74.333792] kernfs_remove_by_name_ns+0xa2/0x100
> > [ 74.334589] remove_files+0x8d/0x1b0
> > [ 74.335326] sysfs_remove_group+0x7c/0x160
> > [ 74.336118] sysfs_remove_groups+0x55/0xb0
> > [ 74.336869] __kobject_del+0x7d/0x1d0
> > [ 74.337637] kobject_del+0x38/0x60
> > [ 74.338340] blk_unregister_queue+0x153/0x2c0
> > [ 74.339125] __del_gendisk+0x252/0x9d0
> > [ 74.339959] del_gendisk+0xe5/0x180
> > [ 74.340756] sr_remove+0x7b/0xd0
> > [ 74.341429] device_release_driver_internal+0x36d/0x520
> > [ 74.342353] bus_remove_device+0x1ef/0x3f0
> > [ 74.343172] device_del+0x3be/0x9b0
> > [ 74.343951] __scsi_remove_device+0x27f/0x340
> > [ 74.344724] sdev_store_delete+0x87/0x120
> > [ 74.345508] kernfs_fop_write_iter+0x39b/0x5a0
> > [ 74.346287] vfs_write+0x5df/0xec0
> > [ 74.347170] ksys_write+0xff/0x200
> > [ 74.348312] do_syscall_64+0x92/0x180
> > [ 74.349519] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 74.350797]
> > other info that might help us debug this:
> >
> > [ 74.353554] Chain exists of:
> > kn->active#119 --> &disk->open_mutex --> &q->q_usage_counter(queue)#14
> >
> > [ 74.355535] Possible unsafe locking scenario:
> >
> > [ 74.356650] CPU0 CPU1
> > [ 74.357328] ---- ----
> > [ 74.358026] lock(&q->q_usage_counter(queue)#14);
> > [ 74.358749] lock(&disk->open_mutex);
> > [ 74.359561] lock(&q->q_usage_counter(queue)#14);
> > [ 74.360488] lock(kn->active#119);
> > [ 74.361113]
> > *** DEADLOCK ***
> >
> > [ 74.362574] 6 locks held by check/5077:
> > [ 74.363193] #0: ffff888114640420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xff/0x200
> > [ 74.364274] #1: ffff88829abb6088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x25b/0x5a0
> > [ 74.365937] #2: ffff8881176ca0e0 (&shost->scan_mutex){+.+.}-{4:4}, at: sdev_store_delete+0x7f/0x120
> > [ 74.367643] #3: ffff88828521c380 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x90/0x520
> > [ 74.369464] #4: ffff8881176ca380 (&set->update_nr_hwq_sema){.+.+}-{4:4}, at: del_gendisk+0xdd/0x180
> > [ 74.370961] #5: ffff88828a661e20 (&q->q_usage_counter(queue)#14){++++}-{0:0}, at: del_gendisk+0xe5/0x180
> > [ 74.372050]
>
> This has baffled me, as I don't understand how the read lock in
> elv_iosched_store() (running in context #1) could depend on the (same)
> read lock in add_disk_fwnode() (running under another context #2), as
> both locks are represented by the same rw_semaphore. As we see

That is why the read lock in elv_iosched_store() is annotated as nested
(i.e. given a new lockdep map) to avoid the false-positive warning,
because the two can't be grabbed at the same time.

> above, both elv_iosched_store() and add_disk_fwnode() run under
> different contexts, so ideally they should be able to run concurrently
> while acquiring the same read lock.

In theory, yes, but in practice the scheduler's store attribute isn't
created until the disk/queue kobject is added, so switching the elevator
can't happen until that kobject/attribute has been exposed to userspace.
That is why the nested annotation is correct.
Thanks,
Ming