* [REGRESSION] 6.17-rc2: lockdep circular dependency at boot introduced by 8f5845e0743b (“block: restore default wbt enablement”)
@ 2025-08-21 6:56 Mikhail Gavrilov
From: Mikhail Gavrilov @ 2025-08-21 6:56 UTC (permalink / raw)
To: sunjunchao, axboe, nilay, yukuai3, Ming Lei, linux-block,
Linux List Kernel Mailing, Linux regressions mailing list
[-- Attachment #1: Type: text/plain, Size: 9301 bytes --]
Hi,
After commit 8f5845e0743b (“block: restore default wbt enablement”)
I started seeing a lockdep warning about a circular locking dependency
on every boot.
Bisect:
git bisect identifies 8f5845e0743bf3512b71b3cb8afe06c192d6acc4 as the
first bad commit.
Reverting this commit on top of 6.17.0-rc2-git-b19a97d57c15 makes the
warning disappear completely.
The warning looks like this:
[ 12.595070] nvme nvme0: 32/0/0 default/read/poll queues
[ 12.595566] nvme nvme1: 32/0/0 default/read/poll queues
[ 12.610697] ======================================================
[ 12.610705] WARNING: possible circular locking dependency detected
[ 12.610714] 6.17.0-rc2-git-b19a97d57c15+ #158 Not tainted
[ 12.610726] ------------------------------------------------------
[ 12.610734] kworker/u129:3/911 is trying to acquire lock:
[ 12.610743] ffffffff899ab700 (cpu_hotplug_lock){++++}-{0:0}, at:
static_key_slow_inc+0x16/0x40
[ 12.610760]
but task is already holding lock:
[ 12.610769] ffff8881d166d570
(&q->q_usage_counter(io)#4){++++}-{0:0}, at:
blk_mq_freeze_queue_nomemsave+0x16/0x30
[ 12.610787]
which lock already depends on the new lock.
[ 12.610798]
the existing dependency chain (in reverse order) is:
[ 12.610971]
-> #2 (&q->q_usage_counter(io)#4){++++}-{0:0}:
[ 12.611246] __lock_acquire+0x56a/0xbe0
[ 12.611381] lock_acquire.part.0+0xc7/0x270
[ 12.611518] blk_alloc_queue+0x5cd/0x720
[ 12.611649] blk_mq_alloc_queue+0x143/0x250
[ 12.611780] __blk_mq_alloc_disk+0x18/0xd0
[ 12.611906] nvme_alloc_ns+0x240/0x1930 [nvme_core]
[ 12.612042] nvme_scan_ns+0x320/0x3b0 [nvme_core]
[ 12.612170] async_run_entry_fn+0x94/0x540
[ 12.612289] process_one_work+0x87a/0x14e0
[ 12.612406] worker_thread+0x5f2/0xfd0
[ 12.612527] kthread+0x3b0/0x770
[ 12.612641] ret_from_fork+0x3ef/0x510
[ 12.612760] ret_from_fork_asm+0x1a/0x30
[ 12.612875]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 12.613102] __lock_acquire+0x56a/0xbe0
[ 12.613215] lock_acquire.part.0+0xc7/0x270
[ 12.613327] fs_reclaim_acquire+0xd9/0x130
[ 12.613444] __kmalloc_cache_node_noprof+0x60/0x4e0
[ 12.613560] amd_pmu_cpu_prepare+0x123/0x670
[ 12.613674] cpuhp_invoke_callback+0x2c8/0x9c0
[ 12.613791] __cpuhp_invoke_callback_range+0xbd/0x1f0
[ 12.613904] _cpu_up+0x2f8/0x6c0
[ 12.614015] cpu_up+0x11e/0x1c0
[ 12.614124] cpuhp_bringup_mask+0xea/0x130
[ 12.614231] bringup_nonboot_cpus+0xa9/0x170
[ 12.614335] smp_init+0x2b/0xf0
[ 12.614443] kernel_init_freeable+0x23f/0x2e0
[ 12.614545] kernel_init+0x1c/0x150
[ 12.614643] ret_from_fork+0x3ef/0x510
[ 12.614744] ret_from_fork_asm+0x1a/0x30
[ 12.614840]
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
[ 12.615029] check_prev_add+0xe1/0xcf0
[ 12.615126] validate_chain+0x4cf/0x740
[ 12.615221] __lock_acquire+0x56a/0xbe0
[ 12.615316] lock_acquire.part.0+0xc7/0x270
[ 12.615414] cpus_read_lock+0x40/0xe0
[ 12.615508] static_key_slow_inc+0x16/0x40
[ 12.615602] rq_qos_add+0x264/0x440
[ 12.615696] wbt_init+0x3b2/0x510
[ 12.615793] blk_register_queue+0x334/0x470
[ 12.615887] __add_disk+0x5fd/0xd50
[ 12.615980] add_disk_fwnode+0x113/0x590
[ 12.616073] nvme_alloc_ns+0x7be/0x1930 [nvme_core]
[ 12.616173] nvme_scan_ns+0x320/0x3b0 [nvme_core]
[ 12.616272] async_run_entry_fn+0x94/0x540
[ 12.616366] process_one_work+0x87a/0x14e0
[ 12.616464] worker_thread+0x5f2/0xfd0
[ 12.616558] kthread+0x3b0/0x770
[ 12.616651] ret_from_fork+0x3ef/0x510
[ 12.616749] ret_from_fork_asm+0x1a/0x30
[ 12.616841]
other info that might help us debug this:
[ 12.617108] Chain exists of:
cpu_hotplug_lock --> fs_reclaim --> &q->q_usage_counter(io)#4
[ 12.617385] Possible unsafe locking scenario:
[ 12.617570]        CPU0                    CPU1
[ 12.617662]        ----                    ----
[ 12.617755]   lock(&q->q_usage_counter(io)#4);
[ 12.617847]                                lock(fs_reclaim);
[ 12.617940]                                lock(&q->q_usage_counter(io)#4);
[ 12.618035]   rlock(cpu_hotplug_lock);
[ 12.618129]
*** DEADLOCK ***
[ 12.618397] 7 locks held by kworker/u129:3/911:
[ 12.618495] #0: ffff8881083ba158
((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0xe31/0x14e0
[ 12.618692] #1: ffffc900061b7d20
((work_completion)(&entry->work)){+.+.}-{0:0}, at:
process_one_work+0x7f9/0x14e0
[ 12.618906] #2: ffff888109c801a8
(&set->update_nr_hwq_lock){.+.+}-{4:4}, at: add_disk_fwnode+0xfd/0x590
[ 12.619132] #3: ffff8881d166dbb8 (&q->sysfs_lock){+.+.}-{4:4}, at:
blk_register_queue+0xdc/0x470
[ 12.619257] #4: ffff8881d166d798 (&q->rq_qos_mutex){+.+.}-{4:4},
at: wbt_init+0x39c/0x510
[ 12.619383] #5: ffff8881d166d570
(&q->q_usage_counter(io)#4){++++}-{0:0}, at:
blk_mq_freeze_queue_nomemsave+0x16/0x30
[ 12.619640] #6: ffff8881d166d5b0
(&q->q_usage_counter(queue)#4){+.+.}-{0:0}, at:
blk_mq_freeze_queue_nomemsave+0x16/0x30
[ 12.619913]
stack backtrace:
[ 12.620171] CPU: 6 UID: 0 PID: 911 Comm: kworker/u129:3 Not tainted
6.17.0-rc2-git-b19a97d57c15+ #158 PREEMPT(lazy)
[ 12.620173] Hardware name: ASRock B650I Lightning WiFi/B650I
Lightning WiFi, BIOS 3.30 06/16/2025
[ 12.620174] Workqueue: async async_run_entry_fn
[ 12.620177] Call Trace:
[ 12.620178] <TASK>
[ 12.620179] dump_stack_lvl+0x84/0xd0
[ 12.620182] print_circular_bug.cold+0x38/0x46
[ 12.620185] check_noncircular+0x14a/0x170
[ 12.620187] check_prev_add+0xe1/0xcf0
[ 12.620189] ? lock_acquire.part.0+0xc7/0x270
[ 12.620191] validate_chain+0x4cf/0x740
[ 12.620193] __lock_acquire+0x56a/0xbe0
[ 12.620196] lock_acquire.part.0+0xc7/0x270
[ 12.620197] ? static_key_slow_inc+0x16/0x40
[ 12.620199] ? rcu_is_watching+0x15/0xe0
[ 12.620202] ? __pfx___might_resched+0x10/0x10
[ 12.620204] ? static_key_slow_inc+0x16/0x40
[ 12.620205] ? lock_acquire+0xf6/0x140
[ 12.620207] cpus_read_lock+0x40/0xe0
[ 12.620209] ? static_key_slow_inc+0x16/0x40
[ 12.620210] static_key_slow_inc+0x16/0x40
[ 12.620212] rq_qos_add+0x264/0x440
[ 12.620213] wbt_init+0x3b2/0x510
[ 12.620215] ? wbt_enable_default+0x174/0x2b0
[ 12.620217] blk_register_queue+0x334/0x470
[ 12.620218] __add_disk+0x5fd/0xd50
[ 12.620220] ? wait_for_completion+0x17f/0x3c0
[ 12.620222] add_disk_fwnode+0x113/0x590
[ 12.620224] nvme_alloc_ns+0x7be/0x1930 [nvme_core]
[ 12.620232] ? __pfx_nvme_alloc_ns+0x10/0x10 [nvme_core]
[ 12.620241] ? __pfx_nvme_find_get_ns+0x10/0x10 [nvme_core]
[ 12.620249] ? __pfx_nvme_ns_info_from_identify+0x10/0x10 [nvme_core]
[ 12.620257] nvme_scan_ns+0x320/0x3b0 [nvme_core]
[ 12.620264] ? __pfx_nvme_scan_ns+0x10/0x10 [nvme_core]
[ 12.620271] ? __lock_release.isra.0+0x1cb/0x340
[ 12.620273] ? lockdep_hardirqs_on+0x8c/0x130
[ 12.620275] ? seqcount_lockdep_reader_access+0xb5/0xc0
[ 12.620277] ? seqcount_lockdep_reader_access+0xb5/0xc0
[ 12.620279] ? ktime_get+0x6a/0x180
[ 12.620281] async_run_entry_fn+0x94/0x540
[ 12.620282] process_one_work+0x87a/0x14e0
[ 12.620285] ? __pfx_process_one_work+0x10/0x10
[ 12.620287] ? local_clock_noinstr+0xf/0x130
[ 12.620289] ? assign_work+0x156/0x390
[ 12.620291] worker_thread+0x5f2/0xfd0
[ 12.620294] ? __pfx_worker_thread+0x10/0x10
[ 12.620295] kthread+0x3b0/0x770
[ 12.620297] ? local_clock_noinstr+0xf/0x130
[ 12.620298] ? __pfx_kthread+0x10/0x10
[ 12.620300] ? rcu_is_watching+0x15/0xe0
[ 12.620301] ? __pfx_kthread+0x10/0x10
[ 12.620303] ret_from_fork+0x3ef/0x510
[ 12.620305] ? __pfx_kthread+0x10/0x10
[ 12.620306] ? __pfx_kthread+0x10/0x10
[ 12.620307] ret_from_fork_asm+0x1a/0x30
[ 12.620310] </TASK>
[ 12.628224] nvme0n1: p1
[ 12.628699] nvme1n1: p1 p2 p3
It looks like enabling WBT by default makes wbt_init() → rq_qos_add()
call static_key_slow_inc(), which takes cpus_read_lock() (i.e.,
cpu_hotplug_lock) while the worker still holds q->q_usage_counter(io)
from blk_mq_freeze_queue_nomemsave(). Together with the existing
cpu_hotplug_lock → fs_reclaim → q->q_usage_counter(io) chain, that
closes the cycle lockdep reports.
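To make the three orderings easier to see, here is a minimal userspace
sketch (my own illustration, not kernel code): three pthread mutexes
stand in for cpu_hotplug_lock, fs_reclaim and q->q_usage_counter(io),
and each thread takes them in the same relative order as one of the
paths in the splat.

/*
 * Userspace analogy of the reported cycle (NOT kernel code): three
 * pthread mutexes stand in for cpu_hotplug_lock, fs_reclaim and
 * q->q_usage_counter(io). Each thread mirrors one path from the splat:
 *
 *   path_cpu_bringup(): hotplug -> reclaim  (#1: kmalloc in a cpuhp callback)
 *   path_reclaim_io():  reclaim -> q_usage  (#2: per the existing chain)
 *   path_wbt_init():    q_usage -> hotplug  (#0: static_key_slow_inc under freeze)
 *
 * Any one ordering is fine; all three together form
 * hotplug -> reclaim -> q_usage -> hotplug, which is what lockdep flags
 * (and which can deadlock if the paths ever interleave).
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t hotplug = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t reclaim = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t q_usage = PTHREAD_MUTEX_INITIALIZER;

static void *path_cpu_bringup(void *unused)
{
	pthread_mutex_lock(&hotplug);   /* cpu_up() holds cpu_hotplug_lock */
	pthread_mutex_lock(&reclaim);   /* GFP_KERNEL alloc in amd_pmu_cpu_prepare() */
	pthread_mutex_unlock(&reclaim);
	pthread_mutex_unlock(&hotplug);
	return NULL;
}

static void *path_reclaim_io(void *unused)
{
	pthread_mutex_lock(&reclaim);   /* fs_reclaim ... */
	pthread_mutex_lock(&q_usage);   /* ... -> q_usage_counter(io) per the chain above */
	pthread_mutex_unlock(&q_usage);
	pthread_mutex_unlock(&reclaim);
	return NULL;
}

static void *path_wbt_init(void *unused)
{
	pthread_mutex_lock(&q_usage);   /* blk_mq_freeze_queue_nomemsave() */
	pthread_mutex_lock(&hotplug);   /* static_key_slow_inc() -> cpus_read_lock() */
	pthread_mutex_unlock(&hotplug);
	pthread_mutex_unlock(&q_usage);
	return NULL;
}

int main(void)
{
	pthread_t t[3];

	pthread_create(&t[0], NULL, path_cpu_bringup, NULL);
	pthread_create(&t[1], NULL, path_reclaim_io, NULL);
	pthread_create(&t[2], NULL, path_wbt_init, NULL);
	for (int i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	puts("lock orders exercised; the cycle exists even if this run did not hang");
	return 0;
}

Each ordering is harmless on its own; only the combination of the three
forms the cycle, which matches why the warning only shows up once wbt
is enabled by default during disk registration.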
Environment / Repro:
Hardware: ASRock B650I Lightning WiFi (NVMe), link to probe below
Kernel: 6.17.0-rc2-git-b19a97d57c15 (self-built)
Repro: occurs deterministically on every boot during NVMe namespace scan
First bad commit: 8f5845e0743bf3512b71b3cb8afe06c192d6acc4
(“block: restore default wbt enablement”) — found by git bisect
Fix/workaround: revert 8f5845e0743b
Attachments:
Full dmesg (with the complete lockdep trace)
.config
Hardware probe: https://linux-hardware.org/?probe=9a6dd1ef4d
Happy to test any proposed patches or additional instrumentation.
Thanks for looking into it.
--
Best Regards,
Mike Gavrilov.
[-- Attachment #2: dmesg-6.17.0-rc2-git-b19a97d57c15.zip --]
[-- Type: application/zip, Size: 45945 bytes --]
[-- Attachment #3: .config.zip --]
[-- Type: application/zip, Size: 70241 bytes --]
* Re: [REGRESSION] 6.17-rc2: lockdep circular dependency at boot introduced by 8f5845e0743b (“block: restore default wbt enablement”)
2025-08-21 6:56 [REGRESSION] 6.17-rc2: lockdep circular dependency at boot introduced by 8f5845e0743b (“block: restore default wbt enablement”) Mikhail Gavrilov
@ 2025-08-21 12:11 ` Nilay Shroff
From: Nilay Shroff @ 2025-08-21 12:11 UTC (permalink / raw)
To: Mikhail Gavrilov, sunjunchao, axboe, yukuai3, Ming Lei,
linux-block, Linux List Kernel Mailing,
Linux regressions mailing list
On 8/21/25 12:26 PM, Mikhail Gavrilov wrote:
> Hi,
>
> After commit 8f5845e0743b (“block: restore default wbt enablement”)
> I started seeing a lockdep warning about a circular locking dependency
> on every boot.
>
> Bisect
> git bisect identifies 8f5845e0743bf3512b71b3cb8afe06c192d6acc4 as the
> first bad commit.
> Reverting this commit on top of 6.17.0-rc2-git-b19a97d57c15 makes the
> warning disappear completely.
>
> The warning looks like this:
> [ 12.595070] nvme nvme0: 32/0/0 default/read/poll queues
> [ 12.595566] nvme nvme1: 32/0/0 default/read/poll queues
>
> [ 12.610697] ======================================================
> [ 12.610705] WARNING: possible circular locking dependency detected
> [ 12.610714] 6.17.0-rc2-git-b19a97d57c15+ #158 Not tainted
> [ 12.610726] ------------------------------------------------------
> [ 12.610734] kworker/u129:3/911 is trying to acquire lock:
> [ 12.610743] ffffffff899ab700 (cpu_hotplug_lock){++++}-{0:0}, at:
> static_key_slow_inc+0x16/0x40
> [ 12.610760]
> but task is already holding lock:
> [ 12.610769] ffff8881d166d570
> (&q->q_usage_counter(io)#4){++++}-{0:0}, at:
> blk_mq_freeze_queue_nomemsave+0x16/0x30
> [ 12.610787]
> which lock already depends on the new lock.
>
> [ 12.610798]
> the existing dependency chain (in reverse order) is:
> [ 12.610971]
> -> #2 (&q->q_usage_counter(io)#4){++++}-{0:0}:
> [ 12.611246] __lock_acquire+0x56a/0xbe0
> [ 12.611381] lock_acquire.part.0+0xc7/0x270
> [ 12.611518] blk_alloc_queue+0x5cd/0x720
> [ 12.611649] blk_mq_alloc_queue+0x143/0x250
> [ 12.611780] __blk_mq_alloc_disk+0x18/0xd0
> [ 12.611906] nvme_alloc_ns+0x240/0x1930 [nvme_core]
> [ 12.612042] nvme_scan_ns+0x320/0x3b0 [nvme_core]
> [ 12.612170] async_run_entry_fn+0x94/0x540
> [ 12.612289] process_one_work+0x87a/0x14e0
> [ 12.612406] worker_thread+0x5f2/0xfd0
> [ 12.612527] kthread+0x3b0/0x770
> [ 12.612641] ret_from_fork+0x3ef/0x510
> [ 12.612760] ret_from_fork_asm+0x1a/0x30
> [ 12.612875]
> -> #1 (fs_reclaim){+.+.}-{0:0}:
> [ 12.613102] __lock_acquire+0x56a/0xbe0
> [ 12.613215] lock_acquire.part.0+0xc7/0x270
> [ 12.613327] fs_reclaim_acquire+0xd9/0x130
> [ 12.613444] __kmalloc_cache_node_noprof+0x60/0x4e0
> [ 12.613560] amd_pmu_cpu_prepare+0x123/0x670
> [ 12.613674] cpuhp_invoke_callback+0x2c8/0x9c0
> [ 12.613791] __cpuhp_invoke_callback_range+0xbd/0x1f0
> [ 12.613904] _cpu_up+0x2f8/0x6c0
> [ 12.614015] cpu_up+0x11e/0x1c0
> [ 12.614124] cpuhp_bringup_mask+0xea/0x130
> [ 12.614231] bringup_nonboot_cpus+0xa9/0x170
> [ 12.614335] smp_init+0x2b/0xf0
> [ 12.614443] kernel_init_freeable+0x23f/0x2e0
> [ 12.614545] kernel_init+0x1c/0x150
> [ 12.614643] ret_from_fork+0x3ef/0x510
> [ 12.614744] ret_from_fork_asm+0x1a/0x30
> [ 12.614840]
> -> #0 (cpu_hotplug_lock){++++}-{0:0}:
> [ 12.615029] check_prev_add+0xe1/0xcf0
> [ 12.615126] validate_chain+0x4cf/0x740
> [ 12.615221] __lock_acquire+0x56a/0xbe0
> [ 12.615316] lock_acquire.part.0+0xc7/0x270
> [ 12.615414] cpus_read_lock+0x40/0xe0
> [ 12.615508] static_key_slow_inc+0x16/0x40
> [ 12.615602] rq_qos_add+0x264/0x440
> [ 12.615696] wbt_init+0x3b2/0x510
> [ 12.615793] blk_register_queue+0x334/0x470
> [ 12.615887] __add_disk+0x5fd/0xd50
> [ 12.615980] add_disk_fwnode+0x113/0x590
> [ 12.616073] nvme_alloc_ns+0x7be/0x1930 [nvme_core]
> [ 12.616173] nvme_scan_ns+0x320/0x3b0 [nvme_core]
> [ 12.616272] async_run_entry_fn+0x94/0x540
> [ 12.616366] process_one_work+0x87a/0x14e0
> [ 12.616464] worker_thread+0x5f2/0xfd0
> [ 12.616558] kthread+0x3b0/0x770
> [ 12.616651] ret_from_fork+0x3ef/0x510
> [ 12.616749] ret_from_fork_asm+0x1a/0x30
> [ 12.616841]
> other info that might help us debug this:
>
> [ 12.617108] Chain exists of:
> cpu_hotplug_lock --> fs_reclaim --> &q->q_usage_counter(io)#4
>
> [ 12.617385] Possible unsafe locking scenario:
>
> [ 12.617570]        CPU0                    CPU1
> [ 12.617662]        ----                    ----
> [ 12.617755]   lock(&q->q_usage_counter(io)#4);
> [ 12.617847]                                lock(fs_reclaim);
> [ 12.617940]                                lock(&q->q_usage_counter(io)#4);
> [ 12.618035]   rlock(cpu_hotplug_lock);
> [ 12.618129]
> *** DEADLOCK ***
>
This one is already being addressed here:
https://lore.kernel.org/all/20250814082612.500845-1-nilay@linux.ibm.com/
Thanks,
--Nilay