* Blockdev 6.13-rc lockdep splat regressions
@ 2025-01-10 10:12 Thomas Hellström
2025-01-10 10:14 ` Christoph Hellwig
2025-01-10 12:13 ` Ming Lei
0 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 10:12 UTC (permalink / raw)
To: Ming Lei, Jens Axboe; +Cc: Christoph Hellwig, linux-block
Ming, others,
On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
introduced by the commit
f1be1788a32e ("block: model freeze & enter queue as lock for supporting
lockdep")
The first one happens when swap-outs start to a SCSI disk. A simple
reproducer is to start a couple of parallel "gitk" instances on the
kernel repo and watch them exhaust available memory.
The second is easily triggered by entering a debugfs trace directory,
apparently triggering an automount:
cd /sys/kernel/debug/tracing/events
Are you aware of these?
Thanks,
Thomas
#1:
[ 399.006581] ======================================================
[ 399.006756] WARNING: possible circular locking dependency detected
[ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
[ 399.006776] ------------------------------------------------------
[ 399.006801] kswapd0/116 is trying to acquire lock:
[ 399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-{0:0},
at: __submit_bio+0xf0/0x1c0
[ 399.006845]
but task is already holding lock:
[ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa20
[ 399.006874]
which lock already depends on the new lock.
[ 399.006890]
the existing dependency chain (in reverse order) is:
[ 399.006905]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 399.006919] fs_reclaim_acquire+0x9d/0xd0
[ 399.006931] __kmalloc_node_noprof+0xa5/0x460
[ 399.006943] sbitmap_init_node+0x84/0x1e0
[ 399.006953] scsi_realloc_sdev_budget_map+0xc8/0x1a0
[ 399.006965] scsi_add_lun+0x419/0x710
[ 399.006976] scsi_probe_and_add_lun+0x12d/0x450
[ 399.006988] __scsi_scan_target+0x112/0x230
[ 399.006999] scsi_scan_channel+0x59/0x90
[ 399.007009] scsi_scan_host_selected+0xe5/0x120
[ 399.007021] do_scan_async+0x1b/0x160
[ 399.007031] async_run_entry_fn+0x31/0x130
[ 399.007043] process_one_work+0x21a/0x590
[ 399.007054] worker_thread+0x1c3/0x3b0
[ 399.007065] kthread+0xd2/0x100
[ 399.007074] ret_from_fork+0x31/0x50
[ 399.007085] ret_from_fork_asm+0x1a/0x30
[ 399.007096]
-> #0 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 399.007111] __lock_acquire+0x13ac/0x2170
[ 399.007123] lock_acquire+0xd0/0x2f0
[ 399.007134] blk_mq_submit_bio+0x90b/0xb00
[ 399.007145] __submit_bio+0xf0/0x1c0
[ 399.007155] submit_bio_noacct_nocheck+0x324/0x420
[ 399.007167] swap_writepage+0x14a/0x2c0
[ 399.007178] pageout+0x129/0x2d0
[ 399.007608] shrink_folio_list+0x5a0/0xd80
[ 399.008045] evict_folios+0x27a/0x790
[ 399.008486] try_to_shrink_lruvec+0x225/0x2b0
[ 399.008926] shrink_one+0x102/0x1f0
[ 399.009360] shrink_node+0xa7c/0x1130
[ 399.009821] balance_pgdat+0x560/0xa20
[ 399.010254] kswapd+0x20a/0x440
[ 399.010698] kthread+0xd2/0x100
[ 399.011141] ret_from_fork+0x31/0x50
[ 399.011584] ret_from_fork_asm+0x1a/0x30
[ 399.012024]
other info that might help us debug this:
[ 399.013283] Possible unsafe locking scenario:
[ 399.014160] CPU0 CPU1
[ 399.014584] ---- ----
[ 399.015010] lock(fs_reclaim);
[ 399.015439] lock(&q->q_usage_counter(io));
[ 399.015867] lock(fs_reclaim);
[ 399.016208] rlock(&q->q_usage_counter(io));
[ 399.016538]
*** DEADLOCK ***
[ 399.017539] 1 lock held by kswapd0/116:
[ 399.017887] #0: ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa20
[ 399.018218]
stack backtrace:
[ 399.018887] CPU: 11 UID: 0 PID: 116 Comm: kswapd0 Tainted: G U N 6.12.0-rc4+ #1
[ 399.019217] Tainted: [U]=USER, [N]=TEST
[ 399.019543] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[ 399.019911] Call Trace:
[ 399.020235] <TASK>
[ 399.020556] dump_stack_lvl+0x6e/0xa0
[ 399.020890] print_circular_bug.cold+0x178/0x1be
[ 399.021207] check_noncircular+0x148/0x160
[ 399.021523] __lock_acquire+0x13ac/0x2170
[ 399.021852] lock_acquire+0xd0/0x2f0
[ 399.022167] ? __submit_bio+0xf0/0x1c0
[ 399.022489] ? blk_mq_submit_bio+0x8e0/0xb00
[ 399.022830] ? lock_release+0xd3/0x2b0
[ 399.023143] blk_mq_submit_bio+0x90b/0xb00
[ 399.023460] ? __submit_bio+0xf0/0x1c0
[ 399.023785] ? lock_acquire+0xd0/0x2f0
[ 399.024096] __submit_bio+0xf0/0x1c0
[ 399.024404] submit_bio_noacct_nocheck+0x324/0x420
[ 399.024713] swap_writepage+0x14a/0x2c0
[ 399.025037] pageout+0x129/0x2d0
[ 399.025348] shrink_folio_list+0x5a0/0xd80
[ 399.025668] ? evict_folios+0x25a/0x790
[ 399.026007] evict_folios+0x27a/0x790
[ 399.026325] try_to_shrink_lruvec+0x225/0x2b0
[ 399.026635] shrink_one+0x102/0x1f0
[ 399.026957] ? shrink_node+0xa63/0x1130
[ 399.027264] shrink_node+0xa7c/0x1130
[ 399.027570] ? shrink_node+0x908/0x1130
[ 399.027888] balance_pgdat+0x560/0xa20
[ 399.028197] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 399.028510] ? finish_task_switch.isra.0+0xc4/0x2a0
[ 399.028829] kswapd+0x20a/0x440
[ 399.029136] ? __pfx_autoremove_wake_function+0x10/0x10
[ 399.029483] ? __pfx_kswapd+0x10/0x10
[ 399.029912] kthread+0xd2/0x100
[ 399.030306] ? __pfx_kthread+0x10/0x10
[ 399.030699] ret_from_fork+0x31/0x50
[ 399.031105] ? __pfx_kthread+0x10/0x10
[ 399.031520] ret_from_fork_asm+0x1a/0x30
[ 399.031934] </TASK>
#2:
[ 81.960829] ======================================================
[ 81.961010] WARNING: possible circular locking dependency detected
[ 81.961048] 6.12.0-rc4+ #3 Tainted: G U
[ 81.961082] ------------------------------------------------------
[ 81.961117] bash/2744 is trying to acquire lock:
[ 81.961147] ffffffff8b6754d0 (namespace_sem){++++}-{4:4}, at:
finish_automount+0x77/0x3a0
[ 81.961215]
but task is already holding lock:
[ 81.961249] ffff8d7a8051ce50 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: finish_automount+0x6b/0x3a0
[ 81.961316]
which lock already depends on the new lock.
[ 81.961361]
the existing dependency chain (in reverse order) is:
[ 81.961403]
-> #6 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[ 81.961452] down_write+0x2e/0xb0
[ 81.961486] start_creating.part.0+0x5f/0x120
[ 81.961523] debugfs_create_dir+0x36/0x190
[ 81.961557] blk_register_queue+0xba/0x220
[ 81.961594] add_disk_fwnode+0x235/0x430
[ 81.961626] nvme_alloc_ns+0x7eb/0xb50 [nvme_core]
[ 81.961696] nvme_scan_ns+0x251/0x330 [nvme_core]
[ 81.962053] async_run_entry_fn+0x31/0x130
[ 81.962088] process_one_work+0x21a/0x590
[ 81.962122] worker_thread+0x1c3/0x3b0
[ 81.962153] kthread+0xd2/0x100
[ 81.962179] ret_from_fork+0x31/0x50
[ 81.962211] ret_from_fork_asm+0x1a/0x30
[ 81.962243]
-> #5 (&q->debugfs_mutex){+.+.}-{4:4}:
[ 81.962287] __mutex_lock+0xad/0xb80
[ 81.962319] blk_mq_init_sched+0x181/0x260
[ 81.962350] elevator_init_mq+0xb0/0x100
[ 81.962381] add_disk_fwnode+0x50/0x430
[ 81.962412] sd_probe+0x335/0x530
[ 81.962441] really_probe+0xdb/0x340
[ 81.962474] __driver_probe_device+0x78/0x110
[ 81.962510] driver_probe_device+0x1f/0xa0
[ 81.962545] __device_attach_driver+0x89/0x110
[ 81.962581] bus_for_each_drv+0x98/0xf0
[ 81.962613] __device_attach_async_helper+0xa5/0xf0
[ 81.962651] async_run_entry_fn+0x31/0x130
[ 81.962686] process_one_work+0x21a/0x590
[ 81.962718] worker_thread+0x1c3/0x3b0
[ 81.962750] kthread+0xd2/0x100
[ 81.962775] ret_from_fork+0x31/0x50
[ 81.962806] ret_from_fork_asm+0x1a/0x30
[ 81.962838]
-> #4 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
[ 81.962887] blk_queue_enter+0x1bc/0x1e0
[ 81.962918] blk_mq_alloc_request+0x144/0x2b0
[ 81.962951] scsi_execute_cmd+0x78/0x490
[ 81.962985] read_capacity_16+0x111/0x410
[ 81.963017] sd_revalidate_disk.isra.0+0x545/0x2eb0
[ 81.963053] sd_probe+0x2eb/0x530
[ 81.963081] really_probe+0xdb/0x340
[ 81.963112] __driver_probe_device+0x78/0x110
[ 81.963148] driver_probe_device+0x1f/0xa0
[ 81.963182] __device_attach_driver+0x89/0x110
[ 81.963218] bus_for_each_drv+0x98/0xf0
[ 81.963250] __device_attach_async_helper+0xa5/0xf0
[ 81.964380] async_run_entry_fn+0x31/0x130
[ 81.965502] process_one_work+0x21a/0x590
[ 81.965868] worker_thread+0x1c3/0x3b0
[ 81.966198] kthread+0xd2/0x100
[ 81.966528] ret_from_fork+0x31/0x50
[ 81.966855] ret_from_fork_asm+0x1a/0x30
[ 81.967179]
-> #3 (&q->limits_lock){+.+.}-{4:4}:
[ 81.967815] __mutex_lock+0xad/0xb80
[ 81.968133] nvme_update_ns_info_block+0x128/0x870 [nvme_core]
[ 81.968456] nvme_update_ns_info+0x41/0x220 [nvme_core]
[ 81.968774] nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
[ 81.969090] nvme_scan_ns+0x251/0x330 [nvme_core]
[ 81.969401] async_run_entry_fn+0x31/0x130
[ 81.969703] process_one_work+0x21a/0x590
[ 81.970004] worker_thread+0x1c3/0x3b0
[ 81.970302] kthread+0xd2/0x100
[ 81.970603] ret_from_fork+0x31/0x50
[ 81.970901] ret_from_fork_asm+0x1a/0x30
[ 81.971195]
-> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 81.971776] blk_mq_submit_bio+0x90b/0xb00
[ 81.972071] __submit_bio+0xf0/0x1c0
[ 81.972364] submit_bio_noacct_nocheck+0x324/0x420
[ 81.972660] btrfs_submit_chunk+0x1a7/0x660
[ 81.972956] btrfs_submit_bbio+0x1a/0x30
[ 81.973250] read_extent_buffer_pages+0x15e/0x210
[ 81.973546] btrfs_read_extent_buffer+0x93/0x180
[ 81.973841] read_tree_block+0x30/0x60
[ 81.974137] read_block_for_search+0x219/0x320
[ 81.974432] btrfs_search_slot+0x335/0xd30
[ 81.974729] btrfs_init_root_free_objectid+0x90/0x130
[ 81.975027] open_ctree+0xa35/0x13eb
[ 81.975326] btrfs_get_tree.cold+0x6b/0xfd
[ 81.975627] vfs_get_tree+0x29/0xe0
[ 81.975926] fc_mount+0x12/0x40
[ 81.976223] btrfs_get_tree+0x2c1/0x6b0
[ 81.976520] vfs_get_tree+0x29/0xe0
[ 81.976816] vfs_cmd_create+0x59/0xe0
[ 81.977112] __do_sys_fsconfig+0x4f3/0x6c0
[ 81.977408] do_syscall_64+0x95/0x180
[ 81.977705] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.978005]
-> #1 (btrfs-root-01){++++}-{4:4}:
[ 81.978595] down_read_nested+0x34/0x150
[ 81.978895] btrfs_tree_read_lock_nested+0x25/0xd0
[ 81.979195] btrfs_read_lock_root_node+0x44/0xe0
[ 81.979494] btrfs_search_slot+0x143/0xd30
[ 81.979793] btrfs_search_backwards+0x2e/0x90
[ 81.980108] btrfs_get_subvol_name_from_objectid+0xd8/0x3c0
[ 81.980409] btrfs_show_options+0x294/0x780
[ 81.980718] show_mountinfo+0x207/0x2a0
[ 81.981025] seq_read_iter+0x2bc/0x480
[ 81.981327] vfs_read+0x294/0x370
[ 81.981628] ksys_read+0x73/0xf0
[ 81.981925] do_syscall_64+0x95/0x180
[ 81.982222] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.982523]
-> #0 (namespace_sem){++++}-{4:4}:
[ 81.983116] __lock_acquire+0x13ac/0x2170
[ 81.983415] lock_acquire+0xd0/0x2f0
[ 81.983714] down_write+0x2e/0xb0
[ 81.984014] finish_automount+0x77/0x3a0
[ 81.984313] __traverse_mounts+0x9d/0x210
[ 81.984612] step_into+0x349/0x770
[ 81.984909] link_path_walk.part.0.constprop.0+0x21e/0x390
[ 81.985211] path_lookupat+0x3e/0x1a0
[ 81.985511] filename_lookup+0xde/0x1d0
[ 81.985811] vfs_statx+0x8d/0x100
[ 81.986108] vfs_fstatat+0x63/0xc0
[ 81.986405] __do_sys_newfstatat+0x3c/0x80
[ 81.986704] do_syscall_64+0x95/0x180
[ 81.987005] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.987307]
other info that might help us debug this:
[ 81.988204] Chain exists of:
namespace_sem --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3
[ 81.989113] Possible unsafe locking scenario:
[ 81.989723] CPU0 CPU1
[ 81.990028] ---- ----
[ 81.990331] lock(&sb->s_type->i_mutex_key#3);
[ 81.990637] lock(&q->debugfs_mutex);
[ 81.990947] lock(&sb->s_type->i_mutex_key#3);
[ 81.991257] lock(namespace_sem);
[ 81.991565]
*** DEADLOCK ***
[ 81.992465] 1 lock held by bash/2744:
[ 81.992766] #0: ffff8d7a8051ce50 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: finish_automount+0x6b/0x3a0
[ 81.993084]
stack backtrace:
[ 81.993704] CPU: 2 UID: 0 PID: 2744 Comm: bash Tainted: G U 6.12.0-rc4+ #3
[ 81.994025] Tainted: [U]=USER
[ 81.994343] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[ 81.994673] Call Trace:
[ 81.995002] <TASK>
[ 81.995328] dump_stack_lvl+0x6e/0xa0
[ 81.995657] print_circular_bug.cold+0x178/0x1be
[ 81.995987] check_noncircular+0x148/0x160
[ 81.996318] __lock_acquire+0x13ac/0x2170
[ 81.996649] lock_acquire+0xd0/0x2f0
[ 81.996978] ? finish_automount+0x77/0x3a0
[ 81.997328] ? vfs_kern_mount.part.0+0x50/0xb0
[ 81.997660] ? kfree+0xd8/0x370
[ 81.997991] down_write+0x2e/0xb0
[ 81.998318] ? finish_automount+0x77/0x3a0
[ 81.998646] finish_automount+0x77/0x3a0
[ 81.998975] __traverse_mounts+0x9d/0x210
[ 81.999303] step_into+0x349/0x770
[ 81.999629] link_path_walk.part.0.constprop.0+0x21e/0x390
[ 81.999958] path_lookupat+0x3e/0x1a0
[ 82.000287] filename_lookup+0xde/0x1d0
[ 82.000618] vfs_statx+0x8d/0x100
[ 82.000944] ? strncpy_from_user+0x22/0xf0
[ 82.001271] vfs_fstatat+0x63/0xc0
[ 82.001614] __do_sys_newfstatat+0x3c/0x80
[ 82.001943] do_syscall_64+0x95/0x180
[ 82.002270] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 82.002601] ? syscall_exit_to_user_mode+0x97/0x290
[ 82.002933] ? do_syscall_64+0xa1/0x180
[ 82.003262] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 82.003592] ? syscall_exit_to_user_mode+0x97/0x290
[ 82.003923] ? clear_bhb_loop+0x45/0xa0
[ 82.004251] ? clear_bhb_loop+0x45/0xa0
[ 82.004576] ? clear_bhb_loop+0x45/0xa0
[ 82.004899] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 82.005224] RIP: 0033:0x7f82fa5ef73e
[ 82.005558] Code: 0f 1f 40 00 48 8b 15 d1 66 10 00 f7 d8 64 89 02 b8
ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00 00 0f
05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b 15 99
[ 82.005925] RSP: 002b:00007ffd2e30cd98 EFLAGS: 00000246 ORIG_RAX:
0000000000000106
[ 82.006301] RAX: ffffffffffffffda RBX: 000055dfa1952b30 RCX:
00007f82fa5ef73e
[ 82.006678] RDX: 00007ffd2e30cdc0 RSI: 000055dfa1952b10 RDI:
00000000ffffff9c
[ 82.007057] RBP: 00007ffd2e30ce90 R08: 000055dfa1952b30 R09:
00007f82fa6f6b20
[ 82.007437] R10: 0000000000000000 R11: 0000000000000246 R12:
000055dfa1952b11
[ 82.007822] R13: 000055dfa1952b30 R14: 000055dfa1952b11 R15:
0000000000000003
[ 82.008209] </TASK>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
@ 2025-01-10 10:14 ` Christoph Hellwig
2025-01-10 10:21 ` Thomas Hellström
2025-01-10 12:13 ` Ming Lei
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-01-10 10:14 UTC (permalink / raw)
To: Thomas Hellström
Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block
On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> Ming, Others
>
> On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> introduced by the commit
>
> f1be1788a32e ("block: model freeze & enter queue as lock for supporting
> lockdep")
>
> The first one happens when swap-outs start to a scsi disc,
> Simple reproducer is to start a couple of parallel "gitk" on the kernel
> repo and watch them exhaust available memory.
>
> the second is easily triggered by entering a debugfs trace directory,
> apparently triggering automount:
> cd /sys/kernel/debug/tracing/events
>
> Are you aware of these?
Yes, this series fixes it:
https://lore.kernel.org/linux-block/20250110054726.1499538-1-hch@lst.de/
should be ready now that the nitpicking has settled down.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-10 10:14 ` Christoph Hellwig
@ 2025-01-10 10:21 ` Thomas Hellström
0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 10:21 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Ming Lei, Jens Axboe, linux-block
On Fri, 2025-01-10 at 11:14 +0100, Christoph Hellwig wrote:
> On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > Ming, Others
> >
> > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > introduced by the commit
> >
> > f1be1788a32e ("block: model freeze & enter queue as lock for
> > supporting
> > lockdep")
> >
> > The first one happens when swap-outs start to a scsi disc,
> > Simple reproducer is to start a couple of parallel "gitk" on the
> > kernel
> > repo and watch them exhaust available memory.
> >
> > the second is easily triggered by entering a debugfs trace
> > directory,
> > apparently triggering automount:
> > cd /sys/kernel/debug/tracing/events
> >
> > Are you aware of these?
>
> Yes, this series fixes it:
>
> https://lore.kernel.org/linux-block/20250110054726.1499538-1-hch@lst.de/
>
> should be ready now that the nitpicking has settled down.
>
Great. Thanks.
/Thomas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
2025-01-10 10:14 ` Christoph Hellwig
@ 2025-01-10 12:13 ` Ming Lei
2025-01-10 14:36 ` Thomas Hellström
1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-10 12:13 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> Ming, Others
>
> On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> introduced by the commit
>
> f1be1788a32e ("block: model freeze & enter queue as lock for supporting
> lockdep")
The freeze lock connects all kinds of sub-system locks, which is why
we see lots of warnings after the commit was merged.
...
> #1
> [ 399.006581] ======================================================
> [ 399.006756] WARNING: possible circular locking dependency detected
> [ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
> [ 399.006776] ------------------------------------------------------
> [ 399.006801] kswapd0/116 is trying to acquire lock:
> [ 399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-{0:0},
> at: __submit_bio+0xf0/0x1c0
> [ 399.006845]
> but task is already holding lock:
> [ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> balance_pgdat+0xe2/0xa20
> [ 399.006874]
The above one is solved in the for-6.14/block branch of the block tree:
block: track queue dying state automatically for modeling queue freeze lockdep
>
> #2:
> [ 81.960829] ======================================================
> [ 81.961010] WARNING: possible circular locking dependency detected
> [ 81.961048] 6.12.0-rc4+ #3 Tainted: G U
...
> -> #3 (&q->limits_lock){+.+.}-{4:4}:
> [ 81.967815] __mutex_lock+0xad/0xb80
> [ 81.968133] nvme_update_ns_info_block+0x128/0x870 [nvme_core]
> [ 81.968456] nvme_update_ns_info+0x41/0x220 [nvme_core]
> [ 81.968774] nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
> [ 81.969090] nvme_scan_ns+0x251/0x330 [nvme_core]
> [ 81.969401] async_run_entry_fn+0x31/0x130
> [ 81.969703] process_one_work+0x21a/0x590
> [ 81.970004] worker_thread+0x1c3/0x3b0
> [ 81.970302] kthread+0xd2/0x100
> [ 81.970603] ret_from_fork+0x31/0x50
> [ 81.970901] ret_from_fork_asm+0x1a/0x30
> [ 81.971195]
> -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
The above dependency is killed by Christoph's patch.
Thanks,
Ming
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-10 12:13 ` Ming Lei
@ 2025-01-10 14:36 ` Thomas Hellström
2025-01-11 3:05 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 14:36 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > Ming, Others
> >
> > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > introduced by the commit
> >
> > f1be1788a32e ("block: model freeze & enter queue as lock for
> > supporting
> > lockdep")
>
> The freeze lock connects all kinds of sub-system locks, that is why
> we see lots of warnings after the commit is merged.
>
> ...
>
> > #1
> > [ 399.006581]
> > ======================================================
> > [ 399.006756] WARNING: possible circular locking dependency
> > detected
> > [ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
> > [ 399.006776] ----------------------------------------------------
> > --
> > [ 399.006801] kswapd0/116 is trying to acquire lock:
> > [ 399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-
> > {0:0},
> > at: __submit_bio+0xf0/0x1c0
> > [ 399.006845]
> > but task is already holding lock:
> > [ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > balance_pgdat+0xe2/0xa20
> > [ 399.006874]
>
> The above one is solved in for-6.14/block of block tree:
>
> block: track queue dying state automatically for modeling
> queue freeze lockdep
Hmm. I applied this series:
https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both
on top of -rc6, but it didn't resolve that splat. Am I using the
correct patches?
Perhaps it would be a good idea to reclaim-prime those lockdep maps
taken during reclaim, so that the splats happen earlier.
Thanks,
Thomas
>
> >
> > #2:
> > [ 81.960829]
> > ======================================================
> > [ 81.961010] WARNING: possible circular locking dependency
> > detected
> > [ 81.961048] 6.12.0-rc4+ #3 Tainted: G U
>
> ...
>
> > -> #3 (&q->limits_lock){+.+.}-{4:4}:
> > [ 81.967815] __mutex_lock+0xad/0xb80
> > [ 81.968133] nvme_update_ns_info_block+0x128/0x870
> > [nvme_core]
> > [ 81.968456] nvme_update_ns_info+0x41/0x220 [nvme_core]
> > [ 81.968774] nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
> > [ 81.969090] nvme_scan_ns+0x251/0x330 [nvme_core]
> > [ 81.969401] async_run_entry_fn+0x31/0x130
> > [ 81.969703] process_one_work+0x21a/0x590
> > [ 81.970004] worker_thread+0x1c3/0x3b0
> > [ 81.970302] kthread+0xd2/0x100
> > [ 81.970603] ret_from_fork+0x31/0x50
> > [ 81.970901] ret_from_fork_asm+0x1a/0x30
> > [ 81.971195]
> > -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
>
> The above dependency is killed by Christoph's patch.
>
>
> Thanks,
> Ming
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-10 14:36 ` Thomas Hellström
@ 2025-01-11 3:05 ` Ming Lei
2025-01-12 11:33 ` Thomas Hellström
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-11 3:05 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > Ming, Others
> > >
> > > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > > introduced by the commit
> > >
> > > f1be1788a32e ("block: model freeze & enter queue as lock for
> > > supporting
> > > lockdep")
> >
> > The freeze lock connects all kinds of sub-system locks, that is why
> > we see lots of warnings after the commit is merged.
> >
> > ...
> >
> > > #1
> > > [ 399.006581]
> > > ======================================================
> > > [ 399.006756] WARNING: possible circular locking dependency
> > > detected
> > > [ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
> > > [ 399.006776] ----------------------------------------------------
> > > --
> > > [ 399.006801] kswapd0/116 is trying to acquire lock:
> > > [ 399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-
> > > {0:0},
> > > at: __submit_bio+0xf0/0x1c0
> > > [ 399.006845]
> > > but task is already holding lock:
> > > [ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > > balance_pgdat+0xe2/0xa20
> > > [ 399.006874]
> >
> > The above one is solved in for-6.14/block of block tree:
> >
> > block: track queue dying state automatically for modeling
> > queue freeze lockdep
>
> Hmm. I applied this series:
>
> https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both
>
> on top of -rc6, but it didn't resolve that splat. Am I using the
> correct patches?
>
> Perhaps it might be a good idea to reclaim-prime those lockdep maps
> taken during reclaim to have the splats happen earlier.
for-6.14/block does kill the dependency between fs_reclaim and
q->q_usage_counter(io) in scsi_add_lun() when the SCSI disk isn't
added yet.
Maybe it is another warning; care to post the warning log here?
Thanks,
Ming
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-11 3:05 ` Ming Lei
@ 2025-01-12 11:33 ` Thomas Hellström
2025-01-12 15:50 ` Ming Lei
2025-01-13 9:28 ` Ming Lei
0 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-12 11:33 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > Ming, Others
> > > >
> > > > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > > > introduced by the commit
> > > >
> > > > f1be1788a32e ("block: model freeze & enter queue as lock for
> > > > supporting
> > > > lockdep")
> > >
> > > The freeze lock connects all kinds of sub-system locks, that is
> > > why
> > > we see lots of warnings after the commit is merged.
> > >
> > > ...
> > >
> > > > #1
> > > > [ 399.006581]
> > > > ======================================================
> > > > [ 399.006756] WARNING: possible circular locking dependency
> > > > detected
> > > > [ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
> > > > [ 399.006776] ------------------------------------------------
> > > > ----
> > > > --
> > > > [ 399.006801] kswapd0/116 is trying to acquire lock:
> > > > [ 399.006810] ffff9a67a1284a28 (&q-
> > > > >q_usage_counter(io)){++++}-
> > > > {0:0},
> > > > at: __submit_bio+0xf0/0x1c0
> > > > [ 399.006845]
> > > > but task is already holding lock:
> > > > [ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > > > balance_pgdat+0xe2/0xa20
> > > > [ 399.006874]
> > >
> > > The above one is solved in for-6.14/block of block tree:
> > >
> > > block: track queue dying state automatically for
> > > modeling
> > > queue freeze lockdep
> >
> > Hmm. I applied this series:
> >
> > https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both
> >
> > on top of -rc6, but it didn't resolve that splat. Am I using the
> > correct patches?
> >
> > Perhaps it might be a good idea to reclaim-prime those lockdep maps
> > taken during reclaim to have the splats happen earlier.
>
> for-6.14/block does kill the dependency between fs_reclaim and
> q->q_usage_counter(io) in scsi_add_lun() when scsi disk isn't
> added yet.
>
> Maybe it is another warning, care to post the warning log here?
Ah, you're right, it's a different warning this time. Posted the
warning below. (Note: this is also with Christoph's series applied on
top.)
May I also humbly suggest the following lockdep priming, to be able to
catch the reclaim lockdep splats early without reclaim needing to
happen. That will also pick up splat #2 below.
8<-------------------------------------------------------------
diff --git a/block/blk-core.c b/block/blk-core.c
index 32fb28a6372c..2dd8dc9aed7f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
q->nr_requests = BLKDEV_DEFAULT_RQ;
+ fs_reclaim_acquire(GFP_KERNEL);
+ rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
+ rwsem_release(&q->io_lockdep_map, _RET_IP_);
+ fs_reclaim_release(GFP_KERNEL);
+
return q;
fail_stats:
8<-------------------------------------------------------------
#1:
[ 106.921533] ======================================================
[ 106.921716] WARNING: possible circular locking dependency detected
[ 106.921725] 6.13.0-rc6+ #121 Tainted: G U
[ 106.921734] ------------------------------------------------------
[ 106.921743] kswapd0/117 is trying to acquire lock:
[ 106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0},
at: __submit_bio+0x80/0x220
[ 106.921769]
but task is already holding lock:
[ 106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa10
[ 106.921791]
which lock already depends on the new lock.
[ 106.921803]
the existing dependency chain (in reverse order) is:
[ 106.921814]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 106.921824] fs_reclaim_acquire+0x9d/0xd0
[ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0
[ 106.921842] blk_mq_init_tags+0x3d/0xb0
[ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0
[ 106.921860] blk_mq_init_sched+0x100/0x260
[ 106.921868] elevator_switch+0x8d/0x2e0
[ 106.921877] elv_iosched_store+0x174/0x1e0
[ 106.921885] queue_attr_store+0x142/0x180
[ 106.921893] kernfs_fop_write_iter+0x168/0x240
[ 106.921902] vfs_write+0x2b2/0x540
[ 106.921910] ksys_write+0x72/0xf0
[ 106.921916] do_syscall_64+0x95/0x180
[ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 106.921935]
-> #0 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 106.921946] __lock_acquire+0x1339/0x2180
[ 106.921955] lock_acquire+0xd0/0x2e0
[ 106.921963] blk_mq_submit_bio+0x88b/0xb60
[ 106.921972] __submit_bio+0x80/0x220
[ 106.921980] submit_bio_noacct_nocheck+0x324/0x420
[ 106.921989] swap_writepage+0x399/0x580
[ 106.921997] pageout+0x129/0x2d0
[ 106.922005] shrink_folio_list+0x5a0/0xd80
[ 106.922013] evict_folios+0x27d/0x7b0
[ 106.922020] try_to_shrink_lruvec+0x21b/0x2b0
[ 106.922028] shrink_one+0x102/0x1f0
[ 106.922035] shrink_node+0xb8e/0x1300
[ 106.922043] balance_pgdat+0x550/0xa10
[ 106.922050] kswapd+0x20a/0x440
[ 106.922057] kthread+0xd2/0x100
[ 106.922064] ret_from_fork+0x31/0x50
[ 106.922072] ret_from_fork_asm+0x1a/0x30
[ 106.922080]
other info that might help us debug this:
[ 106.922092] Possible unsafe locking scenario:
[ 106.922101] CPU0 CPU1
[ 106.922108] ---- ----
[ 106.922115] lock(fs_reclaim);
[ 106.922121] lock(&q->q_usage_counter(io));
[ 106.922132] lock(fs_reclaim);
[ 106.922141] rlock(&q->q_usage_counter(io));
[ 106.922148]
*** DEADLOCK ***
[ 106.922476] 1 lock held by kswapd0/117:
[ 106.922802] #0: ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa10
[ 106.923138]
stack backtrace:
[ 106.923806] CPU: 3 UID: 0 PID: 117 Comm: kswapd0 Tainted: G U 6.13.0-rc6+ #121
[ 106.924173] Tainted: [U]=USER
[ 106.924523] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[ 106.924882] Call Trace:
[ 106.925223] <TASK>
[ 106.925559] dump_stack_lvl+0x6e/0xa0
[ 106.925893] print_circular_bug.cold+0x178/0x1be
[ 106.926233] check_noncircular+0x148/0x160
[ 106.926565] ? unwind_next_frame+0x42a/0x750
[ 106.926905] __lock_acquire+0x1339/0x2180
[ 106.927227] lock_acquire+0xd0/0x2e0
[ 106.927546] ? __submit_bio+0x80/0x220
[ 106.927892] ? blk_mq_submit_bio+0x860/0xb60
[ 106.928212] ? lock_release+0xd2/0x2a0
[ 106.928536] blk_mq_submit_bio+0x88b/0xb60
[ 106.928850] ? __submit_bio+0x80/0x220
[ 106.929184] __submit_bio+0x80/0x220
[ 106.929499] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 106.929833] ? submit_bio_noacct_nocheck+0x324/0x420
[ 106.930147] submit_bio_noacct_nocheck+0x324/0x420
[ 106.930464] swap_writepage+0x399/0x580
[ 106.930794] pageout+0x129/0x2d0
[ 106.931114] shrink_folio_list+0x5a0/0xd80
[ 106.931447] ? evict_folios+0x25d/0x7b0
[ 106.931776] evict_folios+0x27d/0x7b0
[ 106.932092] try_to_shrink_lruvec+0x21b/0x2b0
[ 106.932410] shrink_one+0x102/0x1f0
[ 106.932742] shrink_node+0xb8e/0x1300
[ 106.933056] ? shrink_node+0x9c1/0x1300
[ 106.933368] ? shrink_node+0xb64/0x1300
[ 106.933679] ? balance_pgdat+0x550/0xa10
[ 106.933988] balance_pgdat+0x550/0xa10
[ 106.934296] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 106.934607] ? finish_task_switch.isra.0+0xc4/0x2a0
[ 106.934920] kswapd+0x20a/0x440
[ 106.935229] ? __pfx_autoremove_wake_function+0x10/0x10
[ 106.935542] ? __pfx_kswapd+0x10/0x10
[ 106.935881] kthread+0xd2/0x100
[ 106.936191] ? __pfx_kthread+0x10/0x10
[ 106.936501] ret_from_fork+0x31/0x50
[ 106.936810] ? __pfx_kthread+0x10/0x10
[ 106.937120] ret_from_fork_asm+0x1a/0x30
[ 106.937433] </TASK>
#2:
[ 5.595482] ======================================================
[ 5.596353] WARNING: possible circular locking dependency detected
[ 5.597231] 6.13.0-rc6+ #122 Tainted: G U
[ 5.598182] ------------------------------------------------------
[ 5.599149] (udev-worker)/867 is trying to acquire lock:
[    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
[    5.600987]
but task is already holding lock:
[    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603033]
which lock already depends on the new lock.
[ 5.603034]
the existing dependency chain (in reverse order) is:
[ 5.603035]
-> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
[ 5.603038] blk_alloc_queue+0x319/0x350
[ 5.603041] blk_mq_alloc_queue+0x63/0xd0
[ 5.603043] scsi_alloc_sdev+0x281/0x3c0
[ 5.603045] scsi_probe_and_add_lun+0x1f5/0x450
[ 5.603046] __scsi_scan_target+0x112/0x230
[ 5.603048] scsi_scan_channel+0x59/0x90
[ 5.603049] scsi_scan_host_selected+0xe5/0x120
[ 5.603051] do_scan_async+0x1b/0x160
[ 5.603052] async_run_entry_fn+0x31/0x130
[ 5.603055] process_one_work+0x21a/0x590
[ 5.603058] worker_thread+0x1c3/0x3b0
[ 5.603059] kthread+0xd2/0x100
[ 5.603061] ret_from_fork+0x31/0x50
[ 5.603064] ret_from_fork_asm+0x1a/0x30
[ 5.603066]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 5.603068] fs_reclaim_acquire+0x9d/0xd0
[ 5.603070] kmem_cache_alloc_lru_noprof+0x57/0x3f0
[ 5.603072] alloc_inode+0x97/0xc0
[ 5.603074] iget_locked+0x141/0x310
[ 5.603076] kernfs_get_inode+0x1a/0xf0
[ 5.603077] kernfs_get_tree+0x17b/0x2c0
[ 5.603080] sysfs_get_tree+0x1a/0x40
[ 5.603081] vfs_get_tree+0x29/0xe0
[ 5.603083] path_mount+0x49a/0xbd0
[ 5.603085] __x64_sys_mount+0x119/0x150
[ 5.603086] do_syscall_64+0x95/0x180
[ 5.603089] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 5.603092]
-> #0 (&root->kernfs_rwsem){++++}-{4:4}:
[ 5.603094] __lock_acquire+0x1339/0x2180
[ 5.603097] lock_acquire+0xd0/0x2e0
[ 5.603099] down_write+0x2e/0xb0
[ 5.603101] kernfs_remove+0x31/0x50
[ 5.603103] __kobject_del+0x2e/0x90
[ 5.603104] kobject_del+0x13/0x30
[ 5.603104] elevator_switch+0x44/0x2e0
[ 5.603106] elv_iosched_store+0x174/0x1e0
[ 5.603107] queue_attr_store+0x142/0x180
[ 5.603108] kernfs_fop_write_iter+0x168/0x240
[ 5.603110] vfs_write+0x2b2/0x540
[ 5.603111] ksys_write+0x72/0xf0
[ 5.603111] do_syscall_64+0x95/0x180
[ 5.603113] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 5.603114]
other info that might help us debug this:
[    5.603115] Chain exists of:
                 &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#3
[    5.603117]  Possible unsafe locking scenario:
[    5.603117]        CPU0                    CPU1
[    5.603117]        ----                    ----
[    5.603118]   lock(&q->q_usage_counter(io)#3);
[    5.603119]                                lock(fs_reclaim);
[    5.603119]                                lock(&q->q_usage_counter(io)#3);
[    5.603120]   lock(&root->kernfs_rwsem);
[ 5.603121]
*** DEADLOCK ***
[ 5.603121] 6 locks held by (udev-worker)/867:
[    5.603122]  #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[    5.603125]  #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x121/0x240
[    5.603128]  #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x240
[    5.603131]  #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at: queue_attr_store+0x12b/0x180
[    5.603133]  #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603136]  #5: ffff9211e86f41d8 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603139]
stack backtrace:
[    5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G U 6.13.0-rc6+ #122
[    5.603142] Tainted: [U]=USER
[    5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[ 5.603143] Call Trace:
[ 5.603144] <TASK>
[ 5.603146] dump_stack_lvl+0x6e/0xa0
[ 5.603148] print_circular_bug.cold+0x178/0x1be
[ 5.603151] check_noncircular+0x148/0x160
[ 5.603154] __lock_acquire+0x1339/0x2180
[ 5.603156] lock_acquire+0xd0/0x2e0
[ 5.603158] ? kernfs_remove+0x31/0x50
[ 5.603160] ? sysfs_remove_dir+0x32/0x60
[ 5.603162] ? lock_release+0xd2/0x2a0
[ 5.603164] down_write+0x2e/0xb0
[ 5.603165] ? kernfs_remove+0x31/0x50
[ 5.603166] kernfs_remove+0x31/0x50
[ 5.603168] __kobject_del+0x2e/0x90
[ 5.603170] elevator_switch+0x44/0x2e0
[ 5.603172] elv_iosched_store+0x174/0x1e0
[ 5.603174] queue_attr_store+0x142/0x180
[ 5.603176] ? lock_acquire+0xd0/0x2e0
[ 5.603177] ? kernfs_fop_write_iter+0x12a/0x240
[ 5.603179] ? lock_is_held_type+0x9a/0x110
[ 5.603182] kernfs_fop_write_iter+0x168/0x240
[ 5.657060] vfs_write+0x2b2/0x540
[ 5.657470] ksys_write+0x72/0xf0
[ 5.657475] do_syscall_64+0x95/0x180
[ 5.657480] ? lock_acquire+0xd0/0x2e0
[ 5.657484] ? ktime_get_coarse_real_ts64+0x12/0x60
[ 5.657486] ? find_held_lock+0x2b/0x80
[ 5.657489] ? ktime_get_coarse_real_ts64+0x12/0x60
[ 5.657490] ? file_has_perm+0xa9/0xf0
[ 5.657494] ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[ 5.657499] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 5.657501] ? syscall_exit_to_user_mode+0x97/0x290
[ 5.657504] ? do_syscall_64+0xa1/0x180
[ 5.657507] ? lock_acquire+0xd0/0x2e0
[ 5.662389] ? fd_install+0x3e/0x300
[ 5.662395] ? find_held_lock+0x2b/0x80
[ 5.663189] ? fd_install+0xbb/0x300
[ 5.663194] ? do_sys_openat2+0x9c/0xe0
[ 5.664093] ? kmem_cache_free+0x13e/0x450
[ 5.664099] ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[ 5.664952] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 5.664956] ? syscall_exit_to_user_mode+0x97/0x290
[ 5.664961] ? do_syscall_64+0xa1/0x180
[ 5.664964] ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[ 5.664967] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 5.664969] ? syscall_exit_to_user_mode+0x97/0x290
[ 5.664972] ? do_syscall_64+0xa1/0x180
[ 5.664974] ? clear_bhb_loop+0x45/0xa0
[ 5.664977] ? clear_bhb_loop+0x45/0xa0
[ 5.664979] ? clear_bhb_loop+0x45/0xa0
[ 5.664982] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 5.664985] RIP: 0033:0x7fe72d2f4484
[    5.664988] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[    5.664990] RSP: 002b:00007ffe51665998 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[    5.664992] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe72d2f4484
[    5.664994] RDX: 0000000000000003 RSI: 00007ffe51665ca0 RDI: 0000000000000038
[    5.664995] RBP: 00007ffe516659c0 R08: 00007fe72d3f51c8 R09: 00007ffe51665a70
[    5.664996] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[    5.664997] R13: 00007ffe51665ca0 R14: 000055a1bab093b0 R15: 00007fe72d3f4e80
[ 5.665001] </TASK>
Thanks,
Thomas
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-12 11:33 ` Thomas Hellström
@ 2025-01-12 15:50 ` Ming Lei
2025-01-12 17:44 ` Thomas Hellström
2025-01-13 9:28 ` Ming Lei
1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-12 15:50 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
...
>
> Ah, You're right, it's a different warning this time. Posted the
> warning below. (Note: This is also with Christoph's series applied on
> top).
>
> May I also humbly suggest the following lockdep priming to be able to
> catch the reclaim lockdep splats early without reclaim needing to
> happen. That will also pick up splat #2 below.
>
> 8<-------------------------------------------------------------
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 32fb28a6372c..2dd8dc9aed7f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct
> queue_limits *lim, int node_id)
>
> q->nr_requests = BLKDEV_DEFAULT_RQ;
>
> + fs_reclaim_acquire(GFP_KERNEL);
> + rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> + rwsem_release(&q->io_lockdep_map, _RET_IP_);
> + fs_reclaim_release(GFP_KERNEL);
> +
> return q;
Looks like a nice idea for injecting fs_reclaim; maybe it can be
added to the inject framework?
>
> fail_stats:
>
> 8<-------------------------------------------------------------
>
> #1:
> [  106.921533] ======================================================
> [  106.921716] WARNING: possible circular locking dependency detected
> [  106.921725] 6.13.0-rc6+ #121 Tainted: G U
> [  106.921734] ------------------------------------------------------
> [  106.921743] kswapd0/117 is trying to acquire lock:
> [  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0}, at: __submit_bio+0x80/0x220
> [  106.921769]
> but task is already holding lock:
> [  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
> [ 106.921791]
> which lock already depends on the new lock.
>
> [ 106.921803]
> the existing dependency chain (in reverse order) is:
> [ 106.921814]
> -> #1 (fs_reclaim){+.+.}-{0:0}:
> [ 106.921824] fs_reclaim_acquire+0x9d/0xd0
> [ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0
> [ 106.921842] blk_mq_init_tags+0x3d/0xb0
> [ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> [ 106.921860] blk_mq_init_sched+0x100/0x260
> [ 106.921868] elevator_switch+0x8d/0x2e0
> [ 106.921877] elv_iosched_store+0x174/0x1e0
> [ 106.921885] queue_attr_store+0x142/0x180
> [ 106.921893] kernfs_fop_write_iter+0x168/0x240
> [ 106.921902] vfs_write+0x2b2/0x540
> [ 106.921910] ksys_write+0x72/0xf0
> [ 106.921916] do_syscall_64+0x95/0x180
> [ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e
That is another regression from commit
af2814149883 block: freeze the queue in queue_attr_store
and queue_wb_lat_store() has the same risk too.
I will cook a patch to fix it.
Thanks,
Ming
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-12 15:50 ` Ming Lei
@ 2025-01-12 17:44 ` Thomas Hellström
2025-01-13 0:55 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2025-01-12 17:44 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
>
> ...
>
> >
> > Ah, You're right, it's a different warning this time. Posted the
> > warning below. (Note: This is also with Christoph's series applied
> > on
> > top).
> >
> > May I also humbly suggest the following lockdep priming to be able
> > to
> > catch the reclaim lockdep splats early without reclaim needing to
> > happen. That will also pick up splat #2 below.
> >
> > 8<-------------------------------------------------------------
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 32fb28a6372c..2dd8dc9aed7f 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct
> > queue_limits *lim, int node_id)
> >
> > q->nr_requests = BLKDEV_DEFAULT_RQ;
> >
> > + fs_reclaim_acquire(GFP_KERNEL);
> > + rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> > + rwsem_release(&q->io_lockdep_map, _RET_IP_);
> > + fs_reclaim_release(GFP_KERNEL);
> > +
> > return q;
>
> Looks like a nice idea for injecting fs_reclaim; maybe it can be
> added to the inject framework?
For the Intel GPU drivers, we typically always prime lockdep like this
if we *know* that the lock will be grabbed during reclaim, for example
if it's part of shrinker processing or similar.
So sooner or later we *know* this sequence will happen, so we add it
near the lock initialization so that it is always executed when the
lock(map) is initialized.
So I don't really see a need for them to be periodically injected?
>
> >
> > fail_stats:
> >
> > 8<-------------------------------------------------------------
> >
> > #1:
> > [  106.921533] ======================================================
> > [  106.921716] WARNING: possible circular locking dependency detected
> > [  106.921725] 6.13.0-rc6+ #121 Tainted: G U
> > [  106.921734] ------------------------------------------------------
> > [  106.921743] kswapd0/117 is trying to acquire lock:
> > [  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0}, at: __submit_bio+0x80/0x220
> > [  106.921769]
> > but task is already holding lock:
> > [  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
> > [ 106.921791]
> > which lock already depends on the new lock.
> >
> > [ 106.921803]
> > the existing dependency chain (in reverse order) is:
> > [ 106.921814]
> > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > [ 106.921824] fs_reclaim_acquire+0x9d/0xd0
> > [ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0
> > [ 106.921842] blk_mq_init_tags+0x3d/0xb0
> > [ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> > [ 106.921860] blk_mq_init_sched+0x100/0x260
> > [ 106.921868] elevator_switch+0x8d/0x2e0
> > [ 106.921877] elv_iosched_store+0x174/0x1e0
> > [ 106.921885] queue_attr_store+0x142/0x180
> > [ 106.921893] kernfs_fop_write_iter+0x168/0x240
> > [ 106.921902] vfs_write+0x2b2/0x540
> > [ 106.921910] ksys_write+0x72/0xf0
> > [ 106.921916] do_syscall_64+0x95/0x180
> > [ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> That is another regression from commit
>
> af2814149883 block: freeze the queue in queue_attr_store
>
> and queue_wb_lat_store() has the same risk too.
>
> I will cook a patch to fix it.
Thanks. Are these splats going to be silenced for 6.13-rc? Like having
the new lockdep checks under a special config until they are fixed?
Thanks,
Thomas
>
> Thanks,
> Ming
>
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-12 17:44 ` Thomas Hellström
@ 2025-01-13 0:55 ` Ming Lei
2025-01-13 8:48 ` Thomas Hellström
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-13 0:55 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Sun, Jan 12, 2025 at 06:44:53PM +0100, Thomas Hellström wrote:
> On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote:
> > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> >
> > ...
> >
> > >
> > > Ah, You're right, it's a different warning this time. Posted the
> > > warning below. (Note: This is also with Christoph's series applied
> > > on
> > > top).
> > >
> > > May I also humbly suggest the following lockdep priming to be able
> > > to
> > > catch the reclaim lockdep splats early without reclaim needing to
> > > happen. That will also pick up splat #2 below.
> > >
> > > 8<-------------------------------------------------------------
> > >
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 32fb28a6372c..2dd8dc9aed7f 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct
> > > queue_limits *lim, int node_id)
> > >
> > > q->nr_requests = BLKDEV_DEFAULT_RQ;
> > >
> > > + fs_reclaim_acquire(GFP_KERNEL);
> > > + rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> > > + rwsem_release(&q->io_lockdep_map, _RET_IP_);
> > > + fs_reclaim_release(GFP_KERNEL);
> > > +
> > > return q;
> >
> > Looks like a nice idea for injecting fs_reclaim; maybe it can be
> > added to the inject framework?
>
> For the Intel GPU drivers, we typically always prime lockdep like this
> if we *know* that the lock will be grabbed during reclaim, for example
> if it's part of shrinker processing or similar.
>
> So sooner or later we *know* this sequence will happen, so we add it
> near the lock initialization so that it is always executed when the
> lock(map) is initialized.
>
> So I don't really see a need for them to be periodically injected?
What I suggested is to add this verification for every allocation that
allows direct reclaim, behind one kernel config option which depends on
both lockdep and fault injection.
>
> >
> > >
> > > fail_stats:
> > >
> > > 8<-------------------------------------------------------------
> > >
> > > #1:
> > > [  106.921533] ======================================================
> > > [  106.921716] WARNING: possible circular locking dependency detected
> > > [  106.921725] 6.13.0-rc6+ #121 Tainted: G U
> > > [  106.921734] ------------------------------------------------------
> > > [  106.921743] kswapd0/117 is trying to acquire lock:
> > > [  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0}, at: __submit_bio+0x80/0x220
> > > [  106.921769]
> > > but task is already holding lock:
> > > [  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
> > > [ 106.921791]
> > > which lock already depends on the new lock.
> > >
> > > [ 106.921803]
> > > the existing dependency chain (in reverse order) is:
> > > [ 106.921814]
> > > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > > [ 106.921824] fs_reclaim_acquire+0x9d/0xd0
> > > [ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0
> > > [ 106.921842] blk_mq_init_tags+0x3d/0xb0
> > > [ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> > > [ 106.921860] blk_mq_init_sched+0x100/0x260
> > > [ 106.921868] elevator_switch+0x8d/0x2e0
> > > [ 106.921877] elv_iosched_store+0x174/0x1e0
> > > [ 106.921885] queue_attr_store+0x142/0x180
> > > [ 106.921893] kernfs_fop_write_iter+0x168/0x240
> > > [ 106.921902] vfs_write+0x2b2/0x540
> > > [ 106.921910] ksys_write+0x72/0xf0
> > > [ 106.921916] do_syscall_64+0x95/0x180
> > > [ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > That is another regression from commit
> >
> > af2814149883 block: freeze the queue in queue_attr_store
> >
> > and queue_wb_lat_store() has the same risk too.
> >
> > I will cook a patch to fix it.
>
> Thanks. Are these splats going to be silenced for 6.13-rc? Like having
> the new lockdep checks under a special config until they are fixed?
It is too late for v6.13, and Christoph's fix won't be available for
v6.13 either.
Thanks,
Ming
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-13 0:55 ` Ming Lei
@ 2025-01-13 8:48 ` Thomas Hellström
0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-13 8:48 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block
Hi.
On Mon, 2025-01-13 at 08:55 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 06:44:53PM +0100, Thomas Hellström wrote:
> > On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote:
> > > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > >
> > > ...
> > >
> > > >
> > > > Ah, You're right, it's a different warning this time. Posted
> > > > the
> > > > warning below. (Note: This is also with Christoph's series
> > > > applied
> > > > on
> > > > top).
> > > >
> > > > May I also humbly suggest the following lockdep priming to be
> > > > able
> > > > to
> > > > catch the reclaim lockdep splats early without reclaim needing
> > > > to
> > > > happen. That will also pick up splat #2 below.
> > > >
> > > > 8<-------------------------------------------------------------
> > > >
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index 32fb28a6372c..2dd8dc9aed7f 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -458,6 +458,11 @@ struct request_queue
> > > > *blk_alloc_queue(struct
> > > > queue_limits *lim, int node_id)
> > > >
> > > > q->nr_requests = BLKDEV_DEFAULT_RQ;
> > > >
> > > > + fs_reclaim_acquire(GFP_KERNEL);
> > > > + rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> > > > + rwsem_release(&q->io_lockdep_map, _RET_IP_);
> > > > + fs_reclaim_release(GFP_KERNEL);
> > > > +
> > > > return q;
> > >
> > > Looks like a nice idea for injecting fs_reclaim; maybe it can be
> > > added to the inject framework?
> >
> > For the Intel GPU drivers, we typically always prime lockdep like
> > this if we *know* that the lock will be grabbed during reclaim, for
> > example if it's part of shrinker processing or similar.
> >
> > So sooner or later we *know* this sequence will happen, so we add it
> > near the lock initialization so that it is always executed when the
> > lock(map) is initialized.
> >
> > So I don't really see a need for them to be periodically injected?
>
> What I suggested is to add this verification for every allocation that
> allows direct reclaim, behind one kernel config option which depends on
> both lockdep and fault injection.
>
> >
> > >
> > > >
> > > > fail_stats:
> > > >
> > > > 8<-------------------------------------------------------------
> > > >
> > > > #1:
> > > > [  106.921533] ======================================================
> > > > [  106.921716] WARNING: possible circular locking dependency detected
> > > > [  106.921725] 6.13.0-rc6+ #121 Tainted: G U
> > > > [  106.921734] ------------------------------------------------------
> > > > [  106.921743] kswapd0/117 is trying to acquire lock:
> > > > [  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0}, at: __submit_bio+0x80/0x220
> > > > [  106.921769]
> > > > but task is already holding lock:
> > > > [  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
> > > > [ 106.921791]
> > > > which lock already depends on the new lock.
> > > >
> > > > [ 106.921803]
> > > > the existing dependency chain (in reverse order)
> > > > is:
> > > > [ 106.921814]
> > > > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > > > [ 106.921824] fs_reclaim_acquire+0x9d/0xd0
> > > > [ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0
> > > > [ 106.921842] blk_mq_init_tags+0x3d/0xb0
> > > > [ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> > > > [ 106.921860] blk_mq_init_sched+0x100/0x260
> > > > [ 106.921868] elevator_switch+0x8d/0x2e0
> > > > [ 106.921877] elv_iosched_store+0x174/0x1e0
> > > > [ 106.921885] queue_attr_store+0x142/0x180
> > > > [ 106.921893] kernfs_fop_write_iter+0x168/0x240
> > > > [ 106.921902] vfs_write+0x2b2/0x540
> > > > [ 106.921910] ksys_write+0x72/0xf0
> > > > [ 106.921916] do_syscall_64+0x95/0x180
> > > > [ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >
> > > That is another regression from commit
> > >
> > > af2814149883 block: freeze the queue in queue_attr_store
> > >
> > > and queue_wb_lat_store() has the same risk too.
> > >
> > > I will cook a patch to fix it.
> >
> > Thanks. Are these splats going to be silenced for 6.13-rc? Like
> > having
> > the new lockdep checks under a special config until they are fixed?
>
> It is too late for v6.13, and Christoph's fix won't be available for
> v6.13 either.
Yeah, I was thinking more of silencing the lockdep warnings themselves,
rather than fixing the actual deadlocks.
Thanks,
Thomas
>
>
> Thanks,
> Ming
>
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-12 11:33 ` Thomas Hellström
2025-01-12 15:50 ` Ming Lei
@ 2025-01-13 9:28 ` Ming Lei
2025-01-13 9:58 ` Thomas Hellström
1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-13 9:28 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > > Ming, Others
> > > > >
>
> #2:
> [ 5.595482] ======================================================
> [ 5.596353] WARNING: possible circular locking dependency detected
> [ 5.597231] 6.13.0-rc6+ #122 Tainted: G U
> [ 5.598182] ------------------------------------------------------
> [ 5.599149] (udev-worker)/867 is trying to acquire lock:
> [ 5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at:
> kernfs_remove+0x31/0x50
> [ 5.600987]
> but task is already holding lock:
> [ 5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-
> {0:0}, at: blk_mq_freeze_queue+0x12/0x20
> [ 5.603033]
> which lock already depends on the new lock.
>
> [ 5.603034]
> the existing dependency chain (in reverse order) is:
> [ 5.603035]
> -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> [ 5.603038] blk_alloc_queue+0x319/0x350
> [ 5.603041] blk_mq_alloc_queue+0x63/0xd0
The above one is solved in for-6.14/block of the block tree:
block: track queue dying state automatically for modeling queue freeze lockdep
q->q_usage_counter(io) is killed because the disk isn't up yet.
If you apply the noio patch against for-6.14/block, the two splats
should disappear. If not, please post the lockdep log.
Thanks,
Ming
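For context, the scoped-NOIO approach that the noio patch mentioned above presumably takes can be sketched roughly as below. This is a hedged, non-runnable illustration, not the actual patch: frozen_queue_section() is an invented name and the placement is guessed, while memalloc_noio_save()/memalloc_noio_restore() and the freeze helpers are real kernel APIs.

```c
/*
 * Hedged sketch (not the actual patch): while a queue is frozen, any
 * allocation that can enter direct reclaim may end up waiting on
 * q->q_usage_counter(io). Inside a scoped-NOIO section, every
 * GFP_KERNEL allocation behaves as GFP_NOIO, so it cannot recurse
 * into the block layer. frozen_queue_section() is an invented name
 * used purely for illustration.
 */
#include <linux/blk-mq.h>
#include <linux/sched/mm.h>

static void frozen_queue_section(struct request_queue *q)
{
	unsigned int noio_flags;

	blk_mq_freeze_queue(q);
	noio_flags = memalloc_noio_save();

	/* ... allocations here are implicitly degraded to GFP_NOIO ... */

	memalloc_noio_restore(noio_flags);
	blk_mq_unfreeze_queue(q);
}
```

With that scope in place, the fs_reclaim -> q_usage_counter(io) dependency cannot be re-entered from under the frozen queue, which is what makes the two splats above disappear.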
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-13 9:28 ` Ming Lei
@ 2025-01-13 9:58 ` Thomas Hellström
2025-01-13 10:40 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2025-01-13 9:58 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block
Hi,
On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström
> > > > > wrote:
> > > > > > Ming, Others
> > > > > >
> >
> > #2:
> > [    5.595482] ======================================================
> > [    5.596353] WARNING: possible circular locking dependency detected
> > [    5.597231] 6.13.0-rc6+ #122 Tainted: G U
> > [    5.598182] ------------------------------------------------------
> > [    5.599149] (udev-worker)/867 is trying to acquire lock:
> > [    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
> > [    5.600987]
> > but task is already holding lock:
> > [    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > [ 5.603033]
> > which lock already depends on the new lock.
> >
> > [ 5.603034]
> > the existing dependency chain (in reverse order) is:
> > [ 5.603035]
> > -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > [ 5.603038] blk_alloc_queue+0x319/0x350
> > [ 5.603041] blk_mq_alloc_queue+0x63/0xd0
>
> The above one is solved in for-6.14/block of block tree:
>
> block: track queue dying state automatically for modeling
> queue freeze lockdep
>
> q->q_usage_counter(io) is killed because disk isn't up yet.
>
> If you apply the noio patch against for-6.14/block, the two splats
> should disappear. If not, please post the lockdep log.
The dependency path above is the lockdep priming I suggested, which
establishes the reclaim -> q->q_usage_counter(io) locking order.
A splat without that priming would look slightly different and won't
occur until memory is actually exhausted. But it *will* occur.
That's why I suggested using the priming to catch all fs_reclaim ->
q_usage_counter(io) violations early, perhaps already at system boot:
anybody accidentally adding a GFP_KERNEL memory allocation under the
q_usage_counter(io) lock would get a notification as soon as that
allocation happens.
The actual deadlock happens because kernfs_rwsem is taken under
q_usage_counter(io) (excerpt from the report in [a]).
If the priming is removed, the splat doesn't happen until reclaim, and
will instead look like [b].
Thanks,
Thomas
[a]
[    5.603115] Chain exists of:
                 &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#3
[    5.603117]  Possible unsafe locking scenario:
[    5.603117]        CPU0                    CPU1
[    5.603117]        ----                    ----
[    5.603118]   lock(&q->q_usage_counter(io)#3);
[    5.603119]                                lock(fs_reclaim);
[    5.603119]                                lock(&q->q_usage_counter(io)#3);
[    5.603120]   lock(&root->kernfs_rwsem);
[    5.603121]
 *** DEADLOCK ***
[    5.603121] 6 locks held by (udev-worker)/867:
[    5.603122]  #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[    5.603125]  #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x121/0x240
[    5.603128]  #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x240
[    5.603131]  #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at: queue_attr_store+0x12b/0x180
[    5.603133]  #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603136]  #5: ffff9211e86f41d8 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603139]
stack backtrace:
[    5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G U 6.13.0-rc6+ #122
[    5.603142] Tainted: [U]=USER
[    5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[ 5.603143] Call Trace:
[ 5.603144] <TASK>
[ 5.603146] dump_stack_lvl+0x6e/0xa0
[ 5.603148] print_circular_bug.cold+0x178/0x1be
[ 5.603151] check_noncircular+0x148/0x160
[ 5.603154] __lock_acquire+0x1339/0x2180
[ 5.603156] lock_acquire+0xd0/0x2e0
[ 5.603158] ? kernfs_remove+0x31/0x50
[ 5.603160] ? sysfs_remove_dir+0x32/0x60
[ 5.603162] ? lock_release+0xd2/0x2a0
[ 5.603164] down_write+0x2e/0xb0
[ 5.603165] ? kernfs_remove+0x31/0x50
[ 5.603166] kernfs_remove+0x31/0x50
[ 5.
[b]
[  157.543591] ======================================================
[  157.543778] WARNING: possible circular locking dependency detected
[  157.543787] 6.13.0-rc6+ #123 Tainted: G U
[  157.543796] ------------------------------------------------------
[  157.543805] git/2856 is trying to acquire lock:
[  157.543812] ffff98b6bb882f10 (&q->q_usage_counter(io)#2){++++}-{0:0}, at: __submit_bio+0x80/0x220
[  157.543830]
but task is already holding lock:
[  157.543839] ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x348/0xea0
[ 157.543855]
which lock already depends on the new lock.
[ 157.543867]
the existing dependency chain (in reverse order) is:
[ 157.543878]
-> #2 (fs_reclaim){+.+.}-{0:0}:
[ 157.543888] fs_reclaim_acquire+0x9d/0xd0
[ 157.543896] kmem_cache_alloc_lru_noprof+0x57/0x3f0
[ 157.543906] alloc_inode+0x97/0xc0
[ 157.543913] iget_locked+0x141/0x310
[ 157.543921] kernfs_get_inode+0x1a/0xf0
[ 157.543929] kernfs_get_tree+0x17b/0x2c0
[ 157.543938] sysfs_get_tree+0x1a/0x40
[ 157.543945] vfs_get_tree+0x29/0xe0
[ 157.543953] path_mount+0x49a/0xbd0
[ 157.543960] __x64_sys_mount+0x119/0x150
[ 157.543968] do_syscall_64+0x95/0x180
[ 157.543977] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.543986]
-> #1 (&root->kernfs_rwsem){++++}-{4:4}:
[ 157.543997] down_write+0x2e/0xb0
[ 157.544004] kernfs_remove+0x31/0x50
[ 157.544012] __kobject_del+0x2e/0x90
[ 157.544020] kobject_del+0x13/0x30
[ 157.544026] elevator_switch+0x44/0x2e0
[ 157.544034] elv_iosched_store+0x174/0x1e0
[ 157.544043] queue_attr_store+0x165/0x1b0
[ 157.544050] kernfs_fop_write_iter+0x168/0x240
[ 157.544059] vfs_write+0x2b2/0x540
[ 157.544066] ksys_write+0x72/0xf0
[ 157.544073] do_syscall_64+0x95/0x180
[ 157.544081] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.544090]
-> #0 (&q->q_usage_counter(io)#2){++++}-{0:0}:
[ 157.544102] __lock_acquire+0x1339/0x2180
[ 157.544110] lock_acquire+0xd0/0x2e0
[ 157.544118] blk_mq_submit_bio+0x88b/0xb60
[ 157.544127] __submit_bio+0x80/0x220
[ 157.544135] submit_bio_noacct_nocheck+0x324/0x420
[ 157.544144] swap_writepage+0x399/0x580
[ 157.544152] pageout+0x129/0x2d0
[ 157.544160] shrink_folio_list+0x5a0/0xd80
[ 157.544168] evict_folios+0x27d/0x7b0
[ 157.544175] try_to_shrink_lruvec+0x21b/0x2b0
[ 157.544183] shrink_one+0x102/0x1f0
[ 157.544191] shrink_node+0xb8e/0x1300
[ 157.544198] do_try_to_free_pages+0xb3/0x580
[ 157.544206] try_to_free_pages+0xfa/0x2a0
[ 157.544214] __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[ 157.544224] __alloc_pages_noprof+0x34c/0x390
[ 157.544233] alloc_pages_mpol_noprof+0xd7/0x1c0
[ 157.544241] pipe_write+0x3fc/0x7f0
[ 157.544574] vfs_write+0x401/0x540
[ 157.544917] ksys_write+0xd1/0xf0
[ 157.545246] do_syscall_64+0x95/0x180
[ 157.545576] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.545909]
other info that might help us debug this:
[ 157.546879] Chain exists of:
&q->q_usage_counter(io)#2 --> &root->kernfs_rwsem -->
fs_reclaim
[ 157.547849] Possible unsafe locking scenario:
[ 157.548483] CPU0 CPU1
[ 157.548795] ---- ----
[ 157.549098]        lock(fs_reclaim);
[ 157.549400]                                lock(&root->kernfs_rwsem);
[ 157.549705]                                lock(fs_reclaim);
[ 157.550011]        rlock(&q->q_usage_counter(io)#2);
[ 157.550316]
*** DEADLOCK ***
[ 157.551194] 2 locks held by git/2856:
[ 157.551490] #0: ffff98b6a221e068 (&pipe->mutex){+.+.}-{4:4}, at:
pipe_write+0x5a/0x7f0
[ 157.551798] #1: ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
__alloc_pages_slowpath.constprop.0+0x348/0xea0
[ 157.552115]
stack backtrace:
[ 157.552734] CPU: 5 UID: 1000 PID: 2856 Comm: git Tainted: G U
6.13.0-rc6+ #123
[ 157.553060] Tainted: [U]=USER
[ 157.553383] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[ 157.553718] Call Trace:
[ 157.554054] <TASK>
[ 157.554389] dump_stack_lvl+0x6e/0xa0
[ 157.554725] print_circular_bug.cold+0x178/0x1be
[ 157.555064] check_noncircular+0x148/0x160
[ 157.555408] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 157.555747] ? unwind_get_return_address+0x23/0x40
[ 157.556085] __lock_acquire+0x1339/0x2180
[ 157.556425] lock_acquire+0xd0/0x2e0
[ 157.556761] ? __submit_bio+0x80/0x220
[ 157.557110] ? blk_mq_submit_bio+0x860/0xb60
[ 157.557447] ? lock_release+0xd2/0x2a0
[ 157.557784] blk_mq_submit_bio+0x88b/0xb60
[ 157.558137] ? __submit_bio+0x80/0x220
[ 157.558476] __submit_bio+0x80/0x220
[ 157.558828] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 157.559166] ? submit_bio_noacct_nocheck+0x324/0x420
[ 157.559504] submit_bio_noacct_nocheck+0x324/0x420
[ 157.559863] swap_writepage+0x399/0x580
[ 157.560205] pageout+0x129/0x2d0
[ 157.560542] shrink_folio_list+0x5a0/0xd80
[ 157.560879] ? evict_folios+0x25d/0x7b0
[ 157.561212] evict_folios+0x27d/0x7b0
[ 157.561546] try_to_shrink_lruvec+0x21b/0x2b0
[ 157.561890] shrink_one+0x102/0x1f0
[ 157.562222] shrink_node+0xb8e/0x1300
[ 157.562554] ? shrink_node+0x9c1/0x1300
[ 157.562915] ? shrink_node+0xb64/0x1300
[ 157.563245] ? do_try_to_free_pages+0xb3/0x580
[ 157.563576] do_try_to_free_pages+0xb3/0x580
[ 157.563922] ? lock_release+0xd2/0x2a0
[ 157.564252] try_to_free_pages+0xfa/0x2a0
[ 157.564583] __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[ 157.564946] ? lock_release+0xd2/0x2a0
[ 157.565279] __alloc_pages_noprof+0x34c/0x390
[ 157.565613] alloc_pages_mpol_noprof+0xd7/0x1c0
[ 157.565952] pipe_write+0x3fc/0x7f0
[ 157.566283] vfs_write+0x401/0x540
[ 157.566615] ksys_write+0xd1/0xf0
[ 157.566980] do_syscall_64+0x95/0x180
[ 157.567312] ? vfs_write+0x401/0x540
[ 157.567642] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 157.568001] ? syscall_exit_to_user_mode+0x97/0x290
[ 157.568331] ? do_syscall_64+0xa1/0x180
[ 157.568658] ? do_syscall_64+0xa1/0x180
[ 157.569012] ? syscall_exit_to_user_mode+0x97/0x290
[ 157.569337] ? do_syscall_64+0xa1/0x180
[ 157.569658] ? do_user_addr_fault+0x397/0x720
[ 157.569980] ? trace_hardirqs_off+0x4b/0xc0
[ 157.570300] ? clear_bhb_loop+0x45/0xa0
[ 157.570621] ? clear_bhb_loop+0x45/0xa0
[ 157.570968] ? clear_bhb_loop+0x45/0xa0
[ 157.571286] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.571605] RIP: 0033:0x7fdf1ec2d484
[ 157.571966] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84
00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[ 157.572322] RSP: 002b:00007ffd0eb6d068 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
[ 157.572692] RAX: ffffffffffffffda RBX: 0000000000000331 RCX:
00007fdf1ec2d484
[ 157.573093] RDX: 0000000000000331 RSI: 000055693fe2d660 RDI:
0000000000000001
[ 157.573470] RBP: 00007ffd0eb6d090 R08: 000055693fdc6010 R09:
0000000000000007
[ 157.573875] R10: 0000556941b97c70 R11: 0000000000000202 R12:
0000000000000331
[ 157.574249] R13: 000055693fe2d660 R14: 00007fdf1ed305c0 R15:
00007fdf1ed2de80
[ 157.574621] </TASK>
>
> Thanks,
> Ming
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
2025-01-13 9:58 ` Thomas Hellström
@ 2025-01-13 10:40 ` Ming Lei
0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2025-01-13 10:40 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block
On Mon, Jan 13, 2025 at 10:58:07AM +0100, Thomas Hellström wrote:
> Hi,
>
> On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström
> > > > > > wrote:
> > > > > > > Ming, Others
> > > > > > >
> > >
> > > #2:
> > > [ 5.595482]
> > > ======================================================
> > > [ 5.596353] WARNING: possible circular locking dependency
> > > detected
> > > [ 5.597231] 6.13.0-rc6+ #122 Tainted: G U
> > > [ 5.598182] ----------------------------------------------------
> > > --
> > > [ 5.599149] (udev-worker)/867 is trying to acquire lock:
> > > [ 5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4},
> > > at:
> > > kernfs_remove+0x31/0x50
> > > [ 5.600987]
> > > but task is already holding lock:
> > > [ 5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-
> > > {0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > > [ 5.603033]
> > > which lock already depends on the new lock.
> > >
> > > [ 5.603034]
> > > the existing dependency chain (in reverse order) is:
> > > [ 5.603035]
> > > -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > > [ 5.603038] blk_alloc_queue+0x319/0x350
> > > [ 5.603041] blk_mq_alloc_queue+0x63/0xd0
> >
> > The above one is solved in for-6.14/block of block tree:
> >
> > block: track queue dying state automatically for modeling
> > queue freeze lockdep
> >
> > q->q_usage_counter(io) is killed because disk isn't up yet.
> >
> > If you apply the noio patch against for-6.14/block, the two splats
> > should
> > have disappeared. If not, please post the lockdep log.
>
> The above dependency path is the lockdep priming I suggested, which
> establishes the fs_reclaim -> q->q_usage_counter(io) locking order.
> A splat without that priming would look slightly different and wouldn't
> occur until memory is actually exhausted. But it *will* occur.
>
> That's why I suggested using the priming to catch all
> fs_reclaim -> q_usage_counter(io) violations early, perhaps already at
> system boot, so anybody accidentally adding a GFP_KERNEL memory
> allocation under the q_usage_counter(io) lock would get a notification
> as soon as that allocation happens.
>
> The actual deadlock sequence happens because kernfs_rwsem is taken under
> q_usage_counter(io) (excerpt from the report [a]).
> If the priming is removed, the splat doesn't happen until reclaim, and
> will instead look like [b].
Got it, [b] is a new warning between 'echo /sys/block/$DEV/queue/scheduler'
and fs reclaim from sysfs inode allocation.
Three global or sub-system locks are involved:
- fs_reclaim
- root->kernfs_rwsem
- q->q_usage_counter(io)
The problem has existed since the blk-mq scheduler was introduced; it
looks like a hard problem because it has become difficult to avoid the
dependency between these locks now.
I will think about it and see if we can figure out a solution.
Thanks,
Ming
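Ming's summary above describes a cycle among three locks, which is exactly what lockdep's dependency-graph search reports. The following is a minimal userspace model of that detection, illustrative only: the lock names mirror the splat, but this is not kernel code, and real lockdep operates on lock *classes* with far more machinery.

```python
# Minimal model of lockdep cycle detection: record "B acquired while A
# is held" edges, then search the resulting directed graph for a cycle.
# Lock names mirror the three locks in the splat above (illustration only).
from collections import defaultdict

class LockGraph:
    def __init__(self):
        # lock name -> set of locks acquired while it was held
        self.edges = defaultdict(set)

    def acquire(self, held, new):
        """Record that `new` was acquired while `held` was held."""
        self.edges[held].add(new)

    def find_cycle(self, start):
        """DFS from `start`; return the cycle path if one exists."""
        path = [start]

        def dfs(node):
            for nxt in self.edges[node]:
                if nxt == start:
                    return path + [start]
                if nxt not in path:
                    path.append(nxt)
                    found = dfs(nxt)
                    if found:
                        return found
                    path.pop()
            return None

        return dfs(start)

g = LockGraph()
# elevator_switch takes kernfs_rwsem while the queue is frozen (#2 -> #1).
g.acquire("q_usage_counter(io)", "kernfs_rwsem")
# kernfs/sysfs allocates with GFP_KERNEL under kernfs_rwsem (#1 -> reclaim).
g.acquire("kernfs_rwsem", "fs_reclaim")
# reclaim writes swap, entering the queue under fs_reclaim (reclaim -> #0).
g.acquire("fs_reclaim", "q_usage_counter(io)")

cycle = g.find_cycle("fs_reclaim")
print(" -> ".join(cycle))
# -> fs_reclaim -> q_usage_counter(io) -> kernfs_rwsem -> fs_reclaim
```

The priming Thomas describes corresponds to inserting the `fs_reclaim -> q_usage_counter(io)` edge eagerly at initialization, so the cycle is reported at boot rather than only once reclaim actually issues I/O.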
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-01-13 10:41 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
2025-01-10 10:14 ` Christoph Hellwig
2025-01-10 10:21 ` Thomas Hellström
2025-01-10 12:13 ` Ming Lei
2025-01-10 14:36 ` Thomas Hellström
2025-01-11 3:05 ` Ming Lei
2025-01-12 11:33 ` Thomas Hellström
2025-01-12 15:50 ` Ming Lei
2025-01-12 17:44 ` Thomas Hellström
2025-01-13 0:55 ` Ming Lei
2025-01-13 8:48 ` Thomas Hellström
2025-01-13 9:28 ` Ming Lei
2025-01-13 9:58 ` Thomas Hellström
2025-01-13 10:40 ` Ming Lei