* Blockdev 6.13-rc lockdep splat regressions
@ 2025-01-10 10:12 Thomas Hellström
2025-01-10 10:14 ` Christoph Hellwig
2025-01-10 12:13 ` Ming Lei
0 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 10:12 UTC (permalink / raw)
To: Ming Lei, Jens Axboe; +Cc: Christoph Hellwig, linux-block
Ming, Others
On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
introduced by the commit
f1be1788a32e ("block: model freeze & enter queue as lock for supporting
lockdep")
The first one happens when swap-outs start to a SCSI disk. A simple
reproducer is to start a couple of parallel "gitk" instances on the
kernel repo and watch them exhaust available memory.

The second is easily triggered by entering a debugfs trace directory,
apparently triggering an automount:

cd /sys/kernel/debug/tracing/events
Are you aware of these?
Thanks,
Thomas
#1:
[ 399.006581] ======================================================
[ 399.006756] WARNING: possible circular locking dependency detected
[ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
[ 399.006776] ------------------------------------------------------
[ 399.006801] kswapd0/116 is trying to acquire lock:
[ 399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-{0:0},
at: __submit_bio+0xf0/0x1c0
[ 399.006845]
but task is already holding lock:
[ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa20
[ 399.006874]
which lock already depends on the new lock.
[ 399.006890]
the existing dependency chain (in reverse order) is:
[ 399.006905]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 399.006919] fs_reclaim_acquire+0x9d/0xd0
[ 399.006931] __kmalloc_node_noprof+0xa5/0x460
[ 399.006943] sbitmap_init_node+0x84/0x1e0
[ 399.006953] scsi_realloc_sdev_budget_map+0xc8/0x1a0
[ 399.006965] scsi_add_lun+0x419/0x710
[ 399.006976] scsi_probe_and_add_lun+0x12d/0x450
[ 399.006988] __scsi_scan_target+0x112/0x230
[ 399.006999] scsi_scan_channel+0x59/0x90
[ 399.007009] scsi_scan_host_selected+0xe5/0x120
[ 399.007021] do_scan_async+0x1b/0x160
[ 399.007031] async_run_entry_fn+0x31/0x130
[ 399.007043] process_one_work+0x21a/0x590
[ 399.007054] worker_thread+0x1c3/0x3b0
[ 399.007065] kthread+0xd2/0x100
[ 399.007074] ret_from_fork+0x31/0x50
[ 399.007085] ret_from_fork_asm+0x1a/0x30
[ 399.007096]
-> #0 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 399.007111] __lock_acquire+0x13ac/0x2170
[ 399.007123] lock_acquire+0xd0/0x2f0
[ 399.007134] blk_mq_submit_bio+0x90b/0xb00
[ 399.007145] __submit_bio+0xf0/0x1c0
[ 399.007155] submit_bio_noacct_nocheck+0x324/0x420
[ 399.007167] swap_writepage+0x14a/0x2c0
[ 399.007178] pageout+0x129/0x2d0
[ 399.007608] shrink_folio_list+0x5a0/0xd80
[ 399.008045] evict_folios+0x27a/0x790
[ 399.008486] try_to_shrink_lruvec+0x225/0x2b0
[ 399.008926] shrink_one+0x102/0x1f0
[ 399.009360] shrink_node+0xa7c/0x1130
[ 399.009821] balance_pgdat+0x560/0xa20
[ 399.010254] kswapd+0x20a/0x440
[ 399.010698] kthread+0xd2/0x100
[ 399.011141] ret_from_fork+0x31/0x50
[ 399.011584] ret_from_fork_asm+0x1a/0x30
[ 399.012024]
other info that might help us debug this:
[ 399.013283] Possible unsafe locking scenario:
[ 399.014160]        CPU0                    CPU1
[ 399.014584]        ----                    ----
[ 399.015010]   lock(fs_reclaim);
[ 399.015439]                                lock(&q->q_usage_counter(io));
[ 399.015867]                                lock(fs_reclaim);
[ 399.016208]   rlock(&q->q_usage_counter(io));
[ 399.016538]
*** DEADLOCK ***
[ 399.017539] 1 lock held by kswapd0/116:
[ 399.017887] #0: ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa20
[ 399.018218]
stack backtrace:
[ 399.018887] CPU: 11 UID: 0 PID: 116 Comm: kswapd0 Tainted: G U
N 6.12.0-rc4+ #1
[ 399.019217] Tainted: [U]=USER, [N]=TEST
[ 399.019543] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[ 399.019911] Call Trace:
[ 399.020235] <TASK>
[ 399.020556] dump_stack_lvl+0x6e/0xa0
[ 399.020890] print_circular_bug.cold+0x178/0x1be
[ 399.021207] check_noncircular+0x148/0x160
[ 399.021523] __lock_acquire+0x13ac/0x2170
[ 399.021852] lock_acquire+0xd0/0x2f0
[ 399.022167] ? __submit_bio+0xf0/0x1c0
[ 399.022489] ? blk_mq_submit_bio+0x8e0/0xb00
[ 399.022830] ? lock_release+0xd3/0x2b0
[ 399.023143] blk_mq_submit_bio+0x90b/0xb00
[ 399.023460] ? __submit_bio+0xf0/0x1c0
[ 399.023785] ? lock_acquire+0xd0/0x2f0
[ 399.024096] __submit_bio+0xf0/0x1c0
[ 399.024404] submit_bio_noacct_nocheck+0x324/0x420
[ 399.024713] swap_writepage+0x14a/0x2c0
[ 399.025037] pageout+0x129/0x2d0
[ 399.025348] shrink_folio_list+0x5a0/0xd80
[ 399.025668] ? evict_folios+0x25a/0x790
[ 399.026007] evict_folios+0x27a/0x790
[ 399.026325] try_to_shrink_lruvec+0x225/0x2b0
[ 399.026635] shrink_one+0x102/0x1f0
[ 399.026957] ? shrink_node+0xa63/0x1130
[ 399.027264] shrink_node+0xa7c/0x1130
[ 399.027570] ? shrink_node+0x908/0x1130
[ 399.027888] balance_pgdat+0x560/0xa20
[ 399.028197] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 399.028510] ? finish_task_switch.isra.0+0xc4/0x2a0
[ 399.028829] kswapd+0x20a/0x440
[ 399.029136] ? __pfx_autoremove_wake_function+0x10/0x10
[ 399.029483] ? __pfx_kswapd+0x10/0x10
[ 399.029912] kthread+0xd2/0x100
[ 399.030306] ? __pfx_kthread+0x10/0x10
[ 399.030699] ret_from_fork+0x31/0x50
[ 399.031105] ? __pfx_kthread+0x10/0x10
[ 399.031520] ret_from_fork_asm+0x1a/0x30
[ 399.031934] </TASK>
#2:
[ 81.960829] ======================================================
[ 81.961010] WARNING: possible circular locking dependency detected
[ 81.961048] 6.12.0-rc4+ #3 Tainted: G U
[ 81.961082] ------------------------------------------------------
[ 81.961117] bash/2744 is trying to acquire lock:
[ 81.961147] ffffffff8b6754d0 (namespace_sem){++++}-{4:4}, at:
finish_automount+0x77/0x3a0
[ 81.961215]
but task is already holding lock:
[ 81.961249] ffff8d7a8051ce50 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: finish_automount+0x6b/0x3a0
[ 81.961316]
which lock already depends on the new lock.
[ 81.961361]
the existing dependency chain (in reverse order) is:
[ 81.961403]
-> #6 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[ 81.961452] down_write+0x2e/0xb0
[ 81.961486] start_creating.part.0+0x5f/0x120
[ 81.961523] debugfs_create_dir+0x36/0x190
[ 81.961557] blk_register_queue+0xba/0x220
[ 81.961594] add_disk_fwnode+0x235/0x430
[ 81.961626] nvme_alloc_ns+0x7eb/0xb50 [nvme_core]
[ 81.961696] nvme_scan_ns+0x251/0x330 [nvme_core]
[ 81.962053] async_run_entry_fn+0x31/0x130
[ 81.962088] process_one_work+0x21a/0x590
[ 81.962122] worker_thread+0x1c3/0x3b0
[ 81.962153] kthread+0xd2/0x100
[ 81.962179] ret_from_fork+0x31/0x50
[ 81.962211] ret_from_fork_asm+0x1a/0x30
[ 81.962243]
-> #5 (&q->debugfs_mutex){+.+.}-{4:4}:
[ 81.962287] __mutex_lock+0xad/0xb80
[ 81.962319] blk_mq_init_sched+0x181/0x260
[ 81.962350] elevator_init_mq+0xb0/0x100
[ 81.962381] add_disk_fwnode+0x50/0x430
[ 81.962412] sd_probe+0x335/0x530
[ 81.962441] really_probe+0xdb/0x340
[ 81.962474] __driver_probe_device+0x78/0x110
[ 81.962510] driver_probe_device+0x1f/0xa0
[ 81.962545] __device_attach_driver+0x89/0x110
[ 81.962581] bus_for_each_drv+0x98/0xf0
[ 81.962613] __device_attach_async_helper+0xa5/0xf0
[ 81.962651] async_run_entry_fn+0x31/0x130
[ 81.962686] process_one_work+0x21a/0x590
[ 81.962718] worker_thread+0x1c3/0x3b0
[ 81.962750] kthread+0xd2/0x100
[ 81.962775] ret_from_fork+0x31/0x50
[ 81.962806] ret_from_fork_asm+0x1a/0x30
[ 81.962838]
-> #4 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
[ 81.962887] blk_queue_enter+0x1bc/0x1e0
[ 81.962918] blk_mq_alloc_request+0x144/0x2b0
[ 81.962951] scsi_execute_cmd+0x78/0x490
[ 81.962985] read_capacity_16+0x111/0x410
[ 81.963017] sd_revalidate_disk.isra.0+0x545/0x2eb0
[ 81.963053] sd_probe+0x2eb/0x530
[ 81.963081] really_probe+0xdb/0x340
[ 81.963112] __driver_probe_device+0x78/0x110
[ 81.963148] driver_probe_device+0x1f/0xa0
[ 81.963182] __device_attach_driver+0x89/0x110
[ 81.963218] bus_for_each_drv+0x98/0xf0
[ 81.963250] __device_attach_async_helper+0xa5/0xf0
[ 81.964380] async_run_entry_fn+0x31/0x130
[ 81.965502] process_one_work+0x21a/0x590
[ 81.965868] worker_thread+0x1c3/0x3b0
[ 81.966198] kthread+0xd2/0x100
[ 81.966528] ret_from_fork+0x31/0x50
[ 81.966855] ret_from_fork_asm+0x1a/0x30
[ 81.967179]
-> #3 (&q->limits_lock){+.+.}-{4:4}:
[ 81.967815] __mutex_lock+0xad/0xb80
[ 81.968133] nvme_update_ns_info_block+0x128/0x870 [nvme_core]
[ 81.968456] nvme_update_ns_info+0x41/0x220 [nvme_core]
[ 81.968774] nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
[ 81.969090] nvme_scan_ns+0x251/0x330 [nvme_core]
[ 81.969401] async_run_entry_fn+0x31/0x130
[ 81.969703] process_one_work+0x21a/0x590
[ 81.970004] worker_thread+0x1c3/0x3b0
[ 81.970302] kthread+0xd2/0x100
[ 81.970603] ret_from_fork+0x31/0x50
[ 81.970901] ret_from_fork_asm+0x1a/0x30
[ 81.971195]
-> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 81.971776] blk_mq_submit_bio+0x90b/0xb00
[ 81.972071] __submit_bio+0xf0/0x1c0
[ 81.972364] submit_bio_noacct_nocheck+0x324/0x420
[ 81.972660] btrfs_submit_chunk+0x1a7/0x660
[ 81.972956] btrfs_submit_bbio+0x1a/0x30
[ 81.973250] read_extent_buffer_pages+0x15e/0x210
[ 81.973546] btrfs_read_extent_buffer+0x93/0x180
[ 81.973841] read_tree_block+0x30/0x60
[ 81.974137] read_block_for_search+0x219/0x320
[ 81.974432] btrfs_search_slot+0x335/0xd30
[ 81.974729] btrfs_init_root_free_objectid+0x90/0x130
[ 81.975027] open_ctree+0xa35/0x13eb
[ 81.975326] btrfs_get_tree.cold+0x6b/0xfd
[ 81.975627] vfs_get_tree+0x29/0xe0
[ 81.975926] fc_mount+0x12/0x40
[ 81.976223] btrfs_get_tree+0x2c1/0x6b0
[ 81.976520] vfs_get_tree+0x29/0xe0
[ 81.976816] vfs_cmd_create+0x59/0xe0
[ 81.977112] __do_sys_fsconfig+0x4f3/0x6c0
[ 81.977408] do_syscall_64+0x95/0x180
[ 81.977705] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.978005]
-> #1 (btrfs-root-01){++++}-{4:4}:
[ 81.978595] down_read_nested+0x34/0x150
[ 81.978895] btrfs_tree_read_lock_nested+0x25/0xd0
[ 81.979195] btrfs_read_lock_root_node+0x44/0xe0
[ 81.979494] btrfs_search_slot+0x143/0xd30
[ 81.979793] btrfs_search_backwards+0x2e/0x90
[ 81.980108] btrfs_get_subvol_name_from_objectid+0xd8/0x3c0
[ 81.980409] btrfs_show_options+0x294/0x780
[ 81.980718] show_mountinfo+0x207/0x2a0
[ 81.981025] seq_read_iter+0x2bc/0x480
[ 81.981327] vfs_read+0x294/0x370
[ 81.981628] ksys_read+0x73/0xf0
[ 81.981925] do_syscall_64+0x95/0x180
[ 81.982222] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.982523]
-> #0 (namespace_sem){++++}-{4:4}:
[ 81.983116] __lock_acquire+0x13ac/0x2170
[ 81.983415] lock_acquire+0xd0/0x2f0
[ 81.983714] down_write+0x2e/0xb0
[ 81.984014] finish_automount+0x77/0x3a0
[ 81.984313] __traverse_mounts+0x9d/0x210
[ 81.984612] step_into+0x349/0x770
[ 81.984909] link_path_walk.part.0.constprop.0+0x21e/0x390
[ 81.985211] path_lookupat+0x3e/0x1a0
[ 81.985511] filename_lookup+0xde/0x1d0
[ 81.985811] vfs_statx+0x8d/0x100
[ 81.986108] vfs_fstatat+0x63/0xc0
[ 81.986405] __do_sys_newfstatat+0x3c/0x80
[ 81.986704] do_syscall_64+0x95/0x180
[ 81.987005] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.987307]
other info that might help us debug this:
[ 81.988204] Chain exists of:
               namespace_sem --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3
[ 81.989113] Possible unsafe locking scenario:
[ 81.989723]        CPU0                    CPU1
[ 81.990028]        ----                    ----
[ 81.990331]   lock(&sb->s_type->i_mutex_key#3);
[ 81.990637]                                lock(&q->debugfs_mutex);
[ 81.990947]                                lock(&sb->s_type->i_mutex_key#3);
[ 81.991257]   lock(namespace_sem);
[ 81.991565]
*** DEADLOCK ***
[ 81.992465] 1 lock held by bash/2744:
[ 81.992766] #0: ffff8d7a8051ce50 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: finish_automount+0x6b/0x3a0
[ 81.993084]
stack backtrace:
[ 81.993704] CPU: 2 UID: 0 PID: 2744 Comm: bash Tainted: G U
6.12.0-rc4+ #3
[ 81.994025] Tainted: [U]=USER
[ 81.994343] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[ 81.994673] Call Trace:
[ 81.995002] <TASK>
[ 81.995328] dump_stack_lvl+0x6e/0xa0
[ 81.995657] print_circular_bug.cold+0x178/0x1be
[ 81.995987] check_noncircular+0x148/0x160
[ 81.996318] __lock_acquire+0x13ac/0x2170
[ 81.996649] lock_acquire+0xd0/0x2f0
[ 81.996978] ? finish_automount+0x77/0x3a0
[ 81.997328] ? vfs_kern_mount.part.0+0x50/0xb0
[ 81.997660] ? kfree+0xd8/0x370
[ 81.997991] down_write+0x2e/0xb0
[ 81.998318] ? finish_automount+0x77/0x3a0
[ 81.998646] finish_automount+0x77/0x3a0
[ 81.998975] __traverse_mounts+0x9d/0x210
[ 81.999303] step_into+0x349/0x770
[ 81.999629] link_path_walk.part.0.constprop.0+0x21e/0x390
[ 81.999958] path_lookupat+0x3e/0x1a0
[ 82.000287] filename_lookup+0xde/0x1d0
[ 82.000618] vfs_statx+0x8d/0x100
[ 82.000944] ? strncpy_from_user+0x22/0xf0
[ 82.001271] vfs_fstatat+0x63/0xc0
[ 82.001614] __do_sys_newfstatat+0x3c/0x80
[ 82.001943] do_syscall_64+0x95/0x180
[ 82.002270] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 82.002601] ? syscall_exit_to_user_mode+0x97/0x290
[ 82.002933] ? do_syscall_64+0xa1/0x180
[ 82.003262] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 82.003592] ? syscall_exit_to_user_mode+0x97/0x290
[ 82.003923] ? clear_bhb_loop+0x45/0xa0
[ 82.004251] ? clear_bhb_loop+0x45/0xa0
[ 82.004576] ? clear_bhb_loop+0x45/0xa0
[ 82.004899] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 82.005224] RIP: 0033:0x7f82fa5ef73e
[ 82.005558] Code: 0f 1f 40 00 48 8b 15 d1 66 10 00 f7 d8 64 89 02 b8
ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00 00 0f
05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b 15 99
[ 82.005925] RSP: 002b:00007ffd2e30cd98 EFLAGS: 00000246 ORIG_RAX:
0000000000000106
[ 82.006301] RAX: ffffffffffffffda RBX: 000055dfa1952b30 RCX:
00007f82fa5ef73e
[ 82.006678] RDX: 00007ffd2e30cdc0 RSI: 000055dfa1952b10 RDI:
00000000ffffff9c
[ 82.007057] RBP: 00007ffd2e30ce90 R08: 000055dfa1952b30 R09:
00007f82fa6f6b20
[ 82.007437] R10: 0000000000000000 R11: 0000000000000246 R12:
000055dfa1952b11
[ 82.007822] R13: 000055dfa1952b30 R14: 000055dfa1952b11 R15:
0000000000000003
[ 82.008209] </TASK>
^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
@ 2025-01-10 10:14 ` Christoph Hellwig
  2025-01-10 10:21   ` Thomas Hellström
  2025-01-10 12:13 ` Ming Lei
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-01-10 10:14 UTC (permalink / raw)
To: Thomas Hellström
Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block

On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> Ming, Others
> 
> On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> introduced by the commit
> 
> f1be1788a32e ("block: model freeze & enter queue as lock for supporting
> lockdep")
> 
> The first one happens when swap-outs start to a scsi disc,
> Simple reproducer is to start a couple of parallel "gitk" on the kernel
> repo and watch them exhaust available memory.
> 
> the second is easily triggered by entering a debugfs trace directory,
> apparently triggering automount:
> cd /sys/kernel/debug/tracing/events
> 
> Are you aware of these?

Yes, this series fixes it:

https://lore.kernel.org/linux-block/20250110054726.1499538-1-hch@lst.de/

should be ready now that the nitpicking has settled down.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 10:14 ` Christoph Hellwig
@ 2025-01-10 10:21   ` Thomas Hellström
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 10:21 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Ming Lei, Jens Axboe, linux-block

On Fri, 2025-01-10 at 11:14 +0100, Christoph Hellwig wrote:
> On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > Ming, Others
> > 
> > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > introduced by the commit
> > 
> > f1be1788a32e ("block: model freeze & enter queue as lock for
> > supporting
> > lockdep")
> > 
> > The first one happens when swap-outs start to a scsi disc,
> > Simple reproducer is to start a couple of parallel "gitk" on the
> > kernel
> > repo and watch them exhaust available memory.
> > 
> > the second is easily triggered by entering a debugfs trace
> > directory,
> > apparently triggering automount:
> > cd /sys/kernel/debug/tracing/events
> > 
> > Are you aware of these?
> 
> Yes, this series fixes it:
> 
> https://lore.kernel.org/linux-block/20250110054726.1499538-1-hch@lst.de/
> 
> should be ready now that the nitpicking has settled down.
> 

Great. Thanks.
/Thomas

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
  2025-01-10 10:14 ` Christoph Hellwig
@ 2025-01-10 12:13 ` Ming Lei
  2025-01-10 14:36   ` Thomas Hellström
  1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-10 12:13 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> Ming, Others
> 
> On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> introduced by the commit
> 
> f1be1788a32e ("block: model freeze & enter queue as lock for supporting
> lockdep")

The freeze lock connects all kinds of sub-system locks, that is why
we see lots of warnings after the commit is merged.

...

> #1
> [ 399.006581] ======================================================
> [ 399.006756] WARNING: possible circular locking dependency detected
> [ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
> [ 399.006776] ------------------------------------------------------
> [ 399.006801] kswapd0/116 is trying to acquire lock:
> [ 399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-{0:0},
> at: __submit_bio+0xf0/0x1c0
> [ 399.006845]
> but task is already holding lock:
> [ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> balance_pgdat+0xe2/0xa20
> [ 399.006874]

The above one is solved in for-6.14/block of block tree:

	block: track queue dying state automatically for modeling
	queue freeze lockdep

> 
> #2:
> [ 81.960829] ======================================================
> [ 81.961010] WARNING: possible circular locking dependency detected
> [ 81.961048] 6.12.0-rc4+ #3 Tainted: G U
...
> -> #3 (&q->limits_lock){+.+.}-{4:4}:
> [ 81.967815] __mutex_lock+0xad/0xb80
> [ 81.968133] nvme_update_ns_info_block+0x128/0x870 [nvme_core]
> [ 81.968456] nvme_update_ns_info+0x41/0x220 [nvme_core]
> [ 81.968774] nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
> [ 81.969090] nvme_scan_ns+0x251/0x330 [nvme_core]
> [ 81.969401] async_run_entry_fn+0x31/0x130
> [ 81.969703] process_one_work+0x21a/0x590
> [ 81.970004] worker_thread+0x1c3/0x3b0
> [ 81.970302] kthread+0xd2/0x100
> [ 81.970603] ret_from_fork+0x31/0x50
> [ 81.970901] ret_from_fork_asm+0x1a/0x30
> [ 81.971195]
> -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:

The above dependency is killed by Christoph's patch.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 12:13 ` Ming Lei
@ 2025-01-10 14:36   ` Thomas Hellström
  2025-01-11  3:05     ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 14:36 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > Ming, Others
> > 
> > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > introduced by the commit
> > 
> > f1be1788a32e ("block: model freeze & enter queue as lock for
> > supporting
> > lockdep")
> 
> The freeze lock connects all kinds of sub-system locks, that is why
> we see lots of warnings after the commit is merged.
> 
> ...
> 
> > #1
> > [ 399.006581]
> > ======================================================
> > [ 399.006756] WARNING: possible circular locking dependency
> > detected
> > [ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
> > [ 399.006776] ------------------------------------------------
> > ----
> > --
> > [ 399.006801] kswapd0/116 is trying to acquire lock:
> > [ 399.006810] ffff9a67a1284a28 (&q-
> > >q_usage_counter(io)){++++}-
> > {0:0},
> > at: __submit_bio+0xf0/0x1c0
> > [ 399.006845]
> > but task is already holding lock:
> > [ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > balance_pgdat+0xe2/0xa20
> > [ 399.006874]
> 
> The above one is solved in for-6.14/block of block tree:
> 
> 	block: track queue dying state automatically for modeling
> queue freeze lockdep

Hmm. I applied this series:

https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both

on top of -rc6, but it didn't resolve that splat. Am I using the
correct patches?

Perhaps it might be a good idea to reclaim-prime those lockdep maps
taken during reclaim to have the splats happen earlier.
Thanks,
Thomas

> 
> > 
> > #2:
> > [ 81.960829]
> > ======================================================
> > [ 81.961010] WARNING: possible circular locking dependency
> > detected
> > [ 81.961048] 6.12.0-rc4+ #3 Tainted: G U
> ...
> 
> > -> #3 (&q->limits_lock){+.+.}-{4:4}:
> > [ 81.967815] __mutex_lock+0xad/0xb80
> > [ 81.968133] nvme_update_ns_info_block+0x128/0x870
> > [nvme_core]
> > [ 81.968456] nvme_update_ns_info+0x41/0x220 [nvme_core]
> > [ 81.968774] nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
> > [ 81.969090] nvme_scan_ns+0x251/0x330 [nvme_core]
> > [ 81.969401] async_run_entry_fn+0x31/0x130
> > [ 81.969703] process_one_work+0x21a/0x590
> > [ 81.970004] worker_thread+0x1c3/0x3b0
> > [ 81.970302] kthread+0xd2/0x100
> > [ 81.970603] ret_from_fork+0x31/0x50
> > [ 81.970901] ret_from_fork_asm+0x1a/0x30
> > [ 81.971195]
> > -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
> 
> The above dependency is killed by Christoph's patch.
> 
> 
> Thanks,
> Ming
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 14:36 ` Thomas Hellström
@ 2025-01-11  3:05   ` Ming Lei
  2025-01-12 11:33     ` Thomas Hellström
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-11  3:05 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > Ming, Others
> > > 
> > > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > > introduced by the commit
> > > 
> > > f1be1788a32e ("block: model freeze & enter queue as lock for
> > > supporting
> > > lockdep")
> > 
> > The freeze lock connects all kinds of sub-system locks, that is
> > why
> > we see lots of warnings after the commit is merged.
> > 
> > ...
> > 
> > > #1
> > > [ 399.006581]
> > > ======================================================
> > > [ 399.006756] WARNING: possible circular locking dependency
> > > detected
> > > [ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
> > > [ 399.006776] ----------------------------------------------------
> > > --
> > > [ 399.006801] kswapd0/116 is trying to acquire lock:
> > > [ 399.006810] ffff9a67a1284a28 (&q-
> > > >q_usage_counter(io)){++++}-
> > > {0:0},
> > > at: __submit_bio+0xf0/0x1c0
> > > [ 399.006845]
> > > but task is already holding lock:
> > > [ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > > balance_pgdat+0xe2/0xa20
> > > [ 399.006874]
> > 
> > The above one is solved in for-6.14/block of block tree:
> > 
> > 	block: track queue dying state automatically for
> > modeling
> > 	queue freeze lockdep
> 
> Hmm. I applied this series:
> 
> https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both
> 
> on top of -rc6, but it didn't resolve that splat. Am I using the
> correct patches?
> 
> Perhaps it might be a good idea to reclaim-prime those lockdep maps
> taken during reclaim to have the splats happen earlier.

for-6.14/block does kill the dependency between fs_reclaim and
q->q_usage_counter(io) in scsi_add_lun() when scsi disk isn't
added yet.

Maybe it is another warning, care to post the warning log here?


Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-11  3:05 ` Ming Lei
@ 2025-01-12 11:33   ` Thomas Hellström
  2025-01-12 15:50     ` Ming Lei
  2025-01-13  9:28     ` Ming Lei
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-12 11:33 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > Ming, Others
> > > > 
> > > > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > > > introduced by the commit
> > > > 
> > > > f1be1788a32e ("block: model freeze & enter queue as lock for
> > > > supporting
> > > > lockdep")
> > > 
> > > The freeze lock connects all kinds of sub-system locks, that is
> > > why
> > > we see lots of warnings after the commit is merged.
> > > 
> > > ...
> > > 
> > > > #1
> > > > [ 399.006581]
> > > > ======================================================
> > > > [ 399.006756] WARNING: possible circular locking dependency
> > > > detected
> > > > [ 399.006767] 6.12.0-rc4+ #1 Tainted: G U N
> > > > [ 399.006776] ------------------------------------------------
> > > > ----
> > > > --
> > > > [ 399.006801] kswapd0/116 is trying to acquire lock:
> > > > [ 399.006810] ffff9a67a1284a28 (&q-
> > > > >q_usage_counter(io)){++++}-
> > > > {0:0},
> > > > at: __submit_bio+0xf0/0x1c0
> > > > [ 399.006845]
> > > > but task is already holding lock:
> > > > [ 399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > > > balance_pgdat+0xe2/0xa20
> > > > [ 399.006874]
> > > 
> > > The above one is solved in for-6.14/block of block tree:
> > > 
> > > 	block: track queue dying state automatically for
> > > modeling
> > > 	queue freeze lockdep
> > 
> > Hmm. I applied this series:
> > 
> > https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both
> > 
> > on top of -rc6, but it didn't resolve that splat. Am I using the
> > correct patches?
> > 
> > Perhaps it might be a good idea to reclaim-prime those lockdep maps
> > taken during reclaim to have the splats happen earlier.
> 
> for-6.14/block does kill the dependency between fs_reclaim and
> q->q_usage_counter(io) in scsi_add_lun() when scsi disk isn't
> added yet.
> 
> Maybe it is another warning, care to post the warning log here?

Ah, You're right, it's a different warning this time. Posted the
warning below. (Note: This is also with Christoph's series applied on
top).

May I also humbly suggest the following lockdep priming to be able to
catch the reclaim lockdep splats early without reclaim needing to
happen. That will also pick up splat #2 below.

8<-------------------------------------------------------------
diff --git a/block/blk-core.c b/block/blk-core.c
index 32fb28a6372c..2dd8dc9aed7f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
 
 	q->nr_requests = BLKDEV_DEFAULT_RQ;
 
+	fs_reclaim_acquire(GFP_KERNEL);
+	rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
+	rwsem_release(&q->io_lockdep_map, _RET_IP_);
+	fs_reclaim_release(GFP_KERNEL);
+
 	return q;
 
 fail_stats:
8<-------------------------------------------------------------

#1:
[ 106.921533] ======================================================
[ 106.921716] WARNING: possible circular locking dependency detected
[ 106.921725] 6.13.0-rc6+ #121 Tainted: G U
[ 106.921734] ------------------------------------------------------
[ 106.921743] kswapd0/117 is trying to acquire lock:
[ 106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0}, at: __submit_bio+0x80/0x220
[ 106.921769]
but task is already holding lock:
[ 106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
[ 106.921791]
which lock already depends on the new lock.
[ 106.921803]
the existing dependency chain (in reverse order) is:
[ 106.921814]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 106.921824] fs_reclaim_acquire+0x9d/0xd0
[ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0
[ 106.921842] blk_mq_init_tags+0x3d/0xb0
[ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0
[ 106.921860] blk_mq_init_sched+0x100/0x260
[ 106.921868] elevator_switch+0x8d/0x2e0
[ 106.921877] elv_iosched_store+0x174/0x1e0
[ 106.921885] queue_attr_store+0x142/0x180
[ 106.921893] kernfs_fop_write_iter+0x168/0x240
[ 106.921902] vfs_write+0x2b2/0x540
[ 106.921910] ksys_write+0x72/0xf0
[ 106.921916] do_syscall_64+0x95/0x180
[ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 106.921935]
-> #0 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 106.921946] __lock_acquire+0x1339/0x2180
[ 106.921955] lock_acquire+0xd0/0x2e0
[ 106.921963] blk_mq_submit_bio+0x88b/0xb60
[ 106.921972] __submit_bio+0x80/0x220
[ 106.921980] submit_bio_noacct_nocheck+0x324/0x420
[ 106.921989] swap_writepage+0x399/0x580
[ 106.921997] pageout+0x129/0x2d0
[ 106.922005] shrink_folio_list+0x5a0/0xd80
[ 106.922013] evict_folios+0x27d/0x7b0
[ 106.922020] try_to_shrink_lruvec+0x21b/0x2b0
[ 106.922028] shrink_one+0x102/0x1f0
[ 106.922035] shrink_node+0xb8e/0x1300
[ 106.922043] balance_pgdat+0x550/0xa10
[ 106.922050] kswapd+0x20a/0x440
[ 106.922057] kthread+0xd2/0x100
[ 106.922064] ret_from_fork+0x31/0x50
[ 106.922072] ret_from_fork_asm+0x1a/0x30
[ 106.922080]
other info that might help us debug this:
[ 106.922092] Possible unsafe locking scenario:
[ 106.922101]        CPU0                    CPU1
[ 106.922108]        ----                    ----
[ 106.922115]   lock(fs_reclaim);
[ 106.922121]                                lock(&q->q_usage_counter(io));
[ 106.922132]                                lock(fs_reclaim);
[ 106.922141]   rlock(&q->q_usage_counter(io));
[ 106.922148]
 *** DEADLOCK ***
[ 106.922476] 1 lock held by kswapd0/117:
[ 106.922802] #0: ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
[ 106.923138]
stack backtrace:
[ 106.923806] CPU: 3 UID: 0 PID: 117 Comm: kswapd0 Tainted: G U 6.13.0-rc6+ #121
[ 106.924173] Tainted: [U]=USER
[ 106.924523] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[ 106.924882] Call Trace:
[ 106.925223] <TASK>
[ 106.925559] dump_stack_lvl+0x6e/0xa0
[ 106.925893] print_circular_bug.cold+0x178/0x1be
[ 106.926233] check_noncircular+0x148/0x160
[ 106.926565] ? unwind_next_frame+0x42a/0x750
[ 106.926905] __lock_acquire+0x1339/0x2180
[ 106.927227] lock_acquire+0xd0/0x2e0
[ 106.927546] ? __submit_bio+0x80/0x220
[ 106.927892] ? blk_mq_submit_bio+0x860/0xb60
[ 106.928212] ? lock_release+0xd2/0x2a0
[ 106.928536] blk_mq_submit_bio+0x88b/0xb60
[ 106.928850] ? __submit_bio+0x80/0x220
[ 106.929184] __submit_bio+0x80/0x220
[ 106.929499] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 106.929833] ? submit_bio_noacct_nocheck+0x324/0x420
[ 106.930147] submit_bio_noacct_nocheck+0x324/0x420
[ 106.930464] swap_writepage+0x399/0x580
[ 106.930794] pageout+0x129/0x2d0
[ 106.931114] shrink_folio_list+0x5a0/0xd80
[ 106.931447] ? evict_folios+0x25d/0x7b0
[ 106.931776] evict_folios+0x27d/0x7b0
[ 106.932092] try_to_shrink_lruvec+0x21b/0x2b0
[ 106.932410] shrink_one+0x102/0x1f0
[ 106.932742] shrink_node+0xb8e/0x1300
[ 106.933056] ? shrink_node+0x9c1/0x1300
[ 106.933368] ? shrink_node+0xb64/0x1300
[ 106.933679] ? balance_pgdat+0x550/0xa10
[ 106.933988] balance_pgdat+0x550/0xa10
[ 106.934296] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 106.934607] ? finish_task_switch.isra.0+0xc4/0x2a0
[ 106.934920] kswapd+0x20a/0x440
[ 106.935229] ? __pfx_autoremove_wake_function+0x10/0x10
[ 106.935542] ? __pfx_kswapd+0x10/0x10
[ 106.935881] kthread+0xd2/0x100
[ 106.936191] ? __pfx_kthread+0x10/0x10
[ 106.936501] ret_from_fork+0x31/0x50
[ 106.936810] ? __pfx_kthread+0x10/0x10
[ 106.937120] ret_from_fork_asm+0x1a/0x30
[ 106.937433] </TASK>

#2:
[ 5.595482] ======================================================
[ 5.596353] WARNING: possible circular locking dependency detected
[ 5.597231] 6.13.0-rc6+ #122 Tainted: G U
[ 5.598182] ------------------------------------------------------
[ 5.599149] (udev-worker)/867 is trying to acquire lock:
[ 5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
[ 5.600987]
but task is already holding lock:
[ 5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603033]
which lock already depends on the new lock.
[ 5.603034]
the existing dependency chain (in reverse order) is:
[ 5.603035]
-> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
[ 5.603038] blk_alloc_queue+0x319/0x350
[ 5.603041] blk_mq_alloc_queue+0x63/0xd0
[ 5.603043] scsi_alloc_sdev+0x281/0x3c0
[ 5.603045] scsi_probe_and_add_lun+0x1f5/0x450
[ 5.603046] __scsi_scan_target+0x112/0x230
[ 5.603048] scsi_scan_channel+0x59/0x90
[ 5.603049] scsi_scan_host_selected+0xe5/0x120
[ 5.603051] do_scan_async+0x1b/0x160
[ 5.603052] async_run_entry_fn+0x31/0x130
[ 5.603055] process_one_work+0x21a/0x590
[ 5.603058] worker_thread+0x1c3/0x3b0
[ 5.603059] kthread+0xd2/0x100
[ 5.603061] ret_from_fork+0x31/0x50
[ 5.603064] ret_from_fork_asm+0x1a/0x30
[ 5.603066]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 5.603068] fs_reclaim_acquire+0x9d/0xd0
[ 5.603070] kmem_cache_alloc_lru_noprof+0x57/0x3f0
[ 5.603072] alloc_inode+0x97/0xc0
[ 5.603074] iget_locked+0x141/0x310
[ 5.603076] kernfs_get_inode+0x1a/0xf0
[ 5.603077] kernfs_get_tree+0x17b/0x2c0
[ 5.603080] sysfs_get_tree+0x1a/0x40
[ 5.603081] vfs_get_tree+0x29/0xe0
[ 5.603083] path_mount+0x49a/0xbd0
[ 5.603085] __x64_sys_mount+0x119/0x150
[ 5.603086] do_syscall_64+0x95/0x180
[ 5.603089] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 5.603092]
-> #0 (&root->kernfs_rwsem){++++}-{4:4}:
[ 5.603094] __lock_acquire+0x1339/0x2180
[ 5.603097] lock_acquire+0xd0/0x2e0
[ 5.603099] down_write+0x2e/0xb0
[ 5.603101] kernfs_remove+0x31/0x50
[ 5.603103] __kobject_del+0x2e/0x90
[ 5.603104] kobject_del+0x13/0x30
[ 5.603104] elevator_switch+0x44/0x2e0
[ 5.603106] elv_iosched_store+0x174/0x1e0
[ 5.603107] queue_attr_store+0x142/0x180
[ 5.603108] kernfs_fop_write_iter+0x168/0x240
[ 5.603110] vfs_write+0x2b2/0x540
[ 5.603111] ksys_write+0x72/0xf0
[ 5.603111] do_syscall_64+0x95/0x180
[ 5.603113] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 5.603114]
other info that might help us debug this:
[ 5.603115] Chain exists of:
              &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#3
[ 5.603117] Possible unsafe locking scenario:
[ 5.603117]        CPU0                    CPU1
[ 5.603117]        ----                    ----
[ 5.603118]   lock(&q->q_usage_counter(io)#3);
[ 5.603119]                                lock(fs_reclaim);
[ 5.603119]                                lock(&q->q_usage_counter(io)#3);
[ 5.603120]   lock(&root->kernfs_rwsem);
[ 5.603121]
 *** DEADLOCK ***
[ 5.603121] 6 locks held by (udev-worker)/867:
[ 5.603122] #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[ 5.603125] #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x121/0x240
[ 5.603128] #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x240
[ 5.603131] #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at: queue_attr_store+0x12b/0x180
[ 5.603133] #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603136] #5: ffff9211e86f41d8 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603139]
stack backtrace:
[ 5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G U 6.13.0-rc6+ #122
[ 5.603142] Tainted: [U]=USER
[ 5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[ 5.603143] Call Trace:
[ 5.603144] <TASK>
[ 5.603146] dump_stack_lvl+0x6e/0xa0
[ 5.603148] print_circular_bug.cold+0x178/0x1be
[ 5.603151] check_noncircular+0x148/0x160
[ 5.603154] __lock_acquire+0x1339/0x2180
[ 5.603156] lock_acquire+0xd0/0x2e0
[ 5.603158] ? kernfs_remove+0x31/0x50
[ 5.603160] ? sysfs_remove_dir+0x32/0x60
[ 5.603162] ? lock_release+0xd2/0x2a0
[ 5.603164] down_write+0x2e/0xb0
[ 5.603165] ? kernfs_remove+0x31/0x50
[ 5.603166] kernfs_remove+0x31/0x50
[ 5.603168] __kobject_del+0x2e/0x90
[ 5.603170] elevator_switch+0x44/0x2e0
[ 5.603172] elv_iosched_store+0x174/0x1e0
[ 5.603174] queue_attr_store+0x142/0x180
[ 5.603176] ? lock_acquire+0xd0/0x2e0
[ 5.603177] ? kernfs_fop_write_iter+0x12a/0x240
[ 5.603179] ? lock_is_held_type+0x9a/0x110
[ 5.603182] kernfs_fop_write_iter+0x168/0x240
[ 5.657060] vfs_write+0x2b2/0x540
[ 5.657470] ksys_write+0x72/0xf0
[ 5.657475] do_syscall_64+0x95/0x180
[ 5.657480] ? lock_acquire+0xd0/0x2e0
[ 5.657484] ? ktime_get_coarse_real_ts64+0x12/0x60
[ 5.657486] ? find_held_lock+0x2b/0x80
[ 5.657489] ? ktime_get_coarse_real_ts64+0x12/0x60
[ 5.657490] ? file_has_perm+0xa9/0xf0
[ 5.657494] ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[ 5.657499] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 5.657501] ? syscall_exit_to_user_mode+0x97/0x290
[ 5.657504] ? do_syscall_64+0xa1/0x180
[ 5.657507] ? lock_acquire+0xd0/0x2e0
[ 5.662389] ? fd_install+0x3e/0x300
[ 5.662395] ? find_held_lock+0x2b/0x80
[ 5.663189] ? fd_install+0xbb/0x300
[ 5.663194] ? do_sys_openat2+0x9c/0xe0
[ 5.664093] ? kmem_cache_free+0x13e/0x450
[ 5.664099] ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[ 5.664952] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 5.664956] ? syscall_exit_to_user_mode+0x97/0x290
[ 5.664961] ? do_syscall_64+0xa1/0x180
[ 5.664964] ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[ 5.664967] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 5.664969] ? syscall_exit_to_user_mode+0x97/0x290
[ 5.664972] ? do_syscall_64+0xa1/0x180
[ 5.664974] ? clear_bhb_loop+0x45/0xa0
[ 5.664977] ? clear_bhb_loop+0x45/0xa0
[ 5.664979] ?
clear_bhb_loop+0x45/0xa0 [ 5.664982] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 5.664985] RIP: 0033:0x7fe72d2f4484 [ 5.664988] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 [ 5.664990] RSP: 002b:00007ffe51665998 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 5.664992] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe72d2f4484 [ 5.664994] RDX: 0000000000000003 RSI: 00007ffe51665ca0 RDI: 0000000000000038 [ 5.664995] RBP: 00007ffe516659c0 R08: 00007fe72d3f51c8 R09: 00007ffe51665a70 [ 5.664996] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003 [ 5.664997] R13: 00007ffe51665ca0 R14: 000055a1bab093b0 R15: 00007fe72d3f4e80 [ 5.665001] </TASK> Thanks, Thomas ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-12 11:33 ` Thomas Hellström
@ 2025-01-12 15:50 ` Ming Lei
  2025-01-12 17:44 ` Thomas Hellström
  2025-01-13  9:28 ` Ming Lei
  1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-12 15:50 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:

...

> Ah, you're right, it's a different warning this time. Posted the
> warning below. (Note: this is also with Christoph's series applied on
> top.)
>
> May I also humbly suggest the following lockdep priming to be able to
> catch the reclaim lockdep splats early without reclaim needing to
> happen. That will also pick up splat #2 below.
>
> 8<-------------------------------------------------------------
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 32fb28a6372c..2dd8dc9aed7f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
>
> 	q->nr_requests = BLKDEV_DEFAULT_RQ;
>
> +	fs_reclaim_acquire(GFP_KERNEL);
> +	rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> +	rwsem_release(&q->io_lockdep_map, _RET_IP_);
> +	fs_reclaim_release(GFP_KERNEL);
> +
> 	return q;

Looks like a nice idea for injecting fs_reclaim; maybe it can be added to the fault-injection framework?

>
> fail_stats:
>
> 8<-------------------------------------------------------------
>
> #1:
> 106.921533] ======================================================
> [ 106.921716] WARNING: possible circular locking dependency detected
> [ 106.921725] 6.13.0-rc6+ #121 Tainted: G U
> [ 106.921734] ------------------------------------------------------
> [ 106.921743] kswapd0/117 is trying to acquire lock:
> [ 106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0}, at: __submit_bio+0x80/0x220
> [ 106.921769]
> but task is already holding lock:
> [ 106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
> [ 106.921791]
> which lock already depends on the new lock.
>
> [ 106.921803]
> the existing dependency chain (in reverse order) is:
> [ 106.921814]
> -> #1 (fs_reclaim){+.+.}-{0:0}:
> [ 106.921824]        fs_reclaim_acquire+0x9d/0xd0
> [ 106.921833]        __kmalloc_cache_node_noprof+0x5d/0x3f0
> [ 106.921842]        blk_mq_init_tags+0x3d/0xb0
> [ 106.921851]        blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> [ 106.921860]        blk_mq_init_sched+0x100/0x260
> [ 106.921868]        elevator_switch+0x8d/0x2e0
> [ 106.921877]        elv_iosched_store+0x174/0x1e0
> [ 106.921885]        queue_attr_store+0x142/0x180
> [ 106.921893]        kernfs_fop_write_iter+0x168/0x240
> [ 106.921902]        vfs_write+0x2b2/0x540
> [ 106.921910]        ksys_write+0x72/0xf0
> [ 106.921916]        do_syscall_64+0x95/0x180
> [ 106.921925]        entry_SYSCALL_64_after_hwframe+0x76/0x7e

That is another regression from commit

    af2814149883 ("block: freeze the queue in queue_attr_store")

and queue_wb_lat_store() has the same risk too.

I will cook a patch to fix it.

Thanks,
Ming
* Re: Blockdev 6.13-rc lockdep splat regressions 2025-01-12 15:50 ` Ming Lei @ 2025-01-12 17:44 ` Thomas Hellström 2025-01-13 0:55 ` Ming Lei 0 siblings, 1 reply; 14+ messages in thread From: Thomas Hellström @ 2025-01-12 17:44 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote: > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote: > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote: > > ... > > > > > Ah, You're right, it's a different warning this time. Posted the > > warning below. (Note: This is also with Christoph's series applied > > on > > top). > > > > May I also humbly suggest the following lockdep priming to be able > > to > > catch the reclaim lockdep splats early without reclaim needing to > > happen. That will also pick up splat #2 below. > > > > 8<------------------------------------------------------------- > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > index 32fb28a6372c..2dd8dc9aed7f 100644 > > --- a/block/blk-core.c > > +++ b/block/blk-core.c > > @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct > > queue_limits *lim, int node_id) > > > > q->nr_requests = BLKDEV_DEFAULT_RQ; > > > > + fs_reclaim_acquire(GFP_KERNEL); > > + rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_); > > + rwsem_release(&q->io_lockdep_map, _RET_IP_); > > + fs_reclaim_release(GFP_KERNEL); > > + > > return q; > > Looks one nice idea for injecting fs_reclaim, maybe it can be > added to inject framework? For the intel gpu drivers, we typically always prime lockdep like this if we *know* that the lock will be grabbed during reclaim, like if it's part of shrinker processing or similar. So sooner or later we *know* this sequence will happen so we add it near the lock initialization to always be executed when the lock(map) is initialized. So I don't really see a need for them to be periodially injected? 
> > > > > fail_stats: > > > > 8<------------------------------------------------------------- > > > > #1: > > 106.921533] > > ====================================================== > > [ 106.921716] WARNING: possible circular locking dependency > > detected > > [ 106.921725] 6.13.0-rc6+ #121 Tainted: G U > > [ 106.921734] ---------------------------------------------------- > > -- > > [ 106.921743] kswapd0/117 is trying to acquire lock: > > [ 106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}- > > {0:0}, > > at: __submit_bio+0x80/0x220 > > [ 106.921769] > > but task is already holding lock: > > [ 106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: > > balance_pgdat+0xe2/0xa10 > > [ 106.921791] > > which lock already depends on the new lock. > > > > [ 106.921803] > > the existing dependency chain (in reverse order) is: > > [ 106.921814] > > -> #1 (fs_reclaim){+.+.}-{0:0}: > > [ 106.921824] fs_reclaim_acquire+0x9d/0xd0 > > [ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0 > > [ 106.921842] blk_mq_init_tags+0x3d/0xb0 > > [ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0 > > [ 106.921860] blk_mq_init_sched+0x100/0x260 > > [ 106.921868] elevator_switch+0x8d/0x2e0 > > [ 106.921877] elv_iosched_store+0x174/0x1e0 > > [ 106.921885] queue_attr_store+0x142/0x180 > > [ 106.921893] kernfs_fop_write_iter+0x168/0x240 > > [ 106.921902] vfs_write+0x2b2/0x540 > > [ 106.921910] ksys_write+0x72/0xf0 > > [ 106.921916] do_syscall_64+0x95/0x180 > > [ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > That is another regression from commit > > af2814149883 block: freeze the queue in queue_attr_store > > and queue_wb_lat_store() has same risk too. > > I will cook a patch to fix it. Thanks. Are these splats going to be silenced for 6.13-rc? Like having the new lockdep checks under a special config until they are fixed? Thanks, Thomas > > Thanks, > Ming > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions 2025-01-12 17:44 ` Thomas Hellström @ 2025-01-13 0:55 ` Ming Lei 2025-01-13 8:48 ` Thomas Hellström 0 siblings, 1 reply; 14+ messages in thread From: Ming Lei @ 2025-01-13 0:55 UTC (permalink / raw) To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block On Sun, Jan 12, 2025 at 06:44:53PM +0100, Thomas Hellström wrote: > On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote: > > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote: > > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote: > > > > ... > > > > > > > > Ah, You're right, it's a different warning this time. Posted the > > > warning below. (Note: This is also with Christoph's series applied > > > on > > > top). > > > > > > May I also humbly suggest the following lockdep priming to be able > > > to > > > catch the reclaim lockdep splats early without reclaim needing to > > > happen. That will also pick up splat #2 below. > > > > > > 8<------------------------------------------------------------- > > > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > > index 32fb28a6372c..2dd8dc9aed7f 100644 > > > --- a/block/blk-core.c > > > +++ b/block/blk-core.c > > > @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct > > > queue_limits *lim, int node_id) > > > > > > q->nr_requests = BLKDEV_DEFAULT_RQ; > > > > > > + fs_reclaim_acquire(GFP_KERNEL); > > > + rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_); > > > + rwsem_release(&q->io_lockdep_map, _RET_IP_); > > > + fs_reclaim_release(GFP_KERNEL); > > > + > > > return q; > > > > Looks one nice idea for injecting fs_reclaim, maybe it can be > > added to inject framework? > > For the intel gpu drivers, we typically always prime lockdep like this > if we *know* that the lock will be grabbed during reclaim, like if it's > part of shrinker processing or similar. 
> > So sooner or later we *know* this sequence will happen so we add it > near the lock initialization to always be executed when the lock(map) > is initialized. > > So I don't really see a need for them to be periodially injected? What I suggested is to add the verification for every allocation with direct reclaim by one kernel config which depends on both lockdep and fault inject. > > > > > > > > > fail_stats: > > > > > > 8<------------------------------------------------------------- > > > > > > #1: > > > 106.921533] > > > ====================================================== > > > [ 106.921716] WARNING: possible circular locking dependency > > > detected > > > [ 106.921725] 6.13.0-rc6+ #121 Tainted: G U > > > [ 106.921734] ---------------------------------------------------- > > > -- > > > [ 106.921743] kswapd0/117 is trying to acquire lock: > > > [ 106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}- > > > {0:0}, > > > at: __submit_bio+0x80/0x220 > > > [ 106.921769] > > > but task is already holding lock: > > > [ 106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: > > > balance_pgdat+0xe2/0xa10 > > > [ 106.921791] > > > which lock already depends on the new lock. 
> > > > > > [ 106.921803] > > > the existing dependency chain (in reverse order) is: > > > [ 106.921814] > > > -> #1 (fs_reclaim){+.+.}-{0:0}: > > > [ 106.921824] fs_reclaim_acquire+0x9d/0xd0 > > > [ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0 > > > [ 106.921842] blk_mq_init_tags+0x3d/0xb0 > > > [ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0 > > > [ 106.921860] blk_mq_init_sched+0x100/0x260 > > > [ 106.921868] elevator_switch+0x8d/0x2e0 > > > [ 106.921877] elv_iosched_store+0x174/0x1e0 > > > [ 106.921885] queue_attr_store+0x142/0x180 > > > [ 106.921893] kernfs_fop_write_iter+0x168/0x240 > > > [ 106.921902] vfs_write+0x2b2/0x540 > > > [ 106.921910] ksys_write+0x72/0xf0 > > > [ 106.921916] do_syscall_64+0x95/0x180 > > > [ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > That is another regression from commit > > > > af2814149883 block: freeze the queue in queue_attr_store > > > > and queue_wb_lat_store() has same risk too. > > > > I will cook a patch to fix it. > > Thanks. Are these splats going to be silenced for 6.13-rc? Like having > the new lockdep checks under a special config until they are fixed? It is too late for v6.13, and Christoph's fix won't be available for v6.13 too. Thanks, Ming ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions 2025-01-13 0:55 ` Ming Lei @ 2025-01-13 8:48 ` Thomas Hellström 0 siblings, 0 replies; 14+ messages in thread From: Thomas Hellström @ 2025-01-13 8:48 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block Hi. On Mon, 2025-01-13 at 08:55 +0800, Ming Lei wrote: > On Sun, Jan 12, 2025 at 06:44:53PM +0100, Thomas Hellström wrote: > > On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote: > > > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote: > > > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote: > > > > > > ... > > > > > > > > > > > Ah, You're right, it's a different warning this time. Posted > > > > the > > > > warning below. (Note: This is also with Christoph's series > > > > applied > > > > on > > > > top). > > > > > > > > May I also humbly suggest the following lockdep priming to be > > > > able > > > > to > > > > catch the reclaim lockdep splats early without reclaim needing > > > > to > > > > happen. That will also pick up splat #2 below. > > > > > > > > 8<------------------------------------------------------------- > > > > > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > > > index 32fb28a6372c..2dd8dc9aed7f 100644 > > > > --- a/block/blk-core.c > > > > +++ b/block/blk-core.c > > > > @@ -458,6 +458,11 @@ struct request_queue > > > > *blk_alloc_queue(struct > > > > queue_limits *lim, int node_id) > > > > > > > > q->nr_requests = BLKDEV_DEFAULT_RQ; > > > > > > > > + fs_reclaim_acquire(GFP_KERNEL); > > > > + rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_); > > > > + rwsem_release(&q->io_lockdep_map, _RET_IP_); > > > > + fs_reclaim_release(GFP_KERNEL); > > > > + > > > > return q; > > > > > > Looks one nice idea for injecting fs_reclaim, maybe it can be > > > added to inject framework? 
> > > > For the intel gpu drivers, we typically always prime lockdep like > > this > > if we *know* that the lock will be grabbed during reclaim, like if > > it's > > part of shrinker processing or similar. > > > > So sooner or later we *know* this sequence will happen so we add it > > near the lock initialization to always be executed when the > > lock(map) > > is initialized. > > > > So I don't really see a need for them to be periodially injected? > > What I suggested is to add the verification for every allocation with > direct reclaim by one kernel config which depends on both lockdep and > fault inject. > > > > > > > > > > > > > > fail_stats: > > > > > > > > 8<------------------------------------------------------------- > > > > > > > > #1: > > > > 106.921533] > > > > ====================================================== > > > > [ 106.921716] WARNING: possible circular locking dependency > > > > detected > > > > [ 106.921725] 6.13.0-rc6+ #121 Tainted: G U > > > > [ 106.921734] ------------------------------------------------ > > > > ---- > > > > -- > > > > [ 106.921743] kswapd0/117 is trying to acquire lock: > > > > [ 106.921751] ffff8ff4e2da09f0 (&q- > > > > >q_usage_counter(io)){++++}- > > > > {0:0}, > > > > at: __submit_bio+0x80/0x220 > > > > [ 106.921769] > > > > but task is already holding lock: > > > > [ 106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: > > > > balance_pgdat+0xe2/0xa10 > > > > [ 106.921791] > > > > which lock already depends on the new lock. 
> > > > > > > > [ 106.921803] > > > > the existing dependency chain (in reverse order) > > > > is: > > > > [ 106.921814] > > > > -> #1 (fs_reclaim){+.+.}-{0:0}: > > > > [ 106.921824] fs_reclaim_acquire+0x9d/0xd0 > > > > [ 106.921833] __kmalloc_cache_node_noprof+0x5d/0x3f0 > > > > [ 106.921842] blk_mq_init_tags+0x3d/0xb0 > > > > [ 106.921851] blk_mq_alloc_map_and_rqs+0x4e/0x3d0 > > > > [ 106.921860] blk_mq_init_sched+0x100/0x260 > > > > [ 106.921868] elevator_switch+0x8d/0x2e0 > > > > [ 106.921877] elv_iosched_store+0x174/0x1e0 > > > > [ 106.921885] queue_attr_store+0x142/0x180 > > > > [ 106.921893] kernfs_fop_write_iter+0x168/0x240 > > > > [ 106.921902] vfs_write+0x2b2/0x540 > > > > [ 106.921910] ksys_write+0x72/0xf0 > > > > [ 106.921916] do_syscall_64+0x95/0x180 > > > > [ 106.921925] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > > > That is another regression from commit > > > > > > af2814149883 block: freeze the queue in queue_attr_store > > > > > > and queue_wb_lat_store() has same risk too. > > > > > > I will cook a patch to fix it. > > > > Thanks. Are these splats going to be silenced for 6.13-rc? Like > > having > > the new lockdep checks under a special config until they are fixed? > > It is too late for v6.13, and Christoph's fix won't be available for > v6.13 > too. Yeah, I was thinking more of the lockdep warnings themselves, rather than the actual deadlock fixing? Thanks, Thomas > > > Thanks, > Ming > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-12 11:33 ` Thomas Hellström
  2025-01-12 15:50 ` Ming Lei
@ 2025-01-13  9:28 ` Ming Lei
  2025-01-13  9:58 ` Thomas Hellström
  1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-13 9:28 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > > Ming, Others
>
> #2:
> [ 5.595482] ======================================================
> [ 5.596353] WARNING: possible circular locking dependency detected
> [ 5.597231] 6.13.0-rc6+ #122 Tainted: G U
> [ 5.598182] ------------------------------------------------------
> [ 5.599149] (udev-worker)/867 is trying to acquire lock:
> [ 5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
> [ 5.600987]
> but task is already holding lock:
> [ 5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
> [ 5.603033]
> which lock already depends on the new lock.
>
> [ 5.603034]
> the existing dependency chain (in reverse order) is:
> [ 5.603035]
> -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> [ 5.603038]        blk_alloc_queue+0x319/0x350
> [ 5.603041]        blk_mq_alloc_queue+0x63/0xd0

The above one is solved in for-6.14/block of the block tree:

    block: track queue dying state automatically for modeling queue freeze lockdep

q->q_usage_counter(io) is killed because the disk isn't up yet.

If you apply the noio patch against for-6.14/block, the two splats
should have disappeared. If not, please post the lockdep log.

Thanks,
Ming
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-13  9:28 ` Ming Lei
@ 2025-01-13  9:58 ` Thomas Hellström
  2025-01-13 10:40 ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2025-01-13 9:58 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

Hi,

On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > > > Ming, Others
> >
> > #2:
> > [ 5.595482] ======================================================
> > [ 5.596353] WARNING: possible circular locking dependency detected
> > [ 5.597231] 6.13.0-rc6+ #122 Tainted: G U
> > [ 5.598182] ------------------------------------------------------
> > [ 5.599149] (udev-worker)/867 is trying to acquire lock:
> > [ 5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
> > [ 5.600987]
> > but task is already holding lock:
> > [ 5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > [ 5.603033]
> > which lock already depends on the new lock.
> >
> > [ 5.603034]
> > the existing dependency chain (in reverse order) is:
> > [ 5.603035]
> > -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > [ 5.603038]        blk_alloc_queue+0x319/0x350
> > [ 5.603041]        blk_mq_alloc_queue+0x63/0xd0
>
> The above one is solved in for-6.14/block of the block tree:
>
>     block: track queue dying state automatically for modeling queue freeze lockdep
>
> q->q_usage_counter(io) is killed because the disk isn't up yet.
>
> If you apply the noio patch against for-6.14/block, the two splats
> should have disappeared. If not, please post lockdep log.
The dependency path above is the lockdep priming I suggested, which establishes the reclaim -> q->q_usage_counter(io) locking order. A splat without that priming would look slightly different and won't occur until memory is actually exhausted. But it *will* occur.

That's why I suggested using the priming to catch all fs_reclaim -> q_usage_counter(io) violations early, perhaps already at system boot: anyone accidentally adding a GFP_KERNEL memory allocation under the q_usage_counter(io) lock would then get a notification as soon as that allocation happens.

The actual deadlock sequence arises because kernfs_rwsem is taken under q_usage_counter(io) (excerpt from the report in [a]). If the priming is removed, the splat doesn't happen until reclaim, and will instead look like [b].

Thanks,
Thomas

[a]
[ 5.603115] Chain exists of: &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#3
[ 5.603117] Possible unsafe locking scenario:
[ 5.603117]        CPU0                    CPU1
[ 5.603117]        ----                    ----
[ 5.603118]   lock(&q->q_usage_counter(io)#3);
[ 5.603119]                                lock(fs_reclaim);
[ 5.603119]                                lock(&q->q_usage_counter(io)#3);
[ 5.603120]   lock(&root->kernfs_rwsem);
[ 5.603121] *** DEADLOCK ***
[ 5.603121] 6 locks held by (udev-worker)/867:
[ 5.603122] #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[ 5.603125] #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x121/0x240
[ 5.603128] #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x240
[ 5.603131] #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at: queue_attr_store+0x12b/0x180
[ 5.603133] #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603136] #5: ffff9211e86f41d8 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603139] stack backtrace:
[ 5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G U 6.13.0-rc6+ #122
[ 5.603142] Tainted: [U]=USER
[ 5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[ 5.603143] Call Trace:
[ 5.603144] <TASK>
[ 5.603146] dump_stack_lvl+0x6e/0xa0
[ 5.603148] print_circular_bug.cold+0x178/0x1be
[ 5.603151] check_noncircular+0x148/0x160
[ 5.603154] __lock_acquire+0x1339/0x2180
[ 5.603156] lock_acquire+0xd0/0x2e0
[ 5.603158] ? kernfs_remove+0x31/0x50
[ 5.603160] ? sysfs_remove_dir+0x32/0x60
[ 5.603162] ? lock_release+0xd2/0x2a0
[ 5.603164] down_write+0x2e/0xb0
[ 5.603165] ? kernfs_remove+0x31/0x50
[ 5.603166] kernfs_remove+0x31/0x50
[ 5.

[b]
[157.543591] ======================================================
[ 157.543778] WARNING: possible circular locking dependency detected
[ 157.543787] 6.13.0-rc6+ #123 Tainted: G U
[ 157.543796] ------------------------------------------------------
[ 157.543805] git/2856 is trying to acquire lock:
[ 157.543812] ffff98b6bb882f10 (&q->q_usage_counter(io)#2){++++}-{0:0}, at: __submit_bio+0x80/0x220
[ 157.543830]
but task is already holding lock:
[ 157.543839] ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x348/0xea0
[ 157.543855]
which lock already depends on the new lock.

[ 157.543867]
the existing dependency chain (in reverse order) is:
[ 157.543878]
-> #2 (fs_reclaim){+.+.}-{0:0}:
[ 157.543888]        fs_reclaim_acquire+0x9d/0xd0
[ 157.543896]        kmem_cache_alloc_lru_noprof+0x57/0x3f0
[ 157.543906]        alloc_inode+0x97/0xc0
[ 157.543913]        iget_locked+0x141/0x310
[ 157.543921]        kernfs_get_inode+0x1a/0xf0
[ 157.543929]        kernfs_get_tree+0x17b/0x2c0
[ 157.543938]        sysfs_get_tree+0x1a/0x40
[ 157.543945]        vfs_get_tree+0x29/0xe0
[ 157.543953]        path_mount+0x49a/0xbd0
[ 157.543960]        __x64_sys_mount+0x119/0x150
[ 157.543968]        do_syscall_64+0x95/0x180
[ 157.543977]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.543986]
-> #1 (&root->kernfs_rwsem){++++}-{4:4}:
[ 157.543997]        down_write+0x2e/0xb0
[ 157.544004]        kernfs_remove+0x31/0x50
[ 157.544012]        __kobject_del+0x2e/0x90
[ 157.544020]        kobject_del+0x13/0x30
[ 157.544026]        elevator_switch+0x44/0x2e0
[ 157.544034]        elv_iosched_store+0x174/0x1e0
[ 157.544043]        queue_attr_store+0x165/0x1b0
[ 157.544050]        kernfs_fop_write_iter+0x168/0x240
[ 157.544059]        vfs_write+0x2b2/0x540
[ 157.544066]        ksys_write+0x72/0xf0
[ 157.544073]        do_syscall_64+0x95/0x180
[ 157.544081]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.544090]
-> #0 (&q->q_usage_counter(io)#2){++++}-{0:0}:
[ 157.544102]        __lock_acquire+0x1339/0x2180
[ 157.544110]        lock_acquire+0xd0/0x2e0
[ 157.544118]        blk_mq_submit_bio+0x88b/0xb60
[ 157.544127]        __submit_bio+0x80/0x220
[ 157.544135]        submit_bio_noacct_nocheck+0x324/0x420
[ 157.544144]        swap_writepage+0x399/0x580
[ 157.544152]        pageout+0x129/0x2d0
[ 157.544160]        shrink_folio_list+0x5a0/0xd80
[ 157.544168]        evict_folios+0x27d/0x7b0
[ 157.544175]        try_to_shrink_lruvec+0x21b/0x2b0
[ 157.544183]        shrink_one+0x102/0x1f0
[ 157.544191]        shrink_node+0xb8e/0x1300
[ 157.544198]        do_try_to_free_pages+0xb3/0x580
[ 157.544206]        try_to_free_pages+0xfa/0x2a0
[ 157.544214]        __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[ 157.544224]        __alloc_pages_noprof+0x34c/0x390
[ 157.544233]        alloc_pages_mpol_noprof+0xd7/0x1c0
[ 157.544241]        pipe_write+0x3fc/0x7f0
[ 157.544574]        vfs_write+0x401/0x540
[ 157.544917]        ksys_write+0xd1/0xf0
[ 157.545246]        do_syscall_64+0x95/0x180
[ 157.545576]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.545909]
other info that might help us debug this:
[ 157.546879] Chain exists of: &q->q_usage_counter(io)#2 --> &root->kernfs_rwsem --> fs_reclaim
[ 157.547849] Possible unsafe locking scenario:
[ 157.548483]        CPU0                    CPU1
[ 157.548795]        ----                    ----
[ 157.549098]   lock(fs_reclaim);
[ 157.549400]                                lock(&root->kernfs_rwsem);
[ 157.549705]                                lock(fs_reclaim);
[ 157.550011]   rlock(&q->q_usage_counter(io)#2);
[ 157.550316] *** DEADLOCK ***
[ 157.551194] 2 locks held by git/2856:
[ 157.551490] #0: ffff98b6a221e068 (&pipe->mutex){+.+.}-{4:4}, at: pipe_write+0x5a/0x7f0
[ 157.551798] #1: ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x348/0xea0
[ 157.552115] stack backtrace:
[ 157.552734] CPU: 5 UID: 1000 PID: 2856 Comm: git Tainted: G U 6.13.0-rc6+ #123
[ 157.553060] Tainted: [U]=USER
[ 157.553383] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[ 157.553718] Call Trace:
[ 157.554054] <TASK>
[ 157.554389] dump_stack_lvl+0x6e/0xa0
[ 157.554725] print_circular_bug.cold+0x178/0x1be
[ 157.555064] check_noncircular+0x148/0x160
[ 157.555408] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 157.555747] ? unwind_get_return_address+0x23/0x40
[ 157.556085] __lock_acquire+0x1339/0x2180
[ 157.556425] lock_acquire+0xd0/0x2e0
[ 157.556761] ? __submit_bio+0x80/0x220
[ 157.557110] ? blk_mq_submit_bio+0x860/0xb60
[ 157.557447] ? lock_release+0xd2/0x2a0
[ 157.557784] blk_mq_submit_bio+0x88b/0xb60
[ 157.558137] ? __submit_bio+0x80/0x220
[ 157.558476] __submit_bio+0x80/0x220
[ 157.558828] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 157.559166] ? submit_bio_noacct_nocheck+0x324/0x420
[ 157.559504] submit_bio_noacct_nocheck+0x324/0x420
[ 157.559863] swap_writepage+0x399/0x580
[ 157.560205] pageout+0x129/0x2d0
[ 157.560542] shrink_folio_list+0x5a0/0xd80
[ 157.560879] ? evict_folios+0x25d/0x7b0
[ 157.561212] evict_folios+0x27d/0x7b0
[ 157.561546] try_to_shrink_lruvec+0x21b/0x2b0
[ 157.561890] shrink_one+0x102/0x1f0
[ 157.562222] shrink_node+0xb8e/0x1300
[ 157.562554] ? shrink_node+0x9c1/0x1300
[ 157.562915] ? shrink_node+0xb64/0x1300
[ 157.563245] ? do_try_to_free_pages+0xb3/0x580
[ 157.563576] do_try_to_free_pages+0xb3/0x580
[ 157.563922] ? lock_release+0xd2/0x2a0
[ 157.564252] try_to_free_pages+0xfa/0x2a0
[ 157.564583] __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[ 157.564946] ? lock_release+0xd2/0x2a0
[ 157.565279] __alloc_pages_noprof+0x34c/0x390
[ 157.565613] alloc_pages_mpol_noprof+0xd7/0x1c0
[ 157.565952] pipe_write+0x3fc/0x7f0
[ 157.566283] vfs_write+0x401/0x540
[ 157.566615] ksys_write+0xd1/0xf0
[ 157.566980] do_syscall_64+0x95/0x180
[ 157.567312] ? vfs_write+0x401/0x540
[ 157.567642] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 157.568001] ? syscall_exit_to_user_mode+0x97/0x290
[ 157.568331] ? do_syscall_64+0xa1/0x180
[ 157.568658] ? do_syscall_64+0xa1/0x180
[ 157.569012] ? syscall_exit_to_user_mode+0x97/0x290
[ 157.569337] ? do_syscall_64+0xa1/0x180
[ 157.569658] ? do_user_addr_fault+0x397/0x720
[ 157.569980] ? trace_hardirqs_off+0x4b/0xc0
[ 157.570300] ? clear_bhb_loop+0x45/0xa0
[ 157.570621] ? clear_bhb_loop+0x45/0xa0
[ 157.570968] ? clear_bhb_loop+0x45/0xa0
[ 157.571286] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.571605] RIP: 0033:0x7fdf1ec2d484
[ 157.571966] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[ 157.572322] RSP: 002b:00007ffd0eb6d068 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 157.572692] RAX: ffffffffffffffda RBX: 0000000000000331 RCX: 00007fdf1ec2d484
[ 157.573093] RDX: 0000000000000331 RSI: 000055693fe2d660 RDI: 0000000000000001
[ 157.573470] RBP: 00007ffd0eb6d090 R08: 000055693fdc6010 R09: 0000000000000007
[ 157.573875] R10: 0000556941b97c70 R11: 0000000000000202 R12: 0000000000000331
[ 157.574249] R13: 000055693fe2d660 R14: 00007fdf1ed305c0 R15: 00007fdf1ed2de80
[ 157.574621] </TASK>

>
> Thanks,
> Ming
* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-13  9:58               ` Thomas Hellström
@ 2025-01-13 10:40                 ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2025-01-13 10:40 UTC (permalink / raw)
To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Mon, Jan 13, 2025 at 10:58:07AM +0100, Thomas Hellström wrote:
> Hi,
>
> On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > > > > Ming, Others
> > >
> > > #2:
> > > [    5.595482] ======================================================
> > > [    5.596353] WARNING: possible circular locking dependency detected
> > > [    5.597231] 6.13.0-rc6+ #122 Tainted: G     U
> > > [    5.598182] ------------------------------------------------------
> > > [    5.599149] (udev-worker)/867 is trying to acquire lock:
> > > [    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
> > > [    5.600987]
> > > but task is already holding lock:
> > > [    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > > [    5.603033]
> > > which lock already depends on the new lock.
> > > [    5.603034]
> > > the existing dependency chain (in reverse order) is:
> > > [    5.603035]
> > > -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > > [    5.603038]        blk_alloc_queue+0x319/0x350
> > > [    5.603041]        blk_mq_alloc_queue+0x63/0xd0
> >
> > The above one is solved in for-6.14/block of block tree:
> >
> > 	block: track queue dying state automatically for modeling
> > 	queue freeze lockdep
> >
> > q->q_usage_counter(io) is killed because disk isn't up yet.
> >
> > If you apply the noio patch against for-6.14/block, the two splats
> > should have disappeared. If not, please post the lockdep log.
>
> The above dependency path is the lockdep priming I suggested, which
> establishes the reclaim -> q->q_usage_counter(io) locking order.
> A splat without that priming would look slightly different and won't
> occur until memory is actually exhausted. But it *will* occur.
>
> That's why I suggested using the priming to catch all
> fs_reclaim -> q_usage_counter(io) violations early, perhaps already at
> system boot, so that anybody accidentally adding a GFP_KERNEL memory
> allocation under the q_usage_counter(io) lock would get a notification
> as soon as that allocation happens.
>
> The actual deadlock sequence is because kernfs_rwsem is taken under
> q_usage_counter(io) (excerpt from the report [a]).
> If the priming is removed, the splat doesn't happen until reclaim, and
> will instead look like [b].

Got it. [b] is a new warning between writing to
/sys/block/$DEV/queue/scheduler and fs reclaim from sysfs inode
allocation. Three global or sub-system locks are involved:

- fs_reclaim
- root->kernfs_rwsem
- q->q_usage_counter(io)

The problem has existed since the blk-mq I/O scheduler was introduced,
and it looks like a hard one because it is now difficult to avoid the
dependency between these locks.

I will think about it and see if we can figure out a solution.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-01-13 10:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
2025-01-10 10:14 ` Christoph Hellwig
2025-01-10 10:21   ` Thomas Hellström
2025-01-10 12:13 ` Ming Lei
2025-01-10 14:36   ` Thomas Hellström
2025-01-11  3:05     ` Ming Lei
2025-01-12 11:33       ` Thomas Hellström
2025-01-12 15:50         ` Ming Lei
2025-01-12 17:44           ` Thomas Hellström
2025-01-13  0:55             ` Ming Lei
2025-01-13  8:48               ` Thomas Hellström
2025-01-13  9:28                 ` Ming Lei
2025-01-13  9:58                   ` Thomas Hellström
2025-01-13 10:40                     ` Ming Lei