linux-block.vger.kernel.org archive mirror
* Blockdev 6.13-rc lockdep splat regressions
@ 2025-01-10 10:12 Thomas Hellström
  2025-01-10 10:14 ` Christoph Hellwig
  2025-01-10 12:13 ` Ming Lei
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 10:12 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: Christoph Hellwig, linux-block

Ming, Others

On 6.13-rc6 I'm seeing a couple of lockdep splats which appear to be
introduced by the commit

f1be1788a32e ("block: model freeze & enter queue as lock for supporting
lockdep")

The first one happens when swap-outs start to a SCSI disk. A simple
reproducer is to start a couple of parallel "gitk" instances on the
kernel repo and watch them exhaust available memory.

The second one is easily triggered by entering a debugfs trace
directory, apparently triggering an automount:
cd /sys/kernel/debug/tracing/events

Are you aware of these?
Thanks,
Thomas

#1
[  399.006581] ======================================================
[  399.006756] WARNING: possible circular locking dependency detected
[  399.006767] 6.12.0-rc4+ #1 Tainted: G     U           N
[  399.006776] ------------------------------------------------------
[  399.006801] kswapd0/116 is trying to acquire lock:
[  399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-{0:0},
at: __submit_bio+0xf0/0x1c0
[  399.006845] 
               but task is already holding lock:
[  399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa20
[  399.006874] 
               which lock already depends on the new lock.

[  399.006890] 
               the existing dependency chain (in reverse order) is:
[  399.006905] 
               -> #1 (fs_reclaim){+.+.}-{0:0}:
[  399.006919]        fs_reclaim_acquire+0x9d/0xd0
[  399.006931]        __kmalloc_node_noprof+0xa5/0x460
[  399.006943]        sbitmap_init_node+0x84/0x1e0
[  399.006953]        scsi_realloc_sdev_budget_map+0xc8/0x1a0
[  399.006965]        scsi_add_lun+0x419/0x710
[  399.006976]        scsi_probe_and_add_lun+0x12d/0x450
[  399.006988]        __scsi_scan_target+0x112/0x230
[  399.006999]        scsi_scan_channel+0x59/0x90
[  399.007009]        scsi_scan_host_selected+0xe5/0x120
[  399.007021]        do_scan_async+0x1b/0x160
[  399.007031]        async_run_entry_fn+0x31/0x130
[  399.007043]        process_one_work+0x21a/0x590
[  399.007054]        worker_thread+0x1c3/0x3b0
[  399.007065]        kthread+0xd2/0x100
[  399.007074]        ret_from_fork+0x31/0x50
[  399.007085]        ret_from_fork_asm+0x1a/0x30
[  399.007096] 
               -> #0 (&q->q_usage_counter(io)){++++}-{0:0}:
[  399.007111]        __lock_acquire+0x13ac/0x2170
[  399.007123]        lock_acquire+0xd0/0x2f0
[  399.007134]        blk_mq_submit_bio+0x90b/0xb00
[  399.007145]        __submit_bio+0xf0/0x1c0
[  399.007155]        submit_bio_noacct_nocheck+0x324/0x420
[  399.007167]        swap_writepage+0x14a/0x2c0
[  399.007178]        pageout+0x129/0x2d0
[  399.007608]        shrink_folio_list+0x5a0/0xd80
[  399.008045]        evict_folios+0x27a/0x790
[  399.008486]        try_to_shrink_lruvec+0x225/0x2b0
[  399.008926]        shrink_one+0x102/0x1f0
[  399.009360]        shrink_node+0xa7c/0x1130
[  399.009821]        balance_pgdat+0x560/0xa20
[  399.010254]        kswapd+0x20a/0x440
[  399.010698]        kthread+0xd2/0x100
[  399.011141]        ret_from_fork+0x31/0x50
[  399.011584]        ret_from_fork_asm+0x1a/0x30
[  399.012024] 
               other info that might help us debug this:

[  399.013283]  Possible unsafe locking scenario:

[  399.014160]        CPU0                    CPU1
[  399.014584]        ----                    ----
[  399.015010]   lock(fs_reclaim);
[  399.015439]                                lock(&q-
>q_usage_counter(io));
[  399.015867]                                lock(fs_reclaim);
[  399.016208]   rlock(&q->q_usage_counter(io));
[  399.016538] 
                *** DEADLOCK ***

[  399.017539] 1 lock held by kswapd0/116:
[  399.017887]  #0: ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa20
[  399.018218] 
               stack backtrace:
[  399.018887] CPU: 11 UID: 0 PID: 116 Comm: kswapd0 Tainted: G     U 
N 6.12.0-rc4+ #1
[  399.019217] Tainted: [U]=USER, [N]=TEST
[  399.019543] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[  399.019911] Call Trace:
[  399.020235]  <TASK>
[  399.020556]  dump_stack_lvl+0x6e/0xa0
[  399.020890]  print_circular_bug.cold+0x178/0x1be
[  399.021207]  check_noncircular+0x148/0x160
[  399.021523]  __lock_acquire+0x13ac/0x2170
[  399.021852]  lock_acquire+0xd0/0x2f0
[  399.022167]  ? __submit_bio+0xf0/0x1c0
[  399.022489]  ? blk_mq_submit_bio+0x8e0/0xb00
[  399.022830]  ? lock_release+0xd3/0x2b0
[  399.023143]  blk_mq_submit_bio+0x90b/0xb00
[  399.023460]  ? __submit_bio+0xf0/0x1c0
[  399.023785]  ? lock_acquire+0xd0/0x2f0
[  399.024096]  __submit_bio+0xf0/0x1c0
[  399.024404]  submit_bio_noacct_nocheck+0x324/0x420
[  399.024713]  swap_writepage+0x14a/0x2c0
[  399.025037]  pageout+0x129/0x2d0
[  399.025348]  shrink_folio_list+0x5a0/0xd80
[  399.025668]  ? evict_folios+0x25a/0x790
[  399.026007]  evict_folios+0x27a/0x790
[  399.026325]  try_to_shrink_lruvec+0x225/0x2b0
[  399.026635]  shrink_one+0x102/0x1f0
[  399.026957]  ? shrink_node+0xa63/0x1130
[  399.027264]  shrink_node+0xa7c/0x1130
[  399.027570]  ? shrink_node+0x908/0x1130
[  399.027888]  balance_pgdat+0x560/0xa20
[  399.028197]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  399.028510]  ? finish_task_switch.isra.0+0xc4/0x2a0
[  399.028829]  kswapd+0x20a/0x440
[  399.029136]  ? __pfx_autoremove_wake_function+0x10/0x10
[  399.029483]  ? __pfx_kswapd+0x10/0x10
[  399.029912]  kthread+0xd2/0x100
[  399.030306]  ? __pfx_kthread+0x10/0x10
[  399.030699]  ret_from_fork+0x31/0x50
[  399.031105]  ? __pfx_kthread+0x10/0x10
[  399.031520]  ret_from_fork_asm+0x1a/0x30
[  399.031934]  </TASK>

#2:
[   81.960829] ======================================================
[   81.961010] WARNING: possible circular locking dependency detected
[   81.961048] 6.12.0-rc4+ #3 Tainted: G     U            
[   81.961082] ------------------------------------------------------
[   81.961117] bash/2744 is trying to acquire lock:
[   81.961147] ffffffff8b6754d0 (namespace_sem){++++}-{4:4}, at:
finish_automount+0x77/0x3a0
[   81.961215] 
               but task is already holding lock:
[   81.961249] ffff8d7a8051ce50 (&sb->s_type->i_mutex_key#3){++++}-
{4:4}, at: finish_automount+0x6b/0x3a0
[   81.961316] 
               which lock already depends on the new lock.

[   81.961361] 
               the existing dependency chain (in reverse order) is:
[   81.961403] 
               -> #6 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[   81.961452]        down_write+0x2e/0xb0
[   81.961486]        start_creating.part.0+0x5f/0x120
[   81.961523]        debugfs_create_dir+0x36/0x190
[   81.961557]        blk_register_queue+0xba/0x220
[   81.961594]        add_disk_fwnode+0x235/0x430
[   81.961626]        nvme_alloc_ns+0x7eb/0xb50 [nvme_core]
[   81.961696]        nvme_scan_ns+0x251/0x330 [nvme_core]
[   81.962053]        async_run_entry_fn+0x31/0x130
[   81.962088]        process_one_work+0x21a/0x590
[   81.962122]        worker_thread+0x1c3/0x3b0
[   81.962153]        kthread+0xd2/0x100
[   81.962179]        ret_from_fork+0x31/0x50
[   81.962211]        ret_from_fork_asm+0x1a/0x30
[   81.962243] 
               -> #5 (&q->debugfs_mutex){+.+.}-{4:4}:
[   81.962287]        __mutex_lock+0xad/0xb80
[   81.962319]        blk_mq_init_sched+0x181/0x260
[   81.962350]        elevator_init_mq+0xb0/0x100
[   81.962381]        add_disk_fwnode+0x50/0x430
[   81.962412]        sd_probe+0x335/0x530
[   81.962441]        really_probe+0xdb/0x340
[   81.962474]        __driver_probe_device+0x78/0x110
[   81.962510]        driver_probe_device+0x1f/0xa0
[   81.962545]        __device_attach_driver+0x89/0x110
[   81.962581]        bus_for_each_drv+0x98/0xf0
[   81.962613]        __device_attach_async_helper+0xa5/0xf0
[   81.962651]        async_run_entry_fn+0x31/0x130
[   81.962686]        process_one_work+0x21a/0x590
[   81.962718]        worker_thread+0x1c3/0x3b0
[   81.962750]        kthread+0xd2/0x100
[   81.962775]        ret_from_fork+0x31/0x50
[   81.962806]        ret_from_fork_asm+0x1a/0x30
[   81.962838] 
               -> #4 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
[   81.962887]        blk_queue_enter+0x1bc/0x1e0
[   81.962918]        blk_mq_alloc_request+0x144/0x2b0
[   81.962951]        scsi_execute_cmd+0x78/0x490
[   81.962985]        read_capacity_16+0x111/0x410
[   81.963017]        sd_revalidate_disk.isra.0+0x545/0x2eb0
[   81.963053]        sd_probe+0x2eb/0x530
[   81.963081]        really_probe+0xdb/0x340
[   81.963112]        __driver_probe_device+0x78/0x110
[   81.963148]        driver_probe_device+0x1f/0xa0
[   81.963182]        __device_attach_driver+0x89/0x110
[   81.963218]        bus_for_each_drv+0x98/0xf0
[   81.963250]        __device_attach_async_helper+0xa5/0xf0
[   81.964380]        async_run_entry_fn+0x31/0x130
[   81.965502]        process_one_work+0x21a/0x590
[   81.965868]        worker_thread+0x1c3/0x3b0
[   81.966198]        kthread+0xd2/0x100
[   81.966528]        ret_from_fork+0x31/0x50
[   81.966855]        ret_from_fork_asm+0x1a/0x30
[   81.967179] 
               -> #3 (&q->limits_lock){+.+.}-{4:4}:
[   81.967815]        __mutex_lock+0xad/0xb80
[   81.968133]        nvme_update_ns_info_block+0x128/0x870 [nvme_core]
[   81.968456]        nvme_update_ns_info+0x41/0x220 [nvme_core]
[   81.968774]        nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
[   81.969090]        nvme_scan_ns+0x251/0x330 [nvme_core]
[   81.969401]        async_run_entry_fn+0x31/0x130
[   81.969703]        process_one_work+0x21a/0x590
[   81.970004]        worker_thread+0x1c3/0x3b0
[   81.970302]        kthread+0xd2/0x100
[   81.970603]        ret_from_fork+0x31/0x50
[   81.970901]        ret_from_fork_asm+0x1a/0x30
[   81.971195] 
               -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
[   81.971776]        blk_mq_submit_bio+0x90b/0xb00
[   81.972071]        __submit_bio+0xf0/0x1c0
[   81.972364]        submit_bio_noacct_nocheck+0x324/0x420
[   81.972660]        btrfs_submit_chunk+0x1a7/0x660
[   81.972956]        btrfs_submit_bbio+0x1a/0x30
[   81.973250]        read_extent_buffer_pages+0x15e/0x210
[   81.973546]        btrfs_read_extent_buffer+0x93/0x180
[   81.973841]        read_tree_block+0x30/0x60
[   81.974137]        read_block_for_search+0x219/0x320
[   81.974432]        btrfs_search_slot+0x335/0xd30
[   81.974729]        btrfs_init_root_free_objectid+0x90/0x130
[   81.975027]        open_ctree+0xa35/0x13eb
[   81.975326]        btrfs_get_tree.cold+0x6b/0xfd
[   81.975627]        vfs_get_tree+0x29/0xe0
[   81.975926]        fc_mount+0x12/0x40
[   81.976223]        btrfs_get_tree+0x2c1/0x6b0
[   81.976520]        vfs_get_tree+0x29/0xe0
[   81.976816]        vfs_cmd_create+0x59/0xe0
[   81.977112]        __do_sys_fsconfig+0x4f3/0x6c0
[   81.977408]        do_syscall_64+0x95/0x180
[   81.977705]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   81.978005] 
               -> #1 (btrfs-root-01){++++}-{4:4}:
[   81.978595]        down_read_nested+0x34/0x150
[   81.978895]        btrfs_tree_read_lock_nested+0x25/0xd0
[   81.979195]        btrfs_read_lock_root_node+0x44/0xe0
[   81.979494]        btrfs_search_slot+0x143/0xd30
[   81.979793]        btrfs_search_backwards+0x2e/0x90
[   81.980108]        btrfs_get_subvol_name_from_objectid+0xd8/0x3c0
[   81.980409]        btrfs_show_options+0x294/0x780
[   81.980718]        show_mountinfo+0x207/0x2a0
[   81.981025]        seq_read_iter+0x2bc/0x480
[   81.981327]        vfs_read+0x294/0x370
[   81.981628]        ksys_read+0x73/0xf0
[   81.981925]        do_syscall_64+0x95/0x180
[   81.982222]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   81.982523] 
               -> #0 (namespace_sem){++++}-{4:4}:
[   81.983116]        __lock_acquire+0x13ac/0x2170
[   81.983415]        lock_acquire+0xd0/0x2f0
[   81.983714]        down_write+0x2e/0xb0
[   81.984014]        finish_automount+0x77/0x3a0
[   81.984313]        __traverse_mounts+0x9d/0x210
[   81.984612]        step_into+0x349/0x770
[   81.984909]        link_path_walk.part.0.constprop.0+0x21e/0x390
[   81.985211]        path_lookupat+0x3e/0x1a0
[   81.985511]        filename_lookup+0xde/0x1d0
[   81.985811]        vfs_statx+0x8d/0x100
[   81.986108]        vfs_fstatat+0x63/0xc0
[   81.986405]        __do_sys_newfstatat+0x3c/0x80
[   81.986704]        do_syscall_64+0x95/0x180
[   81.987005]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   81.987307] 
               other info that might help us debug this:

[   81.988204] Chain exists of:
                 namespace_sem --> &q->debugfs_mutex --> &sb->s_type-
>i_mutex_key#3

[   81.989113]  Possible unsafe locking scenario:

[   81.989723]        CPU0                    CPU1
[   81.990028]        ----                    ----
[   81.990331]   lock(&sb->s_type->i_mutex_key#3);
[   81.990637]                                lock(&q->debugfs_mutex);
[   81.990947]                                lock(&sb->s_type-
>i_mutex_key#3);
[   81.991257]   lock(namespace_sem);
[   81.991565] 
                *** DEADLOCK ***

[   81.992465] 1 lock held by bash/2744:
[   81.992766]  #0: ffff8d7a8051ce50 (&sb->s_type-
>i_mutex_key#3){++++}-{4:4}, at: finish_automount+0x6b/0x3a0
[   81.993084] 
               stack backtrace:
[   81.993704] CPU: 2 UID: 0 PID: 2744 Comm: bash Tainted: G     U    
6.12.0-rc4+ #3
[   81.994025] Tainted: [U]=USER
[   81.994343] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[   81.994673] Call Trace:
[   81.995002]  <TASK>
[   81.995328]  dump_stack_lvl+0x6e/0xa0
[   81.995657]  print_circular_bug.cold+0x178/0x1be
[   81.995987]  check_noncircular+0x148/0x160
[   81.996318]  __lock_acquire+0x13ac/0x2170
[   81.996649]  lock_acquire+0xd0/0x2f0
[   81.996978]  ? finish_automount+0x77/0x3a0
[   81.997328]  ? vfs_kern_mount.part.0+0x50/0xb0
[   81.997660]  ? kfree+0xd8/0x370
[   81.997991]  down_write+0x2e/0xb0
[   81.998318]  ? finish_automount+0x77/0x3a0
[   81.998646]  finish_automount+0x77/0x3a0
[   81.998975]  __traverse_mounts+0x9d/0x210
[   81.999303]  step_into+0x349/0x770
[   81.999629]  link_path_walk.part.0.constprop.0+0x21e/0x390
[   81.999958]  path_lookupat+0x3e/0x1a0
[   82.000287]  filename_lookup+0xde/0x1d0
[   82.000618]  vfs_statx+0x8d/0x100
[   82.000944]  ? strncpy_from_user+0x22/0xf0
[   82.001271]  vfs_fstatat+0x63/0xc0
[   82.001614]  __do_sys_newfstatat+0x3c/0x80
[   82.001943]  do_syscall_64+0x95/0x180
[   82.002270]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[   82.002601]  ? syscall_exit_to_user_mode+0x97/0x290
[   82.002933]  ? do_syscall_64+0xa1/0x180
[   82.003262]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[   82.003592]  ? syscall_exit_to_user_mode+0x97/0x290
[   82.003923]  ? clear_bhb_loop+0x45/0xa0
[   82.004251]  ? clear_bhb_loop+0x45/0xa0
[   82.004576]  ? clear_bhb_loop+0x45/0xa0
[   82.004899]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   82.005224] RIP: 0033:0x7f82fa5ef73e
[   82.005558] Code: 0f 1f 40 00 48 8b 15 d1 66 10 00 f7 d8 64 89 02 b8
ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00 00 0f
05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b 15 99
[   82.005925] RSP: 002b:00007ffd2e30cd98 EFLAGS: 00000246 ORIG_RAX:
0000000000000106
[   82.006301] RAX: ffffffffffffffda RBX: 000055dfa1952b30 RCX:
00007f82fa5ef73e
[   82.006678] RDX: 00007ffd2e30cdc0 RSI: 000055dfa1952b10 RDI:
00000000ffffff9c
[   82.007057] RBP: 00007ffd2e30ce90 R08: 000055dfa1952b30 R09:
00007f82fa6f6b20
[   82.007437] R10: 0000000000000000 R11: 0000000000000246 R12:
000055dfa1952b11
[   82.007822] R13: 000055dfa1952b30 R14: 000055dfa1952b11 R15:
0000000000000003
[   82.008209]  </TASK>
 





* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
@ 2025-01-10 10:14 ` Christoph Hellwig
  2025-01-10 10:21   ` Thomas Hellström
  2025-01-10 12:13 ` Ming Lei
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-01-10 10:14 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block

On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> Ming, Others
> 
> On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> introduced by the commit
> 
> f1be1788a32e ("block: model freeze & enter queue as lock for supporting
> lockdep")
> 
> The first one happens when swap-outs start to a scsi disc,
> Simple reproducer is to start a couple of parallel "gitk" on the kernel
> repo and watch them exhaust available memory.
> 
> the second is easily triggered by entering a debugfs trace directory,
> apparently triggering automount:
> cd /sys/kernel/debug/tracing/events
> 
> Are you aware of these?

Yes, this series fixes it:

https://lore.kernel.org/linux-block/20250110054726.1499538-1-hch@lst.de/

should be ready now that the nitpicking has settled down.



* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 10:14 ` Christoph Hellwig
@ 2025-01-10 10:21   ` Thomas Hellström
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 10:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ming Lei, Jens Axboe, linux-block

On Fri, 2025-01-10 at 11:14 +0100, Christoph Hellwig wrote:
> On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > Ming, Others
> > 
> > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > introduced by the commit
> > 
> > f1be1788a32e ("block: model freeze & enter queue as lock for
> > supporting
> > lockdep")
> > 
> > The first one happens when swap-outs start to a scsi disc,
> > Simple reproducer is to start a couple of parallel "gitk" on the
> > kernel
> > repo and watch them exhaust available memory.
> > 
> > the second is easily triggered by entering a debugfs trace
> > directory,
> > apparently triggering automount:
> > cd /sys/kernel/debug/tracing/events
> > 
> > Are you aware of these?
> 
> Yes, this series fixes it:
> 
> https://lore.kernel.org/linux-block/20250110054726.1499538-1-hch@lst.de/
> 
> should be ready now that the nitpicking has settled down.
> 

Great. Thanks.
/Thomas



* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
  2025-01-10 10:14 ` Christoph Hellwig
@ 2025-01-10 12:13 ` Ming Lei
  2025-01-10 14:36   ` Thomas Hellström
  1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-10 12:13 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> Ming, Others
> 
> On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> introduced by the commit
> 
> f1be1788a32e ("block: model freeze & enter queue as lock for supporting
> lockdep")

The freeze lock connects all kinds of subsystem locks, which is why
we see lots of warnings after the commit was merged.
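
For reference, a minimal sketch of what that modeling looks like (the
helper names here are made up for illustration and this is simplified
from the actual annotations; CONFIG_LOCKDEP assumed):

	#include <linux/blkdev.h>
	#include <linux/lockdep.h>

	/*
	 * Sketch only: freezing the queue is annotated like taking
	 * q->io_lockdep_map for write, entering the queue for I/O like
	 * taking it for read, so lockdep can connect the freeze/enter
	 * path with locks such as fs_reclaim held by callers.
	 */
	static void freeze_annotate(struct request_queue *q)
	{
		rwsem_acquire(&q->io_lockdep_map, 0, 0, _RET_IP_);
	}

	static void enter_annotate(struct request_queue *q)
	{
		rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
	}

	static void exit_annotate(struct request_queue *q)
	{
		rwsem_release(&q->io_lockdep_map, _RET_IP_);
	}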

...

> #1
> [  399.006581] ======================================================
> [  399.006756] WARNING: possible circular locking dependency detected
> [  399.006767] 6.12.0-rc4+ #1 Tainted: G     U           N
> [  399.006776] ------------------------------------------------------
> [  399.006801] kswapd0/116 is trying to acquire lock:
> [  399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-{0:0},
> at: __submit_bio+0xf0/0x1c0
> [  399.006845] 
>                but task is already holding lock:
> [  399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> balance_pgdat+0xe2/0xa20
> [  399.006874] 

The above one is solved in the for-6.14/block branch of the block tree:

	block: track queue dying state automatically for modeling queue freeze lockdep

> 
> #2:
> [   81.960829] ======================================================
> [   81.961010] WARNING: possible circular locking dependency detected
> [   81.961048] 6.12.0-rc4+ #3 Tainted: G     U            

...

>                -> #3 (&q->limits_lock){+.+.}-{4:4}:
> [   81.967815]        __mutex_lock+0xad/0xb80
> [   81.968133]        nvme_update_ns_info_block+0x128/0x870 [nvme_core]
> [   81.968456]        nvme_update_ns_info+0x41/0x220 [nvme_core]
> [   81.968774]        nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
> [   81.969090]        nvme_scan_ns+0x251/0x330 [nvme_core]
> [   81.969401]        async_run_entry_fn+0x31/0x130
> [   81.969703]        process_one_work+0x21a/0x590
> [   81.970004]        worker_thread+0x1c3/0x3b0
> [   81.970302]        kthread+0xd2/0x100
> [   81.970603]        ret_from_fork+0x31/0x50
> [   81.970901]        ret_from_fork_asm+0x1a/0x30
> [   81.971195] 
>                -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:

The above dependency is killed by Christoph's patch.


Thanks,
Ming



* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 12:13 ` Ming Lei
@ 2025-01-10 14:36   ` Thomas Hellström
  2025-01-11  3:05     ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2025-01-10 14:36 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > Ming, Others
> > 
> > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > introduced by the commit
> > 
> > f1be1788a32e ("block: model freeze & enter queue as lock for
> > supporting
> > lockdep")
> 
> The freeze lock connects all kinds of sub-system locks, that is why
> we see lots of warnings after the commit is merged.
> 
> ...
> 
> > #1
> > [  399.006581]
> > ======================================================
> > [  399.006756] WARNING: possible circular locking dependency
> > detected
> > [  399.006767] 6.12.0-rc4+ #1 Tainted: G     U           N
> > [  399.006776] ----------------------------------------------------
> > --
> > [  399.006801] kswapd0/116 is trying to acquire lock:
> > [  399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-
> > {0:0},
> > at: __submit_bio+0xf0/0x1c0
> > [  399.006845] 
> >                but task is already holding lock:
> > [  399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > balance_pgdat+0xe2/0xa20
> > [  399.006874] 
> 
> The above one is solved in for-6.14/block of block tree:
> 
> 	block: track queue dying state automatically for modeling
> queue freeze lockdep

Hmm. I applied this series:

https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both

on top of -rc6, but it didn't resolve that splat. Am I using the
correct patches?

Perhaps it would be a good idea to reclaim-prime those lockdep maps
that are taken during reclaim, to have the splats happen earlier.

Thanks,
Thomas


> 
> > 
> > #2:
> > [   81.960829]
> > ======================================================
> > [   81.961010] WARNING: possible circular locking dependency
> > detected
> > [   81.961048] 6.12.0-rc4+ #3 Tainted: G     U            
> 
> ...
> 
> >                -> #3 (&q->limits_lock){+.+.}-{4:4}:
> > [   81.967815]        __mutex_lock+0xad/0xb80
> > [   81.968133]        nvme_update_ns_info_block+0x128/0x870
> > [nvme_core]
> > [   81.968456]        nvme_update_ns_info+0x41/0x220 [nvme_core]
> > [   81.968774]        nvme_alloc_ns+0x8a6/0xb50 [nvme_core]
> > [   81.969090]        nvme_scan_ns+0x251/0x330 [nvme_core]
> > [   81.969401]        async_run_entry_fn+0x31/0x130
> > [   81.969703]        process_one_work+0x21a/0x590
> > [   81.970004]        worker_thread+0x1c3/0x3b0
> > [   81.970302]        kthread+0xd2/0x100
> > [   81.970603]        ret_from_fork+0x31/0x50
> > [   81.970901]        ret_from_fork_asm+0x1a/0x30
> > [   81.971195] 
> >                -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
> 
> The above dependency is killed by Christoph's patch.
> 
> 
> Thanks,
> Ming
> 



* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-10 14:36   ` Thomas Hellström
@ 2025-01-11  3:05     ` Ming Lei
  2025-01-12 11:33       ` Thomas Hellström
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-11  3:05 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > Ming, Others
> > > 
> > > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > > introduced by the commit
> > > 
> > > f1be1788a32e ("block: model freeze & enter queue as lock for
> > > supporting
> > > lockdep")
> > 
> > The freeze lock connects all kinds of sub-system locks, that is why
> > we see lots of warnings after the commit is merged.
> > 
> > ...
> > 
> > > #1
> > > [  399.006581]
> > > ======================================================
> > > [  399.006756] WARNING: possible circular locking dependency
> > > detected
> > > [  399.006767] 6.12.0-rc4+ #1 Tainted: G     U           N
> > > [  399.006776] ----------------------------------------------------
> > > --
> > > [  399.006801] kswapd0/116 is trying to acquire lock:
> > > [  399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-
> > > {0:0},
> > > at: __submit_bio+0xf0/0x1c0
> > > [  399.006845] 
> > >                but task is already holding lock:
> > > [  399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > > balance_pgdat+0xe2/0xa20
> > > [  399.006874] 
> > 
> > The above one is solved in for-6.14/block of block tree:
> > 
> > 	block: track queue dying state automatically for modeling
> > queue freeze lockdep
> 
> Hmm. I applied this series:
> 
> https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both
> 
> on top of -rc6, but it didn't resolve that splat. Am I using the
> correct patches?
> 
> Perhaps it might be a good idea to reclaim-prime those lockdep maps
> taken during reclaim to have the splats happen earlier.

for-6.14/block does kill the dependency between fs_reclaim and
q->q_usage_counter(io) in scsi_add_lun() when the scsi disk isn't
added yet.

Maybe it is another warning; care to post the warning log here?


Thanks, 
Ming



* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-11  3:05     ` Ming Lei
@ 2025-01-12 11:33       ` Thomas Hellström
  2025-01-12 15:50         ` Ming Lei
  2025-01-13  9:28         ` Ming Lei
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-12 11:33 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > Ming, Others
> > > > 
> > > > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > > > introduced by the commit
> > > > 
> > > > f1be1788a32e ("block: model freeze & enter queue as lock for
> > > > supporting
> > > > lockdep")
> > > 
> > > The freeze lock connects all kinds of sub-system locks, that is
> > > why
> > > we see lots of warnings after the commit is merged.
> > > 
> > > ...
> > > 
> > > > #1
> > > > [  399.006581]
> > > > ======================================================
> > > > [  399.006756] WARNING: possible circular locking dependency
> > > > detected
> > > > [  399.006767] 6.12.0-rc4+ #1 Tainted: G     U           N
> > > > [  399.006776] ------------------------------------------------
> > > > ----
> > > > --
> > > > [  399.006801] kswapd0/116 is trying to acquire lock:
> > > > [  399.006810] ffff9a67a1284a28 (&q-
> > > > >q_usage_counter(io)){++++}-
> > > > {0:0},
> > > > at: __submit_bio+0xf0/0x1c0
> > > > [  399.006845] 
> > > >                but task is already holding lock:
> > > > [  399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at:
> > > > balance_pgdat+0xe2/0xa20
> > > > [  399.006874] 
> > > 
> > > The above one is solved in for-6.14/block of block tree:
> > > 
> > > 	block: track queue dying state automatically for
> > > modeling
> > > queue freeze lockdep
> > 
> > Hmm. I applied this series:
> > 
> > https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both
> > 
> > on top of -rc6, but it didn't resolve that splat. Am I using the
> > correct patches?
> > 
> > Perhaps it might be a good idea to reclaim-prime those lockdep maps
> > taken during reclaim to have the splats happen earlier.
> 
> for-6.14/block does kill the dependency between fs_reclaim and
> q->q_usage_counter(io) in scsi_add_lun() when scsi disk isn't
> added yet.
> 
> Maybe it is another warning, care to post the warning log here?

Ah, you're right, it's a different warning this time. I've posted the
warning below. (Note: this is also with Christoph's series applied on
top.)

May I also humbly suggest the following lockdep priming, to be able to
catch the reclaim lockdep splats early without reclaim needing to
happen? That will also pick up splat #2 below.

8<-------------------------------------------------------------

diff --git a/block/blk-core.c b/block/blk-core.c
index 32fb28a6372c..2dd8dc9aed7f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct
queue_limits *lim, int node_id)
 
        q->nr_requests = BLKDEV_DEFAULT_RQ;
 
+       fs_reclaim_acquire(GFP_KERNEL);
+       rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
+       rwsem_release(&q->io_lockdep_map, _RET_IP_);
+       fs_reclaim_release(GFP_KERNEL);
+
        return q;
 
 fail_stats:

8<-------------------------------------------------------------

#1:
[  106.921533] ======================================================
[  106.921716] WARNING: possible circular locking dependency detected
[  106.921725] 6.13.0-rc6+ #121 Tainted: G     U            
[  106.921734] ------------------------------------------------------
[  106.921743] kswapd0/117 is trying to acquire lock:
[  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0},
at: __submit_bio+0x80/0x220
[  106.921769] 
               but task is already holding lock:
[  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa10
[  106.921791] 
               which lock already depends on the new lock.

[  106.921803] 
               the existing dependency chain (in reverse order) is:
[  106.921814] 
               -> #1 (fs_reclaim){+.+.}-{0:0}:
[  106.921824]        fs_reclaim_acquire+0x9d/0xd0
[  106.921833]        __kmalloc_cache_node_noprof+0x5d/0x3f0
[  106.921842]        blk_mq_init_tags+0x3d/0xb0
[  106.921851]        blk_mq_alloc_map_and_rqs+0x4e/0x3d0
[  106.921860]        blk_mq_init_sched+0x100/0x260
[  106.921868]        elevator_switch+0x8d/0x2e0
[  106.921877]        elv_iosched_store+0x174/0x1e0
[  106.921885]        queue_attr_store+0x142/0x180
[  106.921893]        kernfs_fop_write_iter+0x168/0x240
[  106.921902]        vfs_write+0x2b2/0x540
[  106.921910]        ksys_write+0x72/0xf0
[  106.921916]        do_syscall_64+0x95/0x180
[  106.921925]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  106.921935] 
               -> #0 (&q->q_usage_counter(io)){++++}-{0:0}:
[  106.921946]        __lock_acquire+0x1339/0x2180
[  106.921955]        lock_acquire+0xd0/0x2e0
[  106.921963]        blk_mq_submit_bio+0x88b/0xb60
[  106.921972]        __submit_bio+0x80/0x220
[  106.921980]        submit_bio_noacct_nocheck+0x324/0x420
[  106.921989]        swap_writepage+0x399/0x580
[  106.921997]        pageout+0x129/0x2d0
[  106.922005]        shrink_folio_list+0x5a0/0xd80
[  106.922013]        evict_folios+0x27d/0x7b0
[  106.922020]        try_to_shrink_lruvec+0x21b/0x2b0
[  106.922028]        shrink_one+0x102/0x1f0
[  106.922035]        shrink_node+0xb8e/0x1300
[  106.922043]        balance_pgdat+0x550/0xa10
[  106.922050]        kswapd+0x20a/0x440
[  106.922057]        kthread+0xd2/0x100
[  106.922064]        ret_from_fork+0x31/0x50
[  106.922072]        ret_from_fork_asm+0x1a/0x30
[  106.922080] 
               other info that might help us debug this:

[  106.922092]  Possible unsafe locking scenario:

[  106.922101]        CPU0                    CPU1
[  106.922108]        ----                    ----
[  106.922115]   lock(fs_reclaim);
[  106.922121]                                lock(&q-
>q_usage_counter(io));
[  106.922132]                                lock(fs_reclaim);
[  106.922141]   rlock(&q->q_usage_counter(io));
[  106.922148] 
                *** DEADLOCK ***

[  106.922476] 1 lock held by kswapd0/117:
[  106.922802]  #0: ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0xe2/0xa10
[  106.923138] 
               stack backtrace:
[  106.923806] CPU: 3 UID: 0 PID: 117 Comm: kswapd0 Tainted: G     U  
6.13.0-rc6+ #121
[  106.924173] Tainted: [U]=USER
[  106.924523] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[  106.924882] Call Trace:
[  106.925223]  <TASK>
[  106.925559]  dump_stack_lvl+0x6e/0xa0
[  106.925893]  print_circular_bug.cold+0x178/0x1be
[  106.926233]  check_noncircular+0x148/0x160
[  106.926565]  ? unwind_next_frame+0x42a/0x750
[  106.926905]  __lock_acquire+0x1339/0x2180
[  106.927227]  lock_acquire+0xd0/0x2e0
[  106.927546]  ? __submit_bio+0x80/0x220
[  106.927892]  ? blk_mq_submit_bio+0x860/0xb60
[  106.928212]  ? lock_release+0xd2/0x2a0
[  106.928536]  blk_mq_submit_bio+0x88b/0xb60
[  106.928850]  ? __submit_bio+0x80/0x220
[  106.929184]  __submit_bio+0x80/0x220
[  106.929499]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  106.929833]  ? submit_bio_noacct_nocheck+0x324/0x420
[  106.930147]  submit_bio_noacct_nocheck+0x324/0x420
[  106.930464]  swap_writepage+0x399/0x580
[  106.930794]  pageout+0x129/0x2d0
[  106.931114]  shrink_folio_list+0x5a0/0xd80
[  106.931447]  ? evict_folios+0x25d/0x7b0
[  106.931776]  evict_folios+0x27d/0x7b0
[  106.932092]  try_to_shrink_lruvec+0x21b/0x2b0
[  106.932410]  shrink_one+0x102/0x1f0
[  106.932742]  shrink_node+0xb8e/0x1300
[  106.933056]  ? shrink_node+0x9c1/0x1300
[  106.933368]  ? shrink_node+0xb64/0x1300
[  106.933679]  ? balance_pgdat+0x550/0xa10
[  106.933988]  balance_pgdat+0x550/0xa10
[  106.934296]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  106.934607]  ? finish_task_switch.isra.0+0xc4/0x2a0
[  106.934920]  kswapd+0x20a/0x440
[  106.935229]  ? __pfx_autoremove_wake_function+0x10/0x10
[  106.935542]  ? __pfx_kswapd+0x10/0x10
[  106.935881]  kthread+0xd2/0x100
[  106.936191]  ? __pfx_kthread+0x10/0x10
[  106.936501]  ret_from_fork+0x31/0x50
[  106.936810]  ? __pfx_kthread+0x10/0x10
[  106.937120]  ret_from_fork_asm+0x1a/0x30
[  106.937433]  </TASK>

#2:
[    5.595482] ======================================================
[    5.596353] WARNING: possible circular locking dependency detected
[    5.597231] 6.13.0-rc6+ #122 Tainted: G     U            
[    5.598182] ------------------------------------------------------
[    5.599149] (udev-worker)/867 is trying to acquire lock:
[    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at:
kernfs_remove+0x31/0x50
[    5.600987] 
               but task is already holding lock:
[    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-
{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603033] 
               which lock already depends on the new lock.

[    5.603034] 
               the existing dependency chain (in reverse order) is:
[    5.603035] 
               -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
[    5.603038]        blk_alloc_queue+0x319/0x350
[    5.603041]        blk_mq_alloc_queue+0x63/0xd0
[    5.603043]        scsi_alloc_sdev+0x281/0x3c0
[    5.603045]        scsi_probe_and_add_lun+0x1f5/0x450
[    5.603046]        __scsi_scan_target+0x112/0x230
[    5.603048]        scsi_scan_channel+0x59/0x90
[    5.603049]        scsi_scan_host_selected+0xe5/0x120
[    5.603051]        do_scan_async+0x1b/0x160
[    5.603052]        async_run_entry_fn+0x31/0x130
[    5.603055]        process_one_work+0x21a/0x590
[    5.603058]        worker_thread+0x1c3/0x3b0
[    5.603059]        kthread+0xd2/0x100
[    5.603061]        ret_from_fork+0x31/0x50
[    5.603064]        ret_from_fork_asm+0x1a/0x30
[    5.603066] 
               -> #1 (fs_reclaim){+.+.}-{0:0}:
[    5.603068]        fs_reclaim_acquire+0x9d/0xd0
[    5.603070]        kmem_cache_alloc_lru_noprof+0x57/0x3f0
[    5.603072]        alloc_inode+0x97/0xc0
[    5.603074]        iget_locked+0x141/0x310
[    5.603076]        kernfs_get_inode+0x1a/0xf0
[    5.603077]        kernfs_get_tree+0x17b/0x2c0
[    5.603080]        sysfs_get_tree+0x1a/0x40
[    5.603081]        vfs_get_tree+0x29/0xe0
[    5.603083]        path_mount+0x49a/0xbd0
[    5.603085]        __x64_sys_mount+0x119/0x150
[    5.603086]        do_syscall_64+0x95/0x180
[    5.603089]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.603092] 
               -> #0 (&root->kernfs_rwsem){++++}-{4:4}:
[    5.603094]        __lock_acquire+0x1339/0x2180
[    5.603097]        lock_acquire+0xd0/0x2e0
[    5.603099]        down_write+0x2e/0xb0
[    5.603101]        kernfs_remove+0x31/0x50
[    5.603103]        __kobject_del+0x2e/0x90
[    5.603104]        kobject_del+0x13/0x30
[    5.603104]        elevator_switch+0x44/0x2e0
[    5.603106]        elv_iosched_store+0x174/0x1e0
[    5.603107]        queue_attr_store+0x142/0x180
[    5.603108]        kernfs_fop_write_iter+0x168/0x240
[    5.603110]        vfs_write+0x2b2/0x540
[    5.603111]        ksys_write+0x72/0xf0
[    5.603111]        do_syscall_64+0x95/0x180
[    5.603113]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.603114] 
               other info that might help us debug this:

[    5.603115] Chain exists of:
                 &root->kernfs_rwsem --> fs_reclaim --> &q-
>q_usage_counter(io)#3

[    5.603117]  Possible unsafe locking scenario:

[    5.603117]        CPU0                    CPU1
[    5.603117]        ----                    ----
[    5.603118]   lock(&q->q_usage_counter(io)#3);
[    5.603119]                                lock(fs_reclaim);
[    5.603119]                                lock(&q-
>q_usage_counter(io)#3);
[    5.603120]   lock(&root->kernfs_rwsem);
[    5.603121] 
                *** DEADLOCK ***

[    5.603121] 6 locks held by (udev-worker)/867:
[    5.603122]  #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at:
ksys_write+0x72/0xf0
[    5.603125]  #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at:
kernfs_fop_write_iter+0x121/0x240
[    5.603128]  #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at:
kernfs_fop_write_iter+0x12a/0x240
[    5.603131]  #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at:
queue_attr_store+0x12b/0x180
[    5.603133]  #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-
{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603136]  #5: ffff9211e86f41d8 (&q-
>q_usage_counter(queue)#3){++++}-{0:0}, at:
blk_mq_freeze_queue+0x12/0x20
[    5.603139] 
               stack backtrace:
[    5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G  
U             6.13.0-rc6+ #122
[    5.603142] Tainted: [U]=USER
[    5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[    5.603143] Call Trace:
[    5.603144]  <TASK>
[    5.603146]  dump_stack_lvl+0x6e/0xa0
[    5.603148]  print_circular_bug.cold+0x178/0x1be
[    5.603151]  check_noncircular+0x148/0x160
[    5.603154]  __lock_acquire+0x1339/0x2180
[    5.603156]  lock_acquire+0xd0/0x2e0
[    5.603158]  ? kernfs_remove+0x31/0x50
[    5.603160]  ? sysfs_remove_dir+0x32/0x60
[    5.603162]  ? lock_release+0xd2/0x2a0
[    5.603164]  down_write+0x2e/0xb0
[    5.603165]  ? kernfs_remove+0x31/0x50
[    5.603166]  kernfs_remove+0x31/0x50
[    5.603168]  __kobject_del+0x2e/0x90
[    5.603170]  elevator_switch+0x44/0x2e0
[    5.603172]  elv_iosched_store+0x174/0x1e0
[    5.603174]  queue_attr_store+0x142/0x180
[    5.603176]  ? lock_acquire+0xd0/0x2e0
[    5.603177]  ? kernfs_fop_write_iter+0x12a/0x240
[    5.603179]  ? lock_is_held_type+0x9a/0x110
[    5.603182]  kernfs_fop_write_iter+0x168/0x240
[    5.657060]  vfs_write+0x2b2/0x540
[    5.657470]  ksys_write+0x72/0xf0
[    5.657475]  do_syscall_64+0x95/0x180
[    5.657480]  ? lock_acquire+0xd0/0x2e0
[    5.657484]  ? ktime_get_coarse_real_ts64+0x12/0x60
[    5.657486]  ? find_held_lock+0x2b/0x80
[    5.657489]  ? ktime_get_coarse_real_ts64+0x12/0x60
[    5.657490]  ? file_has_perm+0xa9/0xf0
[    5.657494]  ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[    5.657499]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[    5.657501]  ? syscall_exit_to_user_mode+0x97/0x290
[    5.657504]  ? do_syscall_64+0xa1/0x180
[    5.657507]  ? lock_acquire+0xd0/0x2e0
[    5.662389]  ? fd_install+0x3e/0x300
[    5.662395]  ? find_held_lock+0x2b/0x80
[    5.663189]  ? fd_install+0xbb/0x300
[    5.663194]  ? do_sys_openat2+0x9c/0xe0
[    5.664093]  ? kmem_cache_free+0x13e/0x450
[    5.664099]  ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[    5.664952]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[    5.664956]  ? syscall_exit_to_user_mode+0x97/0x290
[    5.664961]  ? do_syscall_64+0xa1/0x180
[    5.664964]  ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[    5.664967]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[    5.664969]  ? syscall_exit_to_user_mode+0x97/0x290
[    5.664972]  ? do_syscall_64+0xa1/0x180
[    5.664974]  ? clear_bhb_loop+0x45/0xa0
[    5.664977]  ? clear_bhb_loop+0x45/0xa0
[    5.664979]  ? clear_bhb_loop+0x45/0xa0
[    5.664982]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.664985] RIP: 0033:0x7fe72d2f4484
[    5.664988] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84
00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00
 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec
20 48 89
[    5.664990] RSP: 002b:00007ffe51665998 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
[    5.664992] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
00007fe72d2f4484
[    5.664994] RDX: 0000000000000003 RSI: 00007ffe51665ca0 RDI:
0000000000000038
[    5.664995] RBP: 00007ffe516659c0 R08: 00007fe72d3f51c8 R09:
00007ffe51665a70
[    5.664996] R10: 0000000000000000 R11: 0000000000000202 R12:
0000000000000003
[    5.664997] R13: 00007ffe51665ca0 R14: 000055a1bab093b0 R15:
00007fe72d3f4e80
[    5.665001]  </TASK>

Thanks,
Thomas




* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-12 11:33       ` Thomas Hellström
@ 2025-01-12 15:50         ` Ming Lei
  2025-01-12 17:44           ` Thomas Hellström
  2025-01-13  9:28         ` Ming Lei
  1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-12 15:50 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:

...

> 
> Ah, You're right, it's a different warning this time. Posted the
> warning below. (Note: This is also with Christoph's series applied on
> top).
> 
> May I also humbly suggest the following lockdep priming to be able to
> catch the reclaim lockdep splats early without reclaim needing to
> happen. That will also pick up splat #2 below.
> 
> 8<-------------------------------------------------------------
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 32fb28a6372c..2dd8dc9aed7f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct
> queue_limits *lim, int node_id)
>  
>         q->nr_requests = BLKDEV_DEFAULT_RQ;
>  
> +       fs_reclaim_acquire(GFP_KERNEL);
> +       rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> +       rwsem_release(&q->io_lockdep_map, _RET_IP_);
> +       fs_reclaim_release(GFP_KERNEL);
> +
>         return q;

Looks like one nice idea for injecting fs_reclaim; maybe it can be
added to the fault injection framework?

>  
>  fail_stats:
> 
> 8<-------------------------------------------------------------
> 
> #1:
>   106.921533] ======================================================
> [  106.921716] WARNING: possible circular locking dependency detected
> [  106.921725] 6.13.0-rc6+ #121 Tainted: G     U            
> [  106.921734] ------------------------------------------------------
> [  106.921743] kswapd0/117 is trying to acquire lock:
> [  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0},
> at: __submit_bio+0x80/0x220
> [  106.921769] 
>                but task is already holding lock:
> [  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
> balance_pgdat+0xe2/0xa10
> [  106.921791] 
>                which lock already depends on the new lock.
> 
> [  106.921803] 
>                the existing dependency chain (in reverse order) is:
> [  106.921814] 
>                -> #1 (fs_reclaim){+.+.}-{0:0}:
> [  106.921824]        fs_reclaim_acquire+0x9d/0xd0
> [  106.921833]        __kmalloc_cache_node_noprof+0x5d/0x3f0
> [  106.921842]        blk_mq_init_tags+0x3d/0xb0
> [  106.921851]        blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> [  106.921860]        blk_mq_init_sched+0x100/0x260
> [  106.921868]        elevator_switch+0x8d/0x2e0
> [  106.921877]        elv_iosched_store+0x174/0x1e0
> [  106.921885]        queue_attr_store+0x142/0x180
> [  106.921893]        kernfs_fop_write_iter+0x168/0x240
> [  106.921902]        vfs_write+0x2b2/0x540
> [  106.921910]        ksys_write+0x72/0xf0
> [  106.921916]        do_syscall_64+0x95/0x180
> [  106.921925]        entry_SYSCALL_64_after_hwframe+0x76/0x7e

That is another regression, from commit

	af2814149883 ("block: freeze the queue in queue_attr_store")

and queue_wb_lat_store() has the same risk too.
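
In other words, roughly this ordering (a simplified sketch based on the
dependency chain above, not the literal code; the function here is just
illustrative):

	#include <linux/blk-mq.h>
	#include <linux/slab.h>

	/*
	 * Sketch of the problematic ordering: a GFP_KERNEL allocation
	 * while the queue is frozen records the dependency
	 * q_usage_counter(io) -> fs_reclaim, which inverts against
	 * reclaim submitting writeback I/O to the same queue.
	 */
	static void frozen_alloc_pattern(struct request_queue *q)
	{
		void *p;

		blk_mq_freeze_queue(q);		/* freeze "lock" taken */
		p = kmalloc(32, GFP_KERNEL);	/* may enter fs_reclaim */
		kfree(p);
		blk_mq_unfreeze_queue(q);	/* freeze "lock" released */
	}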

I will cook a patch to fix it.

Thanks,
Ming



* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-12 15:50         ` Ming Lei
@ 2025-01-12 17:44           ` Thomas Hellström
  2025-01-13  0:55             ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2025-01-12 17:44 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> 
> ...
> 
> > 
> > Ah, You're right, it's a different warning this time. Posted the
> > warning below. (Note: This is also with Christoph's series applied
> > on
> > top).
> > 
> > May I also humbly suggest the following lockdep priming to be able
> > to
> > catch the reclaim lockdep splats early without reclaim needing to
> > happen. That will also pick up splat #2 below.
> > 
> > 8<-------------------------------------------------------------
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 32fb28a6372c..2dd8dc9aed7f 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct
> > queue_limits *lim, int node_id)
> >  
> >         q->nr_requests = BLKDEV_DEFAULT_RQ;
> >  
> > +       fs_reclaim_acquire(GFP_KERNEL);
> > +       rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> > +       rwsem_release(&q->io_lockdep_map, _RET_IP_);
> > +       fs_reclaim_release(GFP_KERNEL);
> > +
> >         return q;
> 
> Looks one nice idea for injecting fs_reclaim, maybe it can be
> added to inject framework?

For the Intel GPU drivers, we typically always prime lockdep like this
if we *know* that the lock will be grabbed during reclaim, for example
if it's part of shrinker processing or similar.

Sooner or later we *know* this sequence will happen, so we add it near
the lock initialization so that it is always executed when the
lock(map) is initialized.
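
The pattern is basically the following (minimal sketch; "my_lock" is a
placeholder for whatever lock we know may be taken under reclaim):

	#include <linux/gfp.h>
	#include <linux/lockdep.h>
	#include <linux/mutex.h>
	#include <linux/sched/mm.h>

	static DEFINE_MUTEX(my_lock);	/* placeholder lock */

	static void prime_my_lock_for_reclaim(void)
	{
		/*
		 * Record the reclaim -> my_lock dependency up front, so
		 * lockdep reports any inversion immediately instead of
		 * only when real reclaim happens to hit this path.
		 */
		fs_reclaim_acquire(GFP_KERNEL);
		might_lock(&my_lock);
		fs_reclaim_release(GFP_KERNEL);
	}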

So I don't really see a need for them to be periodically injected?

> 
> >  
> >  fail_stats:
> > 
> > 8<-------------------------------------------------------------
> > 
> > #1:
> >   106.921533]
> > ======================================================
> > [  106.921716] WARNING: possible circular locking dependency
> > detected
> > [  106.921725] 6.13.0-rc6+ #121 Tainted: G     U            
> > [  106.921734] ----------------------------------------------------
> > --
> > [  106.921743] kswapd0/117 is trying to acquire lock:
> > [  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-
> > {0:0},
> > at: __submit_bio+0x80/0x220
> > [  106.921769] 
> >                but task is already holding lock:
> > [  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
> > balance_pgdat+0xe2/0xa10
> > [  106.921791] 
> >                which lock already depends on the new lock.
> > 
> > [  106.921803] 
> >                the existing dependency chain (in reverse order) is:
> > [  106.921814] 
> >                -> #1 (fs_reclaim){+.+.}-{0:0}:
> > [  106.921824]        fs_reclaim_acquire+0x9d/0xd0
> > [  106.921833]        __kmalloc_cache_node_noprof+0x5d/0x3f0
> > [  106.921842]        blk_mq_init_tags+0x3d/0xb0
> > [  106.921851]        blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> > [  106.921860]        blk_mq_init_sched+0x100/0x260
> > [  106.921868]        elevator_switch+0x8d/0x2e0
> > [  106.921877]        elv_iosched_store+0x174/0x1e0
> > [  106.921885]        queue_attr_store+0x142/0x180
> > [  106.921893]        kernfs_fop_write_iter+0x168/0x240
> > [  106.921902]        vfs_write+0x2b2/0x540
> > [  106.921910]        ksys_write+0x72/0xf0
> > [  106.921916]        do_syscall_64+0x95/0x180
> > [  106.921925]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> That is another regression from commit
> 
> 	af2814149883 block: freeze the queue in queue_attr_store
> 
> and queue_wb_lat_store() has same risk too.
> 
> I will cook a patch to fix it.

Thanks. Are these splats going to be silenced for 6.13-rc? Like having
the new lockdep checks under a special config until they are fixed?

Thanks,
Thomas

> 
> Thanks,
> Ming
> 



* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-12 17:44           ` Thomas Hellström
@ 2025-01-13  0:55             ` Ming Lei
  2025-01-13  8:48               ` Thomas Hellström
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-13  0:55 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Sun, Jan 12, 2025 at 06:44:53PM +0100, Thomas Hellström wrote:
> On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote:
> > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > 
> > ...
> > 
> > > 
> > > Ah, You're right, it's a different warning this time. Posted the
> > > warning below. (Note: This is also with Christoph's series applied
> > > on
> > > top).
> > > 
> > > May I also humbly suggest the following lockdep priming to be able
> > > to
> > > catch the reclaim lockdep splats early without reclaim needing to
> > > happen. That will also pick up splat #2 below.
> > > 
> > > 8<-------------------------------------------------------------
> > > 
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 32fb28a6372c..2dd8dc9aed7f 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct
> > > queue_limits *lim, int node_id)
> > >  
> > >         q->nr_requests = BLKDEV_DEFAULT_RQ;
> > >  
> > > +       fs_reclaim_acquire(GFP_KERNEL);
> > > +       rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> > > +       rwsem_release(&q->io_lockdep_map, _RET_IP_);
> > > +       fs_reclaim_release(GFP_KERNEL);
> > > +
> > >         return q;
> > 
> > Looks one nice idea for injecting fs_reclaim, maybe it can be
> > added to inject framework?
> 
> For the intel gpu drivers, we typically always prime lockdep like this
> if we *know* that the lock will be grabbed during reclaim, like if it's
> part of shrinker processing or similar. 
> 
> So sooner or later we *know* this sequence will happen so we add it
> near the lock initialization to always be executed when the lock(map)
> is initialized.
> 
> So I don't really see a need for them to be periodially injected?

What I suggested is to add the verification for every allocation that
can enter direct reclaim, behind one kernel config option which depends
on both lockdep and fault injection.

> 
> > 
> > >  
> > >  fail_stats:
> > > 
> > > 8<-------------------------------------------------------------
> > > 
> > > #1:
> > >   106.921533]
> > > ======================================================
> > > [  106.921716] WARNING: possible circular locking dependency
> > > detected
> > > [  106.921725] 6.13.0-rc6+ #121 Tainted: G     U            
> > > [  106.921734] ----------------------------------------------------
> > > --
> > > [  106.921743] kswapd0/117 is trying to acquire lock:
> > > [  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-
> > > {0:0},
> > > at: __submit_bio+0x80/0x220
> > > [  106.921769] 
> > >                but task is already holding lock:
> > > [  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
> > > balance_pgdat+0xe2/0xa10
> > > [  106.921791] 
> > >                which lock already depends on the new lock.
> > > 
> > > [  106.921803] 
> > >                the existing dependency chain (in reverse order) is:
> > > [  106.921814] 
> > >                -> #1 (fs_reclaim){+.+.}-{0:0}:
> > > [  106.921824]        fs_reclaim_acquire+0x9d/0xd0
> > > [  106.921833]        __kmalloc_cache_node_noprof+0x5d/0x3f0
> > > [  106.921842]        blk_mq_init_tags+0x3d/0xb0
> > > [  106.921851]        blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> > > [  106.921860]        blk_mq_init_sched+0x100/0x260
> > > [  106.921868]        elevator_switch+0x8d/0x2e0
> > > [  106.921877]        elv_iosched_store+0x174/0x1e0
> > > [  106.921885]        queue_attr_store+0x142/0x180
> > > [  106.921893]        kernfs_fop_write_iter+0x168/0x240
> > > [  106.921902]        vfs_write+0x2b2/0x540
> > > [  106.921910]        ksys_write+0x72/0xf0
> > > [  106.921916]        do_syscall_64+0x95/0x180
> > > [  106.921925]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > 
> > That is another regression from commit
> > 
> > 	af2814149883 block: freeze the queue in queue_attr_store
> > 
> > and queue_wb_lat_store() has same risk too.
> > 
> > I will cook a patch to fix it.
> 
> Thanks. Are these splats going to be silenced for 6.13-rc? Like having
> the new lockdep checks under a special config until they are fixed?

It is too late for v6.13, and Christoph's fix won't be available for
v6.13 either.


Thanks,
Ming



* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-13  0:55             ` Ming Lei
@ 2025-01-13  8:48               ` Thomas Hellström
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2025-01-13  8:48 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

Hi.

On Mon, 2025-01-13 at 08:55 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 06:44:53PM +0100, Thomas Hellström wrote:
> > On Sun, 2025-01-12 at 23:50 +0800, Ming Lei wrote:
> > > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > 
> > > ...
> > > 
> > > > 
> > > > Ah, You're right, it's a different warning this time. Posted
> > > > the
> > > > warning below. (Note: This is also with Christoph's series
> > > > applied
> > > > on
> > > > top).
> > > > 
> > > > May I also humbly suggest the following lockdep priming to be
> > > > able
> > > > to
> > > > catch the reclaim lockdep splats early without reclaim needing
> > > > to
> > > > happen. That will also pick up splat #2 below.
> > > > 
> > > > 8<-------------------------------------------------------------
> > > > 
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index 32fb28a6372c..2dd8dc9aed7f 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -458,6 +458,11 @@ struct request_queue
> > > > *blk_alloc_queue(struct
> > > > queue_limits *lim, int node_id)
> > > >  
> > > >         q->nr_requests = BLKDEV_DEFAULT_RQ;
> > > >  
> > > > +       fs_reclaim_acquire(GFP_KERNEL);
> > > > +       rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> > > > +       rwsem_release(&q->io_lockdep_map, _RET_IP_);
> > > > +       fs_reclaim_release(GFP_KERNEL);
> > > > +
> > > >         return q;
> > > 
> > > Looks one nice idea for injecting fs_reclaim, maybe it can be
> > > added to inject framework?
> > 
> > For the intel gpu drivers, we typically always prime lockdep like
> > this
> > if we *know* that the lock will be grabbed during reclaim, like if
> > it's
> > part of shrinker processing or similar. 
> > 
> > So sooner or later we *know* this sequence will happen so we add it
> > near the lock initialization to always be executed when the
> > lock(map)
> > is initialized.
> > 
> > So I don't really see a need for them to be periodically injected?
> 
> What I suggested is to add the verification for every allocation with
> direct reclaim by one kernel config which depends on both lockdep and
> fault inject.

> 
> > 
> > > 
> > > >  
> > > >  fail_stats:
> > > > 
> > > > 8<-------------------------------------------------------------
> > > > 
> > > > #1:
> > > >   106.921533]
> > > > ======================================================
> > > > [  106.921716] WARNING: possible circular locking dependency
> > > > detected
> > > > [  106.921725] 6.13.0-rc6+ #121 Tainted: G     U            
> > > > [  106.921734] ------------------------------------------------
> > > > ----
> > > > --
> > > > [  106.921743] kswapd0/117 is trying to acquire lock:
> > > > [  106.921751] ffff8ff4e2da09f0 (&q-
> > > > >q_usage_counter(io)){++++}-
> > > > {0:0},
> > > > at: __submit_bio+0x80/0x220
> > > > [  106.921769] 
> > > >                but task is already holding lock:
> > > > [  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
> > > > balance_pgdat+0xe2/0xa10
> > > > [  106.921791] 
> > > >                which lock already depends on the new lock.
> > > > 
> > > > [  106.921803] 
> > > >                the existing dependency chain (in reverse order)
> > > > is:
> > > > [  106.921814] 
> > > >                -> #1 (fs_reclaim){+.+.}-{0:0}:
> > > > [  106.921824]        fs_reclaim_acquire+0x9d/0xd0
> > > > [  106.921833]        __kmalloc_cache_node_noprof+0x5d/0x3f0
> > > > [  106.921842]        blk_mq_init_tags+0x3d/0xb0
> > > > [  106.921851]        blk_mq_alloc_map_and_rqs+0x4e/0x3d0
> > > > [  106.921860]        blk_mq_init_sched+0x100/0x260
> > > > [  106.921868]        elevator_switch+0x8d/0x2e0
> > > > [  106.921877]        elv_iosched_store+0x174/0x1e0
> > > > [  106.921885]        queue_attr_store+0x142/0x180
> > > > [  106.921893]        kernfs_fop_write_iter+0x168/0x240
> > > > [  106.921902]        vfs_write+0x2b2/0x540
> > > > [  106.921910]        ksys_write+0x72/0xf0
> > > > [  106.921916]        do_syscall_64+0x95/0x180
> > > > [  106.921925]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > 
> > > That is another regression from commit
> > > 
> > > 	af2814149883 block: freeze the queue in queue_attr_store
> > > 
> > > and queue_wb_lat_store() has same risk too.
> > > 
> > > I will cook a patch to fix it.
> > 
> > Thanks. Are these splats going to be silenced for 6.13-rc? Like
> > having
> > the new lockdep checks under a special config until they are fixed?
> 
> It is too late for v6.13, and Christoph's fix won't be available for
> v6.13 either.

Yeah, I was thinking more of silencing the lockdep warnings themselves,
rather than fixing the actual deadlocks?

Thanks,
Thomas

> 
> 
> Thanks,
> Ming
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-12 11:33       ` Thomas Hellström
  2025-01-12 15:50         ` Ming Lei
@ 2025-01-13  9:28         ` Ming Lei
  2025-01-13  9:58           ` Thomas Hellström
  1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-01-13  9:28 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > > Ming, Others
> > > > > 
> 
> #2:
> [    5.595482] ======================================================
> [    5.596353] WARNING: possible circular locking dependency detected
> [    5.597231] 6.13.0-rc6+ #122 Tainted: G     U            
> [    5.598182] ------------------------------------------------------
> [    5.599149] (udev-worker)/867 is trying to acquire lock:
> [    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at:
> kernfs_remove+0x31/0x50
> [    5.600987] 
>                but task is already holding lock:
> [    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-
> {0:0}, at: blk_mq_freeze_queue+0x12/0x20
> [    5.603033] 
>                which lock already depends on the new lock.
> 
> [    5.603034] 
>                the existing dependency chain (in reverse order) is:
> [    5.603035] 
>                -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> [    5.603038]        blk_alloc_queue+0x319/0x350
> [    5.603041]        blk_mq_alloc_queue+0x63/0xd0

The above one is solved in the for-6.14/block branch of the block tree:

	block: track queue dying state automatically for modeling queue freeze lockdep

q->q_usage_counter(io) is killed because the disk isn't up yet.

If you apply the noio patch against for-6.14/block, the two splats should
disappear. If not, please post the lockdep log.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-13  9:28         ` Ming Lei
@ 2025-01-13  9:58           ` Thomas Hellström
  2025-01-13 10:40             ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2025-01-13  9:58 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block

Hi,

On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström
> > > > > wrote:
> > > > > > Ming, Others
> > > > > > 
> > 
> > #2:
> > [    5.595482]
> > ======================================================
> > [    5.596353] WARNING: possible circular locking dependency
> > detected
> > [    5.597231] 6.13.0-rc6+ #122 Tainted: G     U            
> > [    5.598182] ----------------------------------------------------
> > --
> > [    5.599149] (udev-worker)/867 is trying to acquire lock:
> > [    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4},
> > at:
> > kernfs_remove+0x31/0x50
> > [    5.600987] 
> >                but task is already holding lock:
> > [    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-
> > {0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > [    5.603033] 
> >                which lock already depends on the new lock.
> > 
> > [    5.603034] 
> >                the existing dependency chain (in reverse order) is:
> > [    5.603035] 
> >                -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > [    5.603038]        blk_alloc_queue+0x319/0x350
> > [    5.603041]        blk_mq_alloc_queue+0x63/0xd0
> 
> The above one is solved in the for-6.14/block branch of the block tree:
> 
> 	block: track queue dying state automatically for modeling
> queue freeze lockdep
> 
> q->q_usage_counter(io) is killed because the disk isn't up yet.
> 
> If you apply the noio patch against for-6.14/block, the two splats
> should disappear. If not, please post the lockdep log.

The dependency path above is from the lockdep priming I suggested, which
establishes the reclaim -> q->q_usage_counter(io) locking order.
A splat without that priming would look slightly different and wouldn't
occur until memory is actually exhausted. But it *will* occur.

That's why I suggested using the priming to catch all
fs_reclaim -> q_usage_counter(io) violations early, perhaps already at
system boot, so that anybody accidentally adding a GFP_KERNEL memory
allocation under the q_usage_counter(io) lock gets a notification as soon
as that allocation happens.

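As a self-contained restatement, the priming hunk from the diff earlier in
the thread could be factored into a helper along these lines (the helper
name is invented, and it is lockdep-only since q->io_lockdep_map exists
only with CONFIG_LOCKDEP):

#include <linux/blkdev.h>
#include <linux/sched/mm.h>

#ifdef CONFIG_LOCKDEP
/* Record fs_reclaim -> q->q_usage_counter(io) once at queue init, so any
 * later GFP_KERNEL allocation under a frozen queue trips lockdep
 * immediately instead of waiting for real memory pressure.
 */
static void blk_prime_reclaim_vs_freeze(struct request_queue *q)
{
	fs_reclaim_acquire(GFP_KERNEL);
	rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
	rwsem_release(&q->io_lockdep_map, _RET_IP_);
	fs_reclaim_release(GFP_KERNEL);
}
#else
static inline void blk_prime_reclaim_vs_freeze(struct request_queue *q) { }
#endif

blk_alloc_queue() would call this right after the queue is set up, exactly
where that diff adds the open-coded sequence.
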
The actual deadlock happens because kernfs_rwsem is taken under
q_usage_counter(io); see the excerpt from report [a] below.
If the priming is removed, the splat doesn't happen until reclaim, and
will instead look like [b].

Thanks,
Thomas


[a]
[    5.603115] Chain exists of:
                 &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#3

[    5.603117]  Possible unsafe locking scenario:

[    5.603117]        CPU0                    CPU1
[    5.603117]        ----                    ----
[    5.603118]   lock(&q->q_usage_counter(io)#3);
[    5.603119]                                lock(fs_reclaim);
[    5.603119]                                lock(&q->q_usage_counter(io)#3);
[    5.603120]   lock(&root->kernfs_rwsem);
[    5.603121]
                *** DEADLOCK ***

[    5.603121] 6 locks held by (udev-worker)/867:
[    5.603122]  #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[    5.603125]  #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x121/0x240
[    5.603128]  #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x240
[    5.603131]  #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at: queue_attr_store+0x12b/0x180
[    5.603133]  #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603136]  #5: ffff9211e86f41d8 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603139]
               stack backtrace:
[    5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G 
U             6.13.0-rc6+ #122
[    5.603142] Tainted: [U]=USER
[    5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[    5.603143] Call Trace:
[    5.603144]  <TASK>
[    5.603146]  dump_stack_lvl+0x6e/0xa0
[    5.603148]  print_circular_bug.cold+0x178/0x1be
[    5.603151]  check_noncircular+0x148/0x160
[    5.603154]  __lock_acquire+0x1339/0x2180
[    5.603156]  lock_acquire+0xd0/0x2e0
[    5.603158]  ? kernfs_remove+0x31/0x50
[    5.603160]  ? sysfs_remove_dir+0x32/0x60
[    5.603162]  ? lock_release+0xd2/0x2a0
[    5.603164]  down_write+0x2e/0xb0
[    5.603165]  ? kernfs_remove+0x31/0x50
[    5.603166]  kernfs_remove+0x31/0x50
[    5.

[b]

[157.543591] ======================================================
[  157.543778] WARNING: possible circular locking dependency detected
[  157.543787] 6.13.0-rc6+ #123 Tainted: G     U            
[  157.543796] ------------------------------------------------------
[  157.543805] git/2856 is trying to acquire lock:
[  157.543812] ffff98b6bb882f10 (&q->q_usage_counter(io)#2){++++}-
{0:0}, at: __submit_bio+0x80/0x220
[  157.543830] 
               but task is already holding lock:
[  157.543839] ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
__alloc_pages_slowpath.constprop.0+0x348/0xea0
[  157.543855] 
               which lock already depends on the new lock.

[  157.543867] 
               the existing dependency chain (in reverse order) is:
[  157.543878] 
               -> #2 (fs_reclaim){+.+.}-{0:0}:
[  157.543888]        fs_reclaim_acquire+0x9d/0xd0
[  157.543896]        kmem_cache_alloc_lru_noprof+0x57/0x3f0
[  157.543906]        alloc_inode+0x97/0xc0
[  157.543913]        iget_locked+0x141/0x310
[  157.543921]        kernfs_get_inode+0x1a/0xf0
[  157.543929]        kernfs_get_tree+0x17b/0x2c0
[  157.543938]        sysfs_get_tree+0x1a/0x40
[  157.543945]        vfs_get_tree+0x29/0xe0
[  157.543953]        path_mount+0x49a/0xbd0
[  157.543960]        __x64_sys_mount+0x119/0x150
[  157.543968]        do_syscall_64+0x95/0x180
[  157.543977]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.543986] 
               -> #1 (&root->kernfs_rwsem){++++}-{4:4}:
[  157.543997]        down_write+0x2e/0xb0
[  157.544004]        kernfs_remove+0x31/0x50
[  157.544012]        __kobject_del+0x2e/0x90
[  157.544020]        kobject_del+0x13/0x30
[  157.544026]        elevator_switch+0x44/0x2e0
[  157.544034]        elv_iosched_store+0x174/0x1e0
[  157.544043]        queue_attr_store+0x165/0x1b0
[  157.544050]        kernfs_fop_write_iter+0x168/0x240
[  157.544059]        vfs_write+0x2b2/0x540
[  157.544066]        ksys_write+0x72/0xf0
[  157.544073]        do_syscall_64+0x95/0x180
[  157.544081]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.544090] 
               -> #0 (&q->q_usage_counter(io)#2){++++}-{0:0}:
[  157.544102]        __lock_acquire+0x1339/0x2180
[  157.544110]        lock_acquire+0xd0/0x2e0
[  157.544118]        blk_mq_submit_bio+0x88b/0xb60
[  157.544127]        __submit_bio+0x80/0x220
[  157.544135]        submit_bio_noacct_nocheck+0x324/0x420
[  157.544144]        swap_writepage+0x399/0x580
[  157.544152]        pageout+0x129/0x2d0
[  157.544160]        shrink_folio_list+0x5a0/0xd80
[  157.544168]        evict_folios+0x27d/0x7b0
[  157.544175]        try_to_shrink_lruvec+0x21b/0x2b0
[  157.544183]        shrink_one+0x102/0x1f0
[  157.544191]        shrink_node+0xb8e/0x1300
[  157.544198]        do_try_to_free_pages+0xb3/0x580
[  157.544206]        try_to_free_pages+0xfa/0x2a0
[  157.544214]        __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[  157.544224]        __alloc_pages_noprof+0x34c/0x390
[  157.544233]        alloc_pages_mpol_noprof+0xd7/0x1c0
[  157.544241]        pipe_write+0x3fc/0x7f0
[  157.544574]        vfs_write+0x401/0x540
[  157.544917]        ksys_write+0xd1/0xf0
[  157.545246]        do_syscall_64+0x95/0x180
[  157.545576]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.545909] 
               other info that might help us debug this:

[  157.546879] Chain exists of:
                 &q->q_usage_counter(io)#2 --> &root->kernfs_rwsem -->
fs_reclaim

[  157.547849]  Possible unsafe locking scenario:

[  157.548483]        CPU0                    CPU1
[  157.548795]        ----                    ----
[  157.549098]   lock(fs_reclaim);
[  157.549400]                                lock(&root->kernfs_rwsem);
[  157.549705]                                lock(fs_reclaim);
[  157.550011]   rlock(&q->q_usage_counter(io)#2);
[  157.550316] 
                *** DEADLOCK ***

[  157.551194] 2 locks held by git/2856:
[  157.551490]  #0: ffff98b6a221e068 (&pipe->mutex){+.+.}-{4:4}, at:
pipe_write+0x5a/0x7f0
[  157.551798]  #1: ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at:
__alloc_pages_slowpath.constprop.0+0x348/0xea0
[  157.552115] 
               stack backtrace:
[  157.552734] CPU: 5 UID: 1000 PID: 2856 Comm: git Tainted: G     U  
6.13.0-rc6+ #123
[  157.553060] Tainted: [U]=USER
[  157.553383] Hardware name: ASUS System Product Name/PRIME B560M-A
AC, BIOS 2001 02/01/2023
[  157.553718] Call Trace:
[  157.554054]  <TASK>
[  157.554389]  dump_stack_lvl+0x6e/0xa0
[  157.554725]  print_circular_bug.cold+0x178/0x1be
[  157.555064]  check_noncircular+0x148/0x160
[  157.555408]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[  157.555747]  ? unwind_get_return_address+0x23/0x40
[  157.556085]  __lock_acquire+0x1339/0x2180
[  157.556425]  lock_acquire+0xd0/0x2e0
[  157.556761]  ? __submit_bio+0x80/0x220
[  157.557110]  ? blk_mq_submit_bio+0x860/0xb60
[  157.557447]  ? lock_release+0xd2/0x2a0
[  157.557784]  blk_mq_submit_bio+0x88b/0xb60
[  157.558137]  ? __submit_bio+0x80/0x220
[  157.558476]  __submit_bio+0x80/0x220
[  157.558828]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  157.559166]  ? submit_bio_noacct_nocheck+0x324/0x420
[  157.559504]  submit_bio_noacct_nocheck+0x324/0x420
[  157.559863]  swap_writepage+0x399/0x580
[  157.560205]  pageout+0x129/0x2d0
[  157.560542]  shrink_folio_list+0x5a0/0xd80
[  157.560879]  ? evict_folios+0x25d/0x7b0
[  157.561212]  evict_folios+0x27d/0x7b0
[  157.561546]  try_to_shrink_lruvec+0x21b/0x2b0
[  157.561890]  shrink_one+0x102/0x1f0
[  157.562222]  shrink_node+0xb8e/0x1300
[  157.562554]  ? shrink_node+0x9c1/0x1300
[  157.562915]  ? shrink_node+0xb64/0x1300
[  157.563245]  ? do_try_to_free_pages+0xb3/0x580
[  157.563576]  do_try_to_free_pages+0xb3/0x580
[  157.563922]  ? lock_release+0xd2/0x2a0
[  157.564252]  try_to_free_pages+0xfa/0x2a0
[  157.564583]  __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[  157.564946]  ? lock_release+0xd2/0x2a0
[  157.565279]  __alloc_pages_noprof+0x34c/0x390
[  157.565613]  alloc_pages_mpol_noprof+0xd7/0x1c0
[  157.565952]  pipe_write+0x3fc/0x7f0
[  157.566283]  vfs_write+0x401/0x540
[  157.566615]  ksys_write+0xd1/0xf0
[  157.566980]  do_syscall_64+0x95/0x180
[  157.567312]  ? vfs_write+0x401/0x540
[  157.567642]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  157.568001]  ? syscall_exit_to_user_mode+0x97/0x290
[  157.568331]  ? do_syscall_64+0xa1/0x180
[  157.568658]  ? do_syscall_64+0xa1/0x180
[  157.569012]  ? syscall_exit_to_user_mode+0x97/0x290
[  157.569337]  ? do_syscall_64+0xa1/0x180
[  157.569658]  ? do_user_addr_fault+0x397/0x720
[  157.569980]  ? trace_hardirqs_off+0x4b/0xc0
[  157.570300]  ? clear_bhb_loop+0x45/0xa0
[  157.570621]  ? clear_bhb_loop+0x45/0xa0
[  157.570968]  ? clear_bhb_loop+0x45/0xa0
[  157.571286]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.571605] RIP: 0033:0x7fdf1ec2d484
[  157.571966] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84
00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[  157.572322] RSP: 002b:00007ffd0eb6d068 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
[  157.572692] RAX: ffffffffffffffda RBX: 0000000000000331 RCX:
00007fdf1ec2d484
[  157.573093] RDX: 0000000000000331 RSI: 000055693fe2d660 RDI:
0000000000000001
[  157.573470] RBP: 00007ffd0eb6d090 R08: 000055693fdc6010 R09:
0000000000000007
[  157.573875] R10: 0000556941b97c70 R11: 0000000000000202 R12:
0000000000000331
[  157.574249] R13: 000055693fe2d660 R14: 00007fdf1ed305c0 R15:
00007fdf1ed2de80
[  157.574621]  </TASK>

> 
> Thanks,
> Ming
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Blockdev 6.13-rc lockdep splat regressions
  2025-01-13  9:58           ` Thomas Hellström
@ 2025-01-13 10:40             ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2025-01-13 10:40 UTC (permalink / raw)
  To: Thomas Hellström; +Cc: Jens Axboe, Christoph Hellwig, linux-block

On Mon, Jan 13, 2025 at 10:58:07AM +0100, Thomas Hellström wrote:
> Hi,
> 
> On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström
> > > > > > wrote:
> > > > > > > Ming, Others
> > > > > > > 
> > > 
> > > #2:
> > > [    5.595482]
> > > ======================================================
> > > [    5.596353] WARNING: possible circular locking dependency
> > > detected
> > > [    5.597231] 6.13.0-rc6+ #122 Tainted: G     U            
> > > [    5.598182] ----------------------------------------------------
> > > --
> > > [    5.599149] (udev-worker)/867 is trying to acquire lock:
> > > [    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4},
> > > at:
> > > kernfs_remove+0x31/0x50
> > > [    5.600987] 
> > >                but task is already holding lock:
> > > [    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-
> > > {0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > > [    5.603033] 
> > >                which lock already depends on the new lock.
> > > 
> > > [    5.603034] 
> > >                the existing dependency chain (in reverse order) is:
> > > [    5.603035] 
> > >                -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > > [    5.603038]        blk_alloc_queue+0x319/0x350
> > > [    5.603041]        blk_mq_alloc_queue+0x63/0xd0
> > 
> > The above one is solved in for-6.14/block of block tree:
> > 
> > 	block: track queue dying state automatically for modeling
> > queue freeze lockdep
> > 
> > q->q_usage_counter(io) is killed because disk isn't up yet.
> > 
> > If you apply the noio patch against for-6.1/block, the two splats
> > should
> > have disappeared. If not, please post lockdep log.
> 
> The dependency path above is from the lockdep priming I suggested, which
> establishes the reclaim -> q->q_usage_counter(io) locking order.
> A splat without that priming would look slightly different and wouldn't
> occur until memory is actually exhausted. But it *will* occur.
> 
> That's why I suggested using the priming to catch all
> fs_reclaim -> q_usage_counter(io) violations early, perhaps already at
> system boot, so that anybody accidentally adding a GFP_KERNEL memory
> allocation under the q_usage_counter(io) lock gets a notification as soon
> as that allocation happens.
> 
> The actual deadlock happens because kernfs_rwsem is taken under
> q_usage_counter(io); see the excerpt from report [a] below.
> If the priming is removed, the splat doesn't happen until reclaim, and
> will instead look like [b].

Got it, [b] is a new warning between 'echo ... > /sys/block/$DEV/queue/scheduler'
and fs reclaim from sysfs inode allocation.

Three global or sub-system locks are involved:

- fs_reclaim

- root->kernfs_rwsem

- q->queue_usage_counter(io)

The problem has existed since the blk-mq scheduler was introduced. It looks
like a hard problem, because it is now difficult to avoid the dependency
between these locks.

I will think about it and see if we can figure out a solution.
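
For reference, the cycle from report [b], edge by edge:

  q->q_usage_counter(io) --> &root->kernfs_rwsem
    (elevator switch removes kernfs nodes while the queue is frozen)
  &root->kernfs_rwsem --> fs_reclaim
    (kernfs/sysfs inode allocation under kernfs_rwsem)
  fs_reclaim --> q->q_usage_counter(io)
    (swap-out I/O submitted from direct reclaim)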


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-01-13 10:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-10 10:12 Blockdev 6.13-rc lockdep splat regressions Thomas Hellström
2025-01-10 10:14 ` Christoph Hellwig
2025-01-10 10:21   ` Thomas Hellström
2025-01-10 12:13 ` Ming Lei
2025-01-10 14:36   ` Thomas Hellström
2025-01-11  3:05     ` Ming Lei
2025-01-12 11:33       ` Thomas Hellström
2025-01-12 15:50         ` Ming Lei
2025-01-12 17:44           ` Thomas Hellström
2025-01-13  0:55             ` Ming Lei
2025-01-13  8:48               ` Thomas Hellström
2025-01-13  9:28         ` Ming Lei
2025-01-13  9:58           ` Thomas Hellström
2025-01-13 10:40             ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).