next-20250327 - lockdep whine and USB issues at boot

linux-next.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* next-20250327 - lockdep whine and USB issues at boot
@ 2025-03-28 18:10 Valdis Klētnieks
  2025-03-28 20:32 ` Luck, Tony
  2025-03-29  2:06 ` Ming Lei
  0 siblings, 2 replies; 5+ messages in thread
From: Valdis Klētnieks @ 2025-03-28 18:10 UTC (permalink / raw)
  To: Stephen Rothwell, Jens Axboe; +Cc: linux-next, linux-kernel, linux-block

Saw this during boot on a Dell Inspiron 5559 laptop.  

In addition, the external USB ports all gave up, rendering a USB mouse and a
USB external drive totally dead in the water.  May or may not be related, I didn't
dig too far into it.

[   40.842033] [    T953] io scheduler bfq registered

[   41.022391] [    T817] ======================================================
[   41.103507] [    T817] WARNING: possible circular locking dependency detected
[   41.184587] [    T817] 6.14.0-next-20250327 #110 Tainted: G          I     T  
[   41.265700] [    T817] ------------------------------------------------------
[   41.346832] [    T817] (udev-worker)/817 is trying to acquire lock:
[   41.427952] [    T817] ffff93a2c80ae9f0 (&q->elevator_lock){+.+.}-{4:4}, at: elv_iosched_store+0xe1/0x260
[   41.830112] [    T817] 
                          but task is already holding lock:
[   41.912022] [    T817] ffff93a2c80ae460 (&q->q_usage_counter(io)#10){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x11/0x20
[   42.394431] [    T817] 
                          which lock already depends on the new lock.

[   42.477193] [    T817] 
                          the existing dependency chain (in reverse order) is:
[   42.559132] [    T817] 
                          -> #2 (&q->q_usage_counter(io)#10){++++}-{0:0}:
[   43.042361] [    T817]        lock_acquire.part.0+0xbe/0x240
[   43.123452] [    T817]        blk_alloc_queue+0x30b/0x350
[   43.204547] [    T817]        blk_mq_alloc_queue+0x62/0xd0
[   43.285646] [    T817]        scsi_alloc_sdev+0x29c/0x3d0
[   43.366744] [    T817]        scsi_probe_and_add_lun+0x1d8/0x2b0
[   43.447847] [    T817]        __scsi_add_device+0x114/0x130
[   43.528950] [    T817]        ata_scsi_scan_host+0x7a/0x190
[   43.610047] [    T817]        async_run_entry_fn+0x24/0xc0
[   43.691137] [    T817]        process_one_work+0x21e/0x5a0
[   43.772226] [    T817]        worker_thread+0x1d5/0x3c0
[   43.853316] [    T817]        kthread+0x114/0x230
[   43.934369] [    T817]        ret_from_fork+0x2c/0x50
[   44.015453] [    T817]        ret_from_fork_asm+0x1a/0x30
[   44.096532] [    T817] 
                          -> #1 (fs_reclaim){+.+.}-{0:0}:
[   44.499438] [    T817]        lock_acquire.part.0+0xbe/0x240
[   44.580476] [    T817]        fs_reclaim_acquire+0xa8/0xe0
[   44.661541] [    T817]        blk_mq_alloc_and_init_hctx+0x16a/0x240
[   44.742613] [    T817]        blk_mq_realloc_hw_ctxs+0x2fb/0x390
[   44.823681] [    T817]        blk_mq_init_allocated_queue+0x13b/0x460
[   44.904744] [    T817]        blk_mq_alloc_queue+0x7b/0xd0
[   44.985800] [    T817]        __blk_mq_alloc_disk+0x13/0x60
[   45.066848] [    T817]        loop_add+0x1fd/0x3e0
[   45.147899] [    T817]        loop_init+0x17b/0x1d0
[   45.228942] [    T817]        do_one_initcall+0x83/0x3c0
[   45.309991] [    T817]        do_initcalls+0x130/0x1b0
[   45.391040] [    T817]        kernel_init_freeable+0x292/0x300
[   45.472088] [    T817]        kernel_init+0x15/0x130
[   45.553131] [    T817]        ret_from_fork+0x2c/0x50
[   45.634147] [    T817]        ret_from_fork_asm+0x1a/0x30
[   45.715192] [    T817] 
                          -> #0 (&q->elevator_lock){+.+.}-{4:4}:
[   46.117972] [    T817]        check_prev_add+0xe1/0xcf0
[   46.198974] [    T817]        __lock_acquire+0x1031/0x13b0
[   46.280004] [    T817]        lock_acquire.part.0+0xbe/0x240
[   46.361026] [    T817]        __mutex_lock+0xcb/0xfc0
[   46.442014] [    T817]        elv_iosched_store+0xe1/0x260
[   46.523029] [    T817]        kernfs_fop_write_iter+0x160/0x240
[   46.604038] [    T817]        vfs_write+0x2ec/0x5c0
[   46.685016] [    T817]        ksys_write+0x7a/0xf0
[   46.766019] [    T817]        do_syscall_64+0x68/0x140
[   46.847024] [    T817]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[   46.927997] [    T817] 
                          other info that might help us debug this:

[   47.010420] [    T817] Chain exists of:
                            &q->elevator_lock --> fs_reclaim --> &q->q_usage_counter(io)#10

[   47.654630] [    T817]  Possible unsafe locking scenario:

[   47.736287] [    T817]        CPU0                    CPU1
[   47.817245] [    T817]        ----                    ----
[   47.898175] [    T817]   lock(&q->q_usage_counter(io)#10);
[   48.219898] [    T817]                                lock(fs_reclaim);
[   48.461357] [    T817]                                lock(&q->q_usage_counter(io)#10);
[   48.783086] [    T817]   lock(&q->elevator_lock);
[   49.024555] [    T817] 
                           *** DEADLOCK ***

[   49.106827] [    T817] 5 locks held by (udev-worker)/817:
[   49.187752] [    T817]  #0: ffff93a2e079a440 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x7a/0xf0
[   49.750206] [    T817]  #1: ffff93a2c62c6690 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x119/0x240
[   50.312697] [    T817]  #2: ffff93a2c2312ad0 (kn->active#89){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x122/0x240
[   50.875194] [    T817]  #3: ffff93a2c80ae460 (&q->q_usage_counter(io)#10){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x11/0x20
[   51.437693] [    T817]  #4: ffff93a2c80ae4a0 (&q->q_usage_counter(queue)){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x11/0x20
[   51.919935] [    T817] 
                          stack backtrace:
[   52.001533] [    T817] CPU: 0 UID: 0 PID: 817 Comm: (udev-worker) Tainted: G          I     T   6.14.0-next-20250327 #110 PREEMPT(full)  238541a35a93f433dad83bf02a8fb62af6fea5a1
[   52.081794] [    T817] Tainted: [I]=FIRMWARE_WORKAROUND, [T]=RANDSTRUCT
[   52.162052] [    T817] Hardware name: Dell Inc. Unidentified System/052K07, BIOS 1.9.0 09/07/2020
[   52.242309] [    T817] Call Trace:
[   52.322568] [    T817]  <TASK>
[   52.402828] [    T817]  dump_stack_lvl+0x65/0x90
[   52.483089] [    T817]  print_circular_bug.cold+0x38/0x46
[   52.563351] [    T817]  check_noncircular+0x169/0x190
[   52.643613] [    T817]  check_prev_add+0xe1/0xcf0
[   52.723871] [    T817]  ? add_chain_cache+0x115/0x4c0
[   52.804133] [    T817]  __lock_acquire+0x1031/0x13b0
[   52.884394] [    T817]  ? mark_usage+0x65/0x180
[   52.964654] [    T817]  lock_acquire.part.0+0xbe/0x240
[   53.044913] [    T817]  ? elv_iosched_store+0xe1/0x260
[   53.125175] [    T817]  ? lock_acquire+0xf8/0x140
[   53.205435] [    T817]  __mutex_lock+0xcb/0xfc0
[   53.285695] [    T817]  ? elv_iosched_store+0xe1/0x260
[   53.365955] [    T817]  ? mark_held_locks+0x40/0x70
[   53.446214] [    T817]  ? elv_iosched_store+0xe1/0x260
[   53.526473] [    T817]  ? lockdep_hardirqs_on_prepare.part.0+0x8e/0x170
[   53.606736] [    T817]  ? elv_iosched_store+0xe1/0x260
[   53.686996] [    T817]  elv_iosched_store+0xe1/0x260
[   53.767260] [    T817]  kernfs_fop_write_iter+0x160/0x240
[   53.847520] [    T817]  vfs_write+0x2ec/0x5c0
[   53.927784] [    T817]  ksys_write+0x7a/0xf0
[   54.008044] [    T817]  do_syscall_64+0x68/0x140
[   54.088307] [    T817]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[   54.168565] [    T817] RIP: 0033:0x7fd9f9a7ca06
[   54.248825] [    T817] Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
[   54.329083] [    T817] RSP: 002b:00007ffe23386d50 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   54.489599] [    T817] RAX: ffffffffffffffda RBX: 0000564ed601dff0 RCX: 00007fd9f9a7ca06
[   54.569857] [    T817] RDX: 0000000000000003 RSI: 00007ffe233870a0 RDI: 000000000000001c
[   54.650115] [    T817] RBP: 00007ffe23386d70 R08: 0000000000000000 R09: 0000000000000000
[   54.730373] [    T817] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[   54.810\0\0\0\0\0\0\0\0630] [    T817] R13: 0000000000000003 R14: 00007ffe233870a0 R15: 00007ffe233870a0
[   54.890894] [    T817]  </TASK>
[   54.971553] [     T25] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command
[   55.051817] [     T25] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
[   55.132085] [     T25] xhci_hcd 0000:00:14.0: HC died; cleaning up
[   55.212411] [     T59] xhci_hcd 0000:00:14.0: Timeout while waiting for stop endpoint command
[   55.253106] [     T47] kauditd_printk_skb: 81 callbacks suppressed
[   55.354880] [    T984] usb 1-2: USB disconnect, device number 2
[   55.375007] [    T986] usb 2-3: USB disconnect, device number 2
[   55.568668] [    T984] usb 1-3: USB disconnect, device number 3
[   55.784923] [    T984] usb 1-6: USB disconnect, device number 4
[   55.874703] [    T984] usb 1-7: USB disconnect, device number 5


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: next-20250327 - lockdep whine and USB issues at boot
  2025-03-28 18:10 next-20250327 - lockdep whine and USB issues at boot Valdis Klētnieks
@ 2025-03-28 20:32 ` Luck, Tony
  2025-03-29  2:06 ` Ming Lei
  1 sibling, 0 replies; 5+ messages in thread
From: Luck, Tony @ 2025-03-28 20:32 UTC (permalink / raw)
  To: Valdis Klētnieks
  Cc: Stephen Rothwell, Jens Axboe, linux-next, linux-kernel,
	linux-block

On Fri, Mar 28, 2025 at 02:10:53PM -0400, Valdis Klētnieks wrote:
> Saw this during boot on a Dell Inspiron 5559 laptop.  
> 
> In addition, the external USB ports all gave up, rendering a USB mouse and a
> USB external drive totally dead in the water.  May or may not be related, I didn't
> dig too far into it.
> 
> [   40.842033] [    T953] io scheduler bfq registered
> 
> [   41.022391] [    T817] ======================================================
> [   41.103507] [    T817] WARNING: possible circular locking dependency detected
> [   41.184587] [    T817] 6.14.0-next-20250327 #110 Tainted: G          I     T  
> [   41.265700] [    T817] ------------------------------------------------------
> [   41.346832] [    T817] (udev-worker)/817 is trying to acquire lock:
> [   41.427952] [    T817] ffff93a2c80ae9f0 (&q->elevator_lock){+.+.}-{4:4}, at: elv_iosched_store+0xe1/0x260
> [   41.830112] [    T817] 
>                           but task is already holding lock:
> [   41.912022] [    T817] ffff93a2c80ae460 (&q->q_usage_counter(io)#10){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x11/0x20
> [   42.394431] [    T817] 
>                           which lock already depends on the new lock.
> 
> [   42.477193] [    T817] 
>                           the existing dependency chain (in reverse order) is:
> [   42.559132] [    T817] 
>                           -> #2 (&q->q_usage_counter(io)#10){++++}-{0:0}:
> [   43.042361] [    T817]        lock_acquire.part.0+0xbe/0x240
> [   43.123452] [    T817]        blk_alloc_queue+0x30b/0x350
> [   43.204547] [    T817]        blk_mq_alloc_queue+0x62/0xd0
> [   43.285646] [    T817]        scsi_alloc_sdev+0x29c/0x3d0
> [   43.366744] [    T817]        scsi_probe_and_add_lun+0x1d8/0x2b0
> [   43.447847] [    T817]        __scsi_add_device+0x114/0x130
> [   43.528950] [    T817]        ata_scsi_scan_host+0x7a/0x190
> [   43.610047] [    T817]        async_run_entry_fn+0x24/0xc0
> [   43.691137] [    T817]        process_one_work+0x21e/0x5a0
> [   43.772226] [    T817]        worker_thread+0x1d5/0x3c0
> [   43.853316] [    T817]        kthread+0x114/0x230
> [   43.934369] [    T817]        ret_from_fork+0x2c/0x50
> [   44.015453] [    T817]        ret_from_fork_asm+0x1a/0x30
> [   44.096532] [    T817] 

I see similar looking lockdep splat on a 2 socket Icelake server.
Kernel built from this Linus commit:

acb4f33713b9 ("Merge tag 'm68knommu-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu")


[   30.253858] systemd-journald[2054]: Received client request to flush runtime journal.

[   31.187581] ======================================================
[   31.195139] WARNING: possible circular locking dependency detected
[   31.202480] 6.14.0+ #32 Not tainted
[   31.207291] ------------------------------------------------------
[   31.214922] (udev-worker)/2193 is trying to acquire lock:
[   31.221796] ff41ee0f1836f4d8 (&q->elevator_lock){+.+.}-{4:4}, at: elv_iosched_store+0xe9/0x260
[   31.231864]
               but task is already holding lock:
[   31.239595] ff41ee0f1836efa8 (&q->q_usage_counter(io)){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x20
[   31.250814]
               which lock already depends on the new lock.

[   31.261167]
               the existing dependency chain (in reverse order) is:
[   31.271104]
               -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
[   31.279299]        blk_alloc_queue+0x30e/0x350
[   31.284432]        blk_mq_alloc_queue+0x62/0xd0
[   31.290416]        scsi_alloc_sdev+0x280/0x3c0
[   31.296321]        scsi_probe_and_add_lun+0x223/0x470
[   31.302846]        __scsi_add_device+0x10d/0x120
[   31.308927]        ata_scsi_scan_host+0x9c/0x1b0
[   31.315003]        async_run_entry_fn+0x31/0x130
[   31.321016]        process_one_work+0x22f/0x5b0
[   31.326962]        worker_thread+0x1cc/0x3c0
[   31.332665]        kthread+0xfc/0x240
[   31.337784]        ret_from_fork+0x31/0x50
[   31.342252]        ret_from_fork_asm+0x1a/0x30
[   31.347061]
               -> #1 (fs_reclaim){+.+.}-{0:0}:
[   31.353404]        fs_reclaim_acquire+0x9d/0xd0
[   31.358273]        kmem_cache_alloc_node_noprof+0x59/0x420
[   31.364083]        scsi_mq_init_request+0x32/0xe0
[   31.369107]        blk_mq_alloc_and_init_hctx+0x143/0x4d0
[   31.374826]        blk_mq_realloc_hw_ctxs+0x2f5/0x390
[   31.380176]        blk_mq_init_allocated_queue+0x17b/0x570
[   31.385962]        blk_mq_alloc_queue+0x7b/0xd0
[   31.390795]        scsi_alloc_sdev+0x280/0x3c0
[   31.395603]        scsi_probe_and_add_lun+0x223/0x470
[   31.400929]        __scsi_add_device+0x10d/0x120
[   31.405828]        ata_scsi_scan_host+0x9c/0x1b0
[   31.410719]        async_run_entry_fn+0x31/0x130
[   31.415601]        process_one_work+0x22f/0x5b0
[   31.420396]        worker_thread+0x1cc/0x3c0
[   31.424997]        kthread+0xfc/0x240
[   31.428983]        ret_from_fork+0x31/0x50
[   31.433334]        ret_from_fork_asm+0x1a/0x30
[   31.438026]
               -> #0 (&q->elevator_lock){+.+.}-{4:4}:
[   31.444767]        __lock_acquire+0x1510/0x2630
[   31.449606]        lock_acquire+0xcb/0x2d0
[   31.454040]        __mutex_lock+0xca/0xe50
[   31.458382]        elv_iosched_store+0xe9/0x260
[   31.463145]        kernfs_fop_write_iter+0x165/0x240
[   31.468334]        vfs_write+0x2b0/0x540
[   31.472499]        ksys_write+0x71/0xf0
[   31.476573]        do_syscall_64+0x95/0x180
[   31.481122]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   31.486936]
               other info that might help us debug this:

[   31.495922] Chain exists of:
                 &q->elevator_lock --> fs_reclaim --> &q->q_usage_counter(io)

[   31.508010]  Possible unsafe locking scenario:

[   31.514353]        CPU0                    CPU1
[   31.519098]        ----                    ----
[   31.523833]   lock(&q->q_usage_counter(io));
[   31.528314]                                lock(fs_reclaim);
[   31.534183]                                lock(&q->q_usage_counter(io));
[   31.541178]   lock(&q->elevator_lock);
[   31.545145]
                *** DEADLOCK ***

[   31.551676] 5 locks held by (udev-worker)/2193:
[   31.556420]  #0: ff41ee0f18e77420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x71/0xf0
[   31.564568]  #1: ff41ee0f1aabbe88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x11e/0x240
[   31.573842]  #2: ff41ee0f0cfefe48 (kn->active#86){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x127/0x240
[   31.583197]  #3: ff41ee0f1836efa8 (&q->q_usage_counter(io)){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x20
[   31.593949]  #4: ff41ee0f1836efe0 (&q->q_usage_counter(queue)){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x20
[   31.604967]
               stack backtrace:
[   31.609793] CPU: 1 UID: 0 PID: 2193 Comm: (udev-worker) Not tainted 6.14.0+ #32 PREEMPT(voluntary)
[   31.609797] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0021.P06.2104260458 04/26/2021
[   31.609800] Call Trace:
[   31.609803]  <TASK>
[   31.609807]  dump_stack_lvl+0x6e/0xa0
[   31.609816]  print_circular_bug.cold+0x178/0x1be
[   31.609822]  check_noncircular+0x146/0x160
[   31.609828]  __lock_acquire+0x1510/0x2630
[   31.609833]  lock_acquire+0xcb/0x2d0
[   31.609835]  ? elv_iosched_store+0xe9/0x260
[   31.609842]  __mutex_lock+0xca/0xe50
[   31.609846]  ? elv_iosched_store+0xe9/0x260
[   31.609849]  ? elv_iosched_store+0xe9/0x260
[   31.609853]  ? elv_iosched_store+0xe9/0x260
[   31.609855]  elv_iosched_store+0xe9/0x260
[   31.609859]  kernfs_fop_write_iter+0x165/0x240
[   31.609862]  vfs_write+0x2b0/0x540
[   31.609868]  ksys_write+0x71/0xf0
[   31.609871]  do_syscall_64+0x95/0x180
[   31.609874]  ? lock_acquire+0xcb/0x2d0
[   31.609875]  ? ktime_get_coarse_real_ts64+0x15/0x70
[   31.609881]  ? find_held_lock+0x2b/0x80
[   31.609888]  ? ktime_get_coarse_real_ts64+0x15/0x70
[   31.609890]  ? file_has_perm+0xa8/0xf0
[   31.609897]  ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[   31.609901]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[   31.609904]  ? syscall_exit_to_user_mode+0x97/0x290
[   31.609907]  ? do_syscall_64+0xa1/0x180
[   31.609908]  ? find_held_lock+0x2b/0x80
[   31.609911]  ? fd_install+0xa4/0x2c0
[   31.609917]  ? do_sys_openat2+0xa4/0xe0
[   31.609918]  ? kmem_cache_free+0x13b/0x460
[   31.609926]  ? do_sys_openat2+0xa4/0xe0
[   31.609928]  ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[   31.609930]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[   31.609931]  ? syscall_exit_to_user_mode+0x97/0x290
[   31.609933]  ? do_syscall_64+0xa1/0x180
[   31.609935]  ? lock_acquire+0xcb/0x2d0
[   31.609937]  ? ktime_get+0x22/0x100
[   31.609940]  ? find_held_lock+0x2b/0x80
[   31.609942]  ? ktime_get+0x22/0x100
[   31.609946]  ? sched_clock+0x10/0x30
[   31.609947]  ? sched_clock_cpu+0xf/0x1f0
[   31.609951]  ? irqtime_account_irq+0x3e/0xc0
[   31.609955]  ? handle_softirqs+0x475/0x4d0
[   31.609960]  ? sched_clock_cpu+0xf/0x1f0
[   31.609963]  ? clear_bhb_loop+0x15/0x70
[   31.609966]  ? clear_bhb_loop+0x15/0x70
[   31.609968]  ? clear_bhb_loop+0x15/0x70
[   31.609971]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   31.609973] RIP: 0033:0x7fd14baf4484
[   31.609978] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[   31.609979] RSP: 002b:00007ffe2d91f6e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   31.609982] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd14baf4484
[   31.609984] RDX: 0000000000000003 RSI: 00007ffe2d91f9f0 RDI: 000000000000005a
[   31.609985] RBP: 00007ffe2d91f710 R08: 00007fd14bbf51c8 R09: 00007ffe2d91f7c0
[   31.609986] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[   31.609987] R13: 00007ffe2d91f9f0 R14: 000055eccc8f80a0 R15: 00007fd14bbf4e80
[   31.609992]  </TASK>

-Tony

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: next-20250327 - lockdep whine and USB issues at boot
  2025-03-28 18:10 next-20250327 - lockdep whine and USB issues at boot Valdis Klētnieks
  2025-03-28 20:32 ` Luck, Tony
@ 2025-03-29  2:06 ` Ming Lei
  2025-03-29  6:02   ` Valdis Klētnieks
  1 sibling, 1 reply; 5+ messages in thread
From: Ming Lei @ 2025-03-29  2:06 UTC (permalink / raw)
  To: Valdis Klētnieks
  Cc: Stephen Rothwell, Jens Axboe, linux-next, linux-kernel,
	linux-block

On Fri, Mar 28, 2025 at 02:10:53PM -0400, Valdis Klētnieks wrote:
> Saw this during boot on a Dell Inspiron 5559 laptop.  
> 
> In addition, the external USB ports all gave up, rendering a USB mouse and a
> USB external drive totally dead in the water.  May or may not be related, I didn't
> dig too far into it.

It shouldn't be related to the warning.

> 
> [   40.842033] [    T953] io scheduler bfq registered
> 
> [   41.022391] [    T817] ======================================================
> [   41.103507] [    T817] WARNING: possible circular locking dependency detected
> [   41.184587] [    T817] 6.14.0-next-20250327 #110 Tainted: G          I     T  
> [   41.265700] [    T817] ------------------------------------------------------
> [   41.346832] [    T817] (udev-worker)/817 is trying to acquire lock:
> [   41.427952] [    T817] ffff93a2c80ae9f0 (&q->elevator_lock){+.+.}-{4:4}, at: elv_iosched_store+0xe1/0x260
> [   41.830112] [    T817] 
>                           but task is already holding lock:
> [   41.912022] [    T817] ffff93a2c80ae460 (&q->q_usage_counter(io)#10){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x11/0x20

For this lockdep warning, feel free to try patch in the following link:

https://lore.kernel.org/linux-block/Z-dUCLvf06SfTOHy@fedora/

Thanks,
Ming


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: next-20250327 - lockdep whine and USB issues at boot
  2025-03-29  2:06 ` Ming Lei
@ 2025-03-29  6:02   ` Valdis Klētnieks
  2025-03-31  9:21     ` Ming Lei
  0 siblings, 1 reply; 5+ messages in thread
From: Valdis Klētnieks @ 2025-03-29  6:02 UTC (permalink / raw)
  To: Ming Lei
  Cc: Stephen Rothwell, Jens Axboe, linux-next, linux-kernel,
	linux-block

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 10548 bytes --]

On Sat, 29 Mar 2025 10:06:39 +0800, Ming Lei said:
> On Fri, Mar 28, 2025 at 02:10:53PM -0400, Valdis Klētnieks wrote:
> > Saw this during boot on a Dell Inspiron 5559 laptop.  
> > 
> > In addition, the external USB ports all gave up, rendering a USB mouse and a
> > USB external drive totally dead in the water.  May or may not be related, I didn't
> > dig too far into it.
>
> It shouldn't be related to the warning.

> For this lockdep warning, feel free to try patch in the following link:
>
> https://lore.kernel.org/linux-block/Z-dUCLvf06SfTOHy@fedora/

After applying that patch, USB *didn't* die during boot.  So apparently
*something* changed.  

Also, the patch merely caused a *different* lockdep warning.
Rather than  &q->q_usage_counter(io) and &q->elevator_lock, the
new one is &q->elevator_lock versus pcpu_alloc_mutex...

Looks like it's a bit more convoluted than first looked?

[   52.140990] [    T954] io scheduler bfq registered

[   52.355286] [    T848] ======================================================
[   52.436367] [    T848] WARNING: possible circular locking dependency detected
[   52.517414] [    T848] 6.14.0-next-20250327-dirty #111 Tainted: G          I     T  
[   52.598492] [    T848] ------------------------------------------------------
[   52.679571] [    T848] (udev-worker)/848 is trying to acquire lock:
[   52.760647] [    T848] ffffffff99bb4470 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x524/0x840
[   53.162776] [    T848] 
                          but task is already holding lock:
[   53.244616] [    T848] ffff8c22c3392428 (&q->elevator_lock){+.+.}-{4:4}, at: elv_iosched_store+0xe1/0x260
[   53.646741] [    T848] 
                          which lock already depends on the new lock.

[   53.729459] [    T848] 
                          the existing dependency chain (in reverse order) is:
[   53.811348] [    T848] 
                          -> #3 (&q->elevator_lock){+.+.}-{4:4}:
[   54.214266] [    T848]        lock_acquire.part.0+0xbe/0x240
[   54.295342] [    T848]        __mutex_lock+0xcb/0xfc0
[   54.376409] [    T848]        elv_iosched_store+0xe1/0x260
[   54.457476] [    T848]        kernfs_fop_write_iter+0x160/0x240
[   54.538548] [    T848]        vfs_write+0x2ec/0x5c0
[   54.619619] [    T848]        ksys_write+0x7a/0xf0
[   54.700687] [    T848]        do_syscall_64+0x68/0x140
[   54.781751] [    T848]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[   54.862814] [    T848] 
                          -> #2 (&q->q_usage_counter(io)#10){++++}-{0:0}:
[   55.345959] [    T848]        lock_acquire.part.0+0xbe/0x240
[   55.427018] [    T848]        blk_alloc_queue+0x30b/0x350
[   55.508082] [    T848]        blk_mq_alloc_queue+0x62/0xd0
[   55.589134] [    T848]        scsi_alloc_sdev+0x29c/0x3d0
[   55.670185] [    T848]        scsi_probe_and_add_lun+0x1d8/0x2b0
[   55.751232] [    T848]        __scsi_add_device+0x114/0x130
[   55.832276] [    T848]        ata_scsi_scan_host+0x7a/0x190
[   55.913321] [    T848]        async_run_entry_fn+0x24/0xc0
[   55.994358] [    T848]        process_one_work+0x21e/0x5a0
[   56.075404] [    T848]        worker_thread+0x1d5/0x3c0
[   56.156444] [    T848]        kthread+0x114/0x230
[   56.237474] [    T848]        ret_from_fork+0x2c/0x50
[   56.318492] [    T848]        ret_from_fork_asm+0x1a/0x30
[   56.399515] [    T848] 
                          -> #1 (fs_reclaim){+.+.}-{0:0}:
[   56.802275] [    T848]        lock_acquire.part.0+0xbe/0x240
[   56.883298] [    T848]        fs_reclaim_acquire+0xa8/0xe0
[   56.964317] [    T848]        __alloc_frozen_pages_noprof+0xd2/0x340
[   57.045338] [    T848]        __alloc_pages_noprof+0x9/0x20
[   57.126360] [    T848]        pcpu_alloc_pages.isra.0+0x97/0x180
[   57.207375] [    T848]        pcpu_populate_chunk+0x34/0x70
[   57.288388] [    T848]        pcpu_alloc_noprof+0x2d1/0x840
[   57.369366] [    T848]        iommu_dma_init_fq+0x91/0x260
[   57.450373] [    T848]        iommu_dma_init_domain+0x1de/0x210
[   57.531381] [    T848]        iommu_setup_dma_ops+0x47/0x80
[   57.612355] [    T848]        iommu_device_register+0x172/0x250
[   57.693348] [    T848]        intel_iommu_init+0x5ef/0xbb0
[   57.774344] [    T848]        pci_iommu_init+0x29/0x90
[   57.855333] [    T848]        do_one_initcall+0x83/0x3c0
[   57.936316] [    T848]        do_initcalls+0x130/0x1b0
[   58.017300] [    T848]        kernel_init_freeable+0x292/0x300
[   58.098281] [    T848]        kernel_init+0x15/0x130
[   58.179254] [    T848]        ret_from_fork+0x2c/0x50
[   58.260224] [    T848]        ret_from_fork_asm+0x1a/0x30
[   58.341195] [    T848] 
                          -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
[   58.743832] [    T848]        check_prev_add+0xe1/0xcf0
[   58.824752] [    T848]        __lock_acquire+0x1031/0x13b0
[   58.905697] [    T848]        lock_acquire.part.0+0xbe/0x240
[   58.986633] [    T848]        __mutex_lock+0xcb/0xfc0
[   59.067565] [    T848]        pcpu_alloc_noprof+0x524/0x840
[   59.148463] [    T848]        sbitmap_init_node+0xf0/0x230
[   59.229392] [    T848]        sbitmap_queue_init_node+0x29/0x140
[   59.310317] [    T848]        blk_mq_init_tags+0x95/0x110
[   59.391240] [    T848]        blk_mq_alloc_map_and_rqs+0x3c/0x110
[   59.472131] [    T848]        blk_mq_init_sched+0x100/0x260
[   59.553017] [    T848]        elevator_switch+0xb1/0x190
[   59.633926] [    T848]        elv_iosched_store+0x1db/0x260
[   59.714841] [    T848]        kernfs_fop_write_iter+0x160/0x240
[   59.795728] [    T848]        vfs_write+0x2ec/0x5c0
[   59.876599] [    T848]        ksys_write+0x7a/0xf0
[   59.957500] [    T848]        do_syscall_64+0x68/0x140
[   60.038396] [    T848]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[   60.119269] [    T848] 
                          other info that might help us debug this:

[   60.201395] [    T848] Chain exists of:
                            pcpu_alloc_mutex --> &q->q_usage_counter(io)#10 --> &q->elevator_lock

[   60.845339] [    T848]  Possible unsafe locking scenario:

[   60.926872] [    T848]        CPU0                    CPU1
[   61.007776] [    T848]        ----                    ----
[   61.088676] [    T848]   lock(&q->elevator_lock);
[   61.330053] [    T848]                                lock(&q->q_usage_counter(io)#10);
[   61.651731] [    T848]                                lock(&q->elevator_lock);
[   61.893151] [    T848]   lock(pcpu_alloc_mutex);
[   62.134602] [    T848] 
                           *** DEADLOCK ***

[   62.216729] [    T848] 6 locks held by (udev-worker)/848:
[   62.297630] [    T848]  #0: ffff8c22c8575440 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x7a/0xf0
[   62.860055] [    T848]  #1: ffff8c22c9fb4090 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x119/0x240
[   63.422482] [    T848]  #2: ffff8c22c821fad0 (kn->active#89){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x122/0x240
[   63.984915] [    T848]  #3: ffff8c22c3391e98 (&q->q_usage_counter(io)#10){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x11/0x20
[   64.547390] [    T848]  #4: ffff8c22c3391ed8 (&q->q_usage_counter(queue)){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x11/0x20
[   65.029578] [    T848]  #5: ffff8c22c3392428 (&q->elevator_lock){+.+.}-{4:4}, at: elv_iosched_store+0xe1/0x260
[   65.511794] [    T848] 
                          stack backtrace:
[   65.593348] [    T848] CPU: 1 UID: 0 PID: 848 Comm: (udev-worker) Tainted: G          I     T   6.14.0-next-20250327-dirty #111 PREEMPT(full)  7b54f670bb22238ec039f82f30e0fd3bcc8dbc83
[   65.673611] [    T848] Tainte\0\0\0\0\0\0\0\0d: [I]=FIRMWARE_WORKAROUND, [T]=RANDSTRUCT
[   65.753868] [    T848] Hardware name: Dell Inc. Unidentified System/052K07, BIOS 1.9.0 09/07/2020
[   65.834126] [    T848] Call Trace:
[   65.914385] [    T848]  <TASK>
[   65.994655] [    T848]  dump_stack_lvl+0x65/0x90
[   66.074916] [    T848]  print_circular_bug.cold+0x38/0x46
[   66.155178] [    T848]  check_noncircular+0x169/0x190
[   66.235441] [    T848]  check_prev_add+0xe1/0xcf0
[   66.315699] [    T848]  ? add_chain_cache+0x115/0x4c0
[   66.395961] [    T848]  __lock_acquire+0x1031/0x13b0
[   66.476225] [    T848]  lock_acquire.part.0+0xbe/0x240
[   66.556484] [    T848]  ? pcpu_alloc_noprof+0x524/0x840
[   66.636745] [    T848]  ? lock_acquire+0xf8/0x140
[   66.717006] [    T848]  __mutex_lock+0xcb/0xfc0
[   66.797266] [    T848]  ? pcpu_alloc_noprof+0x524/0x840
[   66.877526] [    T848]  ? lockdep_hardirqs_on+0x84/0x130
[   66.957786] [    T848]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   67.038044] [    T848]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   67.118303] [    T848]  ? pcpu_alloc_noprof+0x524/0x840
[   67.198566] [    T848]  ? pcpu_alloc_noprof+0x524/0x840
[   67.278825] [    T848]  ? find_held_lock+0x2b/0x80
[   67.359084] [    T848]  pcpu_alloc_noprof+0x524/0x840
[   67.439344] [    T848]  ? local_clock_noinstr+0xf/0x100
[   67.519608] [    T848]  sbitmap_init_node+0xf0/0x230
[   67.599870] [    T848]  sbitmap_queue_init_node+0x29/0x140
[   67.680131] [    T848]  blk_mq_init_tags+0x95/0x110
[   67.760392] [    T848]  blk_mq_alloc_map_and_rqs+0x3c/0x110
[   67.840654] [    T848]  blk_mq_init_sched+0x100/0x260
[   67.920916] [    T848]  elevator_switch+0xb1/0x190
[   68.001176] [    T848]  ? elv_iosched_store+0x1d0/0x260
[   68.081437] [    T848]  elv_iosched_store+0x1db/0x260
[   68.161700] [    T848]  kernfs_fop_write_iter+0x160/0x240
[   68.241960] [    T848]  vfs_write+0x2ec/0x5c0
[   68.322224] [    T848]  ksys_write+0x7a/0xf0
[   68.402485] [    T848]  do_syscall_64+0x68/0x140
[   68.482745] [    T848]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[   68.563004] [    T848] RIP: 0033:0x7f2a1747ca06
[   68.643263] [    T848] Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
[   68.723521] [    T848] RSP: 002b:00007ffe48933eb0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   68.884037] [    T848] RAX: ffffffffffffffda RBX: 000056119a4eaff0 RCX: 00007f2a1747ca06
[   68.964295] [    T848] RDX: 0000000000000003 RSI: 00007ffe48934200 RDI: 0000000000000027
[   69.044553] [    T848] RBP: 00007ffe48933ed0 R08: 0000000000000000 R09: 0000000000000000
[   69.124811] [    T848] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[   69.205068] [    T848] R13: 0000000000000003 R14: 00007ffe48934200 R15: 00007ffe48934200
[   69.285332] [    T848]  </TASK>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: next-20250327 - lockdep whine and USB issues at boot
  2025-03-29  6:02   ` Valdis Klētnieks
@ 2025-03-31  9:21     ` Ming Lei
  0 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2025-03-31  9:21 UTC (permalink / raw)
  To: Valdis Klētnieks
  Cc: Stephen Rothwell, Jens Axboe, linux-next, linux-kernel,
	linux-block

On Sat, Mar 29, 2025 at 02:02:10AM -0400, Valdis Klētnieks wrote:
> On Sat, 29 Mar 2025 10:06:39 +0800, Ming Lei said:
> > On Fri, Mar 28, 2025 at 02:10:53PM -0400, Valdis Kl??tnieks wrote:
> > > Saw this during boot on a Dell Inspiron 5559 laptop.  
> > > 
> > > In addition, the external USB ports all gave up, rendering a USB mouse and a
> > > USB external drive totally dead in the water.  May or may not be related, I didn't
> > > dig too far into it.
> >
> > It shouldn't be related to the warning.
> 
> > For this lockdep warning, feel free to try patch in the following link:
> >
> > https://lore.kernel.org/linux-block/Z-dUCLvf06SfTOHy@fedora/
> 
> After applying that patch, USB *didn't* die during boot.  So apparently
> *something* changed.  

That is surprising, and maybe the USB die isn't 100% thing.

> 
> Also, the patch merely caused a *different* lockdep warning.
> Rather than  &q->q_usage_counter(io) and &q->elevator_lock, the
> new one is &q->elevator_lock versus pcpu_alloc_mutex...
> 
> Looks like it's a bit more convoluted than first looked?

That is another story wrt. freeze lock, fs_reclaim & percpu allocator
lock, looks one real risk, I will try to work one patch and see if
all can be addressed.



Thanks,
Ming


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-03-31  9:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-28 18:10 next-20250327 - lockdep whine and USB issues at boot Valdis Klētnieks
2025-03-28 20:32 ` Luck, Tony
2025-03-29  2:06 ` Ming Lei
2025-03-29  6:02   ` Valdis Klētnieks
2025-03-31  9:21     ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).