Linux PCI subsystem development
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Lukas Wunner <lukas@wunner.de>, Roman Lozko <lozko.roma@gmail.com>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Sean Christopherson <seanjc@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org, Christian Marangi <ansuelsmth@gmail.com>,
	Kurt Kanzenbach <kurt@linutronix.de>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	intel-wired-lan@lists.osuosl.org
Subject: Re: Deadlock in pciehp on dock disconnect
Date: Fri, 5 Apr 2024 15:31:34 +0200
Message-ID: <cd9edf12-5241-4366-b376-d5ee8f919903@gmail.com>
In-Reply-To: <Zg_MOG1OufptoRph@wunner.de>

On 05.04.2024 12:02, Lukas Wunner wrote:
> [cc += netdev maintainers]
> 
> On Fri, Apr 05, 2024 at 11:14:01AM +0200, Roman Lozko wrote:
>> Hi, I'm using HP G4 Thunderbolt docking station, and recently (?)
>> kernel started to "partially" deadlock after disconnecting the dock
>> station. This results in inability to turn network interfaces on or
>> off, system can't reboot, `sudo` does not work (guess because it uses
>> DNS).
>>
>> It started to occur ~two weeks ago, don't know why, I did not change
>> anything at that time. First seen on 6.8.2, nothing changed with
>> 6.9.0-rc2.
> 
> This is not a pciehp issue, it's a networking issue:
> 
> In the stacktrace you've provided below, the rtnl_lock() is acquired
> recursively, which leads to the deadlock:
> 
> unregister_netdev() acquires rtnl_lock(), indirectly invokes
> netdev_trig_deactivate() upon unregistering some LED, thereby
> calling unregister_netdevice_notifier(), which tries to
> acquire rtnl_lock() again.
> 
> From a quick look at the source files involved, this doesn't look
> like something new, though I note LED support for igc was added
> only recently with ea578703b03d ("igc: Add support for LEDs on
> i225/i226"), which went into v6.9-rc1.
> 
> The other hanging tasks are simply waiting for rtnl_lock() as well.
> 
> 
>> pciehp stack trace:
>> INFO: task irq/122-pciehp:209 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:irq/122-pciehp  state:D stack:0     pid:209   tgid:209   ppid:2
>>    flags:0x00004000
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  unregister_netdevice_notifier+0x40/0x150
>>  netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev c68f5c964fe428d1a2169816a653c62dba2f2e01]
>>  led_trigger_set+0x102/0x330
>>  led_classdev_unregister+0x4b/0x110
>>  release_nodes+0x3d/0xb0
>>  devres_release_all+0x8b/0xc0
>>  device_del+0x34f/0x3c0
>>  unregister_netdevice_many_notify+0x80b/0xaf0
>>  unregister_netdev+0x7c/0xd0
>>  igc_remove+0xd8/0x1e0 [igc d1bcf7b726f7370e167c72960cdb27ae7f970357]
>>  pci_device_remove+0x3f/0xb0
>>  device_release_driver_internal+0x1be/0x2d0
>>  pci_stop_bus_device+0x68/0xa0
>>  pci_stop_bus_device+0x39/0xa0
>>  pci_stop_bus_device+0x39/0xa0
>>  pciehp_unconfigure_device+0x12b/0x1d0
>>  pciehp_disable_slot+0x65/0x120
>>  pciehp_handle_presence_or_link_change+0x7a/0x450
>>  pciehp_ist+0xf5/0x320
>>  irq_thread_fn+0x1d/0x40
>>  irq_thread+0x19b/0x260
>>  kthread+0x147/0x160
>>  ret_from_fork+0x34/0x40
>>  ret_from_fork_asm+0x11/0x20
>>  </TASK>
>>
>> Other affected kernel threads
>> INFO: task NetworkManager:1294 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:NetworkManager  state:D stack:0     pid:1294  tgid:1294  ppid:1
>>    flags:0x00000002
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  netlink_dump+0x1c4/0x3f0
>>  __netlink_dump_start+0x2b3/0x340
>>  rtnetlink_rcv_msg+0x469/0x4a0
>>  netlink_rcv_skb+0xed/0x120
>>  netlink_unicast+0x2ce/0x3f0
>>  netlink_sendmsg+0x39c/0x450
>>  ____sys_sendmsg+0x1a5/0x2a0
>>  ___sys_sendmsg+0x293/0x2d0
>>  __x64_sys_sendmsg+0x10d/0x140
>>  do_syscall_64+0x92/0x170
>>  entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x7971ac52c02b
>> RSP: 002b:00007ffc684c09a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 00005661e9bc5be0 RCX: 00007971ac52c02b
>> RDX: 0000000000000000 RSI: 00007ffc684c09e0 RDI: 000000000000000d
>> RBP: 00007ffc684c09c0 R08: 0000000000000000 R09: 0000000000000001
>> R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000001
>> R13: 0000000000000000 R14: 00005661e9c45030 R15: 00005661e9bc5cac
>>  </TASK>
>> INFO: task geoclue:2325 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:geoclue         state:D stack:0     pid:2325  tgid:2325  ppid:1
>>    flags:0x00000002
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  netlink_dump+0x1c4/0x3f0
>>  __netlink_dump_start+0x2b3/0x340
>>  rtnetlink_rcv_msg+0x469/0x4a0
>>  netlink_rcv_skb+0xed/0x120
>>  netlink_unicast+0x2ce/0x3f0
>>  netlink_sendmsg+0x39c/0x450
>>  __sys_sendto+0x2c8/0x350
>>  __x64_sys_sendto+0x26/0x30
>>  do_syscall_64+0x92/0x170
>>  entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x7ad712b2beea
>> RSP: 002b:00007fff94c1fd80 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ad712b2beea
>> RDX: 0000000000000014 RSI: 00007fff94c1fe10 RDI: 0000000000000007
>> RBP: 00007fff94c1fdb0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000004000 R11: 0000000000000246 R12: 00007fff94c1fe10
>> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
>>  </TASK>
>> INFO: task pool-geoclue:84396 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:pool-geoclue    state:D stack:0     pid:84396 tgid:2325  ppid:1
>>    flags:0x00000002
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  netlink_dump+0x1c4/0x3f0
>>  __netlink_dump_start+0x2b3/0x340
>>  rtnetlink_rcv_msg+0x469/0x4a0
>>  netlink_rcv_skb+0xed/0x120
>>  netlink_unicast+0x2ce/0x3f0
>>  netlink_sendmsg+0x39c/0x450
>>  __sys_sendto+0x2c8/0x350
>>  __x64_sys_sendto+0x26/0x30
>>  do_syscall_64+0x92/0x170
>>  entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x7ad712b2c0e4
>> RSP: 002b:00007ad6e7dfdf40 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ad712b2c0e4
>> RDX: 0000000000000014 RSI: 00007ad6e7dff070 RDI: 000000000000000b
>> RBP: 00007ad6e7dfdf80 R08: 00007ad6e7dff014 R09: 000000000000000c
>> R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000b
>> R13: 0000000000000010 R14: 00007ad6e7dff030 R15: 00000000d3fb1bea
>>  </TASK>
>> INFO: task Qt bearer threa:4002 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:Qt bearer threa state:D stack:0     pid:4002  tgid:3506
>> ppid:3034   flags:0x00000002
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  netlink_dump+0x1c4/0x3f0
>>  __netlink_dump_start+0x2b3/0x340
>>  rtnetlink_rcv_msg+0x469/0x4a0
>>  netlink_rcv_skb+0xed/0x120
>>  netlink_unicast+0x2ce/0x3f0
>>  netlink_sendmsg+0x39c/0x450
>>  __sys_sendto+0x2c8/0x350
>>  __x64_sys_sendto+0x26/0x30
>>  do_syscall_64+0x92/0x170
>>  entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x76f3c692beea
>> RSP: 002b:000076f3a51fecb0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000076f3c692beea
>> RDX: 0000000000000020 RSI: 000076f3a51fed60 RDI: 0000000000000023
>> RBP: 000076f3a51fece0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 000076f3a51fee38
>> R13: 000076f378026b30 R14: 000076f3a51fed30 R15: 000076f378026b48
>>  </TASK>
>> INFO: task gnome-software:3529 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:gnome-software  state:D stack:0     pid:3529  tgid:3529
>> ppid:3034   flags:0x00000002
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  netlink_dump+0x1c4/0x3f0
>>  __netlink_dump_start+0x2b3/0x340
>>  rtnetlink_rcv_msg+0x469/0x4a0
>>  netlink_rcv_skb+0xed/0x120
>>  netlink_unicast+0x2ce/0x3f0
>>  netlink_sendmsg+0x39c/0x450
>>  __sys_sendto+0x2c8/0x350
>>  __x64_sys_sendto+0x26/0x30
>>  do_syscall_64+0x92/0x170
>>  entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x7d6be892beea
>> RSP: 002b:00007ffd94e01560 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007d6be892beea
>> RDX: 0000000000000014 RSI: 00007ffd94e015f0 RDI: 000000000000000d
>> RBP: 00007ffd94e01590 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000004000 R11: 0000000000000246 R12: 00007ffd94e015f0
>> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
>>  </TASK>
>> INFO: task Qt bearer threa:3960 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:Qt bearer threa state:D stack:0     pid:3960  tgid:3550
>> ppid:3034   flags:0x00000002
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  netlink_dump+0x1c4/0x3f0
>>  __netlink_dump_start+0x2b3/0x340
>>  rtnetlink_rcv_msg+0x469/0x4a0
>>  netlink_rcv_skb+0xed/0x120
>>  netlink_unicast+0x2ce/0x3f0
>>  netlink_sendmsg+0x39c/0x450
>>  __sys_sendto+0x2c8/0x350
>>  __x64_sys_sendto+0x26/0x30
>>  do_syscall_64+0x92/0x170
>>  entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x777a42b2beea
>> RSP: 002b:0000777a2abfecf0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000777a42b2beea
>> RDX: 0000000000000020 RSI: 0000777a2abfeda0 RDI: 000000000000001d
>> RBP: 0000777a2abfed20 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000777a2abfee78
>> R13: 0000777a080285b0 R14: 0000777a2abfed70 R15: 0000777a080285c8
>>  </TASK>
>> INFO: task xdg-desktop-por:3821 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:xdg-desktop-por state:D stack:0     pid:3821  tgid:3821
>> ppid:2776   flags:0x00000002
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  netlink_dump+0x1c4/0x3f0
>>  __netlink_dump_start+0x2b3/0x340
>>  rtnetlink_rcv_msg+0x469/0x4a0
>>  netlink_rcv_skb+0xed/0x120
>>  netlink_unicast+0x2ce/0x3f0
>>  netlink_sendmsg+0x39c/0x450
>>  __sys_sendto+0x2c8/0x350
>>  __x64_sys_sendto+0x26/0x30
>>  do_syscall_64+0x92/0x170
>>  entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x79d76612beea
>> RSP: 002b:00007ffd480942a0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000079d76612beea
>> RDX: 0000000000000014 RSI: 00007ffd48094330 RDI: 0000000000000008
>> RBP: 00007ffd480942d0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000004000 R11: 0000000000000246 R12: 00007ffd48094330
>> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
>>  </TASK>
>> INFO: task DNS Res~ver #11:25588 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:DNS Res~ver #11 state:D stack:0     pid:25588 tgid:4934
>> ppid:3070   flags:0x00000002
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  netlink_dump+0x1c4/0x3f0
>>  __netlink_dump_start+0x2b3/0x340
>>  rtnetlink_rcv_msg+0x469/0x4a0
>>  netlink_rcv_skb+0xed/0x120
>>  netlink_unicast+0x2ce/0x3f0
>>  netlink_sendmsg+0x39c/0x450
>>  __sys_sendto+0x2c8/0x350
>>  __x64_sys_sendto+0x26/0x30
>>  do_syscall_64+0x92/0x170
>>  entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x72d65892c0e4
>> RSP: 002b:000072d649cbb880 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000072d65892c0e4
>> RDX: 0000000000000014 RSI: 000072d649cbc9b0 RDI: 0000000000000053
>> RBP: 000072d649cbb8c0 R08: 000072d649cbc954 R09: 000000000000000c
>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000053
>> R13: 0000000000000010 R14: 000072d649cbc970 R15: 00000000b48fd654
>>  </TASK>
>> INFO: task kworker/u88:2:31385 blocked for more than 120 seconds.
>>       Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/u88:2   state:D stack:0     pid:31385 tgid:31385 ppid:2
>>    flags:0x00004000
>> Workqueue: ipv6_addrconf addrconf_verify_work
>> Call Trace:
>>  <TASK>
>>  __schedule+0x5dd/0x1380
>>  schedule+0x6e/0xf0
>>  schedule_preempt_disabled+0x15/0x20
>>  __mutex_lock+0x2a0/0x750
>>  addrconf_verify_work+0x20/0x30
>>  process_scheduled_works+0x1f4/0x450
>>  worker_thread+0x349/0x5e0
>>  kthread+0x147/0x160
>>  ret_from_fork+0x34/0x40
>>  ret_from_fork_asm+0x11/0x20
>>  </TASK>
> 

It's unfortunate that the device-managed LED is bound to the netdev's
device: per the trace above, it is then released from device_del()
while rtnl_lock() is still held. Wouldn't binding it to the parent
(&pdev->dev) solve the issue?
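
To make the ordering question concrete, here is a minimal userspace
sketch of the two teardown orders. It is not kernel code: every
model_* name is hypothetical, a pthread mutex stands in for
rtnl_lock(), and it assumes the parent's devres release runs only
after unregister_netdev() has returned. The mutex is created with the
error-checking type so that the nested acquisition reports EDEADLK
instead of hanging the way the kernel mutex does.

```c
/* Userspace model of the lock recursion; NOT kernel code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the kernel's rtnl mutex. */
static pthread_mutex_t model_rtnl;

static void model_rtnl_init(void)
{
	pthread_mutexattr_t attr;
	int rc;

	pthread_mutexattr_init(&attr);
	/*
	 * Error-checking type: a second lock by the owning thread
	 * returns EDEADLK rather than blocking forever.
	 */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	rc = pthread_mutex_init(&model_rtnl, &attr);
	assert(rc == 0);
	(void)rc;
	pthread_mutexattr_destroy(&attr);
}

/* Models unregister_netdevice_notifier(): it takes the lock itself. */
static int model_led_teardown(void)
{
	int rc = pthread_mutex_lock(&model_rtnl);

	if (rc == 0)
		pthread_mutex_unlock(&model_rtnl);
	return rc;
}

/* LED tied to the netdev: torn down while the lock is still held. */
int model_remove_led_on_netdev(void)
{
	int rc;

	model_rtnl_init();
	pthread_mutex_lock(&model_rtnl);	/* unregister_netdev() */
	rc = model_led_teardown();		/* device_del() -> devres */
	pthread_mutex_unlock(&model_rtnl);
	pthread_mutex_destroy(&model_rtnl);
	return rc;
}

/* LED tied to the parent: torn down after the lock is dropped. */
int model_remove_led_on_parent(void)
{
	int rc;

	model_rtnl_init();
	pthread_mutex_lock(&model_rtnl);	/* unregister_netdev() */
	pthread_mutex_unlock(&model_rtnl);
	rc = model_led_teardown();		/* assumed later devres run */
	pthread_mutex_destroy(&model_rtnl);
	return rc;
}
```

In this model, model_remove_led_on_netdev() returns EDEADLK and
model_remove_led_on_parent() returns 0. Whether the real driver can
rely on the second ordering depends on the assumption stated above,
and on the LED not touching the netdev after it is gone.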


Thread overview: 7+ messages
2024-04-05  9:14 Deadlock in pciehp on dock disconnect Roman Lozko
2024-04-05 10:02 ` Lukas Wunner
2024-04-05 13:31   ` Heiner Kallweit [this message]
2024-04-05 17:48     ` Lukas Wunner
2024-04-05 19:01       ` Heiner Kallweit
2024-04-05 19:16       ` Lukas Wunner
2024-04-05 20:16         ` Heiner Kallweit