* Re: Deadlock in pciehp on dock disconnect
[not found] <CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com>
@ 2024-04-05 10:02 ` Lukas Wunner
2024-04-05 12:59 ` vient
2024-04-05 13:31 ` Heiner Kallweit
0 siblings, 2 replies; 8+ messages in thread
From: Lukas Wunner @ 2024-04-05 10:02 UTC (permalink / raw)
To: Roman Lozko
Cc: linux-pci, Bjorn Helgaas, Dave Hansen, Sean Christopherson,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, Heiner Kallweit, Christian Marangi, Kurt Kanzenbach,
Jesse Brandeburg, Tony Nguyen, intel-wired-lan
[cc += netdev maintainers]
On Fri, Apr 05, 2024 at 11:14:01AM +0200, Roman Lozko wrote:
> Hi, I'm using HP G4 Thunderbolt docking station, and recently (?)
> kernel started to "partially" deadlock after disconnecting the dock
> station. This results in inability to turn network interfaces on or
> off, system can't reboot, `sudo` does not work (guess because it uses
> DNS).
>
> It started to occur ~two weeks ago, don't know why, I did not change
> anything at that time. First seen on 6.8.2, nothing changed with
> 6.9.0-rc2.
This is not a pciehp issue, it's a networking issue:
In the stacktrace you've provided below, the rtnl_lock() is acquired
recursively, which leads to the deadlock:
unregister_netdev() acquires rtnl_lock(), indirectly invokes
netdev_trig_deactivate() upon unregistering some LED, thereby
calling unregister_netdevice_notifier(), which tries to
acquire rtnl_lock() again.
From a quick look at the source files involved, this doesn't look
like something new, though I note LED support for igc was added
only recently with ea578703b03d ("igc: Add support for LEDs on
i225/i226"), which went into v6.9-rc1.
The other hanging tasks are simply waiting for rtnl_lock() as well.
> pciehp stack trace:
> INFO: task irq/122-pciehp:209 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:irq/122-pciehp state:D stack:0 pid:209 tgid:209 ppid:2
> flags:0x00004000
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> unregister_netdevice_notifier+0x40/0x150
> netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev c68f5c964fe428d1a2169816a653c62dba2f2e01]
> led_trigger_set+0x102/0x330
> led_classdev_unregister+0x4b/0x110
> release_nodes+0x3d/0xb0
> devres_release_all+0x8b/0xc0
> device_del+0x34f/0x3c0
> unregister_netdevice_many_notify+0x80b/0xaf0
> unregister_netdev+0x7c/0xd0
> igc_remove+0xd8/0x1e0 [igc d1bcf7b726f7370e167c72960cdb27ae7f970357]
> pci_device_remove+0x3f/0xb0
> device_release_driver_internal+0x1be/0x2d0
> pci_stop_bus_device+0x68/0xa0
> pci_stop_bus_device+0x39/0xa0
> pci_stop_bus_device+0x39/0xa0
> pciehp_unconfigure_device+0x12b/0x1d0
> pciehp_disable_slot+0x65/0x120
> pciehp_handle_presence_or_link_change+0x7a/0x450
> pciehp_ist+0xf5/0x320
> irq_thread_fn+0x1d/0x40
> irq_thread+0x19b/0x260
> kthread+0x147/0x160
> ret_from_fork+0x34/0x40
> ret_from_fork_asm+0x11/0x20
> </TASK>
>
> Other affected kernel threads
> INFO: task NetworkManager:1294 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:NetworkManager state:D stack:0 pid:1294 tgid:1294 ppid:1
> flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> netlink_dump+0x1c4/0x3f0
> __netlink_dump_start+0x2b3/0x340
> rtnetlink_rcv_msg+0x469/0x4a0
> netlink_rcv_skb+0xed/0x120
> netlink_unicast+0x2ce/0x3f0
> netlink_sendmsg+0x39c/0x450
> ____sys_sendmsg+0x1a5/0x2a0
> ___sys_sendmsg+0x293/0x2d0
> __x64_sys_sendmsg+0x10d/0x140
> do_syscall_64+0x92/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
> RIP: 0033:0x7971ac52c02b
> RSP: 002b:00007ffc684c09a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00005661e9bc5be0 RCX: 00007971ac52c02b
> RDX: 0000000000000000 RSI: 00007ffc684c09e0 RDI: 000000000000000d
> RBP: 00007ffc684c09c0 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000001
> R13: 0000000000000000 R14: 00005661e9c45030 R15: 00005661e9bc5cac
> </TASK>
> INFO: task geoclue:2325 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:geoclue state:D stack:0 pid:2325 tgid:2325 ppid:1
> flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> netlink_dump+0x1c4/0x3f0
> __netlink_dump_start+0x2b3/0x340
> rtnetlink_rcv_msg+0x469/0x4a0
> netlink_rcv_skb+0xed/0x120
> netlink_unicast+0x2ce/0x3f0
> netlink_sendmsg+0x39c/0x450
> __sys_sendto+0x2c8/0x350
> __x64_sys_sendto+0x26/0x30
> do_syscall_64+0x92/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
> RIP: 0033:0x7ad712b2beea
> RSP: 002b:00007fff94c1fd80 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ad712b2beea
> RDX: 0000000000000014 RSI: 00007fff94c1fe10 RDI: 0000000000000007
> RBP: 00007fff94c1fdb0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000004000 R11: 0000000000000246 R12: 00007fff94c1fe10
> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> INFO: task pool-geoclue:84396 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:pool-geoclue state:D stack:0 pid:84396 tgid:2325 ppid:1
> flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> netlink_dump+0x1c4/0x3f0
> __netlink_dump_start+0x2b3/0x340
> rtnetlink_rcv_msg+0x469/0x4a0
> netlink_rcv_skb+0xed/0x120
> netlink_unicast+0x2ce/0x3f0
> netlink_sendmsg+0x39c/0x450
> __sys_sendto+0x2c8/0x350
> __x64_sys_sendto+0x26/0x30
> do_syscall_64+0x92/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
> RIP: 0033:0x7ad712b2c0e4
> RSP: 002b:00007ad6e7dfdf40 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ad712b2c0e4
> RDX: 0000000000000014 RSI: 00007ad6e7dff070 RDI: 000000000000000b
> RBP: 00007ad6e7dfdf80 R08: 00007ad6e7dff014 R09: 000000000000000c
> R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000b
> R13: 0000000000000010 R14: 00007ad6e7dff030 R15: 00000000d3fb1bea
> </TASK>
> INFO: task Qt bearer threa:4002 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:Qt bearer threa state:D stack:0 pid:4002 tgid:3506
> ppid:3034 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> netlink_dump+0x1c4/0x3f0
> __netlink_dump_start+0x2b3/0x340
> rtnetlink_rcv_msg+0x469/0x4a0
> netlink_rcv_skb+0xed/0x120
> netlink_unicast+0x2ce/0x3f0
> netlink_sendmsg+0x39c/0x450
> __sys_sendto+0x2c8/0x350
> __x64_sys_sendto+0x26/0x30
> do_syscall_64+0x92/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
> RIP: 0033:0x76f3c692beea
> RSP: 002b:000076f3a51fecb0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000076f3c692beea
> RDX: 0000000000000020 RSI: 000076f3a51fed60 RDI: 0000000000000023
> RBP: 000076f3a51fece0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 000076f3a51fee38
> R13: 000076f378026b30 R14: 000076f3a51fed30 R15: 000076f378026b48
> </TASK>
> INFO: task gnome-software:3529 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:gnome-software state:D stack:0 pid:3529 tgid:3529
> ppid:3034 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> netlink_dump+0x1c4/0x3f0
> __netlink_dump_start+0x2b3/0x340
> rtnetlink_rcv_msg+0x469/0x4a0
> netlink_rcv_skb+0xed/0x120
> netlink_unicast+0x2ce/0x3f0
> netlink_sendmsg+0x39c/0x450
> __sys_sendto+0x2c8/0x350
> __x64_sys_sendto+0x26/0x30
> do_syscall_64+0x92/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
> RIP: 0033:0x7d6be892beea
> RSP: 002b:00007ffd94e01560 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007d6be892beea
> RDX: 0000000000000014 RSI: 00007ffd94e015f0 RDI: 000000000000000d
> RBP: 00007ffd94e01590 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000004000 R11: 0000000000000246 R12: 00007ffd94e015f0
> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> INFO: task Qt bearer threa:3960 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:Qt bearer threa state:D stack:0 pid:3960 tgid:3550
> ppid:3034 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> netlink_dump+0x1c4/0x3f0
> __netlink_dump_start+0x2b3/0x340
> rtnetlink_rcv_msg+0x469/0x4a0
> netlink_rcv_skb+0xed/0x120
> netlink_unicast+0x2ce/0x3f0
> netlink_sendmsg+0x39c/0x450
> __sys_sendto+0x2c8/0x350
> __x64_sys_sendto+0x26/0x30
> do_syscall_64+0x92/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
> RIP: 0033:0x777a42b2beea
> RSP: 002b:0000777a2abfecf0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000777a42b2beea
> RDX: 0000000000000020 RSI: 0000777a2abfeda0 RDI: 000000000000001d
> RBP: 0000777a2abfed20 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000777a2abfee78
> R13: 0000777a080285b0 R14: 0000777a2abfed70 R15: 0000777a080285c8
> </TASK>
> INFO: task xdg-desktop-por:3821 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:xdg-desktop-por state:D stack:0 pid:3821 tgid:3821
> ppid:2776 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> netlink_dump+0x1c4/0x3f0
> __netlink_dump_start+0x2b3/0x340
> rtnetlink_rcv_msg+0x469/0x4a0
> netlink_rcv_skb+0xed/0x120
> netlink_unicast+0x2ce/0x3f0
> netlink_sendmsg+0x39c/0x450
> __sys_sendto+0x2c8/0x350
> __x64_sys_sendto+0x26/0x30
> do_syscall_64+0x92/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
> RIP: 0033:0x79d76612beea
> RSP: 002b:00007ffd480942a0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000079d76612beea
> RDX: 0000000000000014 RSI: 00007ffd48094330 RDI: 0000000000000008
> RBP: 00007ffd480942d0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000004000 R11: 0000000000000246 R12: 00007ffd48094330
> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> INFO: task DNS Res~ver #11:25588 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:DNS Res~ver #11 state:D stack:0 pid:25588 tgid:4934
> ppid:3070 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> netlink_dump+0x1c4/0x3f0
> __netlink_dump_start+0x2b3/0x340
> rtnetlink_rcv_msg+0x469/0x4a0
> netlink_rcv_skb+0xed/0x120
> netlink_unicast+0x2ce/0x3f0
> netlink_sendmsg+0x39c/0x450
> __sys_sendto+0x2c8/0x350
> __x64_sys_sendto+0x26/0x30
> do_syscall_64+0x92/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
> RIP: 0033:0x72d65892c0e4
> RSP: 002b:000072d649cbb880 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000072d65892c0e4
> RDX: 0000000000000014 RSI: 000072d649cbc9b0 RDI: 0000000000000053
> RBP: 000072d649cbb8c0 R08: 000072d649cbc954 R09: 000000000000000c
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000053
> R13: 0000000000000010 R14: 000072d649cbc970 R15: 00000000b48fd654
> </TASK>
> INFO: task kworker/u88:2:31385 blocked for more than 120 seconds.
> Not tainted 6.9.0-rc2 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u88:2 state:D stack:0 pid:31385 tgid:31385 ppid:2
> flags:0x00004000
> Workqueue: ipv6_addrconf addrconf_verify_work
> Call Trace:
> <TASK>
> __schedule+0x5dd/0x1380
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> addrconf_verify_work+0x20/0x30
> process_scheduled_works+0x1f4/0x450
> worker_thread+0x349/0x5e0
> kthread+0x147/0x160
> ret_from_fork+0x34/0x40
> ret_from_fork_asm+0x11/0x20
> </TASK>
* Re: Re: Deadlock in pciehp on dock disconnect
2024-04-05 10:02 ` Deadlock in pciehp on dock disconnect Lukas Wunner
@ 2024-04-05 12:59 ` vient
2024-04-05 13:31 ` Heiner Kallweit
1 sibling, 0 replies; 8+ messages in thread
From: vient @ 2024-04-05 12:59 UTC (permalink / raw)
To: lukas; +Cc: netdev
I guess you are right about the 6.9-rc1 changes: I've booted into 6.8.2 once
more and the dock seems to disconnect fine there. So the problem appeared
after switching to 6.9.
* Re: Deadlock in pciehp on dock disconnect
2024-04-05 10:02 ` Deadlock in pciehp on dock disconnect Lukas Wunner
2024-04-05 12:59 ` vient
@ 2024-04-05 13:31 ` Heiner Kallweit
2024-04-05 17:48 ` Lukas Wunner
1 sibling, 1 reply; 8+ messages in thread
From: Heiner Kallweit @ 2024-04-05 13:31 UTC (permalink / raw)
To: Lukas Wunner, Roman Lozko
Cc: linux-pci, Bjorn Helgaas, Dave Hansen, Sean Christopherson,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, Christian Marangi, Kurt Kanzenbach, Jesse Brandeburg,
Tony Nguyen, intel-wired-lan
On 05.04.2024 12:02, Lukas Wunner wrote:
> [cc += netdev maintainers]
>
> On Fri, Apr 05, 2024 at 11:14:01AM +0200, Roman Lozko wrote:
>> Hi, I'm using HP G4 Thunderbolt docking station, and recently (?)
>> kernel started to "partially" deadlock after disconnecting the dock
>> station. This results in inability to turn network interfaces on or
>> off, system can't reboot, `sudo` does not work (guess because it uses
>> DNS).
>>
>> It started to occur ~two weeks ago, don't know why, I did not change
>> anything at that time. First seen on 6.8.2, nothing changed with
>> 6.9.0-rc2.
>
> This is not a pciehp issue, it's a networking issue:
>
> In the stacktrace you've provided below, the rtnl_lock() is acquired
> recursively, which leads to the deadlock:
>
> unregister_netdev() acquires rtnl_lock(), indirectly invokes
> netdev_trig_deactivate() upon unregistering some LED, thereby
> calling unregister_netdevice_notifier(), which tries to
> acquire rtnl_lock() again.
>
> From a quick look at the source files involved, this doesn't look
> like something new, though I note LED support for igc was added
> only recently with ea578703b03d ("igc: Add support for LEDs on
> i225/i226"), which went into v6.9-rc1.
>
> The other hanging tasks are simply waiting for rtnl_lock() as well.
>
>
>> pciehp stack trace:
>> INFO: task irq/122-pciehp:209 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:irq/122-pciehp state:D stack:0 pid:209 tgid:209 ppid:2
>> flags:0x00004000
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> unregister_netdevice_notifier+0x40/0x150
>> netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev c68f5c964fe428d1a2169816a653c62dba2f2e01]
>> led_trigger_set+0x102/0x330
>> led_classdev_unregister+0x4b/0x110
>> release_nodes+0x3d/0xb0
>> devres_release_all+0x8b/0xc0
>> device_del+0x34f/0x3c0
>> unregister_netdevice_many_notify+0x80b/0xaf0
>> unregister_netdev+0x7c/0xd0
>> igc_remove+0xd8/0x1e0 [igc d1bcf7b726f7370e167c72960cdb27ae7f970357]
>> pci_device_remove+0x3f/0xb0
>> device_release_driver_internal+0x1be/0x2d0
>> pci_stop_bus_device+0x68/0xa0
>> pci_stop_bus_device+0x39/0xa0
>> pci_stop_bus_device+0x39/0xa0
>> pciehp_unconfigure_device+0x12b/0x1d0
>> pciehp_disable_slot+0x65/0x120
>> pciehp_handle_presence_or_link_change+0x7a/0x450
>> pciehp_ist+0xf5/0x320
>> irq_thread_fn+0x1d/0x40
>> irq_thread+0x19b/0x260
>> kthread+0x147/0x160
>> ret_from_fork+0x34/0x40
>> ret_from_fork_asm+0x11/0x20
>> </TASK>
>>
>> Other affected kernel threads
>> INFO: task NetworkManager:1294 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:NetworkManager state:D stack:0 pid:1294 tgid:1294 ppid:1
>> flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> netlink_dump+0x1c4/0x3f0
>> __netlink_dump_start+0x2b3/0x340
>> rtnetlink_rcv_msg+0x469/0x4a0
>> netlink_rcv_skb+0xed/0x120
>> netlink_unicast+0x2ce/0x3f0
>> netlink_sendmsg+0x39c/0x450
>> ____sys_sendmsg+0x1a5/0x2a0
>> ___sys_sendmsg+0x293/0x2d0
>> __x64_sys_sendmsg+0x10d/0x140
>> do_syscall_64+0x92/0x170
>> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x7971ac52c02b
>> RSP: 002b:00007ffc684c09a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 00005661e9bc5be0 RCX: 00007971ac52c02b
>> RDX: 0000000000000000 RSI: 00007ffc684c09e0 RDI: 000000000000000d
>> RBP: 00007ffc684c09c0 R08: 0000000000000000 R09: 0000000000000001
>> R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000001
>> R13: 0000000000000000 R14: 00005661e9c45030 R15: 00005661e9bc5cac
>> </TASK>
>> INFO: task geoclue:2325 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:geoclue state:D stack:0 pid:2325 tgid:2325 ppid:1
>> flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> netlink_dump+0x1c4/0x3f0
>> __netlink_dump_start+0x2b3/0x340
>> rtnetlink_rcv_msg+0x469/0x4a0
>> netlink_rcv_skb+0xed/0x120
>> netlink_unicast+0x2ce/0x3f0
>> netlink_sendmsg+0x39c/0x450
>> __sys_sendto+0x2c8/0x350
>> __x64_sys_sendto+0x26/0x30
>> do_syscall_64+0x92/0x170
>> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x7ad712b2beea
>> RSP: 002b:00007fff94c1fd80 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ad712b2beea
>> RDX: 0000000000000014 RSI: 00007fff94c1fe10 RDI: 0000000000000007
>> RBP: 00007fff94c1fdb0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000004000 R11: 0000000000000246 R12: 00007fff94c1fe10
>> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
>> </TASK>
>> INFO: task pool-geoclue:84396 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:pool-geoclue state:D stack:0 pid:84396 tgid:2325 ppid:1
>> flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> netlink_dump+0x1c4/0x3f0
>> __netlink_dump_start+0x2b3/0x340
>> rtnetlink_rcv_msg+0x469/0x4a0
>> netlink_rcv_skb+0xed/0x120
>> netlink_unicast+0x2ce/0x3f0
>> netlink_sendmsg+0x39c/0x450
>> __sys_sendto+0x2c8/0x350
>> __x64_sys_sendto+0x26/0x30
>> do_syscall_64+0x92/0x170
>> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x7ad712b2c0e4
>> RSP: 002b:00007ad6e7dfdf40 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ad712b2c0e4
>> RDX: 0000000000000014 RSI: 00007ad6e7dff070 RDI: 000000000000000b
>> RBP: 00007ad6e7dfdf80 R08: 00007ad6e7dff014 R09: 000000000000000c
>> R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000000b
>> R13: 0000000000000010 R14: 00007ad6e7dff030 R15: 00000000d3fb1bea
>> </TASK>
>> INFO: task Qt bearer threa:4002 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:Qt bearer threa state:D stack:0 pid:4002 tgid:3506
>> ppid:3034 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> netlink_dump+0x1c4/0x3f0
>> __netlink_dump_start+0x2b3/0x340
>> rtnetlink_rcv_msg+0x469/0x4a0
>> netlink_rcv_skb+0xed/0x120
>> netlink_unicast+0x2ce/0x3f0
>> netlink_sendmsg+0x39c/0x450
>> __sys_sendto+0x2c8/0x350
>> __x64_sys_sendto+0x26/0x30
>> do_syscall_64+0x92/0x170
>> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x76f3c692beea
>> RSP: 002b:000076f3a51fecb0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000076f3c692beea
>> RDX: 0000000000000020 RSI: 000076f3a51fed60 RDI: 0000000000000023
>> RBP: 000076f3a51fece0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 000076f3a51fee38
>> R13: 000076f378026b30 R14: 000076f3a51fed30 R15: 000076f378026b48
>> </TASK>
>> INFO: task gnome-software:3529 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:gnome-software state:D stack:0 pid:3529 tgid:3529
>> ppid:3034 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> netlink_dump+0x1c4/0x3f0
>> __netlink_dump_start+0x2b3/0x340
>> rtnetlink_rcv_msg+0x469/0x4a0
>> netlink_rcv_skb+0xed/0x120
>> netlink_unicast+0x2ce/0x3f0
>> netlink_sendmsg+0x39c/0x450
>> __sys_sendto+0x2c8/0x350
>> __x64_sys_sendto+0x26/0x30
>> do_syscall_64+0x92/0x170
>> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x7d6be892beea
>> RSP: 002b:00007ffd94e01560 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007d6be892beea
>> RDX: 0000000000000014 RSI: 00007ffd94e015f0 RDI: 000000000000000d
>> RBP: 00007ffd94e01590 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000004000 R11: 0000000000000246 R12: 00007ffd94e015f0
>> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
>> </TASK>
>> INFO: task Qt bearer threa:3960 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:Qt bearer threa state:D stack:0 pid:3960 tgid:3550
>> ppid:3034 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> netlink_dump+0x1c4/0x3f0
>> __netlink_dump_start+0x2b3/0x340
>> rtnetlink_rcv_msg+0x469/0x4a0
>> netlink_rcv_skb+0xed/0x120
>> netlink_unicast+0x2ce/0x3f0
>> netlink_sendmsg+0x39c/0x450
>> __sys_sendto+0x2c8/0x350
>> __x64_sys_sendto+0x26/0x30
>> do_syscall_64+0x92/0x170
>> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x777a42b2beea
>> RSP: 002b:0000777a2abfecf0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000777a42b2beea
>> RDX: 0000000000000020 RSI: 0000777a2abfeda0 RDI: 000000000000001d
>> RBP: 0000777a2abfed20 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000777a2abfee78
>> R13: 0000777a080285b0 R14: 0000777a2abfed70 R15: 0000777a080285c8
>> </TASK>
>> INFO: task xdg-desktop-por:3821 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:xdg-desktop-por state:D stack:0 pid:3821 tgid:3821
>> ppid:2776 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> netlink_dump+0x1c4/0x3f0
>> __netlink_dump_start+0x2b3/0x340
>> rtnetlink_rcv_msg+0x469/0x4a0
>> netlink_rcv_skb+0xed/0x120
>> netlink_unicast+0x2ce/0x3f0
>> netlink_sendmsg+0x39c/0x450
>> __sys_sendto+0x2c8/0x350
>> __x64_sys_sendto+0x26/0x30
>> do_syscall_64+0x92/0x170
>> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x79d76612beea
>> RSP: 002b:00007ffd480942a0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000079d76612beea
>> RDX: 0000000000000014 RSI: 00007ffd48094330 RDI: 0000000000000008
>> RBP: 00007ffd480942d0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000004000 R11: 0000000000000246 R12: 00007ffd48094330
>> R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
>> </TASK>
>> INFO: task DNS Res~ver #11:25588 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:DNS Res~ver #11 state:D stack:0 pid:25588 tgid:4934
>> ppid:3070 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> netlink_dump+0x1c4/0x3f0
>> __netlink_dump_start+0x2b3/0x340
>> rtnetlink_rcv_msg+0x469/0x4a0
>> netlink_rcv_skb+0xed/0x120
>> netlink_unicast+0x2ce/0x3f0
>> netlink_sendmsg+0x39c/0x450
>> __sys_sendto+0x2c8/0x350
>> __x64_sys_sendto+0x26/0x30
>> do_syscall_64+0x92/0x170
>> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>> RIP: 0033:0x72d65892c0e4
>> RSP: 002b:000072d649cbb880 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000072d65892c0e4
>> RDX: 0000000000000014 RSI: 000072d649cbc9b0 RDI: 0000000000000053
>> RBP: 000072d649cbb8c0 R08: 000072d649cbc954 R09: 000000000000000c
>> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000053
>> R13: 0000000000000010 R14: 000072d649cbc970 R15: 00000000b48fd654
>> </TASK>
>> INFO: task kworker/u88:2:31385 blocked for more than 120 seconds.
>> Not tainted 6.9.0-rc2 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/u88:2 state:D stack:0 pid:31385 tgid:31385 ppid:2
>> flags:0x00004000
>> Workqueue: ipv6_addrconf addrconf_verify_work
>> Call Trace:
>> <TASK>
>> __schedule+0x5dd/0x1380
>> schedule+0x6e/0xf0
>> schedule_preempt_disabled+0x15/0x20
>> __mutex_lock+0x2a0/0x750
>> addrconf_verify_work+0x20/0x30
>> process_scheduled_works+0x1f4/0x450
>> worker_thread+0x349/0x5e0
>> kthread+0x147/0x160
>> ret_from_fork+0x34/0x40
>> ret_from_fork_asm+0x11/0x20
>> </TASK>
>
It's unfortunate that the device-managed LED is bound to the netdev device.
Wouldn't binding it to the parent (&pdev->dev) solve the issue?
* Re: Deadlock in pciehp on dock disconnect
2024-04-05 13:31 ` Heiner Kallweit
@ 2024-04-05 17:48 ` Lukas Wunner
2024-04-05 19:01 ` Heiner Kallweit
2024-04-05 19:16 ` Lukas Wunner
0 siblings, 2 replies; 8+ messages in thread
From: Lukas Wunner @ 2024-04-05 17:48 UTC (permalink / raw)
To: Heiner Kallweit
Cc: Roman Lozko, linux-pci, Bjorn Helgaas, Dave Hansen,
Sean Christopherson, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Christian Marangi,
Kurt Kanzenbach, Jesse Brandeburg, Tony Nguyen, intel-wired-lan
On Fri, Apr 05, 2024 at 03:31:34PM +0200, Heiner Kallweit wrote:
> On 05.04.2024 12:02, Lukas Wunner wrote:
> > On Fri, Apr 05, 2024 at 11:14:01AM +0200, Roman Lozko wrote:
> > > Hi, I'm using HP G4 Thunderbolt docking station, and recently (?)
> > > kernel started to "partially" deadlock after disconnecting the dock
> > > station. This results in inability to turn network interfaces on or
> > > off, system can't reboot, `sudo` does not work (guess because it uses
> > > DNS).
> >
> > unregister_netdev() acquires rtnl_lock(), indirectly invokes
> > netdev_trig_deactivate() upon unregistering some LED, thereby
> > calling unregister_netdevice_notifier(), which tries to
> > acquire rtnl_lock() again.
> >
> > From a quick look at the source files involved, this doesn't look
> > like something new, though I note LED support for igc was added
> > only recently with ea578703b03d ("igc: Add support for LEDs on
> > i225/i226"), which went into v6.9-rc1.
>
> It's unfortunate that the device-managed LED is bound to the netdev device.
> Wouldn't binding it to the parent (&pdev->dev) solve the issue?
I'm guessing igc commit ea578703b03d copy-pasted from r8169 commit
be51ed104ba9 ("r8169: add LED support for RTL8125/RTL8126") because
that driver has exactly the same problem. :)
Roman, does the below patch fix the issue?
Note that just changing the devm_led_classdev_register() call isn't
sufficient: I'm changing the devm_kcalloc() in igc_led_setup() as well
to avoid a use-after-free (the memory would already get freed on netdev
unregister, but the LED would only be unregistered a little later, on pdev unbind).
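[Editor's illustration, not part of the original mail.] The lifetime mismatch described above can be sketched with a toy model of device-managed resources. Assumptions: struct fake_dev and the fake_*() helpers are hypothetical and only mimic the fact that devres actions are released per device, in reverse order of registration. This is not the kernel's devres implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 4

typedef void (*release_fn)(void);

/* Hypothetical device with a small stack of release actions. */
struct fake_dev {
    release_fn actions[MAX_ACTIONS];
    size_t count;
};

static bool leds_mem_alive;
static bool led_teardown_after_free;

/* Models freeing the devm_kcalloc()'ed LED array. */
static void release_leds_mem(void)
{
    leds_mem_alive = false;
}

/* Models unregistering the LED class device, which still needs the array. */
static void release_led_classdev(void)
{
    if (!leds_mem_alive)
        led_teardown_after_free = true;
}

static void fake_devm_add(struct fake_dev *dev, release_fn fn)
{
    if (dev->count < MAX_ACTIONS)
        dev->actions[dev->count++] = fn;
}

/* Release in reverse order of registration. */
static void fake_devres_release_all(struct fake_dev *dev)
{
    while (dev->count > 0)
        dev->actions[--dev->count]();
}

/* Returns true if the LED teardown ran after its memory was freed. */
static bool teardown_hits_freed_mem(bool mem_on_pdev)
{
    struct fake_dev netdev = { .count = 0 };
    struct fake_dev pdev = { .count = 0 };

    leds_mem_alive = true;
    led_teardown_after_free = false;

    /* Allocation first, then LED registration, as in igc_led_setup(). */
    fake_devm_add(mem_on_pdev ? &pdev : &netdev, release_leds_mem);
    fake_devm_add(&pdev, release_led_classdev);

    fake_devres_release_all(&netdev); /* netdev unregister */
    fake_devres_release_all(&pdev);   /* pdev unbind, a little later */
    return led_teardown_after_free;
}
```

With the memory left on the netdev, the LED teardown runs after the free; with both bound to the pdev, the reverse-order release tears down the LED first. The sketch covers only this ordering argument and says nothing about whether the patch below is otherwise correct.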
-- >8 --
diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
index bf240c5..0b78c30 100644
--- a/drivers/net/ethernet/intel/igc/igc_leds.c
+++ b/drivers/net/ethernet/intel/igc/igc_leds.c
@@ -257,13 +257,13 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev,
led_cdev->hw_control_get = igc_led_hw_control_get;
led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
- devm_led_classdev_register(&netdev->dev, led_cdev);
+ devm_led_classdev_register(&adapter->pdev->dev, led_cdev);
}
int igc_led_setup(struct igc_adapter *adapter)
{
struct net_device *netdev = adapter->netdev;
- struct device *dev = &netdev->dev;
+ struct device *dev = &adapter->pdev->dev;
struct igc_led_classdev *leds;
int i;
* Re: Deadlock in pciehp on dock disconnect
2024-04-05 17:48 ` Lukas Wunner
@ 2024-04-05 19:01 ` Heiner Kallweit
2024-04-05 19:16 ` Lukas Wunner
1 sibling, 0 replies; 8+ messages in thread
From: Heiner Kallweit @ 2024-04-05 19:01 UTC (permalink / raw)
To: Lukas Wunner
Cc: Roman Lozko, linux-pci, Bjorn Helgaas, Dave Hansen,
Sean Christopherson, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Christian Marangi,
Kurt Kanzenbach, Jesse Brandeburg, Tony Nguyen, intel-wired-lan
On 05.04.2024 19:48, Lukas Wunner wrote:
> On Fri, Apr 05, 2024 at 03:31:34PM +0200, Heiner Kallweit wrote:
>> On 05.04.2024 12:02, Lukas Wunner wrote:
>>> On Fri, Apr 05, 2024 at 11:14:01AM +0200, Roman Lozko wrote:
>>>> Hi, I'm using HP G4 Thunderbolt docking station, and recently (?)
>>>> kernel started to "partially" deadlock after disconnecting the dock
>>>> station. This results in inability to turn network interfaces on or
>>>> off, system can't reboot, `sudo` does not work (guess because it uses
>>>> DNS).
>>>
>>> unregister_netdev() acquires rtnl_lock(), indirectly invokes
>>> netdev_trig_deactivate() upon unregistering some LED, thereby
>>> calling unregister_netdevice_notifier(), which tries to
>>> acquire rtnl_lock() again.
>>>
>>> From a quick look at the source files involved, this doesn't look
>>> like something new, though I note LED support for igc was added
>>> only recently with ea578703b03d ("igc: Add support for LEDs on
>>> i225/i226"), which went into v6.9-rc1.
>>
>> It's unfortunate that the device-managed LED is bound to the netdev device.
>> Wouldn't binding it to the parent (&pdev->dev) solve the issue?
>
> I'm guessing igc commit ea578703b03d copy-pasted from r8169 commit
> be51ed104ba9 ("r8169: add LED support for RTL8125/RTL8126") because
> that driver has exactly the same problem. :)
>
Right, just tested it for r8169 and got a similar lockdep error.
> Roman, does the below patch fix the issue?
>
> Note that just changing the devm_led_classdev_register() call isn't
> sufficient: I'm changing the devm_kcalloc() in igc_led_setup() as well
> to avoid a use-after-free (memory would already get freed on netdev
> unregister, but the LED would only be unregistered a little later, on
> pdev unbind).
>
> -- >8 --
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
> index bf240c5..0b78c30 100644
> --- a/drivers/net/ethernet/intel/igc/igc_leds.c
> +++ b/drivers/net/ethernet/intel/igc/igc_leds.c
> @@ -257,13 +257,13 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev,
> led_cdev->hw_control_get = igc_led_hw_control_get;
> led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
>
> - devm_led_classdev_register(&netdev->dev, led_cdev);
> + devm_led_classdev_register(&adapter->pdev->dev, led_cdev);
> }
>
> int igc_led_setup(struct igc_adapter *adapter)
> {
> struct net_device *netdev = adapter->netdev;
> - struct device *dev = &netdev->dev;
> + struct device *dev = &adapter->pdev->dev;
> struct igc_led_classdev *leds;
> int i;
>
>
* Re: Deadlock in pciehp on dock disconnect
2024-04-05 17:48 ` Lukas Wunner
2024-04-05 19:01 ` Heiner Kallweit
@ 2024-04-05 19:16 ` Lukas Wunner
2024-04-05 20:16 ` Heiner Kallweit
2024-04-07 16:39 ` vient
1 sibling, 2 replies; 8+ messages in thread
From: Lukas Wunner @ 2024-04-05 19:16 UTC (permalink / raw)
To: Heiner Kallweit
Cc: Roman Lozko, linux-pci, Bjorn Helgaas, Dave Hansen,
Sean Christopherson, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Christian Marangi,
Kurt Kanzenbach, Jesse Brandeburg, Tony Nguyen, intel-wired-lan
On Fri, Apr 05, 2024 at 07:48:08PM +0200, Lukas Wunner wrote:
> Roman, does the below patch fix the issue?
Actually the patch in my previous e-mail was crap as the unregistering
of the LEDs happened after unbind of the pdev, i.e. after
igc_release_hw_control() and pci_disable_device().
The driver otherwise doesn't seem to be using devm_*() and with
devm_*() it's always all or nothing. A mix of devm_*() and manual
teardown is prone to ordering issues.
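To illustrate the ordering problem, here is a toy user-space model. It is only a sketch under stated assumptions: toy_devm_add_action(), toy_unbind() and the remove_*() callbacks are invented names, and the model merely mimics the fact that device-managed resources are released after the driver's remove callback has returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_ACTIONS 8
#define LOG_SIZE 128

typedef void (*release_fn)(void);

/* Toy devres stack: released in reverse order of registration. */
static release_fn actions[MAX_ACTIONS];
static size_t nr_actions;
static char teardown_log[LOG_SIZE];

/* Append a step name to the comma-separated teardown log. */
static void log_step(const char *step)
{
	size_t used = strlen(teardown_log);

	snprintf(teardown_log + used, LOG_SIZE - used, "%s%s",
		 used ? "," : "", step);
}

/* Hypothetical stand-in for registering a device-managed resource. */
static void toy_devm_add_action(release_fn fn)
{
	assert(nr_actions < MAX_ACTIONS);
	actions[nr_actions++] = fn;
}

static void led_unregister(void) { log_step("led_unregister"); }
static void disable_device(void) { log_step("disable_device"); }

/* Mixed style: the LED is device-managed, so remove() only
 * performs the manual hardware shutdown. */
static void remove_mixed(void)
{
	disable_device();
}

/* Fully manual style: remove() unregisters the LED first. */
static void remove_manual(void)
{
	led_unregister();
	disable_device();
}

/* Models unbind: run remove(), then release managed resources. */
static const char *toy_unbind(void (*remove)(void))
{
	teardown_log[0] = '\0';
	remove();
	while (nr_actions)
		actions[--nr_actions]();
	return teardown_log;
}
```

With led_unregister registered through toy_devm_add_action(), toy_unbind(remove_mixed) yields "disable_device,led_unregister": the LED outlives the device shutdown. With nothing device-managed, toy_unbind(remove_manual) yields "led_unregister,disable_device", which is the order the second patch aims for.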
Here's another attempt:
-- >8 --
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 90316dc58630..f9ffe9df9a96 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -298,6 +298,7 @@ struct igc_adapter {
/* LEDs */
struct mutex led_mutex;
+ struct igc_led_classdev *leds;
};
void igc_up(struct igc_adapter *adapter);
@@ -723,6 +724,7 @@ void igc_ptp_read(struct igc_adapter *adapter, struct timespec64 *ts);
void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter);
int igc_led_setup(struct igc_adapter *adapter);
+void igc_led_teardown(struct igc_adapter *adapter);
#define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring))
diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
index bf240c5daf86..4c2806c0878a 100644
--- a/drivers/net/ethernet/intel/igc/igc_leds.c
+++ b/drivers/net/ethernet/intel/igc/igc_leds.c
@@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf,
pci_dev_id(adapter->pdev), index);
}
-static void igc_setup_ldev(struct igc_led_classdev *ldev,
- struct net_device *netdev, int index)
+static int igc_setup_ldev(struct igc_led_classdev *ldev,
+ struct net_device *netdev, int index)
{
struct igc_adapter *adapter = netdev_priv(netdev);
struct led_classdev *led_cdev = &ldev->led;
@@ -257,15 +257,15 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev,
led_cdev->hw_control_get = igc_led_hw_control_get;
led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
- devm_led_classdev_register(&netdev->dev, led_cdev);
+ return led_classdev_register(&netdev->dev, led_cdev);
}
int igc_led_setup(struct igc_adapter *adapter)
{
struct net_device *netdev = adapter->netdev;
- struct device *dev = &netdev->dev;
+ struct device *dev = &adapter->pdev->dev;
struct igc_led_classdev *leds;
- int i;
+ int i, ret;
mutex_init(&adapter->led_mutex);
@@ -273,8 +273,27 @@ int igc_led_setup(struct igc_adapter *adapter)
if (!leds)
return -ENOMEM;
- for (i = 0; i < IGC_NUM_LEDS; i++)
- igc_setup_ldev(leds + i, netdev, i);
+ for (i = 0; i < IGC_NUM_LEDS; i++) {
+ ret = igc_setup_ldev(leds + i, netdev, i);
+ if (ret)
+ goto err;
+ }
+
+ adapter->leds = leds;
return 0;
+
+err:
+ for (i--; i >= 0; i--)
+ led_classdev_unregister(&((leds + i)->led));
+ return ret;
+}
+
+void igc_led_teardown(struct igc_adapter *adapter)
+{
+ struct igc_led_classdev *leds = adapter->leds;
+ int i;
+
+ for (i = 0; i < IGC_NUM_LEDS; i++)
+ led_classdev_unregister(&((leds + i)->led));
}
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 2e1cfbd82f4f..cd164442ab35 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -7025,6 +7025,9 @@ static void igc_remove(struct pci_dev *pdev)
cancel_work_sync(&adapter->watchdog_task);
hrtimer_cancel(&adapter->hrtimer);
+ if (IS_ENABLED(CONFIG_IGC_LEDS))
+ igc_led_teardown(adapter);
+
/* Release control of h/w to f/w. If f/w is AMT enabled, this
* would have already happened in close and is redundant.
*/
* Re: Deadlock in pciehp on dock disconnect
2024-04-05 19:16 ` Lukas Wunner
@ 2024-04-05 20:16 ` Heiner Kallweit
2024-04-07 16:39 ` vient
1 sibling, 0 replies; 8+ messages in thread
From: Heiner Kallweit @ 2024-04-05 20:16 UTC (permalink / raw)
To: Lukas Wunner
Cc: Roman Lozko, linux-pci, Bjorn Helgaas, Dave Hansen,
Sean Christopherson, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Christian Marangi,
Kurt Kanzenbach, Jesse Brandeburg, Tony Nguyen, intel-wired-lan
On 05.04.2024 21:16, Lukas Wunner wrote:
> On Fri, Apr 05, 2024 at 07:48:08PM +0200, Lukas Wunner wrote:
>> Roman, does the below patch fix the issue?
>
> Actually the patch in my previous e-mail was crap as the unregistering
> of the LEDs happened after unbind of the pdev, i.e. after
> igc_release_hw_control() and pci_disable_device().
>
For r8169 the first version is sufficient because everything is
device-managed.
> The driver otherwise doesn't seem to be using devm_*() and with
> devm_*() it's always all or nothing. A mix of devm_*() and manual
> teardown is prone to ordering issues.
>
> Here's another attempt:
>
> -- >8 --
>
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index 90316dc58630..f9ffe9df9a96 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -298,6 +298,7 @@ struct igc_adapter {
>
> /* LEDs */
> struct mutex led_mutex;
> + struct igc_led_classdev *leds;
> };
>
> void igc_up(struct igc_adapter *adapter);
> @@ -723,6 +724,7 @@ void igc_ptp_read(struct igc_adapter *adapter, struct timespec64 *ts);
> void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter);
>
> int igc_led_setup(struct igc_adapter *adapter);
> +void igc_led_teardown(struct igc_adapter *adapter);
>
> #define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring))
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
> index bf240c5daf86..4c2806c0878a 100644
> --- a/drivers/net/ethernet/intel/igc/igc_leds.c
> +++ b/drivers/net/ethernet/intel/igc/igc_leds.c
> @@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf,
> pci_dev_id(adapter->pdev), index);
> }
>
> -static void igc_setup_ldev(struct igc_led_classdev *ldev,
> - struct net_device *netdev, int index)
> +static int igc_setup_ldev(struct igc_led_classdev *ldev,
> + struct net_device *netdev, int index)
> {
> struct igc_adapter *adapter = netdev_priv(netdev);
> struct led_classdev *led_cdev = &ldev->led;
> @@ -257,15 +257,15 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev,
> led_cdev->hw_control_get = igc_led_hw_control_get;
> led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
>
> - devm_led_classdev_register(&netdev->dev, led_cdev);
> + return led_classdev_register(&netdev->dev, led_cdev);
> }
>
> int igc_led_setup(struct igc_adapter *adapter)
> {
> struct net_device *netdev = adapter->netdev;
> - struct device *dev = &netdev->dev;
> + struct device *dev = &adapter->pdev->dev;
> struct igc_led_classdev *leds;
> - int i;
> + int i, ret;
>
> mutex_init(&adapter->led_mutex);
>
> @@ -273,8 +273,27 @@ int igc_led_setup(struct igc_adapter *adapter)
> if (!leds)
> return -ENOMEM;
>
> - for (i = 0; i < IGC_NUM_LEDS; i++)
> - igc_setup_ldev(leds + i, netdev, i);
> + for (i = 0; i < IGC_NUM_LEDS; i++) {
> + ret = igc_setup_ldev(leds + i, netdev, i);
> + if (ret)
> + goto err;
> + }
> +
> + adapter->leds = leds;
>
> return 0;
> +
> +err:
> + for (i--; i >= 0; i--)
> + led_classdev_unregister(&((leds + i)->led));
> + return ret;
> +}
> +
> +void igc_led_teardown(struct igc_adapter *adapter)
> +{
> + struct igc_led_classdev *leds = adapter->leds;
> + int i;
> +
> + for (i = 0; i < IGC_NUM_LEDS; i++)
> + led_classdev_unregister(&((leds + i)->led));
> }
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 2e1cfbd82f4f..cd164442ab35 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -7025,6 +7025,9 @@ static void igc_remove(struct pci_dev *pdev)
> cancel_work_sync(&adapter->watchdog_task);
> hrtimer_cancel(&adapter->hrtimer);
>
> + if (IS_ENABLED(CONFIG_IGC_LEDS))
> + igc_led_teardown(adapter);
> +
> /* Release control of h/w to f/w. If f/w is AMT enabled, this
> * would have already happened in close and is redundant.
> */
>
* Re: Deadlock in pciehp on dock disconnect
2024-04-05 19:16 ` Lukas Wunner
2024-04-05 20:16 ` Heiner Kallweit
@ 2024-04-07 16:39 ` vient
1 sibling, 0 replies; 8+ messages in thread
From: vient @ 2024-04-07 16:39 UTC (permalink / raw)
To: lukas; +Cc: netdev, hkallweit1
Did not notice second version until testing the first one.
First one worked, no more hangups.