* [PATCH net] igc: Fix LED-related deadlock on driver unbind
@ 2024-04-22 20:45 Tony Nguyen
2024-04-22 23:32 ` Jacob Keller
2024-04-25 3:40 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 11+ messages in thread
From: Tony Nguyen @ 2024-04-22 20:45 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, netdev
Cc: Lukas Wunner, anthony.l.nguyen, sasha.neftin, Roman Lozko,
Marek Marczykowski-Górecki, Kurt Kanzenbach, Heiner Kallweit,
Simon Horman, Naama Meir
From: Lukas Wunner <lukas@wunner.de>
Roman reports a deadlock on unplug of a Thunderbolt docking station
containing an Intel I225 Ethernet adapter.
The root cause is that led_classdev's for LEDs on the adapter are
registered such that they're device-managed by the netdev. That
results in recursive acquisition of the rtnl_lock() mutex on unplug:
When the driver calls unregister_netdev(), it acquires rtnl_lock(),
then frees the device-managed resources. Upon unregistering the LEDs,
netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
which tries to acquire rtnl_lock() again.
Avoid by using non-device-managed LED registration.
Stack trace for posterity:
schedule+0x6e/0xf0
schedule_preempt_disabled+0x15/0x20
__mutex_lock+0x2a0/0x750
unregister_netdevice_notifier+0x40/0x150
netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
led_trigger_set+0x102/0x330
led_classdev_unregister+0x4b/0x110
release_nodes+0x3d/0xb0
devres_release_all+0x8b/0xc0
device_del+0x34f/0x3c0
unregister_netdevice_many_notify+0x80b/0xaf0
unregister_netdev+0x7c/0xd0
igc_remove+0xd8/0x1e0 [igc]
pci_device_remove+0x3f/0xb0
Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Reported-by: Roman Lozko <lozko.roma@gmail.com>
Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
Reported-by: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igc/igc.h | 2 ++
drivers/net/ethernet/intel/igc/igc_leds.c | 38 ++++++++++++++++++-----
drivers/net/ethernet/intel/igc/igc_main.c | 3 ++
3 files changed, 35 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 90316dc58630..6bc56c7c181e 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -298,6 +298,7 @@ struct igc_adapter {
/* LEDs */
struct mutex led_mutex;
+ struct igc_led_classdev *leds;
};
void igc_up(struct igc_adapter *adapter);
@@ -723,6 +724,7 @@ void igc_ptp_read(struct igc_adapter *adapter, struct timespec64 *ts);
void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter);
int igc_led_setup(struct igc_adapter *adapter);
+void igc_led_free(struct igc_adapter *adapter);
#define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring))
diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
index bf240c5daf86..3929b25b6ae6 100644
--- a/drivers/net/ethernet/intel/igc/igc_leds.c
+++ b/drivers/net/ethernet/intel/igc/igc_leds.c
@@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf,
pci_dev_id(adapter->pdev), index);
}
-static void igc_setup_ldev(struct igc_led_classdev *ldev,
- struct net_device *netdev, int index)
+static int igc_setup_ldev(struct igc_led_classdev *ldev,
+ struct net_device *netdev, int index)
{
struct igc_adapter *adapter = netdev_priv(netdev);
struct led_classdev *led_cdev = &ldev->led;
@@ -257,24 +257,46 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev,
led_cdev->hw_control_get = igc_led_hw_control_get;
led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
- devm_led_classdev_register(&netdev->dev, led_cdev);
+ return led_classdev_register(&netdev->dev, led_cdev);
}
int igc_led_setup(struct igc_adapter *adapter)
{
struct net_device *netdev = adapter->netdev;
- struct device *dev = &netdev->dev;
struct igc_led_classdev *leds;
- int i;
+ int i, err;
mutex_init(&adapter->led_mutex);
- leds = devm_kcalloc(dev, IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
+ leds = kcalloc(IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
if (!leds)
return -ENOMEM;
- for (i = 0; i < IGC_NUM_LEDS; i++)
- igc_setup_ldev(leds + i, netdev, i);
+ for (i = 0; i < IGC_NUM_LEDS; i++) {
+ err = igc_setup_ldev(leds + i, netdev, i);
+ if (err)
+ goto err;
+ }
+
+ adapter->leds = leds;
return 0;
+
+err:
+ for (i--; i >= 0; i--)
+ led_classdev_unregister(&((leds + i)->led));
+
+ kfree(leds);
+ return err;
+}
+
+void igc_led_free(struct igc_adapter *adapter)
+{
+ struct igc_led_classdev *leds = adapter->leds;
+ int i;
+
+ for (i = 0; i < IGC_NUM_LEDS; i++)
+ led_classdev_unregister(&((leds + i)->led));
+
+ kfree(leds);
}
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 35ad40a803cb..4d975d620a8e 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -7021,6 +7021,9 @@ static void igc_remove(struct pci_dev *pdev)
cancel_work_sync(&adapter->watchdog_task);
hrtimer_cancel(&adapter->hrtimer);
+ if (IS_ENABLED(CONFIG_IGC_LEDS))
+ igc_led_free(adapter);
+
/* Release control of h/w to f/w. If f/w is AMT enabled, this
* would have already happened in close and is redundant.
*/
--
2.41.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-22 20:45 [PATCH net] igc: Fix LED-related deadlock on driver unbind Tony Nguyen
@ 2024-04-22 23:32 ` Jacob Keller
2024-04-22 23:37 ` Marek Marczykowski-Górecki
2024-04-23 7:53 ` Lukas Wunner
2024-04-25 3:40 ` patchwork-bot+netdevbpf
1 sibling, 2 replies; 11+ messages in thread
From: Jacob Keller @ 2024-04-22 23:32 UTC (permalink / raw)
To: Tony Nguyen, davem, kuba, pabeni, edumazet, netdev
Cc: Lukas Wunner, sasha.neftin, Roman Lozko,
Marek Marczykowski-Górecki, Kurt Kanzenbach, Heiner Kallweit,
Simon Horman, Naama Meir
On 4/22/2024 1:45 PM, Tony Nguyen wrote:
> From: Lukas Wunner <lukas@wunner.de>
>
> Roman reports a deadlock on unplug of a Thunderbolt docking station
> containing an Intel I225 Ethernet adapter.
>
> The root cause is that led_classdev's for LEDs on the adapter are
> registered such that they're device-managed by the netdev. That
> results in recursive acquisition of the rtnl_lock() mutex on unplug:
>
> When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> then frees the device-managed resources. Upon unregistering the LEDs,
> netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> which tries to acquire rtnl_lock() again.
>
> Avoid by using non-device-managed LED registration.
>
Could we instead switch to using devm with the PCI device struct instead
of the netdev struct? That would make it still get automatically cleaned
up, but by cleaning it up only when the PCIe device goes away, which
should be after rtnl_lock() is released..
I guess I don't have an objection to this patch in principle since it
does resolve the issue, but it seems like it would be simpler to switch
which device managed the resources vs re-adding the manual handling.
Thanks,
Jake
> Stack trace for posterity:
>
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> unregister_netdevice_notifier+0x40/0x150
> netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
> led_trigger_set+0x102/0x330
> led_classdev_unregister+0x4b/0x110
> release_nodes+0x3d/0xb0
> devres_release_all+0x8b/0xc0
> device_del+0x34f/0x3c0
> unregister_netdevice_many_notify+0x80b/0xaf0
> unregister_netdev+0x7c/0xd0
> igc_remove+0xd8/0x1e0 [igc]
> pci_device_remove+0x3f/0xb0
>
> Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
> Reported-by: Roman Lozko <lozko.roma@gmail.com>
> Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
> Reported-by: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
> Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Reviewed-by: Simon Horman <horms@kernel.org>
> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
> Tested-by: Naama Meir <naamax.meir@linux.intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
> drivers/net/ethernet/intel/igc/igc.h | 2 ++
> drivers/net/ethernet/intel/igc/igc_leds.c | 38 ++++++++++++++++++-----
> drivers/net/ethernet/intel/igc/igc_main.c | 3 ++
> 3 files changed, 35 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index 90316dc58630..6bc56c7c181e 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -298,6 +298,7 @@ struct igc_adapter {
>
> /* LEDs */
> struct mutex led_mutex;
> + struct igc_led_classdev *leds;
> };
>
> void igc_up(struct igc_adapter *adapter);
> @@ -723,6 +724,7 @@ void igc_ptp_read(struct igc_adapter *adapter, struct timespec64 *ts);
> void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter);
>
> int igc_led_setup(struct igc_adapter *adapter);
> +void igc_led_free(struct igc_adapter *adapter);
>
> #define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring))
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
> index bf240c5daf86..3929b25b6ae6 100644
> --- a/drivers/net/ethernet/intel/igc/igc_leds.c
> +++ b/drivers/net/ethernet/intel/igc/igc_leds.c
> @@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf,
> pci_dev_id(adapter->pdev), index);
> }
>
> -static void igc_setup_ldev(struct igc_led_classdev *ldev,
> - struct net_device *netdev, int index)
> +static int igc_setup_ldev(struct igc_led_classdev *ldev,
> + struct net_device *netdev, int index)
> {
> struct igc_adapter *adapter = netdev_priv(netdev);
> struct led_classdev *led_cdev = &ldev->led;
> @@ -257,24 +257,46 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev,
> led_cdev->hw_control_get = igc_led_hw_control_get;
> led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
>
> - devm_led_classdev_register(&netdev->dev, led_cdev);
> + return led_classdev_register(&netdev->dev, led_cdev);
> }
>
> int igc_led_setup(struct igc_adapter *adapter)
> {
> struct net_device *netdev = adapter->netdev;
> - struct device *dev = &netdev->dev;
> struct igc_led_classdev *leds;
> - int i;
> + int i, err;
>
> mutex_init(&adapter->led_mutex);
>
> - leds = devm_kcalloc(dev, IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
> + leds = kcalloc(IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
> if (!leds)
> return -ENOMEM;
>
> - for (i = 0; i < IGC_NUM_LEDS; i++)
> - igc_setup_ldev(leds + i, netdev, i);
> + for (i = 0; i < IGC_NUM_LEDS; i++) {
> + err = igc_setup_ldev(leds + i, netdev, i);
> + if (err)
> + goto err;
> + }
> +
> + adapter->leds = leds;
>
> return 0;
> +
> +err:
> + for (i--; i >= 0; i--)
> + led_classdev_unregister(&((leds + i)->led));
> +
> + kfree(leds);
> + return err;
> +}
> +
> +void igc_led_free(struct igc_adapter *adapter)
> +{
> + struct igc_led_classdev *leds = adapter->leds;
> + int i;
> +
> + for (i = 0; i < IGC_NUM_LEDS; i++)
> + led_classdev_unregister(&((leds + i)->led));
> +
> + kfree(leds);
> }
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 35ad40a803cb..4d975d620a8e 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -7021,6 +7021,9 @@ static void igc_remove(struct pci_dev *pdev)
> cancel_work_sync(&adapter->watchdog_task);
> hrtimer_cancel(&adapter->hrtimer);
>
> + if (IS_ENABLED(CONFIG_IGC_LEDS))
> + igc_led_free(adapter);
> +
> /* Release control of h/w to f/w. If f/w is AMT enabled, this
> * would have already happened in close and is redundant.
> */
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-22 23:32 ` Jacob Keller
@ 2024-04-22 23:37 ` Marek Marczykowski-Górecki
2024-04-22 23:46 ` Jacob Keller
2024-04-23 7:53 ` Lukas Wunner
1 sibling, 1 reply; 11+ messages in thread
From: Marek Marczykowski-Górecki @ 2024-04-22 23:37 UTC (permalink / raw)
To: Jacob Keller
Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, netdev, Lukas Wunner,
sasha.neftin, Roman Lozko, Kurt Kanzenbach, Heiner Kallweit,
Simon Horman, Naama Meir
[-- Attachment #1: Type: text/plain, Size: 1318 bytes --]
On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote:
> On 4/22/2024 1:45 PM, Tony Nguyen wrote:
> > From: Lukas Wunner <lukas@wunner.de>
> >
> > Roman reports a deadlock on unplug of a Thunderbolt docking station
> > containing an Intel I225 Ethernet adapter.
> >
> > The root cause is that led_classdev's for LEDs on the adapter are
> > registered such that they're device-managed by the netdev. That
> > results in recursive acquisition of the rtnl_lock() mutex on unplug:
> >
> > When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> > then frees the device-managed resources. Upon unregistering the LEDs,
> > netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> > which tries to acquire rtnl_lock() again.
> >
> > Avoid by using non-device-managed LED registration.
>
> Could we instead switch to using devm with the PCI device struct instead
> of the netdev struct? That would make it still get automatically cleaned
> up, but by cleaning it up only when the PCIe device goes away, which
> should be after rtnl_lock() is released..
Wouldn't that effectively leak memory if driver is unbound from the
device and then bound back (and possibly repeated multiple times)?
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-22 23:37 ` Marek Marczykowski-Górecki
@ 2024-04-22 23:46 ` Jacob Keller
2024-04-23 8:08 ` Lukas Wunner
0 siblings, 1 reply; 11+ messages in thread
From: Jacob Keller @ 2024-04-22 23:46 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, netdev, Lukas Wunner,
sasha.neftin, Roman Lozko, Kurt Kanzenbach, Heiner Kallweit,
Simon Horman, Naama Meir
On 4/22/2024 4:37 PM, Marek Marczykowski-Górecki wrote:
> On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote:
>> On 4/22/2024 1:45 PM, Tony Nguyen wrote:
>>> From: Lukas Wunner <lukas@wunner.de>
>>>
>>> Roman reports a deadlock on unplug of a Thunderbolt docking station
>>> containing an Intel I225 Ethernet adapter.
>>>
>>> The root cause is that led_classdev's for LEDs on the adapter are
>>> registered such that they're device-managed by the netdev. That
>>> results in recursive acquisition of the rtnl_lock() mutex on unplug:
>>>
>>> When the driver calls unregister_netdev(), it acquires rtnl_lock(),
>>> then frees the device-managed resources. Upon unregistering the LEDs,
>>> netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
>>> which tries to acquire rtnl_lock() again.
>>>
>>> Avoid by using non-device-managed LED registration.
>>
>> Could we instead switch to using devm with the PCI device struct instead
>> of the netdev struct? That would make it still get automatically cleaned
>> up, but by cleaning it up only when the PCIe device goes away, which
>> should be after rtnl_lock() is released..
>
> Wouldn't that effectively leak memory if driver is unbound from the
> device and then bound back (and possibly repeated multiple times)?
>
My understanding of devm is that when you unload the driver it calls the
devm teardowns so you only leak until driver remove.
In the netdev case, you're releasing during unregister_netdev() instead
of at the end of the .remove() callback of the PCI driver.
To me, using devm from the PCI device should be equivalent to managing
it manually within the igc_remove() function.
I could be mis-understanding how devm works, or the order and flow for
how and when igc allocates these?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-22 23:46 ` Jacob Keller
@ 2024-04-23 8:08 ` Lukas Wunner
0 siblings, 0 replies; 11+ messages in thread
From: Lukas Wunner @ 2024-04-23 8:08 UTC (permalink / raw)
To: Jacob Keller
Cc: Marek Marczykowski-Górecki, Tony Nguyen, davem, kuba, pabeni,
edumazet, netdev, sasha.neftin, Roman Lozko, Kurt Kanzenbach,
Heiner Kallweit, Simon Horman, Naama Meir
On Mon, Apr 22, 2024 at 04:46:28PM -0700, Jacob Keller wrote:
> To me, using devm from the PCI device should be equivalent to managing
> it manually within the igc_remove() function.
It is not equivalent because the ordering is different:
igc_remove() is called before device-managed resources are released:
__device_release_driver()
device_remove() # invokes igc_remove()
device_unbind_cleanup()
devres_release_all() # releases device-managed resources
If you unregister LEDs explicitly in igc_remove() before unregistering
the netdev and disabling PCI device access, everything's fine.
If you instead use devm_led_classdev_register(), the LEDs would still
be registered and available in sysfs after igc_remove() has torn down
everything, which is bad.
You'd have to use devm_*() for all initialization steps in igc_probe()
to make this work. With devm_*() it's generally all or nothing.
(There are exceptions: Using devm_*() just for memory allocations is
fine as those can safely be freed even if everything else is torn down.)
Thanks,
Lukas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-22 23:32 ` Jacob Keller
2024-04-22 23:37 ` Marek Marczykowski-Górecki
@ 2024-04-23 7:53 ` Lukas Wunner
1 sibling, 0 replies; 11+ messages in thread
From: Lukas Wunner @ 2024-04-23 7:53 UTC (permalink / raw)
To: Jacob Keller
Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, netdev, sasha.neftin,
Roman Lozko, Marek Marczykowski-Górecki, Kurt Kanzenbach,
Heiner Kallweit, Simon Horman, Naama Meir
On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote:
> On 4/22/2024 1:45 PM, Tony Nguyen wrote:
> > Roman reports a deadlock on unplug of a Thunderbolt docking station
> > containing an Intel I225 Ethernet adapter.
> >
> > The root cause is that led_classdev's for LEDs on the adapter are
> > registered such that they're device-managed by the netdev. That
> > results in recursive acquisition of the rtnl_lock() mutex on unplug:
> >
> > When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> > then frees the device-managed resources. Upon unregistering the LEDs,
> > netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> > which tries to acquire rtnl_lock() again.
> >
> > Avoid by using non-device-managed LED registration.
> >
>
> Could we instead switch to using devm with the PCI device struct instead
> of the netdev struct?
No, unfortunately that doesn't work:
The unregistering of the LEDs would then happen after unbind of the
pci_dev, i.e. after igc_release_hw_control() and pci_disable_device().
The LED registers aren't even accessible at that point, but the LEDs
are still exposed in sysfs. I tried that approach but then realized
it's a mistake:
https://lore.kernel.org/all/ZhBN9p1yOyciXkzw@wunner.de/
Andrew Lunn concurred and wrote that "LEDs need to be added and
explicitly removed within the life cycle of the netdev":
https://lore.kernel.org/all/7cfb1af7-3270-447a-a2cf-16c2af02ec29@lunn.ch/
We'd have to convert the igc driver to use devm_*() for everything to
avoid this ordering issue. I don't think that's something we can do
at this point in the cycle. The present patch fixes a regression
introduced with v6.9-rc1.
There's another reason this approach doesn't work:
The first argument to devm_led_classdev_register() has two purposes:
(1) It's used to manage the resource (i.e. LED is unregistered on unbind),
(2) but it's also used as the parent below which the LED appears in sysfs.
If I changed the argument to the pci_dev, the LED would suddenly appear
below the pci_dev in sysfs, instead of the netdev. So the patch would
result in an undesired change of behavior.
Of course we can discuss introducing a new devm_*() helper which accepts
separate device arguments for the two purposes above. But that would
likewise be something we can't do at this point in the cycle.
We discussed the conundrum of the dual-purpose device argument in a
separate thread for r8169 (which suffered from the same LED deadlock):
https://lore.kernel.org/all/20240405205903.GA3458@wunner.de/
Thanks,
Lukas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-22 20:45 [PATCH net] igc: Fix LED-related deadlock on driver unbind Tony Nguyen
2024-04-22 23:32 ` Jacob Keller
@ 2024-04-25 3:40 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-04-25 3:40 UTC (permalink / raw)
To: Tony Nguyen
Cc: davem, kuba, pabeni, edumazet, netdev, lukas, sasha.neftin,
lozko.roma, marmarek, kurt, hkallweit1, horms, naamax.meir
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 22 Apr 2024 13:45:02 -0700 you wrote:
> From: Lukas Wunner <lukas@wunner.de>
>
> Roman reports a deadlock on unplug of a Thunderbolt docking station
> containing an Intel I225 Ethernet adapter.
>
> The root cause is that led_classdev's for LEDs on the adapter are
> registered such that they're device-managed by the netdev. That
> results in recursive acquisition of the rtnl_lock() mutex on unplug:
>
> [...]
Here is the summary with links:
- [net] igc: Fix LED-related deadlock on driver unbind
https://git.kernel.org/netdev/net/c/c04d1b9ecce5
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net] igc: Fix LED-related deadlock on driver unbind
@ 2024-04-15 13:48 Lukas Wunner
2024-04-16 13:51 ` Simon Horman
2024-04-16 14:06 ` Kurt Kanzenbach
0 siblings, 2 replies; 11+ messages in thread
From: Lukas Wunner @ 2024-04-15 13:48 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jesse Brandeburg, Tony Nguyen
Cc: intel-wired-lan, netdev, Roman Lozko, Kurt Kanzenbach,
Heiner Kallweit, Andrew Lunn, Sasha Neftin
Roman reports a deadlock on unplug of a Thunderbolt docking station
containing an Intel I225 Ethernet adapter.
The root cause is that led_classdev's for LEDs on the adapter are
registered such that they're device-managed by the netdev. That
results in recursive acquisition of the rtnl_lock() mutex on unplug:
When the driver calls unregister_netdev(), it acquires rtnl_lock(),
then frees the device-managed resources. Upon unregistering the LEDs,
netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
which tries to acquire rtnl_lock() again.
Avoid by using non-device-managed LED registration.
Stack trace for posterity:
schedule+0x6e/0xf0
schedule_preempt_disabled+0x15/0x20
__mutex_lock+0x2a0/0x750
unregister_netdevice_notifier+0x40/0x150
netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
led_trigger_set+0x102/0x330
led_classdev_unregister+0x4b/0x110
release_nodes+0x3d/0xb0
devres_release_all+0x8b/0xc0
device_del+0x34f/0x3c0
unregister_netdevice_many_notify+0x80b/0xaf0
unregister_netdev+0x7c/0xd0
igc_remove+0xd8/0x1e0 [igc]
pci_device_remove+0x3f/0xb0
Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Reported-by: Roman Lozko <lozko.roma@gmail.com>
Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/intel/igc/igc.h | 2 ++
drivers/net/ethernet/intel/igc/igc_leds.c | 38 ++++++++++++++++++++++++-------
drivers/net/ethernet/intel/igc/igc_main.c | 3 +++
3 files changed, 35 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 90316dc..6bc56c7 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -298,6 +298,7 @@ struct igc_adapter {
/* LEDs */
struct mutex led_mutex;
+ struct igc_led_classdev *leds;
};
void igc_up(struct igc_adapter *adapter);
@@ -723,6 +724,7 @@ struct igc_nfc_rule *igc_get_nfc_rule(struct igc_adapter *adapter,
void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter);
int igc_led_setup(struct igc_adapter *adapter);
+void igc_led_free(struct igc_adapter *adapter);
#define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring))
diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
index bf240c5..3929b25 100644
--- a/drivers/net/ethernet/intel/igc/igc_leds.c
+++ b/drivers/net/ethernet/intel/igc/igc_leds.c
@@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf,
pci_dev_id(adapter->pdev), index);
}
-static void igc_setup_ldev(struct igc_led_classdev *ldev,
- struct net_device *netdev, int index)
+static int igc_setup_ldev(struct igc_led_classdev *ldev,
+ struct net_device *netdev, int index)
{
struct igc_adapter *adapter = netdev_priv(netdev);
struct led_classdev *led_cdev = &ldev->led;
@@ -257,24 +257,46 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev,
led_cdev->hw_control_get = igc_led_hw_control_get;
led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
- devm_led_classdev_register(&netdev->dev, led_cdev);
+ return led_classdev_register(&netdev->dev, led_cdev);
}
int igc_led_setup(struct igc_adapter *adapter)
{
struct net_device *netdev = adapter->netdev;
- struct device *dev = &netdev->dev;
struct igc_led_classdev *leds;
- int i;
+ int i, err;
mutex_init(&adapter->led_mutex);
- leds = devm_kcalloc(dev, IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
+ leds = kcalloc(IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
if (!leds)
return -ENOMEM;
- for (i = 0; i < IGC_NUM_LEDS; i++)
- igc_setup_ldev(leds + i, netdev, i);
+ for (i = 0; i < IGC_NUM_LEDS; i++) {
+ err = igc_setup_ldev(leds + i, netdev, i);
+ if (err)
+ goto err;
+ }
+
+ adapter->leds = leds;
return 0;
+
+err:
+ for (i--; i >= 0; i--)
+ led_classdev_unregister(&((leds + i)->led));
+
+ kfree(leds);
+ return err;
+}
+
+void igc_led_free(struct igc_adapter *adapter)
+{
+ struct igc_led_classdev *leds = adapter->leds;
+ int i;
+
+ for (i = 0; i < IGC_NUM_LEDS; i++)
+ led_classdev_unregister(&((leds + i)->led));
+
+ kfree(leds);
}
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 35ad40a..4d975d6 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -7021,6 +7021,9 @@ static void igc_remove(struct pci_dev *pdev)
cancel_work_sync(&adapter->watchdog_task);
hrtimer_cancel(&adapter->hrtimer);
+ if (IS_ENABLED(CONFIG_IGC_LEDS))
+ igc_led_free(adapter);
+
/* Release control of h/w to f/w. If f/w is AMT enabled, this
* would have already happened in close and is redundant.
*/
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-15 13:48 Lukas Wunner
@ 2024-04-16 13:51 ` Simon Horman
2024-04-16 14:06 ` Kurt Kanzenbach
1 sibling, 0 replies; 11+ messages in thread
From: Simon Horman @ 2024-04-16 13:51 UTC (permalink / raw)
To: Lukas Wunner
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jesse Brandeburg, Tony Nguyen, intel-wired-lan, netdev,
Roman Lozko, Kurt Kanzenbach, Heiner Kallweit, Andrew Lunn,
Sasha Neftin
On Mon, Apr 15, 2024 at 03:48:48PM +0200, Lukas Wunner wrote:
> Roman reports a deadlock on unplug of a Thunderbolt docking station
> containing an Intel I225 Ethernet adapter.
>
> The root cause is that led_classdev's for LEDs on the adapter are
> registered such that they're device-managed by the netdev. That
> results in recursive acquisition of the rtnl_lock() mutex on unplug:
>
> When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> then frees the device-managed resources. Upon unregistering the LEDs,
> netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> which tries to acquire rtnl_lock() again.
>
> Avoid by using non-device-managed LED registration.
>
> Stack trace for posterity:
>
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> unregister_netdevice_notifier+0x40/0x150
> netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
> led_trigger_set+0x102/0x330
> led_classdev_unregister+0x4b/0x110
> release_nodes+0x3d/0xb0
> devres_release_all+0x8b/0xc0
> device_del+0x34f/0x3c0
> unregister_netdevice_many_notify+0x80b/0xaf0
> unregister_netdev+0x7c/0xd0
> igc_remove+0xd8/0x1e0 [igc]
> pci_device_remove+0x3f/0xb0
>
> Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
> Reported-by: Roman Lozko <lozko.roma@gmail.com>
> Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
I am aware that Kurt has submitted what appears to be the same patch [1,2],
which I'm inclined to put down to miscommunication (email based workflows
are like that sometimes).
FWIIW, it is my understanding is that the patch originated from
Lukas[3], and thus it seems most appropriate to take his submission.
As for the patch itself, I agree that it addresses the problem at hand.
For the record, I have not tested it.
Reviewed-by: Simon Horman <horms@kernel.org>
[1] [PATCH iwl-net] igc: Fix deadlock on module removal
https://lore.kernel.org/netdev/20240411-igc_led_deadlock-v1-1-0da98a3c68c5@linutronix.de/
[2] [PATCH iwl-net v2] igc: Fix deadlock on module removal
https://lore.kernel.org/netdev/20240411-igc_led_deadlock-v2-1-b758c0c88b2b@linutronix.de/
[3] Re: Deadlock in pciehp on dock disconnect
https://lore.kernel.org/all/ZhBN9p1yOyciXkzw@wunner.de/
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-15 13:48 Lukas Wunner
2024-04-16 13:51 ` Simon Horman
@ 2024-04-16 14:06 ` Kurt Kanzenbach
2024-04-16 20:55 ` Lukas Wunner
1 sibling, 1 reply; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-04-16 14:06 UTC (permalink / raw)
To: Lukas Wunner, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jesse Brandeburg, Tony Nguyen
Cc: intel-wired-lan, netdev, Roman Lozko, Heiner Kallweit,
Andrew Lunn, Sasha Neftin
[-- Attachment #1: Type: text/plain, Size: 1748 bytes --]
On Mon Apr 15 2024, Lukas Wunner wrote:
> Roman reports a deadlock on unplug of a Thunderbolt docking station
> containing an Intel I225 Ethernet adapter.
>
> The root cause is that led_classdev's for LEDs on the adapter are
> registered such that they're device-managed by the netdev. That
> results in recursive acquisition of the rtnl_lock() mutex on unplug:
>
> When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> then frees the device-managed resources. Upon unregistering the LEDs,
> netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> which tries to acquire rtnl_lock() again.
>
> Avoid by using non-device-managed LED registration.
>
> Stack trace for posterity:
>
> schedule+0x6e/0xf0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock+0x2a0/0x750
> unregister_netdevice_notifier+0x40/0x150
> netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
> led_trigger_set+0x102/0x330
> led_classdev_unregister+0x4b/0x110
> release_nodes+0x3d/0xb0
> devres_release_all+0x8b/0xc0
> device_del+0x34f/0x3c0
> unregister_netdevice_many_notify+0x80b/0xaf0
> unregister_netdev+0x7c/0xd0
> igc_remove+0xd8/0x1e0 [igc]
> pci_device_remove+0x3f/0xb0
>
> Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
> Reported-by: Roman Lozko <lozko.roma@gmail.com>
> Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
I think, the first SoB has to be yours, because you are the patch
author. In fact, my SoB is not required at all.
However, feel free to add:
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
Thanks,
Kurt
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
2024-04-16 14:06 ` Kurt Kanzenbach
@ 2024-04-16 20:55 ` Lukas Wunner
0 siblings, 0 replies; 11+ messages in thread
From: Lukas Wunner @ 2024-04-16 20:55 UTC (permalink / raw)
To: Kurt Kanzenbach
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jesse Brandeburg, Tony Nguyen, intel-wired-lan, netdev,
Roman Lozko, Heiner Kallweit, Andrew Lunn, Sasha Neftin
On Tue, Apr 16, 2024 at 04:06:49PM +0200, Kurt Kanzenbach wrote:
> On Mon Apr 15 2024, Lukas Wunner wrote:
> > Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
> > Reported-by: Roman Lozko <lozko.roma@gmail.com>
> > Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
> > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
>
> I think, the first SoB has to be yours, because you are the patch
> author. In fact, my SoB is not required at all.
My understanding is that the commit author must be identical to the last
Signed-off-by, so I put mine last. I've seen Stephen Rothwell send
complaints whenever he spotted commits in linux-next violating that.
I carried over the variable and function renaming you did to match
the driver's (or your) preferred style, hence the inclusion of your
Signed-off-by.
Thanks!
Lukas
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-04-25 3:40 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-22 20:45 [PATCH net] igc: Fix LED-related deadlock on driver unbind Tony Nguyen
2024-04-22 23:32 ` Jacob Keller
2024-04-22 23:37 ` Marek Marczykowski-Górecki
2024-04-22 23:46 ` Jacob Keller
2024-04-23 8:08 ` Lukas Wunner
2024-04-23 7:53 ` Lukas Wunner
2024-04-25 3:40 ` patchwork-bot+netdevbpf
-- strict thread matches above, loose matches on Subject: below --
2024-04-15 13:48 Lukas Wunner
2024-04-16 13:51 ` Simon Horman
2024-04-16 14:06 ` Kurt Kanzenbach
2024-04-16 20:55 ` Lukas Wunner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).