* [PATCH] r8169: do not call rtl8169_down() twice on shutdown
@ 2025-04-15 9:53 Niklas Cassel
2025-04-15 18:48 ` Heiner Kallweit
0 siblings, 1 reply; 13+ messages in thread
From: Niklas Cassel @ 2025-04-15 9:53 UTC (permalink / raw)
To: Heiner Kallweit, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Manivannan Sadhasivam, Niklas Cassel, netdev
For a PCI controller driver with a .shutdown() callback, we will see the
following warning:
[ 12.020111] called from state HALTED
[ 12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
This is because rtl8169_down() (which calls phy_stop()) is called twice
during shutdown.
First time:
[ 23.827764] Call trace:
[ 23.827765] show_stack+0x20/0x40 (C)
[ 23.827774] dump_stack_lvl+0x60/0x80
[ 23.827778] dump_stack+0x18/0x24
[ 23.827782] rtl8169_down+0x30/0x2a0
[ 23.827788] rtl_shutdown+0xb0/0xc0
[ 23.827792] pci_device_shutdown+0x3c/0x88
[ 23.827797] device_shutdown+0x150/0x278
[ 23.827802] kernel_restart+0x4c/0xb8
Second time:
[ 23.841468] Call trace:
[ 23.841470] show_stack+0x20/0x40 (C)
[ 23.841478] dump_stack_lvl+0x60/0x80
[ 23.841483] dump_stack+0x18/0x24
[ 23.841486] rtl8169_down+0x30/0x2a0
[ 23.841492] rtl8169_close+0x64/0x100
[ 23.841496] __dev_close_many+0xbc/0x1f0
[ 23.841502] dev_close_many+0x94/0x160
[ 23.841505] unregister_netdevice_many_notify+0x160/0x9d0
[ 23.841510] unregister_netdevice_queue+0xf0/0x100
[ 23.841515] unregister_netdev+0x2c/0x58
[ 23.841519] rtl_remove_one+0xa0/0xe0
[ 23.841524] pci_device_remove+0x4c/0xf8
[ 23.841528] device_remove+0x54/0x90
[ 23.841534] device_release_driver_internal+0x1d4/0x238
[ 23.841539] device_release_driver+0x20/0x38
[ 23.841544] pci_stop_bus_device+0x84/0xe0
[ 23.841548] pci_stop_bus_device+0x40/0xe0
[ 23.841552] pci_stop_root_bus+0x48/0x80
[ 23.841555] dw_pcie_host_deinit+0x34/0xe0
[ 23.841559] rockchip_pcie_shutdown+0x20/0x38
[ 23.841565] platform_shutdown+0x2c/0x48
[ 23.841571] device_shutdown+0x150/0x278
[ 23.841575] kernel_restart+0x4c/0xb8
Add a netif_device_present() guard around the rtl8169_down() call in
rtl8169_close(), to avoid rtl8169_down() from being called twice.
This matches how e.g. e1000e_close() has a netif_device_present() guard
around the e1000e_down() call.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 4eebd9cb40a3..0300a06ae260 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
pm_runtime_get_sync(&pdev->dev);
netif_stop_queue(dev);
- rtl8169_down(tp);
+ if (netif_device_present(tp->dev))
+ rtl8169_down(tp);
rtl8169_rx_clear(tp);
free_irq(tp->irq, tp);
--
2.49.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-15 9:53 [PATCH] r8169: do not call rtl8169_down() twice on shutdown Niklas Cassel
@ 2025-04-15 18:48 ` Heiner Kallweit
2025-04-16 14:13 ` Niklas Cassel
0 siblings, 1 reply; 13+ messages in thread
From: Heiner Kallweit @ 2025-04-15 18:48 UTC (permalink / raw)
To: Niklas Cassel, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Manivannan Sadhasivam, netdev
On 15.04.2025 11:53, Niklas Cassel wrote:
> For a PCI controller driver with a .shutdown() callback, we will see the
> following warning:
I saw the related mail thread, IIRC it was about potential new code.
Is this right? Or do we talk about existing code? Then it would have
to be treated as fix.
Existence of a shutdown callback itself is not the problem, the problem is
that all PCI bus devices are removed as part of shutdown handling for this
specific controller driver.
> [ 12.020111] called from state HALTED
> [ 12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
>
> This is because rtl8169_down() (which calls phy_stop()) is called twice
> during shutdown.
>
> First time:
> [ 23.827764] Call trace:
> [ 23.827765] show_stack+0x20/0x40 (C)
> [ 23.827774] dump_stack_lvl+0x60/0x80
> [ 23.827778] dump_stack+0x18/0x24
> [ 23.827782] rtl8169_down+0x30/0x2a0
> [ 23.827788] rtl_shutdown+0xb0/0xc0
> [ 23.827792] pci_device_shutdown+0x3c/0x88
> [ 23.827797] device_shutdown+0x150/0x278
> [ 23.827802] kernel_restart+0x4c/0xb8
>
> Second time:
> [ 23.841468] Call trace:
> [ 23.841470] show_stack+0x20/0x40 (C)
> [ 23.841478] dump_stack_lvl+0x60/0x80
> [ 23.841483] dump_stack+0x18/0x24
> [ 23.841486] rtl8169_down+0x30/0x2a0
> [ 23.841492] rtl8169_close+0x64/0x100
> [ 23.841496] __dev_close_many+0xbc/0x1f0
> [ 23.841502] dev_close_many+0x94/0x160
> [ 23.841505] unregister_netdevice_many_notify+0x160/0x9d0
> [ 23.841510] unregister_netdevice_queue+0xf0/0x100
> [ 23.841515] unregister_netdev+0x2c/0x58
> [ 23.841519] rtl_remove_one+0xa0/0xe0
> [ 23.841524] pci_device_remove+0x4c/0xf8
> [ 23.841528] device_remove+0x54/0x90
> [ 23.841534] device_release_driver_internal+0x1d4/0x238
> [ 23.841539] device_release_driver+0x20/0x38
> [ 23.841544] pci_stop_bus_device+0x84/0xe0
> [ 23.841548] pci_stop_bus_device+0x40/0xe0
> [ 23.841552] pci_stop_root_bus+0x48/0x80
> [ 23.841555] dw_pcie_host_deinit+0x34/0xe0
> [ 23.841559] rockchip_pcie_shutdown+0x20/0x38
> [ 23.841565] platform_shutdown+0x2c/0x48
> [ 23.841571] device_shutdown+0x150/0x278
> [ 23.841575] kernel_restart+0x4c/0xb8
>
> Add a netif_device_present() guard around the rtl8169_down() call in
> rtl8169_close(), to avoid rtl8169_down() from being called twice.
>
> This matches how e.g. e1000e_close() has a netif_device_present() guard
> around the e1000e_down() call.
>
This approach has at least two issues:
1. Likely it breaks WoL, because now phy_detach() is called.
2. r8169 shutdown callback sets device to D3hot, PCI core wakes it up again
for the remove callback. Now it's left in D0.
I'll also spend a few thoughts on how to solve this best.
> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
> drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 4eebd9cb40a3..0300a06ae260 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
> pm_runtime_get_sync(&pdev->dev);
>
> netif_stop_queue(dev);
> - rtl8169_down(tp);
> + if (netif_device_present(tp->dev))
> + rtl8169_down(tp);
> rtl8169_rx_clear(tp);
>
> free_irq(tp->irq, tp);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-15 18:48 ` Heiner Kallweit
@ 2025-04-16 14:13 ` Niklas Cassel
2025-04-16 15:45 ` Krishna Chaitanya Chundru
0 siblings, 1 reply; 13+ messages in thread
From: Niklas Cassel @ 2025-04-16 14:13 UTC (permalink / raw)
To: Heiner Kallweit
Cc: nic_swsd, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Manivannan Sadhasivam, netdev,
Krishna Chaitanya Chundru
On Tue, Apr 15, 2025 at 08:48:53PM +0200, Heiner Kallweit wrote:
> On 15.04.2025 11:53, Niklas Cassel wrote:
> > For a PCI controller driver with a .shutdown() callback, we will see the
> > following warning:
>
> I saw the related mail thread, IIRC it was about potential new code.
> Is this right? Or do we talk about existing code? Then it would have
> to be treated as fix.
The qcom PCIe controller driver wants to add a .shutdown callback:
https://lore.kernel.org/linux-pci/tb6kkyfgpemzacfwbg45otgcrg5ltj323y6ufjy3bw3rgri7uf@vrmmtueyg7su/T/#t
In that shutdown callback, they want to call dw_pcie_host_deinit(),
which will call pci_stop_root_bus().
Looking at other PCIe controller drivers:
$ git grep -p pci_stop_root_bus
There seems to be a lot of PCIe controller drivers that call
pci_stop_root_bus(), but they all do it from the .remove() callback,
not from the .shutdown() callback.
Actually, I can't see any existing PCIe controller driver that calls
pci_stop_root_bus() from the .shutdown() callback.
So perhaps we should hold off with this patch.
Adding the qcom folks to CC.
Kind regards,
Niklas
>
> Existence of a shutdown callback itself is not the problem, the problem is
> that all PCI bus devices are removed as part of shutdown handling for this
> specific controller driver.
>
> > [ 12.020111] called from state HALTED
> > [ 12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
> >
> > This is because rtl8169_down() (which calls phy_stop()) is called twice
> > during shutdown.
> >
> > First time:
> > [ 23.827764] Call trace:
> > [ 23.827765] show_stack+0x20/0x40 (C)
> > [ 23.827774] dump_stack_lvl+0x60/0x80
> > [ 23.827778] dump_stack+0x18/0x24
> > [ 23.827782] rtl8169_down+0x30/0x2a0
> > [ 23.827788] rtl_shutdown+0xb0/0xc0
> > [ 23.827792] pci_device_shutdown+0x3c/0x88
> > [ 23.827797] device_shutdown+0x150/0x278
> > [ 23.827802] kernel_restart+0x4c/0xb8
> >
> > Second time:
> > [ 23.841468] Call trace:
> > [ 23.841470] show_stack+0x20/0x40 (C)
> > [ 23.841478] dump_stack_lvl+0x60/0x80
> > [ 23.841483] dump_stack+0x18/0x24
> > [ 23.841486] rtl8169_down+0x30/0x2a0
> > [ 23.841492] rtl8169_close+0x64/0x100
> > [ 23.841496] __dev_close_many+0xbc/0x1f0
> > [ 23.841502] dev_close_many+0x94/0x160
> > [ 23.841505] unregister_netdevice_many_notify+0x160/0x9d0
> > [ 23.841510] unregister_netdevice_queue+0xf0/0x100
> > [ 23.841515] unregister_netdev+0x2c/0x58
> > [ 23.841519] rtl_remove_one+0xa0/0xe0
> > [ 23.841524] pci_device_remove+0x4c/0xf8
> > [ 23.841528] device_remove+0x54/0x90
> > [ 23.841534] device_release_driver_internal+0x1d4/0x238
> > [ 23.841539] device_release_driver+0x20/0x38
> > [ 23.841544] pci_stop_bus_device+0x84/0xe0
> > [ 23.841548] pci_stop_bus_device+0x40/0xe0
> > [ 23.841552] pci_stop_root_bus+0x48/0x80
> > [ 23.841555] dw_pcie_host_deinit+0x34/0xe0
> > [ 23.841559] rockchip_pcie_shutdown+0x20/0x38
> > [ 23.841565] platform_shutdown+0x2c/0x48
> > [ 23.841571] device_shutdown+0x150/0x278
> > [ 23.841575] kernel_restart+0x4c/0xb8
> >
> > Add a netif_device_present() guard around the rtl8169_down() call in
> > rtl8169_close(), to avoid rtl8169_down() from being called twice.
> >
> > This matches how e.g. e1000e_close() has a netif_device_present() guard
> > around the e1000e_down() call.
> >
> This approach has at least two issues:
>
> 1. Likely it breaks WoL, because now phy_detach() is called.
> 2. r8169 shutdown callback sets device to D3hot, PCI core wakes it up again
> for the remove callback. Now it's left in D0.
>
> I'll also spend a few thoughts on how to solve this best.
>
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> > ---
> > drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > index 4eebd9cb40a3..0300a06ae260 100644
> > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > @@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
> > pm_runtime_get_sync(&pdev->dev);
> >
> > netif_stop_queue(dev);
> > - rtl8169_down(tp);
> > + if (netif_device_present(tp->dev))
> > + rtl8169_down(tp);
> > rtl8169_rx_clear(tp);
> >
> > free_irq(tp->irq, tp);
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-16 14:13 ` Niklas Cassel
@ 2025-04-16 15:45 ` Krishna Chaitanya Chundru
2025-04-16 16:03 ` Niklas Cassel
0 siblings, 1 reply; 13+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-04-16 15:45 UTC (permalink / raw)
To: Niklas Cassel, Heiner Kallweit
Cc: nic_swsd, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Manivannan Sadhasivam, netdev
On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> On Tue, Apr 15, 2025 at 08:48:53PM +0200, Heiner Kallweit wrote:
>> On 15.04.2025 11:53, Niklas Cassel wrote:
>>> For a PCI controller driver with a .shutdown() callback, we will see the
>>> following warning:
>>
>> I saw the related mail thread, IIRC it was about potential new code.
>> Is this right? Or do we talk about existing code? Then it would have
>> to be treated as fix.
>
> The qcom PCIe controller driver wants to add a .shutdown callback:
> https://lore.kernel.org/linux-pci/tb6kkyfgpemzacfwbg45otgcrg5ltj323y6ufjy3bw3rgri7uf@vrmmtueyg7su/T/#t
>
> In that shutdown callback, they want to call dw_pcie_host_deinit(),
> which will call pci_stop_root_bus().
>
> Looking at other PCIe controller drivers:
> $ git grep -p pci_stop_root_bus
>
> There seems to be a lot of PCIe controller drivers that call
> pci_stop_root_bus(), but they all do it from the .remove() callback,
> not from the .shutdown() callback.
>
> Actually, I can't see any existing PCIe controller driver that calls
> pci_stop_root_bus() from the .shutdown() callback.
>
Not all endpoint have .shutdown callback(), the endpoint drivers still
will try to do data transfers if we don't call pci_stop_stop_bus(), and
if there are ongoing transfers over link and if the controller driver
turns off the PCIe link without calling pci_stop_root_bus we might get
bus errors, NOC errors etc which can hang the system.
This should be handled gracefully in the rtl8169 driver.
> So perhaps we should hold off with this patch.
>
I disagree on this, it might be causing issue with net driver, but we
might face some other issues as explained above if we don't call
pci_stop_root_bus().
- Krishna Chaitanya.
> Adding the qcom folks to CC.
>
>
> Kind regards,
> Niklas
>
>
>>
>> Existence of a shutdown callback itself is not the problem, the problem is
>> that all PCI bus devices are removed as part of shutdown handling for this
>> specific controller driver.
>>
>>> [ 12.020111] called from state HALTED
>>> [ 12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
>>>
>>> This is because rtl8169_down() (which calls phy_stop()) is called twice
>>> during shutdown.
>>>
>>> First time:
>>> [ 23.827764] Call trace:
>>> [ 23.827765] show_stack+0x20/0x40 (C)
>>> [ 23.827774] dump_stack_lvl+0x60/0x80
>>> [ 23.827778] dump_stack+0x18/0x24
>>> [ 23.827782] rtl8169_down+0x30/0x2a0
>>> [ 23.827788] rtl_shutdown+0xb0/0xc0
>>> [ 23.827792] pci_device_shutdown+0x3c/0x88
>>> [ 23.827797] device_shutdown+0x150/0x278
>>> [ 23.827802] kernel_restart+0x4c/0xb8
>>>
>>> Second time:
>>> [ 23.841468] Call trace:
>>> [ 23.841470] show_stack+0x20/0x40 (C)
>>> [ 23.841478] dump_stack_lvl+0x60/0x80
>>> [ 23.841483] dump_stack+0x18/0x24
>>> [ 23.841486] rtl8169_down+0x30/0x2a0
>>> [ 23.841492] rtl8169_close+0x64/0x100
>>> [ 23.841496] __dev_close_many+0xbc/0x1f0
>>> [ 23.841502] dev_close_many+0x94/0x160
>>> [ 23.841505] unregister_netdevice_many_notify+0x160/0x9d0
>>> [ 23.841510] unregister_netdevice_queue+0xf0/0x100
>>> [ 23.841515] unregister_netdev+0x2c/0x58
>>> [ 23.841519] rtl_remove_one+0xa0/0xe0
>>> [ 23.841524] pci_device_remove+0x4c/0xf8
>>> [ 23.841528] device_remove+0x54/0x90
>>> [ 23.841534] device_release_driver_internal+0x1d4/0x238
>>> [ 23.841539] device_release_driver+0x20/0x38
>>> [ 23.841544] pci_stop_bus_device+0x84/0xe0
>>> [ 23.841548] pci_stop_bus_device+0x40/0xe0
>>> [ 23.841552] pci_stop_root_bus+0x48/0x80
>>> [ 23.841555] dw_pcie_host_deinit+0x34/0xe0
>>> [ 23.841559] rockchip_pcie_shutdown+0x20/0x38
>>> [ 23.841565] platform_shutdown+0x2c/0x48
>>> [ 23.841571] device_shutdown+0x150/0x278
>>> [ 23.841575] kernel_restart+0x4c/0xb8
>>>
>>> Add a netif_device_present() guard around the rtl8169_down() call in
>>> rtl8169_close(), to avoid rtl8169_down() from being called twice.
>>>
>>> This matches how e.g. e1000e_close() has a netif_device_present() guard
>>> around the e1000e_down() call.
>>>
>> This approach has at least two issues:
>>
>> 1. Likely it breaks WoL, because now phy_detach() is called.
>> 2. r8169 shutdown callback sets device to D3hot, PCI core wakes it up again
>> for the remove callback. Now it's left in D0.
>>
>> I'll also spend a few thoughts on how to solve this best.
>>
>>> Signed-off-by: Niklas Cassel <cassel@kernel.org>
>>> ---
>>> drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>> index 4eebd9cb40a3..0300a06ae260 100644
>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>> @@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
>>> pm_runtime_get_sync(&pdev->dev);
>>>
>>> netif_stop_queue(dev);
>>> - rtl8169_down(tp);
>>> + if (netif_device_present(tp->dev))
>>> + rtl8169_down(tp);
>>> rtl8169_rx_clear(tp);
>>>
>>> free_irq(tp->irq, tp);
>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-16 15:45 ` Krishna Chaitanya Chundru
@ 2025-04-16 16:03 ` Niklas Cassel
2025-04-16 17:15 ` Krishna Chaitanya Chundru
2025-04-18 17:19 ` Manivannan Sadhasivam
0 siblings, 2 replies; 13+ messages in thread
From: Niklas Cassel @ 2025-04-16 16:03 UTC (permalink / raw)
To: Krishna Chaitanya Chundru
Cc: Heiner Kallweit, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Manivannan Sadhasivam,
netdev
Hello Krishna Chaitanya,
On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> >
> > So perhaps we should hold off with this patch.
> >
> I disagree on this, it might be causing issue with net driver, but we
> might face some other issues as explained above if we don't call
> pci_stop_root_bus().
When I wrote hold off with this patch, I meant the patch in $subject,
not your patch.
When it comes to your patch, I think that the commit log needs to explain
why it is so special.
Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
in the .remove() callback, not in the .shutdown() callback.
Doing things differently from all other PCIe controller drivers is usually
a red flag.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-16 16:03 ` Niklas Cassel
@ 2025-04-16 17:15 ` Krishna Chaitanya Chundru
2025-04-18 17:19 ` Manivannan Sadhasivam
1 sibling, 0 replies; 13+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-04-16 17:15 UTC (permalink / raw)
To: Niklas Cassel
Cc: Heiner Kallweit, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Manivannan Sadhasivam,
netdev
On 4/16/2025 9:33 PM, Niklas Cassel wrote:
> Hello Krishna Chaitanya,
>
> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
>>>
>>> So perhaps we should hold off with this patch.
>>>
>> I disagree on this, it might be causing issue with net driver, but we
>> might face some other issues as explained above if we don't call
>> pci_stop_root_bus().
>
> When I wrote hold off with this patch, I meant the patch in $subject,
> not your patch.
>
oh sorry, I misunderstood it.
>
> When it comes to your patch, I think that the commit log needs to explain
> why it is so special.
>
> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
> in the .remove() callback, not in the .shutdown() callback.
>
> Doing things differently from all other PCIe controller drivers is usually
> a red flag.
>
I will update all the required details in the next version of that
patch.
- Krishna Chaitanya.
>
> Kind regards,
> Niklas
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-16 16:03 ` Niklas Cassel
2025-04-16 17:15 ` Krishna Chaitanya Chundru
@ 2025-04-18 17:19 ` Manivannan Sadhasivam
2025-04-18 21:52 ` Heiner Kallweit
1 sibling, 1 reply; 13+ messages in thread
From: Manivannan Sadhasivam @ 2025-04-18 17:19 UTC (permalink / raw)
To: Niklas Cassel
Cc: Krishna Chaitanya Chundru, Heiner Kallweit, nic_swsd, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
> Hello Krishna Chaitanya,
>
> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
> > On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> > >
> > > So perhaps we should hold off with this patch.
> > >
> > I disagree on this, it might be causing issue with net driver, but we
> > might face some other issues as explained above if we don't call
> > pci_stop_root_bus().
>
> When I wrote hold off with this patch, I meant the patch in $subject,
> not your patch.
>
>
> When it comes to your patch, I think that the commit log needs to explain
> why it is so special.
>
> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
> in the .remove() callback, not in the .shutdown() callback.
>
And this driver is special because, it has no 'remove()' callback as it
implements an irqchip controller. So we have to shutdown the devices cleanly in
the 'shutdown' callback.
Also do note that the driver core will not call the 'remove()' callback unless
the driver as a module is unloaded during poweroff/reboot scenarios. So the
controller drivers need to properly power down the devices in their 'shutdown()'
callback IMO.
> Doing things differently from all other PCIe controller drivers is usually
> a red flag.
>
Yes, even if it is the right thing to do ;) But I agree that the commit message
needs some improvement.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-18 17:19 ` Manivannan Sadhasivam
@ 2025-04-18 21:52 ` Heiner Kallweit
2025-04-19 10:18 ` Heiner Kallweit
0 siblings, 1 reply; 13+ messages in thread
From: Heiner Kallweit @ 2025-04-18 21:52 UTC (permalink / raw)
To: Manivannan Sadhasivam, Niklas Cassel
Cc: Krishna Chaitanya Chundru, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev
On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
>> Hello Krishna Chaitanya,
>>
>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
>>>>
>>>> So perhaps we should hold off with this patch.
>>>>
>>> I disagree on this, it might be causing issue with net driver, but we
>>> might face some other issues as explained above if we don't call
>>> pci_stop_root_bus().
>>
>> When I wrote hold off with this patch, I meant the patch in $subject,
>> not your patch.
>>
>>
>> When it comes to your patch, I think that the commit log needs to explain
>> why it is so special.
>>
>> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
>> in the .remove() callback, not in the .shutdown() callback.
>>
>
> And this driver is special because, it has no 'remove()' callback as it
> implements an irqchip controller. So we have to shutdown the devices cleanly in
> the 'shutdown' callback.
>
Doing proper cleanup in a first place is responsibility of the respective bus
devices (in their shutdown() callback).
Calling pci_stop_root_bus() in the pci controllers shutdown() causes the bus
devices to be removed, hence their remove() is called. Typically devices
don't expect that remove() is called after shutdown(). This may cause issues
if e.g. shutdown() sets a device to D3cold. remove() won't expect that device
is inaccessible.
Another issue is with devices being wake-up sources. If wake-up is enabled,
then devices configure the wake-up in their shutdown() callback. Calling
remove() afterwards may reverse parts of the wake-up configuration.
And I'd expect that most wakeup-capable device disable wakeup in the
remove() callback. So there's a good chance that the proposed change breaks
wake-up.
There maybe other side effects I'm not aware of.
IMO a controller drivers shutdown() shall focus on cleanup up its own resources.
> Also do note that the driver core will not call the 'remove()' callback unless
> the driver as a module is unloaded during poweroff/reboot scenarios. So the
> controller drivers need to properly power down the devices in their 'shutdown()'
> callback IMO.
>
>> Doing things differently from all other PCIe controller drivers is usually
>> a red flag.
>>
>
> Yes, even if it is the right thing to do ;) But I agree that the commit message
> needs some improvement.
>
> - Mani
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-18 21:52 ` Heiner Kallweit
@ 2025-04-19 10:18 ` Heiner Kallweit
2025-04-19 23:18 ` Krishna Chaitanya Chundru
0 siblings, 1 reply; 13+ messages in thread
From: Heiner Kallweit @ 2025-04-19 10:18 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Krishna Chaitanya Chundru, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, Niklas Cassel,
Manivannan Sadhasivam
On 18.04.2025 23:52, Heiner Kallweit wrote:
> On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
>> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
>>> Hello Krishna Chaitanya,
>>>
>>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
>>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
>>>>>
>>>>> So perhaps we should hold off with this patch.
>>>>>
>>>> I disagree on this, it might be causing issue with net driver, but we
>>>> might face some other issues as explained above if we don't call
>>>> pci_stop_root_bus().
>>>
>>> When I wrote hold off with this patch, I meant the patch in $subject,
>>> not your patch.
>>>
>>>
>>> When it comes to your patch, I think that the commit log needs to explain
>>> why it is so special.
>>>
>>> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
>>> in the .remove() callback, not in the .shutdown() callback.
>>>
>>
>> And this driver is special because, it has no 'remove()' callback as it
>> implements an irqchip controller. So we have to shutdown the devices cleanly in
>> the 'shutdown' callback.
>>
> Doing proper cleanup in a first place is responsibility of the respective bus
> devices (in their shutdown() callback).
>
> Calling pci_stop_root_bus() in the pci controllers shutdown() causes the bus
> devices to be removed, hence their remove() is called. Typically devices
> don't expect that remove() is called after shutdown(). This may cause issues
> if e.g. shutdown() sets a device to D3cold. remove() won't expect that device
> is inaccessible.
>
> Another issue is with devices being wake-up sources. If wake-up is enabled,
> then devices configure the wake-up in their shutdown() callback. Calling
> remove() afterwards may reverse parts of the wake-up configuration.
> And I'd expect that most wakeup-capable device disable wakeup in the
> remove() callback. So there's a good chance that the proposed change breaks
> wake-up.
>
> There maybe other side effects I'm not aware of.
>
> IMO a controller drivers shutdown() shall focus on cleanup up its own resources.
>
>> Also do note that the driver core will not call the 'remove()' callback unless
>> the driver as a module is unloaded during poweroff/reboot scenarios. So the
>> controller drivers need to properly power down the devices in their 'shutdown()'
>> callback IMO.
>>
>>> Doing things differently from all other PCIe controller drivers is usually
>>> a red flag.
>>>
>>
>> Yes, even if it is the right thing to do ;) But I agree that the commit message
>> needs some improvement.
>>
>> - Mani
>>
>
+Bjorn, as this is primarily a PCI topic
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-19 10:18 ` Heiner Kallweit
@ 2025-04-19 23:18 ` Krishna Chaitanya Chundru
2025-04-20 6:25 ` Heiner Kallweit
0 siblings, 1 reply; 13+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-04-19 23:18 UTC (permalink / raw)
To: Heiner Kallweit, Bjorn Helgaas
Cc: nic_swsd, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Niklas Cassel,
Manivannan Sadhasivam
On 4/19/2025 3:48 PM, Heiner Kallweit wrote:
> On 18.04.2025 23:52, Heiner Kallweit wrote:
>> On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
>>> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
>>>> Hello Krishna Chaitanya,
>>>>
>>>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
>>>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
>>>>>>
>>>>>> So perhaps we should hold off with this patch.
>>>>>>
>>>>> I disagree on this, it might be causing issue with net driver, but we
>>>>> might face some other issues as explained above if we don't call
>>>>> pci_stop_root_bus().
>>>>
>>>> When I wrote hold off with this patch, I meant the patch in $subject,
>>>> not your patch.
>>>>
>>>>
>>>> When it comes to your patch, I think that the commit log needs to explain
>>>> why it is so special.
>>>>
>>>> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
>>>> in the .remove() callback, not in the .shutdown() callback.
>>>>
>>>
>>> And this driver is special because, it has no 'remove()' callback as it
>>> implements an irqchip controller. So we have to shutdown the devices cleanly in
>>> the 'shutdown' callback.
>>>
>> Doing proper cleanup in a first place is responsibility of the respective bus
>> devices (in their shutdown() callback).
>>
>> Calling pci_stop_root_bus() in the pci controllers shutdown() causes the bus
>> devices to be removed, hence their remove() is called. Typically devices
>> don't expect that remove() is called after shutdown(). This may cause issues
>> if e.g. shutdown() sets a device to D3cold. remove() won't expect that device
>> is inaccessible.
>>
Lets say controller driver in the shut down callback keeps PCIe device
state in D3cold without removing PCIe devices. Then the PCIe client
drivers which are not registered with the shutdown callback assumes PCIe
link is still accessible and initiates some transfers or there may be
on ongoing transfers also which can result in some system errors like
soc error etc which can hang the device.
The patch which I submitted in the qcom pcie controller, removes the
PCIe devices first before turning off the PCIe link. All this
info needs to be in the commit text, I will update it in the next
version.
>> Another issue is with devices being wake-up sources. If wake-up is enabled,
>> then devices configure the wake-up in their shutdown() callback. Calling
>> remove() afterwards may reverse parts of the wake-up configuration.
>> And I'd expect that most wakeup-capable device disable wakeup in the
>> remove() callback. So there's a good chance that the proposed change breaks
>> wake-up.
>>
After shutdown, the system will restart right why we need to enable
wakeup in shutdown as after restart it will be like fresh boot to system
Correct me if I was wrong here.
I want to understand, why shutdown of the PCIe endpoint drivers in this
case rtl18169 shutdown will be called before PCIe controller shutdown,
AFAIK, the shutdown execution order will be same as probe order.
The problem PCI patch is trying to do is, not every PCIe drivers are
registering with shutdown callback, in that case if PCIe controller
driver if it cleans up its resources in shutdown and the PCIe drivers
which don't have the shutdown callback doesn't know that link was
down and can continue data transfers which will cause system errors like
noc error.
- Krishna Chaitanya.
>> There maybe other side effects I'm not aware of.
>>
>> IMO a controller drivers shutdown() shall focus on cleanup up its own resources.
>
>>> Also do note that the driver core will not call the 'remove()' callback unless
>>> the driver as a module is unloaded during poweroff/reboot scenarios. So the
>>> controller drivers need to properly power down the devices in their 'shutdown()'
>>> callback IMO.
>>>
>>>> Doing things differently from all other PCIe controller drivers is usually
>>>> a red flag.
>>>>
>>>
>>> Yes, even if it is the right thing to do ;) But I agree that the commit message
>>> needs some improvement.
>>>
>>> - Mani
>>>
>>
> +Bjorn, as this is primarily a PCI topic
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-19 23:18 ` Krishna Chaitanya Chundru
@ 2025-04-20 6:25 ` Heiner Kallweit
2025-04-23 17:39 ` Bjorn Helgaas
2025-04-23 17:50 ` Manivannan Sadhasivam
0 siblings, 2 replies; 13+ messages in thread
From: Heiner Kallweit @ 2025-04-20 6:25 UTC (permalink / raw)
To: Krishna Chaitanya Chundru, Bjorn Helgaas
Cc: nic_swsd, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Niklas Cassel,
Manivannan Sadhasivam
On 20.04.2025 01:18, Krishna Chaitanya Chundru wrote:
>
> On 4/19/2025 3:48 PM, Heiner Kallweit wrote:
>> On 18.04.2025 23:52, Heiner Kallweit wrote:
>>> On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
>>>> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
>>>>> Hello Krishna Chaitanya,
>>>>>
>>>>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
>>>>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
>>>>>>>
>>>>>>> So perhaps we should hold off with this patch.
>>>>>>>
>>>>>> I disagree on this, it might be causing issue with net driver, but we
>>>>>> might face some other issues as explained above if we don't call
>>>>>> pci_stop_root_bus().
>>>>>
>>>>> When I wrote hold off with this patch, I meant the patch in $subject,
>>>>> not your patch.
>>>>>
>>>>>
>>>>> When it comes to your patch, I think that the commit log needs to explain
>>>>> why it is so special.
>>>>>
>>>>> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
>>>>> in the .remove() callback, not in the .shutdown() callback.
>>>>>
>>>>
>>>> And this driver is special because, it has no 'remove()' callback as it
>>>> implements an irqchip controller. So we have to shutdown the devices cleanly in
>>>> the 'shutdown' callback.
>>>>
>>> Doing proper cleanup in a first place is responsibility of the respective bus
>>> devices (in their shutdown() callback).
>>>
>>> Calling pci_stop_root_bus() in the pci controllers shutdown() causes the bus
>>> devices to be removed, hence their remove() is called. Typically devices
>>> don't expect that remove() is called after shutdown(). This may cause issues
>>> if e.g. shutdown() sets a device to D3cold. remove() won't expect that device
>>> is inaccessible.
>>>
> Lets say controller driver in the shut down callback keeps PCIe device
> state in D3cold without removing PCIe devices. Then the PCIe client
> drivers which are not registered with the shutdown callback assumes PCIe
> link is still accessible and initiates some transfers or there may be
> on ongoing transfers also which can result in some system errors like
> soc error etc which can hang the device.
>
I'd consider a bus device driver behaving this way as broken.
IMO device drivers should ensure that device is quiesced on shutdown.
As you highlight this case, do you have specific examples?
Maybe we should focus on fixing such bus device drivers first.
I'd be interested in the PCI maintainers point of view, that's why I added
Bjorn to the discussion.
> The patch which I submitted in the qcom pcie controller, removes the
> PCIe devices first before turning off the PCIe link. All this
> info needs to be in the commit text, I will update it in the next
> version.
>>> Another issue is with devices being wake-up sources. If wake-up is enabled,
>>> then devices configure the wake-up in their shutdown() callback. Calling
>>> remove() afterwards may reverse parts of the wake-up configuration.
>>> And I'd expect that most wakeup-capable device disable wakeup in the
>>> remove() callback. So there's a good chance that the proposed change breaks
>>> wake-up.
>>>
> After shutdown, the system will restart right why we need to enable wakeup in shutdown as after restart it will be like fresh boot to system
> Correct me if I was wrong here.
>
> I want to understand, why shutdown of the PCIe endpoint drivers in this
> case rtl18169 shutdown will be called before PCIe controller shutdown,
> AFAIK, the shutdown execution order will be same as probe order.
>
See following comment in device_shutdown():
"Walk the devices list backward, shutting down each in turn."
> The problem PCI patch is trying to do is, not every PCIe drivers are
> registering with shutdown callback, in that case if PCIe controller
> driver if it cleans up its resources in shutdown and the PCIe drivers
> which don't have the shutdown callback doesn't know that link was
> down and can continue data transfers which will cause system errors like
> noc error.
>
> - Krishna Chaitanya.
>>> There maybe other side effects I'm not aware of.
>>>
>>> IMO a controller drivers shutdown() shall focus on cleanup up its own resources.
>>
>>>> Also do note that the driver core will not call the 'remove()' callback unless
>>>> the driver as a module is unloaded during poweroff/reboot scenarios. So the
>>>> controller drivers need to properly power down the devices in their 'shutdown()'
>>>> callback IMO.
>>>>
>>>>> Doing things differently from all other PCIe controller drivers is usually
>>>>> a red flag.
>>>>>
>>>>
>>>> Yes, even if it is the right thing to do ;) But I agree that the commit message
>>>> needs some improvement.
>>>>
>>>> - Mani
>>>>
>>>
>> +Bjorn, as this is primarily a PCI topic
>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-20 6:25 ` Heiner Kallweit
@ 2025-04-23 17:39 ` Bjorn Helgaas
2025-04-23 17:50 ` Manivannan Sadhasivam
1 sibling, 0 replies; 13+ messages in thread
From: Bjorn Helgaas @ 2025-04-23 17:39 UTC (permalink / raw)
To: Heiner Kallweit
Cc: Krishna Chaitanya Chundru, Bjorn Helgaas, nic_swsd, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, Niklas Cassel, Manivannan Sadhasivam
On Sun, Apr 20, 2025 at 08:25:19AM +0200, Heiner Kallweit wrote:
> On 20.04.2025 01:18, Krishna Chaitanya Chundru wrote:
> > On 4/19/2025 3:48 PM, Heiner Kallweit wrote:
> >> On 18.04.2025 23:52, Heiner Kallweit wrote:
> >>> On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
> >>>> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
> >>>>> Hello Krishna Chaitanya,
> >>>>>
> >>>>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
> >>>>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> >>>>>>>
> >>>>>>> So perhaps we should hold off with this patch.
> >>>>>>>
> >>>>>> I disagree on this, it might be causing issue with net
> >>>>>> driver, but we might face some other issues as explained
> >>>>>> above if we don't call pci_stop_root_bus().
> >>>>>
> >>>>> When I wrote hold off with this patch, I meant the patch in
> >>>>> $subject, not your patch.
> >>>>>
> >>>>>
> >>>>> When it comes to your patch, I think that the commit log needs
> >>>>> to explain why it is so special.
> >>>>>
> >>>>> Because AFAICT, all other PCIe controller drivers call
> >>>>> pci_stop_root_bus() in the .remove() callback, not in the
> >>>>> .shutdown() callback.
> >>>>
> >>>> And this driver is special because, it has no 'remove()'
> >>>> callback as it implements an irqchip controller. So we have to
> >>>> shutdown the devices cleanly in the 'shutdown' callback.
> >>>>
> >>> Doing proper cleanup in a first place is responsibility of the
> >>> respective bus devices (in their shutdown() callback).
> >>>
> >>> Calling pci_stop_root_bus() in the pci controllers shutdown()
> >>> causes the bus devices to be removed, hence their remove() is
> >>> called. Typically devices don't expect that remove() is called
> >>> after shutdown(). This may cause issues if e.g. shutdown() sets
> >>> a device to D3cold. remove() won't expect that device is
> >>> inaccessible.
> >
> > Lets say controller driver in the shut down callback keeps PCIe
> > device state in D3cold without removing PCIe devices. Then the
> > PCIe client drivers which are not registered with the shutdown
> > callback assumes PCIe link is still accessible and initiates some
> > transfers or there may be on ongoing transfers also which can
> > result in some system errors like soc error etc which can hang the
> > device.
> >
> I'd consider a bus device driver behaving this way as broken. IMO
> device drivers should ensure that device is quiesced on shutdown.
> As you highlight this case, do you have specific examples? Maybe we
> should focus on fixing such bus device drivers first.
>
> I'd be interested in the PCI maintainers point of view, that's why I
> added Bjorn to the discussion.
See https://lore.kernel.org/r/20250423171715.GA430351@bhelgaas
I think we should resolve the question of whether .shutdown() should
remove all downstream drivers and devices needs to be resolved, and
hopefully we don't have to discuss it in two separate places.
Bjorn
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] r8169: do not call rtl8169_down() twice on shutdown
2025-04-20 6:25 ` Heiner Kallweit
2025-04-23 17:39 ` Bjorn Helgaas
@ 2025-04-23 17:50 ` Manivannan Sadhasivam
1 sibling, 0 replies; 13+ messages in thread
From: Manivannan Sadhasivam @ 2025-04-23 17:50 UTC (permalink / raw)
To: Heiner Kallweit
Cc: Krishna Chaitanya Chundru, Bjorn Helgaas, nic_swsd, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, Niklas Cassel
On Sun, Apr 20, 2025 at 08:25:19AM +0200, Heiner Kallweit wrote:
> On 20.04.2025 01:18, Krishna Chaitanya Chundru wrote:
> >
> > On 4/19/2025 3:48 PM, Heiner Kallweit wrote:
> >> On 18.04.2025 23:52, Heiner Kallweit wrote:
> >>> On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
> >>>> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
> >>>>> Hello Krishna Chaitanya,
> >>>>>
> >>>>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
> >>>>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> >>>>>>>
> >>>>>>> So perhaps we should hold off with this patch.
> >>>>>>>
> >>>>>> I disagree on this, it might be causing issue with net driver, but we
> >>>>>> might face some other issues as explained above if we don't call
> >>>>>> pci_stop_root_bus().
> >>>>>
> >>>>> When I wrote hold off with this patch, I meant the patch in $subject,
> >>>>> not your patch.
> >>>>>
> >>>>>
> >>>>> When it comes to your patch, I think that the commit log needs to explain
> >>>>> why it is so special.
> >>>>>
> >>>>> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
> >>>>> in the .remove() callback, not in the .shutdown() callback.
> >>>>>
> >>>>
> >>>> And this driver is special because, it has no 'remove()' callback as it
> >>>> implements an irqchip controller. So we have to shutdown the devices cleanly in
> >>>> the 'shutdown' callback.
> >>>>
> >>> Doing proper cleanup in a first place is responsibility of the respective bus
> >>> devices (in their shutdown() callback).
> >>>
> >>> Calling pci_stop_root_bus() in the pci controllers shutdown() causes the bus
> >>> devices to be removed, hence their remove() is called. Typically devices
> >>> don't expect that remove() is called after shutdown(). This may cause issues
> >>> if e.g. shutdown() sets a device to D3cold. remove() won't expect that device
> >>> is inaccessible.
> >>>
> > Lets say controller driver in the shut down callback keeps PCIe device
> > state in D3cold without removing PCIe devices. Then the PCIe client
> > drivers which are not registered with the shutdown callback assumes PCIe
> > link is still accessible and initiates some transfers or there may be
> > on ongoing transfers also which can result in some system errors like
> > soc error etc which can hang the device.
> >
> I'd consider a bus device driver behaving this way as broken.
> IMO device drivers should ensure that device is quiesced on shutdown.
> As you highlight this case, do you have specific examples?
> Maybe we should focus on fixing such bus device drivers first.
>
Hi,
I was the author of the patch that Krishna submitted and I tend to agree that it
is indeed a bad idea of remove devices during shutdown(). The prime motive of
the patch is to properly shutdown the devices during poweroff/reboot scenarios.
So the removal of devices is rather unintended.
> I'd be interested in the PCI maintainers point of view, that's why I added
> Bjorn to the discussion.
>
> > The patch which I submitted in the qcom pcie controller, removes the
> > PCIe devices first before turning off the PCIe link. All this
> > info needs to be in the commit text, I will update it in the next
> > version.
> >>> Another issue is with devices being wake-up sources. If wake-up is enabled,
> >>> then devices configure the wake-up in their shutdown() callback. Calling
> >>> remove() afterwards may reverse parts of the wake-up configuration.
> >>> And I'd expect that most wakeup-capable device disable wakeup in the
> >>> remove() callback. So there's a good chance that the proposed change breaks
> >>> wake-up.
> >>>
> > After shutdown, the system will restart right why we need to enable wakeup in shutdown as after restart it will be like fresh boot to system
> > Correct me if I was wrong here.
> >
> > I want to understand, why shutdown of the PCIe endpoint drivers in this
> > case rtl18169 shutdown will be called before PCIe controller shutdown,
> > AFAIK, the shutdown execution order will be same as probe order.
> >
> See following comment in device_shutdown():
> "Walk the devices list backward, shutting down each in turn."
>
Yes, indeed!
@krishna: We may need to reuse the code in dw_pcie_suspend_noirq() for
transitioning the devices into D3Cold and Link into L2/L3. And please drop
the call to dw_pcie_host_deinit().
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-04-23 17:50 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-15 9:53 [PATCH] r8169: do not call rtl8169_down() twice on shutdown Niklas Cassel
2025-04-15 18:48 ` Heiner Kallweit
2025-04-16 14:13 ` Niklas Cassel
2025-04-16 15:45 ` Krishna Chaitanya Chundru
2025-04-16 16:03 ` Niklas Cassel
2025-04-16 17:15 ` Krishna Chaitanya Chundru
2025-04-18 17:19 ` Manivannan Sadhasivam
2025-04-18 21:52 ` Heiner Kallweit
2025-04-19 10:18 ` Heiner Kallweit
2025-04-19 23:18 ` Krishna Chaitanya Chundru
2025-04-20 6:25 ` Heiner Kallweit
2025-04-23 17:39 ` Bjorn Helgaas
2025-04-23 17:50 ` Manivannan Sadhasivam
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).