netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
@ 2024-09-04  9:07 Alexander Sverdlin
  2024-09-11 14:33 ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Sverdlin @ 2024-09-04  9:07 UTC (permalink / raw)
  To: netdev; +Cc: Andrew Lunn

dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
slaves from master. Packets still come afterwards into master port and the
ports are being polled for link status. This leads to crashes:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
Workqueue: events_power_efficient phy_state_machine
pc : lan9303_mdio_phy_read
lr : lan9303_phy_read
Call trace:
 lan9303_mdio_phy_read
 lan9303_phy_read
 dsa_slave_phy_read
 __mdiobus_read
 mdiobus_read
 genphy_update_link
 genphy_read_status
 phy_check_link_status
 phy_state_machine
 process_one_work
 worker_thread

Call lan9303_remove() instead to really unregister all ports before zeroing
drvdata and dsa_ptr.

Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
---
 drivers/net/dsa/lan9303-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 268949939636..ecd507355f51 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove);
 
 void lan9303_shutdown(struct lan9303 *chip)
 {
-	dsa_switch_shutdown(chip->ds);
+	lan9303_remove(chip);
 }
 EXPORT_SYMBOL(lan9303_shutdown);
 
-- 
2.46.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-04  9:07 [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown() Alexander Sverdlin
@ 2024-09-11 14:33 ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2024-09-11 14:33 UTC (permalink / raw)
  To: Alexander Sverdlin; +Cc: netdev, Andrew Lunn

On Wed, 4 Sep 2024 11:07:42 +0200 Alexander Sverdlin wrote:
> dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> slaves from master. Packets still come afterwards into master port and the
> ports are being polled for link status. This leads to crashes:

Something is wrong with the date on your (email) system, the email
arrived with a date from a week ago. It's not going to get ingested
into our patch handling systems.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
@ 2024-09-11 14:40 A. Sverdlin
  2024-09-11 16:28 ` Vladimir Oltean
  2024-09-12 10:15 ` Vladimir Oltean
  0 siblings, 2 replies; 11+ messages in thread
From: A. Sverdlin @ 2024-09-11 14:40 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Sverdlin, Andrew Lunn, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, stable

From: Alexander Sverdlin <alexander.sverdlin@siemens.com>

dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
slaves from master. Packets still come afterwards into master port and the
ports are being polled for link status. This leads to crashes:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
Workqueue: events_power_efficient phy_state_machine
pc : lan9303_mdio_phy_read
lr : lan9303_phy_read
Call trace:
 lan9303_mdio_phy_read
 lan9303_phy_read
 dsa_slave_phy_read
 __mdiobus_read
 mdiobus_read
 genphy_update_link
 genphy_read_status
 phy_check_link_status
 phy_state_machine
 process_one_work
 worker_thread

Call lan9303_remove() instead to really unregister all ports before zeroing
drvdata and dsa_ptr.

Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
---
 drivers/net/dsa/lan9303-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 268949939636..ecd507355f51 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove);
 
 void lan9303_shutdown(struct lan9303 *chip)
 {
-	dsa_switch_shutdown(chip->ds);
+	lan9303_remove(chip);
 }
 EXPORT_SYMBOL(lan9303_shutdown);
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-11 14:40 A. Sverdlin
@ 2024-09-11 16:28 ` Vladimir Oltean
  2024-09-11 16:55   ` Sverdlin, Alexander
  2024-09-12 10:15 ` Vladimir Oltean
  1 sibling, 1 reply; 11+ messages in thread
From: Vladimir Oltean @ 2024-09-11 16:28 UTC (permalink / raw)
  To: A. Sverdlin
  Cc: netdev, Andrew Lunn, Florian Fainelli, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, stable

Hi Alexander,

On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> 
> dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> slaves from master. Packets still come afterwards into master port and the
> ports are being polled for link status. This leads to crashes:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
> Workqueue: events_power_efficient phy_state_machine
> pc : lan9303_mdio_phy_read
> lr : lan9303_phy_read
> Call trace:
>  lan9303_mdio_phy_read
>  lan9303_phy_read
>  dsa_slave_phy_read
>  __mdiobus_read
>  mdiobus_read
>  genphy_update_link
>  genphy_read_status
>  phy_check_link_status
>  phy_state_machine
>  process_one_work
>  worker_thread
> 
> Call lan9303_remove() instead to really unregister all ports before zeroing
> drvdata and dsa_ptr.
> 
> Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> ---
>  drivers/net/dsa/lan9303-core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index 268949939636..ecd507355f51 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove);
>  
>  void lan9303_shutdown(struct lan9303 *chip)
>  {
> -	dsa_switch_shutdown(chip->ds);
> +	lan9303_remove(chip);
>  }
>  EXPORT_SYMBOL(lan9303_shutdown);
>  
> -- 
> 2.46.0
> 

You've said here that a similar change still does not protect against
packets received after shutdown:
https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/

The difference between that and this is the extra lan9303_disable_processing_port()
calls here. But while that does disable RX on switch ports, it still doesn't wait
for pending RX frames to be processed. So the race is still open. No?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-11 16:28 ` Vladimir Oltean
@ 2024-09-11 16:55   ` Sverdlin, Alexander
  2024-09-11 17:04     ` Sverdlin, Alexander
  0 siblings, 1 reply; 11+ messages in thread
From: Sverdlin, Alexander @ 2024-09-11 16:55 UTC (permalink / raw)
  To: olteanv@gmail.com
  Cc: andrew@lunn.ch, davem@davemloft.net, f.fainelli@gmail.com,
	pabeni@redhat.com, kuba@kernel.org, stable@vger.kernel.org,
	edumazet@google.com, netdev@vger.kernel.org

Hello Vladimir!

On Wed, 2024-09-11 at 19:28 +0300, Vladimir Oltean wrote:
> On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
> > From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > 
> > dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> > slaves from master. Packets still come afterwards into master port and the
> > ports are being polled for link status. This leads to crashes:
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> > CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
> > Workqueue: events_power_efficient phy_state_machine
> > pc : lan9303_mdio_phy_read
> > lr : lan9303_phy_read
> > Call trace:
> >   lan9303_mdio_phy_read
> >   lan9303_phy_read
> >   dsa_slave_phy_read
> >   __mdiobus_read
> >   mdiobus_read
> >   genphy_update_link
> >   genphy_read_status
> >   phy_check_link_status
> >   phy_state_machine
> >   process_one_work
> >   worker_thread
> > 
> > Call lan9303_remove() instead to really unregister all ports before zeroing
> > drvdata and dsa_ptr.
> > 
> > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > ---
> >   drivers/net/dsa/lan9303-core.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> > index 268949939636..ecd507355f51 100644
> > --- a/drivers/net/dsa/lan9303-core.c
> > +++ b/drivers/net/dsa/lan9303-core.c
> > @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove);
> >   
> >   void lan9303_shutdown(struct lan9303 *chip)
> >   {
> > -	dsa_switch_shutdown(chip->ds);
> > +	lan9303_remove(chip);
> >   }
> >   EXPORT_SYMBOL(lan9303_shutdown);
> >   
> > -- 
> > 2.46.0
> > 
> 
> You've said here that a similar change still does not protect against
> packets received after shutdown:
> https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/
> 
> The difference between that and this is the extra lan9303_disable_processing_port()
> calls here. But while that does disable RX on switch ports, it still doesn't wait
> for pending RX frames to be processed. So the race is still open. No?

This patch addresses the race of zeroing drvdata in

static void lan9303_mdio_shutdown(struct mdio_device *mdiodev)
{
        struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
        
        if (!sw_dev)
                return;
 
        lan9303_shutdown(&sw_dev->chip);
        
        dev_set_drvdata(&mdiodev->dev, NULL);
}

versus 

static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy,  int reg)
{
        struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);

what you refer to is another race, zeroing of dsa_ptr in struct net_device versus
the whole network stack, which I addressed in
https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@siemens.com/

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-11 16:55   ` Sverdlin, Alexander
@ 2024-09-11 17:04     ` Sverdlin, Alexander
  2024-09-11 17:09       ` Sverdlin, Alexander
  0 siblings, 1 reply; 11+ messages in thread
From: Sverdlin, Alexander @ 2024-09-11 17:04 UTC (permalink / raw)
  To: olteanv@gmail.com
  Cc: andrew@lunn.ch, davem@davemloft.net, f.fainelli@gmail.com,
	pabeni@redhat.com, kuba@kernel.org, stable@vger.kernel.org,
	edumazet@google.com, netdev@vger.kernel.org

Hello Vladimir,

On Wed, 2024-09-11 at 18:55 +0200, Alexander Sverdlin wrote:
> > You've said here that a similar change still does not protect against
> > packets received after shutdown:
> > https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/
> > 
> > The difference between that and this is the extra lan9303_disable_processing_port()
> > calls here. But while that does disable RX on switch ports, it still doesn't wait
> > for pending RX frames to be processed. So the race is still open. No?

besides from the below, I've expected this question... In the meanwhile I've tested
mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown.
After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read
pattern therefore I've posted this tested patch.

If you'd prefer to solve this centrally for all drivers, I can test your patch from
the MDIO-drvdata PoV.

> This patch addresses the race of zeroing drvdata in
> 
> static void lan9303_mdio_shutdown(struct mdio_device *mdiodev)
> {
>         struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
>         
>         if (!sw_dev)
>                 return;
>  
>         lan9303_shutdown(&sw_dev->chip);
>         
>         dev_set_drvdata(&mdiodev->dev, NULL);
> }
> 
> versus 
> 
> static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy,  int reg)
> {
>         struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);
> 
> what you refer to is another race, zeroing of dsa_ptr in struct net_device versus
> the whole network stack, which I addressed in
> https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@siemens.com/

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-11 17:04     ` Sverdlin, Alexander
@ 2024-09-11 17:09       ` Sverdlin, Alexander
  2024-09-11 17:19         ` Vladimir Oltean
  0 siblings, 1 reply; 11+ messages in thread
From: Sverdlin, Alexander @ 2024-09-11 17:09 UTC (permalink / raw)
  To: olteanv@gmail.com
  Cc: andrew@lunn.ch, davem@davemloft.net, f.fainelli@gmail.com,
	pabeni@redhat.com, kuba@kernel.org, stable@vger.kernel.org,
	edumazet@google.com, netdev@vger.kernel.org

On Wed, 2024-09-11 at 19:04 +0200, Alexander Sverdlin wrote:
> > > The difference between that and this is the extra lan9303_disable_processing_port()
> > > calls here. But while that does disable RX on switch ports, it still doesn't wait
> > > for pending RX frames to be processed. So the race is still open. No?
> 
> besides from the below, I've expected this question... In the meanwhile I've tested
> mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown.
> After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read
> pattern therefore I've posted this tested patch.
> 
> If you'd prefer to solve this centrally for all drivers, I can test your patch from
> the MDIO-drvdata PoV.

But this would mean throwing away the whole
"net: dsa: be compatible with masters which unregister on shutdown" work?

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-11 17:09       ` Sverdlin, Alexander
@ 2024-09-11 17:19         ` Vladimir Oltean
  0 siblings, 0 replies; 11+ messages in thread
From: Vladimir Oltean @ 2024-09-11 17:19 UTC (permalink / raw)
  To: Sverdlin, Alexander
  Cc: andrew@lunn.ch, davem@davemloft.net, f.fainelli@gmail.com,
	pabeni@redhat.com, kuba@kernel.org, stable@vger.kernel.org,
	edumazet@google.com, netdev@vger.kernel.org

On Wed, Sep 11, 2024 at 05:09:08PM +0000, Sverdlin, Alexander wrote:
> On Wed, 2024-09-11 at 19:04 +0200, Alexander Sverdlin wrote:
> > > > The difference between that and this is the extra lan9303_disable_processing_port()
> > > > calls here. But while that does disable RX on switch ports, it still doesn't wait
> > > > for pending RX frames to be processed. So the race is still open. No?
> > 
> > besides from the below, I've expected this question... In the meanwhile I've tested
> > mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown.
> > After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read
> > pattern therefore I've posted this tested patch.
> > 
> > If you'd prefer to solve this centrally for all drivers, I can test your patch from
> > the MDIO-drvdata PoV.
> 
> But this would mean throwing away the whole
> "net: dsa: be compatible with masters which unregister on shutdown" work?

I did not propose anything (yet). I'm still trying to form a mental model
of what is broken and what works. Hence the request to test that change.

OTOH, this patch is equally throwing away the whole "net: dsa: be
compatible with masters which unregister on shutdown" work (for lan9303).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-11 14:40 A. Sverdlin
  2024-09-11 16:28 ` Vladimir Oltean
@ 2024-09-12 10:15 ` Vladimir Oltean
  2024-09-13 16:16   ` Sverdlin, Alexander
  1 sibling, 1 reply; 11+ messages in thread
From: Vladimir Oltean @ 2024-09-12 10:15 UTC (permalink / raw)
  To: A. Sverdlin
  Cc: netdev, Andrew Lunn, Florian Fainelli, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, stable

[-- Attachment #1: Type: text/plain, Size: 1261 bytes --]

Hi Alexander,

On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> 
> dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> slaves from master. Packets still come afterwards into master port and the
> ports are being polled for link status. This leads to crashes:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
> Workqueue: events_power_efficient phy_state_machine
> pc : lan9303_mdio_phy_read
> lr : lan9303_phy_read
> Call trace:
>  lan9303_mdio_phy_read
>  lan9303_phy_read
>  dsa_slave_phy_read
>  __mdiobus_read
>  mdiobus_read
>  genphy_update_link
>  genphy_read_status
>  phy_check_link_status
>  phy_state_machine
>  process_one_work
>  worker_thread
> 
> Call lan9303_remove() instead to really unregister all ports before zeroing
> drvdata and dsa_ptr.
> 
> Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> ---

Could you please test this alternative solution (patch attached) for both reported problems?

Thanks.

[-- Attachment #2: 0001-net-dsa-improve-shutdown-sequence.patch --]
[-- Type: text/x-diff, Size: 3696 bytes --]

From e9ffae9325d1bf683e455f78ac1804b3612a4fd7 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Thu, 12 Sep 2024 01:14:19 +0300
Subject: [PATCH] net: dsa: improve shutdown sequence

Alexander Sverdlin presents 2 problems during shutdown with the
lan9303 driver. One is specific to lan9303 and the other just happens
to reproduce there.

The first problem is that lan9303 is unique among DSA drivers in that it
calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown,
not remove):

phy_state_machine()
-> ...
   -> dsa_user_phy_read()
      -> ds->ops->phy_read()
         -> lan9303_phy_read()
            -> chip->ops->phy_read()
               -> lan9303_mdio_phy_read()
                  -> dev_get_drvdata()

But we never stop the phy_state_machine(), so it may continue to run
after dsa_switch_shutdown(). Our common pattern in all DSA drivers is
to set drvdata to NULL to suppress the remove() method that may come
afterwards. But in this case it will result in an NPD.

The second problem is that the way in which we set
dp->conduit->dsa_ptr = NULL; is concurrent with receive packet
processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL,
but afterwards, rather than continuing to use that non-NULL value,
dev->dsa_ptr is dereferenced again and again without NULL checks:
dsa_conduit_find_user() and many other places. In between dereferences,
there is no locking to ensure that what was valid once continues to be
valid.

Both problems have the common aspect that closing the conduit interface
solves them.

In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN
event in dsa_user_netdevice_event() which closes user ports as well.
dsa_port_disable_rt() calls phylink_stop(), which synchronously stops
the phylink state machine, and ds->ops->phy_read() will thus no longer
call into the driver after this point.

In the second case, dev_close(conduit) should do this, as per
Documentation/networking/driver.rst:

| Quiescence
| ----------
|
| After the ndo_stop routine has been called, the hardware must
| not receive or transmit any data.  All in flight packets must
| be aborted. If necessary, poll or wait for completion of
| any reset commands.

So it should be sufficient to ensure that later, when we zeroize
conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call
on this conduit.

The addition of the netif_device_detach() function is to ensure that
ioctls, rtnetlinks and ethtool requests on the user ports no longer
propagate down to the driver - we're no longer prepared to handle them.

Link: https://lore.kernel.org/netdev/2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com/
Link: https://lore.kernel.org/netdev/c1bf4de54e829111e0e4a70e7bd1cf523c9550ff.camel@siemens.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 668c729946ea..1664547deffd 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -1577,6 +1577,7 @@ EXPORT_SYMBOL_GPL(dsa_unregister_switch);
 void dsa_switch_shutdown(struct dsa_switch *ds)
 {
 	struct net_device *conduit, *user_dev;
+	LIST_HEAD(close_list);
 	struct dsa_port *dp;
 
 	mutex_lock(&dsa2_mutex);
@@ -1586,10 +1587,16 @@ void dsa_switch_shutdown(struct dsa_switch *ds)
 
 	rtnl_lock();
 
+	dsa_switch_for_each_cpu_port(dp, ds)
+		list_add(&dp->conduit->close_list, &close_list);
+
+	dev_close_many(&close_list, true);
+
 	dsa_switch_for_each_user_port(dp, ds) {
 		conduit = dsa_port_to_conduit(dp);
 		user_dev = dp->user;
 
+		netif_device_detach(user_dev);
 		netdev_upper_dev_unlink(conduit, user_dev);
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-12 10:15 ` Vladimir Oltean
@ 2024-09-13 16:16   ` Sverdlin, Alexander
  2024-09-13 18:57     ` Vladimir Oltean
  0 siblings, 1 reply; 11+ messages in thread
From: Sverdlin, Alexander @ 2024-09-13 16:16 UTC (permalink / raw)
  To: olteanv@gmail.com
  Cc: andrew@lunn.ch, davem@davemloft.net, f.fainelli@gmail.com,
	pabeni@redhat.com, kuba@kernel.org, stable@vger.kernel.org,
	edumazet@google.com, netdev@vger.kernel.org

Hi Vladimir!

Thank you for the quick fix!

On Thu, 2024-09-12 at 13:15 +0300, Vladimir Oltean wrote:
> > From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > 
> > dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> > slaves from master. Packets still come afterwards into master port and the
> > ports are being polled for link status. This leads to crashes:
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> > CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
> > Workqueue: events_power_efficient phy_state_machine
> > pc : lan9303_mdio_phy_read
> > lr : lan9303_phy_read
> > Call trace:
> >   lan9303_mdio_phy_read
> >   lan9303_phy_read
> >   dsa_slave_phy_read
> >   __mdiobus_read
> >   mdiobus_read
> >   genphy_update_link
> >   genphy_read_status
> >   phy_check_link_status
> >   phy_state_machine
> >   process_one_work
> >   worker_thread
> > 
> > Call lan9303_remove() instead to really unregister all ports before zeroing
> > drvdata and dsa_ptr.
> > 
> > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")

Do you think it would make sense to add the same Fixes: tag as above?
(That's the earlier one of the two)

> > Cc: stable@vger.kernel.org
> > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > ---
> 
> Could you please test this alternative solution (patch attached) for both reported problems?

We had two LAN9303-equipped systems running overnight with PROVE_LOCKING+PROVE_RCU and without,
and I also ran couple of reboots with PROVE_LOCKING on a Marvell mv6xxx equipped HW.
All of the above for a backport to v6.1, but this part should be OK, I believe.

Overall looks very good, you can add my
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> 
when you officially publish the patch.

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown()
  2024-09-13 16:16   ` Sverdlin, Alexander
@ 2024-09-13 18:57     ` Vladimir Oltean
  0 siblings, 0 replies; 11+ messages in thread
From: Vladimir Oltean @ 2024-09-13 18:57 UTC (permalink / raw)
  To: Sverdlin, Alexander
  Cc: andrew@lunn.ch, davem@davemloft.net, f.fainelli@gmail.com,
	pabeni@redhat.com, kuba@kernel.org, stable@vger.kernel.org,
	edumazet@google.com, netdev@vger.kernel.org

Hi Alexander,

On Fri, Sep 13, 2024 at 04:16:57PM +0000, Sverdlin, Alexander wrote:
> > > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
> 
> Do you think it would make sense to add the same Fixes: tag as above?
> (That's the earlier one of the two)
> 
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > > ---
> > 
> > Could you please test this alternative solution (patch attached) for both reported problems?
> 
> We had two LAN9303-equipped systems running overnight with PROVE_LOCKING+PROVE_RCU and without,
> and I also ran couple of reboots with PROVE_LOCKING on a Marvell mv6xxx equipped HW.
> All of the above for a backport to v6.1, but this part should be OK, I believe.
> 
> Overall looks very good, you can add my
> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> 
> when you officially publish the patch.

Thanks for testing, I really appreciate it.

I will add all the required tags for the formal patch submission.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-09-13 18:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-04  9:07 [PATCH net] net: dsa: lan9303: avoid dsa_switch_shutdown() Alexander Sverdlin
2024-09-11 14:33 ` Jakub Kicinski
  -- strict thread matches above, loose matches on Subject: below --
2024-09-11 14:40 A. Sverdlin
2024-09-11 16:28 ` Vladimir Oltean
2024-09-11 16:55   ` Sverdlin, Alexander
2024-09-11 17:04     ` Sverdlin, Alexander
2024-09-11 17:09       ` Sverdlin, Alexander
2024-09-11 17:19         ` Vladimir Oltean
2024-09-12 10:15 ` Vladimir Oltean
2024-09-13 16:16   ` Sverdlin, Alexander
2024-09-13 18:57     ` Vladimir Oltean

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).