Linux-mediatek Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Sverdlin, Alexander" <alexander.sverdlin@siemens.com>
To: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"vladimir.oltean@nxp.com" <vladimir.oltean@nxp.com>
Cc: "andrew@lunn.ch" <andrew@lunn.ch>,
	"olteanv@gmail.com" <olteanv@gmail.com>,
	"vivien.didelot@gmail.com" <vivien.didelot@gmail.com>,
	"claudiu.manoil@nxp.com" <claudiu.manoil@nxp.com>,
	"woojung.huh@microchip.com" <woojung.huh@microchip.com>,
	"dqfext@gmail.com" <dqfext@gmail.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"UNGLinuxDriver@microchip.com" <UNGLinuxDriver@microchip.com>,
	"alexandre.belloni@bootlin.com" <alexandre.belloni@bootlin.com>,
	"linux-mediatek@lists.infradead.org"
	<linux-mediatek@lists.infradead.org>,
	"george.mccollister@gmail.com" <george.mccollister@gmail.com>,
	"linux@armlinux.org.uk" <linux@armlinux.org.uk>,
	"linux@rempel-privat.de" <linux@rempel-privat.de>,
	"linus.walleij@linaro.org" <linus.walleij@linaro.org>,
	"hkallweit1@gmail.com" <hkallweit1@gmail.com>,
	"hauke@hauke-m.de" <hauke@hauke-m.de>,
	"LinoSanfilippo@gmx.de" <LinoSanfilippo@gmx.de>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"sean.wang@mediatek.com" <sean.wang@mediatek.com>,
	"kurt@linutronix.de" <kurt@linutronix.de>,
	"m.grzeschik@pengutronix.de" <m.grzeschik@pengutronix.de>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"matthias.bgg@gmail.com" <matthias.bgg@gmail.com>,
	"Landen.Chao@mediatek.com" <Landen.Chao@mediatek.com>,
	"f.fainelli@gmail.com" <f.fainelli@gmail.com>
Subject: Re: [PATCH v2 net 2/5] net: dsa: be compatible with masters which unregister on shutdown
Date: Wed, 4 Sep 2024 08:31:13 +0000	[thread overview]
Message-ID: <2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com> (raw)
In-Reply-To: <20210917133436.553995-3-vladimir.oltean@nxp.com>

Hi Vladimir!

On Fri, 2021-09-17 at 16:34 +0300, Vladimir Oltean wrote:
> Lino reports that on his system with bcmgenet as DSA master and KSZ9897
> as a switch, rebooting or shutting down never works properly.
> 
> What does the bcmgenet driver have special to trigger this, that other
> DSA masters do not? It has an implementation of ->shutdown which simply
> calls its ->remove implementation. Otherwise said, it unregisters its
> network interface on shutdown.
> 
> This message can be seen in a loop, and it hangs the reboot process there:
> 
> unregister_netdevice: waiting for eth0 to become free. Usage count = 3
> 
> So why 3?
> 
> A usage count of 1 is normal for a registered network interface, and any
> virtual interface which links itself as an upper of that will increment
> it via dev_hold. In the case of DSA, this is the call path:
> 
> dsa_slave_create
> -> netdev_upper_dev_link
>    -> __netdev_upper_dev_link
>       -> __netdev_adjacent_dev_insert
>          -> dev_hold
> 
> So a DSA switch with 3 interfaces will result in a usage count elevated
> by two, and netdev_wait_allrefs will wait until they have gone away.
> 
> Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
> delete themselves, but DSA cannot just vanish and go poof, at most it
> can unbind itself from the switch devices, but that must happen strictly
> earlier compared to when the DSA master unregisters its net_device, so
> reacting on the NETDEV_UNREGISTER event is way too late.
> 
> It seems that it is a pretty established pattern to have a driver's
> ->shutdown hook redirect to its ->remove hook, so the same code is
> executed regardless of whether the driver is unbound from the device, or
> the system is just shutting down. As Florian puts it, it is quite a big
> hammer for bcmgenet to unregister its net_device during shutdown, but
> having a common code path with the driver unbind helps ensure it is well
> tested.
> 
> So DSA, for better or for worse, has to live with that and engage in an
> arms race of implementing the ->shutdown hook too, from all individual
> drivers, and do something sane when paired with masters that unregister
> their net_device there. The only sane thing to do, of course, is to
> unlink from the master.
> 
> However, complications arise really quickly.
> 
> The pattern of redirecting ->shutdown to ->remove is not unique to
> bcmgenet or even to net_device drivers. In fact, SPI controllers do it
> too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
> and MDIO controllers do it too (this is something I have not researched
> too deeply, but even if this is not the case today, it is certainly
> plausible to happen in the future, and must be taken into consideration).
> 
> Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
> insane implication is that for the exact same DSA switch device, we
> might have both ->shutdown and ->remove getting called.
> 
> So we need to do something with that insane environment. The pattern
> I've come up with is "if this, then not that", so if either ->shutdown
> or ->remove gets called, we set the device's drvdata to NULL, and in the
> other hook, we check whether the drvdata is NULL and just do nothing.
> This is probably not necessary for platform devices, just for devices on
> buses, but I would really insist for consistency among drivers, because
> when code is copy-pasted, it is not always copy-pasted from the best
> sources.
> 
> So depending on whether the DSA switch's ->remove or ->shutdown will get
> called first, we cannot really guarantee even for the same driver if
> rebooting will result in the same code path on all platforms. But
> nonetheless, we need to do something minimally reasonable on ->shutdown
> too to fix the bug. Of course, the ->remove will do more (a full
> teardown of the tree, with all data structures freed, and this is why
> the bug was not caught for so long). The new ->shutdown method is kept
> separate from dsa_unregister_switch not because we couldn't have
> unregistered the switch, but simply in the interest of doing something
> quick and to the point.
> 
> The big question is: does the DSA switch's ->shutdown get called earlier
> than the DSA master's ->shutdown? If not, there is still a risk that we
> might still trigger the WARN_ON in unregister_netdevice that says we are
> attempting to unregister a net_device which has uppers. That's no good.
> Although the reference to the master net_device won't physically go away
> even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
> on it.
> 
> The answer to that question lies in this comment above device_link_add:
> 
>  * A side effect of the link creation is re-ordering of dpm_list and the
>  * devices_kset list by moving the consumer device and all devices depending
>  * on it to the ends of these lists (that does not happen to devices that have
>  * not been registered when this function is called).
> 
> so the fact that DSA uses device_link_add towards its master is not
> exactly for nothing. device_shutdown() walks devices_kset from the back,
> so this is our guarantee that DSA's shutdown happens before the master's
> shutdown.
> 
> Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
> Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
> Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Tested-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/dsa/b53/b53_mdio.c             | 21 ++++++++-
>  drivers/net/dsa/b53/b53_mmap.c             | 13 ++++++
>  drivers/net/dsa/b53/b53_priv.h             |  5 +++
>  drivers/net/dsa/b53/b53_spi.c              | 13 ++++++
>  drivers/net/dsa/b53/b53_srab.c             | 21 ++++++++-
>  drivers/net/dsa/bcm_sf2.c                  | 12 ++++++
>  drivers/net/dsa/dsa_loop.c                 | 22 +++++++++-
>  drivers/net/dsa/lan9303-core.c             |  6 +++
>  drivers/net/dsa/lan9303.h                  |  1 +
>  drivers/net/dsa/lan9303_i2c.c              | 24 +++++++++--
>  drivers/net/dsa/lan9303_mdio.c             | 15 +++++++
>  drivers/net/dsa/lantiq_gswip.c             | 18 ++++++++
>  drivers/net/dsa/microchip/ksz8795_spi.c    | 11 ++++-
>  drivers/net/dsa/microchip/ksz9477_i2c.c    | 14 +++++-
>  drivers/net/dsa/microchip/ksz9477_spi.c    |  8 +++-
>  drivers/net/dsa/mt7530.c                   | 18 ++++++++
>  drivers/net/dsa/mv88e6060.c                | 18 ++++++++
>  drivers/net/dsa/mv88e6xxx/chip.c           | 22 +++++++++-
>  drivers/net/dsa/ocelot/felix_vsc9959.c     | 20 ++++++++-
>  drivers/net/dsa/ocelot/seville_vsc9953.c   | 20 ++++++++-
>  drivers/net/dsa/qca/ar9331.c               | 18 ++++++++
>  drivers/net/dsa/qca8k.c                    | 18 ++++++++
>  drivers/net/dsa/realtek-smi-core.c         | 20 ++++++++-
>  drivers/net/dsa/sja1105/sja1105_main.c     | 21 ++++++++-
>  drivers/net/dsa/vitesse-vsc73xx-core.c     |  6 +++
>  drivers/net/dsa/vitesse-vsc73xx-platform.c | 22 +++++++++-
>  drivers/net/dsa/vitesse-vsc73xx-spi.c      | 22 +++++++++-
>  drivers/net/dsa/vitesse-vsc73xx.h          |  1 +
>  include/net/dsa.h                          |  1 +
>  net/dsa/dsa2.c                             | 50 ++++++++++++++++++++++
>  30 files changed, 457 insertions(+), 24 deletions(-)

[]

> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index d7ce281570b5..89f920289ae2 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -1379,6 +1379,12 @@ int lan9303_remove(struct lan9303 *chip)
>  }
>  EXPORT_SYMBOL(lan9303_remove);
>  
> +void lan9303_shutdown(struct lan9303 *chip)
> +{
> +	dsa_switch_shutdown(chip->ds);
> +}
> +EXPORT_SYMBOL(lan9303_shutdown);
> +
>  MODULE_AUTHOR("Juergen Borleis <kernel@pengutronix.de>");
>  MODULE_DESCRIPTION("Core driver for SMSC/Microchip LAN9303 three port ethernet switch");
>  MODULE_LICENSE("GPL v2");
> diff --git a/drivers/net/dsa/lan9303.h b/drivers/net/dsa/lan9303.h
> index 11f590b64701..c7f73efa50f0 100644
> --- a/drivers/net/dsa/lan9303.h
> +++ b/drivers/net/dsa/lan9303.h
> @@ -10,3 +10,4 @@ extern const struct lan9303_phy_ops lan9303_indirect_phy_ops;
>  
>  int lan9303_probe(struct lan9303 *chip, struct device_node *np);
>  int lan9303_remove(struct lan9303 *chip);
> +void lan9303_shutdown(struct lan9303 *chip);
> diff --git a/drivers/net/dsa/lan9303_i2c.c b/drivers/net/dsa/lan9303_i2c.c
> index 9bffaef65a04..8ca4713310fa 100644
> --- a/drivers/net/dsa/lan9303_i2c.c
> +++ b/drivers/net/dsa/lan9303_i2c.c
> @@ -67,13 +67,28 @@ static int lan9303_i2c_probe(struct i2c_client *client,
>  
>  static int lan9303_i2c_remove(struct i2c_client *client)
>  {
> -	struct lan9303_i2c *sw_dev;
> +	struct lan9303_i2c *sw_dev = i2c_get_clientdata(client);
>  
> -	sw_dev = i2c_get_clientdata(client);
>  	if (!sw_dev)
> -		return -ENODEV;
> +		return 0;
> +
> +	lan9303_remove(&sw_dev->chip);
> +
> +	i2c_set_clientdata(client, NULL);
> +
> +	return 0;
> +}
> +
> +static void lan9303_i2c_shutdown(struct i2c_client *client)
> +{
> +	struct lan9303_i2c *sw_dev = i2c_get_clientdata(client);
> +
> +	if (!sw_dev)
> +		return;
> +
> +	lan9303_shutdown(&sw_dev->chip);
>  
> -	return lan9303_remove(&sw_dev->chip);
> +	i2c_set_clientdata(client, NULL);
>  }
>  
>  /*-------------------------------------------------------------------------*/
> @@ -97,6 +112,7 @@ static struct i2c_driver lan9303_i2c_driver = {
>  	},
>  	.probe = lan9303_i2c_probe,
>  	.remove = lan9303_i2c_remove,
> +	.shutdown = lan9303_i2c_shutdown,
>  	.id_table = lan9303_i2c_id,
>  };
>  module_i2c_driver(lan9303_i2c_driver);
> diff --git a/drivers/net/dsa/lan9303_mdio.c b/drivers/net/dsa/lan9303_mdio.c
> index 9cbe80460b53..bbb7032409ba 100644
> --- a/drivers/net/dsa/lan9303_mdio.c
> +++ b/drivers/net/dsa/lan9303_mdio.c
> @@ -138,6 +138,20 @@ static void lan9303_mdio_remove(struct mdio_device *mdiodev)
>  		return;
>  
>  	lan9303_remove(&sw_dev->chip);
> +
> +	dev_set_drvdata(&mdiodev->dev, NULL);
> +}
> +
> +static void lan9303_mdio_shutdown(struct mdio_device *mdiodev)
> +{
> +	struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
> +
> +	if (!sw_dev)
> +		return;
> +
> +	lan9303_shutdown(&sw_dev->chip);
> +
> +	dev_set_drvdata(&mdiodev->dev, NULL);
>  }

This unfortunately didn't work well with LAN9303 and probably will not work
with others:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G           O       6.1.99+gitb7793b7d9b35 #1
Workqueue: events_power_efficient phy_state_machine
pc : lan9303_mdio_phy_read+0x1c/0x34
lr : lan9303_phy_read+0x50/0x100
Call trace:
 lan9303_mdio_phy_read+0x1c/0x34
 lan9303_phy_read+0x50/0x100
 dsa_slave_phy_read+0x40/0x50
 __mdiobus_read+0x34/0x130
 mdiobus_read+0x44/0x70
 genphy_update_link+0x2c/0x104
 genphy_read_status+0x2c/0x120
 phy_check_link_status+0xb8/0xcc
 phy_state_machine+0x198/0x27c
 process_one_work+0x1dc/0x450
 worker_thread+0x154/0x450

as long as the ports are not down (and dsa_switch_shutdown() doesn't ensure it),
we cannot just zero drvdata, because PHY polling will eventually call

static int lan9303_mdio_phy_read(struct lan9303 *chip, int addr, int reg)
{
        struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);

        return mdiobus_read_nested(sw_dev->device->bus, addr, reg);

There are however multiple other unsafe patterns.
I suppose current

dsa_switch_shutdown();
dev_set_drvdata(...->dev, NULL);

pattern is broken in many cases...

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com

  reply	other threads:[~2024-09-04  8:39 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-17 13:34 [PATCH v2 net 0/5] Make DSA switch drivers compatible with masters which unregister on shutdown Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 1/5] net: mdio: introduce a shutdown method to mdio device drivers Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 2/5] net: dsa: be compatible with masters which unregister on shutdown Vladimir Oltean
2024-09-04  8:31   ` Sverdlin, Alexander [this message]
2024-09-13 20:29     ` Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 3/5] net: dsa: hellcreek: " Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 4/5] net: dsa: microchip: ksz8863: " Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 5/5] net: dsa: xrs700x: " Vladimir Oltean
2021-09-19 11:20 ` [PATCH v2 net 0/5] Make DSA switch drivers " patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com \
    --to=alexander.sverdlin@siemens.com \
    --cc=Landen.Chao@mediatek.com \
    --cc=LinoSanfilippo@gmx.de \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=alexandre.belloni@bootlin.com \
    --cc=andrew@lunn.ch \
    --cc=claudiu.manoil@nxp.com \
    --cc=davem@davemloft.net \
    --cc=dqfext@gmail.com \
    --cc=f.fainelli@gmail.com \
    --cc=george.mccollister@gmail.com \
    --cc=hauke@hauke-m.de \
    --cc=hkallweit1@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kurt@linutronix.de \
    --cc=linus.walleij@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux@armlinux.org.uk \
    --cc=linux@rempel-privat.de \
    --cc=m.grzeschik@pengutronix.de \
    --cc=matthias.bgg@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=sean.wang@mediatek.com \
    --cc=vivien.didelot@gmail.com \
    --cc=vladimir.oltean@nxp.com \
    --cc=woojung.huh@microchip.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox