From: "Sverdlin, Alexander" <alexander.sverdlin@siemens.com>
To: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"vladimir.oltean@nxp.com" <vladimir.oltean@nxp.com>
Cc: "andrew@lunn.ch" <andrew@lunn.ch>,
"olteanv@gmail.com" <olteanv@gmail.com>,
"vivien.didelot@gmail.com" <vivien.didelot@gmail.com>,
"claudiu.manoil@nxp.com" <claudiu.manoil@nxp.com>,
"woojung.huh@microchip.com" <woojung.huh@microchip.com>,
"dqfext@gmail.com" <dqfext@gmail.com>,
"davem@davemloft.net" <davem@davemloft.net>,
"UNGLinuxDriver@microchip.com" <UNGLinuxDriver@microchip.com>,
"alexandre.belloni@bootlin.com" <alexandre.belloni@bootlin.com>,
"linux-mediatek@lists.infradead.org"
<linux-mediatek@lists.infradead.org>,
"george.mccollister@gmail.com" <george.mccollister@gmail.com>,
"linux@armlinux.org.uk" <linux@armlinux.org.uk>,
"linux@rempel-privat.de" <linux@rempel-privat.de>,
"linus.walleij@linaro.org" <linus.walleij@linaro.org>,
"hkallweit1@gmail.com" <hkallweit1@gmail.com>,
"hauke@hauke-m.de" <hauke@hauke-m.de>,
"LinoSanfilippo@gmx.de" <LinoSanfilippo@gmx.de>,
"kuba@kernel.org" <kuba@kernel.org>,
"sean.wang@mediatek.com" <sean.wang@mediatek.com>,
"kurt@linutronix.de" <kurt@linutronix.de>,
"m.grzeschik@pengutronix.de" <m.grzeschik@pengutronix.de>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"matthias.bgg@gmail.com" <matthias.bgg@gmail.com>,
"Landen.Chao@mediatek.com" <Landen.Chao@mediatek.com>,
"f.fainelli@gmail.com" <f.fainelli@gmail.com>
Subject: Re: [PATCH v2 net 2/5] net: dsa: be compatible with masters which unregister on shutdown
Date: Wed, 4 Sep 2024 08:31:13 +0000 [thread overview]
Message-ID: <2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com> (raw)
In-Reply-To: <20210917133436.553995-3-vladimir.oltean@nxp.com>
Hi Vladimir!
On Fri, 2021-09-17 at 16:34 +0300, Vladimir Oltean wrote:
> Lino reports that on his system with bcmgenet as DSA master and KSZ9897
> as a switch, rebooting or shutting down never works properly.
>
> What does the bcmgenet driver have special to trigger this, that other
> DSA masters do not? It has an implementation of ->shutdown which simply
> calls its ->remove implementation. Otherwise said, it unregisters its
> network interface on shutdown.
>
> This message can be seen in a loop, and it hangs the reboot process there:
>
> unregister_netdevice: waiting for eth0 to become free. Usage count = 3
>
> So why 3?
>
> A usage count of 1 is normal for a registered network interface, and any
> virtual interface which links itself as an upper of that will increment
> it via dev_hold. In the case of DSA, this is the call path:
>
> dsa_slave_create
> -> netdev_upper_dev_link
> -> __netdev_upper_dev_link
> -> __netdev_adjacent_dev_insert
> -> dev_hold
>
> So a DSA switch with 3 interfaces will result in a usage count elevated
> by two, and netdev_wait_allrefs will wait until they have gone away.
>
> Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
> delete themselves, but DSA cannot just vanish and go poof, at most it
> can unbind itself from the switch devices, but that must happen strictly
> earlier compared to when the DSA master unregisters its net_device, so
> reacting on the NETDEV_UNREGISTER event is way too late.
>
> It seems that it is a pretty established pattern to have a driver's
> ->shutdown hook redirect to its ->remove hook, so the same code is
> executed regardless of whether the driver is unbound from the device, or
> the system is just shutting down. As Florian puts it, it is quite a big
> hammer for bcmgenet to unregister its net_device during shutdown, but
> having a common code path with the driver unbind helps ensure it is well
> tested.
>
> So DSA, for better or for worse, has to live with that and engage in an
> arms race of implementing the ->shutdown hook too, from all individual
> drivers, and do something sane when paired with masters that unregister
> their net_device there. The only sane thing to do, of course, is to
> unlink from the master.
>
> However, complications arise really quickly.
>
> The pattern of redirecting ->shutdown to ->remove is not unique to
> bcmgenet or even to net_device drivers. In fact, SPI controllers do it
> too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
> and MDIO controllers do it too (this is something I have not researched
> too deeply, but even if this is not the case today, it is certainly
> plausible to happen in the future, and must be taken into consideration).
>
> Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
> insane implication is that for the exact same DSA switch device, we
> might have both ->shutdown and ->remove getting called.
>
> So we need to do something with that insane environment. The pattern
> I've come up with is "if this, then not that", so if either ->shutdown
> or ->remove gets called, we set the device's drvdata to NULL, and in the
> other hook, we check whether the drvdata is NULL and just do nothing.
> This is probably not necessary for platform devices, just for devices on
> buses, but I would really insist for consistency among drivers, because
> when code is copy-pasted, it is not always copy-pasted from the best
> sources.
>
> So depending on whether the DSA switch's ->remove or ->shutdown will get
> called first, we cannot really guarantee even for the same driver if
> rebooting will result in the same code path on all platforms. But
> nonetheless, we need to do something minimally reasonable on ->shutdown
> too to fix the bug. Of course, the ->remove will do more (a full
> teardown of the tree, with all data structures freed, and this is why
> the bug was not caught for so long). The new ->shutdown method is kept
> separate from dsa_unregister_switch not because we couldn't have
> unregistered the switch, but simply in the interest of doing something
> quick and to the point.
>
> The big question is: does the DSA switch's ->shutdown get called earlier
> than the DSA master's ->shutdown? If not, there is still a risk that we
> might still trigger the WARN_ON in unregister_netdevice that says we are
> attempting to unregister a net_device which has uppers. That's no good.
> Although the reference to the master net_device won't physically go away
> even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
> on it.
>
> The answer to that question lies in this comment above device_link_add:
>
> * A side effect of the link creation is re-ordering of dpm_list and the
> * devices_kset list by moving the consumer device and all devices depending
> * on it to the ends of these lists (that does not happen to devices that have
> * not been registered when this function is called).
>
> so the fact that DSA uses device_link_add towards its master is not
> exactly for nothing. device_shutdown() walks devices_kset from the back,
> so this is our guarantee that DSA's shutdown happens before the master's
> shutdown.
>
> Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
> Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
> Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Tested-by: Andrew Lunn <andrew@lunn.ch>
> ---
> drivers/net/dsa/b53/b53_mdio.c | 21 ++++++++-
> drivers/net/dsa/b53/b53_mmap.c | 13 ++++++
> drivers/net/dsa/b53/b53_priv.h | 5 +++
> drivers/net/dsa/b53/b53_spi.c | 13 ++++++
> drivers/net/dsa/b53/b53_srab.c | 21 ++++++++-
> drivers/net/dsa/bcm_sf2.c | 12 ++++++
> drivers/net/dsa/dsa_loop.c | 22 +++++++++-
> drivers/net/dsa/lan9303-core.c | 6 +++
> drivers/net/dsa/lan9303.h | 1 +
> drivers/net/dsa/lan9303_i2c.c | 24 +++++++++--
> drivers/net/dsa/lan9303_mdio.c | 15 +++++++
> drivers/net/dsa/lantiq_gswip.c | 18 ++++++++
> drivers/net/dsa/microchip/ksz8795_spi.c | 11 ++++-
> drivers/net/dsa/microchip/ksz9477_i2c.c | 14 +++++-
> drivers/net/dsa/microchip/ksz9477_spi.c | 8 +++-
> drivers/net/dsa/mt7530.c | 18 ++++++++
> drivers/net/dsa/mv88e6060.c | 18 ++++++++
> drivers/net/dsa/mv88e6xxx/chip.c | 22 +++++++++-
> drivers/net/dsa/ocelot/felix_vsc9959.c | 20 ++++++++-
> drivers/net/dsa/ocelot/seville_vsc9953.c | 20 ++++++++-
> drivers/net/dsa/qca/ar9331.c | 18 ++++++++
> drivers/net/dsa/qca8k.c | 18 ++++++++
> drivers/net/dsa/realtek-smi-core.c | 20 ++++++++-
> drivers/net/dsa/sja1105/sja1105_main.c | 21 ++++++++-
> drivers/net/dsa/vitesse-vsc73xx-core.c | 6 +++
> drivers/net/dsa/vitesse-vsc73xx-platform.c | 22 +++++++++-
> drivers/net/dsa/vitesse-vsc73xx-spi.c | 22 +++++++++-
> drivers/net/dsa/vitesse-vsc73xx.h | 1 +
> include/net/dsa.h | 1 +
> net/dsa/dsa2.c | 50 ++++++++++++++++++++++
> 30 files changed, 457 insertions(+), 24 deletions(-)
[]
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index d7ce281570b5..89f920289ae2 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -1379,6 +1379,12 @@ int lan9303_remove(struct lan9303 *chip)
> }
> EXPORT_SYMBOL(lan9303_remove);
>
> +void lan9303_shutdown(struct lan9303 *chip)
> +{
> + dsa_switch_shutdown(chip->ds);
> +}
> +EXPORT_SYMBOL(lan9303_shutdown);
> +
> MODULE_AUTHOR("Juergen Borleis <kernel@pengutronix.de>");
> MODULE_DESCRIPTION("Core driver for SMSC/Microchip LAN9303 three port ethernet switch");
> MODULE_LICENSE("GPL v2");
> diff --git a/drivers/net/dsa/lan9303.h b/drivers/net/dsa/lan9303.h
> index 11f590b64701..c7f73efa50f0 100644
> --- a/drivers/net/dsa/lan9303.h
> +++ b/drivers/net/dsa/lan9303.h
> @@ -10,3 +10,4 @@ extern const struct lan9303_phy_ops lan9303_indirect_phy_ops;
>
> int lan9303_probe(struct lan9303 *chip, struct device_node *np);
> int lan9303_remove(struct lan9303 *chip);
> +void lan9303_shutdown(struct lan9303 *chip);
> diff --git a/drivers/net/dsa/lan9303_i2c.c b/drivers/net/dsa/lan9303_i2c.c
> index 9bffaef65a04..8ca4713310fa 100644
> --- a/drivers/net/dsa/lan9303_i2c.c
> +++ b/drivers/net/dsa/lan9303_i2c.c
> @@ -67,13 +67,28 @@ static int lan9303_i2c_probe(struct i2c_client *client,
>
> static int lan9303_i2c_remove(struct i2c_client *client)
> {
> - struct lan9303_i2c *sw_dev;
> + struct lan9303_i2c *sw_dev = i2c_get_clientdata(client);
>
> - sw_dev = i2c_get_clientdata(client);
> if (!sw_dev)
> - return -ENODEV;
> + return 0;
> +
> + lan9303_remove(&sw_dev->chip);
> +
> + i2c_set_clientdata(client, NULL);
> +
> + return 0;
> +}
> +
> +static void lan9303_i2c_shutdown(struct i2c_client *client)
> +{
> + struct lan9303_i2c *sw_dev = i2c_get_clientdata(client);
> +
> + if (!sw_dev)
> + return;
> +
> + lan9303_shutdown(&sw_dev->chip);
>
> - return lan9303_remove(&sw_dev->chip);
> + i2c_set_clientdata(client, NULL);
> }
>
> /*-------------------------------------------------------------------------*/
> @@ -97,6 +112,7 @@ static struct i2c_driver lan9303_i2c_driver = {
> },
> .probe = lan9303_i2c_probe,
> .remove = lan9303_i2c_remove,
> + .shutdown = lan9303_i2c_shutdown,
> .id_table = lan9303_i2c_id,
> };
> module_i2c_driver(lan9303_i2c_driver);
> diff --git a/drivers/net/dsa/lan9303_mdio.c b/drivers/net/dsa/lan9303_mdio.c
> index 9cbe80460b53..bbb7032409ba 100644
> --- a/drivers/net/dsa/lan9303_mdio.c
> +++ b/drivers/net/dsa/lan9303_mdio.c
> @@ -138,6 +138,20 @@ static void lan9303_mdio_remove(struct mdio_device *mdiodev)
> return;
>
> lan9303_remove(&sw_dev->chip);
> +
> + dev_set_drvdata(&mdiodev->dev, NULL);
> +}
> +
> +static void lan9303_mdio_shutdown(struct mdio_device *mdiodev)
> +{
> + struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
> +
> + if (!sw_dev)
> + return;
> +
> + lan9303_shutdown(&sw_dev->chip);
> +
> + dev_set_drvdata(&mdiodev->dev, NULL);
> }
This unfortunately didn't work well with LAN9303 and probably will not work
with others:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+gitb7793b7d9b35 #1
Workqueue: events_power_efficient phy_state_machine
pc : lan9303_mdio_phy_read+0x1c/0x34
lr : lan9303_phy_read+0x50/0x100
Call trace:
lan9303_mdio_phy_read+0x1c/0x34
lan9303_phy_read+0x50/0x100
dsa_slave_phy_read+0x40/0x50
__mdiobus_read+0x34/0x130
mdiobus_read+0x44/0x70
genphy_update_link+0x2c/0x104
genphy_read_status+0x2c/0x120
phy_check_link_status+0xb8/0xcc
phy_state_machine+0x198/0x27c
process_one_work+0x1dc/0x450
worker_thread+0x154/0x450
as long as the ports are not down (and dsa_switch_shutdown() doesn't ensure it),
we cannot just zero drvdata, because PHY polling will eventually call
static int lan9303_mdio_phy_read(struct lan9303 *chip, int addr, int reg)
{
struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);
return mdiobus_read_nested(sw_dev->device->bus, addr, reg);
There are however multiple other unsafe patterns.
I suppose current
dsa_switch_shutdown();
dev_set_drvdata(...->dev, NULL);
pattern is broken in many cases...
--
Alexander Sverdlin
Siemens AG
www.siemens.com
next prev parent reply other threads:[~2024-09-04 8:39 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-17 13:34 [PATCH v2 net 0/5] Make DSA switch drivers compatible with masters which unregister on shutdown Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 1/5] net: mdio: introduce a shutdown method to mdio device drivers Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 2/5] net: dsa: be compatible with masters which unregister on shutdown Vladimir Oltean
2024-09-04 8:31 ` Sverdlin, Alexander [this message]
2024-09-13 20:29 ` Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 3/5] net: dsa: hellcreek: " Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 4/5] net: dsa: microchip: ksz8863: " Vladimir Oltean
2021-09-17 13:34 ` [PATCH v2 net 5/5] net: dsa: xrs700x: " Vladimir Oltean
2021-09-19 11:20 ` [PATCH v2 net 0/5] Make DSA switch drivers " patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com \
--to=alexander.sverdlin@siemens.com \
--cc=Landen.Chao@mediatek.com \
--cc=LinoSanfilippo@gmx.de \
--cc=UNGLinuxDriver@microchip.com \
--cc=alexandre.belloni@bootlin.com \
--cc=andrew@lunn.ch \
--cc=claudiu.manoil@nxp.com \
--cc=davem@davemloft.net \
--cc=dqfext@gmail.com \
--cc=f.fainelli@gmail.com \
--cc=george.mccollister@gmail.com \
--cc=hauke@hauke-m.de \
--cc=hkallweit1@gmail.com \
--cc=kuba@kernel.org \
--cc=kurt@linutronix.de \
--cc=linus.walleij@linaro.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux@armlinux.org.uk \
--cc=linux@rempel-privat.de \
--cc=m.grzeschik@pengutronix.de \
--cc=matthias.bgg@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=olteanv@gmail.com \
--cc=sean.wang@mediatek.com \
--cc=vivien.didelot@gmail.com \
--cc=vladimir.oltean@nxp.com \
--cc=woojung.huh@microchip.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox