netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v3] net: dsa: mxl-gsw1xx: manually clear RANEG bit
@ 2025-12-08  1:27 Daniel Golle
  2025-12-08 12:47 ` Russell King (Oracle)
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Golle @ 2025-12-08  1:27 UTC (permalink / raw)
  To: Hauke Mehrtens, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King,
	Daniel Golle, netdev, linux-kernel
  Cc: Rasmus Villemoes, Benny (Ying-Tsan) Weng, John Crispin

Despite being documented as self-clearing, the RANEG bit sometimes
remains set, preventing auto-negotiation from happening.

Manually clear the RANEG bit after 10ms as advised by MaxLinear.
In order to not hold RTNL during the 10ms of waiting schedule
delayed work to take care of clearing the bit asynchronously, which
is similar to the self-clearing behavior.

Fixes: 22335939ec90 ("net: dsa: add driver for MaxLinear GSW1xx switch family")
Reported-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
v3:
 * fix wrong parameter name in call of cancel_delayed_work_sync

v2:
 * cancel pending work before setting RANEG bit
 * cancel pending work on remove and shutdown
 * document that GSW1XX_RST_REQ_SGMII_SHELL also clears RANEG bit
 * improve commit message

 drivers/net/dsa/lantiq/mxl-gsw1xx.c | 34 ++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/lantiq/mxl-gsw1xx.c b/drivers/net/dsa/lantiq/mxl-gsw1xx.c
index cf33a16fd183b..7d332affe9ec8 100644
--- a/drivers/net/dsa/lantiq/mxl-gsw1xx.c
+++ b/drivers/net/dsa/lantiq/mxl-gsw1xx.c
@@ -11,10 +11,12 @@
 
 #include <linux/bits.h>
 #include <linux/delay.h>
+#include <linux/jiffies.h>
 #include <linux/module.h>
 #include <linux/of_device.h>
 #include <linux/of_mdio.h>
 #include <linux/regmap.h>
+#include <linux/workqueue.h>
 #include <net/dsa.h>
 
 #include "lantiq_gswip.h"
@@ -29,6 +31,7 @@ struct gsw1xx_priv {
 	struct			regmap *clk;
 	struct			regmap *shell;
 	struct			phylink_pcs pcs;
+	struct delayed_work	clear_raneg;
 	phy_interface_t		tbi_interface;
 	struct gswip_priv	gswip;
 };
@@ -145,7 +148,9 @@ static void gsw1xx_pcs_disable(struct phylink_pcs *pcs)
 {
 	struct gsw1xx_priv *priv = pcs_to_gsw1xx(pcs);
 
-	/* Assert SGMII shell reset */
+	cancel_delayed_work_sync(&priv->clear_raneg);
+
+	/* Assert SGMII shell reset (will also clear RANEG bit) */
 	regmap_set_bits(priv->shell, GSW1XX_SHELL_RST_REQ,
 			GSW1XX_RST_REQ_SGMII_SHELL);
 
@@ -428,12 +433,29 @@ static int gsw1xx_pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode,
 	return 0;
 }
 
+static void gsw1xx_pcs_clear_raneg(struct work_struct *work)
+{
+	struct gsw1xx_priv *priv =
+		container_of(work, struct gsw1xx_priv, clear_raneg.work);
+
+	regmap_clear_bits(priv->sgmii, GSW1XX_SGMII_TBI_ANEGCTL,
+			  GSW1XX_SGMII_TBI_ANEGCTL_RANEG);
+}
+
 static void gsw1xx_pcs_an_restart(struct phylink_pcs *pcs)
 {
 	struct gsw1xx_priv *priv = pcs_to_gsw1xx(pcs);
 
+	cancel_delayed_work_sync(&priv->clear_raneg);
+
 	regmap_set_bits(priv->sgmii, GSW1XX_SGMII_TBI_ANEGCTL,
 			GSW1XX_SGMII_TBI_ANEGCTL_RANEG);
+
+	/* despite being documented as self-clearing, the RANEG bit
+	 * sometimes remains set, preventing auto-negotiation from happening.
+	 * MaxLinear advises to manually clear the bit after 10ms.
+	 */
+	schedule_delayed_work(&priv->clear_raneg, msecs_to_jiffies(10));
 }
 
 static void gsw1xx_pcs_link_up(struct phylink_pcs *pcs,
@@ -636,6 +658,8 @@ static int gsw1xx_probe(struct mdio_device *mdiodev)
 	if (ret)
 		return ret;
 
+	INIT_DELAYED_WORK(&priv->clear_raneg, gsw1xx_pcs_clear_raneg);
+
 	ret = gswip_probe_common(&priv->gswip, version);
 	if (ret)
 		return ret;
@@ -648,10 +672,14 @@ static int gsw1xx_probe(struct mdio_device *mdiodev)
 static void gsw1xx_remove(struct mdio_device *mdiodev)
 {
 	struct gswip_priv *priv = dev_get_drvdata(&mdiodev->dev);
+	struct gsw1xx_priv *gsw1xx_priv;
 
 	if (!priv)
 		return;
 
+	gsw1xx_priv = container_of(priv, struct gsw1xx_priv, gswip);
+	cancel_delayed_work_sync(&gsw1xx_priv->clear_raneg);
+
 	gswip_disable_switch(priv);
 
 	dsa_unregister_switch(priv->ds);
@@ -660,10 +688,14 @@ static void gsw1xx_remove(struct mdio_device *mdiodev)
 static void gsw1xx_shutdown(struct mdio_device *mdiodev)
 {
 	struct gswip_priv *priv = dev_get_drvdata(&mdiodev->dev);
+	struct gsw1xx_priv *gsw1xx_priv;
 
 	if (!priv)
 		return;
 
+	gsw1xx_priv = container_of(priv, struct gsw1xx_priv, gswip);
+	cancel_delayed_work_sync(&gsw1xx_priv->clear_raneg);
+
 	dev_set_drvdata(&mdiodev->dev, NULL);
 
 	gswip_disable_switch(priv);
-- 
2.52.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net v3] net: dsa: mxl-gsw1xx: manually clear RANEG bit
  2025-12-08  1:27 [PATCH net v3] net: dsa: mxl-gsw1xx: manually clear RANEG bit Daniel Golle
@ 2025-12-08 12:47 ` Russell King (Oracle)
  2025-12-08 13:37   ` Andrew Lunn
  0 siblings, 1 reply; 4+ messages in thread
From: Russell King (Oracle) @ 2025-12-08 12:47 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Hauke Mehrtens, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel,
	Rasmus Villemoes, Benny (Ying-Tsan) Weng, John Crispin

On Mon, Dec 08, 2025 at 01:27:04AM +0000, Daniel Golle wrote:
>  static void gsw1xx_remove(struct mdio_device *mdiodev)
>  {
>  	struct gswip_priv *priv = dev_get_drvdata(&mdiodev->dev);
> +	struct gsw1xx_priv *gsw1xx_priv;
>  
>  	if (!priv)
>  		return;
>  
> +	gsw1xx_priv = container_of(priv, struct gsw1xx_priv, gswip);
> +	cancel_delayed_work_sync(&gsw1xx_priv->clear_raneg);
> +
>  	gswip_disable_switch(priv);
>  
>  	dsa_unregister_switch(priv->ds);

Can we please pay attention to ->remove methods, and code them properly
please?

There are two golden rules of driver programming.

1. Do not publish the device during probe until hardware setup is
   complete. If you publish before hardware setup is complete, userspace
   is free to race with the hardware setup and start using the device.
   This is especially true of recent systems which use hotplug events
   via udev and systemd to do stuff.

2. Do not start tearing down a device until the user interfaces have
   been unpublished. Similar to (1), while the user interface is
   published, uesrspace is completely free to interact with the device
   in any way it sees fit.

In this case, what I'm concerned with is the call above to
cancel_delayed_work_sync() before dsa_unregister_switch(). While
cancel_delayed_work_sync() will stop this work and wait for the handler
to finish running before returning (which is safe) there is a window
between this call and dsa_unregister_switch() where the user _could_
issue a badly timed ethtool command which invokes
gsw1xx_pcs_an_restart(), which would re-schedule the delayed work,
thus undoing the cancel_delayed_work_sync() effect in this path.

So please, always unpublish and then tear-down.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v3] net: dsa: mxl-gsw1xx: manually clear RANEG bit
  2025-12-08 12:47 ` Russell King (Oracle)
@ 2025-12-08 13:37   ` Andrew Lunn
  2025-12-08 15:52     ` Russell King (Oracle)
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Lunn @ 2025-12-08 13:37 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Daniel Golle, Hauke Mehrtens, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel,
	Rasmus Villemoes, Benny (Ying-Tsan) Weng, John Crispin

On Mon, Dec 08, 2025 at 12:47:51PM +0000, Russell King (Oracle) wrote:
> On Mon, Dec 08, 2025 at 01:27:04AM +0000, Daniel Golle wrote:
> >  static void gsw1xx_remove(struct mdio_device *mdiodev)
> >  {
> >  	struct gswip_priv *priv = dev_get_drvdata(&mdiodev->dev);
> > +	struct gsw1xx_priv *gsw1xx_priv;
> >  
> >  	if (!priv)
> >  		return;
> >  
> > +	gsw1xx_priv = container_of(priv, struct gsw1xx_priv, gswip);
> > +	cancel_delayed_work_sync(&gsw1xx_priv->clear_raneg);
> > +
> >  	gswip_disable_switch(priv);
> >  
> >  	dsa_unregister_switch(priv->ds);
> 
> Can we please pay attention to ->remove methods, and code them properly
> please?
> 
> There are two golden rules of driver programming.
> 
> 1. Do not publish the device during probe until hardware setup is
>    complete. If you publish before hardware setup is complete, userspace
>    is free to race with the hardware setup and start using the device.
>    This is especially true of recent systems which use hotplug events
>    via udev and systemd to do stuff.
> 
> 2. Do not start tearing down a device until the user interfaces have
>    been unpublished. Similar to (1), while the user interface is
>    published, uesrspace is completely free to interact with the device
>    in any way it sees fit.
> 
> In this case, what I'm concerned with is the call above to
> cancel_delayed_work_sync() before dsa_unregister_switch(). While
> cancel_delayed_work_sync() will stop this work and wait for the handler
> to finish running before returning (which is safe) there is a window
> between this call and dsa_unregister_switch() where the user _could_
> issue a badly timed ethtool command which invokes
> gsw1xx_pcs_an_restart(), which would re-schedule the delayed work,
> thus undoing the cancel_delayed_work_sync() effect in this path.

And this is why is was pushing for the much simpler msleep(10), or
io_poll.h polling to see if it self clears. It is hard to get that
wrong, where as delayed work is much easier to get wrong.

	Andrew

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v3] net: dsa: mxl-gsw1xx: manually clear RANEG bit
  2025-12-08 13:37   ` Andrew Lunn
@ 2025-12-08 15:52     ` Russell King (Oracle)
  0 siblings, 0 replies; 4+ messages in thread
From: Russell King (Oracle) @ 2025-12-08 15:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Daniel Golle, Hauke Mehrtens, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel,
	Rasmus Villemoes, Benny (Ying-Tsan) Weng, John Crispin

On Mon, Dec 08, 2025 at 02:37:38PM +0100, Andrew Lunn wrote:
> On Mon, Dec 08, 2025 at 12:47:51PM +0000, Russell King (Oracle) wrote:
> > On Mon, Dec 08, 2025 at 01:27:04AM +0000, Daniel Golle wrote:
> > >  static void gsw1xx_remove(struct mdio_device *mdiodev)
> > >  {
> > >  	struct gswip_priv *priv = dev_get_drvdata(&mdiodev->dev);
> > > +	struct gsw1xx_priv *gsw1xx_priv;
> > >  
> > >  	if (!priv)
> > >  		return;
> > >  
> > > +	gsw1xx_priv = container_of(priv, struct gsw1xx_priv, gswip);
> > > +	cancel_delayed_work_sync(&gsw1xx_priv->clear_raneg);
> > > +
> > >  	gswip_disable_switch(priv);
> > >  
> > >  	dsa_unregister_switch(priv->ds);
> > 
> > Can we please pay attention to ->remove methods, and code them properly
> > please?
> > 
> > There are two golden rules of driver programming.
> > 
> > 1. Do not publish the device during probe until hardware setup is
> >    complete. If you publish before hardware setup is complete, userspace
> >    is free to race with the hardware setup and start using the device.
> >    This is especially true of recent systems which use hotplug events
> >    via udev and systemd to do stuff.
> > 
> > 2. Do not start tearing down a device until the user interfaces have
> >    been unpublished. Similar to (1), while the user interface is
> >    published, uesrspace is completely free to interact with the device
> >    in any way it sees fit.
> > 
> > In this case, what I'm concerned with is the call above to
> > cancel_delayed_work_sync() before dsa_unregister_switch(). While
> > cancel_delayed_work_sync() will stop this work and wait for the handler
> > to finish running before returning (which is safe) there is a window
> > between this call and dsa_unregister_switch() where the user _could_
> > issue a badly timed ethtool command which invokes
> > gsw1xx_pcs_an_restart(), which would re-schedule the delayed work,
> > thus undoing the cancel_delayed_work_sync() effect in this path.
> 
> And this is why is was pushing for the much simpler msleep(10), or
> io_poll.h polling to see if it self clears. It is hard to get that
> wrong, where as delayed work is much easier to get wrong.

It's not specific to delayed work. Looking at the context around
the ->remove() method, it's already wrong:

        gswip_disable_switch(priv);

        dsa_unregister_switch(priv->ds);

gswip_disable_switch() writes to a register:

	regmap_clear_bits(priv->mdio, GSWIP_MDIO_GLOB, GSWIP_MDIO_GLOB_ENABLE);

and I wonder what that does in terms of MDIO bis accesses that will
still be active at this point (because the DSA switch is still
registered.)

I see that gswip_setup() enables the switch before it configures the
MDIO bus and registers it, so the disable-then-unregister looks very
suspicious to me.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-12-08 15:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-08  1:27 [PATCH net v3] net: dsa: mxl-gsw1xx: manually clear RANEG bit Daniel Golle
2025-12-08 12:47 ` Russell King (Oracle)
2025-12-08 13:37   ` Andrew Lunn
2025-12-08 15:52     ` Russell King (Oracle)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).