From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: "Russell King (Oracle)" <linux@armlinux.org.uk>
Cc: netdev@vger.kernel.org, Andrew Lunn <andrew@lunn.ch>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink
Date: Tue, 2 Sep 2025 17:42:41 +0300
Message-ID: <20250902144241.avfiqpmqy7xhlwqa@skbuf>
In-Reply-To: <aLb6puGVzR29GpPx@shell.armlinux.org.uk>

On Tue, Sep 02, 2025 at 03:09:42PM +0100, Russell King (Oracle) wrote:
> On Tue, Sep 02, 2025 at 04:41:41PM +0300, Vladimir Oltean wrote:
> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > index c7f867b361dd..350905928d46 100644
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -1580,10 +1585,13 @@ static void phylink_resolve(struct work_struct *w)
> >  {
> >  	struct phylink *pl = container_of(w, struct phylink, resolve);
> >  	struct phylink_link_state link_state;
> > +	struct phy_device *phy = pl->phydev;
> >  	bool mac_config = false;
> >  	bool retrigger = false;
> >  	bool cur_link_state;
> >  
> > +	if (phy)
> > +		mutex_lock(&phy->lock);
> 
> I don't think this is safe.
> 
> The addition and removal of PHYs is protected by two locks:
> 
> 1. RTNL, to prevent ethtool operations running concurrently with the
>    addition or removal of PHYs.
> 
> 2. The state_mutex which protects the resolver which doesn't take the
>    RTNL.
> 
> Given that the RTNL is not held in this path, dereferencing pl->phydev
> is unsafe as the PHY may go away (through e.g. SFP module removal)
> which means this mutex_lock() may end up operating on free'd memory.
> 
> I'm not sure we want to be taking the RTNL on this path.
> 
> At the moment, I'm not sure what the solution is here.

Rephrased and slightly expanded: when drivers call
phylink_disconnect_phy(), the convention is that phylink_stop() must
already have been called, or phylink_start() must never have been
called.

However, when called from phylink_sfp_disconnect_phy(),
phylink_disconnect_phy() has no equivalent guarantee that
phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED) has
already run.

Correct so far?

Can we disable the resolver from phylink_sfp_disconnect_phy(), to offer
a similar guarantee that phylink_disconnect_phy() never runs with a
concurrent resolver?

I don't have a local setup at the moment to test what happens when I
unplug an SFP module with the change I am proposing. I can test in a few
hours at the earliest. However, there's a chance that testing won't
reveal why we don't stop the resolver during SFP module disconnection,
hence this possibly stupid question.

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 350905928d46..a8facc177f1f 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2313,17 +2313,13 @@ void phylink_disconnect_phy(struct phylink *pl)

 	ASSERT_RTNL();

+	WARN_ON(!test_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state));
+
 	phy = pl->phydev;
 	if (phy) {
-		mutex_lock(&phy->lock);
-		mutex_lock(&pl->state_mutex);
 		pl->phydev = NULL;
 		pl->phy_enable_tx_lpi = false;
 		pl->mac_tx_clk_stop = false;
-		mutex_unlock(&pl->state_mutex);
-		mutex_unlock(&phy->lock);
-		flush_work(&pl->resolve);
-
 		phy_disconnect(phy);
 	}
 }
@@ -3809,7 +3805,10 @@ static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
 static void phylink_sfp_disconnect_phy(void *upstream,
 				       struct phy_device *phydev)
 {
-	phylink_disconnect_phy(upstream);
+	struct phylink *pl = upstream;
+
+	phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED);
+	phylink_disconnect_phy(pl);
 }

 static const struct sfp_upstream_ops sfp_phylink_ops = {
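
For readers without the kernel tree at hand, the ordering proposed in
the diff above (disable and flush the resolver first, only then clear
and release the PHY) can be modelled in userspace. This is only a
sketch under stated assumptions: it uses pthreads instead of the kernel
workqueue, every toy_* name is hypothetical, and it says nothing about
whether the real SFP removal path can tolerate the resolver being
stopped.

```c
/* Userspace toy model; every toy_* name is hypothetical, not a kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_phy {
	pthread_mutex_t lock;		/* stands in for phy->lock */
};

struct toy_link {
	atomic_bool disabled;		/* stands in for PHYLINK_DISABLE_STOPPED */
	struct toy_phy *phydev;		/* stands in for pl->phydev */
	pthread_t worker;		/* stands in for the resolve work item */
	unsigned long resolves;		/* number of completed resolver passes */
};

/* Reads pl->phydev without a lock, like the RFC's phylink_resolve(). */
static void *toy_resolver(void *arg)
{
	struct toy_link *pl = arg;

	while (!atomic_load(&pl->disabled)) {
		struct toy_phy *phy = pl->phydev;

		if (phy) {
			pthread_mutex_lock(&phy->lock);
			pl->resolves++;
			pthread_mutex_unlock(&phy->lock);
		}
	}
	return NULL;
}

/* Set the disable flag, then wait until the resolver has really exited. */
static void toy_disable_and_flush(struct toy_link *pl)
{
	atomic_store(&pl->disabled, true);
	pthread_join(pl->worker, NULL);
}

/* Only safe once the resolver is gone, hence the assertion. */
static void toy_disconnect_phy(struct toy_link *pl)
{
	struct toy_phy *phy = pl->phydev;

	assert(atomic_load(&pl->disabled));
	pl->phydev = NULL;
	pthread_mutex_destroy(&phy->lock);
	free(phy);
}

/* Returns 0 when the PHY was torn down with no resolver running. */
int toy_demo(void)
{
	struct toy_link pl = { .phydev = malloc(sizeof(struct toy_phy)) };

	if (!pl.phydev)
		return -1;
	pthread_mutex_init(&pl.phydev->lock, NULL);
	atomic_init(&pl.disabled, false);

	if (pthread_create(&pl.worker, NULL, toy_resolver, &pl)) {
		pthread_mutex_destroy(&pl.phydev->lock);
		free(pl.phydev);
		return -1;
	}

	toy_disable_and_flush(&pl);	/* resolver cannot run past this point */
	toy_disconnect_phy(&pl);	/* so freeing the PHY cannot race with it */

	return pl.phydev == NULL ? 0 : -1;
}
```

Calling toy_demo() from a main() built with -pthread exercises the
sequence. Swapping the two calls at the end of toy_demo() trips the
assertion in toy_disconnect_phy(), which plays the role of the WARN_ON()
added in the diff.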


