* [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink
@ 2025-09-02 13:41 Vladimir Oltean
2025-09-02 14:09 ` Russell King (Oracle)
0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Oltean @ 2025-09-02 13:41 UTC (permalink / raw)
To: netdev
Cc: Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
Problem description
===================
Lockdep reports a possible circular locking dependency (AB/BA) between
&pl->state_mutex and &phy->lock, as follows.
phylink_resolve() // acquires &pl->state_mutex
-> phylink_major_config()
-> phy_config_inband() // acquires &pl->phydev->lock
whereas all the other call sites where &pl->state_mutex and
&pl->phydev->lock have the locking scheme reversed. Everywhere else,
&pl->phydev->lock is acquired at the top level, and &pl->state_mutex at
the lower level. A clear example is phylink_bringup_phy().
The outlier is the newly introduced phy_config_inband() and the existing
lock order is the correct one. To understand why it cannot be the other
way around, it is sufficient to consider phylink_phy_change(), phylink's
callback from the PHY device's phy->phy_link_change() virtual method,
invoked by the PHY state machine.
phy_link_up() and phy_link_down(), the (indirect) callers of
phylink_phy_change(), are called with &phydev->lock acquired.
Then phylink_phy_change() acquires its own &pl->state_mutex, to
serialize changes made to its pl->phy_state and pl->link_config.
So all other instances of &pl->state_mutex and &phydev->lock must be
consistent with this order.
Problem impact
==============
I think the kernel runs a serious deadlock risk if an existing
phylink_resolve() thread, which results in a phy_config_inband() call,
is concurrent with a phy_link_up() or phy_link_down() call, which will
deadlock on &pl->state_mutex in phylink_phy_change(). Practically
speaking, the impact may be limited by the slow speed of the medium
auto-negotiation protocol, which makes it unlikely for the current state
to still be unresolved when a new one is detected, but I think the
problem is there. Nonetheless, the problem was discovered using lockdep.
Proposed solution
=================
Practically speaking, the phy_config_inband() requirement of having
phydev->lock acquired must transfer to the caller (phylink is the only
caller). There, it must bubble up until immediately before
&pl->state_mutex is acquired, for the cases where that takes place.
Solution details, considerations, notes
=======================================
This is the phy_config_inband() call graph:
sfp_upstream_ops :: connect_phy()
|
v
phylink_sfp_connect_phy()
|
v
phylink_sfp_config_phy()
|
| sfp_upstream_ops :: module_insert()
| |
| v
| phylink_sfp_module_insert()
| |
| | sfp_upstream_ops :: module_start()
| | |
| | v
| | phylink_sfp_module_start()
| | |
| v v
| phylink_sfp_config_optical()
phylink_start() | |
| phylink_resume() v v
| | phylink_sfp_set_config()
| | |
v v v
phylink_mac_initial_config()
| phylink_resolve()
| | phylink_ethtool_ksettings_set()
v v v
phylink_major_config()
|
v
phy_config_inband()
phylink_major_config() caller #1, phylink_mac_initial_config(), does not
acquire &pl->state_mutex nor do its callers. It must acquire
&pl->phydev->lock prior to calling phylink_major_config().
phylink_major_config() caller #2, phylink_resolve() acquires
&pl->state_mutex, thus also needs to acquire &pl->phydev->lock.
phylink_major_config() caller #3, phylink_ethtool_ksettings_set(), is
completely uninteresting, because it only calls phylink_major_config()
if pl->phydev is NULL (otherwise it calls phy_ethtool_ksettings_set()).
We need to change nothing there.
Fixes: 5fd0f1a02e75 ("net: phylink: add negotiation of in-band capabilities")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
drivers/net/phy/phy.c | 12 ++++--------
drivers/net/phy/phylink.c | 14 ++++++++++++--
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 13df28445f02..c02da57a4da5 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1065,23 +1065,19 @@ EXPORT_SYMBOL_GPL(phy_inband_caps);
*/
int phy_config_inband(struct phy_device *phydev, unsigned int modes)
{
- int err;
+ lockdep_assert_held(&phydev->lock);
if (!!(modes & LINK_INBAND_DISABLE) +
!!(modes & LINK_INBAND_ENABLE) +
!!(modes & LINK_INBAND_BYPASS) != 1)
return -EINVAL;
- mutex_lock(&phydev->lock);
if (!phydev->drv)
- err = -EIO;
+ return -EIO;
else if (!phydev->drv->config_inband)
- err = -EOPNOTSUPP;
- else
- err = phydev->drv->config_inband(phydev, modes);
- mutex_unlock(&phydev->lock);
+ return -EOPNOTSUPP;
- return err;
+ return phydev->drv->config_inband(phydev, modes);
}
EXPORT_SYMBOL(phy_config_inband);
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index c7f867b361dd..350905928d46 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1423,6 +1423,7 @@ static void phylink_get_fixed_state(struct phylink *pl,
static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)
{
struct phylink_link_state link_state;
+ struct phy_device *phy = pl->phydev;
switch (pl->req_link_an_mode) {
case MLO_AN_PHY:
@@ -1446,7 +1447,11 @@ static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)
link_state.link = false;
phylink_apply_manual_flow(pl, &link_state);
+ if (phy)
+ mutex_lock(&phy->lock);
phylink_major_config(pl, force_restart, &link_state);
+ if (phy)
+ mutex_unlock(&phy->lock);
}
static const char *phylink_pause_to_str(int pause)
@@ -1580,10 +1585,13 @@ static void phylink_resolve(struct work_struct *w)
{
struct phylink *pl = container_of(w, struct phylink, resolve);
struct phylink_link_state link_state;
+ struct phy_device *phy = pl->phydev;
bool mac_config = false;
bool retrigger = false;
bool cur_link_state;
+ if (phy)
+ mutex_lock(&phy->lock);
mutex_lock(&pl->state_mutex);
cur_link_state = phylink_link_is_up(pl);
@@ -1617,11 +1625,11 @@ static void phylink_resolve(struct work_struct *w)
/* If we have a phy, the "up" state is the union of both the
* PHY and the MAC
*/
- if (pl->phydev)
+ if (phy)
link_state.link &= pl->phy_state.link;
/* Only update if the PHY link is up */
- if (pl->phydev && pl->phy_state.link) {
+ if (phy && pl->phy_state.link) {
/* If the interface has changed, force a link down
* event if the link isn't already down, and re-resolve.
*/
@@ -1685,6 +1693,8 @@ static void phylink_resolve(struct work_struct *w)
queue_work(system_power_efficient_wq, &pl->resolve);
}
mutex_unlock(&pl->state_mutex);
+ if (phy)
+ mutex_unlock(&phy->lock);
}
static void phylink_run_resolve(struct phylink *pl)
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink
2025-09-02 13:41 [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink Vladimir Oltean
@ 2025-09-02 14:09 ` Russell King (Oracle)
2025-09-02 14:42 ` Vladimir Oltean
0 siblings, 1 reply; 5+ messages in thread
From: Russell King (Oracle) @ 2025-09-02 14:09 UTC (permalink / raw)
To: Vladimir Oltean
Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
On Tue, Sep 02, 2025 at 04:41:41PM +0300, Vladimir Oltean wrote:
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index c7f867b361dd..350905928d46 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -1580,10 +1585,13 @@ static void phylink_resolve(struct work_struct *w)
> {
> struct phylink *pl = container_of(w, struct phylink, resolve);
> struct phylink_link_state link_state;
> + struct phy_device *phy = pl->phydev;
> bool mac_config = false;
> bool retrigger = false;
> bool cur_link_state;
>
> + if (phy)
> + mutex_lock(&phy->lock);
I don't think this is safe.
The addition and removal of PHYs is protected by two locks:
1. RTNL, to prevent ethtool operations running concurrently with the
addition or removal of PHYs.
2. The state_mutex which protects the resolver which doesn't take the
RTNL.
Given that the RTNL is not held in this path, dereferencing pl->phydev
is unsafe as the PHY may go away (through e.g. SFP module removal)
which means this mutex_lock() may end up operating on free'd memory.
I'm not sure we want to be taking the RTNL on this path.
At the moment, I'm not sure what the solution is here.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink
2025-09-02 14:09 ` Russell King (Oracle)
@ 2025-09-02 14:42 ` Vladimir Oltean
2025-09-02 15:02 ` Vladimir Oltean
0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Oltean @ 2025-09-02 14:42 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
On Tue, Sep 02, 2025 at 03:09:42PM +0100, Russell King (Oracle) wrote:
> On Tue, Sep 02, 2025 at 04:41:41PM +0300, Vladimir Oltean wrote:
> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > index c7f867b361dd..350905928d46 100644
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -1580,10 +1585,13 @@ static void phylink_resolve(struct work_struct *w)
> > {
> > struct phylink *pl = container_of(w, struct phylink, resolve);
> > struct phylink_link_state link_state;
> > + struct phy_device *phy = pl->phydev;
> > bool mac_config = false;
> > bool retrigger = false;
> > bool cur_link_state;
> >
> > + if (phy)
> > + mutex_lock(&phy->lock);
>
> I don't think this is safe.
>
> The addition and removal of PHYs is protected by two locks:
>
> 1. RTNL, to prevent ethtool operations running concurrently with the
> addition or removal of PHYs.
>
> 2. The state_mutex which protects the resolver which doesn't take the
> RTNL.
>
> Given that the RTNL is not held in this path, dereferencing pl->phydev
> is unsafe as the PHY may go away (through e.g. SFP module removal)
> which means this mutex_lock() may end up operating on free'd memory.
>
> I'm not sure we want to be taking the RTNL on this path.
>
> At the moment, I'm not sure what the solution is here.
Rephrased and slightly expanded: phylink_disconnect_phy(), when called
from drivers, has the convention that phylink_stop() must have been
called prior, or phylink_start() must have never been called.
However, when called from phylink_sfp_disconnect_phy(),
phylink_disconnect_phy() does not benefit from the same guarantee that
phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED) ran.
Correct so far?
Can we disable the resolver from phylink_sfp_disconnect_phy(), to offer
a similar guarantee that phylink_disconnect_phy() never runs with a
concurrent resolver?
I don't have a local setup at the moment to test what happens when I
unplug an SFP module with the change I am proposing. I can test in a few
hours at the earliest. However, there's a chance testing won't reveal
why we don't stop the resolver during SFP module disconnection, hence
the reason for this possibly stupid question.
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 350905928d46..a8facc177f1f 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2313,17 +2313,13 @@ void phylink_disconnect_phy(struct phylink *pl)
ASSERT_RTNL();
+ WARN_ON(!test_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state));
+
phy = pl->phydev;
if (phy) {
- mutex_lock(&phy->lock);
- mutex_lock(&pl->state_mutex);
pl->phydev = NULL;
pl->phy_enable_tx_lpi = false;
pl->mac_tx_clk_stop = false;
- mutex_unlock(&pl->state_mutex);
- mutex_unlock(&phy->lock);
- flush_work(&pl->resolve);
-
phy_disconnect(phy);
}
}
@@ -3809,7 +3805,10 @@ static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
static void phylink_sfp_disconnect_phy(void *upstream,
struct phy_device *phydev)
{
- phylink_disconnect_phy(upstream);
+ struct phylink *pl = upstream;
+
+ phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED);
+ phylink_disconnect_phy(pl);
}
static const struct sfp_upstream_ops sfp_phylink_ops = {
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink
2025-09-02 14:42 ` Vladimir Oltean
@ 2025-09-02 15:02 ` Vladimir Oltean
2025-09-03 8:36 ` Vladimir Oltean
0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Oltean @ 2025-09-02 15:02 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
On Tue, Sep 02, 2025 at 05:42:41PM +0300, Vladimir Oltean wrote:
> Can we disable the resolver from phylink_sfp_disconnect_phy(), to offer
> a similar guarantee that phylink_disconnect_phy() never runs with a
> concurrent resolver?
Hmm, I now noticed phylink_sfp_link_down() which does disable the
resolver already. I need to test/understand whether the SFP state
machine ever calls sfp_remove_phy() without a prior sfp_link_down(), if
the link was up.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink
2025-09-02 15:02 ` Vladimir Oltean
@ 2025-09-03 8:36 ` Vladimir Oltean
0 siblings, 0 replies; 5+ messages in thread
From: Vladimir Oltean @ 2025-09-03 8:36 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
On Tue, Sep 02, 2025 at 06:02:49PM +0300, Vladimir Oltean wrote:
> On Tue, Sep 02, 2025 at 05:42:41PM +0300, Vladimir Oltean wrote:
> > Can we disable the resolver from phylink_sfp_disconnect_phy(), to offer
> > a similar guarantee that phylink_disconnect_phy() never runs with a
> > concurrent resolver?
>
> Hmm, I now noticed phylink_sfp_link_down() which does disable the
> resolver already. I need to test/understand whether the SFP state
> machine ever calls sfp_remove_phy() without a prior sfp_link_down(), if
> the link was up.
This is ugly but is also the only functional idea I have (patch is on
top of the submitted one):
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9609dc445a0a..16644d5dfa5b 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -67,6 +67,8 @@ struct phylink {
struct timer_list link_poll;
struct mutex state_mutex;
+ /* Serialize updates to pl->phydev with phylink_resolve() */
+ struct mutex phy_lock;
struct phylink_link_state phy_state;
unsigned int phy_ib_mode;
struct work_struct resolve;
@@ -1594,11 +1596,13 @@ static void phylink_resolve(struct work_struct *w)
{
struct phylink *pl = container_of(w, struct phylink, resolve);
struct phylink_link_state link_state;
- struct phy_device *phy = pl->phydev;
bool mac_config = false;
bool retrigger = false;
+ struct phy_device *phy;
bool cur_link_state;
+ mutex_lock(&pl->phy_lock);
+ phy = pl->phydev;
if (phy)
mutex_lock(&phy->lock);
mutex_lock(&pl->state_mutex);
@@ -1704,6 +1708,7 @@ static void phylink_resolve(struct work_struct *w)
mutex_unlock(&pl->state_mutex);
if (phy)
mutex_unlock(&phy->lock);
+ mutex_unlock(&pl->phy_lock);
}
static void phylink_run_resolve(struct phylink *pl)
@@ -1839,6 +1844,7 @@ struct phylink *phylink_create(struct phylink_config *config,
if (!pl)
return ERR_PTR(-ENOMEM);
+ mutex_init(&pl->phy_lock);
mutex_init(&pl->state_mutex);
INIT_WORK(&pl->resolve, phylink_resolve);
@@ -2099,6 +2105,7 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
dev_name(&phy->mdio.dev), phy->drv->name, irq_str);
kfree(irq_str);
+ mutex_lock(&pl->phy_lock);
mutex_lock(&phy->lock);
mutex_lock(&pl->state_mutex);
pl->phydev = phy;
@@ -2144,6 +2151,7 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
mutex_unlock(&pl->state_mutex);
mutex_unlock(&phy->lock);
+ mutex_unlock(&pl->phy_lock);
phylink_dbg(pl,
"phy: %s setting supported %*pb advertising %*pb\n",
@@ -2324,6 +2332,7 @@ void phylink_disconnect_phy(struct phylink *pl)
phy = pl->phydev;
if (phy) {
+ mutex_lock(&pl->phy_lock);
mutex_lock(&phy->lock);
mutex_lock(&pl->state_mutex);
pl->phydev = NULL;
@@ -2331,6 +2340,7 @@ void phylink_disconnect_phy(struct phylink *pl)
pl->mac_tx_clk_stop = false;
mutex_unlock(&pl->state_mutex);
mutex_unlock(&phy->lock);
+ mutex_unlock(&pl->phy_lock);
flush_work(&pl->resolve);
phy_disconnect(phy);
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-09-03 8:36 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-02 13:41 [RFC PATCH net] net: phy: transfer phy_config_inband() locking responsibility to phylink Vladimir Oltean
2025-09-02 14:09 ` Russell King (Oracle)
2025-09-02 14:42 ` Vladimir Oltean
2025-09-02 15:02 ` Vladimir Oltean
2025-09-03 8:36 ` Vladimir Oltean
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).