* [PATCH net] net: phy: fix phy_uses_state_machine()
@ 2025-08-31 16:38 Russell King (Oracle)
2025-09-01 8:09 ` Vladimir Oltean
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Russell King (Oracle) @ 2025-08-31 16:38 UTC (permalink / raw)
To: Andrew Lunn, Heiner Kallweit
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, netdev,
Paolo Abeni, Vladimir Oltean
phy_uses_state_machine() is called from the resume path (see
mdio_bus_phy_resume()) which will be called for all devices whether
they are connected to a network device or not.
phydev->phy_link_change is initialised by phy_attach_direct(), and
overridden by phylink. This means that a never-connected PHY will
have phydev->phy_link_change set to NULL, which causes
phy_uses_state_machine() to return true. This is incorrect.
Fix the case where phydev->phy_link_change is NULL.
Reported-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
Fixes: fc75ea20ffb4 ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
The provided Link: rather than Closes: is because there were two issues
identified in that thread, and this patch only addresses one of them.
Therefore, it is not correct to mark that issue closed.
Xu Yang reported this fixed the problem for him, and it is an oversight
in the phy_uses_state_machine() test.
drivers/net/phy/phy_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 7556aa3dd7ee..e6a673faabe6 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -288,7 +288,7 @@ static bool phy_uses_state_machine(struct phy_device *phydev)
return phydev->attached_dev && phydev->adjust_link;
/* phydev->phy_link_change is implicitly phylink_phy_change() */
- return true;
+ return !!phydev->phy_link_change;
}
static bool mdio_bus_phy_may_suspend(struct phy_device *phydev)
--
2.47.2
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-08-31 16:38 [PATCH net] net: phy: fix phy_uses_state_machine() Russell King (Oracle)
@ 2025-09-01 8:09 ` Vladimir Oltean
2025-09-01 8:42 ` Vladimir Oltean
2025-09-01 9:34 ` Russell King (Oracle)
2 siblings, 0 replies; 15+ messages in thread
From: Vladimir Oltean @ 2025-09-01 8:09 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Sun, Aug 31, 2025 at 05:38:11PM +0100, Russell King (Oracle) wrote:
> phy_uses_state_machine() is called from the resume path (see
> mdio_bus_phy_resume()) which will be called for all devices whether
> they are connected to a network device or not.
>
> phydev->phy_link_change is initialised by phy_attach_direct(), and
> overridden by phylink. This means that a never-connected PHY will
> have phydev->phy_link_change set to NULL, which causes
> phy_uses_state_machine() to return true. This is incorrect.
>
> Fix the case where phydev->phy_link_change is NULL.
>
> Reported-by: Xu Yang <xu.yang_2@nxp.com>
> Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
> Fixes: fc75ea20ffb4 ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> The provided Link: rather than Closes: is because there were two issues
> identified in that thread, and this patch only addresses one of them.
> Therefore, it is not correct to mark that issue closed.
>
> Xu Yang reported this fixed the problem for him, and it is an oversight
> in the phy_uses_state_machine() test.
>
> drivers/net/phy/phy_device.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 7556aa3dd7ee..e6a673faabe6 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -288,7 +288,7 @@ static bool phy_uses_state_machine(struct phy_device *phydev)
> return phydev->attached_dev && phydev->adjust_link;
>
> /* phydev->phy_link_change is implicitly phylink_phy_change() */
We should either update this comment to clarify that the function
pointer is only set to phylink_phy_change() if set at all, or do away
with the comment entirely. Otherwise, it doesn't make much sense.
With that:
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> - return true;
> + return !!phydev->phy_link_change;
> }
>
> static bool mdio_bus_phy_may_suspend(struct phy_device *phydev)
> --
> 2.47.2
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-08-31 16:38 [PATCH net] net: phy: fix phy_uses_state_machine() Russell King (Oracle)
2025-09-01 8:09 ` Vladimir Oltean
@ 2025-09-01 8:42 ` Vladimir Oltean
2025-09-01 9:09 ` Russell King (Oracle)
2025-09-01 9:34 ` Russell King (Oracle)
2 siblings, 1 reply; 15+ messages in thread
From: Vladimir Oltean @ 2025-09-01 8:42 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Sun, Aug 31, 2025 at 05:38:11PM +0100, Russell King (Oracle) wrote:
> phydev->phy_link_change is initialised by phy_attach_direct(), and
> overridden by phylink. This means that a never-connected PHY will
> have phydev->phy_link_change set to NULL, which causes
> phy_uses_state_machine() to return true. This is incorrect.
Another nitpick regarding phrasing here: the never-connected PHY doesn't
_cause_ phy_uses_state_machine() to return true. It returns true _in
spite_ of the PHY never being connected: the non-NULL quality of
phydev->phy_link_change is not something that phy_uses_state_machine()
tests for.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-01 8:42 ` Vladimir Oltean
@ 2025-09-01 9:09 ` Russell King (Oracle)
2025-09-01 9:35 ` Vladimir Oltean
0 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2025-09-01 9:09 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Mon, Sep 01, 2025 at 11:42:25AM +0300, Vladimir Oltean wrote:
> On Sun, Aug 31, 2025 at 05:38:11PM +0100, Russell King (Oracle) wrote:
> > phydev->phy_link_change is initialised by phy_attach_direct(), and
> > overridden by phylink. This means that a never-connected PHY will
> > have phydev->phy_link_change set to NULL, which causes
> > phy_uses_state_machine() to return true. This is incorrect.
>
> Another nitpick regarding phrasing here: the never-connected PHY doesn't
> _cause_ phy_uses_state_machine() to return true. It returns true _in
> spite_ of the PHY never being connected: the non-NULL quality of
> phydev->phy_link_change is not something that phy_uses_state_machine()
> tests for.
No. What I'm saying is that if phydev->phy_link_change is set to NULL,
_this_ causes phy_uses_state_machine() to return true and that
behaviour incorrect.
The first part is describing _when_ phydev->phy_link_change is set to
NULL.
It is not saying that a never-connected PHY directly causes
phy_uses_state_machine() to return true.
I think my phrasing of this is totally fine, even re-reading it now.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-01 9:09 ` Russell King (Oracle)
@ 2025-09-01 9:35 ` Vladimir Oltean
2025-09-01 10:05 ` Russell King (Oracle)
0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Oltean @ 2025-09-01 9:35 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Mon, Sep 01, 2025 at 10:09:17AM +0100, Russell King (Oracle) wrote:
> On Mon, Sep 01, 2025 at 11:42:25AM +0300, Vladimir Oltean wrote:
> > On Sun, Aug 31, 2025 at 05:38:11PM +0100, Russell King (Oracle) wrote:
> > > phydev->phy_link_change is initialised by phy_attach_direct(), and
> > > overridden by phylink. This means that a never-connected PHY will
> > > have phydev->phy_link_change set to NULL, which causes
> > > phy_uses_state_machine() to return true. This is incorrect.
> >
> > Another nitpick regarding phrasing here: the never-connected PHY doesn't
> > _cause_ phy_uses_state_machine() to return true. It returns true _in
> > spite_ of the PHY never being connected: the non-NULL quality of
> > phydev->phy_link_change is not something that phy_uses_state_machine()
> > tests for.
>
> No. What I'm saying is that if phydev->phy_link_change is set to NULL,
> _this_ causes phy_uses_state_machine() to return true and that
> behaviour incorrect.
>
> The first part is describing _when_ phydev->phy_link_change is set to
> NULL.
>
> It is not saying that a never-connected PHY directly causes
> phy_uses_state_machine() to return true.
>
> I think my phrasing of this is totally fine, even re-reading it now.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
I think the language is sufficiently flexible for us to find a phrasing
that avoids all ambiguity. What about:
"It was assumed the only two valid values for phydev->phy_link_change
are phy_link_change() and phylink_phy_change(), thus phy_uses_state_machine()
was oversimplified to only compare with one of these values. There is a
third possible value (NULL), meaning that the PHY is unconnected, and
does not use the state machine. This logic misinterprets this case as
phy_uses_state_machine() == true, but in reality it should return false."
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-01 9:35 ` Vladimir Oltean
@ 2025-09-01 10:05 ` Russell King (Oracle)
2025-09-01 10:36 ` Vladimir Oltean
0 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2025-09-01 10:05 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Mon, Sep 01, 2025 at 12:35:30PM +0300, Vladimir Oltean wrote:
> On Mon, Sep 01, 2025 at 10:09:17AM +0100, Russell King (Oracle) wrote:
> > On Mon, Sep 01, 2025 at 11:42:25AM +0300, Vladimir Oltean wrote:
> > > On Sun, Aug 31, 2025 at 05:38:11PM +0100, Russell King (Oracle) wrote:
> > > > phydev->phy_link_change is initialised by phy_attach_direct(), and
> > > > overridden by phylink. This means that a never-connected PHY will
> > > > have phydev->phy_link_change set to NULL, which causes
> > > > phy_uses_state_machine() to return true. This is incorrect.
> > >
> > > Another nitpick regarding phrasing here: the never-connected PHY doesn't
> > > _cause_ phy_uses_state_machine() to return true. It returns true _in
> > > spite_ of the PHY never being connected: the non-NULL quality of
> > > phydev->phy_link_change is not something that phy_uses_state_machine()
> > > tests for.
> >
> > No. What I'm saying is that if phydev->phy_link_change is set to NULL,
> > _this_ causes phy_uses_state_machine() to return true and that
> > behaviour incorrect.
> >
> > The first part is describing _when_ phydev->phy_link_change is set to
> > NULL.
> >
> > It is not saying that a never-connected PHY directly causes
> > phy_uses_state_machine() to return true.
> >
> > I think my phrasing of this is totally fine, even re-reading it now.
> >
> > --
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
>
> I think the language is sufficiently flexible for us to find a phrasing
> that avoids all ambiguity. What about:
>
> "It was assumed the only two valid values for phydev->phy_link_change
> are phy_link_change() and phylink_phy_change(), thus phy_uses_state_machine()
> was oversimplified to only compare with one of these values. There is a
> third possible value (NULL), meaning that the PHY is unconnected, and
> does not use the state machine. This logic misinterprets this case as
> phy_uses_state_machine() == true, but in reality it should return false."
Well, having spent considerable time writing and rewriting the damn
commit message, this is what I'm now using, which I think covers the
problem in sufficient detail.
>>>>>>
net: phy: fix phy_uses_state_machine()
The blamed commit changed the conditions which phylib uses to stop
and start the state machine in the suspend and resume paths, and
while improving it, has caused two issues.
The original code used this test:
phydev->attached_dev && phydev->adjust_link
and if true, the paths would handle the PHY state machine. This test
evaluates true for normal drivers that are using phylib directly
while the PHY is attached to the network device, but false in all
other cases, which include the following cases:
- when the PHY has never been attached to a network device.
- when the PHY has been detached from a network device (as phy_detach()
sets phydev->attached_dev to NULL, phy_disconnect() calls
phy_detach() and additionally sets phydev->adjust_link NULL.)
- when phylink is using the driver (as phydev->adjust_link is NULL.)
Only the third case was incorrect, and the blamed commit attempted to
fix this by changing this test to (simplified for brevity, see
phy_uses_state_machine()):
phydev->phy_link_change == phy_link_change ?
phydev->attached_dev && phydev->adjust_link : true
However, this also incorrectly evaluates true in the first two cases.
Fix the first case by ensuring that phy_uses_state_machine() returns
false when phydev->phy_link_change is NULL.
Fix the second case by ensuring that phydev->phy_link_change is set to
NULL when phy_detach() is called.
<<<<<<
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-01 10:05 ` Russell King (Oracle)
@ 2025-09-01 10:36 ` Vladimir Oltean
2025-09-01 14:14 ` Russell King (Oracle)
0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Oltean @ 2025-09-01 10:36 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Mon, Sep 01, 2025 at 11:05:54AM +0100, Russell King (Oracle) wrote:
> Well, having spent considerable time writing and rewriting the damn
> commit message, this is what I'm now using, which I think covers the
> problem in sufficient detail.
>
> >>>>>>
> net: phy: fix phy_uses_state_machine()
>
> The blamed commit changed the conditions which phylib uses to stop
> and start the state machine in the suspend and resume paths, and
> while improving it, has caused two issues.
>
> The original code used this test:
>
> phydev->attached_dev && phydev->adjust_link
>
> and if true, the paths would handle the PHY state machine. This test
> evaluates true for normal drivers that are using phylib directly
> while the PHY is attached to the network device, but false in all
> other cases, which include the following cases:
>
> - when the PHY has never been attached to a network device.
> - when the PHY has been detached from a network device (as phy_detach()
> sets phydev->attached_dev to NULL, phy_disconnect() calls
> phy_detach() and additionally sets phydev->adjust_link NULL.)
> - when phylink is using the driver (as phydev->adjust_link is NULL.)
>
> Only the third case was incorrect, and the blamed commit attempted to
> fix this by changing this test to (simplified for brevity, see
> phy_uses_state_machine()):
>
> phydev->phy_link_change == phy_link_change ?
> phydev->attached_dev && phydev->adjust_link : true
>
> However, this also incorrectly evaluates true in the first two cases.
>
> Fix the first case by ensuring that phy_uses_state_machine() returns
> false when phydev->phy_link_change is NULL.
>
> Fix the second case by ensuring that phydev->phy_link_change is set to
> NULL when phy_detach() is called.
> <<<<<<
The new explanation and the placement of the function pointer clearing
make sense, thanks. Maybe add one last sentence at the end: "This covers
both the phylink_bringup_phy() error path, as well as the normal
phylink_disconnect_phy() path."
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-01 10:36 ` Vladimir Oltean
@ 2025-09-01 14:14 ` Russell King (Oracle)
2025-09-01 14:25 ` Vladimir Oltean
0 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2025-09-01 14:14 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Mon, Sep 01, 2025 at 01:36:24PM +0300, Vladimir Oltean wrote:
> On Mon, Sep 01, 2025 at 11:05:54AM +0100, Russell King (Oracle) wrote:
> > Well, having spent considerable time writing and rewriting the damn
> > commit message, this is what I'm now using, which I think covers the
> > problem in sufficient detail.
> >
> > >>>>>>
> > net: phy: fix phy_uses_state_machine()
> >
> > The blamed commit changed the conditions which phylib uses to stop
> > and start the state machine in the suspend and resume paths, and
> > while improving it, has caused two issues.
> >
> > The original code used this test:
> >
> > phydev->attached_dev && phydev->adjust_link
> >
> > and if true, the paths would handle the PHY state machine. This test
> > evaluates true for normal drivers that are using phylib directly
> > while the PHY is attached to the network device, but false in all
> > other cases, which include the following cases:
> >
> > - when the PHY has never been attached to a network device.
> > - when the PHY has been detached from a network device (as phy_detach()
> > sets phydev->attached_dev to NULL, phy_disconnect() calls
> > phy_detach() and additionally sets phydev->adjust_link NULL.)
> > - when phylink is using the driver (as phydev->adjust_link is NULL.)
> >
> > Only the third case was incorrect, and the blamed commit attempted to
> > fix this by changing this test to (simplified for brevity, see
> > phy_uses_state_machine()):
> >
> > phydev->phy_link_change == phy_link_change ?
> > phydev->attached_dev && phydev->adjust_link : true
> >
> > However, this also incorrectly evaluates true in the first two cases.
> >
> > Fix the first case by ensuring that phy_uses_state_machine() returns
> > false when phydev->phy_link_change is NULL.
> >
> > Fix the second case by ensuring that phydev->phy_link_change is set to
> > NULL when phy_detach() is called.
> > <<<<<<
>
> The new explanation and the placement of the function pointer clearing
> make sense, thanks. Maybe add one last sentence at the end: "This covers
> both the phylink_bringup_phy() error path, as well as the normal
> phylink_disconnect_phy() path."
Is it really necessary to add that level of detail? phy_attach() sets
phydev->phy_link_change, so it's natural that phy_detach() should undo
that action, especially as phy_detach() is phy_attach()'s complement.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-01 14:14 ` Russell King (Oracle)
@ 2025-09-01 14:25 ` Vladimir Oltean
0 siblings, 0 replies; 15+ messages in thread
From: Vladimir Oltean @ 2025-09-01 14:25 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Mon, Sep 01, 2025 at 03:14:49PM +0100, Russell King (Oracle) wrote:
> On Mon, Sep 01, 2025 at 01:36:24PM +0300, Vladimir Oltean wrote:
> > The new explanation and the placement of the function pointer clearing
> > make sense, thanks. Maybe add one last sentence at the end: "This covers
> > both the phylink_bringup_phy() error path, as well as the normal
> > phylink_disconnect_phy() path."
>
> Is it really necessary to add that level of detail? phy_attach() sets
> phydev->phy_link_change, so it's natural that phy_detach() should undo
> that action, especially as phy_detach() is phy_attach()'s complement.
Well, it shows for future reference the code paths you've considered
when establishing the placement of the code. Your call whether you want
to include this sentence or not - and it looks like you don't want to.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-08-31 16:38 [PATCH net] net: phy: fix phy_uses_state_machine() Russell King (Oracle)
2025-09-01 8:09 ` Vladimir Oltean
2025-09-01 8:42 ` Vladimir Oltean
@ 2025-09-01 9:34 ` Russell King (Oracle)
2025-09-01 9:39 ` Vladimir Oltean
2 siblings, 1 reply; 15+ messages in thread
From: Russell King (Oracle) @ 2025-09-01 9:34 UTC (permalink / raw)
To: Andrew Lunn, Heiner Kallweit
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, netdev,
Paolo Abeni, Vladimir Oltean
On Sun, Aug 31, 2025 at 05:38:11PM +0100, Russell King (Oracle) wrote:
> phy_uses_state_machine() is called from the resume path (see
> mdio_bus_phy_resume()) which will be called for all devices whether
> they are connected to a network device or not.
>
> phydev->phy_link_change is initialised by phy_attach_direct(), and
> overridden by phylink. This means that a never-connected PHY will
> have phydev->phy_link_change set to NULL, which causes
> phy_uses_state_machine() to return true. This is incorrect.
>
> Fix the case where phydev->phy_link_change is NULL.
>
> Reported-by: Xu Yang <xu.yang_2@nxp.com>
> Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
> Fixes: fc75ea20ffb4 ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> The provided Link: rather than Closes: is because there were two issues
> identified in that thread, and this patch only addresses one of them.
> Therefore, it is not correct to mark that issue closed.
>
> Xu Yang reported this fixed the problem for him, and it is an oversight
> in the phy_uses_state_machine() test.
While looking at this after Vladimir's comments, I've realised that
phy_uses_state_machine() will also return true when a PHY has been
attached and detached by phylink - phydev->phy_link_change remains
set to phylink_phy_change after it has been detached. So, there will
definitely be a v2 for this.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-01 9:34 ` Russell King (Oracle)
@ 2025-09-01 9:39 ` Vladimir Oltean
2025-09-01 10:23 ` Russell King (Oracle)
0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Oltean @ 2025-09-01 9:39 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Mon, Sep 01, 2025 at 10:34:41AM +0100, Russell King (Oracle) wrote:
> On Sun, Aug 31, 2025 at 05:38:11PM +0100, Russell King (Oracle) wrote:
> > phy_uses_state_machine() is called from the resume path (see
> > mdio_bus_phy_resume()) which will be called for all devices whether
> > they are connected to a network device or not.
> >
> > phydev->phy_link_change is initialised by phy_attach_direct(), and
> > overridden by phylink. This means that a never-connected PHY will
> > have phydev->phy_link_change set to NULL, which causes
> > phy_uses_state_machine() to return true. This is incorrect.
> >
> > Fix the case where phydev->phy_link_change is NULL.
> >
> > Reported-by: Xu Yang <xu.yang_2@nxp.com>
> > Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
> > Fixes: fc75ea20ffb4 ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> > The provided Link: rather than Closes: is because there were two issues
> > identified in that thread, and this patch only addresses one of them.
> > Therefore, it is not correct to mark that issue closed.
> >
> > Xu Yang reported this fixed the problem for him, and it is an oversight
> > in the phy_uses_state_machine() test.
>
> While looking at this after Vladimir's comments, I've realised that
> phy_uses_state_machine() will also return true when a PHY has been
> attached and detached by phylink - phydev->phy_link_change remains
> set to phylink_phy_change after it has been detached. So, there will
> definitely be a v2 for this.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Good point. Do you plan to modify phy_disconnect() to set
phydev->phy_link_change = NULL?
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-01 9:39 ` Vladimir Oltean
@ 2025-09-01 10:23 ` Russell King (Oracle)
0 siblings, 0 replies; 15+ messages in thread
From: Russell King (Oracle) @ 2025-09-01 10:23 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Mon, Sep 01, 2025 at 12:39:57PM +0300, Vladimir Oltean wrote:
> On Mon, Sep 01, 2025 at 10:34:41AM +0100, Russell King (Oracle) wrote:
> > On Sun, Aug 31, 2025 at 05:38:11PM +0100, Russell King (Oracle) wrote:
> > > phy_uses_state_machine() is called from the resume path (see
> > > mdio_bus_phy_resume()) which will be called for all devices whether
> > > they are connected to a network device or not.
> > >
> > > phydev->phy_link_change is initialised by phy_attach_direct(), and
> > > overridden by phylink. This means that a never-connected PHY will
> > > have phydev->phy_link_change set to NULL, which causes
> > > phy_uses_state_machine() to return true. This is incorrect.
> > >
> > > Fix the case where phydev->phy_link_change is NULL.
> > >
> > > Reported-by: Xu Yang <xu.yang_2@nxp.com>
> > > Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
> > > Fixes: fc75ea20ffb4 ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > ---
> > > The provided Link: rather than Closes: is because there were two issues
> > > identified in that thread, and this patch only addresses one of them.
> > > Therefore, it is not correct to mark that issue closed.
> > >
> > > Xu Yang reported this fixed the problem for him, and it is an oversight
> > > in the phy_uses_state_machine() test.
> >
> > While looking at this after Vladimir's comments, I've realised that
> > phy_uses_state_machine() will also return true when a PHY has been
> > attached and detached by phylink - phydev->phy_link_change remains
> > set to phylink_phy_change after it has been detached. So, there will
> > definitely be a v2 for this.
> >
> > --
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
>
> Good point. Do you plan to modify phy_disconnect() to set
> phydev->phy_link_change = NULL?
No, I've put it in phy_detach() because this sequence:
phy_attach_direct()
phylink_bringup_phy()
phydev->phy_link_change set
error occurs
phy_detach()
would result in phydev->phy_link_change remaining set if it was
only cleared in phy_disconnect().
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net] net: phy: fix phy_uses_state_machine()
@ 2025-09-07 20:44 Russell King (Oracle)
2025-09-08 13:37 ` Vladimir Oltean
2025-09-09 23:50 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 15+ messages in thread
From: Russell King (Oracle) @ 2025-09-07 20:44 UTC (permalink / raw)
To: Andrew Lunn, Heiner Kallweit
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, netdev,
Paolo Abeni, Vladimir Oltean
The blamed commit changed the conditions which phylib uses to stop
and start the state machine in the suspend and resume paths, and
while improving it, has caused two issues.
The original code used this test:
phydev->attached_dev && phydev->adjust_link
and if true, the paths would handle the PHY state machine. This test
evaluates true for normal drivers that are using phylib directly
while the PHY is attached to the network device, but false in all
other cases, which include the following cases:
- when the PHY has never been attached to a network device.
- when the PHY has been detached from a network device (as phy_detach()
sets phydev->attached_dev to NULL, phy_disconnect() calls
phy_detach() and additionally sets phydev->adjust_link NULL.)
- when phylink is using the driver (as phydev->adjust_link is NULL.)
Only the third case was incorrect, and the blamed commit attempted to
fix this by changing this test to (simplified for brevity, see
phy_uses_state_machine()):
phydev->phy_link_change == phy_link_change ?
phydev->attached_dev && phydev->adjust_link : true
However, this also incorrectly evaluates true in the first two cases.
Fix the first case by ensuring that phy_uses_state_machine() returns
false when phydev->phy_link_change is NULL.
Fix the second case by ensuring that phydev->phy_link_change is set to
NULL when phy_detach() is called.
Reported-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
Fixes: fc75ea20ffb4 ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
v2: updated commit description
drivers/net/phy/phy_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 7556aa3dd7ee..c82c1997147b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -287,8 +287,7 @@ static bool phy_uses_state_machine(struct phy_device *phydev)
if (phydev->phy_link_change == phy_link_change)
return phydev->attached_dev && phydev->adjust_link;
- /* phydev->phy_link_change is implicitly phylink_phy_change() */
- return true;
+ return !!phydev->phy_link_change;
}
static bool mdio_bus_phy_may_suspend(struct phy_device *phydev)
@@ -1864,6 +1863,8 @@ void phy_detach(struct phy_device *phydev)
phydev->attached_dev = NULL;
phy_link_topo_del_phy(dev, phydev);
}
+
+ phydev->phy_link_change = NULL;
phydev->phylink = NULL;
if (!phydev->is_on_sfp_module)
--
2.47.3
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-07 20:44 Russell King (Oracle)
@ 2025-09-08 13:37 ` Vladimir Oltean
2025-09-09 23:50 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 15+ messages in thread
From: Vladimir Oltean @ 2025-09-08 13:37 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
Jakub Kicinski, netdev, Paolo Abeni
On Sun, Sep 07, 2025 at 09:44:01PM +0100, Russell King (Oracle) wrote:
> The blamed commit changed the conditions which phylib uses to stop
> and start the state machine in the suspend and resume paths, and
> while improving it, has caused two issues.
>
> The original code used this test:
>
> phydev->attached_dev && phydev->adjust_link
>
> and if true, the paths would handle the PHY state machine. This test
> evaluates true for normal drivers that are using phylib directly
> while the PHY is attached to the network device, but false in all
> other cases, which include the following cases:
>
> - when the PHY has never been attached to a network device.
> - when the PHY has been detached from a network device (as phy_detach()
> sets phydev->attached_dev to NULL, phy_disconnect() calls
> phy_detach() and additionally sets phydev->adjust_link NULL.)
> - when phylink is using the driver (as phydev->adjust_link is NULL.)
>
> Only the third case was incorrect, and the blamed commit attempted to
> fix this by changing this test to (simplified for brevity, see
> phy_uses_state_machine()):
>
> phydev->phy_link_change == phy_link_change ?
> phydev->attached_dev && phydev->adjust_link : true
>
> However, this also incorrectly evaluates true in the first two cases.
>
> Fix the first case by ensuring that phy_uses_state_machine() returns
> false when phydev->phy_link_change is NULL.
>
> Fix the second case by ensuring that phydev->phy_link_change is set to
> NULL when phy_detach() is called.
>
> Reported-by: Xu Yang <xu.yang_2@nxp.com>
> Link: https://lore.kernel.org/r/20250806082931.3289134-1-xu.yang_2@nxp.com
> Fixes: fc75ea20ffb4 ("net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY")
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
>
> v2: updated commit description
..and made an addition to phy_detach() which fixes the second case.
>
> drivers/net/phy/phy_device.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] net: phy: fix phy_uses_state_machine()
2025-09-07 20:44 Russell King (Oracle)
2025-09-08 13:37 ` Vladimir Oltean
@ 2025-09-09 23:50 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-09-09 23:50 UTC (permalink / raw)
To: Russell King
Cc: andrew, hkallweit1, davem, edumazet, kuba, netdev, pabeni,
vladimir.oltean
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Sun, 07 Sep 2025 21:44:01 +0100 you wrote:
> The blamed commit changed the conditions which phylib uses to stop
> and start the state machine in the suspend and resume paths, and
> while improving it, has caused two issues.
>
> The original code used this test:
>
> phydev->attached_dev && phydev->adjust_link
>
> [...]
Here is the summary with links:
- [net] net: phy: fix phy_uses_state_machine()
https://git.kernel.org/netdev/net/c/e0d1c55501d3
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-09-09 23:50 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-31 16:38 [PATCH net] net: phy: fix phy_uses_state_machine() Russell King (Oracle)
2025-09-01 8:09 ` Vladimir Oltean
2025-09-01 8:42 ` Vladimir Oltean
2025-09-01 9:09 ` Russell King (Oracle)
2025-09-01 9:35 ` Vladimir Oltean
2025-09-01 10:05 ` Russell King (Oracle)
2025-09-01 10:36 ` Vladimir Oltean
2025-09-01 14:14 ` Russell King (Oracle)
2025-09-01 14:25 ` Vladimir Oltean
2025-09-01 9:34 ` Russell King (Oracle)
2025-09-01 9:39 ` Vladimir Oltean
2025-09-01 10:23 ` Russell King (Oracle)
-- strict thread matches above, loose matches on Subject: below --
2025-09-07 20:44 Russell King (Oracle)
2025-09-08 13:37 ` Vladimir Oltean
2025-09-09 23:50 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).