netdev.vger.kernel.org archive mirror
* [PATCH net-next] net: phy: fix regression with AX88772A PHY driver
@ 2023-09-18 13:25 Russell King (Oracle)
  2023-09-18 13:49 ` Andrew Lunn
  2023-09-19 15:10 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 5+ messages in thread
From: Russell King (Oracle) @ 2023-09-18 13:25 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit
  Cc: Marek Szyprowski, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Florian Fainelli, netdev

Marek reports that a deadlock occurs with the AX88772A PHY used with
the ASIX USB network driver:

asix 1-1.4:1.0 (unnamed net_device) (uninitialized): PHY [usb-001:003:10] driver [Asix Electronics AX88772A] (irq=POLL)
Asix Electronics AX88772A usb-001:003:10: attached PHY driver(mii_bus:phy_addr=usb-001:003:10, irq=POLL)
asix 1-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX AX88772 USB 2.0 Ethernet, a2:99:b6:cd:11:eb
asix 1-1.4:1.0 eth0: configuring for phy/internal link mode

============================================
WARNING: possible recursive locking detected
6.6.0-rc1-00239-g8da77df649c4-dirty #13949 Not tainted
--------------------------------------------
kworker/3:3/71 is trying to acquire lock:
c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_start_aneg+0x1c/0x38

but task is already holding lock:
c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x100/0x2b8

This is because we now consistently call phy_process_state_change()
while holding phydev->lock, but the AX88772A PHY driver then goes on
to call phy_start_aneg() which tries to grab the same lock - causing
deadlock.

Fix this by exporting the unlocked version, _phy_start_aneg(), and
using it in the PHY driver instead.

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: ef113a60d0a9 ("net: phy: call phy_error_precise() while holding the lock")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Reviewing the other PHY drivers, no others appear impacted, just this
one.
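
For readers unfamiliar with the locked/unlocked split described in the
commit message, here is a minimal userspace sketch of the same pattern.
It is not the kernel code: struct fake_phy and every fake_* function are
hypothetical stand-ins, and an error-checking pthread mutex is assumed
in place of phydev->lock so that the recursive acquisition returns
EDEADLK instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct phy_device: a lock plus a counter. */
struct fake_phy {
	pthread_mutex_t lock;
	int aneg_starts;
};

/* Unlocked variant: the caller must already hold phy->lock. */
static int _fake_start_aneg(struct fake_phy *phy)
{
	phy->aneg_starts++;
	return 0;
}

/* Locked variant: takes the lock, then defers to the unlocked variant. */
static int fake_start_aneg(struct fake_phy *phy)
{
	int err = pthread_mutex_lock(&phy->lock);

	if (err)
		return -err;	/* -EDEADLK if this thread already holds it */
	err = _fake_start_aneg(phy);
	pthread_mutex_unlock(&phy->lock);
	return err;
}

/* Driver callback, always invoked with phy->lock held by the caller. */
static int fake_link_change_notify(struct fake_phy *phy, int use_unlocked)
{
	return use_unlocked ? _fake_start_aneg(phy) : fake_start_aneg(phy);
}

/* State machine step: holds the lock across the driver callback. */
static int fake_state_machine(struct fake_phy *phy, int use_unlocked)
{
	int err;

	pthread_mutex_lock(&phy->lock);
	err = fake_link_change_notify(phy, use_unlocked);
	pthread_mutex_unlock(&phy->lock);
	return err;
}

/* Set up an error-checking mutex so self-deadlock is reported, not hit. */
static int fake_phy_init(struct fake_phy *phy)
{
	pthread_mutexattr_t attr;
	int err;

	phy->aneg_starts = 0;
	err = pthread_mutexattr_init(&attr);
	if (err)
		return -err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&phy->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return -err;
}
```

After fake_phy_init(), calling fake_state_machine(&phy, 0) takes the
buggy path and returns -EDEADLK without starting autonegotiation, while
fake_state_machine(&phy, 1) takes the fixed path and succeeds. With a
plain (non error-checking) mutex the first call would block forever,
which is the situation lockdep is warning about above.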

 drivers/net/phy/ax88796b.c | 2 +-
 drivers/net/phy/phy.c      | 3 ++-
 include/linux/phy.h        | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/ax88796b.c b/drivers/net/phy/ax88796b.c
index 0f1e617a26c9..eb74a8cf8df1 100644
--- a/drivers/net/phy/ax88796b.c
+++ b/drivers/net/phy/ax88796b.c
@@ -90,7 +90,7 @@ static void asix_ax88772a_link_change_notify(struct phy_device *phydev)
 	 */
 	if (phydev->state == PHY_NOLINK) {
 		phy_init_hw(phydev);
-		phy_start_aneg(phydev);
+		_phy_start_aneg(phydev);
 	}
 }
 
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 93a8676dd8d8..a5fa077650e8 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -981,7 +981,7 @@ static int phy_check_link_status(struct phy_device *phydev)
  *   If the PHYCONTROL Layer is operating, we change the state to
  *   reflect the beginning of Auto-negotiation or forcing.
  */
-static int _phy_start_aneg(struct phy_device *phydev)
+int _phy_start_aneg(struct phy_device *phydev)
 {
 	int err;
 
@@ -1002,6 +1002,7 @@ static int _phy_start_aneg(struct phy_device *phydev)
 
 	return err;
 }
+EXPORT_SYMBOL(_phy_start_aneg);
 
 /**
  * phy_start_aneg - start auto-negotiation for this PHY device
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 1351b802ffcf..3cc52826f18e 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1736,6 +1736,7 @@ void phy_detach(struct phy_device *phydev);
 void phy_start(struct phy_device *phydev);
 void phy_stop(struct phy_device *phydev);
 int phy_config_aneg(struct phy_device *phydev);
+int _phy_start_aneg(struct phy_device *phydev);
 int phy_start_aneg(struct phy_device *phydev);
 int phy_aneg_done(struct phy_device *phydev);
 int phy_speed_down(struct phy_device *phydev, bool sync);
-- 
2.30.2



* Re: [PATCH net-next] net: phy: fix regression with AX88772A PHY driver
  2023-09-18 13:25 [PATCH net-next] net: phy: fix regression with AX88772A PHY driver Russell King (Oracle)
@ 2023-09-18 13:49 ` Andrew Lunn
  2023-09-18 13:57   ` Russell King (Oracle)
  2023-09-19 15:10 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Lunn @ 2023-09-18 13:49 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Heiner Kallweit, Marek Szyprowski, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Florian Fainelli, netdev

On Mon, Sep 18, 2023 at 02:25:36PM +0100, Russell King (Oracle) wrote:
> Marek reports that a deadlock occurs with the AX88772A PHY used on the
> ASIX USB network driver:
> 
> asix 1-1.4:1.0 (unnamed net_device) (uninitialized): PHY [usb-001:003:10] driver [Asix Electronics AX88772A] (irq=POLL)
> Asix Electronics AX88772A usb-001:003:10: attached PHY driver(mii_bus:phy_addr=usb-001:003:10, irq=POLL)
> asix 1-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX AX88772 USB 2.0 Ethernet, a2:99:b6:cd:11:eb
> asix 1-1.4:1.0 eth0: configuring for phy/internal link mode
> 
> ============================================
> WARNING: possible recursive locking detected
> 6.6.0-rc1-00239-g8da77df649c4-dirty #13949 Not tainted
> --------------------------------------------
> kworker/3:3/71 is trying to acquire lock:
> c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_start_aneg+0x1c/0x38
> 
> but task is already holding lock:
> c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x100/0x2b8
> 
> This is because we now consistently call phy_process_state_change()
> while holding phydev->lock, but the AX88772A PHY driver then goes on
> to call phy_start_aneg() which tries to grab the same lock - causing
> deadlock.
> 
> Fix this by exporting the unlocked version, and use this in the PHY
> driver instead.
> 
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Fixes: ef113a60d0a9 ("net: phy: call phy_error_precise() while holding the lock")
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Hi Russell

Yes, this fixes the problem for stable.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

But maybe it would be better to move the hardware workaround into the
PHY driver? It's the PHY which is broken, so why is the MAC working
around it?

       Andrew


* Re: [PATCH net-next] net: phy: fix regression with AX88772A PHY driver
  2023-09-18 13:49 ` Andrew Lunn
@ 2023-09-18 13:57   ` Russell King (Oracle)
  2023-09-18 16:34     ` Andrew Lunn
  0 siblings, 1 reply; 5+ messages in thread
From: Russell King (Oracle) @ 2023-09-18 13:57 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Heiner Kallweit, Marek Szyprowski, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Florian Fainelli, netdev

On Mon, Sep 18, 2023 at 03:49:32PM +0200, Andrew Lunn wrote:
> On Mon, Sep 18, 2023 at 02:25:36PM +0100, Russell King (Oracle) wrote:
> > [...]
> 
> Hi Russell
> 
> Yes, this fixes the problem for stable.
> 
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> 
> But maybe it would be better to move the hardware workaround into the
> PHY driver? It's the PHY which is broken, so why is the MAC working
> around it?

Err? Sorry, but your comment makes little sense given that my patch
only touches the PHY core (to export _phy_start_aneg()) and the PHY
driver (ax88796b.c) which is where the work-around is already located.

I'm not having to touch the MAC driver at all to fix this, because
afaics the MAC driver isn't involved in _this_ particular workaround.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH net-next] net: phy: fix regression with AX88772A PHY driver
  2023-09-18 13:57   ` Russell King (Oracle)
@ 2023-09-18 16:34     ` Andrew Lunn
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Lunn @ 2023-09-18 16:34 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Heiner Kallweit, Marek Szyprowski, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Florian Fainelli, netdev

> Err? Sorry, but your comment makes little sense

Sorry, -EMORECOFFEE.

       Andrew


* Re: [PATCH net-next] net: phy: fix regression with AX88772A PHY driver
  2023-09-18 13:25 [PATCH net-next] net: phy: fix regression with AX88772A PHY driver Russell King (Oracle)
  2023-09-18 13:49 ` Andrew Lunn
@ 2023-09-19 15:10 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-09-19 15:10 UTC (permalink / raw)
  To: Russell King
  Cc: andrew, hkallweit1, m.szyprowski, davem, edumazet, kuba, pabeni,
	florian.fainelli, netdev

Hello:

This patch was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 18 Sep 2023 14:25:36 +0100 you wrote:
> Marek reports that a deadlock occurs with the AX88772A PHY used on the
> ASIX USB network driver:
> 
> asix 1-1.4:1.0 (unnamed net_device) (uninitialized): PHY [usb-001:003:10] driver [Asix Electronics AX88772A] (irq=POLL)
> Asix Electronics AX88772A usb-001:003:10: attached PHY driver(mii_bus:phy_addr=usb-001:003:10, irq=POLL)
> asix 1-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX AX88772 USB 2.0 Ethernet, a2:99:b6:cd:11:eb
> asix 1-1.4:1.0 eth0: configuring for phy/internal link mode
> 
> [...]

Here is the summary with links:
  - [net-next] net: phy: fix regression with AX88772A PHY driver
    https://git.kernel.org/netdev/net-next/c/6a23c555f7eb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



