* [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted @ 2025-05-23 8:37 Wei Fang 2025-05-23 13:18 ` Andrew Lunn ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Wei Fang @ 2025-05-23 8:37 UTC (permalink / raw) To: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni, f.fainelli, xiaolei.wang Cc: netdev, linux-kernel, imx There is a potential crash issue when disabling and re-enabling the network port. When disabling the network port, phy_detach() calls device_link_del() to remove the device link, but it does not clear phydev->devlink, so phydev->devlink is not a NULL pointer. Then the network port is re-enabled, but if phy_attach_direct() fails before calling device_link_add(), the code jumps to the "error" label and calls phy_detach(). Since phydev->devlink retains the old value from the previous attach/detach cycle, device_link_del() uses the old value, which accesses a NULL pointer and causes a crash. The simplified crash log is as follows. [ 24.702421] Call trace: [ 24.704856] device_link_put_kref+0x20/0x120 [ 24.709124] device_link_del+0x30/0x48 [ 24.712864] phy_detach+0x24/0x168 [ 24.716261] phy_attach_direct+0x168/0x3a4 [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c [ 24.725140] phylink_of_phy_connect+0x1c/0x34 Therefore, phydev->devlink needs to be cleared when the device link is deleted. Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") Signed-off-by: Wei Fang <wei.fang@nxp.com> --- v2: Improve the commit message. --- drivers/net/phy/phy_device.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index cc1bfd22fb81..7d5e76a3db0e 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1727,8 +1727,10 @@ void phy_detach(struct phy_device *phydev) struct module *ndev_owner = NULL; struct mii_bus *bus; - if (phydev->devlink) + if (phydev->devlink) { device_link_del(phydev->devlink); + phydev->devlink = NULL; + } if (phydev->sysfs_links) { if (dev) -- 2.34.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-05-23 8:37 [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted Wei Fang @ 2025-05-23 13:18 ` Andrew Lunn 2025-05-23 15:19 ` Florian Fainelli 2025-05-28 0:50 ` patchwork-bot+netdevbpf 2 siblings, 0 replies; 13+ messages in thread From: Andrew Lunn @ 2025-05-23 13:18 UTC (permalink / raw) To: Wei Fang Cc: hkallweit1, linux, davem, edumazet, kuba, pabeni, f.fainelli, xiaolei.wang, netdev, linux-kernel, imx On Fri, May 23, 2025 at 04:37:59PM +0800, Wei Fang wrote: > There is a potential crash issue when disabling and re-enabling the > network port. When disabling the network port, phy_detach() calls > device_link_del() to remove the device link, but it does not clear > phydev->devlink, so phydev->devlink is not a NULL pointer. Then the > network port is re-enabled, but if phy_attach_direct() fails before > calling device_link_add(), the code jumps to the "error" label and > calls phy_detach(). Since phydev->devlink retains the old value from > the previous attach/detach cycle, device_link_del() uses the old value, > which accesses a NULL pointer and causes a crash. The simplified crash > log is as follows. > > [ 24.702421] Call trace: > [ 24.704856] device_link_put_kref+0x20/0x120 > [ 24.709124] device_link_del+0x30/0x48 > [ 24.712864] phy_detach+0x24/0x168 > [ 24.716261] phy_attach_direct+0x168/0x3a4 > [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c > [ 24.725140] phylink_of_phy_connect+0x1c/0x34 > > Therefore, phydev->devlink needs to be cleared when the device link is > deleted. > > Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") > Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Andrew ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-05-23 8:37 [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted Wei Fang 2025-05-23 13:18 ` Andrew Lunn @ 2025-05-23 15:19 ` Florian Fainelli 2025-06-03 20:39 ` Abhishek Chauhan (ABC) 2025-05-28 0:50 ` patchwork-bot+netdevbpf 2 siblings, 1 reply; 13+ messages in thread From: Florian Fainelli @ 2025-05-23 15:19 UTC (permalink / raw) To: Wei Fang, andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni, xiaolei.wang Cc: netdev, linux-kernel, imx On 5/23/2025 1:37 AM, Wei Fang wrote: > There is a potential crash issue when disabling and re-enabling the > network port. When disabling the network port, phy_detach() calls > device_link_del() to remove the device link, but it does not clear > phydev->devlink, so phydev->devlink is not a NULL pointer. Then the > network port is re-enabled, but if phy_attach_direct() fails before > calling device_link_add(), the code jumps to the "error" label and > calls phy_detach(). Since phydev->devlink retains the old value from > the previous attach/detach cycle, device_link_del() uses the old value, > which accesses a NULL pointer and causes a crash. The simplified crash > log is as follows. > > [ 24.702421] Call trace: > [ 24.704856] device_link_put_kref+0x20/0x120 > [ 24.709124] device_link_del+0x30/0x48 > [ 24.712864] phy_detach+0x24/0x168 > [ 24.716261] phy_attach_direct+0x168/0x3a4 > [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c > [ 24.725140] phylink_of_phy_connect+0x1c/0x34 > > Therefore, phydev->devlink needs to be cleared when the device link is > deleted. > > Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") > Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> -- Florian ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-05-23 15:19 ` Florian Fainelli @ 2025-06-03 20:39 ` Abhishek Chauhan (ABC) 2025-06-03 21:13 ` Russell King (Oracle) 2025-06-04 6:00 ` Wei Fang 0 siblings, 2 replies; 13+ messages in thread From: Abhishek Chauhan (ABC) @ 2025-06-03 20:39 UTC (permalink / raw) To: Florian Fainelli, Wei Fang, andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni, xiaolei.wang Cc: netdev, linux-kernel, imx On 5/23/2025 8:19 AM, Florian Fainelli wrote: > > > On 5/23/2025 1:37 AM, Wei Fang wrote: >> There is a potential crash issue when disabling and re-enabling the >> network port. When disabling the network port, phy_detach() calls >> device_link_del() to remove the device link, but it does not clear >> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the >> network port is re-enabled, but if phy_attach_direct() fails before >> calling device_link_add(), the code jumps to the "error" label and >> calls phy_detach(). Since phydev->devlink retains the old value from >> the previous attach/detach cycle, device_link_del() uses the old value, >> which accesses a NULL pointer and causes a crash. The simplified crash >> log is as follows. >> >> [ 24.702421] Call trace: >> [ 24.704856] device_link_put_kref+0x20/0x120 >> [ 24.709124] device_link_del+0x30/0x48 >> [ 24.712864] phy_detach+0x24/0x168 >> [ 24.716261] phy_attach_direct+0x168/0x3a4 >> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c >> [ 24.725140] phylink_of_phy_connect+0x1c/0x34 >> >> Therefore, phydev->devlink needs to be cleared when the device link is >> deleted. >> >> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") >> Signed-off-by: Wei Fang <wei.fang@nxp.com> > @Wei What happens in case of shared mdio ? 1. Device 23040000 has the mdio node of both the ethernet phy and device 23000000 references the phy-handle present in the Device 23040000 2. When rmmod of the driver happens 3. the parent devlink is already deleted. 4. This cause the child mdio to access an entry causing a corruption. 5. Thought this fix would help but i see that its not helping the case. Wondering if this is a legacy issue with shared mdio framework. 43369.232799: qcom-ethqos 23040000.ethernet eth1: stmmac_dvr_remove: removing driver 43369.233782: stmmac_pcs: Link Down 43369.258337: qcom-ethqos 23040000.ethernet eth1: FPE workqueue stop 43369.258522: br1: port 1(eth1) entered disabled state 43369.758779: qcom-ethqos 23040000.ethernet eth1: Timeout accessing MAC_VLAN_Tag_Filter 43369.758789: qcom-ethqos 23040000.ethernet eth1: failed to kill vid 0081/0 43369.759270: qcom-ethqos 23040000.ethernet eth1 (unregistering): left allmulticast mode 43369.759275: qcom-ethqos 23040000.ethernet eth1 (unregistering): left promiscuous mode 43369.759301: br1: port 1(eth1) entered disabled state 43370.259618: qcom-ethqos 23040000.ethernet eth1 (unregistering): Timeout accessing MAC_VLAN_Tag_Filter 43370.309863: qcom-ethqos 23000000.ethernet eth0: stmmac_dvr_remove: removing driver 43370.310019: list_del corruption, ffffff80c6ec9408->prev is LIST_POISON2 (dead000000000122) 43370.310034: ------------[ cut here ]------------ 43370.310035: kernel BUG at lib/list_debug.c:59! 43370.310119: CPU: 4 PID: 3067767 Comm: rmmod Tainted: G W OE 6.6.65-rt47-debug #1 43370.310122: Hardware name: Qualcomm Technologies, Inc. SA8775P Ride (DT) 43370.310165: Call trace: 43370.310166: __list_del_entry_valid_or_report+0xa8/0xe0 43370.310168: __device_link_del+0x40/0xf0 43370.310172: device_link_put_kref+0xb4/0xc8 43370.310174: device_link_del+0x38/0x58 43370.310176: phy_detach+0x2c/0x170 43370.310181: phy_disconnect+0x4c/0x70 43370.310184: phylink_disconnect_phy+0x6c/0xc0 [phylink] 43370.310194: stmmac_release+0x60/0x358 [stmmac] 43370.310210: __dev_close_many+0xb4/0x160 43370.310213: dev_close_many+0xbc/0x1a0 43370.310215: unregister_netdevice_many_notify+0x178/0x870 43370.310218: unregister_netdevice_queue+0xf8/0x140 43370.310221: unregister_netdev+0x2c/0x48 43370.310223: stmmac_dvr_remove+0xd0/0x1b0 [stmmac] 43370.310233: devm_stmmac_pltfr_remove+0x2c/0x58 [stmmac_platform] > Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-06-03 20:39 ` Abhishek Chauhan (ABC) @ 2025-06-03 21:13 ` Russell King (Oracle) 2025-06-04 6:00 ` Wei Fang 1 sibling, 0 replies; 13+ messages in thread From: Russell King (Oracle) @ 2025-06-03 21:13 UTC (permalink / raw) To: Abhishek Chauhan (ABC) Cc: Florian Fainelli, Wei Fang, andrew, hkallweit1, davem, edumazet, kuba, pabeni, xiaolei.wang, netdev, linux-kernel, imx On Tue, Jun 03, 2025 at 01:39:47PM -0700, Abhishek Chauhan (ABC) wrote: > > > On 5/23/2025 8:19 AM, Florian Fainelli wrote: > > > > > > On 5/23/2025 1:37 AM, Wei Fang wrote: > >> There is a potential crash issue when disabling and re-enabling the > >> network port. When disabling the network port, phy_detach() calls > >> device_link_del() to remove the device link, but it does not clear > >> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the > >> network port is re-enabled, but if phy_attach_direct() fails before > >> calling device_link_add(), the code jumps to the "error" label and > >> calls phy_detach(). Since phydev->devlink retains the old value from > >> the previous attach/detach cycle, device_link_del() uses the old value, > >> which accesses a NULL pointer and causes a crash. The simplified crash > >> log is as follows. > >> > >> [ 24.702421] Call trace: > >> [ 24.704856] device_link_put_kref+0x20/0x120 > >> [ 24.709124] device_link_del+0x30/0x48 > >> [ 24.712864] phy_detach+0x24/0x168 > >> [ 24.716261] phy_attach_direct+0x168/0x3a4 > >> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c > >> [ 24.725140] phylink_of_phy_connect+0x1c/0x34 > >> > >> Therefore, phydev->devlink needs to be cleared when the device link is > >> deleted. > >> > >> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") > >> Signed-off-by: Wei Fang <wei.fang@nxp.com> > > > @Wei > What happens in case of shared mdio ? > > 1. Device 23040000 has the mdio node of both the ethernet phy and device 23000000 references the phy-handle present in the Device 23040000 > 2. When rmmod of the driver happens > 3. the parent devlink is already deleted. > 4. This cause the child mdio to access an entry causing a corruption. > 5. Thought this fix would help but i see that its not helping the case. > > Wondering if this is a legacy issue with shared mdio framework. The device link does nothing for this as it has DL_FLAG_STATELESS set, which only affects suspend/resume/shutdown ordering, and with DL_FLAG_PM_RUNTIME also set, runtime PM. The device probe/removal ordering is unaffected. Maybe that's a problem, but it needs careful consideration to change. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-06-03 20:39 ` Abhishek Chauhan (ABC) 2025-06-03 21:13 ` Russell King (Oracle) @ 2025-06-04 6:00 ` Wei Fang 2025-06-04 6:09 ` Abhishek Chauhan (ABC) 2025-06-04 8:07 ` Russell King (Oracle) 1 sibling, 2 replies; 13+ messages in thread From: Wei Fang @ 2025-06-04 6:00 UTC (permalink / raw) To: Abhishek Chauhan (ABC) Cc: Florian Fainelli, andrew@lunn.ch, hkallweit1@gmail.com, linux@armlinux.org.uk, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, xiaolei.wang@windriver.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, imx@lists.linux.dev > > On 5/23/2025 1:37 AM, Wei Fang wrote: > >> There is a potential crash issue when disabling and re-enabling the > >> network port. When disabling the network port, phy_detach() calls > >> device_link_del() to remove the device link, but it does not clear > >> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the > >> network port is re-enabled, but if phy_attach_direct() fails before > >> calling device_link_add(), the code jumps to the "error" label and > >> calls phy_detach(). Since phydev->devlink retains the old value from > >> the previous attach/detach cycle, device_link_del() uses the old value, > >> which accesses a NULL pointer and causes a crash. The simplified crash > >> log is as follows. > >> > >> [ 24.702421] Call trace: > >> [ 24.704856] device_link_put_kref+0x20/0x120 > >> [ 24.709124] device_link_del+0x30/0x48 > >> [ 24.712864] phy_detach+0x24/0x168 > >> [ 24.716261] phy_attach_direct+0x168/0x3a4 > >> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c > >> [ 24.725140] phylink_of_phy_connect+0x1c/0x34 > >> > >> Therefore, phydev->devlink needs to be cleared when the device link is > >> deleted. > >> > >> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") > >> Signed-off-by: Wei Fang <wei.fang@nxp.com> > > > @Wei > What happens in case of shared mdio ? > > 1. Device 23040000 has the mdio node of both the ethernet phy and device > 23000000 references the phy-handle present in the Device 23040000 > 2. When rmmod of the driver happens > 3. the parent devlink is already deleted. > 4. This cause the child mdio to access an entry causing a corruption. > 5. Thought this fix would help but i see that its not helping the case. > My patch is only to fix the potential crash issue when re-enabling the network interface. phy_detach() is not called when the MDIO controller driver is removed. So phydev->devlink is not cleared, but actually the device link has been removed by phy_device_remove() --> device_del(). Therefore, it will cause the crash when the MAC controller driver is removed. > Wondering if this is a legacy issue with shared mdio framework. > I think this issue is also introduced by the commit bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev"). I suggested to change the DL_FLAG_STATELESS flag to DL_FLAG_AUTOREMOVE_SUPPLIER to solve this issue, so that the consumer (MAC controller) driver will be automatically removed when the link is removed. The changes are as follows. diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 73f9cb2e2844..a6d7acd73391 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1515,6 +1515,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, struct mii_bus *bus = phydev->mdio.bus; struct device *d = &phydev->mdio.dev; struct module *ndev_owner = NULL; + struct device_link *devlink; bool using_genphy = false; int err; @@ -1646,9 +1647,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, * another mac interface, so we should create a device link between * phy dev and mac dev. */ - if (dev && phydev->mdio.bus->parent && dev->dev.parent != phydev->mdio.bus->parent) - phydev->devlink = device_link_add(dev->dev.parent, &phydev->mdio.dev, - DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS); + if (dev && phydev->mdio.bus->parent && + dev->dev.parent != phydev->mdio.bus->parent) { + devlink = device_link_add(dev->dev.parent, &phydev->mdio.dev, + DL_FLAG_PM_RUNTIME | + DL_FLAG_AUTOREMOVE_SUPPLIER); + if (!devlink) { + err = -ENOMEM; + goto error; + } + } return err; @@ -1749,11 +1757,6 @@ void phy_detach(struct phy_device *phydev) struct module *ndev_owner = NULL; struct mii_bus *bus; - if (phydev->devlink) { - device_link_del(phydev->devlink); - phydev->devlink = NULL; - } - if (phydev->sysfs_links) { if (dev) sysfs_remove_link(&dev->dev.kobj, "phydev"); diff --git a/include/linux/phy.h b/include/linux/phy.h index e194dad1623d..cc1f45c3ff21 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -505,8 +505,6 @@ struct macsec_ops; * * @mdio: MDIO bus this PHY is on * @drv: Pointer to the driver for this PHY instance - * @devlink: Create a link between phy dev and mac dev, if the external phy - * used by current mac interface is managed by another mac interface. * @phyindex: Unique id across the phy's parent tree of phys to address the PHY * from userspace, similar to ifindex. A zero index means the PHY * wasn't assigned an id yet. @@ -610,8 +608,6 @@ struct phy_device { /* And management functions */ const struct phy_driver *drv; - struct device_link *devlink; - u32 phyindex; u32 phy_id; ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-06-04 6:00 ` Wei Fang @ 2025-06-04 6:09 ` Abhishek Chauhan (ABC) 2025-06-05 4:59 ` Sarosh Hasan 2025-06-04 8:07 ` Russell King (Oracle) 1 sibling, 1 reply; 13+ messages in thread From: Abhishek Chauhan (ABC) @ 2025-06-04 6:09 UTC (permalink / raw) To: Wei Fang Cc: Florian Fainelli, andrew@lunn.ch, hkallweit1@gmail.com, linux@armlinux.org.uk, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, xiaolei.wang@windriver.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, imx@lists.linux.dev, Sarosh Hasan On 6/3/2025 11:00 PM, Wei Fang wrote: >>> On 5/23/2025 1:37 AM, Wei Fang wrote: >>>> There is a potential crash issue when disabling and re-enabling the >>>> network port. When disabling the network port, phy_detach() calls >>>> device_link_del() to remove the device link, but it does not clear >>>> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the >>>> network port is re-enabled, but if phy_attach_direct() fails before >>>> calling device_link_add(), the code jumps to the "error" label and >>>> calls phy_detach(). Since phydev->devlink retains the old value from >>>> the previous attach/detach cycle, device_link_del() uses the old value, >>>> which accesses a NULL pointer and causes a crash. The simplified crash >>>> log is as follows. >>>> >>>> [ 24.702421] Call trace: >>>> [ 24.704856] device_link_put_kref+0x20/0x120 >>>> [ 24.709124] device_link_del+0x30/0x48 >>>> [ 24.712864] phy_detach+0x24/0x168 >>>> [ 24.716261] phy_attach_direct+0x168/0x3a4 >>>> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c >>>> [ 24.725140] phylink_of_phy_connect+0x1c/0x34 >>>> >>>> Therefore, phydev->devlink needs to be cleared when the device link is >>>> deleted. >>>> >>>> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") >>>> Signed-off-by: Wei Fang <wei.fang@nxp.com> >>> >> @Wei >> What happens in case of shared mdio ? >> >> 1. Device 23040000 has the mdio node of both the ethernet phy and device >> 23000000 references the phy-handle present in the Device 23040000 >> 2. When rmmod of the driver happens >> 3. the parent devlink is already deleted. >> 4. This cause the child mdio to access an entry causing a corruption. >> 5. Thought this fix would help but i see that its not helping the case. >> > > My patch is only to fix the potential crash issue when re-enabling > the network interface. phy_detach() is not called when the MDIO > controller driver is removed. So phydev->devlink is not cleared, but > actually the device link has been removed by phy_device_remove() > --> device_del(). Therefore, it will cause the crash when the MAC > controller driver is removed. > >> Wondering if this is a legacy issue with shared mdio framework. >> > > I think this issue is also introduced by the commit bc66fa87d4fd > ("net: phy: Add link between phy dev and mac dev"). I suggested > to change the DL_FLAG_STATELESS flag to > DL_FLAG_AUTOREMOVE_SUPPLIER to solve this issue, so that > the consumer (MAC controller) driver will be automatically removed > when the link is removed. The changes are as follows. > thanks a lot , Russell and Wei for your prompt response. I appreciate your help. let me test this change and get back. > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 73f9cb2e2844..a6d7acd73391 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -1515,6 +1515,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > struct mii_bus *bus = phydev->mdio.bus; > struct device *d = &phydev->mdio.dev; > struct module *ndev_owner = NULL; > + struct device_link *devlink; > bool using_genphy = false; > int err; > > @@ -1646,9 +1647,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > * another mac interface, so we should create a device link between > * phy dev and mac dev. > */ > - if (dev && phydev->mdio.bus->parent && dev->dev.parent != phydev->mdio.bus->parent) > - phydev->devlink = device_link_add(dev->dev.parent, &phydev->mdio.dev, > - DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS); > + if (dev && phydev->mdio.bus->parent && > + dev->dev.parent != phydev->mdio.bus->parent) { > + devlink = device_link_add(dev->dev.parent, &phydev->mdio.dev, > + DL_FLAG_PM_RUNTIME | > + DL_FLAG_AUTOREMOVE_SUPPLIER); > + if (!devlink) { > + err = -ENOMEM; > + goto error; > + } > + } > > return err; > > @@ -1749,11 +1757,6 @@ void phy_detach(struct phy_device *phydev) > struct module *ndev_owner = NULL; > struct mii_bus *bus; > > - if (phydev->devlink) { > - device_link_del(phydev->devlink); > - phydev->devlink = NULL; > - } > - > if (phydev->sysfs_links) { > if (dev) > sysfs_remove_link(&dev->dev.kobj, "phydev"); > diff --git a/include/linux/phy.h b/include/linux/phy.h > index e194dad1623d..cc1f45c3ff21 100644 > --- a/include/linux/phy.h > +++ b/include/linux/phy.h > @@ -505,8 +505,6 @@ struct macsec_ops; > * > * @mdio: MDIO bus this PHY is on > * @drv: Pointer to the driver for this PHY instance > - * @devlink: Create a link between phy dev and mac dev, if the external phy > - * used by current mac interface is managed by another mac interface. > * @phyindex: Unique id across the phy's parent tree of phys to address the PHY > * from userspace, similar to ifindex. A zero index means the PHY > * wasn't assigned an id yet. > @@ -610,8 +608,6 @@ struct phy_device { > /* And management functions */ > const struct phy_driver *drv; > > - struct device_link *devlink; > - > u32 phyindex; > u32 phy_id; > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-06-04 6:09 ` Abhishek Chauhan (ABC) @ 2025-06-05 4:59 ` Sarosh Hasan 0 siblings, 0 replies; 13+ messages in thread From: Sarosh Hasan @ 2025-06-05 4:59 UTC (permalink / raw) To: Abhishek Chauhan (ABC), Wei Fang Cc: Florian Fainelli, andrew@lunn.ch, hkallweit1@gmail.com, linux@armlinux.org.uk, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, xiaolei.wang@windriver.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, imx@lists.linux.dev On 6/4/2025 11:39 AM, Abhishek Chauhan (ABC) wrote: > > > On 6/3/2025 11:00 PM, Wei Fang wrote: >>>> On 5/23/2025 1:37 AM, Wei Fang wrote: >>>>> There is a potential crash issue when disabling and re-enabling the >>>>> network port. When disabling the network port, phy_detach() calls >>>>> device_link_del() to remove the device link, but it does not clear >>>>> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the >>>>> network port is re-enabled, but if phy_attach_direct() fails before >>>>> calling device_link_add(), the code jumps to the "error" label and >>>>> calls phy_detach(). Since phydev->devlink retains the old value from >>>>> the previous attach/detach cycle, device_link_del() uses the old value, >>>>> which accesses a NULL pointer and causes a crash. The simplified crash >>>>> log is as follows. >>>>> >>>>> [ 24.702421] Call trace: >>>>> [ 24.704856] device_link_put_kref+0x20/0x120 >>>>> [ 24.709124] device_link_del+0x30/0x48 >>>>> [ 24.712864] phy_detach+0x24/0x168 >>>>> [ 24.716261] phy_attach_direct+0x168/0x3a4 >>>>> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c >>>>> [ 24.725140] phylink_of_phy_connect+0x1c/0x34 >>>>> >>>>> Therefore, phydev->devlink needs to be cleared when the device link is >>>>> deleted. >>>>> >>>>> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev") >>>>> Signed-off-by: Wei Fang <wei.fang@nxp.com> >>>> >>> @Wei >>> What happens in case of shared mdio ? >>> >>> 1. Device 23040000 has the mdio node of both the ethernet phy and device >>> 23000000 references the phy-handle present in the Device 23040000 >>> 2. When rmmod of the driver happens >>> 3. the parent devlink is already deleted. >>> 4. This cause the child mdio to access an entry causing a corruption. >>> 5. Thought this fix would help but i see that its not helping the case. >>> >> >> My patch is only to fix the potential crash issue when re-enabling >> the network interface. phy_detach() is not called when the MDIO >> controller driver is removed. So phydev->devlink is not cleared, but >> actually the device link has been removed by phy_device_remove() >> --> device_del(). Therefore, it will cause the crash when the MAC >> controller driver is removed. >> >>> Wondering if this is a legacy issue with shared mdio framework. >>> >> >> I think this issue is also introduced by the commit bc66fa87d4fd >> ("net: phy: Add link between phy dev and mac dev"). I suggested >> to change the DL_FLAG_STATELESS flag to >> DL_FLAG_AUTOREMOVE_SUPPLIER to solve this issue, so that >> the consumer (MAC controller) driver will be automatically removed >> when the link is removed. The changes are as follows. >> > > thanks a lot , Russell and Wei for your prompt response. > I appreciate your help. let me test this change and get back. > Myself and Abhishek tested this suggestion from Wei and we now observe that the driver removal is successful without a crash. It looks like the flag 'DL_FLAG_AUTOREMOVE_SUPPLIER' is helping. Wei and Russell, please suggest the next steps. >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >> index 73f9cb2e2844..a6d7acd73391 100644 >> --- a/drivers/net/phy/phy_device.c >> +++ b/drivers/net/phy/phy_device.c >> @@ -1515,6 +1515,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, >> struct mii_bus *bus = phydev->mdio.bus; >> struct device *d = &phydev->mdio.dev; >> struct module *ndev_owner = NULL; >> + struct device_link *devlink; >> bool using_genphy = false; >> int err; >> >> @@ -1646,9 +1647,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, >> * another mac interface, so we should create a device link between >> * phy dev and mac dev. >> */ >> - if (dev && phydev->mdio.bus->parent && dev->dev.parent != phydev->mdio.bus->parent) >> - phydev->devlink = device_link_add(dev->dev.parent, &phydev->mdio.dev, >> - DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS); >> + if (dev && phydev->mdio.bus->parent && >> + dev->dev.parent != phydev->mdio.bus->parent) { >> + devlink = device_link_add(dev->dev.parent, &phydev->mdio.dev, >> + DL_FLAG_PM_RUNTIME | >> + DL_FLAG_AUTOREMOVE_SUPPLIER); >> + if (!devlink) { >> + err = -ENOMEM; >> + goto error; >> + } >> + } >> >> return err; >> >> @@ -1749,11 +1757,6 @@ void phy_detach(struct phy_device *phydev) >> struct module *ndev_owner = NULL; >> struct mii_bus *bus; >> >> - if (phydev->devlink) { >> - device_link_del(phydev->devlink); >> - phydev->devlink = NULL; >> - } >> - >> if (phydev->sysfs_links) { >> if (dev) >> sysfs_remove_link(&dev->dev.kobj, "phydev"); >> diff --git a/include/linux/phy.h b/include/linux/phy.h >> index e194dad1623d..cc1f45c3ff21 100644 >> --- a/include/linux/phy.h >> +++ b/include/linux/phy.h >> @@ -505,8 +505,6 @@ struct macsec_ops; >> * >> * @mdio: MDIO bus this PHY is on >> * @drv: Pointer to the driver for this PHY instance >> - * @devlink: Create a link between phy dev and mac dev, if the external phy >> - * used by current mac interface is managed by another mac interface. >> * @phyindex: Unique id across the phy's parent tree of phys to address the PHY >> * from userspace, similar to ifindex. A zero index means the PHY >> * wasn't assigned an id yet. >> @@ -610,8 +608,6 @@ struct phy_device { >> /* And management functions */ >> const struct phy_driver *drv; >> >> - struct device_link *devlink; >> - >> u32 phyindex; >> u32 phy_id; >> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-06-04 6:00 ` Wei Fang 2025-06-04 6:09 ` Abhishek Chauhan (ABC) @ 2025-06-04 8:07 ` Russell King (Oracle) 2025-06-04 12:08 ` Wei Fang 1 sibling, 1 reply; 13+ messages in thread From: Russell King (Oracle) @ 2025-06-04 8:07 UTC (permalink / raw) To: Wei Fang Cc: Abhishek Chauhan (ABC), Florian Fainelli, andrew@lunn.ch, hkallweit1@gmail.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, xiaolei.wang@windriver.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, imx@lists.linux.dev On Wed, Jun 04, 2025 at 06:00:54AM +0000, Wei Fang wrote: > I think this issue is also introduced by the commit bc66fa87d4fd > ("net: phy: Add link between phy dev and mac dev"). I suggested > to change the DL_FLAG_STATELESS flag to > DL_FLAG_AUTOREMOVE_SUPPLIER to solve this issue, so that > the consumer (MAC controller) driver will be automatically removed > when the link is removed. The changes are as follows. I suspect this still has problems. This is fine if the PHY device is going away and as you say device_del() is called. However, you need to consider the case where a MAC driver attaches the PHY during .ndo_open and releases it during .ndo_release. These will happen multiple times. Each time the MAC driver attaches to the PHY via .ndo_open, we will call device_link_add(), but the device link will not be removed when .ndo_release is called. Either device_link_add() will fail, or we will eat memory each time the device is closed and re-opened. If that is correct, then we're trading one problem for another. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-06-04 8:07 ` Russell King (Oracle) @ 2025-06-04 12:08 ` Wei Fang 2025-06-06 0:52 ` Abhishek Chauhan (ABC) 0 siblings, 1 reply; 13+ messages in thread From: Wei Fang @ 2025-06-04 12:08 UTC (permalink / raw) To: Russell King Cc: Abhishek Chauhan (ABC), Florian Fainelli, andrew@lunn.ch, hkallweit1@gmail.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, xiaolei.wang@windriver.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, imx@lists.linux.dev > On Wed, Jun 04, 2025 at 06:00:54AM +0000, Wei Fang wrote: > > I think this issue is also introduced by the commit bc66fa87d4fd > > ("net: phy: Add link between phy dev and mac dev"). I suggested > > to change the DL_FLAG_STATELESS flag to > > DL_FLAG_AUTOREMOVE_SUPPLIER to solve this issue, so that > > the consumer (MAC controller) driver will be automatically removed > > when the link is removed. The changes are as follows. > > I suspect this still has problems. This is fine if the PHY device is > going away and as you say device_del() is called. > > However, you need to consider the case where a MAC driver attaches the > PHY during .ndo_open and releases it during .ndo_release. These will > happen multiple times. .ndo_release? Do you mean .ndo_stop? > > Each time the MAC driver attaches to the PHY via .ndo_open, we will > call device_link_add(), but the device link will not be removed when > .ndo_release is called. > > Either device_link_add() will fail, or we will eat memory each time > the device is closed and re-opened. Below is what I find in the kernel doc of device_link_add(). https://elixir.bootlin.com/linux/v6.15/source/drivers/base/core.c#L711 if a device link between the given @consumer and @supplier pair exists already when this function is called for them, the existing link will be returned regardless of its current type and status. Therefore, it will not create new link each time the netdev is re-opened. > > If that is correct, then we're trading one problem for another. > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-06-04 12:08 ` Wei Fang @ 2025-06-06 0:52 ` Abhishek Chauhan (ABC) 2025-06-06 5:26 ` Wei Fang 0 siblings, 1 reply; 13+ messages in thread From: Abhishek Chauhan (ABC) @ 2025-06-06 0:52 UTC (permalink / raw) To: Wei Fang, Russell King Cc: Florian Fainelli, andrew@lunn.ch, hkallweit1@gmail.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, xiaolei.wang@windriver.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, imx@lists.linux.dev, Sarosh Hasan On 6/4/2025 5:08 AM, Wei Fang wrote: >> On Wed, Jun 04, 2025 at 06:00:54AM +0000, Wei Fang wrote: >>> I think this issue is also introduced by the commit bc66fa87d4fd >>> ("net: phy: Add link between phy dev and mac dev"). I suggested >>> to change the DL_FLAG_STATELESS flag to >>> DL_FLAG_AUTOREMOVE_SUPPLIER to solve this issue, so that >>> the consumer (MAC controller) driver will be automatically removed >>> when the link is removed. The changes are as follows. >> >> I suspect this still has problems. This is fine if the PHY device is >> going away and as you say device_del() is called. >> >> However, you need to consider the case where a MAC driver attaches the >> PHY during .ndo_open and releases it during .ndo_release. These will >> happen multiple times. > > .ndo_release? Do you mean .ndo_stop? > >> >> Each time the MAC driver attaches to the PHY via .ndo_open, we will >> call device_link_add(), but the device link will not be removed when >> .ndo_release is called. >> >> Either device_link_add() will fail, or we will eat memory each time >> the device is closed and re-opened. > > Below is what I find in the kernel doc of device_link_add(). > https://elixir.bootlin.com/linux/v6.15/source/drivers/base/core.c#L711 > > if a device link between the given @consumer and @supplier pair > exists already when this function is called for them, the existing link will > be returned regardless of its current type and status. > > Therefore, it will not create new link each time the netdev is re-opened. > @Wei, We were able to verify the suggestion what you gave us. No crash is seen now. Should we raise a patch or shall we wait until this is clarified and Russell has no more further questions. ? I am okay with anything. >> >> If that is correct, then we're trading one problem for another. >> > ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-06-06 0:52 ` Abhishek Chauhan (ABC) @ 2025-06-06 5:26 ` Wei Fang 0 siblings, 0 replies; 13+ messages in thread From: Wei Fang @ 2025-06-06 5:26 UTC (permalink / raw) To: Abhishek Chauhan (ABC), Russell King Cc: Florian Fainelli, andrew@lunn.ch, hkallweit1@gmail.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, xiaolei.wang@windriver.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, imx@lists.linux.dev, Sarosh Hasan Best Regards, Wei Fang > >> On Wed, Jun 04, 2025 at 06:00:54AM +0000, Wei Fang wrote: > >>> I think this issue is also introduced by the commit bc66fa87d4fd > >>> ("net: phy: Add link between phy dev and mac dev"). I suggested to > >>> change the DL_FLAG_STATELESS flag to DL_FLAG_AUTOREMOVE_SUPPLIER to > >>> solve this issue, so that the consumer (MAC controller) driver will > >>> be automatically removed when the link is removed. The changes are > >>> as follows. > >> > >> I suspect this still has problems. This is fine if the PHY device is > >> going away and as you say device_del() is called. > >> > >> However, you need to consider the case where a MAC driver attaches > >> the PHY during .ndo_open and releases it during .ndo_release. These > >> will happen multiple times. > > > > .ndo_release? Do you mean .ndo_stop? > > > >> > >> Each time the MAC driver attaches to the PHY via .ndo_open, we will > >> call device_link_add(), but the device link will not be removed when > >> .ndo_release is called. > >> > >> Either device_link_add() will fail, or we will eat memory each time > >> the device is closed and re-opened. > > > > Below is what I find in the kernel doc of device_link_add(). > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felix > > > ir.bootlin.com%2Flinux%2Fv6.15%2Fsource%2Fdrivers%2Fbase%2Fcore.c%23L7 > > > 11&data=05%7C02%7Cwei.fang%40nxp.com%7C078b59e45bd54f13cfea08dda > 49477a > > > 4%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638847679768356 > 722%7CUn > > > known%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCI > sIlAiOi > > > JXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=iFtK > o%2F > > NfwPVYbxLBtZM8Y15Aa9eomUNhPTRfnEb0L7c%3D&reserved=0 > > > > if a device link between the given @consumer and @supplier pair exists > > already when this function is called for them, the existing link will > > be returned regardless of its current type and status. > > > > Therefore, it will not create new link each time the netdev is re-opened. > > > @Wei, > > We were able to verify the suggestion what you gave us. > No crash is seen now. > Should we raise a patch or shall we wait until this is clarified and Russell has no > more further questions. ? > I'm okay, but I don't know if Russel has any other concerns. And if there are no more comments, you can raise a patch. :) > I am okay with anything. > > >> > >> If that is correct, then we're trading one problem for another. > >> > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted 2025-05-23 8:37 [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted Wei Fang 2025-05-23 13:18 ` Andrew Lunn 2025-05-23 15:19 ` Florian Fainelli @ 2025-05-28 0:50 ` patchwork-bot+netdevbpf 2 siblings, 0 replies; 13+ messages in thread From: patchwork-bot+netdevbpf @ 2025-05-28 0:50 UTC (permalink / raw) To: Wei Fang Cc: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni, f.fainelli, xiaolei.wang, netdev, linux-kernel, imx Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Fri, 23 May 2025 16:37:59 +0800 you wrote: > There is a potential crash issue when disabling and re-enabling the > network port. When disabling the network port, phy_detach() calls > device_link_del() to remove the device link, but it does not clear > phydev->devlink, so phydev->devlink is not a NULL pointer. Then the > network port is re-enabled, but if phy_attach_direct() fails before > calling device_link_add(), the code jumps to the "error" label and > calls phy_detach(). Since phydev->devlink retains the old value from > the previous attach/detach cycle, device_link_del() uses the old value, > which accesses a NULL pointer and causes a crash. The simplified crash > log is as follows. > > [...] Here is the summary with links: - [v2,net] net: phy: clear phydev->devlink when the link is deleted https://git.kernel.org/netdev/net/c/0795b05a59b1 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-06-06 5:26 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-23 8:37 [PATCH v2 net] net: phy: clear phydev->devlink when the link is deleted Wei Fang 2025-05-23 13:18 ` Andrew Lunn 2025-05-23 15:19 ` Florian Fainelli 2025-06-03 20:39 ` Abhishek Chauhan (ABC) 2025-06-03 21:13 ` Russell King (Oracle) 2025-06-04 6:00 ` Wei Fang 2025-06-04 6:09 ` Abhishek Chauhan (ABC) 2025-06-05 4:59 ` Sarosh Hasan 2025-06-04 8:07 ` Russell King (Oracle) 2025-06-04 12:08 ` Wei Fang 2025-06-06 0:52 ` Abhishek Chauhan (ABC) 2025-06-06 5:26 ` Wei Fang 2025-05-28 0:50 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).