From: Maxime Chevallier <maxime.chevallier@bootlin.com>
To: "Russell King (Oracle)" <linux@armlinux.org.uk>
Cc: Wei Fang <wei.fang@nxp.com>,
andrew@lunn.ch, hkallweit1@gmail.com, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
florian.fainelli@broadcom.com, xiaolei.wang@windriver.com,
quic_abchauha@quicinc.com, quic_sarohasa@quicinc.com,
imx@lists.linux.dev, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 net] net: phy: change devlink flag to AUTOREMOVE_SUPPLIER for non-SFP PHYs
Date: Mon, 2 Feb 2026 18:38:48 +0100
Message-ID: <0689a2ab-352a-4275-9662-0c45daf8a0b2@bootlin.com>
In-Reply-To: <aYCzxVTTZvok9_d1@shell.armlinux.org.uk>

On 02/02/2026 15:25, Russell King (Oracle) wrote:
> On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
>> Hi Wei,
>>
>> On 02/02/2026 06:45, Wei Fang wrote:
>>> For the shared MDIO bus use case, multiple MACs will share the same MDIO
>>> bus. Therefore, these MACs all depend on this MDIO bus. If this shared
>>> MDIO bus is removed, all the PHY devices attached to this MDIO bus will
>>> also be removed. Consequently, the MAC driver should not access the PHY
>>> device, otherwise, it will lead to some potential crashes. Because the
>>> corresponding phydev and the mii_bus have been freed, some pointers have
>>> become invalid.
>>>
>>> For example, Abhishek reported a crash that occurred when the MDIO
>>> bus driver was removed first, followed by the MAC driver. The crash log
>>> is shown below.
>>>
>>> Call trace:
>>> __list_del_entry_valid_or_report+0xa8/0xe0
>>> __device_link_del+0x40/0xf0
>>> device_link_put_kref+0xb4/0xc8
>>> device_link_del+0x38/0x58
>>> phy_detach+0x2c/0x170
>>> phy_disconnect+0x4c/0x70
>>> phylink_disconnect_phy+0x6c/0xc0 [phylink]
>>> stmmac_release+0x60/0x358 [stmmac]
>>>
>>> Another example is the i.MX95-15x15 platform, which has two ENETC ports.
>>> When all the external PHYs are managed by the EMDIO (the MDIO controller)
>>> and the enetc driver is removed after the EMDIO driver, users will see
>>> the crash log below and the console hangs.
>>>
>>> Call trace:
>>> _phy_state_machine+0x230/0x36c (P)
>>> phy_stop+0x74/0x190
>>> phylink_stop+0x28/0xb8
>>> enetc_close+0x28/0x8c
>>> __dev_close_many+0xb4/0x1d8
>>> netif_close_many+0x8c/0x13c
>>> enetc4_pf_remove+0x2c/0x84
>>> pci_device_remove+0x44/0xe8
>>>
>>> To address this issue, Sarosh Hasan tried to change the devlink flag to
>>> DL_FLAG_AUTOREMOVE_SUPPLIER [1], so that the MAC driver will be removed
>>> along with the PHY driver. However, the solution does not take into
>>> account the hot-swappable PHY devices (SFP PHYs), so when the PHY device
>>> is unplugged, the MAC driver will automatically be removed, which is not
>>> the expected behavior. This issue should not exist for SFP PHYs, so based
>>> on Sarosh's patch, the flag is changed to DL_FLAG_AUTOREMOVE_SUPPLIER
>>> for non-SFP PHYs.
>>>
>>> Reported-by: Abhishek Chauhan (ABC) <quic_abchauha@quicinc.com>
>>> Closes: https://lore.kernel.org/all/d696a426-40bb-4c1a-b42d-990fb690de5e@quicinc.com/
>>> Link: https://lore.kernel.org/imx/20250703090041.23137-1-quic_sarohasa@quicinc.com/ # [1]
>>> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
>>> Suggested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
>>> Signed-off-by: Wei Fang <wei.fang@nxp.com>
>>
>> I gave that patch a test, with the following cases:
>>
>> - On Macchiatobin (we have PHYs that share an mdiobus).
>> When unbinding a PHY, the MAC disappears as well:
>
> Correct, this is why these band-aids are harmful. One "device" can
> correspond with *multiple* network interfaces, and the loss of one
> PHY can have a *very* detrimental effect.
>
> Consider the case where root-NFS is being used, and removing a PHY
> on another interface takes out the interface that root-NFS is
> using. Your machine is now dead in the water.

That's what I've been seeing. I unbound one PHY and it took out 3 netdevs,
and I don't have any log explaining why. I guess there are devlink debug
knobs for that, but they don't seem to be enabled by default.

However, we seem to have the issue even without this patch.

On MCBin, if I unbind eth1 for example, all 3 interfaces that are on CP1
are gone:

  cd /sys/class/net/eth1/device/driver
  echo f4000000.ethernet > unbind

Only eth0 is left after that. This is on net-next/main :(

For Wei's case, where unbinding netdev 1 brings down the MDIO bus used
by the PHY on netdev 2, we'd be dead in the water no matter what,
wouldn't we?

> In my opinion, we should be concentrating more on the issue behind
> the oops.
>
> Given that this problem is because of the bus being removed, one
> thing that would help would be for the MDIO bus to be properly
> refcounted, and when the bus is unbound, to replace the bus ops
> with versions that return -ENXIO or similar under the MII bus
> lock. This would be easier if the MDIO bus ops were a separate struct
> to struct mii_bus.
>
> Similar with the PHY itself - if the PHY is in-use, it should be
> refcounted to stop the struct phy_device from going away, and
> should we have the situation where the PHY driver is unbound,
> phydev->drv should be set to a set of dummy ops (under the phydev
> mutex and probably rtnl.)
>
> It seems to me that throwing devlinks at this problem is giving us
> more problems than it's solving.
>
> A graceful way to handle a MAC losing its PHY is for phylib to
> indicate that the PHY has gone down, rather than removing the
> network interface (and potentially a whole host of other network
> interfaces in the case of one struct device being associated
> with many interfaces.)
>

Agreed, though I suspect that's quite the can of worms :(

Maxime
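
To make the "replace the bus ops with versions that return -ENXIO under
the MII bus lock" idea quoted above a bit more concrete, here is a rough
userspace model. It is only a sketch, not kernel code: every name in it
(model_mdio_ops, model_mii_bus, model_mdiobus_*, fake_hw) is made up for
illustration, a pthread mutex stands in for the MII bus lock, and the
refcounting half of the proposal is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical: bus accessors kept in their own struct so they can be swapped. */
struct model_mdio_ops {
	int (*read)(void *priv, int addr, int regnum);
	int (*write)(void *priv, int addr, int regnum, uint16_t val);
};

/* Hypothetical stand-in for struct mii_bus; not the kernel definition. */
struct model_mii_bus {
	pthread_mutex_t lock;             /* plays the role of the MII bus lock */
	const struct model_mdio_ops *ops; /* only read or changed under lock */
	void *priv;
};

/* Ops installed once the bus driver is gone: every access fails cleanly. */
static int dead_read(void *priv, int addr, int regnum)
{
	(void)priv; (void)addr; (void)regnum;
	return -ENXIO;
}

static int dead_write(void *priv, int addr, int regnum, uint16_t val)
{
	(void)priv; (void)addr; (void)regnum; (void)val;
	return -ENXIO;
}

static const struct model_mdio_ops dead_ops = { .read = dead_read, .write = dead_write };

/* Fake controller: 32 registers, PHY address ignored. */
struct fake_hw { uint16_t regs[32]; };

static int fake_read(void *priv, int addr, int regnum)
{
	struct fake_hw *hw = priv;
	(void)addr;
	return hw->regs[regnum & 31];
}

static int fake_write(void *priv, int addr, int regnum, uint16_t val)
{
	struct fake_hw *hw = priv;
	(void)addr;
	hw->regs[regnum & 31] = val;
	return 0;
}

static const struct model_mdio_ops fake_ops = { .read = fake_read, .write = fake_write };

/* All accesses go through the lock, so they see either the old or the new ops. */
static int model_mdiobus_read(struct model_mii_bus *bus, int addr, int regnum)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->ops->read(bus->priv, addr, regnum);
	pthread_mutex_unlock(&bus->lock);
	return ret;
}

static int model_mdiobus_write(struct model_mii_bus *bus, int addr, int regnum,
			       uint16_t val)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->ops->write(bus->priv, addr, regnum, val);
	pthread_mutex_unlock(&bus->lock);
	return ret;
}

/* Unbind: swap the ops under the lock so no caller races with teardown. */
static void model_mdiobus_unbind(struct model_mii_bus *bus)
{
	pthread_mutex_lock(&bus->lock);
	bus->ops = &dead_ops;
	bus->priv = NULL;
	pthread_mutex_unlock(&bus->lock);
}

/* Exercise the model: access works before unbind and fails cleanly after. */
static int demo_unbind_then_read(void)
{
	struct fake_hw hw = { .regs = { 0 } };
	struct model_mii_bus bus = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.ops = &fake_ops,
		.priv = &hw,
	};

	assert(model_mdiobus_write(&bus, 0, 1, 0x796d) == 0);
	assert(model_mdiobus_read(&bus, 0, 1) == 0x796d);

	model_mdiobus_unbind(&bus);
	return model_mdiobus_read(&bus, 0, 1);
}
```

In this model a MAC that still holds the bus after the unbind gets
-ENXIO back from demo_unbind_then_read() instead of dereferencing freed
memory. Keeping the bus object itself alive until the last user drops it
would still need the refcounting mentioned above.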