public inbox for netdev@vger.kernel.org
From: Maxime Chevallier <maxime.chevallier@bootlin.com>
To: "Russell King (Oracle)" <linux@armlinux.org.uk>
Cc: Wei Fang <wei.fang@nxp.com>,
	andrew@lunn.ch, hkallweit1@gmail.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	florian.fainelli@broadcom.com, xiaolei.wang@windriver.com,
	quic_abchauha@quicinc.com, quic_sarohasa@quicinc.com,
	imx@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 net] net: phy: change devlink flag to AUTOREMOVE_SUPPLIER for non-SFP PHYs
Date: Mon, 2 Feb 2026 18:38:48 +0100
Message-ID: <0689a2ab-352a-4275-9662-0c45daf8a0b2@bootlin.com>
In-Reply-To: <aYCzxVTTZvok9_d1@shell.armlinux.org.uk>



On 02/02/2026 15:25, Russell King (Oracle) wrote:
> On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
>> Hi Wei,
>>
>> On 02/02/2026 06:45, Wei Fang wrote:
>>> For the shared MDIO bus use case, multiple MACs will share the same MDIO
>>> bus. Therefore, these MACs all depend on this MDIO bus. If this shared
>>> MDIO bus is removed, all the PHY devices attached to this MDIO bus will
>>> also be removed. Consequently, the MAC driver should not access the PHY
>>> device, otherwise, it will lead to some potential crashes. Because the
>>> corresponding phydev and the mii_bus have been freed, some pointers have
>>> become invalid.
>>>
>>> For example, Abhishek reported a crash that occurred when the MDIO
>>> bus driver was removed first, followed by the MAC driver. The crash log
>>> is as below.
>>>
>>> Call trace:
>>>  __list_del_entry_valid_or_report+0xa8/0xe0
>>>  __device_link_del+0x40/0xf0
>>>  device_link_put_kref+0xb4/0xc8
>>>  device_link_del+0x38/0x58
>>>  phy_detach+0x2c/0x170
>>>  phy_disconnect+0x4c/0x70
>>>  phylink_disconnect_phy+0x6c/0xc0 [phylink]
>>>  stmmac_release+0x60/0x358 [stmmac]
>>>
>>> Another example is the i.MX95-15x15 platform, which has two ENETC ports.
>>> When all the external PHYs are managed by the EMDIO (the MDIO controller)
>>> and the enetc driver is removed after the EMDIO driver, users will see
>>> the crash log below and the console hangs.
>>>
>>> Call trace:
>>>  _phy_state_machine+0x230/0x36c (P)
>>>  phy_stop+0x74/0x190
>>>  phylink_stop+0x28/0xb8
>>>  enetc_close+0x28/0x8c
>>>  __dev_close_many+0xb4/0x1d8
>>>  netif_close_many+0x8c/0x13c
>>>  enetc4_pf_remove+0x2c/0x84
>>>  pci_device_remove+0x44/0xe8
>>>
>>> To address this issue, Sarosh Hasan tried to change the devlink flag to
>>> DL_FLAG_AUTOREMOVE_SUPPLIER [1], so that the MAC driver will be removed
>>> along with the PHY driver. However, the solution does not take into
>>> account the hot-swappable PHY devices (SFP PHYs), so when the PHY device
>>> is unplugged, the MAC driver will automatically be removed, which is not
>>> the expected behavior. This issue should not exist for SFP PHYs, so based
>>> on Sarosh's patch, the flag is changed to DL_FLAG_AUTOREMOVE_SUPPLIER
>>> for non-SFP PHYs.
>>>
>>> Reported-by: Abhishek Chauhan (ABC) <quic_abchauha@quicinc.com>
>>> Closes: https://lore.kernel.org/all/d696a426-40bb-4c1a-b42d-990fb690de5e@quicinc.com/
>>> Link: https://lore.kernel.org/imx/20250703090041.23137-1-quic_sarohasa@quicinc.com/ # [1]
>>> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
>>> Suggested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
>>> Signed-off-by: Wei Fang <wei.fang@nxp.com>
>>
>> I gave that patch a test, with the following cases :
>>
>>  - On Macchiatobin (we have PHYs that share an mdiobus).
>> When unbinding a PHY, the MAC disappears as well :
> 
> Correct, this is why these band-aids are harmful. One "device" can
> correspond with *multiple* network interfaces, and the loss of one
> PHY can have a *very* detrimental effect.
> 
> Consider the case where root-NFS is being used, and removing a PHY
> on another interface takes out the interface that root-NFS is
> using. Your machine is now dead in the water.

That's what I've been seeing. I unbound one PHY and it took out 3 netdevs,
and I don't have any log explaining why. I guess there are devlink debug
knobs for that, but they don't seem to be enabled by default.

However, we seem to have the issue even without this patch.

On MCBin, if I unbind eth1 for example, all 3 interfaces that are on CP1
are gone :

cd /sys/class/net/eth1/device/driver
echo f4000000.ethernet > unbind

Only eth0 is now left. This is on net-next/main :(

For Wei's case, where unbinding netdev 1 brings down the MDIO bus used
by the PHY on netdev 2, we'd also be dead in the water no matter what,
no?

> In my opinion, we should be concentrating more on the issue behind
> the oops.
> 
> Given that this problem is because of the bus being removed, one
> thing that would help would be for the MDIO bus to be properly
> refcounted, and when the bus is unbound, to replace the bus ops
> with versions that return -ENXIO or similar under the MII bus
> lock. This would be easier if the MDIO bus ops were a separate struct
> from struct mii_bus.
> 
> Similar with the PHY itself - if the PHY is in-use, it should be
> refcounted to stop the struct phy_device from going away, and
> should we have the situation where the PHY driver is unbound,
> phydev->drv should be set to a set of dummy ops (under the phydev
> mutex and probably rtnl.)
> 
> It seems to me that throwing devlinks at this problem is giving us
> more problems than it's solving.
> 
> A graceful way to handle a MAC losing its PHY is for phylib to
> indicate that the PHY has gone down, rather than removing the
> network interface (and potentially a whole host of other network
> interfaces in the case of one struct device being associated
> with many interfaces.)
> 

Agreed, that's quite the can of worms though I suspect :(
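
To make the "replace the bus ops on unbind" idea quoted above a bit more
concrete, here is a rough userspace sketch. It is not kernel code: every
name in it (sketch_bus, sketch_bus_ops, live_ops, dead_ops, ...) is
hypothetical, a pthread mutex stands in for the MII bus lock, and the
refcounting part of the suggestion is left out. It only shows the ops
pointer living in a separate struct and being swapped under the lock, so
that later accesses fail with -ENXIO instead of touching freed state.

```c
/* Userspace sketch only. All names are hypothetical; this is not the
 * kernel's struct mii_bus API. Refcounting is deliberately omitted. */
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

struct sketch_bus;

/* Accessors kept in their own struct so they can be swapped as a unit. */
struct sketch_bus_ops {
	int (*read)(struct sketch_bus *bus, int addr, int regnum);
	int (*write)(struct sketch_bus *bus, int addr, int regnum, uint16_t val);
};

struct sketch_bus {
	pthread_mutex_t lock;             /* stands in for the MII bus lock */
	const struct sketch_bus_ops *ops; /* only dereferenced under lock */
	uint16_t regs[32][32];            /* fake register file: [addr][regnum] */
};

/* "Live" ops: talk to the (fake) hardware. */
static int live_read(struct sketch_bus *bus, int addr, int regnum)
{
	return bus->regs[addr & 31][regnum & 31];
}

static int live_write(struct sketch_bus *bus, int addr, int regnum, uint16_t val)
{
	bus->regs[addr & 31][regnum & 31] = val;
	return 0;
}

/* "Dead" ops: installed once the bus driver is unbound. */
static int dead_read(struct sketch_bus *bus, int addr, int regnum)
{
	(void)bus; (void)addr; (void)regnum;
	return -ENXIO;
}

static int dead_write(struct sketch_bus *bus, int addr, int regnum, uint16_t val)
{
	(void)bus; (void)addr; (void)regnum; (void)val;
	return -ENXIO;
}

static const struct sketch_bus_ops live_ops = { .read = live_read, .write = live_write };
static const struct sketch_bus_ops dead_ops = { .read = dead_read, .write = dead_write };

static void sketch_bus_init(struct sketch_bus *bus)
{
	memset(bus, 0, sizeof(*bus));
	pthread_mutex_init(&bus->lock, NULL);
	bus->ops = &live_ops;
}

/* Every access goes through the lock, so it sees either live or dead ops. */
static int sketch_bus_read(struct sketch_bus *bus, int addr, int regnum)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->ops->read(bus, addr, regnum);
	pthread_mutex_unlock(&bus->lock);
	return ret;
}

static int sketch_bus_write(struct sketch_bus *bus, int addr, int regnum, uint16_t val)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->ops->write(bus, addr, regnum, val);
	pthread_mutex_unlock(&bus->lock);
	return ret;
}

/* Unbind: the struct stays valid for remaining users, the hardware does not. */
static void sketch_bus_unbind(struct sketch_bus *bus)
{
	pthread_mutex_lock(&bus->lock);
	bus->ops = &dead_ops;
	pthread_mutex_unlock(&bus->lock);
}
```

In this sketch, a caller that still holds the bus after sketch_bus_unbind()
simply gets -ENXIO back from sketch_bus_read() and sketch_bus_write(). The
same shape would presumably apply to swapping phydev->drv for dummy ops,
but that part is not shown here.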

Maxime


