From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Niklas Cassel <cassel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>,
Francesco Valla <francesco@valla.it>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Heiner Kallweit <hkallweit1@gmail.com>,
netdev@vger.kernel.org, Anand Moon <linux.amoon@gmail.com>
Subject: Re: [PATCH] net: phy: don't issue a module request if a driver is available
Date: Fri, 3 Jan 2025 14:03:43 +0000 [thread overview]
Message-ID: <Z3fuP47lueQpzMst@shell.armlinux.org.uk> (raw)
In-Reply-To: <Z3fc2jiJJDzbCHLu@ryzen>
On Fri, Jan 03, 2025 at 01:49:30PM +0100, Niklas Cassel wrote:
> On Fri, Jan 03, 2025 at 12:08:43PM +0000, Russell King (Oracle) wrote:
> > On Fri, Jan 03, 2025 at 12:25:52PM +0100, Niklas Cassel wrote:
> > > I'm trying to enable async probe for my PCIe controller (pcie-dw-rockchip),
> > > which on the radxa rock5b has a RTL8125 NIC connected to it.
> > >
> > > By enabling async probe for the PCIe driver I get the same splat as Francesco.
> > >
> > > Looking at the prints, it is trying to load a module for PHY ID: 0x1cc840
> > > This PHY ID is defined in: drivers/net/phy/realtek.c.
> > >
> > > Looking at my .config I have:
> > > CONFIG_REALTEK_PHY=y
> > >
> > > So this is not built as a module, so I am a bit surprised to see this
> > > splat (since the driver is built as built-in).
> > >
> > >
> > > I think it would be nice if the phylib core could be fixed so that
> > > it does not try to load modules for drivers which are built as built-in.
> > >
> > >
> > > Also see this old thread that tries to enable async probe by default on
> > > DT systems:
> > > https://lore.kernel.org/linux-kernel//d5796286-ec24-511a-5910-5673f8ea8b10@samsung.com/T/#u
> > >
> > > AFAICT, it seems that the phylib core is one of the biggest blockers from
> > > being able to enable async probe by default on DT systems.
> >
> > Yes, we accept that phylib is incompatible with async probing. I don't
> > think that's going to change, because it's fundamentally baked in with
> > the way the whole fallback driver stuff works.
> >
> > We *certainly* don't want to move the request_module() into
> > phy_attach*() (which is the point where we require the driver to be
> > bound or we fallback to the generic feature-reduced driver). First,
> > that *will* break SFP modules, no ifs or buts.
> >
> > Second, moving it there would mean calling request_module() in many
> > cases with the RTNL held, which blocks things like new connections
> > network establishing while the module is requested (I've run into this
> > problem when the TI Wilink driver locks up holding the RTNL lock making
> > the platform impossible to remotely resolve if there isn't an already
> > open SSH connection.) We certainly don't want userspace to be doing
> > stuff while holding the "big" RTNL that affects much of networking.
> >
> > Third, I suspect phylib already has a race between the PHY driver /
> > driver core binding the appropriate driver and phy_attach_direct()
> > attaching the fallback generic driver to the driverless PHY device,
> > and making this more "async" is going to open that race possibly to
> > the point where it becomes a problem. (At the moment, it doesn't
> > seem to cause any issue, so is theoretical right now - but if one
> > reads the code, it's obvious that there is no locking that prevents
> > a race there.)
> >
> > What saves phylib right now is that by issueing the request_module(),
> > that will wait for the module to be loaded and initialised. The
> > initialisation function will register the PHY drivers in this module.
> > As this is synchronous, it will happen before request_module() returns,
> > and thus before phy_device_create() returns. Thus, if there is a module
> > available for the PHY, it will be loaded and available to be bound
> > to the PHY device by the time phy_device_register() is called. This
> > ensures that - in the case of an auto-loaded module, the race will
> > never happen.
> >
> > Yes, it's weak. A scenario that could trigger this is loading PHY
> > driver modules in parallel with a call to the phy_attach*() functions,
> > e.g. when bringing up a network interface where the network driver
> > calls through to phy_attach*() from its .ndo_open() method. If we
> > simply make phylib's request_module() async, then this race will be
> > opened for auto-loaded modules as well.
> >
> > Closing this race to give consistent results is impossible, even if
> > we add locking. If phy_attach*() were to complete first, the generic
> > driver would be used despite the PHY specific driver module being
> > loaded. Alternatively, if the PHY specific driver module finishes
> > being loaded before phy_attach*() is called, then the PHY specific
> > driver will be used for the device. So... it needs to be synchronous.
> >
> > I also don't think "make a list of built-in drivers and omit the
> > request module" is an acceptable workaround - it's a sticky plaster
> > for the problem. If the PHY driver isn't built-in, then you have the
> > same problem with request_module() being issued. You could work around
> > that by ensuring that the PHY driver is built-in, but then we're
> > relying on multiple different things all being correct in diverse
> > areas, which is fragile.
>
> FWIW, the patch in $subject does make the splat go away for me.
> (I have the PHY driver built as built-in).
This is not about "making the splat go away". Making something go away
is not solving the problem if it's a fragile sticky plaster. Such things
come back to bite.
> The patch in $subject does "Add a list of registered drivers and check
> if one is already available before resorting to call request_module();
> in this way, if the PHY driver is already there, the MDIO bus can perform
> the async probe."
This is the sticky plaster. I covered that in my message to which you're
replying. See the last paragraph from my reply quoted above.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
next prev parent reply other threads:[~2025-01-03 14:03 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-01 23:51 [PATCH] net: phy: don't issue a module request if a driver is available Francesco Valla
2025-01-02 9:35 ` [EXTERNAL] " Suman Ghosh
2025-01-02 10:21 ` Francesco Valla
2025-01-03 7:47 ` Suman Ghosh
2025-01-02 11:06 ` Russell King (Oracle)
2025-01-02 13:26 ` Francesco Valla
2025-01-02 13:52 ` Andrew Lunn
2025-01-02 14:01 ` Francesco Valla
2025-01-02 14:21 ` Andrew Lunn
2025-01-03 18:00 ` Francesco Valla
2025-01-03 11:25 ` Niklas Cassel
2025-01-03 12:08 ` Russell King (Oracle)
2025-01-03 12:49 ` Niklas Cassel
2025-01-03 14:03 ` Russell King (Oracle) [this message]
2025-01-03 14:12 ` Andrew Lunn
2025-01-03 14:34 ` Niklas Cassel
2025-01-03 15:37 ` Andrew Lunn
2025-01-03 17:57 ` Francesco Valla
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z3fuP47lueQpzMst@shell.armlinux.org.uk \
--to=linux@armlinux.org.uk \
--cc=andrew+netdev@lunn.ch \
--cc=andrew@lunn.ch \
--cc=cassel@kernel.org \
--cc=francesco@valla.it \
--cc=hkallweit1@gmail.com \
--cc=linux.amoon@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).