From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Niklas Cassel <cassel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>,
Francesco Valla <francesco@valla.it>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Heiner Kallweit <hkallweit1@gmail.com>,
netdev@vger.kernel.org, Anand Moon <linux.amoon@gmail.com>
Subject: Re: [PATCH] net: phy: don't issue a module request if a driver is available
Date: Fri, 3 Jan 2025 14:03:43 +0000 [thread overview]
Message-ID: <Z3fuP47lueQpzMst@shell.armlinux.org.uk> (raw)
In-Reply-To: <Z3fc2jiJJDzbCHLu@ryzen>
On Fri, Jan 03, 2025 at 01:49:30PM +0100, Niklas Cassel wrote:
> On Fri, Jan 03, 2025 at 12:08:43PM +0000, Russell King (Oracle) wrote:
> > On Fri, Jan 03, 2025 at 12:25:52PM +0100, Niklas Cassel wrote:
> > > I'm trying to enable async probe for my PCIe controller (pcie-dw-rockchip),
> > > which on the radxa rock5b has a RTL8125 NIC connected to it.
> > >
> > > By enabling async probe for the PCIe driver I get the same splat as Francesco.
> > >
> > > Looking at the prints, it is trying to load a module for PHY ID: 0x1cc840
> > > This PHY ID is defined in: drivers/net/phy/realtek.c.
> > >
> > > Looking at my .config I have:
> > > CONFIG_REALTEK_PHY=y
> > >
> > > So this is not built as a module, so I am a bit surprised to see this
> > > splat (since the driver is built as built-in).
> > >
> > >
> > > I think it would be nice if the phylib core could be fixed so that
> > > it does not try to load modules for drivers which are built as built-in.
> > >
> > >
> > > Also see this old thread that tries to enable async probe by default on
> > > DT systems:
> > > https://lore.kernel.org/linux-kernel//d5796286-ec24-511a-5910-5673f8ea8b10@samsung.com/T/#u
> > >
> > > AFAICT, it seems that the phylib core is one of the biggest blockers from
> > > being able to enable async probe by default on DT systems.
> >
> > Yes, we accept that phylib is incompatible with async probing. I don't
> > think that's going to change, because it's fundamentally baked in with
> > the way the whole fallback driver stuff works.
> >
> > We *certainly* don't want to move the request_module() into
> > phy_attach*() (which is the point where we require the driver to be
> > bound or we fallback to the generic feature-reduced driver). First,
> > that *will* break SFP modules, no ifs or buts.
> >
> > Second, moving it there would mean calling request_module() in many
> > cases with the RTNL held, which blocks things like new connections
> > network establishing while the module is requested (I've run into this
> > problem when the TI Wilink driver locks up holding the RTNL lock making
> > the platform impossible to remotely resolve if there isn't an already
> > open SSH connection.) We certainly don't want userspace to be doing
> > stuff while holding the "big" RTNL that affects much of networking.
> >
> > Third, I suspect phylib already has a race between the PHY driver /
> > driver core binding the appropriate driver and phy_attach_direct()
> > attaching the fallback generic driver to the driverless PHY device,
> > and making this more "async" is going to open that race possibly to
> > the point where it becomes a problem. (At the moment, it doesn't
> > seem to cause any issue, so is theoretical right now - but if one
> > reads the code, it's obvious that there is no locking that prevents
> > a race there.)
> >
> > What saves phylib right now is that by issueing the request_module(),
> > that will wait for the module to be loaded and initialised. The
> > initialisation function will register the PHY drivers in this module.
> > As this is synchronous, it will happen before request_module() returns,
> > and thus before phy_device_create() returns. Thus, if there is a module
> > available for the PHY, it will be loaded and available to be bound
> > to the PHY device by the time phy_device_register() is called. This
> > ensures that - in the case of an auto-loaded module, the race will
> > never happen.
> >
> > Yes, it's weak. A scenario that could trigger this is loading PHY
> > driver modules in parallel with a call to the phy_attach*() functions,
> > e.g. when bringing up a network interface where the network driver
> > calls through to phy_attach*() from its .ndo_open() method. If we
> > simply make phylib's request_module() async, then this race will be
> > opened for auto-loaded modules as well.
> >
> > Closing this race to give consistent results is impossible, even if
> > we add locking. If phy_attach*() were to complete first, the generic
> > driver would be used despite the PHY specific driver module being
> > loaded. Alternatively, if the PHY specific driver module finishes
> > being loaded before phy_attach*() is called, then the PHY specific
> > driver will be used for the device. So... it needs to be synchronous.
> >
> > I also don't think "make a list of built-in drivers and omit the
> > request module" is an acceptable workaround - it's a sticky plaster
> > for the problem. If the PHY driver isn't built-in, then you have the
> > same problem with request_module() being issued. You could work around
> > that by ensuring that the PHY driver is built-in, but then we're
> > relying on multiple different things all being correct in diverse
> > areas, which is fragile.
>
> FWIW, the patch in $subject does make the splat go away for me.
> (I have the PHY driver built as built-in).
This is not about "making the splat go away". Making something go away
is not solving the problem if it's a fragile sticky plaster. Such things
come back to bite.
> The patch in $subject does "Add a list of registered drivers and check
> if one is already available before resorting to call request_module();
> in this way, if the PHY driver is already there, the MDIO bus can perform
> the async probe."
This is the sticky plaster. I covered that in my message to which you're
replying. See the last paragraph from my reply quoted above.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
next prev parent reply other threads:[~2025-01-03 14:03 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-01 23:51 [PATCH] net: phy: don't issue a module request if a driver is available Francesco Valla
2025-01-02 9:35 ` [EXTERNAL] " Suman Ghosh
2025-01-02 10:21 ` Francesco Valla
2025-01-03 7:47 ` Suman Ghosh
2025-01-02 11:06 ` Russell King (Oracle)
2025-01-02 13:26 ` Francesco Valla
2025-01-02 13:52 ` Andrew Lunn
2025-01-02 14:01 ` Francesco Valla
2025-01-02 14:21 ` Andrew Lunn
2025-01-03 18:00 ` Francesco Valla
2025-01-03 11:25 ` Niklas Cassel
2025-01-03 12:08 ` Russell King (Oracle)
2025-01-03 12:49 ` Niklas Cassel
2025-01-03 14:03 ` Russell King (Oracle) [this message]
2025-01-03 14:12 ` Andrew Lunn
2025-01-03 14:34 ` Niklas Cassel
2025-01-03 15:37 ` Andrew Lunn
2025-01-03 17:57 ` Francesco Valla
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z3fuP47lueQpzMst@shell.armlinux.org.uk \
--to=linux@armlinux.org.uk \
--cc=andrew+netdev@lunn.ch \
--cc=andrew@lunn.ch \
--cc=cassel@kernel.org \
--cc=francesco@valla.it \
--cc=hkallweit1@gmail.com \
--cc=linux.amoon@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.