From: Sean Anderson <sean.anderson@linux.dev>
To: Daniel Golle <daniel@makrotopia.org>
Cc: Christian Marangi <ansuelsmth@gmail.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Rob Herring <robh@kernel.org>,
Krzysztof Kozlowski <krzk+dt@kernel.org>,
Conor Dooley <conor+dt@kernel.org>,
Lorenzo Bianconi <lorenzo@kernel.org>,
Heiner Kallweit <hkallweit1@gmail.com>,
Russell King <linux@armlinux.org.uk>,
Philipp Zabel <p.zabel@pengutronix.de>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
Bill Wendling <morbo@google.com>,
Justin Stitt <justinstitt@google.com>,
netdev@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-mediatek@lists.infradead.org, llvm@lists.linux.dev
Subject: Re: [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Date: Mon, 19 May 2025 19:28:34 -0400 [thread overview]
Message-ID: <7bb37bea-917c-4082-9eaa-063c4d97833b@linux.dev> (raw)
In-Reply-To: <cdfbbdca-001b-4ed5-92ff-40fd3a8e3341@linux.dev>
On 5/19/25 14:10, Sean Anderson wrote:
> On 5/13/25 23:00, Daniel Golle wrote:
>> On Tue, May 13, 2025 at 03:23:32PM -0400, Sean Anderson wrote:
>>> On 5/13/25 15:03, Daniel Golle wrote:
>>> > just instead of having many
>>> > more or less identical implementations of .mac_select_pcs, this
>>> > functionality is moved into phylink. As a nice side-effect that also
>>> > makes managing the life-cycle of the PCS more easy, so we won't need all
>>> > the wrappers for all the PCS OPs.
>>>
>>> I think the wrapper approach is very obviously correct. This way has me
>>> worried about exciting new concurrency bugs.
>>
>> You may not be surprised to read that this was also our starting point 2
>> months ago, I had implemented support for standalone PCS very similar to
>> the approach you have published now, using refcnt'ed instances and
>> locked wrapper functions for all OPs. My approach, like yours, was to
>> create a new subsystem for standalone PCS drivers which is orthogonal to
>> phylink and only requires very few very small changes to phylink itself.
>> It was a draft and not as complete and well-documented like your series
>> now, of course.
>>
>> I've then shared that implementation with Christian and some other
>> experienced OpenWrt developers and we concluded that having phylink handle
>> the PCS lifecycle and PCS selection would be the better and more elegant
>> approach for multiple reasons:
>> - The lifetime management of the wrapper instances becomes tricky:
>> We would either have to live with them being allocated by the
>> MAC-driver (imagine test-case doing unbind and then bind in a loop
>> for a while -- we would end up oom). Or we need some kind of garbage
>> collecting mechanism which frees the wrapper once refcnt is zero --
>> and as .select_pcs would 'get' the PCS (ie. bump refcnt) we'd need a
>> 'put' equivalent (eg. a .pcs_destroy() OP) in phylink.
>>
>> Russell repeatedly pointed me to the possibility of a PCS
>> "disappearing" (and potentially "reappearing" some time later), and
>> in this case it is unclear who would then ever call pcs_put(), or
>> even notify the Ethernet driver or phylink about the PCS now being
>> available (again). Using device_link_add(), like it is done in
>> pcs-rzn1-miic.c, prevents the worst (ie. use-after-free), but also
>> impacts all other netdevs exposed by the same Ethernet driver
>> instance, and has a few other rather ugly implications.
>
> SRCU neatly solves the lifetime management issues. The wrapper lives as
> long as anyone (provider or user) holds a reference. A PCS can disappear
> at any point and everything still works (although the link goes down).
> Device links are only an optimization; they cannot be relied on for
> correctness.
>
>> - phylink currently expects .mac_select_pcs to never fail. But we may
>> need a mechanism similar to probe deferral in case the PCS is not
>> yet available.
>
> Which is why you grab the PCS in probe. If you want to be more dynamic,
> you can do it in netdev open like is done for PHYs.
>
>> Your series partially solves this in patch 11/11 "of: property: Add
>> device link support for PCS", but also that still won't make the link
>> come back in case of a PCS showing up late to the party, eg. due to
>> constraints such as phy drivers (drivers/phy, not drivers/net/phy)
>> waiting for nvmem providers, or PCS instances "going away" and
>> "coming back" later.
>
> This all works correctly due to device links. The only case that doesn't
> work automatically is something like
>
> MAC built-in
> MDIO built-in
> PCS module
>
> where the PCS module gets loaded late. In that case you have to manually
> re-probe the MAC. I think the best way to address this would be to grab
> the PCS in netdev open so that the MAC can probe without the PCS.
>
>> - removal of a PCS instance (eg. via sysfs unbind) would still
>> require changes to phylink. there is no phylink function to
>> impair the link in this case, and using dev_close() is a bit ugly,
>> and also won't bring the link back up once the PCS (re-)appears.
>
> This works just fine. There are two cases:
>
> - If the PCS has an IRQ, we notify phylink and then it polls the PCS
> (see below).
> - If the PCS is polled, phylink will call pcs_get_state and see that the
> link is down.
>
> Either way, the link goes down. But bringing the link back up is pretty
> unusual anyway. Unlike PHYs (which theoretically can be on removable
> busses) PCSs are generally permanently attached to their MACs. The only
> removable scenario I can think of is if the PCS is on an FPGA and the
> MAC is not.
>
> So if the PCS goes away, the MAC is likely to follow shortly after
> (since the whole thing is on a removable bus). Or someone has manually
> removed the PCS, in which case I think it's reasonable to have them
> manually remove the MAC as well. If you really want to support this,
> then just grab the PCS in netdev open.
So I had a closer look at this and unfortunately it isn't as easy as
just grabbing the PCS in ndo_open. The problem is that we need to
know the supported interfaces before phylink_create. The interfaces are
validated and are visible to userspace as soon as the netdev is
registered. And we can't just defer phylink_create to ndo_open because a
lot of the ethtool ops are implemented with phylink. So this would
probably need something like phylink_update_supported_interfaces().
But TBH I don't think this use case is very relevant. As I said above,
it only affects FPGA reconfiguration and people manually unbinding
drivers. Either way I think they are savvy enough to reprobe the netdev.
--Sean
>> - phylink anyway is the only user of PCS drivers, and will very likely
>> always be. So why create another subsystem?
>
> To avoid adding overhead for the majority of PCSs where the PCS is built
> into the MAC and literally can't be removed. We only pay the price for
> dynamicism on the drivers where it matters.
>
>> All that being said I also see potential problems with Christians
>> current implementation as it doesn't prevent the Ethernet driver to
>> still store a pointer to struct phylink_pcs (returned eg. from
>> fwnode_pcs_get()).
>>
>> Hence I would like to see an even more tight integration with phylink,
>> in the sense that pointers to 'struct phylink_pcs' should never be
>> exposed to the MAC driver, as only in that way we can be sure that
>> phylink, and only phylink, is responsible for reacting to a PCS "going
>> away".
>
> OK, but then how does the MAC select the PCS? If there are multiple PCSs
> then ultimately someone has to configure a mux somewhere.
>
>> Ie. instead of fwnode_phylink_pcs_parse() handing pointers to struct
>> phylink_pcs to the Ethernet driver, so it can use it to populate struct
>> phylink_config available_pcs member, this should be the responsibility
>> of phylink alltogether, directly populating the list of available PCS in
>> phylink's private structure.
>>
>> Similarly, there should not be fwnode_pcs_get() but rather phylink
>> providing a function fwnode_phylink_pcs_register(phylink, fwnode) which
>> directly adds the PCS referenced to the internal list of available PCS.
>
> This is difficult to work with for existing drivers. Many of them have
> non-standard ways of looking up their PCS that they need to support for
> backwards-compatibility. And some of them create the PCS themselves
> (such as if they are PCI devices with internal MDIO busses). It's much
> easier for the MAC to create or look up the PCS itself and then hand it
> off to phylink.
>
>> I hope we can pick the best of all the suggested implementations, and
>> together come up with something even better.
>
> Sure. And I think we were starting from a clean slate then this would be
> the obvious way to do things. But we must support existing drivers and
> provide an upgrade path for them. This is why I favor an incremental
> approach.
>
> --Sean
next prev parent reply other threads:[~2025-05-19 23:28 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-11 20:12 [net-next PATCH v4 00/11] net: pcs: Introduce support for fwnode PCS Christian Marangi
2025-05-11 20:12 ` [net-next PATCH v4 01/11] net: phy: introduce phy_interface_copy helper Christian Marangi
2025-05-13 18:18 ` Sean Anderson
2025-05-11 20:12 ` [net-next PATCH v4 02/11] net: phylink: keep and use MAC supported_interfaces in phylink struct Christian Marangi
2025-05-11 20:12 ` [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling Christian Marangi
2025-05-13 18:18 ` Sean Anderson
2025-05-13 19:03 ` Daniel Golle
2025-05-13 19:23 ` Sean Anderson
2025-05-13 19:42 ` Sean Anderson
2025-05-14 3:00 ` Daniel Golle
2025-05-19 18:10 ` Sean Anderson
2025-05-19 23:28 ` Sean Anderson [this message]
2025-05-11 20:12 ` [net-next PATCH v4 04/11] net: phylink: add phylink_release_pcs() to externally release a PCS Christian Marangi
2025-05-11 20:12 ` [net-next PATCH v4 05/11] net: pcs: implement Firmware node support for PCS driver Christian Marangi
2025-05-13 18:40 ` Sean Anderson
2025-05-11 20:12 ` [net-next PATCH v4 06/11] net: phylink: support late PCS provider attach Christian Marangi
2025-05-11 20:12 ` [net-next PATCH v4 07/11] dt-bindings: net: ethernet-controller: permit to define multiple PCS Christian Marangi
2025-05-13 18:16 ` Sean Anderson
2025-05-14 21:32 ` Rob Herring
2025-05-11 20:12 ` [net-next PATCH v4 08/11] net: phylink: add .pcs_link_down PCS OP Christian Marangi
2025-05-13 18:44 ` Sean Anderson
2025-05-11 20:12 ` [net-next PATCH v4 09/11] dt-bindings: net: pcs: Document support for Airoha Ethernet PCS Christian Marangi
2025-05-13 18:25 ` Sean Anderson
2025-05-11 20:12 ` [net-next PATCH v4 10/11] net: pcs: airoha: add PCS driver for Airoha SoC Christian Marangi
2025-05-13 4:04 ` kernel test robot
2025-05-11 20:12 ` [net-next PATCH v4 11/11] net: airoha: add phylink support for GDM2/3/4 Christian Marangi
2025-05-12 9:21 ` Lorenzo Bianconi
2025-05-12 9:27 ` Christian Marangi
2025-05-12 12:53 ` Lorenzo Bianconi
2025-05-13 18:50 ` Sean Anderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7bb37bea-917c-4082-9eaa-063c4d97833b@linux.dev \
--to=sean.anderson@linux.dev \
--cc=andrew+netdev@lunn.ch \
--cc=ansuelsmth@gmail.com \
--cc=conor+dt@kernel.org \
--cc=daniel@makrotopia.org \
--cc=davem@davemloft.net \
--cc=devicetree@vger.kernel.org \
--cc=edumazet@google.com \
--cc=hkallweit1@gmail.com \
--cc=justinstitt@google.com \
--cc=krzk+dt@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux@armlinux.org.uk \
--cc=llvm@lists.linux.dev \
--cc=lorenzo@kernel.org \
--cc=morbo@google.com \
--cc=nathan@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=nick.desaulniers+lkml@gmail.com \
--cc=p.zabel@pengutronix.de \
--cc=pabeni@redhat.com \
--cc=robh@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox