From: Oleksij Rempel <o.rempel@pengutronix.de>
To: Maxime Chevallier <maxime.chevallier@bootlin.com>
Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
Antoine Tenart <atenart@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Heiner Kallweit <hkallweit1@gmail.com>,
Florian Fainelli <f.fainelli@gmail.com>,
Vivien Didelot <vivien.didelot@gmail.com>,
Andrew Lunn <andrew@lunn.ch>,
Russell King - ARM Linux admin <linux@armlinux.org.uk>,
Tobias Waldekranz <tobias@waldekranz.com>,
Jakub Kicinski <kuba@kernel.org>
Subject: Re: Multi-PHYs and multiple-ports bonding support
Date: Mon, 17 Oct 2022 12:03:18 +0200
Message-ID: <20221017100318.GH9703@pengutronix.de>
In-Reply-To: <20221017105100.0cb33490@pc-8.home>
Hi Maxime,
On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier wrote:
> Hello everyone,
>
> I'm reaching out to discuss a PHY topic that we would like to see
> upstreamed, to support multiple ports attached to a MAC.
>
> The end-goal is to achieve some redundancy in case of a physical link
> interruption, in a transparent manner, but using only one network
> interface (1 MAC).
>
> We've been made aware that some products in the wild propose this
> feature-set, using 2 PHYs connected to the same MAC, using some custom
> logic to switch back and forth between the 2 PHYs, and that's the main
> use-case we'd like to see supported :
>
>                        +-------+
>              /-------- |  PHY  | --- BaseT port
> +-------+    |         +-------+
> |  MAC  |-- RGMII --|
> +-------+    |         +-------+
>              \-------- |  PHY  | --- BaseT port
>                        +-------+
>
I can add more cases:
- case 1:
Similar HW can be found in combination with AX88772B:
https://cms.nacsemi.com/content/AuthDatasheets/ASIXS00048-1.pdf
Page 6
The current ASIX driver only takes care of powering down the internal PHY
if an external one is present:
https://elixir.bootlin.com/linux/latest/source/drivers/net/usb/asix_devices.c#L659
But I can imagine someone wanting to implement hot switching between the
internal PHY and an external PHY or a direct RMII connection too.
- case 2:
One of our $CUSTOMERs has a system where the RGMII from the MAC is routed
via an analog multiplexer either to a PHY or to an optional external
board, where the RGMII connects to the host port of a switch chip
supported by DSA.
> This configuration comes with quite a lot of challenges since we bend
> the existing standards in numerous ways :
>
> - We have 2 PHYs on the same xMII bus, and they can't be active on that
> bus at the same time. To solve that, we have 2 strategies:
>
> - Put the PHY in isolate mode when not in use, they can perform link
> detection and reporting, but won't communicate on the MII bus.
> This can have side effects if both links are connected to the same
> network, which can be addressed through the use of gratuitous ARPs
> to make sure the right link gets known by the spanning-tree.
Can we "announce" a topology change/reset by toggling the link state?
Usually, switches should drop the forwarding entries for a port that is in
the down state. But the problem gets complicated if there are multiple
bridges... :/
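For reference, the isolate strategy quoted above comes down to one bit in
the Clause 22 control register (BMCR, register 0, isolate is bit 10, the
same value linux/mii.h uses). Below is a minimal sketch of the bit handling
only. The helper names are made up for illustration, and a real driver
would go through the PHY library's MDIO accessors rather than work on raw
register copies:

```c
#include <stddef.h>
#include <stdint.h>

/* Clause 22 BMCR isolate bit, same value as BMCR_ISOLATE in linux/mii.h. */
#define BMCR_ISOLATE 0x0400

/* Hypothetical helper: return the BMCR value with isolate set or cleared. */
static uint16_t bmcr_set_isolate(uint16_t bmcr, int isolate)
{
	return (uint16_t)(isolate ? (bmcr | BMCR_ISOLATE)
				  : (bmcr & ~BMCR_ISOLATE));
}

/*
 * Hypothetical helper: given cached BMCR values of all PHYs sharing one
 * xMII bus, isolate every PHY except the one at index 'active'.
 */
static void select_phy(uint16_t bmcr[], size_t n, size_t active)
{
	for (size_t i = 0; i < n; i++)
		bmcr[i] = bmcr_set_isolate(bmcr[i], i != active);
}
```

The isolated PHY keeps its other BMCR bits (autoneg, speed, duplex), so it
can still negotiate and report link while staying off the bus.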
> - Put PHY down entirely when not in use, select an active PHY, and
> when the link goes down on that PHY, switch to the other. This was
> used on products that had PHYs where the isolate mode is broken.
This is probably the better way to go. I assume that in the use cases where
this kind of redundancy is used, it is preferable to reduce weight, cost
and power consumption.
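To make the power-down variant a bit more concrete, here is a rough sketch
of the switching policy as a plain C model. Everything in it (struct
sim_phy, failover_step()) is hypothetical and only simulates the decision,
it is not a kernel API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one PHY behind the shared MAC. */
struct sim_phy {
	bool powered;	/* false: PHY is powered down */
	bool link_up;	/* last link state reported while powered */
};

/*
 * Keep exactly one PHY powered. If the active PHY still has link, keep it.
 * Otherwise power it down, power up the next one and return its index.
 */
static size_t failover_step(struct sim_phy *phys, size_t n, size_t active)
{
	size_t next;

	if (phys[active].powered && phys[active].link_up)
		return active;

	next = (active + 1) % n;
	phys[active].powered = false;
	phys[active].link_up = false;	/* a powered-down PHY reports no link */
	phys[next].powered = true;	/* link state is known only once it is up */
	return next;
}
```

One consequence visible in the model: the standby PHY is powered down, so
its link state is unknown until after the switch. The switch is therefore
"blind", unlike in the isolate variant.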
> Upstream, we have one device that does something a bit similar, which is
> the macchiatobin, using the 88x3310 PHY. This PHY exports both an SFP
> interface as well as a copper BaseT interface. These 2 interfaces are
> connected to the same MAC and are mutually exclusive.
>
> It looks like this :
>
> +-------+              +---------+   |---- Copper BaseT
> |  MAC  | -- xxxMII -- |   PHY   |---|
> +-------+              +---------+   |---- SFP
>
> We don't have any way to control which port gets used, the first that
> has the link gets the link.
>
> Ideally we would like to be able to configure every aspect of these
> 2 cases, like :
> - Which link do we use
> - Do we switch automatically from one to the other
> - What are the links available
>
> I see 4 different aspects of this that would need to be added for this
> whole mechanism to work :
>
> 1) DT representation
>
> To support that, we would need a way to give knowledge to the kernel
> about the number of physical ports that are connected to a given MAC.
> In the dual-phy mode, it's pretty straightforward, since we would
> "just" need to pass multiple phy handles to the mac node. In the MCBin
> case, it's a bit more complex, since we don't have a clear view on the
> number of ports connected to a given phy.
>
> The assumption is that we have only one port per phy, and its nature is
> derived from the presence of an sfp=<> phandle in the DT, plus the
> driver itself specifying the phydev->port field (which to my knowledge
> isn't used that much ?)
>
> The subject of describing the ports a PHY exposes in a sensible way that
> doesn't require changing all DTs out-there has been discussed in the
> past here :
> https://lore.kernel.org/netdev/20201119152246.085514e1@bootlin.com/
>
> If we only focus on the dual-phy use-case - and not the single-phy
> dual-port - we might not have to deal with extensive DT changes at all.
>
> 2) Changes in Phylink
>
> This might be the tricky part, as we need to track several ports,
> possibly connected to different PHYs, to get their state. For now, I
> haven't prototyped any of this yet.
>
> The goal would be to allow either automatic switching, as is already
> done by the 3310 driver, but at a higher level. Phylink might not be the
> right place to do that, so maybe we just want to expose an API to get
> the possible ports on a given interface, their respective state, and a
> way to select one.
>
> My idea would be to introduce a notion of a struct phy_port, that would
> describe a physical port. They would be controlled by a PHY (or a MAC,
> if the mac outputs 1000BaseX for example), one phy can
> possibly control multiple ports.
>
> The whole link redundancy would then be done manipulating ports, giving
> a layer of abstraction on the hardware topology itself.
>
> We would therefore abstract the logic by having :
>                        +--------+
>                    /---|  Port  |
> +-------------+    |   +--------+
> |  netdevice  | ---|
> +-------------+    |
>                    |   +--------+
>                    \---|  Port  |
>                        +--------+
>
> This is the representation the userspace would know about, without
> necessarily having to worry about the phys in between.
>
> I don't see that as a breaking change, since as of today, most systems
> only have one port per netdevice. We would need to add a way to deal
> with multiple ports per netdevice.
>
> 3) Adding a L2 bonding driver
>
> If the link switching logic is deported outside of phylink, we might
> want a generic way of bonding ports on an interface, configuring the
> policy to use for the switching (automatic, manual selection, maybe
> more like trying to elect the link with the highest speed ?). This is
> where we would handle sending the gratuitous ARPs upon link switching
> too.
>
> 4) UAPI
>
> From userspace, we would need ways to list the ports, their state, and
> possibly to configure the bonding parameters. For now, in ethtool, we
> don't have the notion of port at all, we just have 1 netdevice == 1
> port. Should we therefore create one netdevice per port ? or stick to
> that one interface and refer to its ports with some ethtool parameters ?
>
> All of these are open questions, as this topic spans quite a lot of
> aspects in the stack. Any input, idea, comment, are very very welcome.
What about this use case:
MAC with > 1 PHYs. One PHY is active, and you want to do cable testing
and/or check the signal quality with SQI. Both are currently triggered
via ethtool on an interface.
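As a sketch of what this would need: a per-interface request (cable test,
SQI) has to be resolved to one specific port, with a fallback that keeps
today's behaviour. The names below (struct sim_port, resolve_port()) are
hypothetical and only loosely follow the struct phy_port idea from your
mail. This is not an existing ethtool interface:

```c
#include <stddef.h>

/* Hypothetical per-port record, loosely modelled on the phy_port idea. */
struct sim_port {
	int phy_addr;	/* MDIO address of the PHY driving this port */
	int active;	/* non-zero if this port currently carries traffic */
};

/*
 * Resolve which port a per-interface request should hit.
 * port_index < 0 means "not specified" and falls back to the active port,
 * i.e. the current 1 netdevice == 1 port behaviour.
 * Returns NULL if the index is out of range or no port is active.
 */
static const struct sim_port *resolve_port(const struct sim_port *ports,
					   size_t n, int port_index)
{
	size_t i;

	if (port_index >= 0)
		return (size_t)port_index < n ? &ports[port_index] : NULL;

	for (i = 0; i < n; i++)
		if (ports[i].active)
			return &ports[i];

	return NULL;
}
```

With an explicit index, a cable test could also be run on the standby PHY
without disturbing the active one.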
Regards,
Oleksij
--
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |