netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Component API not right for DSA?
@ 2024-09-18 11:10 Vladimir Oltean
  2024-09-18 12:48 ` Andrew Lunn
  2024-09-18 13:47 ` Russell King (Oracle)
  0 siblings, 2 replies; 5+ messages in thread
From: Vladimir Oltean @ 2024-09-18 11:10 UTC (permalink / raw)
  To: Russell King, Andrew Lunn, Florian Fainelli; +Cc: Saravana Kannan, netdev

Hi,

DSA probing in a multi-switch tree is a bit of a hack job.
It relies on the device tree providing complete "routing" information
(the "link" property between one cascade port and the cascade ports of
all other switches reachable through it) to determine whether the tree
has completely probed or not.

A switch which probes but finds that it has a dst->rtable entry (struct
dsa_link) towards another switch which hasn't probed will just report
a successful early probe here:

static int dsa_tree_setup(struct dsa_switch_tree *dst)
{
	...
	complete = dsa_tree_setup_routing_table(dst);
	if (!complete)
		return 0;
	...
}

and will mostly do nothing, except for listing its ports (dsa_port_touch())
in dst->ports.

The dst->ports puzzle becomes complete piece by piece, and when the
final piece of the puzzle probes (last switch of the tree), it sees that
all its link_dp ports are present in dst->ports, so complete=true, and
the code flow actually proceeds further in dsa_tree_setup().

The dsa_tree_setup() runs in the context of the last switch to probe,
and calls ds->ops->setup() etc for all other switches in this tree.

I don't like this for the following reasons:

1. I would like the device tree binding to make the "link" property optional
   (full routing table). Only direct adjacency information is sufficient
   to compute the rest, plus it is much easier on the device tree writer
   (https://lore.kernel.org/netdev/20240913131507.2760966-3-vladimir.oltean@nxp.com/).
   But I would have to keep this workaround functional - otherwise a
   switch will not wait for the entire tree to probe, just for its
   directly connected neighbors. Thus, I would need to run BFS to
   construct a routing table for probe time (_before_ struct dsa_switch
   and struct dsa_port are allocated for the other switches in the tree),
   and then I also need the regular variant of the rtable, that
   dsa_routing_port() uses. Could be done, but not great.

2. I honestly don't think that the workaround to wait until the routing
   table is complete is in the best interest of DSA. The larger context
   here is that one can imagine DSA trees operating in a "degraded state"
   where not all switches are present. For example, if there is a chain
   of 3 switches and the last switch is missing, nothing prevents the
   first 2 from doing their normal job. There is actually a customer who
   wants to take down a switch for regular maintainance, while keeping
   the rest of the system operational. This is currently not possible
   with DSA, because the tree wants all switches to be present. As soon
   as one single switch unbinds, the entire tree is torn down. They are
   pretty serious about this request, not just "would be nice if".
   And of course, not any degraded state makes functional sense. For
   example, removing the top-most switch must result in all switches
   downstream of it also getting removed, because traffic from the CPU
   can no longer reach them. That's fine, and I plan to handle that
   using device links from one switch to its upstream.

For reason #1, I started prototyping an integration between DSA and the
component API - something which was also suggested by Saravana Kannan in
the past. The component master is the tree, and the components are the
switches. For each struct dsa_switch_tree, I instantiated a struct
platform_device and a struct component_master_ops. The tree is created
by the first switch that calls dsa_register_switch(). It has a function
that explores the device tree using BFS, starting from that first switch
that created it, "link" phandle by phandle, calling component_match_add_release()
for the OF node of each other reachable switch. Every other switch in
the tree no longer creates it, but finds it and bumps its refcount.
The tree is bound when all switches have called component_add() with
their struct component_ops.

This is all great, but then I realized that, for addressing issue #2,
it is no better than what we currently have. Namely, by default the tree
looks like this:

$ cat /sys/kernel/debug/device_component/dsa_tree.0.auto
aggregate_device name                                  status
-------------------------------------------------------------
dsa_tree.0.auto                                         bound

device name                                            status
-------------------------------------------------------------
d0032004.mdio-mii:11                                    bound
d0032004.mdio-mii:10                                    bound
d0032004.mdio-mii:12                                    bound

but after this operation:

$ echo d0032004.mdio-mii:11 > /sys/bus/mdio_bus/devices/d0032004.mdio-mii\:11/driver/unbind
$ cat /sys/kernel/debug/device_component/dsa_tree.0.auto
aggregate_device name                                  status
-------------------------------------------------------------
dsa_tree.0.auto                                     not bound

device name                                            status
-------------------------------------------------------------
(unknown)                                      not registered
d0032004.mdio-mii:10                                not bound
d0032004.mdio-mii:12                                not bound

the tree (component master) is unbound, its unbind() method calls
component_unbind_all(), and this also unbinds the other switches.

The idea that a "degraded state" doesn't make sense for the component
master seems pretty fundamental to the entire component API. But I
figured I'd ask, before I put the idea of using this API to rest.

[ although I do like the debugfs device_component folder ]

Otherwise, my plan is to go ahead and introduce new dsa_switch_ops API
to let them know of dynamic updates to the routing table. This way, we
remove the baked assumption that all of it is available at ds->ops->setup()
time and never changes afterwards. There is nothing, functionally speaking,
that the component API can help me with, for this desired behavior.

Just thought I'd let everybody know of my current intention of making
changes to DSA, and gather some opinions/confirmation that I am on the
right track/alternative suggestions.

Thanks,
Vladimir

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Component API not right for DSA?
  2024-09-18 11:10 Component API not right for DSA? Vladimir Oltean
@ 2024-09-18 12:48 ` Andrew Lunn
  2024-09-18 12:59   ` Vladimir Oltean
  2024-09-18 13:47 ` Russell King (Oracle)
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Lunn @ 2024-09-18 12:48 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Russell King, Florian Fainelli, Saravana Kannan, netdev

> 2. I honestly don't think that the workaround to wait until the routing
>    table is complete is in the best interest of DSA. The larger context
>    here is that one can imagine DSA trees operating in a "degraded state"
>    where not all switches are present. For example, if there is a chain
>    of 3 switches and the last switch is missing, nothing prevents the
>    first 2 from doing their normal job. There is actually a customer who
>    wants to take down a switch for regular maintainance, while keeping
>    the rest of the system operational.

Do you plan to use hotplug for this? The user interfaces disappear
when the switch is removed? The kernel will then try to clean up all
state for those interfaces, removing them from bridges and bonds etc?

It will be interesting to see what happens if something in userspace
is keeping a reference on the interfaces, so they cannot be destroyed,
and then the switch is probed again, and we have a name clash. I've
seen USB interfaces not fully disappear when i had a flaky USB hub
causing disconnects.

I wounder what configuration exists which is transparent to
Linux. Hotplugging interfaces won't deal with this.  The routing table
is one, it is a DSA concept. You will need to change the internal API,
be able to tell a switch the topology has changed, it needs to reload
its routing table. But i don't think that is hard.

	Andrew

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Component API not right for DSA?
  2024-09-18 12:48 ` Andrew Lunn
@ 2024-09-18 12:59   ` Vladimir Oltean
  0 siblings, 0 replies; 5+ messages in thread
From: Vladimir Oltean @ 2024-09-18 12:59 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Florian Fainelli, Saravana Kannan, netdev

On Wed, Sep 18, 2024 at 02:48:26PM +0200, Andrew Lunn wrote:
> > 2. I honestly don't think that the workaround to wait until the routing
> >    table is complete is in the best interest of DSA. The larger context
> >    here is that one can imagine DSA trees operating in a "degraded state"
> >    where not all switches are present. For example, if there is a chain
> >    of 3 switches and the last switch is missing, nothing prevents the
> >    first 2 from doing their normal job. There is actually a customer who
> >    wants to take down a switch for regular maintainance, while keeping
> >    the rest of the system operational.
> 
> Do you plan to use hotplug for this? The user interfaces disappear
> when the switch is removed? The kernel will then try to clean up all
> state for those interfaces, removing them from bridges and bonds etc?
> 
> It will be interesting to see what happens if something in userspace
> is keeping a reference on the interfaces, so they cannot be destroyed,
> and then the switch is probed again, and we have a name clash. I've
> seen USB interfaces not fully disappear when i had a flaky USB hub
> causing disconnects.
> 
> I wounder what configuration exists which is transparent to
> Linux. Hotplugging interfaces won't deal with this.  The routing table
> is one, it is a DSA concept. You will need to change the internal API,
> be able to tell a switch the topology has changed, it needs to reload
> its routing table. But i don't think that is hard.
> 
> 	Andrew

Nope, it won't be so complicated - removal of switches would be
initiated by user space (in turn, by the user). For this use case, it is
known ahead of time by a few seconds that a switch is going to disappear,
so regular driver unbinding should be fine. Later attempts to rebind the
driver to the device should fail as long as it is physically inaccessible.

Side note - there is another related request to take down a PHY of a
single port for maintainance, with the same simplifying assumption that
it is known ahead of time by a few seconds when this is needed, and with
the same requirement that the rest of the switch ports must go on. I was
planning to handle that in a similar way: user space puts the net_device
down, and it cannot be put back up until the PHY is accessible again.
I need to submit some more changes to DSA for that to work (phylink_of_phy_connect()
done at ndo_open() time), but that was much easier compared to adapting
the outcome of dsa_routing_port(), and thus drivers, to dynamic changes
to the rtable.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Component API not right for DSA?
  2024-09-18 11:10 Component API not right for DSA? Vladimir Oltean
  2024-09-18 12:48 ` Andrew Lunn
@ 2024-09-18 13:47 ` Russell King (Oracle)
  2024-09-18 14:25   ` Vladimir Oltean
  1 sibling, 1 reply; 5+ messages in thread
From: Russell King (Oracle) @ 2024-09-18 13:47 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Andrew Lunn, Florian Fainelli, Saravana Kannan, netdev

On Wed, Sep 18, 2024 at 02:10:08PM +0300, Vladimir Oltean wrote:
> This is all great, but then I realized that, for addressing issue #2,
> it is no better than what we currently have. Namely, by default the tree
> looks like this:
> 
...
> 
> but after this operation:
> 
> $ echo d0032004.mdio-mii:11 > /sys/bus/mdio_bus/devices/d0032004.mdio-mii\:11/driver/unbind
> $ cat /sys/kernel/debug/device_component/dsa_tree.0.auto
> aggregate_device name                                  status
> -------------------------------------------------------------
> dsa_tree.0.auto                                     not bound
> 
> device name                                            status
> -------------------------------------------------------------
> (unknown)                                      not registered
> d0032004.mdio-mii:10                                not bound
> d0032004.mdio-mii:12                                not bound
> 
> the tree (component master) is unbound, its unbind() method calls
> component_unbind_all(), and this also unbinds the other switches.

Correct. As author of the component helper... The component helper was
designed for an overall device that is made up of multiple component
devices that are themselves drivers, and _all_ need to be present in
order for the overall device to be functional. It is not intended to
address cases where an overall device has optional components.

The helper was originally written to address that problem for the
Freescale i.MX IPU, which had been sitting in staging for considerable
time, and was blocked from being moved out because of issues with this
that weren't solvable at the time (we didn't have device links back
then, which probably could've been used instead.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Component API not right for DSA?
  2024-09-18 13:47 ` Russell King (Oracle)
@ 2024-09-18 14:25   ` Vladimir Oltean
  0 siblings, 0 replies; 5+ messages in thread
From: Vladimir Oltean @ 2024-09-18 14:25 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Florian Fainelli, Saravana Kannan, netdev

On Wed, Sep 18, 2024 at 02:47:30PM +0100, Russell King (Oracle) wrote:
> On Wed, Sep 18, 2024 at 02:10:08PM +0300, Vladimir Oltean wrote:
> > This is all great, but then I realized that, for addressing issue #2,
> > it is no better than what we currently have. Namely, by default the tree
> > looks like this:
> > 
> ...
> > 
> > but after this operation:
> > 
> > $ echo d0032004.mdio-mii:11 > /sys/bus/mdio_bus/devices/d0032004.mdio-mii\:11/driver/unbind
> > $ cat /sys/kernel/debug/device_component/dsa_tree.0.auto
> > aggregate_device name                                  status
> > -------------------------------------------------------------
> > dsa_tree.0.auto                                     not bound
> > 
> > device name                                            status
> > -------------------------------------------------------------
> > (unknown)                                      not registered
> > d0032004.mdio-mii:10                                not bound
> > d0032004.mdio-mii:12                                not bound
> > 
> > the tree (component master) is unbound, its unbind() method calls
> > component_unbind_all(), and this also unbinds the other switches.
> 
> Correct. As author of the component helper... The component helper was
> designed for an overall device that is made up of multiple component
> devices that are themselves drivers, and _all_ need to be present in
> order for the overall device to be functional. It is not intended to
> address cases where an overall device has optional components.
> 
> The helper was originally written to address that problem for the
> Freescale i.MX IPU, which had been sitting in staging for considerable
> time, and was blocked from being moved out because of issues with this
> that weren't solvable at the time (we didn't have device links back
> then, which probably could've been used instead.)

Thanks for confirming. Although I could use the component helper in DSA
as it is now, it would ossify a limitation which I would like to remove.
The logic I would need is "aggregate device is bound as long as one
component is added, and gets notified of the addition/removal of all
other components". I guess it would be better suited to open-code this
logic in DSA.

As for the actual device_component debugfs folder that the component API
creates by default, I like the introspection into the switch tree state
that it offers when applied to DSA (it is useful especially if some
switches will go missing). Maybe I will keep the idea of having a
platform_device per dsa_switch_tree, and also create a debugfs with
similar information for it. If I also open-code this, I could also look
into adding a column with the last error code returned by each switch's
setup() function (I've lost count of how many times I had to debug
cross-chip probe issues). I will have to see.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-09-18 14:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-18 11:10 Component API not right for DSA? Vladimir Oltean
2024-09-18 12:48 ` Andrew Lunn
2024-09-18 12:59   ` Vladimir Oltean
2024-09-18 13:47 ` Russell King (Oracle)
2024-09-18 14:25   ` Vladimir Oltean

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).