From: Tobias Waldekranz <tobias@waldekranz.com>
To: Nikolay Aleksandrov <nikolay@nvidia.com>,
davem@davemloft.net, kuba@kernel.org
Cc: Andrew Lunn <andrew@lunn.ch>,
Vivien Didelot <vivien.didelot@gmail.com>,
Florian Fainelli <f.fainelli@gmail.com>,
Vladimir Oltean <olteanv@gmail.com>,
Jiri Pirko <jiri@resnulli.us>, Ivan Vecera <ivecera@redhat.com>,
Roopa Prabhu <roopa@nvidia.com>,
Russell King <linux@armlinux.org.uk>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
bridge@lists.linux-foundation.org
Subject: Re: [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees
Date: Wed, 16 Feb 2022 16:56:04 +0100 [thread overview]
Message-ID: <87mtiqajqj.fsf@waldekranz.com> (raw)
In-Reply-To: <d59ee33c-79f9-2622-cec2-987a35f4ec1e@nvidia.com>
On Wed, Feb 16, 2022 at 17:28, Nikolay Aleksandrov <nikolay@nvidia.com> wrote:
> On 16/02/2022 15:29, Tobias Waldekranz wrote:
>> The bridge has had per-VLAN STP support for a while now, since:
>>
>> https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/
>>
>> The current implementation has some problems:
>>
>> - The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
>> is managed independently. This is awkward from an MSTP (802.1Q-2018,
>> Clause 13.5) point of view, where the model is that multiple VLANs
>> are grouped into MST instances.
>>
>> Because of the way that the standard is written, presumably, this is
>> also reflected in hardware implementations. It is not uncommon for a
>> switch to support the full 4k range of VIDs, but that the pool of
>> MST instances is much smaller. Some examples:
>>
>> Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
>> Marvell Prestera: 4k VLANs, but only 128 MSTIs
>> Microchip SparX-5i: 4k VLANs, but only 128 MSTIs
>>
>> - By default, the feature is enabled, and there is no way to disable
>> it. This makes it hard to add offloading in a backwards compatible
>> way, since any underlying switchdevs have no way to refuse the
>> function if the hardware does not support it
>>
>> - The port-global STP state has precedence over per-VLAN states. In
>> MSTP, as far as I understand it, all VLANs will use the common
>> spanning tree (CST) by default - through traffic engineering you can
>> then optimize your network to group subsets of VLANs to use
>> different trees (MSTI). To my understanding, the way this is
>> typically managed in silicon is roughly:
>>
>> Incoming packet:
>> .----.----.--------------.----.-------------
>> | DA | SA | 802.1Q VID=X | ET | Payload ...
>> '----'----'--------------'----'-------------
>> |
>> '->|\ .----------------------------.
>> | +--> | VID | Members | ... | MSTI |
>> PVID -->|/ |-----|---------|-----|------|
>> | 1 | 0001001 | ... | 0 |
>> | 2 | 0001010 | ... | 10 |
>> | 3 | 0001100 | ... | 10 |
>> '----------------------------'
>> |
>> .-----------------------------'
>> | .------------------------.
>> '->| MSTI | Fwding | Lrning |
>> |------|--------|--------|
>> | 0 | 111110 | 111110 |
>> | 10 | 110111 | 110111 |
>> '------------------------'
>>
>> What this is trying to show is that the STP state (whether MSTP is
>> used, or ye olde STP) is always accessed via the VLAN table. If STP
>> is running, all MSTI pointers in that table will reference the same
>> index in the STP stable - if MSTP is running, some VLANs may point
>> to other trees (like in this example).
>>
>> The fact that in the Linux bridge, the global state (think: index 0
>> in most hardware implementations) is supposed to override the
>> per-VLAN state, is very awkward to offload. In effect, this means
>> that when the global state changes to blocking, drivers will have to
>> iterate over all MSTIs in use, and alter them all to match. This
>> also means that you have to cache whether the hardware state is
>> currently tracking the global state or the per-VLAN state. In the
>> first case, you also have to cache the per-VLAN state so that you
>> can restore it if the global state transitions back to forwarding.
>>
>> This series adds support for an arbitrary M:N mapping of VIDs to
>> MSTIs, proposing one solution to the first issue. An example of an
>> offload implementation for mv88e6xxx is also provided. Offloading is
>> done on a best-effort basis, i.e. notifications of the relevant events
>> are generated, but there is no way for the user to see whether the
>> per-VLAN state has been offloaded or not. There is also no handling of
>> the relationship between the port-global state the the per-VLAN ditto.
>>
>> If I was king of net/bridge/*, I would make the following additional
>> changes:
>>
>> - By default, when a VLAN is created, assign it to MSTID 0, which
>> would mean that no per-VLAN state is used and that packets belonging
>> to this VLAN should be filtered according to the port-global state.
>>
>> This way, when a VLAN is configured to use a separate tree (setting
>> a non-zero MSTID), an underlying switchdev could oppose it if it is
>> not supported.
>>
>> Obviously, this adds an extra step for existing users of per-VLAN
>> STP states and would thus not be backwards compatible. Maybe this
>> means that that is impossible to do, maybe not.
>>
>> - Swap the precedence of the port-global and the per-VLAN state,
>> i.e. the port-global state only applies to packets belonging to
>> VLANs that does not make use of a per-VLAN state (MSTID != 0).
>>
>> This would make the offloading much more natural, as you avoid all
>> of the caching stuff described above.
>>
>> Again, this changes the behavior of the kernel so it is not
>> backwards compatible. I suspect that this is less of an issue
>> though, since my guess is that very few people rely on the old
>> behavior.
>>
>> Thoughts?
>>
>
> Interesting! Would adding a new (e.g. vlan_mst_enable) option which changes the behaviour
> as described help? It can require that there are no vlans present to change
> similar to the per-port vlan stats option.
Great idea, I did not know that that's how vlan stats worked. I will
definitely look into it, thanks!
> Also based on that option you can alter
> how the state checks are performed. For example, you can skip the initial port state
> check, then in br_vlan_allowed_ingress() you can use the port state if vlan filtering
> is disabled and mst enabled and you can avoid checking it altogether if filter && mst
> are enabled then always use the vlan mst state. Similar changes would have to happen
> for the egress path. Since we are talking about multiple tests the new MST logic can
> be hidden behind a static key for both br_handle_frame() and later stages.
Makes sense.
So should we keep the current per-VLAN state as-is then? And bolt the
MST on to the side? I.e. should `struct net_bridge_vlan` both have `u8
state` for the current implementation _and_ a `struct br_vlan_mst *`
that is populated for VLANs tied to a non-zero MSTI?
> This set needs to read a new cache line to fetch mst ptr for all packets in the vlan fast-path,
> that is definitely undesirable. Please either cache that state in the vlan and update it when
> something changes, or think of some way which avoids that cache line in fast-path.
> Alternative would be to make that cache line dependent on the new option, so it's needed
> only when mst feature is enabled.
If we go with the approach I suggested above, then the current `u8
state` on `struct net_bridge_vlan` could be that cache, right?
With the current implementation, it is set directly - in the new MST
mode all grouped VLANs would have their states updated when updating the
MSTI's state.
next prev parent reply other threads:[~2022-02-16 15:56 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-16 13:29 [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 1/9] net: bridge: vlan: Introduce multiple spanning trees (MST) Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 2/9] net: bridge: vlan: Allow multiple VLANs to be mapped to a single MST Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 3/9] net: bridge: vlan: Notify switchdev drivers of VLAN MST migrations Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 4/9] net: bridge: vlan: Notify switchdev drivers of MST state changes Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 5/9] net: dsa: Pass VLAN MST migration notifications to driver Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 6/9] net: dsa: Pass MST state changes " Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 7/9] net: dsa: mv88e6xxx: Disentangle STU from VTU Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 8/9] net: dsa: mv88e6xxx: Export STU as devlink region Tobias Waldekranz
2022-02-16 13:29 ` [RFC net-next 9/9] net: dsa: mv88e6xxx: MST Offloading Tobias Waldekranz
2022-02-16 13:39 ` Russell King (Oracle)
2022-02-16 15:28 ` [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees Nikolay Aleksandrov
2022-02-16 15:56 ` Tobias Waldekranz [this message]
2022-02-16 16:12 ` Nikolay Aleksandrov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mtiqajqj.fsf@waldekranz.com \
--to=tobias@waldekranz.com \
--cc=andrew@lunn.ch \
--cc=bridge@lists.linux-foundation.org \
--cc=davem@davemloft.net \
--cc=f.fainelli@gmail.com \
--cc=ivecera@redhat.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@armlinux.org.uk \
--cc=netdev@vger.kernel.org \
--cc=nikolay@nvidia.com \
--cc=olteanv@gmail.com \
--cc=roopa@nvidia.com \
--cc=vivien.didelot@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).