From: Tobias Waldekranz <tobias@waldekranz.com>
To: Vladimir Oltean <vladimir.oltean@nxp.com>,
Grygorii Strashko <grygorii.strashko@ti.com>
Cc: "netdev\@vger.kernel.org" <netdev@vger.kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Andrew Lunn <andrew@lunn.ch>,
Florian Fainelli <f.fainelli@gmail.com>,
Vivien Didelot <vivien.didelot@gmail.com>,
Jiri Pirko <jiri@resnulli.us>, Ido Schimmel <idosch@idosch.org>,
Roopa Prabhu <roopa@nvidia.com>,
Nikolay Aleksandrov <nikolay@nvidia.com>,
Stephen Hemminger <stephen@networkplumber.org>,
"bridge\@lists.linux-foundation.org"
<bridge@lists.linux-foundation.org>,
Alexander Duyck <alexander.duyck@gmail.com>
Subject: Re: [RFC PATCH v2 net-next 04/10] net: bridge: switchdev: allow the data plane forwarding to be offloaded
Date: Mon, 12 Jul 2021 14:28:42 +0200
Message-ID: <87r1g37m2t.fsf@waldekranz.com>
In-Reply-To: <20210709140940.4ak5vvt5hxay3wus@skbuf>

On Fri, Jul 09, 2021 at 14:09, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> Hi Grygorii,
>
> On Fri, Jul 09, 2021 at 04:16:13PM +0300, Grygorii Strashko wrote:
>> On 03/07/2021 14:56, Vladimir Oltean wrote:
>> > From: Tobias Waldekranz <tobias@waldekranz.com>
>> >
>> > Allow switchdevs to forward frames from the CPU in accordance with the
>> > bridge configuration in the same way as is done between bridge
>> > ports. This means that the bridge will only send a single skb towards
>> > one of the ports under the switchdev's control, and expects the driver
>> > to deliver the packet to all eligible ports in its domain.
>> >
>> > Primarily this improves the performance of multicast flows with
>> > multiple subscribers, as it allows the hardware to perform the frame
>> > replication.
>> >
>> > The basic flow between the driver and the bridge is as follows:
>> >
>> > - The switchdev accepts the offload by returning a non-null pointer
>> > from .ndo_dfwd_add_station when the port is added to the bridge.
>> >
>> > - The bridge sends offloadable skbs to one of the ports under the
>> > switchdev's control using dev_queue_xmit_accel.
>> >
>> > - The switchdev notices the offload by checking for a non-NULL
>> > "sb_dev" in the core's call to .ndo_select_queue.
>>
>> Sry, I could be missing smth.
>>
>> Is there any possibility to just mark skb itself as "fwd_offload" (or smth), so driver can
>> just check it and decide what to do. Following you series:
>> - BR itself will send packet only once to one port if fwd offload possible and supported
>> - switchdev driver can check/negotiate BR_FWD_OFFLOAD flag
>>
>> In our case, TI CPSW can send directed packet (default now), by specifying port_id if DMA desc
>> or keep port_id == 0 which will allow HW to process packet internally, including MC duplication.
>>
>> Sry, again, but necessity to add 3 callbacks and manipulate with "virtual" queue to achieve
>> MC offload (seems like one of the primary goals) from BR itself looks a bit over-complicated :(
>
> After cutting my teeth myself with Tobias' patches, I tend to agree with
> the idea that the macvlan offload framework is not a great fit for the
> software bridge data plane TX offloading. Some reasons:

I agree. I was trying to find an API that would not require adding new
.ndos or other infrastructure. You can see in my original RFC cover that
this was something I wrestled with.

> - the sb_dev pointer is necessary for macvlan because you can have
> multiple macvlan uppers and you need to know which one this packet
> came from. Whereas in the case of a bridge, any given switchdev net
> device can have a single bridge upper. So a single bit per skb,
> possibly even skb->offload_fwd_mark, could be used to encode this bit
> of information: please look up your FDB for this packet and
> forward/replicate it accordingly.

In fact, in the version I was about to publish, I reused
skb->offload_fwd_mark to encode precisely this property. It works really
well. Maybe I should just publish it, even with the issues regarding
mv88e6xxx. Let me know if you want to take a look at it.

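
For illustration, the driver side of such a per-skb bit might look roughly
like the sketch below. It is a userspace mock, not kernel code:
`struct mock_skb`, `mock_xmit_dest_port()` and the convention that
destination port 0 means "let the hardware consult its FDB" are assumptions
made for the example (the latter borrowed from Grygorii's description of
CPSW above), not an existing API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the single bit of per-skb state discussed above. */
struct mock_skb {
	bool offload_fwd_mark;	/* set by the bridge: "forward this for me" */
};

/* Assumed convention: port 0 asks the hardware to do FDB lookup/replication. */
#define MOCK_PORT_HW_FORWARD	0

/*
 * Pick the destination to program into the TX descriptor.
 * @skb:     frame handed down by the stack
 * @port_id: the (non-zero) port whose netdev the frame was queued on
 */
static int mock_xmit_dest_port(const struct mock_skb *skb, int port_id)
{
	if (skb->offload_fwd_mark)
		return MOCK_PORT_HW_FORWARD;	/* hardware replicates as needed */

	return port_id;				/* classic directed transmit */
}
```

The point of the sketch is only that the decision needs no subordinate
device pointer: one bit on the skb is enough to choose between a directed
transmit and a hardware-forwarded one.
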
> - I am a bit on the fence about the "net: allow ndo_select_queue to go
> beyond dev->num_real_tx_queues" and "net: extract helpers for binding
> a subordinate device to TX queues" patches, they look like the wrong
> approach overall, just to shoehorn our use case into a framework that
> was not meant to cover it.

Yep.

> - most importantly: Ido asked about the possibility for a switchdev to
> accelerate the data plane for a bridge port that is a LAG upper. In the
> current design, where the bridge attempts to call the
> .ndo_dfwd_add_station method of the bond/team driver, this will not
> work. Traditionally, switchdev has migrated away from ndo's towards
> notifiers because of the ability for a switchdev to intercept the
> notifier emitted by the bridge for the bonding interface, and to treat
> it by itself. So, logically speaking, it would make more sense to
> introduce a new switchdev notifier for TX data plane offloading per
> port. Actually, now that I'm thinking even more about this, it would
> be great not only if we could migrate towards notifiers, but if the
> notification could be emitted by the switchdev driver itself, at

I added pass-through implementations of these .ndos to make it work on
top of LAGs, but a notifier is much cleaner.

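
To make the difference concrete, here is a rough userspace sketch of why a
notifier copes with LAG uppers where an .ndo on the bond/team device does
not: the switch driver sees the event for the LAG itself and can decide
whether all of its members are its own ports. Every name here
(`struct mock_dev`, `struct mock_tx_fwd_event`, `mock_switch_notifier()`)
is hypothetical, and the real switchdev notifier plumbing is considerably
more involved.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_LOWERS	4

/* Hypothetical, heavily simplified net device. */
struct mock_dev {
	const char *name;
	int switch_id;					/* 0: not a switch port */
	const struct mock_dev *lowers[MOCK_MAX_LOWERS];	/* LAG members, if any */
	size_t num_lowers;
};

/* Hypothetical event: "the bridge wants TX forwarding offload for @dev". */
struct mock_tx_fwd_event {
	const struct mock_dev *dev;
	bool handled;
};

/* True if @dev is a port of @switch_id, or a LAG made up only of such ports. */
static bool mock_dev_on_switch(const struct mock_dev *dev, int switch_id)
{
	size_t i;

	if (dev->num_lowers == 0)
		return dev->switch_id == switch_id;

	for (i = 0; i < dev->num_lowers; i++)
		if (!mock_dev_on_switch(dev->lowers[i], switch_id))
			return false;

	return true;
}

/*
 * Driver-side handler: the event names the bridge port (possibly a LAG),
 * so the driver can claim it without the bond/team driver being involved.
 */
static void mock_switch_notifier(struct mock_tx_fwd_event *ev, int switch_id)
{
	if (mock_dev_on_switch(ev->dev, switch_id))
		ev->handled = true;
}
```

With pass-through .ndos, the bond/team driver has to forward the request
to its lowers; in the sketch, the switch driver answers for the LAG
directly and leaves the event unhandled when a member is foreign.
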
> bridge join time. Once upon a time I had an RFC patch that changed all
> switchdev drivers to inform the bridge that they are capable of
> offloading the RX data plane:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210318231829.3892920-17-olteanv@gmail.com/

Really like this approach! It also opens up the possibility of disabling
it manually (something like `ethtool -K swp0 bridge-{rx, tx} off`). This
will allow you to run a DPI firewall on a specific port in a LAN, for
example.

> That patch was necessary because the bridge, when it sees a bridge
> port that is a LAG, and the LAG is on top of a switchdev, will assign
> the port hwdom based on the devlink switch ID of the switchdev. This
> is wrong because it assumes that the switchdev offloads the LAG, but
> in the vast majority of cases this is false, only a handful of
> switchdev drivers have LAG offload right now. So the expectation is
> that the bridge can do software forwarding between such LAG comprised
> of two switchdev interfaces, and a third (standalone) switchdev
> interface, but it doesn't do that, because to the bridge, all ports
> have the same hwdom.
> Now it seems common sense that I pick up this patch again and make the
> switchdev drivers give 2 pieces of information:
> (a) can I offload the RX data path
> (b) can I offload the TX data path
>
> I can try to draft another RFC with these changes.
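
As a rough illustration of the two pieces of information in (a) and (b),
the sketch below models a bridge that only places a port in a hardware
domain when the driver has declared RX forwarding offload for it. It is a
userspace mock under stated assumptions: `struct mock_offload_caps`,
`struct mock_bridge_port` and both helpers are hypothetical, and using the
switch ID directly as the hwdom is a simplification of what the bridge
really does.

```c
#include <stdbool.h>

/* Hypothetical capabilities a driver declares when a port joins the bridge. */
struct mock_offload_caps {
	bool rx_fwd;	/* (a) hardware forwards received frames itself */
	bool tx_fwd;	/* (b) hardware can replicate frames sent by the bridge */
};

struct mock_bridge_port {
	int switch_id;			/* switch found beneath this port, 0 if none */
	struct mock_offload_caps caps;
};

/* hwdom 0 means "no hardware domain": everything is forwarded in software. */
static int mock_port_hwdom(const struct mock_bridge_port *p)
{
	return p->caps.rx_fwd ? p->switch_id : 0;
}

/* Must the bridge forward in software from @src to @dst? */
static bool mock_needs_sw_forward(const struct mock_bridge_port *src,
				  const struct mock_bridge_port *dst)
{
	int hwdom = mock_port_hwdom(src);

	return hwdom == 0 || hwdom != mock_port_hwdom(dst);
}
```

Under this model, a LAG on top of a switch that does not offload LAGs
declares nothing and falls back to hwdom 0, so the bridge keeps forwarding
in software between it and a standalone port of the same switch.
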