From: Tobias Waldekranz <tobias@waldekranz.com>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>, Marek Behun <marek.behun@nic.cz>,
	vivien.didelot@gmail.com, f.fainelli@gmail.com,
	netdev@vger.kernel.org
Subject: Re: [RFC PATCH 0/4] net: dsa: link aggregation support
Date: Tue, 27 Oct 2020 21:53:45 +0100
Message-ID: <878sbrv19i.fsf@waldekranz.com>
In-Reply-To: <20201027200220.3ai2lcyrxkvmd2f4@skbuf>

On Tue, Oct 27, 2020 at 22:02, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Oct 27, 2020 at 08:37:58PM +0100, Tobias Waldekranz wrote:
>> >> In order for this to work on transmit, we need to add forward offloading
>> >> to the bridge so that we can, for example, send one FORWARD from the CPU
>> >> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
>
> [...]
>
>> In a single-chip system I agree that it is not needed, the CPU can do
>> the load-balancing in software. But in order to have the hardware do
>> load-balancing on a switch-to-switch LAG, you need to send a FORWARD.
>> 
>> FROM_CPUs would just follow whatever is in the device mapping table. You
>> essentially have the inverse of the TO_CPU problem, but on Tx FROM_CPU
>> would make up 100% of traffic.
>
> Woah, hold on, could you explain in more detail for non-expert people
> like myself to understand.
>
> So FROM_CPU frames (what tag_edsa.c uses now in xmit) can encode a
> _single_ destination port in the frame header.

Correct.

> Whereas the FORWARD frames encode a _source_ port in the frame header.
> You inject FORWARD frames from the CPU port, and you just let the L2
> forwarding process select the adequate destination ports (or LAG, if
> any ports are under one) _automatically_. The reason why you do this, is
> because you want to take advantage of the switch's flooding abilities in
> order to replicate the packet into 4 packets. So you will avoid cloning
> that packet in the bridge in the first place.

Exactly so.
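
As a rough illustration of the difference in CPU port load, here is a toy
Python model (not kernel code). The class and function names are
hypothetical, and it assumes all four user ports sit in one broadcast
domain:

```python
# Toy model, not kernel code. All names here are hypothetical.

class ToySwitch:
    """One switch chip with a CPU port and some user ports."""

    def __init__(self, cpu_port, user_ports):
        self.cpu_port = cpu_port
        self.user_ports = list(user_ports)

    def rx_from_cpu(self, dst_port):
        # FROM_CPU: the tag names exactly one destination port.
        return [dst_port]

    def rx_forward_broadcast(self, src_port):
        # FORWARD: the tag names a source port. Assumption: every user
        # port is in the same broadcast domain, so the switch floods the
        # frame to all of them except the source.
        return [p for p in self.user_ports if p != src_port]


def broadcast_via_from_cpu(sw):
    """Return (frames injected by the CPU, egress ports reached)."""
    egress = []
    for port in sw.user_ports:          # one clone per recipient
        egress += sw.rx_from_cpu(port)
    return len(sw.user_ports), egress


def broadcast_via_forward(sw):
    """Return (frames injected by the CPU, egress ports reached)."""
    return 1, sw.rx_forward_broadcast(sw.cpu_port)
```

With swp1..4 as user ports, both helpers reach the same four egress
ports, but the first injects four frames and the second injects one.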

> But correct me if I'm wrong, sending a FORWARD frame from the CPU is a
> slippery slope, since you're never sure that the switch will perform the
> replication exactly as you intended to. The switch will replicate a
> FORWARD frame by looking up the FDB, and we don't even attempt in DSA to
> keep the FDB in sync between software and hardware. And that's why we
> send FROM_CPU frames in tag_edsa.c and not FORWARD frames.

I'm not sure if I agree that it's a slippery slope. The whole point of
the switchdev effort is to sync the switch with the bridge. We trust the
fabric to do all the steps you describe for _all_ other ports.

> What you are really looking for is hardware where the destination field
> for FROM_CPU packets is not a single port index, but a port mask.
>
> Right?

Sure, if that's available it's great. Chips from Marvell's Prestera line
can do this, and many others I'm sure. Alas, LinkStreet devices cannot,
and I still want the best performance I can get in that case.

> Also, this problem is completely orthogonal to LAG? Where does LAG even
> come into play here?

It matters if you set up switch-to-switch LAGs. FROM_CPU packets encode
the final device/port, and switches will forward those packets according
to their device mapping tables, which select a _single_ local port to
use to reach a remote device/port. So all FROM_CPU packets to a given
device/port will always travel through the same set of ports.

In the FORWARD case, you look up the destination in the FDB of each
device, find that it is located on the other side of a LAG, and the
hardware will perform load-balancing.
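
The two egress decisions can be sketched in a toy Python model (not
kernel code). The two-chip topology, the port numbers, the table
contents and the XOR hash are all assumptions made for illustration;
real hardware uses its own hash:

```python
# Toy model, not kernel code. Topology, names and hash are hypothetical.

DEVICE_MAP = {1: 9}          # remote device -> ONE local port
LAG_MEMBERS = [9, 10]        # local ports forming the LAG towards device 1
FDB = {"00:00:00:00:00:aa": "lag0"}   # station known to sit behind the LAG


def lag_hash(src_mac, dst_mac):
    # Assumption: XOR of the last address bytes stands in for the real hash.
    return int(src_mac[-2:], 16) ^ int(dst_mac[-2:], 16)


def egress_from_cpu(target_dev):
    # The tag already names the final device/port: no FDB lookup, no hashing.
    return DEVICE_MAP[target_dev]


def egress_forward(src_mac, dst_mac):
    # Normal L2 forwarding: FDB lookup, then a per-flow pick of a LAG member.
    if FDB.get(dst_mac) != "lag0":
        raise KeyError("destination not known behind the LAG in this toy model")
    return LAG_MEMBERS[lag_hash(src_mac, dst_mac) % len(LAG_MEMBERS)]
```

In this model every FROM_CPU frame towards device 1 leaves on port 9,
while FORWARD frames from different source addresses are spread over
ports 9 and 10.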

>> Other than that there are some things that, while strictly speaking
>> possible to do without FORWARDs, become much easier to deal with:
>> 
>> - Multicast routing. This is one case where performance _really_ suffers
>>   from having to skb_clone() to each recipient.
>> 
>> - Bridging between virtual interfaces and DSA ports. Typical example is
>>   an L2 VPN tunnel or one end of a veth pair. On FROM_CPUs, the switch
>>   can not perform SA learning, which means that once you bridge traffic
>>   from the VPN out to a DSA port, the return traffic will be classified
>>   as unknown unicast by the switch and be flooded everywhere.
>
> And how is this going to solve that problem? You mean that the switch
> learns only from FORWARD, but not from FROM_CPU?

Yes, so when you send the FORWARD the switch knows that the station is
located somewhere behind the CPU port. It does not know exactly where,
i.e. it has no knowledge of the VPN tunnel or anything. It just directs
it towards the CPU and the bridge's FDB will take care of the rest.
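
A toy Python model of the learning difference (not kernel code; the
class, its methods and the port names are hypothetical):

```python
# Toy model, not kernel code. All names are hypothetical.

class LearningSwitch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}                     # MAC -> port

    def ingress(self, port, src, dst):
        """Regular forwarding path, also used for FORWARD-tagged frames."""
        self.fdb[src] = port              # source address learning
        if dst in self.fdb:
            return {self.fdb[dst]}        # known unicast
        return self.ports - {port}        # unknown unicast: flood

    def inject_from_cpu(self, dst_port):
        """FROM_CPU bypasses forwarding: nothing is learned."""
        return {dst_port}
```

If the VPN station's frame is injected with inject_from_cpu(), a reply
arriving on swp1 finds no FDB entry and is flooded to every other port.
If the same frame enters through ingress("cpu", ...), the reply is sent
to the CPU port only.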

> Why don't you attempt to solve this more generically somehow? Your
> switch is not the only one that can't perform source address learning
> for injected traffic, there are tons more, some are not even DSA. We
> can't have everybody roll their own solution.

Who said anything about rolling my own solution? I'm going for a generic
solution where a netdev can announce to the bridge it is being added to
that it can offload forwarding of packets for all ports belonging to the
same switchdev device. Most probably modeled after how the macvlan
offloading stuff is done.

In the case of mv88e6xxx that would kill two birds with one stone -
great! In other cases you might have to have the DSA subsystem listen to
new neighbors appearing on the bridge and sync those to hardware or
something. Hopefully someone working with that kind of hardware can
solve that problem.
