From: Ido Schimmel <idosch@idosch.org>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
Florian Fainelli <f.fainelli@gmail.com>,
Vivien Didelot <vivien.didelot@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Jiri Pirko <jiri@resnulli.us>, Jakub Kicinski <kuba@kernel.org>,
Ivan Vecera <ivecera@redhat.com>, netdev <netdev@vger.kernel.org>,
Horatiu Vultur <horatiu.vultur@microchip.com>,
"Allan W. Nielsen" <allan.nielsen@microchip.com>,
Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
Roopa Prabhu <roopa@cumulusnetworks.com>
Subject: Re: [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding
Date: Mon, 25 May 2020 23:11:11 +0300 [thread overview]
Message-ID: <20200525201111.GB1449199@splinter> (raw)
In-Reply-To: <CA+h21hpktNFcpwxTXVFikyWfgHkFZDofZ8=qVqraxcUp_EwJqg@mail.gmail.com>
On Sun, May 24, 2020 at 07:13:46PM +0300, Vladimir Oltean wrote:
> Hi Ido,
>
> On Sun, 24 May 2020 at 17:26, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Fri, May 22, 2020 at 12:10:33AM +0300, Vladimir Oltean wrote:
> > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > >
> > > In cases where the bridge is offloaded by a switchdev, there are
> > > situations where we can optimize RX filtering towards the host. To be
> > > precise, the host only needs to do termination, which it can do by
> > > responding at the MAC addresses of the slave ports and of the bridge
> > > interface itself. But most notably, it doesn't need to do forwarding,
> > > so there is no need to see packets with unknown destination address.
> > >
> > > But there are, however, cases when a switchdev does need to flood to the
> > > CPU. Such an example is when the switchdev is bridged with a foreign
> > > interface, and since there is no offloaded datapath, packets need to
> > > pass through the CPU. Currently this is the only identified case, but it
> > > can be extended at any time.
> > >
> > > So far, switchdev implementers made driver-level assumptions, such as:
> > > this chip is never integrated in SoCs where it can be bridged with a
> > > foreign interface, so I'll just disable host flooding and save some CPU
> > > cycles. Or: I can never know what else can be bridged with this
> > > switchdev port, so I must leave host flooding enabled in any case.
> > >
> > > Let the bridge drive the host flooding decision, and pass it to
> > > switchdev via the same mechanism as the external flooding flags.
> > >
> > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > ---
> > > include/linux/if_bridge.h | 3 +++
> > > net/bridge/br_if.c | 40 +++++++++++++++++++++++++++++++++++++++
> > > net/bridge/br_switchdev.c | 4 +++-
> > > 3 files changed, 46 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > > index b3a8d3054af0..6891a432862d 100644
> > > --- a/include/linux/if_bridge.h
> > > +++ b/include/linux/if_bridge.h
> > > @@ -49,6 +49,9 @@ struct br_ip_list {
> > > #define BR_ISOLATED BIT(16)
> > > #define BR_MRP_AWARE BIT(17)
> > > #define BR_MRP_LOST_CONT BIT(18)
> > > +#define BR_HOST_FLOOD BIT(19)
> > > +#define BR_HOST_MCAST_FLOOD BIT(20)
> > > +#define BR_HOST_BCAST_FLOOD BIT(21)
> > >
> > > #define BR_DEFAULT_AGEING_TIME (300 * HZ)
> > >
> > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > > index a0e9a7937412..aae59d1e619b 100644
> > > --- a/net/bridge/br_if.c
> > > +++ b/net/bridge/br_if.c
> > > @@ -166,6 +166,45 @@ void br_manage_promisc(struct net_bridge *br)
> > > }
> > > }
> > >
> > > +static int br_manage_host_flood(struct net_bridge *br)
> > > +{
> > > + const unsigned long mask = BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD |
> > > + BR_HOST_BCAST_FLOOD;
> > > + struct net_bridge_port *p, *q;
> > > +
> > > + list_for_each_entry(p, &br->port_list, list) {
> > > + unsigned long flags = p->flags;
> > > + bool sw_bridging = false;
> > > + int err;
> > > +
> > > + list_for_each_entry(q, &br->port_list, list) {
> > > + if (p == q)
> > > + continue;
> > > +
> > > + if (!netdev_port_same_parent_id(p->dev, q->dev)) {
> > > + sw_bridging = true;
> >
> > It's not that simple. There are cases where not all bridge slaves have
> > the same parent ID and still there is no reason to flood traffic to the
> > CPU. VXLAN, for example.
> >
> > You could argue that the VXLAN device needs to have the same parent ID
> > as the physical netdevs member in the bridge, but it will break your
> > data path. For example, lets assume your hardware decided to flood a
> > packet in L2. The packet will egress all the local ports, but will also
> > perform VXLAN encapsulation. The packet continues with the IP of the
> > remote VTEP(s) to the underlay router and then encounters a neighbour
> > miss exception, which sends it to the CPU for resolution.
> >
> > Since this exception was encountered in the router the driver would mark
> > the packet with 'offload_fwd_mark', as it already performed L2
> > forwarding. If the VXLAN device has the same parent ID as the physical
> > netdevs, then the Linux bridge will never let it egress, nothing will
> > trigger neighbour resolution and the packet will be discarded.
> >
>
> I wasn't going to argue that.
> Ok, so with a bridged VXLAN only certain multicast DMACs corresponding
> to multicast IPs should be flooded to the CPU.
> Actually Allan's example was a bit simpler, he said that host flooding
> can be made a per-VLAN flag. I'm glad that you raised this. So maybe
> we should try to define some mechanism by which virtual interfaces can
> specify to the bridge that they don't need to see all traffic? Do you
> have any ideas?
Maybe, when a port joins a bridge, query member ports if they can
forward traffic to it in hardware and based on the answer determine the
flooding towards the CPU?
>
> > > + break;
> > > + }
> > > + }
> > > +
> > > + if (sw_bridging)
> > > + flags |= mask;
> > > + else
> > > + flags &= ~mask;
> > > +
> > > + if (flags == p->flags)
> > > + continue;
> > > +
> > > + err = br_switchdev_set_port_flag(p, flags, mask);
> > > + if (err)
> > > + return err;
> > > +
> > > + p->flags = flags;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > int nbp_backup_change(struct net_bridge_port *p,
> > > struct net_device *backup_dev)
> > > {
> > > @@ -231,6 +270,7 @@ static void nbp_update_port_count(struct net_bridge *br)
> > > br->auto_cnt = cnt;
> > > br_manage_promisc(br);
> > > }
> > > + br_manage_host_flood(br);
> > > }
> > >
> > > static void nbp_delete_promisc(struct net_bridge_port *p)
> > > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> > > index 015209bf44aa..360806ac7463 100644
> > > --- a/net/bridge/br_switchdev.c
> > > +++ b/net/bridge/br_switchdev.c
> > > @@ -56,7 +56,9 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> > >
> > > /* Flags that can be offloaded to hardware */
> > > #define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> > > - BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> > > + BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
> > > + BR_HOST_FLOOD | BR_HOST_MCAST_FLOOD | \
> > > + BR_HOST_BCAST_FLOOD)
> > >
> > > int br_switchdev_set_port_flag(struct net_bridge_port *p,
> > > unsigned long flags,
> > > --
> > > 2.25.1
> > >
>
> Thanks,
> -Vladimir
next prev parent reply other threads:[~2020-05-25 20:11 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-21 21:10 [PATCH RFC net-next 00/13] RX filtering for DSA switches Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 01/13] net: core: dev_addr_lists: add VID to device address Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 02/13] net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 03/13] net: 8021q: vlan_dev: add vid tag for vlan device own address Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 04/13] ethernet: eth: add default vid len for all ethernet kind devices Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 05/13] net: bridge: multicast: propagate br_mc_disabled_update() return Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 06/13] net: core: dev_addr_lists: export some raw __hw_addr helpers Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 07/13] net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 08/13] net: dsa: add ability to program unicast and multicast filters for CPU port Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 09/13] net: dsa: mroute: don't panic the kernel if called without the prepare phase Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 10/13] net: bridge: add port flags for host flooding Vladimir Oltean
2020-05-22 12:38 ` Nikolay Aleksandrov
2020-05-22 13:13 ` Vladimir Oltean
2020-05-22 18:45 ` Allan W. Nielsen
2020-07-20 11:08 ` Vladimir Oltean
2020-05-24 14:26 ` Ido Schimmel
2020-05-24 16:13 ` Vladimir Oltean
2020-05-25 20:11 ` Ido Schimmel [this message]
2020-05-25 20:32 ` Vladimir Oltean
2020-07-23 22:35 ` Vladimir Oltean
2020-07-27 17:15 ` Ido Schimmel
2020-05-21 21:10 ` [PATCH RFC net-next 11/13] net: dsa: deal with new flooding port attributes from bridge Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 12/13] net: dsa: treat switchdev notifications for multicast router connected to port Vladimir Oltean
2020-05-21 21:10 ` [PATCH RFC net-next 13/13] net: dsa: wire up multicast IGMP snooping attribute notification Vladimir Oltean
2020-05-22 18:42 ` [PATCH RFC net-next 00/13] RX filtering for DSA switches Allan W. Nielsen
2020-05-24 14:06 ` Ido Schimmel
2020-05-24 16:24 ` Vladimir Oltean
2020-05-25 19:48 ` Ido Schimmel
2020-05-25 20:23 ` Vladimir Oltean
2020-05-26 14:01 ` Ido Schimmel
2020-05-27 11:36 ` Vladimir Oltean
2020-05-28 14:37 ` Ido Schimmel
2020-07-20 10:00 ` Vladimir Oltean
2020-07-27 16:56 ` Ido Schimmel
2020-10-27 11:52 ` Vladimir Oltean
2020-10-28 14:43 ` Ido Schimmel
2020-10-28 18:46 ` Vladimir Oltean
2020-11-01 11:27 ` Ido Schimmel
2020-11-01 12:06 ` Vladimir Oltean
2020-11-01 14:42 ` Ido Schimmel
2020-11-01 15:04 ` Vladimir Oltean
2020-11-01 15:39 ` Ido Schimmel
2020-11-01 16:13 ` Vladimir Oltean
2020-11-11 4:12 ` Florian Fainelli
2020-05-24 16:13 ` Florian Fainelli
2020-05-24 16:34 ` Vladimir Oltean
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200525201111.GB1449199@splinter \
--to=idosch@idosch.org \
--cc=allan.nielsen@microchip.com \
--cc=andrew@lunn.ch \
--cc=davem@davemloft.net \
--cc=f.fainelli@gmail.com \
--cc=horatiu.vultur@microchip.com \
--cc=ivecera@redhat.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=nikolay@cumulusnetworks.com \
--cc=olteanv@gmail.com \
--cc=roopa@cumulusnetworks.com \
--cc=vivien.didelot@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).