* [Bridge] [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport
[not found] <20210209151936.97382-1-olteanv@gmail.com>
@ 2021-02-09 15:19 ` Vladimir Oltean
2021-02-09 18:27 ` Vladimir Oltean
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev Vladimir Oltean
` (6 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 15:19 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
From: Vladimir Oltean <vladimir.oltean@nxp.com>
The br_switchdev_set_port_flag function uses the atomic notifier call
chain because br_setport runs in an atomic section (under br->lock).
This is because port flag changes need to be synchronized with the data
path. But actually the switchdev notifier doesn't need that, only
br_set_port_flag does. So we can collect all the port flag changes and
only emit the notification at the end, then revert the changes if the
switchdev notification failed.
There's also the other aspect: if for example this command:
ip link set swp0 type bridge_slave flood off mcast_flood off learning off
succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
BR_LEARNING, there would be no attempt to revert the partial state in
any way. Arguably, if the user changes more than one flag through the
same netlink command, this one _should_ be all or nothing, which means
it should be passed through switchdev as all or nothing.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Changes in v2:
Patch is new.
net/bridge/br_netlink.c | 155 +++++++++++++++-----------------------
net/bridge/br_switchdev.c | 7 +-
2 files changed, 66 insertions(+), 96 deletions(-)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index bd3962da345a..2c110bcbc6d0 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -853,103 +853,82 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
}
/* Set/clear or port flags based on attribute */
-static int br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
- int attrtype, unsigned long mask)
+static void br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
+ int attrtype, unsigned long mask)
{
- unsigned long flags;
- int err;
-
if (!tb[attrtype])
- return 0;
+ return;
if (nla_get_u8(tb[attrtype]))
- flags = p->flags | mask;
+ p->flags |= mask;
else
- flags = p->flags & ~mask;
-
- err = br_switchdev_set_port_flag(p, flags, mask);
- if (err)
- return err;
-
- p->flags = flags;
- return 0;
+ p->flags &= ~mask;
}
/* Process bridge protocol info on port */
static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
{
- unsigned long old_flags = p->flags;
- bool br_vlan_tunnel_old = false;
+ unsigned long old_flags, changed_mask;
+ bool br_vlan_tunnel_old;
int err;
- err = br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_FAST_LEAVE, BR_MULTICAST_FAST_LEAVE);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST, BR_MULTICAST_TO_UNICAST);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
- if (err)
+ spin_lock_bh(&p->br->lock);
+
+ old_flags = p->flags;
+ br_vlan_tunnel_old = (old_flags & BR_VLAN_TUNNEL) ? true : false;
+
+ br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
+ br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_FAST_LEAVE,
+ BR_MULTICAST_FAST_LEAVE);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
+ br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
+ br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST,
+ BR_MULTICAST_TO_UNICAST);
+ br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
+ br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
+ br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS, BR_NEIGH_SUPPRESS);
+ br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
+
+ changed_mask = old_flags ^ p->flags;
+
+ spin_unlock_bh(&p->br->lock);
+
+ err = br_switchdev_set_port_flag(p, p->flags, changed_mask);
+ if (err) {
+ spin_lock_bh(&p->br->lock);
+ p->flags = old_flags;
+ spin_unlock_bh(&p->br->lock);
return err;
+ }
- br_vlan_tunnel_old = (p->flags & BR_VLAN_TUNNEL) ? true : false;
- err = br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
- if (err)
- return err;
+ spin_lock_bh(&p->br->lock);
if (br_vlan_tunnel_old && !(p->flags & BR_VLAN_TUNNEL))
nbp_vlan_tunnel_info_flush(p);
+ br_port_flags_change(p, changed_mask);
+
if (tb[IFLA_BRPORT_COST]) {
err = br_stp_set_path_cost(p, nla_get_u32(tb[IFLA_BRPORT_COST]));
if (err)
- return err;
+ goto out;
}
if (tb[IFLA_BRPORT_PRIORITY]) {
err = br_stp_set_port_priority(p, nla_get_u16(tb[IFLA_BRPORT_PRIORITY]));
if (err)
- return err;
+ goto out;
}
if (tb[IFLA_BRPORT_STATE]) {
err = br_set_port_state(p, nla_get_u8(tb[IFLA_BRPORT_STATE]));
if (err)
- return err;
+ goto out;
}
if (tb[IFLA_BRPORT_FLUSH])
@@ -961,7 +940,7 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
err = br_multicast_set_port_router(p, mcast_router);
if (err)
- return err;
+ goto out;
}
if (tb[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT]) {
@@ -970,27 +949,20 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
hlimit = nla_get_u32(tb[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT]);
err = br_multicast_eht_set_hosts_limit(p, hlimit);
if (err)
- return err;
+ goto out;
}
#endif
if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) {
u16 fwd_mask = nla_get_u16(tb[IFLA_BRPORT_GROUP_FWD_MASK]);
- if (fwd_mask & BR_GROUPFWD_MACPAUSE)
- return -EINVAL;
+ if (fwd_mask & BR_GROUPFWD_MACPAUSE) {
+ err = -EINVAL;
+ goto out;
+ }
p->group_fwd_mask = fwd_mask;
}
- err = br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS,
- BR_NEIGH_SUPPRESS);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
- if (err)
- return err;
-
if (tb[IFLA_BRPORT_BACKUP_PORT]) {
struct net_device *backup_dev = NULL;
u32 backup_ifindex;
@@ -999,17 +971,21 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
if (backup_ifindex) {
backup_dev = __dev_get_by_index(dev_net(p->dev),
backup_ifindex);
- if (!backup_dev)
- return -ENOENT;
+ if (!backup_dev) {
+ err = -ENOENT;
+ goto out;
+ }
}
err = nbp_backup_change(p, backup_dev);
if (err)
- return err;
+ goto out;
}
- br_port_flags_change(p, old_flags ^ p->flags);
- return 0;
+out:
+ spin_unlock_bh(&p->br->lock);
+
+ return err;
}
/* Change state and parameters on port. */
@@ -1045,9 +1021,7 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh, u16 flags,
if (err)
return err;
- spin_lock_bh(&p->br->lock);
err = br_setport(p, tb);
- spin_unlock_bh(&p->br->lock);
} else {
/* Binary compatibility with old RSTP */
if (nla_len(protinfo) < sizeof(u8))
@@ -1134,17 +1108,10 @@ static int br_port_slave_changelink(struct net_device *brdev,
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
- struct net_bridge *br = netdev_priv(brdev);
- int ret;
-
if (!data)
return 0;
- spin_lock_bh(&br->lock);
- ret = br_setport(br_port_get_rtnl(dev), data);
- spin_unlock_bh(&br->lock);
-
- return ret;
+ return br_setport(br_port_get_rtnl(dev), data);
}
static int br_port_fill_slave_info(struct sk_buff *skb,
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index a9c23ef83443..c004ade25ac0 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -65,16 +65,19 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
struct switchdev_attr attr = {
.orig_dev = p->dev,
.id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
- .u.brport_flags = mask,
};
struct switchdev_notifier_port_attr_info info = {
.attr = &attr,
};
int err;
- if (mask & ~BR_PORT_FLAGS_HW_OFFLOAD)
+ flags &= BR_PORT_FLAGS_HW_OFFLOAD;
+ mask &= BR_PORT_FLAGS_HW_OFFLOAD;
+ if (!mask)
return 0;
+ attr.u.brport_flags = mask;
+
/* We run from atomic context here */
err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
&info.info, NULL);
--
2.25.1
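A minimal userspace sketch of the all-or-nothing scheme described in the commit message of this patch: collect every flag change, notify once, and revert if the notification fails. The F_* bits, struct port, offload_flags() and set_flags() are hypothetical names made up for this illustration and are not the bridge code. offload_flags() simply rejects any request touching a bit the pretend hardware lacks, which is a simplification, and no locking is modeled.

```c
/* Hypothetical stand-ins for BR_LEARNING, BR_FLOOD and BR_MCAST_FLOOD. */
#define F_LEARNING    (1UL << 0)
#define F_FLOOD       (1UL << 1)
#define F_MCAST_FLOOD (1UL << 2)

struct port {
	unsigned long flags;        /* software view of the port flags */
	unsigned long hw_supported; /* bits the pretend hardware can offload */
};

/*
 * Stand-in for the single notification: the whole request fails if any
 * changed bit is unsupported. The new flag values are unused in this model.
 */
static int offload_flags(const struct port *p, unsigned long flags,
			 unsigned long mask)
{
	(void)flags;
	return (mask & ~p->hw_supported) ? -1 : 0;
}

/* Apply all requested changes, notify once, roll everything back on error. */
static int set_flags(struct port *p, unsigned long set, unsigned long clear)
{
	unsigned long old_flags = p->flags;
	unsigned long changed_mask;
	int err;

	p->flags |= set;
	p->flags &= ~clear;
	changed_mask = old_flags ^ p->flags;

	err = offload_flags(p, p->flags, changed_mask);
	if (err)
		p->flags = old_flags; /* single-threaded model: plain restore */

	return err;
}
```

With hardware that supports only the two flood bits, a request to clear flood, mcast_flood and learning together fails and leaves the flags exactly as they were, whereas a request to clear only the two flood bits succeeds.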
* Re: [Bridge] [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport Vladimir Oltean
@ 2021-02-09 18:27 ` Vladimir Oltean
2021-02-09 18:36 ` Vladimir Oltean
0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 18:27 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
On Tue, Feb 09, 2021 at 05:19:27PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> The br_switchdev_set_port_flag function uses the atomic notifier call
> chain because br_setport runs in an atomic section (under br->lock).
> This is because port flag changes need to be synchronized with the data
> path. But actually the switchdev notifier doesn't need that, only
> br_set_port_flag does. So we can collect all the port flag changes and
> only emit the notification at the end, then revert the changes if the
> switchdev notification failed.
>
> There's also the other aspect: if for example this command:
>
> ip link set swp0 type bridge_slave flood off mcast_flood off learning off
>
> succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
> BR_LEARNING, there would be no attempt to revert the partial state in
> any way. Arguably, if the user changes more than one flag through the
> same netlink command, this one _should_ be all or nothing, which means
> it should be passed through switchdev as all or nothing.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
(...)
> + spin_lock_bh(&p->br->lock);
> +
> + old_flags = p->flags;
> + br_vlan_tunnel_old = (old_flags & BR_VLAN_TUNNEL) ? true : false;
> +
> + br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
> + br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
> + br_set_port_flag(p, tb, IFLA_BRPORT_FAST_LEAVE,
> + BR_MULTICAST_FAST_LEAVE);
> + br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
> + br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
> + br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
> + br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
> + br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST,
> + BR_MULTICAST_TO_UNICAST);
> + br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
> + br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
> + br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
> + br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
> + br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS, BR_NEIGH_SUPPRESS);
> + br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
> +
> + changed_mask = old_flags ^ p->flags;
> +
> + spin_unlock_bh(&p->br->lock);
> +
> + err = br_switchdev_set_port_flag(p, p->flags, changed_mask);
> + if (err) {
> + spin_lock_bh(&p->br->lock);
> + p->flags = old_flags;
> + spin_unlock_bh(&p->br->lock);
> return err;
> + }
>
I know it's a bit strange to insert this in the middle of review, but
bear with me.
While I was reworking the patch series to also make sysfs non-atomic,
like this:
-----------------------------[cut here]-----------------------------
From 6ff6714b6686e4f9406425edf15db6c92e944954 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Tue, 9 Feb 2021 19:43:40 +0200
Subject: [PATCH] net: bridge: temporarily drop br->lock for
br_switchdev_set_port_flag in sysfs
Since we would like br_switchdev_set_port_flag to not use an atomic
notifier, it should be called from outside spinlock context.
Dropping the lock creates some concurrency complications:
- There might be an "echo 1 > multicast_flood" simultaneous with an
"echo 0 > multicast_flood". The result of this is nondeterministic
either way, so I'm not too concerned as long as the result is
consistent (no other flags have changed).
- There might be an "echo 1 > multicast_flood" simultaneous with an
"echo 0 > learning". My expectation is that neither of the two writes is
"eaten", and the final flags contain BR_MCAST_FLOOD=1 and BR_LEARNING=0
regardless of the order of execution. That is actually possible if, on
the commit path, we don't do a trivial "p->flags = flags" which might
overwrite bits outside of our mask, but instead we just change the
flags corresponding to our mask.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
net/bridge/br_sysfs_if.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 62540b31e356..b419d9aad548 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -68,17 +68,23 @@ static int store_flag(struct net_bridge_port *p, unsigned long v,
else
flags &= ~mask;
- if (flags != p->flags) {
- err = br_switchdev_set_port_flag(p, flags, mask, &extack);
- if (err) {
- if (extack._msg)
- netdev_err(p->dev, "%s\n", extack._msg);
- return err;
- }
+ if (flags == p->flags)
+ return 0;
- p->flags = flags;
- br_port_flags_change(p, mask);
+ spin_unlock_bh(&p->br->lock);
+ err = br_switchdev_set_port_flag(p, flags, mask, &extack);
+ spin_lock_bh(&p->br->lock);
+ if (err) {
+ if (extack._msg)
+ netdev_err(p->dev, "%s\n", extack._msg);
+ return err;
}
+
+ p->flags &= ~mask;
+ p->flags |= (flags & mask);
+
+ br_port_flags_change(p, mask);
+
return 0;
}
-----------------------------[cut here]-----------------------------
I figured there's a similar problem in this patch, which I had missed.
The code now looks like this:
	changed_mask = old_flags ^ p->flags;
	flags = p->flags;
	spin_unlock_bh(&p->br->lock);
	err = br_switchdev_set_port_flag(p, flags, changed_mask, extack);
	if (err) {
		spin_lock_bh(&p->br->lock);
		p->flags &= ~changed_mask;
		p->flags |= (old_flags & changed_mask);
		spin_unlock_bh(&p->br->lock);
		return err;
	}
	spin_lock_bh(&p->br->lock);
where I no longer access p->flags directly when calling
br_switchdev_set_port_flag (because I'm not protected by br->lock) but a
copy of it saved on stack. Also, I restore just the mask portion of
p->flags.
But there's an interesting side effect of allowing
br_switchdev_set_port_flag to run concurrently (notifier call chains use
a rw_semaphore and only take the read side). Basically now drivers that
cache the brport flags in their entirety are broken, because there isn't
any guarantee that bits outside the mask are valid any longer (we can
even enforce that by masking the flags with the mask when notifying
them). They would need to do the same trick of updating just the masked
part of their cached flags. Except for the fact that they would need
some sort of spinlock too, I don't think that the basic bitwise
operations are atomic or anything like that. I'm a bit reluctant to add
a spinlock in prestera, rocker, mlxsw just for this purpose. What do you
think?
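The difference between a plain "p->flags = flags" and the masked commit discussed above can be sketched in userspace C. This is only an illustration under stated assumptions: F_LEARNING and F_MCAST_FLOOD are hypothetical stand-ins for the BR_* bits, commit_overwrite() and commit_masked() are invented helper names, and the two "writers" run one after the other with snapshots taken before either commits, which models the window where the lock is dropped.

```c
/* Hypothetical stand-ins for BR_LEARNING and BR_MCAST_FLOOD. */
#define F_LEARNING    (1UL << 0)
#define F_MCAST_FLOOD (1UL << 1)

/* Overwrites every bit with the writer's (possibly stale) snapshot. */
static unsigned long commit_overwrite(unsigned long cur,
				      unsigned long snapshot)
{
	(void)cur;
	return snapshot;
}

/* Changes only the bits named in mask and keeps the rest of cur intact. */
static unsigned long commit_masked(unsigned long cur, unsigned long snapshot,
				   unsigned long mask)
{
	return (cur & ~mask) | (snapshot & mask);
}
```

Starting from learning on and multicast flooding off, writer A wants multicast flooding on and writer B wants learning off. If both snapshot the flags before either commits, the overwrite variant loses A's change when B commits, while the masked variant ends with multicast flooding on and learning off regardless of commit order.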
* Re: [Bridge] [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport
2021-02-09 18:27 ` Vladimir Oltean
@ 2021-02-09 18:36 ` Vladimir Oltean
0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 18:36 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
On Tue, Feb 09, 2021 at 08:27:24PM +0200, Vladimir Oltean wrote:
> But there's an interesting side effect of allowing
> br_switchdev_set_port_flag to run concurrently (notifier call chains use
> a rw_semaphore and only take the read side). Basically now drivers that
> cache the brport flags in their entirety are broken, because there isn't
> any guarantee that bits outside the mask are valid any longer (we can
> even enforce that by masking the flags with the mask when notifying
> them). They would need to do the same trick of updating just the masked
> part of their cached flags. Except for the fact that they would need
> some sort of spinlock too, I don't think that the basic bitwise
> operations are atomic or anything like that. I'm a bit reluctant to add
> a spinlock in prestera, rocker, mlxsw just for this purpose. What do you
> think?
My take on things is that I can change those drivers to do what ocelot
and sja1105 do, which is to just have some bool values like this:
	if (flags.mask & BR_LEARNING)
		ocelot_port->learn_ena = !!(flags.val & BR_LEARNING);
which eliminates concurrency to the shared unsigned long brport_flags
variable. No locking, no complications.
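A slightly fuller userspace sketch of that pattern, where a driver keeps one bool per flag and updates only the ones named in the mask. The types and names here (struct brport_flags, struct drv_port, drv_port_set_flags(), the F_* bits) are hypothetical and chosen only to mirror the snippet above; they are not taken from any driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for BR_LEARNING and BR_FLOOD. */
#define F_LEARNING (1UL << 0)
#define F_FLOOD    (1UL << 1)

/* Hypothetical {val, mask} pair as handed to the driver. */
struct brport_flags {
	unsigned long val;
	unsigned long mask;
};

/* Per-port driver state: one bool per flag instead of a cached bitmask. */
struct drv_port {
	bool learn_ena;
	bool flood_ena;
};

/* Update only the members whose bit is set in the mask. */
static void drv_port_set_flags(struct drv_port *dp, struct brport_flags flags)
{
	if (flags.mask & F_LEARNING)
		dp->learn_ena = !!(flags.val & F_LEARNING);
	if (flags.mask & F_FLOOD)
		dp->flood_ena = !!(flags.val & F_FLOOD);
}
```

Bits of val outside the mask are never looked at, so it does not matter whether they are stale, and a notification about one flag never rewrites the cached state of another.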
* [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
[not found] <20210209151936.97382-1-olteanv@gmail.com>
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport Vladimir Oltean
@ 2021-02-09 15:19 ` Vladimir Oltean
2021-02-09 18:51 ` Ido Schimmel
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 05/11] net: dsa: stop setting initial and final brport flags Vladimir Oltean
` (5 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 15:19 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
From: Vladimir Oltean <vladimir.oltean@nxp.com>
It must first be admitted that switchdev device drivers have a life
beyond the bridge, and when they aren't offloading the bridge driver
they are operating with forwarding disabled between ports, emulating as
closely as possible N standalone network interfaces.
Now it must be said that for a switchdev port operating in standalone
mode, address learning doesn't make much sense since that is a bridge
function. In fact, address learning even breaks setups such as this one:
+---------------------------------------------+
|                                             |
| +-------------------+                       |
| |        br0        |    send      receive  |
| +--------+-+--------+ +--------+ +--------+ |
| |        | |        | |        | |        | |
| |  swp0  | |  swp1  | |  swp2  | |  swp3  | |
| |        | |        | |        | |        | |
+-+--------+-+--------+-+--------+-+--------+-+
      |          ^           |         ^
      |          |           |         |
      |          +-----------+         |
      |                                |
      +--------------------------------+
because if the ASIC has a single FDB (can offload a single bridge)
then source address learning on swp3 can "steal" the source MAC address
of swp2 from br0's FDB, because learning of frames coming from swp2 will be
done twice: first on the swp1 ingress port, second on the swp3 ingress
port. So the hardware FDB will become out of sync with the software
bridge, and when swp2 tries to send one more packet towards swp1, the
ASIC will attempt to short-circuit the forwarding path and send it
directly to swp3 (since that's the last port it learned that address on),
which it obviously can't, because swp3 operates in standalone mode.
So switchdev drivers operating in standalone mode should disable address
learning. As a matter of practicality, we can reduce code duplication in
drivers by having the bridge notify through switchdev of the initial and
final brport flags. Then, drivers can simply start up hardcoded for no
address learning (similar to how they already start up hardcoded for no
forwarding), then they only need to listen for
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
need for special cases when the port joins or leaves the bridge etc.
When a port leaves the bridge (and therefore becomes standalone), we
issue a switchdev attribute that apart from disabling address learning,
enables flooding of all kinds. This is also done for pragmatic reasons,
because even though standalone switchdev ports might not need to have
flooding enabled in order to inject traffic with any MAC DA from the
control interface, it certainly doesn't hurt either, and it even makes
more sense than disabling flooding of unknown traffic towards that port.
Note that the implementation is a bit wacky because the switchdev API
for port attributes is very counterproductive. Instead of issuing a
single switchdev notification with a bitwise OR of all flags that we're
modifying, we need to issue 4 individual notifications, one for each bit.
This is because the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS notifier
forces you to refuse the entire operation if there's at least one bit
which you can't offload, and that is currently BR_BCAST_FLOOD which
nobody does. So this change would do nothing for no one if we offloaded
all flags at once, but the idea is to offload as much as possible
instead of all or nothing.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Changes in v2:
- Renamed nbp_flags_change to nbp_flags_notify.
- Don't return any errors, offload flags one by one as opposed to all at
once.
include/linux/if_bridge.h | 3 +++
net/bridge/br_if.c | 21 ++++++++++++++++++++-
net/bridge/br_switchdev.c | 3 +--
3 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b979005ea39c..36d77fa8f40b 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -58,6 +58,9 @@ struct br_ip_list {
#define BR_MRP_LOST_CONT BIT(18)
#define BR_MRP_LOST_IN_CONT BIT(19)
+#define BR_PORT_DEFAULT_FLAGS (BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
+ BR_LEARNING)
+
#define BR_DEFAULT_AGEING_TIME (300 * HZ)
extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index f7d2f472ae24..f813eec986ba 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -89,6 +89,23 @@ void br_port_carrier_check(struct net_bridge_port *p, bool *notified)
spin_unlock_bh(&br->lock);
}
+/* If @mask has multiple bits set at once, offload them one by one to
+ * switchdev, to allow it to reject only what it doesn't support and accept
+ * what it does.
+ */
+static void nbp_flags_notify(struct net_bridge_port *p, unsigned long flags,
+ unsigned long mask)
+{
+ int flag;
+
+ for_each_set_bit(flag, &mask, 32)
+ br_switchdev_set_port_flag(p, flags & BIT(flag),
+ BIT(flag), NULL);
+
+ p->flags &= ~mask;
+ p->flags |= flags;
+}
+
static void br_port_set_promisc(struct net_bridge_port *p)
{
int err = 0;
@@ -343,6 +360,8 @@ static void del_nbp(struct net_bridge_port *p)
update_headroom(br, get_max_headroom(br));
netdev_reset_rx_headroom(dev);
+ nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS & ~BR_LEARNING,
+ BR_PORT_DEFAULT_FLAGS);
nbp_vlan_flush(p);
br_fdb_delete_by_port(br, p, 0, 1);
switchdev_deferred_process();
@@ -428,7 +447,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
p->path_cost = port_cost(dev);
p->priority = 0x8000 >> BR_PORT_BITS;
p->port_no = index;
- p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+ nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS, BR_PORT_DEFAULT_FLAGS);
br_init_port(p);
br_set_state(p, BR_STATE_DISABLED);
br_stp_port_timer_init(p);
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index ac8dead86bf2..1fae532cfbb1 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -55,8 +55,7 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
}
/* Flags that can be offloaded to hardware */
-#define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
- BR_MCAST_FLOOD | BR_BCAST_FLOOD)
+#define BR_PORT_FLAGS_HW_OFFLOAD BR_PORT_DEFAULT_FLAGS
int br_switchdev_set_port_flag(struct net_bridge_port *p,
unsigned long flags,
--
2.25.1
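The reasoning in this patch for notifying one bit at a time, rather than all flags at once, can be illustrated with a small userspace model. Everything here is a sketch under assumptions: drv_offload() stands in for a driver that refuses any request containing a bit it cannot offload, notify_one_by_one() is an invented helper, and the F_* bits are hypothetical stand-ins for the BR_* flags. Unlike nbp_flags_notify() in the patch, which ignores errors, the helper returns the set of accepted bits so the effect is visible.

```c
/* Hypothetical stand-ins for the four default brport flags. */
#define F_LEARNING    (1UL << 0)
#define F_FLOOD       (1UL << 1)
#define F_MCAST_FLOOD (1UL << 2)
#define F_BCAST_FLOOD (1UL << 3)

/* Pretend driver: rejects a request if any bit in mask is unsupported. */
static int drv_offload(unsigned long supported, unsigned long flags,
		       unsigned long mask)
{
	(void)flags;
	return (mask & ~supported) ? -1 : 0;
}

/* Offer each set bit of mask on its own; return the bits that were accepted. */
static unsigned long notify_one_by_one(unsigned long supported,
				       unsigned long flags,
				       unsigned long mask)
{
	unsigned long accepted = 0;
	unsigned int bit;

	for (bit = 0; bit < 8 * sizeof(mask); bit++) {
		unsigned long b = 1UL << bit;

		if (!(mask & b))
			continue;
		if (drv_offload(supported, flags & b, b) == 0)
			accepted |= b;
	}

	return accepted;
}
```

For a pretend driver that supports everything except F_BCAST_FLOOD, a single request carrying all four bits is refused outright, while the bit-by-bit loop gets the other three accepted.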
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev Vladimir Oltean
@ 2021-02-09 18:51 ` Ido Schimmel
2021-02-09 20:20 ` Vladimir Oltean
0 siblings, 1 reply; 22+ messages in thread
From: Ido Schimmel @ 2021-02-09 18:51 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> So switchdev drivers operating in standalone mode should disable address
> learning. As a matter of practicality, we can reduce code duplication in
> drivers by having the bridge notify through switchdev of the initial and
> final brport flags. Then, drivers can simply start up hardcoded for no
> address learning (similar to how they already start up hardcoded for no
> forwarding), then they only need to listen for
> SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> need for special cases when the port joins or leaves the bridge etc.
How are you handling the case where a port leaves a LAG that is linked
to a bridge? In this case the port becomes a standalone port, but will
not get this notification.
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-09 18:51 ` Ido Schimmel
@ 2021-02-09 20:20 ` Vladimir Oltean
2021-02-09 22:01 ` Ido Schimmel
0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 20:20 UTC (permalink / raw)
To: Ido Schimmel
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > So switchdev drivers operating in standalone mode should disable address
> > learning. As a matter of practicality, we can reduce code duplication in
> > drivers by having the bridge notify through switchdev of the initial and
> > final brport flags. Then, drivers can simply start up hardcoded for no
> > address learning (similar to how they already start up hardcoded for no
> > forwarding), then they only need to listen for
> > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > need for special cases when the port joins or leaves the bridge etc.
>
> How are you handling the case where a port leaves a LAG that is linked
> to a bridge? In this case the port becomes a standalone port, but will
> not get this notification.
Apparently the answer to that question is "I delete the code that makes
this use case work", how smart of me. Thanks.
Unless you have any idea how I could move the logic into the bridge, I
guess I'm stuck with DSA and all the other switchdev drivers having this
forest of corner cases to deal with. At least I can add a comment so I'm
not tempted to delete it next time.
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-09 20:20 ` Vladimir Oltean
@ 2021-02-09 22:01 ` Ido Schimmel
2021-02-09 22:51 ` Vladimir Oltean
0 siblings, 1 reply; 22+ messages in thread
From: Ido Schimmel @ 2021-02-09 22:01 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote:
> On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > > So switchdev drivers operating in standalone mode should disable address
> > > learning. As a matter of practicality, we can reduce code duplication in
> > > drivers by having the bridge notify through switchdev of the initial and
> > > final brport flags. Then, drivers can simply start up hardcoded for no
> > > address learning (similar to how they already start up hardcoded for no
> > > forwarding), then they only need to listen for
> > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > > need for special cases when the port joins or leaves the bridge etc.
> >
> > How are you handling the case where a port leaves a LAG that is linked
> > to a bridge? In this case the port becomes a standalone port, but will
> > not get this notification.
>
> Apparently the answer to that question is "I delete the code that makes
> this use case work", how smart of me. Thanks.
Not sure how you expect to interpret this.
>
> Unless you have any idea how I could move the logic into the bridge, I
> guess I'm stuck with DSA and all the other switchdev drivers having this
> forest of corner cases to deal with. At least I can add a comment so I'm
> not tempted to delete it next time.
There are too many moving pieces with stacked devices. It is not only
LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better
to gracefully / explicitly not handle a case rather than pretending to
handle it correctly with complex / buggy code.
For example, you should refuse to be enslaved to a LAG that already has
upper devices such as a bridge. You are probably not handling this
correctly / at all. This is easy. Just a call to
netdev_has_any_upper_dev().
The reverse, during unlinking, would be to refuse unlinking if the upper
has uppers of its own. netdev_upper_dev_unlink() needs to learn to
return an error and callers such as team/bond need to learn to handle
it, but it seems patchable.
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-09 22:01 ` Ido Schimmel
@ 2021-02-09 22:51 ` Vladimir Oltean
2021-02-10 10:59 ` Ido Schimmel
0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 22:51 UTC (permalink / raw)
To: Ido Schimmel
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Wed, Feb 10, 2021 at 12:01:24AM +0200, Ido Schimmel wrote:
> On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote:
> > On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> > > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > > > So switchdev drivers operating in standalone mode should disable address
> > > > learning. As a matter of practicality, we can reduce code duplication in
> > > > drivers by having the bridge notify through switchdev of the initial and
> > > > final brport flags. Then, drivers can simply start up hardcoded for no
> > > > address learning (similar to how they already start up hardcoded for no
> > > > forwarding), then they only need to listen for
> > > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > > > need for special cases when the port joins or leaves the bridge etc.
> > >
> > > How are you handling the case where a port leaves a LAG that is linked
> > > to a bridge? In this case the port becomes a standalone port, but will
> > > not get this notification.
> >
> > Apparently the answer to that question is "I delete the code that makes
> > this use case work", how smart of me. Thanks.
>
> Not sure how you expect to interpret this.
Next patch (05/11) deletes that explicit notification from dsa_port_bridge_leave,
function which is called from dsa_port_lag_leave too, apparently with good reason.
> > Unless you have any idea how I could move the logic into the bridge, I
> > guess I'm stuck with DSA and all the other switchdev drivers having this
> > forest of corner cases to deal with. At least I can add a comment so I'm
> > not tempted to delete it next time.
>
> There are too many moving pieces with stacked devices. It is not only
> LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better
> to gracefully / explicitly not handle a case rather than pretending to
> handle it correctly with complex / buggy code.
>
> For example, you should refuse to be enslaved to a LAG that already has
> upper devices such as a bridge. You are probably not handling this
> correctly / at all. This is easy. Just a call to
> netdev_has_any_upper_dev().
Correct, good point, in particular this means that joining a bridged LAG
will not get me any notifications of that LAG's CHANGEUPPER because that
was consumed a long time ago. An equally valid approach seems to be to
check for netdev_master_upper_dev_get_rcu in dsa_port_lag_join, and call
dsa_port_bridge_join on the upper if that is present.
> The reverse, during unlinking, would be to refuse unlinking if the upper
> has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> return an error and callers such as team/bond need to learn to handle
> it, but it seems patchable.
Again, this was treated prior to my deletion in this series and not by
erroring out, I just really didn't think it through.
So you're saying that if we impose that all switchdev drivers restrict
the house of cards to be constructed from the bottom up, and destructed
from the top down, then the notification of bridge port flags can stay
in the bridge layer?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-09 22:51 ` Vladimir Oltean
@ 2021-02-10 10:59 ` Ido Schimmel
2021-02-10 23:23 ` Vladimir Oltean
0 siblings, 1 reply; 22+ messages in thread
From: Ido Schimmel @ 2021-02-10 10:59 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Wed, Feb 10, 2021 at 12:51:53AM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 12:01:24AM +0200, Ido Schimmel wrote:
> > On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote:
> > > On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> > > > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > > > > So switchdev drivers operating in standalone mode should disable address
> > > > > learning. As a matter of practicality, we can reduce code duplication in
> > > > > drivers by having the bridge notify through switchdev of the initial and
> > > > > final brport flags. Then, drivers can simply start up hardcoded for no
> > > > > address learning (similar to how they already start up hardcoded for no
> > > > > forwarding), then they only need to listen for
> > > > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > > > > need for special cases when the port joins or leaves the bridge etc.
> > > >
> > > > How are you handling the case where a port leaves a LAG that is linked
> > > > to a bridge? In this case the port becomes a standalone port, but will
> > > > not get this notification.
> > >
> > > Apparently the answer to that question is "I delete the code that makes
> > > this use case work", how smart of me. Thanks.
> >
> > Not sure how you expect to interpret this.
>
> Next patch (05/11) deletes that explicit notification from dsa_port_bridge_leave,
> function which is called from dsa_port_lag_leave too, apparently with good reason.
>
> > > Unless you have any idea how I could move the logic into the bridge, I
> > > guess I'm stuck with DSA and all the other switchdev drivers having this
> > > forest of corner cases to deal with. At least I can add a comment so I'm
> > > not tempted to delete it next time.
> >
> > There are too many moving pieces with stacked devices. It is not only
> > LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better
> > to gracefully / explicitly not handle a case rather than pretending to
> > handle it correctly with complex / buggy code.
> >
> > For example, you should refuse to be enslaved to a LAG that already has
> > upper devices such as a bridge. You are probably not handling this
> > correctly / at all. This is easy. Just a call to
> > netdev_has_any_upper_dev().
>
> Correct, good point, in particular this means that joining a bridged LAG
> will not get me any notifications of that LAG's CHANGEUPPER because that
> was consumed a long time ago. An equally valid approach seems to be to
> check for netdev_master_upper_dev_get_rcu in dsa_port_lag_join, and call
> dsa_port_bridge_join on the upper if that is present.
The bridge might already have a state you are not familiar with (e.g.,
FDB entry pointing to the LAG), so best to just forbid this. I think
it's fair to impose such limitations (assuming they are properly
communicated to user space) given it results in a much less
buggy/complex code to maintain.
>
> > The reverse, during unlinking, would be to refuse unlinking if the upper
> > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > return an error and callers such as team/bond need to learn to handle
> > it, but it seems patchable.
>
> Again, this was treated prior to my deletion in this series and not by
> erroring out, I just really didn't think it through.
>
> So you're saying that if we impose that all switchdev drivers restrict
> the house of cards to be constructed from the bottom up, and destructed
> from the top down, then the notification of bridge port flags can stay
> in the bridge layer?
I actually don't think it's a good idea to have this in the bridge in
any case. I understand that it makes sense for some devices where
learning, flooding, etc are port attributes, but in other devices these
can be {port,vlan} attributes and then you need to take care of them
when a vlan is added / deleted and not only when a port is removed from
the bridge. So for such devices this really won't save anything. I would
thus leave it to the lower levels to decide.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-10 10:59 ` Ido Schimmel
@ 2021-02-10 23:23 ` Vladimir Oltean
2021-02-11 7:44 ` Ido Schimmel
0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-10 23:23 UTC (permalink / raw)
To: Ido Schimmel
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > return an error and callers such as team/bond need to learn to handle
> > > it, but it seems patchable.
> >
> > Again, this was treated prior to my deletion in this series and not by
> > erroring out, I just really didn't think it through.
> >
> > So you're saying that if we impose that all switchdev drivers restrict
> > the house of cards to be constructed from the bottom up, and destructed
> > from the top down, then the notification of bridge port flags can stay
> > in the bridge layer?
>
> I actually don't think it's a good idea to have this in the bridge in
> any case. I understand that it makes sense for some devices where
> learning, flooding, etc are port attributes, but in other devices these
> can be {port,vlan} attributes and then you need to take care of them
> when a vlan is added / deleted and not only when a port is removed from
> the bridge. So for such devices this really won't save anything. I would
> thus leave it to the lower levels to decide.
Just for my understanding, how are per-{port,vlan} attributes such as
learning and flooding managed by the Linux bridge? How can I disable
flooding only in a certain VLAN?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-10 23:23 ` Vladimir Oltean
@ 2021-02-11 7:44 ` Ido Schimmel
2021-02-11 9:35 ` Vladimir Oltean
0 siblings, 1 reply; 22+ messages in thread
From: Ido Schimmel @ 2021-02-11 7:44 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > > return an error and callers such as team/bond need to learn to handle
> > > > it, but it seems patchable.
> > >
> > > Again, this was treated prior to my deletion in this series and not by
> > > erroring out, I just really didn't think it through.
> > >
> > > So you're saying that if we impose that all switchdev drivers restrict
> > > the house of cards to be constructed from the bottom up, and destructed
> > > from the top down, then the notification of bridge port flags can stay
> > > in the bridge layer?
> >
> > I actually don't think it's a good idea to have this in the bridge in
> > any case. I understand that it makes sense for some devices where
> > learning, flooding, etc are port attributes, but in other devices these
> > can be {port,vlan} attributes and then you need to take care of them
> > when a vlan is added / deleted and not only when a port is removed from
> > the bridge. So for such devices this really won't save anything. I would
> > thus leave it to the lower levels to decide.
>
> Just for my understanding, how are per-{port,vlan} attributes such as
> learning and flooding managed by the Linux bridge? How can I disable
> flooding only in a certain VLAN?
You can't (currently). But it does not change the fact that in some
devices these are {port,vlan} attributes and we are talking here about
the interface towards these devices. Having these as {port,vlan}
attributes allows you to support use cases such as a port being enslaved
to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware
bridge(s). Obviously you need to ensure there is no conflict between the
VLANs used by the VLAN-aware bridge and the VLAN device(s).
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-11 7:44 ` Ido Schimmel
@ 2021-02-11 9:35 ` Vladimir Oltean
2021-02-11 22:20 ` Ido Schimmel
0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-11 9:35 UTC (permalink / raw)
To: Ido Schimmel
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Thu, Feb 11, 2021 at 09:44:43AM +0200, Ido Schimmel wrote:
> On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote:
> > On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > > > return an error and callers such as team/bond need to learn to handle
> > > > > it, but it seems patchable.
> > > >
> > > > Again, this was treated prior to my deletion in this series and not by
> > > > erroring out, I just really didn't think it through.
> > > >
> > > > So you're saying that if we impose that all switchdev drivers restrict
> > > > the house of cards to be constructed from the bottom up, and destructed
> > > > from the top down, then the notification of bridge port flags can stay
> > > > in the bridge layer?
> > >
> > > I actually don't think it's a good idea to have this in the bridge in
> > > any case. I understand that it makes sense for some devices where
> > > learning, flooding, etc are port attributes, but in other devices these
> > > can be {port,vlan} attributes and then you need to take care of them
> > > when a vlan is added / deleted and not only when a port is removed from
> > > the bridge. So for such devices this really won't save anything. I would
> > > thus leave it to the lower levels to decide.
> >
> > Just for my understanding, how are per-{port,vlan} attributes such as
> > learning and flooding managed by the Linux bridge? How can I disable
> > flooding only in a certain VLAN?
>
> You can't (currently). But it does not change the fact that in some
> devices these are {port,vlan} attributes and we are talking here about
> the interface towards these devices. Having these as {port,vlan}
> attributes allows you to support use cases such as a port being enslaved
> to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware
> bridge(s).
I don't think I understand the use case really. You mean something like this?
               br1 (vlan_filtering=0)
              /   \
             /      \
      swp0.100        \
          |             \
          |(vlan_filtering\
          |  br0       =1)  \
          | /   \             \
          |/     \              \
        swp0    swp1           swp2
A packet received on swp0 with VLAN tag 100 will go to swp0.100 which
will be forwarded according to the FDB of br1, and will be delivered to
swp2 as untagged? Respectively in the other direction, a packet received
on swp2 will have a VLAN 100 tag pushed on egress towards swp0, even if
it is already VLAN-tagged?
What do you even use this for?
And also: if the {port,vlan} attributes can be simulated by making the
bridge port be an 8021q upper of a physical interface, then as far as
the bridge is concerned, they still are per-port attributes, and they
are per-{port,vlan} only as far as the switch driver is concerned -
therefore I don't see why it isn't okay for the bridge to notify the
brport flags in exactly the same way for them too.
> Obviously you need to ensure there is no conflict between the
> VLANs used by the VLAN-aware bridge and the VLAN device(s).
On the other hand I think I have a more real-life use case that I think
is in conflict with this last phrase.
I have a VLAN-aware bridge and I want to run PTP in VLAN 7, but I also
need to add VLAN 7 in the VLAN table of the bridge ports so that it
doesn't drop traffic. PTP is link-local, so I need to run it on VLAN
uppers of the switch ports. Like this:
ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0
ip link set swp1 master br0
bridge vlan add dev swp0 vid 7 master
bridge vlan add dev swp1 vid 7 master
bridge vlan add dev br0 vid 7 self
ip link add link swp0 name swp0.7 type vlan id 7
ip link add link swp1 name swp1.7 type vlan id 7
ptp4l -i swp0.7 -i swp1.7 -m
How can I do that considering that you recommend avoiding conflicts
between the VLAN-aware bridge and 8021q uppers? Or is that true only
when the 8021q uppers are bridged?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev
2021-02-11 9:35 ` Vladimir Oltean
@ 2021-02-11 22:20 ` Ido Schimmel
0 siblings, 0 replies; 22+ messages in thread
From: Ido Schimmel @ 2021-02-11 22:20 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Thu, Feb 11, 2021 at 11:35:27AM +0200, Vladimir Oltean wrote:
> On Thu, Feb 11, 2021 at 09:44:43AM +0200, Ido Schimmel wrote:
> > On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > > > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > > > > return an error and callers such as team/bond need to learn to handle
> > > > > > it, but it seems patchable.
> > > > >
> > > > > Again, this was treated prior to my deletion in this series and not by
> > > > > erroring out, I just really didn't think it through.
> > > > >
> > > > > So you're saying that if we impose that all switchdev drivers restrict
> > > > > the house of cards to be constructed from the bottom up, and destructed
> > > > > from the top down, then the notification of bridge port flags can stay
> > > > > in the bridge layer?
> > > >
> > > > I actually don't think it's a good idea to have this in the bridge in
> > > > any case. I understand that it makes sense for some devices where
> > > > learning, flooding, etc are port attributes, but in other devices these
> > > > can be {port,vlan} attributes and then you need to take care of them
> > > > when a vlan is added / deleted and not only when a port is removed from
> > > > the bridge. So for such devices this really won't save anything. I would
> > > > thus leave it to the lower levels to decide.
> > >
> > > Just for my understanding, how are per-{port,vlan} attributes such as
> > > learning and flooding managed by the Linux bridge? How can I disable
> > > flooding only in a certain VLAN?
> >
> > You can't (currently). But it does not change the fact that in some
> > devices these are {port,vlan} attributes and we are talking here about
> > the interface towards these devices. Having these as {port,vlan}
> > attributes allows you to support use cases such as a port being enslaved
> > to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware
> > bridge(s).
>
> I don't think I understand the use case really. You mean something like this?
>
>                br1 (vlan_filtering=0)
>               /   \
>              /      \
>       swp0.100        \
>           |             \
>           |(vlan_filtering\
>           |  br0       =1)  \
>           | /   \             \
>           |/     \              \
>         swp0    swp1           swp2
>
> A packet received on swp0 with VLAN tag 100 will go to swp0.100 which
> will be forwarded according to the FDB of br1, and will be delivered to
> swp2 as untagged? Respectively in the other direction, a packet received
> on swp2 will have a VLAN 100 tag pushed on egress towards swp0, even if
> it is already VLAN-tagged?
>
> What do you even use this for?
The more common use case is to have multiple VLAN-unaware bridges
instead of one VLAN-aware bridge. I'm not aware of users that use the
hybrid model (VLAN-aware + VLAN-unaware). But regardless, this entails
treating above mentioned attributes as {port,vlan} attributes. A device
that only supports them as port attributes will have problems supporting
such a model.
> And also: if the {port,vlan} attributes can be simulated by making the
> bridge port be an 8021q upper of a physical interface, then as far as
> the bridge is concerned, they still are per-port attributes, and they
> are per-{port,vlan} only as far as the switch driver is concerned -
> therefore I don't see why it isn't okay for the bridge to notify the
> brport flags in exactly the same way for them too.
Look at this hunk from the patch:
@@ -343,6 +360,8 @@ static void del_nbp(struct net_bridge_port *p)
update_headroom(br, get_max_headroom(br));
netdev_reset_rx_headroom(dev);
+ nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS & ~BR_LEARNING,
+ BR_PORT_DEFAULT_FLAGS);
nbp_vlan_flush(p);
br_fdb_delete_by_port(br, p, 0, 1);
switchdev_deferred_process();
Devices that treat these attributes as {port,vlan} attributes will undo
this change upon the call to nbp_vlan_flush() when all the VLANs are
flushed.
>
> > Obviously you need to ensure there is no conflict between the
> > VLANs used by the VLAN-aware bridge and the VLAN device(s).
>
> On the other hand I think I have a more real-life use case that I think
> is in conflict with this last phrase.
> I have a VLAN-aware bridge and I want to run PTP in VLAN 7, but I also
> need to add VLAN 7 in the VLAN table of the bridge ports so that it
> doesn't drop traffic. PTP is link-local, so I need to run it on VLAN
> uppers of the switch ports. Like this:
>
> ip link add br0 type bridge vlan_filtering 1
> ip link set swp0 master br0
> ip link set swp1 master br0
> bridge vlan add dev swp0 vid 7 master
> bridge vlan add dev swp1 vid 7 master
> bridge vlan add dev br0 vid 7 self
> ip link add link swp0 name swp0.7 type vlan id 7
> ip link add link swp1 name swp1.7 type vlan id 7
> ptp4l -i swp0.7 -i swp1.7 -m
>
> How can I do that considering that you recommend avoiding conflicts
> between the VLAN-aware bridge and 8021q uppers? Or is that true only
> when the 8021q uppers are bridged?
The problem is with the statement "I also need to add VLAN 7 in the VLAN
table of the bridge ports so that it doesn't drop traffic". Packets with
VLAN 7 received by swp0 will be processed by swp0.7. br0 is irrelevant
and configuring swp0.7 should be enough in order to enable the VLAN
filter for VLAN 7 on swp0. I don't know the internals of the HW you are
working with, but I imagine that you would need to create a HW bridge
between {swp0, VLAN 7} and the CPU port so that all the traffic with
VLAN 7 will be sent / flooded to the CPU.
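
A small user-space sketch of the receive ordering described above: a frame
whose VID matches an 8021q upper of the port is handed to that upper, and
only other frames reach the bridge. All names (struct toy_rx_port,
toy_rx_classify(), the enum values) are hypothetical, and the model says
nothing about how any particular switch implements this in hardware.

```c
#include <assert.h>
#include <stddef.h>

enum toy_rx_target {
	TOY_RX_VLAN_UPPER,	/* delivered to the matching VLAN upper, e.g. swp0.7 */
	TOY_RX_BRIDGE,		/* handled by the bridge the port is enslaved to */
	TOY_RX_PORT,		/* standalone port, delivered on the port itself */
};

/* Hypothetical port model: the VIDs that have an 8021q upper, and bridge membership. */
struct toy_rx_port {
	const unsigned short *upper_vids;
	size_t num_upper_vids;
	int bridged;
};

/* VLAN uppers are consulted first; the bridge only sees what they do not claim. */
static enum toy_rx_target toy_rx_classify(const struct toy_rx_port *port,
					  unsigned short vid)
{
	size_t i;

	assert(port != NULL);

	for (i = 0; i < port->num_upper_vids; i++)
		if (port->upper_vids[i] == vid)
			return TOY_RX_VLAN_UPPER;

	return port->bridged ? TOY_RX_BRIDGE : TOY_RX_PORT;
}
```

In this model, adding VID 7 to the bridge VLAN table of swp0 changes nothing
for frames tagged with VLAN 7 once swp0.7 exists, which is the point made in
the reply above.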
^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] [PATCH v2 net-next 05/11] net: dsa: stop setting initial and final brport flags
[not found] <20210209151936.97382-1-olteanv@gmail.com>
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport Vladimir Oltean
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev Vladimir Oltean
@ 2021-02-09 15:19 ` Vladimir Oltean
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 06/11] net: squash switchdev attributes PRE_BRIDGE_FLAGS and BRIDGE_FLAGS Vladimir Oltean
` (4 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 15:19 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
From: Vladimir Oltean <vladimir.oltean@nxp.com>
With the bridge driver doing that for us now, we can simplify our
mid-layer logic a little bit, which would have otherwise needed some
tuning for the disabling of address learning that is necessary in
standalone mode.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Changes in v2:
None.
net/dsa/port.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 5e079a61528e..aa1cbba7f89f 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -132,11 +132,6 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
};
int err;
- /* Set the flooding mode before joining the port in the switch */
- err = dsa_port_bridge_flags(dp, BR_FLOOD | BR_MCAST_FLOOD);
- if (err)
- return err;
-
/* Here the interface is already bridged. Reflect the current
* configuration so that drivers can program their chips accordingly.
*/
@@ -145,10 +140,8 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_JOIN, &info);
/* The bridging is rolled back on error */
- if (err) {
- dsa_port_bridge_flags(dp, 0);
+ if (err)
dp->bridge_dev = NULL;
- }
return err;
}
@@ -172,9 +165,6 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
if (err)
pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");
- /* Port is leaving the bridge, disable flooding */
- dsa_port_bridge_flags(dp, 0);
-
/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
* so allow it to be in BR_STATE_FORWARDING to be kept functional
*/
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [Bridge] [PATCH v2 net-next 06/11] net: squash switchdev attributes PRE_BRIDGE_FLAGS and BRIDGE_FLAGS
[not found] <20210209151936.97382-1-olteanv@gmail.com>
` (2 preceding siblings ...)
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 05/11] net: dsa: stop setting initial and final brport flags Vladimir Oltean
@ 2021-02-09 15:19 ` Vladimir Oltean
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 07/11] net: dsa: kill .port_egress_floods overengineering Vladimir Oltean
` (3 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 15:19 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
From: Vladimir Oltean <vladimir.oltean@nxp.com>
There does not appear to be any strong reason why
br_switchdev_set_port_flag issues a separate notification for checking
the supported brport flags rather than just attempting to apply them and
propagating the error if that fails.
However, there is a reason why this switchdev API is counterproductive
for a driver writer, and that is because although br_switchdev_set_port_flag
gets passed a "flags" and a "mask", those are passed piecemeal to the
driver, so while the PRE_BRIDGE_FLAGS listener knows what changed
because it has the "mask", the BRIDGE_FLAGS listener doesn't, because it
only has the final value. This means that "edge detection" needs to be
done by each individual BRIDGE_FLAGS listener by XOR-ing the old and the
new flags, which in turn means that copying the flags into a driver
private variable is strictly necessary.
This can be solved by passing the "flags" and the "mask" together into
a single switchdev attribute, and it also reduces some boilerplate in
the drivers that offload this.
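
As a rough user-space illustration of the difference (not the kernel code
itself): with only the final value, a driver has to XOR against a cached
copy to find what changed, whereas with value and mask delivered together
it can act on the mask directly. The TOY_* bit values, struct
toy_brport_flags, struct toy_port and the helper names are all hypothetical
placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder bit values for illustration only, not the kernel's definitions. */
#define TOY_BR_LEARNING		(1UL << 0)
#define TOY_BR_FLOOD		(1UL << 1)
#define TOY_BR_MCAST_FLOOD	(1UL << 2)
#define TOY_BR_SUPPORTED	(TOY_BR_LEARNING | TOY_BR_FLOOD | TOY_BR_MCAST_FLOOD)

/* Hypothetical combined attribute: the new value plus which bits changed. */
struct toy_brport_flags {
	unsigned long val;
	unsigned long mask;
};

/* Hypothetical driver state: hw_flags models the chip, hw_writes counts accesses. */
struct toy_port {
	unsigned long hw_flags;
	unsigned int hw_writes;
};

/* Old scheme: only the final value arrives, so a cached copy must be XOR-ed. */
static unsigned long toy_changed_by_xor(unsigned long cached, unsigned long new_val)
{
	return cached ^ new_val;
}

static void toy_hw_write(struct toy_port *port, unsigned long bit, int on)
{
	if (on)
		port->hw_flags |= bit;
	else
		port->hw_flags &= ~bit;
	port->hw_writes++;
}

/* New scheme: validate the mask, then touch only the bits the mask names. */
static int toy_port_bridge_flags(struct toy_port *port, struct toy_brport_flags flags)
{
	unsigned long bit;

	assert(port != NULL);

	if (flags.mask & ~TOY_BR_SUPPORTED)
		return -EINVAL;

	for (bit = 1; bit <= TOY_BR_MCAST_FLOOD; bit <<= 1)
		if (flags.mask & bit)
			toy_hw_write(port, bit, !!(flags.val & bit));

	return 0;
}
```

The second form needs no driver-private copy of the flags just to work out
what changed, and the unsupported-flag check happens in the same call that
applies the change.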
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Changes in v2:
- Renamed "val" to "flags".
- Reworked drivers to check mask before performing any change.
.../marvell/prestera/prestera_switchdev.c | 29 +++++++------
.../mellanox/mlxsw/spectrum_switchdev.c | 28 ++++--------
drivers/net/ethernet/rocker/rocker_main.c | 24 ++---------
drivers/net/ethernet/ti/cpsw_switchdev.c | 32 ++++++--------
drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 43 +++++++++----------
include/net/switchdev.h | 8 +++-
net/bridge/br_switchdev.c | 15 ++-----
net/dsa/dsa_priv.h | 4 +-
net/dsa/port.c | 22 +++-------
net/dsa/slave.c | 3 --
10 files changed, 78 insertions(+), 130 deletions(-)
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
index 2c1619715a4b..a797a7ff0cfe 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
@@ -581,24 +581,32 @@ int prestera_bridge_port_event(struct net_device *dev, unsigned long event,
static int prestera_port_attr_br_flags_set(struct prestera_port *port,
struct net_device *dev,
- unsigned long flags)
+ struct switchdev_brport_flags flags)
{
struct prestera_bridge_port *br_port;
int err;
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
br_port = prestera_bridge_port_by_dev(port->sw->swdev, dev);
if (!br_port)
return 0;
- err = prestera_hw_port_flood_set(port, flags & BR_FLOOD);
- if (err)
- return err;
+ if (flags.mask & BR_FLOOD) {
+ err = prestera_hw_port_flood_set(port, flags.val & BR_FLOOD);
+ if (err)
+ return err;
+ }
- err = prestera_hw_port_learning_set(port, flags & BR_LEARNING);
- if (err)
- return err;
+ if (flags.mask & BR_LEARNING) {
+ err = prestera_hw_port_learning_set(port,
+ flags.val & BR_LEARNING);
+ if (err)
+ return err;
+ }
- memcpy(&br_port->flags, &flags, sizeof(flags));
+ memcpy(&br_port->flags, &flags.val, sizeof(flags.val));
return 0;
}
@@ -706,11 +714,6 @@ static int prestera_port_obj_attr_set(struct net_device *dev,
err = prestera_port_attr_stp_state_set(port, attr->orig_dev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- if (attr->u.brport_flags &
- ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
- err = -EINVAL;
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = prestera_port_attr_br_flags_set(port, attr->orig_dev,
attr->u.brport_flags);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 18e4f1cd5587..0a8521adb4e9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -653,23 +653,16 @@ mlxsw_sp_bridge_port_learning_set(struct mlxsw_sp_port *mlxsw_sp_port,
return err;
}
-static int mlxsw_sp_port_attr_br_pre_flags_set(struct mlxsw_sp_port
- *mlxsw_sp_port,
- unsigned long brport_flags)
-{
- if (brport_flags & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
- return -EINVAL;
-
- return 0;
-}
-
static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
struct net_device *orig_dev,
- unsigned long brport_flags)
+ struct switchdev_brport_flags flags)
{
struct mlxsw_sp_bridge_port *bridge_port;
int err;
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
bridge_port = mlxsw_sp_bridge_port_find(mlxsw_sp_port->mlxsw_sp->bridge,
orig_dev);
if (!bridge_port)
@@ -677,12 +670,12 @@ static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port, bridge_port,
MLXSW_SP_FLOOD_TYPE_UC,
- brport_flags & BR_FLOOD);
+ flags.val & BR_FLOOD);
if (err)
return err;
err = mlxsw_sp_bridge_port_learning_set(mlxsw_sp_port, bridge_port,
- brport_flags & BR_LEARNING);
+ flags.val & BR_LEARNING);
if (err)
return err;
@@ -691,13 +684,12 @@ static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port, bridge_port,
MLXSW_SP_FLOOD_TYPE_MC,
- brport_flags &
- BR_MCAST_FLOOD);
+ flags.val & BR_MCAST_FLOOD);
if (err)
return err;
out:
- memcpy(&bridge_port->flags, &brport_flags, sizeof(brport_flags));
+ memcpy(&bridge_port->flags, &flags.val, sizeof(flags.val));
return 0;
}
@@ -899,10 +891,6 @@ static int mlxsw_sp_port_attr_set(struct net_device *dev,
attr->orig_dev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = mlxsw_sp_port_attr_br_pre_flags_set(mlxsw_sp_port,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = mlxsw_sp_port_attr_br_flags_set(mlxsw_sp_port,
attr->orig_dev,
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 740a715c49c6..898abf3d14d0 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1575,8 +1575,8 @@ rocker_world_port_attr_bridge_flags_support_get(const struct rocker_port *
}
static int
-rocker_world_port_attr_pre_bridge_flags_set(struct rocker_port *rocker_port,
- unsigned long brport_flags)
+rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
+ struct switchdev_brport_flags flags)
{
struct rocker_world_ops *wops = rocker_port->rocker->wops;
unsigned long brport_flags_s;
@@ -1590,22 +1590,10 @@ rocker_world_port_attr_pre_bridge_flags_set(struct rocker_port *rocker_port,
if (err)
return err;
- if (brport_flags & ~brport_flags_s)
+ if (flags.mask & ~brport_flags_s)
return -EINVAL;
- return 0;
-}
-
-static int
-rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
- unsigned long brport_flags)
-{
- struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
- if (!wops->port_attr_bridge_flags_set)
- return -EOPNOTSUPP;
-
- return wops->port_attr_bridge_flags_set(rocker_port, brport_flags);
+ return wops->port_attr_bridge_flags_set(rocker_port, flags.val);
}
static int
@@ -2056,10 +2044,6 @@ static int rocker_port_attr_set(struct net_device *dev,
err = rocker_world_port_attr_stp_state_set(rocker_port,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = rocker_world_port_attr_pre_bridge_flags_set(rocker_port,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = rocker_world_port_attr_bridge_flags_set(rocker_port,
attr->u.brport_flags);
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
index 13524cbaa8b6..5d8ec34f82ad 100644
--- a/drivers/net/ethernet/ti/cpsw_switchdev.c
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
@@ -57,27 +57,25 @@ static int cpsw_port_stp_state_set(struct cpsw_priv *priv, u8 state)
static int cpsw_port_attr_br_flags_set(struct cpsw_priv *priv,
struct net_device *orig_dev,
- unsigned long brport_flags)
+ struct switchdev_brport_flags flags)
{
struct cpsw_common *cpsw = priv->cpsw;
- bool unreg_mcast_add = false;
- if (brport_flags & BR_MCAST_FLOOD)
- unreg_mcast_add = true;
- dev_dbg(priv->dev, "BR_MCAST_FLOOD: %d port %u\n",
- unreg_mcast_add, priv->emac_port);
+ if (flags.mask & ~(BR_LEARNING | BR_MCAST_FLOOD))
+ return -EINVAL;
- cpsw_ale_set_unreg_mcast(cpsw->ale, BIT(priv->emac_port),
- unreg_mcast_add);
+ if (flags.mask & BR_MCAST_FLOOD) {
+ bool unreg_mcast_add = false;
- return 0;
-}
+ if (flags.val & BR_MCAST_FLOOD)
+ unreg_mcast_add = true;
-static int cpsw_port_attr_br_flags_pre_set(struct net_device *netdev,
- unsigned long flags)
-{
- if (flags & ~(BR_LEARNING | BR_MCAST_FLOOD))
- return -EINVAL;
+ dev_dbg(priv->dev, "BR_MCAST_FLOOD: %d port %u\n",
+ unreg_mcast_add, priv->emac_port);
+
+ cpsw_ale_set_unreg_mcast(cpsw->ale, BIT(priv->emac_port),
+ unreg_mcast_add);
+ }
return 0;
}
@@ -92,10 +90,6 @@ static int cpsw_port_attr_set(struct net_device *ndev,
dev_dbg(priv->dev, "attr: id %u port: %u\n", attr->id, priv->emac_port);
switch (attr->id) {
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- ret = cpsw_port_attr_br_flags_pre_set(ndev,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
ret = cpsw_port_stp_state_set(priv, attr->u.stp_state);
dev_dbg(priv->dev, "stp state: %u\n", attr->u.stp_state);
diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index ca3d07fe7f58..f675a2ba4dce 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -908,31 +908,32 @@ static int dpaa2_switch_port_attr_stp_state_set(struct net_device *netdev,
return dpaa2_switch_port_set_stp_state(port_priv, state);
}
-static int dpaa2_switch_port_attr_br_flags_pre_set(struct net_device *netdev,
- unsigned long flags)
-{
- if (flags & ~(BR_LEARNING | BR_FLOOD))
- return -EINVAL;
-
- return 0;
-}
-
-static int dpaa2_switch_port_attr_br_flags_set(struct net_device *netdev,
- unsigned long flags)
+static int
+dpaa2_switch_port_attr_br_flags_set(struct net_device *netdev,
+ struct switchdev_brport_flags flags)
{
struct ethsw_port_priv *port_priv = netdev_priv(netdev);
int err = 0;
- /* Learning is enabled per switch */
- err = dpaa2_switch_set_learning(port_priv->ethsw_data,
- !!(flags & BR_LEARNING));
- if (err)
- goto exit;
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD))
+ return -EINVAL;
+
+ if (flags.mask & BR_LEARNING) {
+ /* Learning is enabled per switch */
+ err = dpaa2_switch_set_learning(port_priv->ethsw_data,
+ !!(flags.val & BR_LEARNING));
+ if (err)
+ return err;
+ }
- err = dpaa2_switch_port_set_flood(port_priv, !!(flags & BR_FLOOD));
+ if (flags.mask & BR_FLOOD) {
+ err = dpaa2_switch_port_set_flood(port_priv,
+ !!(flags.val & BR_FLOOD));
+ if (err)
+ return err;
+ }
-exit:
- return err;
+ return 0;
}
static int dpaa2_switch_port_attr_set(struct net_device *netdev,
@@ -945,10 +946,6 @@ static int dpaa2_switch_port_attr_set(struct net_device *netdev,
err = dpaa2_switch_port_attr_stp_state_set(netdev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = dpaa2_switch_port_attr_br_flags_pre_set(netdev,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = dpaa2_switch_port_attr_br_flags_set(netdev,
attr->u.brport_flags);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 84c765312001..aa9cad9bad7d 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -20,7 +20,6 @@ enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_UNDEFINED,
SWITCHDEV_ATTR_ID_PORT_STP_STATE,
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
- SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
SWITCHDEV_ATTR_ID_PORT_MROUTER,
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
@@ -33,6 +32,11 @@ enum switchdev_attr_id {
#endif
};
+struct switchdev_brport_flags {
+ unsigned long val;
+ unsigned long mask;
+};
+
struct switchdev_attr {
struct net_device *orig_dev;
enum switchdev_attr_id id;
@@ -41,7 +45,7 @@ struct switchdev_attr {
void (*complete)(struct net_device *dev, int err, void *priv);
union {
u8 stp_state; /* PORT_STP_STATE */
- unsigned long brport_flags; /* PORT_{PRE}_BRIDGE_FLAGS */
+ struct switchdev_brport_flags brport_flags; /* PORT_BRIDGE_FLAGS */
bool mrouter; /* PORT_MROUTER */
clock_t ageing_time; /* BRIDGE_AGEING_TIME */
bool vlan_filtering; /* BRIDGE_VLAN_FILTERING */
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index 1fae532cfbb1..bc63b10b2e67 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -64,7 +64,7 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
{
struct switchdev_attr attr = {
.orig_dev = p->dev,
- .id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
+ .id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
};
struct switchdev_notifier_port_attr_info info = {
.attr = &attr,
@@ -76,7 +76,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
if (!mask)
return 0;
- attr.u.brport_flags = mask;
+ attr.u.brport_flags.val = flags;
+ attr.u.brport_flags.mask = mask;
/* We run from atomic context here */
err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
@@ -92,16 +93,6 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
return -EOPNOTSUPP;
}
- attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
- attr.flags = SWITCHDEV_F_DEFER;
- attr.u.brport_flags = flags;
-
- err = switchdev_port_attr_set(p->dev, &attr);
- if (err) {
- NL_SET_ERR_MSG_MOD(extack, "error setting offload flag on port");
- return err;
- }
-
return 0;
}
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 8a1bcb2b4208..63770e421e4d 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -174,8 +174,8 @@ int dsa_port_mdb_add(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
int dsa_port_mdb_del(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
-int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags);
-int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags);
+int dsa_port_bridge_flags(const struct dsa_port *dp,
+ struct switchdev_brport_flags flags);
int dsa_port_mrouter(struct dsa_port *dp, bool mrouter);
int dsa_port_vlan_add(struct dsa_port *dp,
const struct switchdev_obj_port_vlan *vlan);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index aa1cbba7f89f..597d3d3eb507 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -382,28 +382,18 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock)
return 0;
}
-int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags)
+int dsa_port_bridge_flags(const struct dsa_port *dp,
+ struct switchdev_brport_flags flags)
{
struct dsa_switch *ds = dp->ds;
+ int port = dp->index;
if (!ds->ops->port_egress_floods ||
- (flags & ~(BR_FLOOD | BR_MCAST_FLOOD)))
+ (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD)))
return -EINVAL;
- return 0;
-}
-
-int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags)
-{
- struct dsa_switch *ds = dp->ds;
- int port = dp->index;
- int err = 0;
-
- if (ds->ops->port_egress_floods)
- err = ds->ops->port_egress_floods(ds, port, flags & BR_FLOOD,
- flags & BR_MCAST_FLOOD);
-
- return err;
+ return ds->ops->port_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
}
int dsa_port_mrouter(struct dsa_port *dp, bool mrouter)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8f4c7c232e2c..0e1f8f1d4e2c 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -290,9 +290,6 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
ret = dsa_port_ageing_time(dp, attr->u.ageing_time);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- ret = dsa_port_pre_bridge_flags(dp, attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
ret = dsa_port_bridge_flags(dp, attr->u.brport_flags);
break;
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [Bridge] [PATCH v2 net-next 07/11] net: dsa: kill .port_egress_floods overengineering
[not found] <20210209151936.97382-1-olteanv@gmail.com>
` (3 preceding siblings ...)
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 06/11] net: squash switchdev attributes PRE_BRIDGE_FLAGS and BRIDGE_FLAGS Vladimir Oltean
@ 2021-02-09 15:19 ` Vladimir Oltean
2021-02-09 20:37 ` Vladimir Oltean
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 08/11] net: bridge: put SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS on the blocking call chain Vladimir Oltean
` (2 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 15:19 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
From: Vladimir Oltean <vladimir.oltean@nxp.com>
The bridge offloads the port flags through a single bit mask using
switchdev, which, among others, contains learning and flooding settings.
The commit 57652796aa97 ("net: dsa: add support for bridge flags")
missed one crucial aspect of the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS API
when designing the API one level lower, towards the drivers.
This is that the bitmask of passed brport flags never has more than one
bit set at a time. On the other hand, the prototype passed to the driver
is .port_egress_floods(int port, bool unicast, bool multicast), which
configures two flags at a time.
DSA currently checks if .port_egress_floods is implemented, and if it
is, reports both BR_FLOOD and BR_MCAST_FLOOD as supported. So the driver
has no choice if it wants to inform the bridge that, for example, it
can't configure unicast flooding independently of multicast flooding -
the DSA mid layer is standing in the way. Or the other way around: a new
driver wants to start configuring BR_BCAST_FLOOD separately, but what do
we do with the rest, which only support unicast and multicast flooding?
Do we report broadcast flooding configuration as supported for those
too, and silently do nothing?
Secondly, currently DSA deems the driver too dumb to deserve knowing that
a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
just calls .port_egress_floods for the CPU port. When we add support
for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
problem because the flood settings will need to be held statefully in
the DSA middle layer, otherwise changing the mrouter port attribute will
impact the flooding attribute. And that's _assuming_ that the underlying
hardware doesn't have anything else to do when a multicast router
attaches to a port than flood unknown traffic to it. If it does, there
will need to be a dedicated .port_set_mrouter anyway.
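As a rough user-space sketch of the coupling described above (this is
not kernel code; struct demo_port and both demo_* functions are
hypothetical stand-ins), routing an mrouter change through a combined
unicast/multicast callback overwrites whatever unicast flooding state
was configured before, if the same pattern were applied to a user port:

```c
#include <stdbool.h>

/* Hypothetical per-port hardware state, for illustration only. */
struct demo_port {
	bool uc_flood;
	bool mc_flood;
};

/* Same shape as .port_egress_floods: both settings are written at once. */
static int demo_egress_floods(struct demo_port *p, bool unicast,
			      bool multicast)
{
	p->uc_flood = unicast;
	p->mc_flood = multicast;
	return 0;
}

/*
 * Same shape as the logic removed from dsa_port_mrouter: the unicast
 * argument is hardcoded to true, so any earlier "flood off" setting on
 * this port is lost unless the caller keeps that state somewhere else.
 */
static int demo_mrouter(struct demo_port *p, bool mrouter)
{
	return demo_egress_floods(p, true, mrouter);
}
```

Calling demo_egress_floods(&p, false, false) and then
demo_mrouter(&p, true) leaves p.uc_flood set to true, which is the
side effect a dedicated .port_set_mrouter lets the driver avoid.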
Lastly, we have DSA drivers that have a backlink into a pure switchdev
driver (felix -> ocelot). It seems reasonable that the other switchdev
drivers should not have to suffer from the oddities of DSA overengineering,
so keeping DSA a pass-through layer makes more sense there.
To simplify the brport flags situation we just delete .port_egress_floods
and we introduce a simple .port_bridge_flags which is passed to the
driver. Also, the logic from dsa_port_mrouter is removed and a
.port_set_mrouter is created.
Functionally speaking, we simply move the calls to .port_egress_floods
one step lower, in the two drivers that implement it: mv88e6xxx and b53,
so things should work just as before.
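For illustration, a minimal user-space sketch of how a driver might
consume the val/mask pair in a .port_bridge_flags style callback. The
BR_* values below are stand-ins rather than the kernel's definitions,
and struct demo_port and demo_port_bridge_flags are hypothetical names.
Unlike the b53 and mv88e6xxx conversions in this patch, which forward
both booleans unconditionally, this variant only touches the settings
whose bit is present in the mask:

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in flag bits; the values are illustrative, not the kernel's. */
#define BR_LEARNING    (1UL << 0)
#define BR_FLOOD       (1UL << 1)
#define BR_MCAST_FLOOD (1UL << 2)
#define BR_BCAST_FLOOD (1UL << 3)

/* Same layout as the struct added to include/net/switchdev.h. */
struct switchdev_brport_flags {
	unsigned long val;
	unsigned long mask;
};

/* Hypothetical per-port hardware state. */
struct demo_port {
	bool uc_flood;
	bool mc_flood;
};

static int demo_port_bridge_flags(struct demo_port *p,
				  struct switchdev_brport_flags flags)
{
	/* Refuse the whole request if it names a flag we cannot offload. */
	if (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD))
		return -EINVAL;

	/* Only settings selected by the mask are modified. */
	if (flags.mask & BR_FLOOD)
		p->uc_flood = !!(flags.val & BR_FLOOD);

	if (flags.mask & BR_MCAST_FLOOD)
		p->mc_flood = !!(flags.val & BR_MCAST_FLOOD);

	return 0;
}
```

With this shape, a request carrying mask = BR_MCAST_FLOOD and val = 0
turns multicast flooding off and leaves unicast flooding as it was,
while a request naming BR_BCAST_FLOOD is rejected before any state
changes.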
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Changes in v2:
- Reordered with previous patch such that we don't need to introduce
.port_pre_bridge_flags
- Pass extack to drivers.
drivers/net/dsa/b53/b53_common.c | 20 +++++++++++++++++++-
drivers/net/dsa/mv88e6xxx/chip.c | 21 ++++++++++++++++++++-
include/net/dsa.h | 7 +++++--
net/dsa/dsa_priv.h | 6 ++++--
net/dsa/port.c | 18 ++++++++----------
net/dsa/slave.c | 4 ++--
6 files changed, 58 insertions(+), 18 deletions(-)
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 23fc7225c8d1..d480493cb64d 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1948,6 +1948,23 @@ int b53_br_egress_floods(struct dsa_switch *ds, int port,
}
EXPORT_SYMBOL(b53_br_egress_floods);
+static int b53_br_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ if (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
+ return b53_br_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
+}
+
+static int b53_set_mrouter(struct dsa_switch *ds, int port, bool mrouter,
+ struct netlink_ext_ack *extack)
+{
+ return b53_br_egress_floods(ds, port, true, mrouter);
+}
+
static bool b53_possible_cpu_port(struct dsa_switch *ds, int port)
{
/* Broadcom switches will accept enabling Broadcom tags on the
@@ -2187,9 +2204,10 @@ static const struct dsa_switch_ops b53_switch_ops = {
.set_mac_eee = b53_set_mac_eee,
.port_bridge_join = b53_br_join,
.port_bridge_leave = b53_br_leave,
+ .port_bridge_flags = b53_br_flags,
+ .port_set_mrouter = b53_set_mrouter,
.port_stp_state_set = b53_br_set_stp_state,
.port_fast_age = b53_br_fast_age,
- .port_egress_floods = b53_br_egress_floods,
.port_vlan_filtering = b53_vlan_filtering,
.port_vlan_add = b53_vlan_add,
.port_vlan_del = b53_vlan_del,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index ae0b490f00cd..b230bfcc4050 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -5380,6 +5380,24 @@ static int mv88e6xxx_port_egress_floods(struct dsa_switch *ds, int port,
return err;
}
+static int mv88e6xxx_port_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ if (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
+ return mv88e6xxx_port_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
+}
+
+static int mv88e6xxx_port_set_mrouter(struct dsa_switch *ds, int port,
+ bool mrouter,
+ struct netlink_ext_ack *extack)
+{
+ return mv88e6xxx_port_egress_floods(ds, port, true, mrouter);
+}
+
static bool mv88e6xxx_lag_can_offload(struct dsa_switch *ds,
struct net_device *lag,
struct netdev_lag_upper_info *info)
@@ -5678,7 +5696,8 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
.set_ageing_time = mv88e6xxx_set_ageing_time,
.port_bridge_join = mv88e6xxx_port_bridge_join,
.port_bridge_leave = mv88e6xxx_port_bridge_leave,
- .port_egress_floods = mv88e6xxx_port_egress_floods,
+ .port_bridge_flags = mv88e6xxx_port_bridge_flags,
+ .port_set_mrouter = mv88e6xxx_port_set_mrouter,
.port_stp_state_set = mv88e6xxx_port_stp_state_set,
.port_fast_age = mv88e6xxx_port_fast_age,
.port_vlan_filtering = mv88e6xxx_port_vlan_filtering,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 60acb9fca124..09aa28e667c7 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -621,8 +621,11 @@ struct dsa_switch_ops {
void (*port_stp_state_set)(struct dsa_switch *ds, int port,
u8 state);
void (*port_fast_age)(struct dsa_switch *ds, int port);
- int (*port_egress_floods)(struct dsa_switch *ds, int port,
- bool unicast, bool multicast);
+ int (*port_bridge_flags)(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack);
+ int (*port_set_mrouter)(struct dsa_switch *ds, int port, bool mrouter,
+ struct netlink_ext_ack *extack);
/*
* VLAN support
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 63770e421e4d..8125806ee135 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -175,8 +175,10 @@ int dsa_port_mdb_add(const struct dsa_port *dp,
int dsa_port_mdb_del(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
int dsa_port_bridge_flags(const struct dsa_port *dp,
- struct switchdev_brport_flags flags);
-int dsa_port_mrouter(struct dsa_port *dp, bool mrouter);
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack);
+int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
+ struct netlink_ext_ack *extack);
int dsa_port_vlan_add(struct dsa_port *dp,
const struct switchdev_obj_port_vlan *vlan);
int dsa_port_vlan_del(struct dsa_port *dp,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 597d3d3eb507..be5b2244667b 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -383,28 +383,26 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock)
}
int dsa_port_bridge_flags(const struct dsa_port *dp,
- struct switchdev_brport_flags flags)
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
{
struct dsa_switch *ds = dp->ds;
- int port = dp->index;
- if (!ds->ops->port_egress_floods ||
- (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD)))
+ if (!ds->ops->port_bridge_flags)
return -EINVAL;
- return ds->ops->port_egress_floods(ds, port, flags.val & BR_FLOOD,
- flags.val & BR_MCAST_FLOOD);
+ return ds->ops->port_bridge_flags(ds, dp->index, flags, extack);
}
-int dsa_port_mrouter(struct dsa_port *dp, bool mrouter)
+int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
+ struct netlink_ext_ack *extack)
{
struct dsa_switch *ds = dp->ds;
- int port = dp->index;
- if (!ds->ops->port_egress_floods)
+ if (!ds->ops->port_set_mrouter)
return -EOPNOTSUPP;
- return ds->ops->port_egress_floods(ds, port, true, mrouter);
+ return ds->ops->port_set_mrouter(ds, dp->index, mrouter, extack);
}
int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0e1f8f1d4e2c..4a979245e059 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -291,10 +291,10 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
ret = dsa_port_ageing_time(dp, attr->u.ageing_time);
break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
- ret = dsa_port_bridge_flags(dp, attr->u.brport_flags);
+ ret = dsa_port_bridge_flags(dp, attr->u.brport_flags, extack);
break;
case SWITCHDEV_ATTR_ID_BRIDGE_MROUTER:
- ret = dsa_port_mrouter(dp->cpu_dp, attr->u.mrouter);
+ ret = dsa_port_mrouter(dp->cpu_dp, attr->u.mrouter, extack);
break;
default:
ret = -EOPNOTSUPP;
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Bridge] [PATCH v2 net-next 07/11] net: dsa: kill .port_egress_floods overengineering
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 07/11] net: dsa: kill .port_egress_floods overengineering Vladimir Oltean
@ 2021-02-09 20:37 ` Vladimir Oltean
2021-02-09 21:29 ` Florian Fainelli
0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 20:37 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
On Tue, Feb 09, 2021 at 05:19:32PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> The bridge offloads the port flags through a single bit mask using
> switchdev, which, among others, contains learning and flooding settings.
>
> The commit 57652796aa97 ("net: dsa: add support for bridge flags")
> missed one crucial aspect of the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS API
> when designing the API one level lower, towards the drivers.
> This is that the bitmask of passed brport flags never has more than one
> bit set at a time. On the other hand, the prototype passed to the driver
> is .port_egress_floods(int port, bool unicast, bool multicast), which
> configures two flags at a time.
>
> DSA currently checks if .port_egress_floods is implemented, and if it
> is, reports both BR_FLOOD and BR_MCAST_FLOOD as supported. So the driver
> has no choice if it wants to inform the bridge that, for example, it
> can't configure unicast flooding independently of multicast flooding -
> the DSA mid layer is standing in the way. Or the other way around: a new
> driver wants to start configuring BR_BCAST_FLOOD separately, but what do
> we do with the rest, which only support unicast and multicast flooding?
> Do we report broadcast flooding configuration as supported for those
> too, and silently do nothing?
>
> Secondly, currently DSA deems the driver too dumb to deserve knowing that
> a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
> just calls .port_egress_floods for the CPU port. When we add support
> for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
> problem because the flood settings will need to be held statefully in
> the DSA middle layer, otherwise changing the mrouter port attribute will
> impact the flooding attribute. And that's _assuming_ that the underlying
> hardware doesn't have anything else to do when a multicast router
> attaches to a port than flood unknown traffic to it. If it does, there
> will need to be a dedicated .port_set_mrouter anyway.
>
> Lastly, we have DSA drivers that have a backlink into a pure switchdev
> driver (felix -> ocelot). It seems reasonable that the other switchdev
> drivers should not have to suffer from the oddities of DSA overengineering,
> so keeping DSA a pass-through layer makes more sense there.
>
> To simplify the brport flags situation we just delete .port_egress_floods
> and we introduce a simple .port_bridge_flags which is passed to the
> driver. Also, the logic from dsa_port_mrouter is removed and a
> .port_set_mrouter is created.
>
> Functionally speaking, we simply move the calls to .port_egress_floods
> one step lower, in the two drivers that implement it: mv88e6xxx and b53,
> so things should work just as before.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
Florian, Andrew, what are your opinions on this patch? I guess what I
dislike the most about .port_egress_floods is that it combines the
unicast and multicast settings in the same callback, for no good
apparent reason. So that, at the very least, needs to change.
What do you prefer between having:
.port_set_unicast_floods
.port_set_multicast_floods
.port_set_broadcast_floods
.port_set_learning
and a single:
.port_bridge_flags
?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] [PATCH v2 net-next 07/11] net: dsa: kill .port_egress_floods overengineering
2021-02-09 20:37 ` Vladimir Oltean
@ 2021-02-09 21:29 ` Florian Fainelli
0 siblings, 0 replies; 22+ messages in thread
From: Florian Fainelli @ 2021-02-09 21:29 UTC (permalink / raw)
To: Vladimir Oltean, Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Grygorii Strashko,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Nikolay Aleksandrov, Roopa Prabhu, linux-omap,
Vivien Didelot
On 2/9/21 12:37 PM, Vladimir Oltean wrote:
> On Tue, Feb 09, 2021 at 05:19:32PM +0200, Vladimir Oltean wrote:
>> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>>
>> The bridge offloads the port flags through a single bit mask using
>> switchdev, which, among others, contains learning and flooding settings.
>>
>> The commit 57652796aa97 ("net: dsa: add support for bridge flags")
>> missed one crucial aspect of the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS API
>> when designing the API one level lower, towards the drivers.
>> This is that the bitmask of passed brport flags never has more than one
>> bit set at a time. On the other hand, the prototype passed to the driver
>> is .port_egress_floods(int port, bool unicast, bool multicast), which
>> configures two flags at a time.
>>
>> DSA currently checks if .port_egress_floods is implemented, and if it
>> is, reports both BR_FLOOD and BR_MCAST_FLOOD as supported. So the driver
>> has no choice if it wants to inform the bridge that, for example, it
>> can't configure unicast flooding independently of multicast flooding -
>> the DSA mid layer is standing in the way. Or the other way around: a new
>> driver wants to start configuring BR_BCAST_FLOOD separately, but what do
>> we do with the rest, which only support unicast and multicast flooding?
>> Do we report broadcast flooding configuration as supported for those
>> too, and silently do nothing?
>>
>> Secondly, currently DSA deems the driver too dumb to deserve knowing that
>> a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
>> just calls .port_egress_floods for the CPU port. When we add support
>> for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
>> problem because the flood settings will need to be held statefully in
>> the DSA middle layer, otherwise changing the mrouter port attribute will
>> impact the flooding attribute. And that's _assuming_ that the underlying
>> hardware doesn't have anything else to do when a multicast router
>> attaches to a port than flood unknown traffic to it. If it does, there
>> will need to be a dedicated .port_set_mrouter anyway.
>>
>> Lastly, we have DSA drivers that have a backlink into a pure switchdev
>> driver (felix -> ocelot). It seems reasonable that the other switchdev
>> drivers should not have to suffer from the oddities of DSA overengineering,
>> so keeping DSA a pass-through layer makes more sense there.
>>
>> To simplify the brport flags situation we just delete .port_egress_floods
>> and we introduce a simple .port_bridge_flags which is passed to the
>> driver. Also, the logic from dsa_port_mrouter is removed and a
>> .port_set_mrouter is created.
>>
>> Functionally speaking, we simply move the calls to .port_egress_floods
>> one step lower, in the two drivers that implement it: mv88e6xxx and b53,
>> so things should work just as before.
>>
>> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>> ---
>
> Florian, Andrew, what are your opinions on this patch? I guess what I
> dislike the most about .port_egress_floods is that it combines the
> unicast and multicast settings in the same callback, for no good
> apparent reason. So that, at the very least, needs to change.
> What do you prefer between having:
> .port_set_unicast_floods
> .port_set_multicast_floods
> .port_set_broadcast_floods
> .port_set_learning
> and a single:
> .port_bridge_flags
Tough one, from a driver writer perspective the fewer callbacks to wire
up the better, but from a framework perspective it is certainly easier
to audit drivers if there is a callback for a narrow and specific use
case. My vote goes for the single callback, that would lead to an easier
patch set to review IMHO.
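To make the trade-off concrete, here is a user-space sketch of what the
framework side might have to do under the four-callback option: fan a
single val/mask pair out into narrow per-setting calls. Everything
prefixed demo_ is hypothetical, the BR_* values are stand-ins rather
than the kernel's, and mask bits outside the table are simply ignored
in this sketch:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits; the values are illustrative, not the kernel's. */
#define BR_LEARNING    (1UL << 0)
#define BR_FLOOD       (1UL << 1)
#define BR_MCAST_FLOOD (1UL << 2)
#define BR_BCAST_FLOOD (1UL << 3)

struct switchdev_brport_flags {
	unsigned long val;
	unsigned long mask;
};

/* Hypothetical ops table for the four-callback alternative. */
struct demo_narrow_ops {
	int (*port_set_unicast_floods)(int port, bool on);
	int (*port_set_multicast_floods)(int port, bool on);
	int (*port_set_broadcast_floods)(int port, bool on);
	int (*port_set_learning)(int port, bool on);
};

/* Framework-side fan-out: one val/mask pair becomes N narrow calls. */
static int demo_fan_out(const struct demo_narrow_ops *ops, int port,
			struct switchdev_brport_flags flags)
{
	const struct {
		unsigned long bit;
		int (*set)(int port, bool on);
	} map[] = {
		{ BR_FLOOD,       ops->port_set_unicast_floods },
		{ BR_MCAST_FLOOD, ops->port_set_multicast_floods },
		{ BR_BCAST_FLOOD, ops->port_set_broadcast_floods },
		{ BR_LEARNING,    ops->port_set_learning },
	};
	size_t n = sizeof(map) / sizeof(map[0]);
	size_t i;
	int err;

	/* Reject the whole request before touching any setting. */
	for (i = 0; i < n; i++)
		if ((flags.mask & map[i].bit) && !map[i].set)
			return -EOPNOTSUPP;

	for (i = 0; i < n; i++) {
		if (!(flags.mask & map[i].bit))
			continue;
		err = map[i].set(port, !!(flags.val & map[i].bit));
		if (err)
			return err;
	}

	return 0;
}

/* Hypothetical driver wiring up only unicast and multicast flooding. */
static bool demo_uc_flood, demo_mc_flood;

static int demo_set_uc(int port, bool on)
{
	(void)port;
	demo_uc_flood = on;
	return 0;
}

static int demo_set_mc(int port, bool on)
{
	(void)port;
	demo_mc_flood = on;
	return 0;
}

static const struct demo_narrow_ops demo_ops = {
	.port_set_unicast_floods = demo_set_uc,
	.port_set_multicast_floods = demo_set_mc,
};
```

The narrow option makes "what does this driver support" visible from
the ops table alone, at the cost of the mapping table above living in
the framework. With the single .port_bridge_flags callback, the mask
check moves into each driver instead, as b53_br_flags does in the patch.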
--
Florian
^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] [PATCH v2 net-next 08/11] net: bridge: put SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS on the blocking call chain
[not found] <20210209151936.97382-1-olteanv@gmail.com>
` (4 preceding siblings ...)
2021-02-09 15:19 ` [Bridge] [PATCH v2 net-next 07/11] net: dsa: kill .port_egress_floods overengineering Vladimir Oltean
@ 2021-02-09 15:19 ` Vladimir Oltean
[not found] ` <20210209151936.97382-4-olteanv@gmail.com>
[not found] ` <20210209151936.97382-2-olteanv@gmail.com>
7 siblings, 0 replies; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 15:19 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Now that br_switchdev_set_port_flag is never called from under br->lock,
it runs in sleepable context.
All switchdev drivers handle SWITCHDEV_PORT_ATTR_SET as both blocking
and atomic, so no changes are needed on that front.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Changes in v2:
Patch is new.
net/bridge/br_switchdev.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index bc63b10b2e67..3b152f2cd9b5 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -79,9 +79,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
attr.u.brport_flags.val = flags;
attr.u.brport_flags.mask = mask;
- /* We run from atomic context here */
- err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
- &info.info, extack);
+ err = call_switchdev_blocking_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
+ &info.info, extack);
err = notifier_to_errno(err);
if (err == -EOPNOTSUPP)
return 0;
--
2.25.1
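
For readers following the error handling around the changed call, here is a minimal userspace sketch of how a notifier return value becomes an errno and how -EOPNOTSUPP is then treated as "nothing to offload". It is not kernel code: the model_* names are hypothetical, the constants and the two conversion helpers mirror the encoding in include/linux/notifier.h only as an assumption, and what happens to errors other than -EOPNOTSUPP is not visible in the hunk, so passing them through unchanged is also an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model only. These values mirror, as an assumption, the
 * notifier encoding used by include/linux/notifier.h. */
#define MODEL_NOTIFY_OK        0x0001
#define MODEL_NOTIFY_STOP_MASK 0x8000

/* Encode a negative errno (or 0) as a notifier return value. */
int model_notifier_from_errno(int err)
{
	assert(err <= 0); /* errno values are passed as negative numbers */
	if (err)
		return MODEL_NOTIFY_STOP_MASK | (MODEL_NOTIFY_OK - err);
	return MODEL_NOTIFY_OK;
}

/* Decode a notifier return value back into a negative errno (or 0). */
int model_notifier_to_errno(int ret)
{
	ret &= ~MODEL_NOTIFY_STOP_MASK;
	return ret > MODEL_NOTIFY_OK ? MODEL_NOTIFY_OK - ret : 0;
}

/* Hypothetical stand-in for the tail of br_switchdev_set_port_flag():
 * -EOPNOTSUPP means no driver offloads the flag, which is not an error. */
int model_set_port_flag(int notifier_ret)
{
	int err = model_notifier_to_errno(notifier_ret);

	if (err == -EOPNOTSUPP)
		return 0;
	return err; /* assumption: other errors are passed to the caller */
}
```

The conversion is the same whether the atomic or the blocking chain delivered the event; the patch only changes which chain is used.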
^ permalink raw reply related [flat|nested] 22+ messages in thread
[parent not found: <20210209151936.97382-4-olteanv@gmail.com>]
* Re: [Bridge] [PATCH v2 net-next 03/11] net: bridge: don't print in br_switchdev_set_port_flag
[not found] ` <20210209151936.97382-4-olteanv@gmail.com>
@ 2021-02-09 17:36 ` Vladimir Oltean
2021-02-09 18:26 ` Ido Schimmel
0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Oltean @ 2021-02-09 17:36 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, netdev, bridge, Ioana Ciornei,
linux-kernel, UNGLinuxDriver, Taras Chornyi, Ido Schimmel,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, linux-omap, Vivien Didelot
On Tue, Feb 09, 2021 at 05:19:28PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> Currently br_switchdev_set_port_flag has two options for error handling
> and neither is good:
> - The driver returns -EOPNOTSUPP in PRE_BRIDGE_FLAGS if it doesn't
> support offloading that flag, and this gets silently ignored and
> converted to an errno of 0. Nobody does this.
> - The driver returns some other error code, like -EINVAL, in
> PRE_BRIDGE_FLAGS, and br_switchdev_set_port_flag shouts loudly.
>
> The problem is that we'd like to offload some port flags during bridge
> join and leave, but also not have the bridge shout at us if those fail.
> But on the other hand we'd like the user to know that we can't offload
> something when they set that through netlink. And since we can't have
> the driver return -EOPNOTSUPP or -EINVAL depending on whether it's
> called by the user or internally by the bridge, let's just add an extack
> argument to br_switchdev_set_port_flag and propagate it to its callers.
> Then, when we need offloading to really fail silently, this can simply
> be passed a NULL argument.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
The build fails because, between the time I started working on v2 and
the time I sent it, Jakub merged net into net-next, which contained this fix:
https://patchwork.kernel.org/project/netdevbpf/patch/20210207194733.1811529-1-olteanv@gmail.com/
for which I couldn't change the prototype, since it was missing from
net-next at the time.
I would rather wait and gather some feedback before respinning as v3,
if possible.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] [PATCH v2 net-next 03/11] net: bridge: don't print in br_switchdev_set_port_flag
2021-02-09 17:36 ` [Bridge] [PATCH v2 net-next 03/11] net: bridge: don't print in br_switchdev_set_port_flag Vladimir Oltean
@ 2021-02-09 18:26 ` Ido Schimmel
0 siblings, 0 replies; 22+ messages in thread
From: Ido Schimmel @ 2021-02-09 18:26 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Tue, Feb 09, 2021 at 07:36:31PM +0200, Vladimir Oltean wrote:
> On Tue, Feb 09, 2021 at 05:19:28PM +0200, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > Currently br_switchdev_set_port_flag has two options for error handling
> > and neither is good:
> > - The driver returns -EOPNOTSUPP in PRE_BRIDGE_FLAGS if it doesn't
> > support offloading that flag, and this gets silently ignored and
> > converted to an errno of 0. Nobody does this.
> > - The driver returns some other error code, like -EINVAL, in
> > PRE_BRIDGE_FLAGS, and br_switchdev_set_port_flag shouts loudly.
> >
> > The problem is that we'd like to offload some port flags during bridge
> > join and leave, but also not have the bridge shout at us if those fail.
> > But on the other hand we'd like the user to know that we can't offload
> > something when they set that through netlink. And since we can't have
> > the driver return -EOPNOTSUPP or -EINVAL depending on whether it's
> > called by the user or internally by the bridge, let's just add an extack
> > argument to br_switchdev_set_port_flag and propagate it to its callers.
> > Then, when we need offloading to really fail silently, this can simply
> > be passed a NULL argument.
> >
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
>
> The build fails because, between the time I started working on v2 and
> the time I sent it, Jakub merged net into net-next, which contained this fix:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210207194733.1811529-1-olteanv@gmail.com/
> for which I couldn't change the prototype, since it was missing from
> net-next at the time.
> I would rather wait and gather some feedback before respinning as v3,
> if possible.
It seems that in the sysfs call path br_switchdev_set_port_flag() will
be called with the bridge lock held, which is going to be a problem
given that patch #8 allows this function to block.
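
The concern can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name below is hypothetical, a boolean stands in for br->lock, and a counter records each time the (potentially sleeping) notifier is reached with the lock held. The first function models a path that notifies under the lock, as the sysfs path appears to; the second models the approach described in patch 02, where the flags are changed under the lock, the notification is emitted after the lock is dropped, and the change is reverted if the notification fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_bridge {
	bool lock_held;        /* stands in for br->lock, a spinlock */
	unsigned long flags;   /* stands in for the port flags */
	int notify_result;     /* what the modelled driver returns */
	int sleep_violations;  /* blocking calls made with the lock held */
};

/* Stands in for a blocking notifier chain. It may sleep, so reaching it
 * with the lock held is counted as a violation. */
int model_blocking_notify(struct model_bridge *br, unsigned long flags)
{
	assert(br != NULL);
	(void)flags;
	if (br->lock_held)
		br->sleep_violations++;
	return br->notify_result;
}

/* Problematic pattern: the notifier is called while the lock is held. */
int model_set_flags_locked(struct model_bridge *br, unsigned long flags)
{
	int err;

	br->lock_held = true;
	br->flags = flags;
	err = model_blocking_notify(br, flags);
	br->lock_held = false;
	return err;
}

/* Deferred pattern: change flags under the lock, notify after dropping
 * it, and restore the old flags if the notification fails. */
int model_set_flags_deferred(struct model_bridge *br, unsigned long flags)
{
	unsigned long old;
	int err;

	br->lock_held = true;
	old = br->flags;
	br->flags = flags;
	br->lock_held = false;

	err = model_blocking_notify(br, flags);
	if (err) {
		br->lock_held = true;
		br->flags = old;
		br->lock_held = false;
	}
	return err;
}
```

In this model only the first function ever increments sleep_violations, which is the situation that any caller still holding the bridge lock would be in once the function is allowed to block.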
^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20210209151936.97382-2-olteanv@gmail.com>]
* Re: [Bridge] [PATCH v2 net-next 01/11] net: switchdev: propagate extack to port attributes
[not found] ` <20210209151936.97382-2-olteanv@gmail.com>
@ 2021-02-09 18:00 ` Ido Schimmel
0 siblings, 0 replies; 22+ messages in thread
From: Ido Schimmel @ 2021-02-09 18:00 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Ivan Vecera, Andrew Lunn, Alexandre Belloni, Florian Fainelli,
Jiri Pirko, Vadym Kochan, linux-omap, netdev, bridge,
Ioana Ciornei, linux-kernel, Vivien Didelot, Taras Chornyi,
Claudiu Manoil, Grygorii Strashko, Nikolay Aleksandrov,
Roopa Prabhu, Jakub Kicinski, UNGLinuxDriver, David S. Miller
On Tue, Feb 09, 2021 at 05:19:26PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> When a struct switchdev_attr is notified through switchdev, there is no
> way to report informational messages, unlike for struct switchdev_obj.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
^ permalink raw reply [flat|nested] 22+ messages in thread