netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
@ 2025-09-04 17:07 Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 01/10] net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0 Petr Machata
                   ` (13 more replies)
  0 siblings, 14 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw

The bridge FDB contains one local entry per port per VLAN, for the MAC of
the port in question, and likewise for the bridge itself. This allows
bridge to locally receive and punt "up" any packets whose destination MAC
address matches that of one of the bridge interfaces or of the bridge
itself.

The number of these local "service" FDB entries grows linearly with number
of bridge-global VLAN memberships, but that in turn will tend to grow
quadratically with number of ports and per-port VLAN memberships. While
that does not cause issues during forwarding lookups, it does make dumps
impractically slow.

As an example, with 100 interfaces, each on 4K VLANs, a full dump of FDB
that just contains these 400K local entries, takes 6.5s. That's _without_
considering iproute2 formatting overhead, this is just how long it takes to
walk the FDB (repeatedly), serialize it into netlink messages, and parse
the messages back in userspace.

This is to illustrate that with growing number of ports and VLANs, the time
required to dump this repetitive information blows up. Arguably 4K VLANs
per interface is not a very realistic configuration, but then modern
switches can instead have several hundred interfaces, and we have fielded
requests for >1K VLAN memberships per port among customers.

FDB entries are currently all kept on a single linked list, and then
dumping uses this linked list to walk all entries and dump them in order.
When the message buffer is full, the iteration is cut short, and later
restarted. Of course, to restart the iteration, it's first necessary to
walk the already-dumped front part of the list before starting dumping
again. So one possibility is to organize the FDB entries in different
structure more amenable to walk restarts.

One option is to walk directly the hash table. The advantage is that no
auxiliary structure needs to be introduced. With a rough sketch of this
approach, the above scenario gets dumped in not quite 3 s, saving over 50 %
of time. However hash table iteration requires maintaining an active cursor
that must be collected when the dump is aborted. It looks like that would
require changes in the NDO protocol to allow to run this cleanup. Moreover,
on hash table resize the iteration is simply restarted. FDB dumps are
currently not guaranteed to correspond to any one particular state: entries
can be missed, or be duplicated. But with hash table iteration we would get
that plus the much less graceful resize behavior, where swaths of FDB are
duplicated.

Another option is to maintain the FDB entries in a red-black tree. We have
a PoC of this approach on hand, and the above scenario is dumped in about
2.5 s. Still not as snappy as we'd like it, but better than the hash table.
However the savings come at the expense of a more expensive insertion, and
require locking during dumps, which blocks insertion.

The upside of these approaches is that they provide benefits whatever the
FDB contents. But it does not seem like either of these is workable.
However we intend to clean up the RB tree PoC and present it for
consideration later on in case the trade-offs are considered acceptable.

Yet another option might be to use in-kernel FDB filtering, and to filter
the local entries when dumping. Unfortunately, this does not help all that
much either, because the linked-list walk still needs to happen. Also, with
the obvious filtering interface built around ndm_flags / ndm_state
filtering, one can't just exclude pure local entries in one query. One
needs to dump all non-local entries first, and then to get permanent
entries in another run filter local & added_by_user. I.e. one needs to pay
the iteration overhead twice, and then integrate the result in userspace.
To get significant savings, one would need a very specific knob like "dump,
but skip/only include local entries". But if we are adding a local-specific
knobs, maybe let's have an option to just not duplicate them in the first
place.

All this FDB duplication is there merely to make things snappy during
forwarding. But high-radix switches with thousands of VLANs typically do
not process much traffic in the SW datapath at all, but rather offload vast
majority of it. So we could exchange some of the runtime performance for a
neater FDB.

To that end, in this patchset, introduce a new bridge option,
BR_BOOLOPT_FDB_LOCAL_VLAN_0, which when enabled, has local FDB entries
installed only on VLAN 0, instead of duplicating them across all VLANs.
Then to maintain the local termination behavior, on FDB miss, the bridge
does a second lookup on VLAN 0.

Enabling this option changes the bridge behavior in expected ways. Since
the entries are only kept on VLAN 0, FDB get, flush and dump will not
perceive them on non-0 VLANs. And deleting the VLAN 0 entry affects
forwarding on all VLANs.

This patchset is loosely based on a privately circulated patch by Nikolay
Aleksandrov.

The patchset progresses as follows:

- Patch #1 introduces a bridge option to enable the above feature. Then
  patches #2 to #5 gradually patch the bridge to do the right thing when
  the option is enabled. Finally patch #6 adds the UAPI knob and the code
  for when the feature is enabled or disabled.
- Patches #7, #8 and #9 contain fixes and improvements to selftest
  libraries
- Patch #10 contains a new selftest

The corresponding iproute2 support is at:
https://github.com/pmachata/iproute2/commits/fdb_local_vlan_0/

Petr Machata (10):
  net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0
  net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
  net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN
    FDBs
  net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip
    per-VLAN FDBs
  net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation
  net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0
  selftests: defer: Allow spaces in arguments of deferred commands
  selftests: defer: Introduce DEFER_PAUSE_ON_FAIL
  selftests: net: lib.sh: Don't defer failed commands
  selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0

 include/uapi/linux/if_bridge.h                |   3 +
 net/bridge/br.c                               |  22 ++
 net/bridge/br_fdb.c                           | 114 +++++-
 net/bridge/br_input.c                         |   8 +
 net/bridge/br_private.h                       |   3 +
 net/bridge/br_vlan.c                          |  10 +-
 .../testing/selftests/net/forwarding/Makefile |   1 +
 .../net/forwarding/bridge_fdb_local_vlan_0.sh | 374 ++++++++++++++++++
 tools/testing/selftests/net/lib.sh            |  32 +-
 tools/testing/selftests/net/lib/sh/defer.sh   |  20 +-
 10 files changed, 559 insertions(+), 28 deletions(-)
 create mode 100755 tools/testing/selftests/net/forwarding/bridge_fdb_local_vlan_0.sh

-- 
2.49.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net-next 01/10] net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 02/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss Petr Machata
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw

The following patches will gradually introduce the ability of the bridge
to look up local FDB entries on VLAN 0 instead of using the VLAN indicated
by a packet.

In this patch, just introduce the option itself, with which the feature
will be linked.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 net/bridge/br_private.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 8de0904b9627..87da287f19fe 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -487,6 +487,7 @@ enum net_bridge_opts {
 	BROPT_MCAST_VLAN_SNOOPING_ENABLED,
 	BROPT_MST_ENABLED,
 	BROPT_MDB_OFFLOAD_FAIL_NOTIFICATION,
+	BROPT_FDB_LOCAL_VLAN_0,
 };
 
 struct net_bridge {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 02/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 01/10] net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0 Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-09  2:15   ` Jakub Kicinski
  2025-09-04 17:07 ` [PATCH net-next 03/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs Petr Machata
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
member ports as well as the bridge itself should not be created per-VLAN,
but instead only on VLAN 0.

That means that br_handle_frame_finish() needs to make two lookups: the
primary lookup on an appropriate VLAN, and when that misses, a lookup on
VLAN 0.

Have the second lookup only accept local MAC addresses. Turning this into a
generic second-lookup feature is not the goal.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 net/bridge/br_input.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 5f6ac9bf1527..67b4c905e49a 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -202,6 +202,14 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 		break;
 	case BR_PKT_UNICAST:
 		dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
+		if (unlikely(!dst && vid &&
+			     br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0))) {
+			dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, 0);
+			if (dst &&
+			    (!test_bit(BR_FDB_LOCAL, &dst->flags) ||
+			     test_bit(BR_FDB_ADDED_BY_USER, &dst->flags)))
+				dst = NULL;
+		}
 		break;
 	default:
 		break;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 03/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 01/10] net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0 Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 02/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 04/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge " Petr Machata
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for member
ports should not be created per-VLAN, but instead only on VLAN 0. When the
member port address changes, the local FDB entries need to be updated,
which is done in br_fdb_changeaddr().

Under the VLAN-0 mode, only one local FDB entry will ever be added for a
port's address, and that on VLAN 0. Thus bail out of the delete loop early.
For the same reason, also skip adding the per-VLAN entries.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 net/bridge/br_fdb.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 902694c0ce64..918c37554638 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -459,6 +459,9 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
 	struct net_bridge_fdb_entry *f;
 	struct net_bridge *br = p->br;
 	struct net_bridge_vlan *v;
+	bool local_vlan_0;
+
+	local_vlan_0 = br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0);
 
 	spin_lock_bh(&br->hash_lock);
 	vg = nbp_vlan_group(p);
@@ -468,11 +471,11 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
 			/* delete old one */
 			fdb_delete_local(br, p, f);
 
-			/* if this port has no vlan information
-			 * configured, we can safely be done at
-			 * this point.
+			/* if this port has no vlan information configured, or
+			 * local entries are only kept on VLAN 0, we can safely
+			 * be done at this point.
 			 */
-			if (!vg || !vg->num_vlans)
+			if (!vg || !vg->num_vlans || local_vlan_0)
 				goto insert;
 		}
 	}
@@ -481,7 +484,7 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
 	/* insert new address,  may fail if invalid address or dup. */
 	fdb_add_local(br, p, newaddr, 0);
 
-	if (!vg || !vg->num_vlans)
+	if (!vg || !vg->num_vlans || local_vlan_0)
 		goto done;
 
 	/* Now add entries for every VLAN configured on the port.
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 04/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip per-VLAN FDBs
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (2 preceding siblings ...)
  2025-09-04 17:07 ` [PATCH net-next 03/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 05/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation Petr Machata
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
bridge itself should not be created per-VLAN, but instead only on VLAN 0.
When the bridge address changes, the local FDB entries need to be updated,
which is done in br_fdb_change_mac_address().

Bail out early when in VLAN-0 mode, so that the per-VLAN FDB entries are
not created. The per-VLAN walk is only done afterwards.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 net/bridge/br_fdb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 918c37554638..4a20578517a5 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -503,6 +503,9 @@ void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr)
 	struct net_bridge_vlan_group *vg;
 	struct net_bridge_fdb_entry *f;
 	struct net_bridge_vlan *v;
+	bool local_vlan_0;
+
+	local_vlan_0 = br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0);
 
 	spin_lock_bh(&br->hash_lock);
 
@@ -514,7 +517,7 @@ void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr)
 
 	fdb_add_local(br, NULL, newaddr, 0);
 	vg = br_vlan_group(br);
-	if (!vg || !vg->num_vlans)
+	if (!vg || !vg->num_vlans || local_vlan_0)
 		goto out;
 	/* Now remove and add entries for every VLAN configured on the
 	 * bridge.  This function runs under RTNL so the bitmap will not
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 05/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (3 preceding siblings ...)
  2025-09-04 17:07 ` [PATCH net-next 04/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge " Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 06/10] net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0 Petr Machata
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw

When BROPT_FDB_LOCAL_VLAN_0 is enabled, the local FDB entries for the
member ports as well as the bridge itself should not be created per-VLAN,
but instead only on VLAN 0.

Thus when a VLAN is added for a port or the bridge itself, a local FDB
entry with the corresponding address should not be added when in the VLAN-0
mode.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 net/bridge/br_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 939a3aa78d5c..ae911220cb3c 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -331,10 +331,12 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags,
 
 	/* Add the dev mac and count the vlan only if it's usable */
 	if (br_vlan_should_use(v)) {
-		err = br_fdb_add_local(br, p, dev->dev_addr, v->vid);
-		if (err) {
-			br_err(br, "failed insert local address into bridge forwarding table\n");
-			goto out_filt;
+		if (!br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0)) {
+			err = br_fdb_add_local(br, p, dev->dev_addr, v->vid);
+			if (err) {
+				br_err(br, "failed insert local address into bridge forwarding table\n");
+				goto out_filt;
+			}
 		}
 		vg->num_vlans++;
 	}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 06/10] net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (4 preceding siblings ...)
  2025-09-04 17:07 ` [PATCH net-next 05/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 07/10] selftests: defer: Allow spaces in arguments of deferred commands Petr Machata
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw, Andrew Lunn

The previous patches introduced a new option, BR_BOOLOPT_FDB_LOCAL_VLAN_0.
When enabled, it has local FDB entries installed only on VLAN 0, instead of
duplicating them across all VLANs.

In this patch, add the corresponding UAPI toggle, and the code for turning
the feature on and off.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---

Notes:
CC: Andrew Lunn <andrew+netdev@lunn.ch>

 include/uapi/linux/if_bridge.h |  3 ++
 net/bridge/br.c                | 22 ++++++++
 net/bridge/br_fdb.c            | 96 ++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h        |  2 +
 4 files changed, 123 insertions(+)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 73876c0e2bba..e52f8207ab27 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -823,6 +823,8 @@ struct br_mcast_stats {
 /* bridge boolean options
  * BR_BOOLOPT_NO_LL_LEARN - disable learning from link-local packets
  * BR_BOOLOPT_MCAST_VLAN_SNOOPING - control vlan multicast snooping
+ * BR_BOOLOPT_FDB_LOCAL_VLAN_0 - local FDB entries installed by the bridge
+ *                               driver itself should only be added on VLAN 0
  *
  * IMPORTANT: if adding a new option do not forget to handle
  *            it in br_boolopt_toggle/get and bridge sysfs
@@ -832,6 +834,7 @@ enum br_boolopt_id {
 	BR_BOOLOPT_MCAST_VLAN_SNOOPING,
 	BR_BOOLOPT_MST_ENABLE,
 	BR_BOOLOPT_MDB_OFFLOAD_FAIL_NOTIFICATION,
+	BR_BOOLOPT_FDB_LOCAL_VLAN_0,
 	BR_BOOLOPT_MAX
 };
 
diff --git a/net/bridge/br.c b/net/bridge/br.c
index 1885d0c315f0..4bfaf543835a 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -259,6 +259,23 @@ static struct notifier_block br_switchdev_blocking_notifier = {
 	.notifier_call = br_switchdev_blocking_event,
 };
 
+static int
+br_toggle_fdb_local_vlan_0(struct net_bridge *br, bool on,
+			   struct netlink_ext_ack *extack)
+{
+	int err;
+
+	if (br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0) == on)
+		return 0;
+
+	err = br_fdb_toggle_local_vlan_0(br, on, extack);
+	if (err)
+		return err;
+
+	br_opt_toggle(br, BROPT_FDB_LOCAL_VLAN_0, on);
+	return 0;
+}
+
 /* br_boolopt_toggle - change user-controlled boolean option
  *
  * @br: bridge device
@@ -287,6 +304,9 @@ int br_boolopt_toggle(struct net_bridge *br, enum br_boolopt_id opt, bool on,
 	case BR_BOOLOPT_MDB_OFFLOAD_FAIL_NOTIFICATION:
 		br_opt_toggle(br, BROPT_MDB_OFFLOAD_FAIL_NOTIFICATION, on);
 		break;
+	case BR_BOOLOPT_FDB_LOCAL_VLAN_0:
+		err = br_toggle_fdb_local_vlan_0(br, on, extack);
+		break;
 	default:
 		/* shouldn't be called with unsupported options */
 		WARN_ON(1);
@@ -307,6 +327,8 @@ int br_boolopt_get(const struct net_bridge *br, enum br_boolopt_id opt)
 		return br_opt_get(br, BROPT_MST_ENABLED);
 	case BR_BOOLOPT_MDB_OFFLOAD_FAIL_NOTIFICATION:
 		return br_opt_get(br, BROPT_MDB_OFFLOAD_FAIL_NOTIFICATION);
+	case BR_BOOLOPT_FDB_LOCAL_VLAN_0:
+		return br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0);
 	default:
 		/* shouldn't be called with unsupported options */
 		WARN_ON(1);
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 4a20578517a5..58d22e2b85fc 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -582,6 +582,102 @@ void br_fdb_cleanup(struct work_struct *work)
 	mod_delayed_work(system_long_wq, &br->gc_work, work_delay);
 }
 
+static void br_fdb_delete_locals_per_vlan_port(struct net_bridge *br,
+					       struct net_bridge_port *p)
+{
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+	struct net_device *dev;
+
+	if (p) {
+		vg = nbp_vlan_group(p);
+		dev = p->dev;
+	} else {
+		vg = br_vlan_group(br);
+		dev = br->dev;
+	}
+
+	list_for_each_entry(v, &vg->vlan_list, vlist)
+		br_fdb_find_delete_local(br, p, dev->dev_addr, v->vid);
+}
+
+static void br_fdb_delete_locals_per_vlan(struct net_bridge *br)
+{
+	struct net_bridge_port *p;
+
+	ASSERT_RTNL();
+
+	list_for_each_entry(p, &br->port_list, list)
+		br_fdb_delete_locals_per_vlan_port(br, p);
+
+	br_fdb_delete_locals_per_vlan_port(br, NULL);
+}
+
+static int br_fdb_insert_locals_per_vlan_port(struct net_bridge *br,
+					      struct net_bridge_port *p,
+					      struct netlink_ext_ack *extack)
+{
+	struct net_bridge_vlan_group *vg;
+	struct net_bridge_vlan *v;
+	struct net_device *dev;
+	int err;
+
+	if (p) {
+		vg = nbp_vlan_group(p);
+		dev = p->dev;
+	} else {
+		vg = br_vlan_group(br);
+		dev = br->dev;
+	}
+
+	list_for_each_entry(v, &vg->vlan_list, vlist) {
+		if (!br_vlan_should_use(v))
+			continue;
+
+		err = br_fdb_add_local(br, p, dev->dev_addr, v->vid);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int br_fdb_insert_locals_per_vlan(struct net_bridge *br,
+					 struct netlink_ext_ack *extack)
+{
+	struct net_bridge_port *p;
+	int err;
+
+	ASSERT_RTNL();
+
+	list_for_each_entry(p, &br->port_list, list) {
+		err = br_fdb_insert_locals_per_vlan_port(br, p, extack);
+		if (err)
+			goto rollback;
+	}
+
+	err = br_fdb_insert_locals_per_vlan_port(br, NULL, extack);
+	if (err)
+		goto rollback;
+
+	return 0;
+
+rollback:
+	NL_SET_ERR_MSG_MOD(extack, "fdb_local_vlan_0 toggle: FDB entry insertion failed");
+	br_fdb_delete_locals_per_vlan(br);
+	return err;
+}
+
+int br_fdb_toggle_local_vlan_0(struct net_bridge *br, bool on,
+			       struct netlink_ext_ack *extack)
+{
+	if (!on)
+		return br_fdb_insert_locals_per_vlan(br, extack);
+
+	br_fdb_delete_locals_per_vlan(br);
+	return 0;
+}
+
 static bool __fdb_flush_matches(const struct net_bridge *br,
 				const struct net_bridge_fdb_entry *f,
 				const struct net_bridge_fdb_flush_desc *desc)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 87da287f19fe..16be5d250402 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -844,6 +844,8 @@ void br_fdb_find_delete_local(struct net_bridge *br,
 void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr);
 void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr);
 void br_fdb_cleanup(struct work_struct *work);
+int br_fdb_toggle_local_vlan_0(struct net_bridge *br, bool on,
+			       struct netlink_ext_ack *extack);
 void br_fdb_delete_by_port(struct net_bridge *br,
 			   const struct net_bridge_port *p, u16 vid, int do_all);
 struct net_bridge_fdb_entry *br_fdb_find_rcu(struct net_bridge *br,
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 07/10] selftests: defer: Allow spaces in arguments of deferred commands
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (5 preceding siblings ...)
  2025-09-04 17:07 ` [PATCH net-next 06/10] net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0 Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 08/10] selftests: defer: Introduce DEFER_PAUSE_ON_FAIL Petr Machata
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw, Shuah Khan, linux-kselftest

Currently the way deferred commands are stored and invoked causes any
whitespace to act as an argument separator when the command is executed.
To make it possible to use spaces in deferred commands, store the commands
quoted, and then eval the string prior to execution.

Fixes: a6e263f125cd ("selftests: net: lib: Introduce deferred commands")
Signed-off-by: Petr Machata <petrm@nvidia.com>
---

Notes:
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org

 tools/testing/selftests/net/lib/sh/defer.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/lib/sh/defer.sh b/tools/testing/selftests/net/lib/sh/defer.sh
index 082f5d38321b..6c642f3d0ced 100644
--- a/tools/testing/selftests/net/lib/sh/defer.sh
+++ b/tools/testing/selftests/net/lib/sh/defer.sh
@@ -39,7 +39,7 @@ __defer__run()
 	local defer_ix=$1; shift
 	local defer_key=$(__defer__defer_key $track $defer_ix)
 
-	${__DEFER__JOBS[$defer_key]}
+	eval ${__DEFER__JOBS[$defer_key]}
 	unset __DEFER__JOBS[$defer_key]
 }
 
@@ -49,7 +49,7 @@ __defer__schedule()
 	local ndefers=$(__defer__ndefers $track)
 	local ndefers_key=$(__defer__ndefer_key $track)
 	local defer_key=$(__defer__defer_key $track $ndefers)
-	local defer="$@"
+	local defer="${@@Q}"
 
 	__DEFER__JOBS[$defer_key]="$defer"
 	__DEFER__NJOBS[$ndefers_key]=$((ndefers + 1))
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 08/10] selftests: defer: Introduce DEFER_PAUSE_ON_FAIL
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (6 preceding siblings ...)
  2025-09-04 17:07 ` [PATCH net-next 07/10] selftests: defer: Allow spaces in arguments of deferred commands Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 09/10] selftests: net: lib.sh: Don't defer failed commands Petr Machata
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw, Shuah Khan, linux-kselftest

The fact that all cleanup (ideally) goes through the defer framework makes
debugging of these commands a bit tricky. However, this also gives us a
nice point to place a hook along the lines of PAUSE_ON_FAIL. When the
environment variable DEFER_PAUSE_ON_FAIL is set, and a cleanup command
results in non-zero exit status, show a bit of debuginfo and give the user
an opportunity to interrupt the execution altogether.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---

Notes:
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org

 tools/testing/selftests/net/lib/sh/defer.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/testing/selftests/net/lib/sh/defer.sh b/tools/testing/selftests/net/lib/sh/defer.sh
index 6c642f3d0ced..47ab78c4d465 100644
--- a/tools/testing/selftests/net/lib/sh/defer.sh
+++ b/tools/testing/selftests/net/lib/sh/defer.sh
@@ -1,6 +1,10 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+# Whether to pause and allow debugging when an executed deferred command has a
+# non-zero exit code.
+: "${DEFER_PAUSE_ON_FAIL:=no}"
+
 # map[(scope_id,track,cleanup_id) -> cleanup_command]
 # track={d=default | p=priority}
 declare -A __DEFER__JOBS
@@ -38,8 +42,20 @@ __defer__run()
 	local track=$1; shift
 	local defer_ix=$1; shift
 	local defer_key=$(__defer__defer_key $track $defer_ix)
+	local ret
 
 	eval ${__DEFER__JOBS[$defer_key]}
+	ret=$?
+
+	if [[ "$DEFER_PAUSE_ON_FAIL" == yes && "$ret" -ne 0 ]]; then
+		echo "Deferred command (track $track index $defer_ix):"
+		echo "	${__DEFER__JOBS[$defer_key]}"
+		echo "... ended with an exit status of $ret"
+		echo "Hit enter to continue, 'q' to quit"
+		read a
+		[[ "$a" == q ]] && exit 1
+	fi
+
 	unset __DEFER__JOBS[$defer_key]
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 09/10] selftests: net: lib.sh: Don't defer failed commands
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (7 preceding siblings ...)
  2025-09-04 17:07 ` [PATCH net-next 08/10] selftests: defer: Introduce DEFER_PAUSE_ON_FAIL Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-04 17:07 ` [PATCH net-next 10/10] selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0 Petr Machata
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw, Shuah Khan, linux-kselftest

Usually the autodefer helpers in lib.sh are expected to be run in context
where success is the expected outcome. However when using them for feature
detection, failure can legitimately occur. But the failed command still
schedules a cleanup, which will likely fail again.

Instead, only schedule deferred cleanup when the positive command succeeds.

This way of organizing the cleanup has the added benefit that now the
return code from these functions reflects whether the command passed.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---

Notes:
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org

 tools/testing/selftests/net/lib.sh | 32 +++++++++++++++---------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index c7add0dc4c60..80cf1a75136c 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -547,8 +547,8 @@ ip_link_add()
 {
 	local name=$1; shift
 
-	ip link add name "$name" "$@"
-	defer ip link del dev "$name"
+	ip link add name "$name" "$@" && \
+		defer ip link del dev "$name"
 }
 
 ip_link_set_master()
@@ -556,8 +556,8 @@ ip_link_set_master()
 	local member=$1; shift
 	local master=$1; shift
 
-	ip link set dev "$member" master "$master"
-	defer ip link set dev "$member" nomaster
+	ip link set dev "$member" master "$master" && \
+		defer ip link set dev "$member" nomaster
 }
 
 ip_link_set_addr()
@@ -566,8 +566,8 @@ ip_link_set_addr()
 	local addr=$1; shift
 
 	local old_addr=$(mac_get "$name")
-	ip link set dev "$name" address "$addr"
-	defer ip link set dev "$name" address "$old_addr"
+	ip link set dev "$name" address "$addr" && \
+		defer ip link set dev "$name" address "$old_addr"
 }
 
 ip_link_has_flag()
@@ -590,8 +590,8 @@ ip_link_set_up()
 	local name=$1; shift
 
 	if ! ip_link_is_up "$name"; then
-		ip link set dev "$name" up
-		defer ip link set dev "$name" down
+		ip link set dev "$name" up && \
+			defer ip link set dev "$name" down
 	fi
 }
 
@@ -600,8 +600,8 @@ ip_link_set_down()
 	local name=$1; shift
 
 	if ip_link_is_up "$name"; then
-		ip link set dev "$name" down
-		defer ip link set dev "$name" up
+		ip link set dev "$name" down && \
+			defer ip link set dev "$name" up
 	fi
 }
 
@@ -609,20 +609,20 @@ ip_addr_add()
 {
 	local name=$1; shift
 
-	ip addr add dev "$name" "$@"
-	defer ip addr del dev "$name" "$@"
+	ip addr add dev "$name" "$@" && \
+		defer ip addr del dev "$name" "$@"
 }
 
 ip_route_add()
 {
-	ip route add "$@"
-	defer ip route del "$@"
+	ip route add "$@" && \
+		defer ip route del "$@"
 }
 
 bridge_vlan_add()
 {
-	bridge vlan add "$@"
-	defer bridge vlan del "$@"
+	bridge vlan add "$@" && \
+		defer bridge vlan del "$@"
 }
 
 wait_local_port_listen()
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 10/10] selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (8 preceding siblings ...)
  2025-09-04 17:07 ` [PATCH net-next 09/10] selftests: net: lib.sh: Don't defer failed commands Petr Machata
@ 2025-09-04 17:07 ` Petr Machata
  2025-09-06 18:16 ` [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Nikolay Aleksandrov
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-04 17:07 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge,
	Petr Machata, mlxsw, Shuah Khan, linux-kselftest

Add a selftest to check the operation of this newly-introduced bridge
option.

Signed-off-by: Petr Machata <petrm@nvidia.com>
---

Notes:
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org

 .../testing/selftests/net/forwarding/Makefile |   1 +
 .../net/forwarding/bridge_fdb_local_vlan_0.sh | 374 ++++++++++++++++++
 2 files changed, 375 insertions(+)
 create mode 100755 tools/testing/selftests/net/forwarding/bridge_fdb_local_vlan_0.sh

diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile
index 0a0d4c2a85f7..e6f482a600da 100644
--- a/tools/testing/selftests/net/forwarding/Makefile
+++ b/tools/testing/selftests/net/forwarding/Makefile
@@ -5,6 +5,7 @@ TEST_PROGS = \
 	bridge_fdb_learning_limit.sh \
 	bridge_igmp.sh \
 	bridge_locked_port.sh \
+	bridge_fdb_local_vlan_0.sh \
 	bridge_mdb.sh \
 	bridge_mdb_host.sh \
 	bridge_mdb_max.sh \
diff --git a/tools/testing/selftests/net/forwarding/bridge_fdb_local_vlan_0.sh b/tools/testing/selftests/net/forwarding/bridge_fdb_local_vlan_0.sh
new file mode 100755
index 000000000000..5a0b43aff5aa
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/bridge_fdb_local_vlan_0.sh
@@ -0,0 +1,374 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +-----------------------+ +-----------------------+ +-----------------------+
+# | H1 (vrf)              | | H2 (vrf)              | | H3 (vrf)              |
+# |    + $h1              | |    + $h2              | |    + $h3              |
+# |    | 192.0.2.1/28     | |    | 192.0.2.2/28     | |    | 192.0.2.18/28    |
+# |    | 2001:db8:1::1/64 | |    | 2001:db8:1::2/64 | |    | 2001:db8:2::2/64 |
+# |    |                  | |    |                  | |    |                  |
+# +----|------------------+ +----|------------------+ +----|------------------+
+#      |                         |                         |
+# +----|-------------------------|-------------------------|------------------+
+# | +--|-------------------------|------------------+      |                  |
+# | |  + $swp1                   + $swp2            |      + $swp3            |
+# | |                                               |        192.0.2.17/28    |
+# | |  BR1 (802.1q)                                 |        2001:db8:2::1/64 |
+# | |  192.0.2.3/28                                 |                         |
+# | |  2001:db8:1::3/64                             |                         |
+# | +-----------------------------------------------+                      SW |
+# +---------------------------------------------------------------------------+
+#
+#shellcheck disable=SC2317 # SC doesn't see our uses of functions.
+#shellcheck disable=SC2034 # ... and global variables
+
+ALL_TESTS="
+	test_d_no_sharing
+	test_d_sharing
+	test_q_no_sharing
+	test_q_sharing
+"
+
+NUM_NETIFS=6
+source lib.sh
+
+pMAC=00:11:22:33:44:55
+bMAC=00:11:22:33:44:66
+mMAC=00:11:22:33:44:77
+xMAC=00:11:22:33:44:88
+
+host_create()
+{
+	local h=$1; shift
+	local ipv4=$1; shift
+	local ipv6=$1; shift
+
+	simple_if_init "$h" "$ipv4" "$ipv6"
+	defer simple_if_fini "$h" "$ipv4" "$ipv6"
+
+	ip_route_add vrf "v$h" 192.0.2.16/28 nexthop via 192.0.2.3
+	ip_route_add vrf "v$h" 2001:db8:2::/64 nexthop via 2001:db8:1::3
+}
+
+h3_create()
+{
+	simple_if_init "$h3" 192.0.2.18/28 2001:db8:2::2/64
+	defer simple_if_fini "$h3" 192.0.2.18/28 2001:db8:2::2/64
+
+	ip_route_add vrf "v$h3" 192.0.2.0/28 nexthop via 192.0.2.17
+	ip_route_add vrf "v$h3" 2001:db8:1::/64 nexthop via 2001:db8:2::1
+
+	tc qdisc add dev "$h3" clsact
+	defer tc qdisc del dev "$h3" clsact
+
+	tc filter add dev "$h3" ingress proto ip pref 104 \
+	   flower skip_hw ip_proto udp dst_port 4096 \
+	   action pass
+	defer tc filter del dev "$h3" ingress proto ip pref 104
+
+	tc qdisc add dev "$h2" clsact
+	defer tc qdisc del dev "$h2" clsact
+
+	tc filter add dev "$h2" ingress proto ip pref 104 \
+	   flower skip_hw ip_proto udp dst_port 4096 \
+	   action pass
+	defer tc filter del dev "$h2" ingress proto ip pref 104
+}
+
+switch_create()
+{
+	ip_link_set_up "$swp1"
+
+	ip_link_set_up "$swp2"
+
+	ip_addr_add "$swp3" 192.0.2.17/28
+	ip_addr_add "$swp3" 2001:db8:2::1/64
+	ip_link_set_up "$swp3"
+}
+
+setup_prepare()
+{
+	h1=${NETIFS[p1]}
+	swp1=${NETIFS[p2]}
+
+	swp2=${NETIFS[p3]}
+	h2=${NETIFS[p4]}
+
+	swp3=${NETIFS[p5]}
+	h3=${NETIFS[p6]}
+
+	vrf_prepare
+	defer vrf_cleanup
+
+	forwarding_enable
+	defer forwarding_restore
+
+	host_create "$h1" 192.0.2.1/28 2001:db8:1::1/64
+	host_create "$h2" 192.0.2.2/28 2001:db8:1::2/64
+	h3_create
+
+	switch_create
+}
+
+adf_bridge_create()
+{
+	local dev
+	local mac
+
+	ip_link_add br up type bridge vlan_default_pvid 0 "$@"
+	mac=$(mac_get br)
+	ip_addr_add br 192.0.2.3/28
+	ip_addr_add br 2001:db8:1::3/64
+
+	bridge_vlan_add dev br vid 1 pvid untagged self
+	bridge_vlan_add dev br vid 2 self
+	bridge_vlan_add dev br vid 3 self
+
+	for dev in "$swp1" "$swp2"; do
+		ip_link_set_master "$dev" br
+		bridge_vlan_add dev "$dev" vid 1 pvid untagged
+		bridge_vlan_add dev "$dev" vid 2
+		bridge_vlan_add dev "$dev" vid 3
+	done
+
+	ip_link_set_addr br "$mac"
+}
+
+check_fdb_local_vlan_0_support()
+{
+	if ip_link_add XXbr up type bridge vlan_filtering 1 fdb_local_vlan_0 1 \
+		    &>/dev/null; then
+		return 0
+	fi
+
+	log_test_skip "FDB sharing" \
+		      "iproute 2 or the kernel do not support fdb_local_vlan_0"
+}
+
+check_mac_presence()
+{
+	local should_fail=$1; shift
+	local dev=$1; shift
+	local vlan=$1; shift
+	local mac
+
+	mac=$(mac_get "$dev")
+
+	if ((vlan == 0)); then
+		vlan=null
+	fi
+
+	bridge -j fdb show dev "$dev" |
+	    jq -e --arg mac "$mac" --argjson vlan "$vlan" \
+	       '.[] | select(.mac == $mac) | select(.vlan == $vlan)' > /dev/null
+	check_err_fail "$should_fail" $? "FDB dev $dev vid $vlan addr $mac exists"
+}
+
+do_sharing_test()
+{
+	local should_fail=$1; shift
+	local what=$1; shift
+	local dev
+
+	RET=0
+
+	for dev in "$swp1" "$swp2" br; do
+		check_mac_presence 0 "$dev" 0
+		check_mac_presence "$should_fail" "$dev" 1
+		check_mac_presence "$should_fail" "$dev" 2
+		check_mac_presence "$should_fail" "$dev" 3
+	done
+
+	log_test "$what"
+}
+
+do_end_to_end_test()
+{
+	local mac=$1; shift
+	local what=$1; shift
+	local probe_dev=${1-$h3}; shift
+	local expect=${1-10}; shift
+
+	local t0
+	local t1
+	local dd
+
+	RET=0
+
+	# In mausezahn, use $dev MAC as the destination MAC. In the MAC sharing
+	# context, that will cause an FDB miss on VLAN 1 and prompt a second
+	# lookup in VLAN 0.
+
+	t0=$(tc_rule_stats_get "$probe_dev" 104 ingress)
+
+	$MZ "$h1" -c 10 -p 64 -a own -b "$mac" \
+		  -A 192.0.2.1 -B 192.0.2.18 -t udp "dp=4096,sp=2048" -q
+	sleep 1
+
+	t1=$(tc_rule_stats_get "$probe_dev" 104 ingress)
+	dd=$((t1 - t0))
+
+	((dd == expect))
+	check_err $? "Expected $expect packets on $probe_dev got $dd"
+
+	log_test "$what"
+}
+
+do_tests()
+{
+	local should_fail=$1; shift
+	local what=$1; shift
+	local swp1_mac
+	local br_mac
+
+	swp1_mac=$(mac_get "$swp1")
+	br_mac=$(mac_get br)
+
+	do_sharing_test "$should_fail" "$what"
+	do_end_to_end_test "$swp1_mac" "$what: end to end, $swp1 MAC"
+	do_end_to_end_test "$br_mac" "$what: end to end, br MAC"
+}
+
+bridge_standard()
+{
+	local vlan_filtering=$1; shift
+
+	if ((vlan_filtering)); then
+		echo 802.1q
+	else
+		echo 802.1d
+	fi
+}
+
+nonexistent_fdb_test()
+{
+	local vlan_filtering=$1; shift
+	local standard
+
+	standard=$(bridge_standard "$vlan_filtering")
+
+	# We expect flooding, so $h2 should get the traffic.
+	do_end_to_end_test "$xMAC" "$standard: Nonexistent FDB" "$h2"
+}
+
+misleading_fdb_test()
+{
+	local vlan_filtering=$1; shift
+	local standard
+
+	standard=$(bridge_standard "$vlan_filtering")
+
+	defer_scope_push
+		# Add an FDB entry on VLAN 0. The lookup on VLAN-aware bridge
+		# shouldn't pick this up even with fdb_local_vlan_0 enabled, so
+		# the traffic should be flooded. This all holds on
+		# vlan_filtering bridge, on non-vlan_filtering one the FDB entry
+		# is expected to be found as usual, no flooding takes place.
+		#
+		# Adding only on VLAN 0 is a bit tricky, because bridge is
+		# trying to be nice and interprets the request as if the FDB
+		# should be added on each VLAN.
+
+		bridge fdb add "$mMAC" dev "$swp1" master
+		bridge fdb del "$mMAC" dev "$swp1" vlan 1 master
+		bridge fdb del "$mMAC" dev "$swp1" vlan 2 master
+		bridge fdb del "$mMAC" dev "$swp1" vlan 3 master
+
+		local expect=$((vlan_filtering ? 10 : 0))
+		do_end_to_end_test "$mMAC" \
+				   "$standard: Lookup of non-local MAC on VLAN 0" \
+				   "$h2" "$expect"
+	defer_scope_pop
+}
+
+change_mac()
+{
+	local dev=$1; shift
+	local mac=$1; shift
+	local cur_mac
+
+	cur_mac=$(mac_get "$dev")
+
+	log_info "Change $dev MAC $cur_mac -> $mac"
+	ip_link_set_addr "$dev" "$mac"
+	defer log_info "Change $dev MAC back"
+}
+
+do_test_no_sharing()
+{
+	local vlan_filtering=$1; shift
+	local standard
+
+	standard=$(bridge_standard "$vlan_filtering")
+
+	adf_bridge_create vlan_filtering "$vlan_filtering"
+	setup_wait
+
+	do_tests 0 "$standard, no FDB sharing"
+
+	change_mac "$swp1" "$pMAC"
+	change_mac br "$bMAC"
+
+	do_tests 0 "$standard, no FDB sharing after MAC change"
+
+	in_defer_scope check_fdb_local_vlan_0_support || return
+
+	log_info "Set fdb_local_vlan_0=1"
+	ip link set dev br type bridge fdb_local_vlan_0 1
+
+	do_tests 1 "$standard, fdb sharing after toggle"
+}
+
+do_test_sharing()
+{
+	local vlan_filtering=$1; shift
+	local standard
+
+	standard=$(bridge_standard "$vlan_filtering")
+
+	in_defer_scope check_fdb_local_vlan_0_support || return
+
+	adf_bridge_create vlan_filtering "$vlan_filtering" fdb_local_vlan_0 1
+	setup_wait
+
+	do_tests 1 "$standard, FDB sharing"
+
+	nonexistent_fdb_test "$vlan_filtering"
+	misleading_fdb_test "$vlan_filtering"
+
+	change_mac "$swp1" "$pMAC"
+	change_mac br "$bMAC"
+
+	do_tests 1 "$standard, FDB sharing after MAC change"
+
+	log_info "Set fdb_local_vlan_0=0"
+	ip link set dev br type bridge fdb_local_vlan_0 0
+
+	do_tests 0 "$standard, No FDB sharing after toggle"
+}
+
+test_d_no_sharing()
+{
+	do_test_no_sharing 0
+}
+
+test_d_sharing()
+{
+	do_test_sharing 0
+}
+
+test_q_no_sharing()
+{
+	do_test_no_sharing 1
+}
+
+test_q_sharing()
+{
+	do_test_sharing 1
+}
+
+
+trap cleanup EXIT
+
+setup_prepare
+tests_run
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (9 preceding siblings ...)
  2025-09-04 17:07 ` [PATCH net-next 10/10] selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0 Petr Machata
@ 2025-09-06 18:16 ` Nikolay Aleksandrov
  2025-09-09  2:27 ` Jakub Kicinski
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Nikolay Aleksandrov @ 2025-09-06 18:16 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev
  Cc: Simon Horman, Ido Schimmel, bridge, mlxsw

On 9/4/25 20:07, Petr Machata wrote:
> The bridge FDB contains one local entry per port per VLAN, for the MAC of
> the port in question, and likewise for the bridge itself. This allows
> bridge to locally receive and punt "up" any packets whose destination MAC
> address matches that of one of the bridge interfaces or of the bridge
> itself.
> 
> The number of these local "service" FDB entries grows linearly with number
> of bridge-global VLAN memberships, but that in turn will tend to grow
> quadratically with number of ports and per-port VLAN memberships. While
> that does not cause issues during forwarding lookups, it does make dumps
> impractically slow.
> 
> As an example, with 100 interfaces, each on 4K VLANs, a full dump of FDB
> that just contains these 400K local entries, takes 6.5s. That's _without_
> considering iproute2 formatting overhead, this is just how long it takes to
> walk the FDB (repeatedly), serialize it into netlink messages, and parse
> the messages back in userspace.
> 
> This is to illustrate that with growing number of ports and VLANs, the time
> required to dump this repetitive information blows up. Arguably 4K VLANs
> per interface is not a very realistic configuration, but then modern
> switches can instead have several hundred interfaces, and we have fielded
> requests for >1K VLAN memberships per port among customers.
> 
[snip]
> All this FDB duplication is there merely to make things snappy during
> forwarding. But high-radix switches with thousands of VLANs typically do
> not process much traffic in the SW datapath at all, but rather offload vast
> majority of it. So we could exchange some of the runtime performance for a
> neater FDB.
> 
> To that end, in this patchset, introduce a new bridge option,
> BR_BOOLOPT_FDB_LOCAL_VLAN_0, which when enabled, has local FDB entries
> installed only on VLAN 0, instead of duplicating them across all VLANs.
> Then to maintain the local termination behavior, on FDB miss, the bridge
> does a second lookup on VLAN 0.
> 
> Enabling this option changes the bridge behavior in expected ways. Since
> the entries are only kept on VLAN 0, FDB get, flush and dump will not
> perceive them on non-0 VLANs. And deleting the VLAN 0 entry affects
> forwarding on all VLANs.
> 
> This patchset is loosely based on a privately circulated patch by Nikolay
> Aleksandrov.
> 

I knew this sounded familiar, I actually did try to upstream the original patch[1] way back
in 2015 and was rejected, at the time that led to the vlan rhashtable code. :-)

By the way the original idea and change predate me and were by Wilson Kok, I just polished
them and took over the patch while at Cumulus.

Now, this is presented in a much shinier new option manner with selftests which is great.
I think we can take the new option this time around, it will be very helpful for some
setups as explained.

The code looks good to me, I appreciate how well split it is.
For the series:

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>

Thanks,
  Nik

[1] https://lore.kernel.org/netdev/1440549295-3979-1-git-send-email-razor@blackwall.org/

> The patchset progresses as follows:
> 
> - Patch #1 introduces a bridge option to enable the above feature. Then
>    patches #2 to #5 gradually patch the bridge to do the right thing when
>    the option is enabled. Finally patch #6 adds the UAPI knob and the code
>    for when the feature is enabled or disabled.
> - Patches #7, #8 and #9 contain fixes and improvements to selftest
>    libraries
> - Patch #10 contains a new selftest
> 
> The corresponding iproute2 support is at:
> https://github.com/pmachata/iproute2/commits/fdb_local_vlan_0/
> 
> Petr Machata (10):
>    net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0
>    net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
>    net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN
>      FDBs
>    net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip
>      per-VLAN FDBs
>    net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation
>    net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0
>    selftests: defer: Allow spaces in arguments of deferred commands
>    selftests: defer: Introduce DEFER_PAUSE_ON_FAIL
>    selftests: net: lib.sh: Don't defer failed commands
>    selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0
> 
>   include/uapi/linux/if_bridge.h                |   3 +
>   net/bridge/br.c                               |  22 ++
>   net/bridge/br_fdb.c                           | 114 +++++-
>   net/bridge/br_input.c                         |   8 +
>   net/bridge/br_private.h                       |   3 +
>   net/bridge/br_vlan.c                          |  10 +-
>   .../testing/selftests/net/forwarding/Makefile |   1 +
>   .../net/forwarding/bridge_fdb_local_vlan_0.sh | 374 ++++++++++++++++++
>   tools/testing/selftests/net/lib.sh            |  32 +-
>   tools/testing/selftests/net/lib/sh/defer.sh   |  20 +-
>   10 files changed, 559 insertions(+), 28 deletions(-)
>   create mode 100755 tools/testing/selftests/net/forwarding/bridge_fdb_local_vlan_0.sh
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 02/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
  2025-09-04 17:07 ` [PATCH net-next 02/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss Petr Machata
@ 2025-09-09  2:15   ` Jakub Kicinski
  2025-09-09 13:34     ` Petr Machata
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2025-09-09  2:15 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev, Simon Horman,
	Nikolay Aleksandrov, Ido Schimmel, bridge, mlxsw

On Thu, 4 Sep 2025 19:07:19 +0200 Petr Machata wrote:
>  		dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
> +		if (unlikely(!dst && vid &&
> +			     br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0))) {

What does the assembly look like for this? I wonder if we're not better
off with:

	unlikely(!dst) && vid && br_opt..

Checking dst will be very fast here, and if we missed we're already 
on the slow path. So maybe we're better off moving the rest of the
condition out of the fast path, too?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (10 preceding siblings ...)
  2025-09-06 18:16 ` [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Nikolay Aleksandrov
@ 2025-09-09  2:27 ` Jakub Kicinski
  2025-09-09  9:07   ` Nikolay Aleksandrov
  2025-09-09 12:12   ` Petr Machata
  2025-09-12  2:05 ` Jakub Kicinski
  2025-09-12  2:10 ` patchwork-bot+netdevbpf
  13 siblings, 2 replies; 22+ messages in thread
From: Jakub Kicinski @ 2025-09-09  2:27 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev, Simon Horman,
	Nikolay Aleksandrov, Ido Schimmel, bridge, mlxsw, Vladimir Oltean,
	Andrew Lunn, Jiri Pirko

On Thu, 4 Sep 2025 19:07:17 +0200 Petr Machata wrote:
> Yet another option might be to use in-kernel FDB filtering, and to filter
> the local entries when dumping. Unfortunately, this does not help all that
> much either, because the linked-list walk still needs to happen. Also, with
> the obvious filtering interface built around ndm_flags / ndm_state
> filtering, one can't just exclude pure local entries in one query. One
> needs to dump all non-local entries first, and then to get permanent
> entries in another run filter local & added_by_user. I.e. one needs to pay
> the iteration overhead twice, and then integrate the result in userspace.
> To get significant savings, one would need a very specific knob like "dump,
> but skip/only include local entries". But if we are adding a local-specific
> knobs, maybe let's have an option to just not duplicate them in the first
> place.

Local-specific knob for dump seems like the most direct way to address
your concern, if I'm reading the cover letter right. Also, is it normal
to special case vlan 0 the way this series does? Wouldn't it be cleaner
to store local entries in a separate hash table? Perhaps if they lived
in a separate hash table it'd be odd to dump them for VLAN 0 (so the
series also conflates the kernel internals and control path/dump output)

Given that Nik has authored the previous version a third opinion would
be great... adding a handful of people to CC.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-09-09  2:27 ` Jakub Kicinski
@ 2025-09-09  9:07   ` Nikolay Aleksandrov
  2025-09-09  9:19     ` Nikolay Aleksandrov
  2025-10-04 13:19     ` Linus Lüssing
  2025-09-09 12:12   ` Petr Machata
  1 sibling, 2 replies; 22+ messages in thread
From: Nikolay Aleksandrov @ 2025-09-09  9:07 UTC (permalink / raw)
  To: Jakub Kicinski, Petr Machata
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev, Simon Horman,
	Ido Schimmel, bridge, mlxsw, Vladimir Oltean, Andrew Lunn,
	Jiri Pirko

On 9/9/25 05:27, Jakub Kicinski wrote:
> On Thu, 4 Sep 2025 19:07:17 +0200 Petr Machata wrote:
>> Yet another option might be to use in-kernel FDB filtering, and to filter
>> the local entries when dumping. Unfortunately, this does not help all that
>> much either, because the linked-list walk still needs to happen. Also, with
>> the obvious filtering interface built around ndm_flags / ndm_state
>> filtering, one can't just exclude pure local entries in one query. One
>> needs to dump all non-local entries first, and then to get permanent
>> entries in another run filter local & added_by_user. I.e. one needs to pay
>> the iteration overhead twice, and then integrate the result in userspace.
>> To get significant savings, one would need a very specific knob like "dump,
>> but skip/only include local entries". But if we are adding a local-specific
>> knobs, maybe let's have an option to just not duplicate them in the first
>> place.
> 
> Local-specific knob for dump seems like the most direct way to address
> your concern, if I'm reading the cover letter right. Also, is it normal
> to special case vlan 0 the way this series does? Wouldn't it be cleaner
> to store local entries in a separate hash table? Perhaps if they lived
> in a separate hash table it'd be odd to dump them for VLAN 0 (so the
> series also conflates the kernel internals and control path/dump output)
> 
> Given that Nik has authored the previous version a third opinion would> be great... adding a handful of people to CC.

My 2c, it is ok to special case vlan 0 as it is illegal to use, so it can be used
to match on "special" entries like this. I'd like to avoid the complexity of maintaining
multiple hash tables or lists just because of this issue, it is not a common problem
and I think the optional vlan 0 trick is a nice minimal way to deal with it. We already
have fdb filtering for dumps, but that requires us to go over all fdbs and with
fdb duplication the local ones alone can be a pretty high number.

The pros are that we don't have fdb duplication (much smaller number of fdbs),
we use standard bridge infra available to store and find them (the fdb rhashtable,
with local entries tagged as vlan 0) and code-wise it is pretty simple to maintain.
This solution has worked for over 10 years at Cumulus, it is well tested.

It needs to be optional though as it changes user-visible behaviour for local fdbs.
Any other solution would probably have to be optional as well.

And obviously I am biased, so good alternatives are welcome too. :-)

Cheers,
  Nik





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-09-09  9:07   ` Nikolay Aleksandrov
@ 2025-09-09  9:19     ` Nikolay Aleksandrov
  2025-10-04 13:19     ` Linus Lüssing
  1 sibling, 0 replies; 22+ messages in thread
From: Nikolay Aleksandrov @ 2025-09-09  9:19 UTC (permalink / raw)
  To: Jakub Kicinski, Petr Machata
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev, Simon Horman,
	Ido Schimmel, bridge, mlxsw, Vladimir Oltean, Andrew Lunn,
	Jiri Pirko

On 9/9/25 12:07, Nikolay Aleksandrov wrote:
> On 9/9/25 05:27, Jakub Kicinski wrote:
>> On Thu, 4 Sep 2025 19:07:17 +0200 Petr Machata wrote:
>>> Yet another option might be to use in-kernel FDB filtering, and to filter
>>> the local entries when dumping. Unfortunately, this does not help all that
>>> much either, because the linked-list walk still needs to happen. Also, with
>>> the obvious filtering interface built around ndm_flags / ndm_state
>>> filtering, one can't just exclude pure local entries in one query. One
>>> needs to dump all non-local entries first, and then to get permanent
>>> entries in another run filter local & added_by_user. I.e. one needs to pay
>>> the iteration overhead twice, and then integrate the result in userspace.
>>> To get significant savings, one would need a very specific knob like "dump,
>>> but skip/only include local entries". But if we are adding a local-specific
>>> knobs, maybe let's have an option to just not duplicate them in the first
>>> place.
>>
>> Local-specific knob for dump seems like the most direct way to address
>> your concern, if I'm reading the cover letter right. Also, is it normal
>> to special case vlan 0 the way this series does? Wouldn't it be cleaner
>> to store local entries in a separate hash table? Perhaps if they lived
>> in a separate hash table it'd be odd to dump them for VLAN 0 (so the
>> series also conflates the kernel internals and control path/dump output)
>>
>> Given that Nik has authored the previous version a third opinion would> be great... adding a handful of people to CC.
> 
> My 2c, it is ok to special case vlan 0 as it is illegal to use, so it can be used
> to match on "special" entries like this. I'd like to avoid the complexity of maintaining
> multiple hash tables or lists just because of this issue, it is not a common problem
> and I think the optional vlan 0 trick is a nice minimal way to deal with it. We already
> have fdb filtering for dumps, but that requires us to go over all fdbs and with
> fdb duplication the local ones alone can be a pretty high number.
> 

And just to make sure it is clear - usually we are talking about a low number of fdb entries
(w/o duplicates), IIRC up to 100 in most cases.

> The pros are that we don't have fdb duplication (much smaller number of fdbs),
> we use standard bridge infra available to store and find them (the fdb rhashtable,
> with local entries tagged as vlan 0) and code-wise it is pretty simple to maintain.
> This solution has worked for over 10 years at Cumulus, it is well tested.
> 
> It needs to be optional though as it changes user-visible behaviour for local fdbs.
> Any other solution would probably have to be optional as well.
> 
> And obviously I am biased, so good alternatives are welcome too. :-)
> 
> Cheers,
>   Nik
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-09-09  2:27 ` Jakub Kicinski
  2025-09-09  9:07   ` Nikolay Aleksandrov
@ 2025-09-09 12:12   ` Petr Machata
  1 sibling, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-09 12:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge, mlxsw,
	Vladimir Oltean, Andrew Lunn, Jiri Pirko


Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 4 Sep 2025 19:07:17 +0200 Petr Machata wrote:
>> Yet another option might be to use in-kernel FDB filtering, and to filter
>> the local entries when dumping. Unfortunately, this does not help all that
>> much either, because the linked-list walk still needs to happen. Also, with
>> the obvious filtering interface built around ndm_flags / ndm_state
>> filtering, one can't just exclude pure local entries in one query. One
>> needs to dump all non-local entries first, and then to get permanent
>> entries in another run filter local & added_by_user. I.e. one needs to pay
>> the iteration overhead twice, and then integrate the result in userspace.
>> To get significant savings, one would need a very specific knob like "dump,
>> but skip/only include local entries". But if we are adding a local-specific
>> knobs, maybe let's have an option to just not duplicate them in the first
>> place.
>
> Local-specific knob for dump seems like the most direct way to address
> your concern, if I'm reading the cover letter right. Also, is it normal
> to special case vlan 0 the way this series does? Wouldn't it be cleaner
> to store local entries in a separate hash table? Perhaps if they lived
> in a separate hash table it'd be odd to dump them for VLAN 0 (so the
> series also conflates the kernel internals and control path/dump output)

I'm not sure why it would be helpful to keep them separate. You would
still need to dump them presumably? Or maybe there's a way to request
skipping specifically local entries, but then you don't need to keep
them separate? I find it better to not create them in the first place,
because then you have faster iteration in all for-each-fdb contexts,
faster marshalling, less memory taken.

> Given that Nik has authored the previous version a third opinion would
> be great... adding a handful of people to CC.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 02/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
  2025-09-09  2:15   ` Jakub Kicinski
@ 2025-09-09 13:34     ` Petr Machata
  0 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-09-09 13:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, bridge, mlxsw


Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 4 Sep 2025 19:07:19 +0200 Petr Machata wrote:
>>  		dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
>> +		if (unlikely(!dst && vid &&
>> +			     br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0))) {
>
> What does the assembly look like for this? I wonder if we're not better
> off with:
>
> 	unlikely(!dst) && vid && br_opt..

Well, I don't see much there. A couple basic blocks end up reordered is
all I can glean out. I looked at GCC tree dumps which are more
transparent to me, and it's the same exact code in both cases, except
for variable naming, basic block numbering and branch prediction
annotations -- kinda obviously.

> Checking dst will be very fast here, and if we missed we're already 
> on the slow path. So maybe we're better off moving the rest of the
> condition out of the fast path, too?

GCC appears to compile unlikely(A && B && C) as unlikely(A) &&
unlikely(B) && unlikely(C). So it's really much of a muchness -- when we
annotate just !dst, it's going to assume 50/50 on those latter branches,
but we don't really care at that point anymore, because the first 90/10
is going to send us to the slow path. There's a bunch of 50/50 branches
because of __builtin_constant_p's, but those are optimized away in
middle-end. In the end it's the same code, just different basic block
layout decisions.

So we can do either.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (11 preceding siblings ...)
  2025-09-09  2:27 ` Jakub Kicinski
@ 2025-09-12  2:05 ` Jakub Kicinski
  2025-09-12  2:10 ` patchwork-bot+netdevbpf
  13 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2025-09-12  2:05 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev, Simon Horman,
	Nikolay Aleksandrov, Ido Schimmel, bridge, mlxsw

On Thu, 4 Sep 2025 19:07:17 +0200 Petr Machata wrote:
> The bridge FDB contains one local entry per port per VLAN, for the MAC of
> the port in question, and likewise for the bridge itself. This allows
> bridge to locally receive and punt "up" any packets whose destination MAC
> address matches that of one of the bridge interfaces or of the bridge
> itself.

Alright, I'll admit I don't have a better solution..
And my attempts to stir up controversy have failed :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
                   ` (12 preceding siblings ...)
  2025-09-12  2:05 ` Jakub Kicinski
@ 2025-09-12  2:10 ` patchwork-bot+netdevbpf
  13 siblings, 0 replies; 22+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-09-12  2:10 UTC (permalink / raw)
  To: Petr Machata
  Cc: davem, edumazet, kuba, pabeni, netdev, horms, razor, idosch,
	bridge, mlxsw

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 4 Sep 2025 19:07:17 +0200 you wrote:
> The bridge FDB contains one local entry per port per VLAN, for the MAC of
> the port in question, and likewise for the bridge itself. This allows
> bridge to locally receive and punt "up" any packets whose destination MAC
> address matches that of one of the bridge interfaces or of the bridge
> itself.
> 
> The number of these local "service" FDB entries grows linearly with number
> of bridge-global VLAN memberships, but that in turn will tend to grow
> quadratically with number of ports and per-port VLAN memberships. While
> that does not cause issues during forwarding lookups, it does make dumps
> impractically slow.
> 
> [...]

Here is the summary with links:
  - [net-next,01/10] net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0
    https://git.kernel.org/netdev/net-next/c/c1164178e9a8
  - [net-next,02/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
    https://git.kernel.org/netdev/net-next/c/60d6be0931e9
  - [net-next,03/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs
    https://git.kernel.org/netdev/net-next/c/4cf5fd849787
  - [net-next,04/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip per-VLAN FDBs
    https://git.kernel.org/netdev/net-next/c/40df3b8e90ee
  - [net-next,05/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation
    https://git.kernel.org/netdev/net-next/c/a29aba64e022
  - [net-next,06/10] net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0
    https://git.kernel.org/netdev/net-next/c/21446c06b441
  - [net-next,07/10] selftests: defer: Allow spaces in arguments of deferred commands
    https://git.kernel.org/netdev/net-next/c/d89d3b29ce1a
  - [net-next,08/10] selftests: defer: Introduce DEFER_PAUSE_ON_FAIL
    https://git.kernel.org/netdev/net-next/c/ed07c8f2b854
  - [net-next,09/10] selftests: net: lib.sh: Don't defer failed commands
    https://git.kernel.org/netdev/net-next/c/fa57032941d4
  - [net-next,10/10] selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0
    https://git.kernel.org/netdev/net-next/c/dbd91347927d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-09-09  9:07   ` Nikolay Aleksandrov
  2025-09-09  9:19     ` Nikolay Aleksandrov
@ 2025-10-04 13:19     ` Linus Lüssing
  2025-10-04 14:31       ` Nikolay Aleksandrov
  1 sibling, 1 reply; 22+ messages in thread
From: Linus Lüssing @ 2025-10-04 13:19 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jakub Kicinski, Petr Machata, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Ido Schimmel, bridge, mlxsw,
	Vladimir Oltean, Andrew Lunn, Jiri Pirko

On Tue, Sep 09, 2025 at 12:07:43PM +0300, Nikolay Aleksandrov wrote:
> My 2c, it is ok to special case vlan 0 as it is illegal to use, so it can be used
> to match on "special" entries like this.

I'm probably missing some context, but why would VLAN 0 be illegal
to use? Isn't VLAN 0 used for untagged frames with priorities? A
priority tagged frame?

Regards, Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0
  2025-10-04 13:19     ` Linus Lüssing
@ 2025-10-04 14:31       ` Nikolay Aleksandrov
  0 siblings, 0 replies; 22+ messages in thread
From: Nikolay Aleksandrov @ 2025-10-04 14:31 UTC (permalink / raw)
  To: Linus Lüssing
  Cc: Jakub Kicinski, Petr Machata, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Ido Schimmel, bridge, mlxsw,
	Vladimir Oltean, Andrew Lunn, Jiri Pirko

On October 4, 2025 4:19:25 PM GMT+03:00, "Linus Lüssing" <linus.luessing@c0d3.blue> wrote:
>On Tue, Sep 09, 2025 at 12:07:43PM +0300, Nikolay Aleksandrov wrote:
>> My 2c, it is ok to special case vlan 0 as it is illegal to use, so it can be used
>> to match on "special" entries like this.
>
>I'm probably missing some context, but why would VLAN 0 be illegal
>to use? Isn't VLAN 0 used for untagged frames with priorities? A
>priority tagged frame?
>
>Regards, Linus

Sure it is, I said it in the context of the bridge which doesn't have such support.

Cheers,
 Nik



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-10-04 14:31 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-04 17:07 [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Petr Machata
2025-09-04 17:07 ` [PATCH net-next 01/10] net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0 Petr Machata
2025-09-04 17:07 ` [PATCH net-next 02/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss Petr Machata
2025-09-09  2:15   ` Jakub Kicinski
2025-09-09 13:34     ` Petr Machata
2025-09-04 17:07 ` [PATCH net-next 03/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs Petr Machata
2025-09-04 17:07 ` [PATCH net-next 04/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge " Petr Machata
2025-09-04 17:07 ` [PATCH net-next 05/10] net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation Petr Machata
2025-09-04 17:07 ` [PATCH net-next 06/10] net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0 Petr Machata
2025-09-04 17:07 ` [PATCH net-next 07/10] selftests: defer: Allow spaces in arguments of deferred commands Petr Machata
2025-09-04 17:07 ` [PATCH net-next 08/10] selftests: defer: Introduce DEFER_PAUSE_ON_FAIL Petr Machata
2025-09-04 17:07 ` [PATCH net-next 09/10] selftests: net: lib.sh: Don't defer failed commands Petr Machata
2025-09-04 17:07 ` [PATCH net-next 10/10] selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0 Petr Machata
2025-09-06 18:16 ` [PATCH net-next 00/10] bridge: Allow keeping local FDB entries only on VLAN 0 Nikolay Aleksandrov
2025-09-09  2:27 ` Jakub Kicinski
2025-09-09  9:07   ` Nikolay Aleksandrov
2025-09-09  9:19     ` Nikolay Aleksandrov
2025-10-04 13:19     ` Linus Lüssing
2025-10-04 14:31       ` Nikolay Aleksandrov
2025-09-09 12:12   ` Petr Machata
2025-09-12  2:05 ` Jakub Kicinski
2025-09-12  2:10 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).