netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] net: bridge: reduce multicast checks in fast path
@ 2025-08-29  8:53 Linus Lüssing
  2025-08-29  8:53 ` [PATCH 1/9] net: bridge: mcast: track active state, IGMP/MLD querier appearance Linus Lüssing
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang

This patchset introduces new state variables to combine and reduce the
number of checks we would otherwise perform on every multicast packet
in fast/data path.
  
The second reason for introducing these new, internal multicast active
variables is to later propagate a safety mechanism which was introduced
in b00589af3b04 ("bridge: disable snooping if there is no querier") to
switchdev/DSA, too. That is to notify switchdev/DSA if multicast
snooping can safely be applied without potential packet loss.
    
Regards, Linus

---

# Changelog

Changelog to / follow-up of: [PATCH net-next 0/5] net: bridge: propagate safe mcast snooping to switchdev + DSA
-> https://lkml.org/lkml/2025/5/22/1413

* removed the switchdev/DSA changes for now
* splitting "[PATCH net-next 1/5] net: bridge: mcast: explicitly track active state"
  into:
  * net: bridge: mcast: track active state, IGMP/MLD querier appearance
  * net: bridge: mcast: track active state, foreign IGMP/MLD querier disappearance
  * net: bridge: mcast: track active state, IPv6 address availability
  * net: bridge: mcast: track active state, own MLD querier disappearance
  * net: bridge: mcast: use combined active state in fast/data path
  * net: bridge: mcast: track active state, bridge up/down

* rebased to current net-next/main:
  * from_timer() -> timer_container_of()

* net: bridge: mcast: export ip{4,6}_active state to netlink:
  * changing NLA_U8 to NLA_REJECT to make it read-only

* moved br_multicast_update_active() call from br_ip{4,6}_multicast_query_expired()
  (own querier timer callback) to br_ip{4,6}_multicast_querier_expired()
  (other querier timer callback)
  * even though both should have worked as br_multicast_querier_expired()
    would call br_multicast_start_querier()->...->br_multicast_query_expired(),
    even if the own querier is disabled, but let's use the more direct way

* simplified br_multicast_update_active():
  * no return value for now, don't track if the active state has changed,
    these aren't necessary (yet)
  * removed __br_multicast_update_active() variant as was used to force
    an inactive state in __br_multicast_stop(), instead using an
    netif_running(brmctx->br->dev) check in br_multicast_update_active()
  * replaced br_ip{4,6}_multicast_check_active() with simpler
    br_ip{4,6}_multicast_update_active() and
    br_ip{4,6}_multicast_querier_exists()
  * fixing build errors with CONFIG_IPV6 unset
* simplified br_multicast_toggle_enabled()
  * no return value for now
  * fixes "old used uninitialized" issue

* removed const from __br_multicast_querier_exists()'s "bool is_ipv6"
* replaced "struct ethhdr *eth" in br_multicast_{snooping,querier}_active()
  with direct ethernet protocol integer attributes
* added a few comments in br_multicast_update_active() calling functions


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/9] net: bridge: mcast: track active state, IGMP/MLD querier appearance
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29  8:53 ` [PATCH 2/9] net: bridge: mcast: track active state, foreign IGMP/MLD querier disappearance Linus Lüssing
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

This is the first step to track in dedicated, per protocol family
variables if we can actively and safely use multicast snooping.
To later use these in the fast/data path instead of performing
all these checks for every packet and to later notify DSA/switchdev
about this state.

This toggles these new variables to true after a Maximum Response Delay
(default: 10 seconds) if a new IGMP or MLD querier has appeared. This
can be triggered either through receiving an IGMP/MLD query from another
host or by a user enabling our own IGMP/MLD querier.

This is the first of several requirements, similar to what
br_multicast_querier_exists() already checks so far, to be able to
reliably receive IGMP/MLD reports, which in turn are needed to build
a complete multicast database.

No functional change for the fast/data path yet.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 net/bridge/br_multicast.c | 91 +++++++++++++++++++++++++++++++++++++--
 net/bridge/br_private.h   |  2 +
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 22d12e545966..5cc713adcf9b 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1069,6 +1069,65 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge_mcast *brm
 	return skb;
 }
 
+static bool br_ip4_multicast_querier_exists(struct net_bridge_mcast *brmctx)
+{
+	return __br_multicast_querier_exists(brmctx, &brmctx->ip4_other_query, false);
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static bool br_ip6_multicast_querier_exists(struct net_bridge_mcast *brmctx)
+{
+	return __br_multicast_querier_exists(brmctx, &brmctx->ip6_other_query, true);
+}
+#endif
+
+static void br_ip4_multicast_update_active(struct net_bridge_mcast *brmctx,
+					   bool force_inactive)
+{
+	if (force_inactive)
+		brmctx->ip4_active = false;
+	else
+		brmctx->ip4_active = br_ip4_multicast_querier_exists(brmctx);
+}
+
+static void br_ip6_multicast_update_active(struct net_bridge_mcast *brmctx,
+					   bool force_inactive)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	if (force_inactive)
+		brmctx->ip6_active = false;
+	else
+		brmctx->ip6_active = br_ip6_multicast_querier_exists(brmctx);
+#endif
+}
+
+/**
+ * br_multicast_update_active() - update mcast active state
+ * @brmctx: the bridge multicast context to check
+ *
+ * This (potentially) updates the IPv4/IPv6 multicast active state. And by
+ * that enables or disables snooping of multicast payload traffic in fast
+ * path.
+ *
+ * The multicast active state is set, per protocol family, if:
+ *
+ * - an IGMP/MLD querier is present
+ *
+ * And is unset otherwise.
+ *
+ * This function should be called by anything that changes one of the
+ * above prerequisites.
+ */
+static void br_multicast_update_active(struct net_bridge_mcast *brmctx)
+{
+	bool force_inactive = false;
+
+	lockdep_assert_held_once(&brmctx->br->multicast_lock);
+
+	br_ip4_multicast_update_active(brmctx, force_inactive);
+	br_ip6_multicast_update_active(brmctx, force_inactive);
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge_mcast *brmctx,
 						    struct net_bridge_mcast_port *pmctx,
@@ -1762,10 +1821,34 @@ static void br_ip6_multicast_querier_expired(struct timer_list *t)
 }
 #endif
 
-static void br_multicast_query_delay_expired(struct timer_list *t)
+static void br_ip4_multicast_query_delay_expired(struct timer_list *t)
 {
+	struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t,
+							     ip4_other_query.delay_timer);
+
+	spin_lock(&brmctx->br->multicast_lock);
+	/* an own or other IGMP querier appeared some seconds ago and all
+	 * reports should have arrived by now, maybe set multicast state to active
+	 */
+	br_multicast_update_active(brmctx);
+	spin_unlock(&brmctx->br->multicast_lock);
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static void br_ip6_multicast_query_delay_expired(struct timer_list *t)
+{
+	struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t,
+							     ip6_other_query.delay_timer);
+
+	spin_lock(&brmctx->br->multicast_lock);
+	/* an own or other MLD querier appeared some seconds ago and all
+	 * reports should have arrived, maybe set multicast state to active
+	 */
+	br_multicast_update_active(brmctx);
+	spin_unlock(&brmctx->br->multicast_lock);
+}
+#endif
+
 static void br_multicast_select_own_querier(struct net_bridge_mcast *brmctx,
 					    struct br_ip *ip,
 					    struct sk_buff *skb)
@@ -4112,11 +4195,13 @@ void br_multicast_ctx_init(struct net_bridge *br,
 	brmctx->multicast_membership_interval = 260 * HZ;
 
 	brmctx->ip4_querier.port_ifidx = 0;
+	brmctx->ip4_active = 0;
 	seqcount_spinlock_init(&brmctx->ip4_querier.seq, &br->multicast_lock);
 	brmctx->multicast_igmp_version = 2;
 #if IS_ENABLED(CONFIG_IPV6)
 	brmctx->multicast_mld_version = 1;
 	brmctx->ip6_querier.port_ifidx = 0;
+	brmctx->ip6_active = 0;
 	seqcount_spinlock_init(&brmctx->ip6_querier.seq, &br->multicast_lock);
 #endif
 
@@ -4125,7 +4210,7 @@ void br_multicast_ctx_init(struct net_bridge *br,
 	timer_setup(&brmctx->ip4_other_query.timer,
 		    br_ip4_multicast_querier_expired, 0);
 	timer_setup(&brmctx->ip4_other_query.delay_timer,
-		    br_multicast_query_delay_expired, 0);
+		    br_ip4_multicast_query_delay_expired, 0);
 	timer_setup(&brmctx->ip4_own_query.timer,
 		    br_ip4_multicast_query_expired, 0);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -4134,7 +4219,7 @@ void br_multicast_ctx_init(struct net_bridge *br,
 	timer_setup(&brmctx->ip6_other_query.timer,
 		    br_ip6_multicast_querier_expired, 0);
 	timer_setup(&brmctx->ip6_other_query.delay_timer,
-		    br_multicast_query_delay_expired, 0);
+		    br_ip6_multicast_query_delay_expired, 0);
 	timer_setup(&brmctx->ip6_own_query.timer,
 		    br_ip6_multicast_query_expired, 0);
 #endif
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 8de0904b9627..f83c24def595 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -160,12 +160,14 @@ struct net_bridge_mcast {
 	struct bridge_mcast_other_query	ip4_other_query;
 	struct bridge_mcast_own_query	ip4_own_query;
 	struct bridge_mcast_querier	ip4_querier;
+	bool				ip4_active;
 #if IS_ENABLED(CONFIG_IPV6)
 	struct hlist_head		ip6_mc_router_list;
 	struct timer_list		ip6_mc_router_timer;
 	struct bridge_mcast_other_query	ip6_other_query;
 	struct bridge_mcast_own_query	ip6_own_query;
 	struct bridge_mcast_querier	ip6_querier;
+	bool				ip6_active;
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
 };
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/9] net: bridge: mcast: track active state, foreign IGMP/MLD querier disappearance
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
  2025-08-29  8:53 ` [PATCH 1/9] net: bridge: mcast: track active state, IGMP/MLD querier appearance Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29  8:53 ` [PATCH 3/9] net: bridge: mcast: track active state, IPv6 address availability Linus Lüssing
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

This change ensures that the new multicast active state variable is unset
again after a foreign IGMP/MLD querier has disappeared (default: 255
seconds). If no new, other IGMP/MLD querier took over then we can't
reliably receive IGMP/MLD reports anymore and in turn can't ensure the
completeness of our MDB anymore either.

No functional change for the fast/data path yet.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 net/bridge/br_multicast.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 5cc713adcf9b..2e5b5281e484 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1800,6 +1800,10 @@ static void br_multicast_querier_expired(struct net_bridge_mcast *brmctx,
 	br_multicast_start_querier(brmctx, query);
 
 out:
+	/* another IGMP/MLD querier disappeared, set multicast state to inactive
+	 * if our own querier is disabled, too
+	 */
+	br_multicast_update_active(brmctx);
 	spin_unlock(&brmctx->br->multicast_lock);
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/9] net: bridge: mcast: track active state, IPv6 address availability
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
  2025-08-29  8:53 ` [PATCH 1/9] net: bridge: mcast: track active state, IGMP/MLD querier appearance Linus Lüssing
  2025-08-29  8:53 ` [PATCH 2/9] net: bridge: mcast: track active state, foreign IGMP/MLD querier disappearance Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29  8:53 ` [PATCH 4/9] net: bridge: mcast: track active state, own MLD querier disappearance Linus Lüssing
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

If we are the only potential MLD querier but don't have an IPv6
link-local address configured on our bridge interface then we can't
create a valid MLD query and in turn can't reliably receive MLD reports
and can't build a complete MDB. Hence disable the new multicast active
state variable then. Or reenable it if an IPv6 link-local address
became available.

No functional change for the fast/data path yet.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 net/bridge/br_multicast.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2e5b5281e484..b689e62b9e74 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1112,6 +1112,7 @@ static void br_ip6_multicast_update_active(struct net_bridge_mcast *brmctx,
  * The multicast active state is set, per protocol family, if:
  *
  * - an IGMP/MLD querier is present
+ * - for own IPv6 MLD querier: an IPv6 address is configured on the bridge
  *
  * And is unset otherwise.
  *
@@ -1206,10 +1207,12 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge_mcast *brm
 			       &ip6h->daddr, 0, &ip6h->saddr)) {
 		kfree_skb(skb);
 		br_opt_toggle(brmctx->br, BROPT_HAS_IPV6_ADDR, false);
+		br_multicast_update_active(brmctx);
 		return NULL;
 	}
 
 	br_opt_toggle(brmctx->br, BROPT_HAS_IPV6_ADDR, true);
+	br_multicast_update_active(brmctx);
 	ipv6_eth_mc_map(&ip6h->daddr, eth->h_dest);
 
 	hopopt = (u8 *)(ip6h + 1);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/9] net: bridge: mcast: track active state, own MLD querier disappearance
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
                   ` (2 preceding siblings ...)
  2025-08-29  8:53 ` [PATCH 3/9] net: bridge: mcast: track active state, IPv6 address availability Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29  8:53 ` [PATCH 5/9] net: bridge: mcast: export ip{4,6}_active state to netlink Linus Lüssing
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

This change ensures that the new multicast active state variable is
immediately unset if our internal IGMP/MLD querier was elected and
now disabled.

If no IGMP/MLD querier exists on the link then we can't reliably receive
IGMP/MLD reports and in turn can't ensure the completeness of our MDB
anymore either.

No functional change for the fast/data path yet. This is the last
necessary check before using the new multicast active state variable
in the fast/data path, too.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 net/bridge/br_multicast.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index b689e62b9e74..13840f8f2e5f 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -4849,6 +4849,7 @@ int br_multicast_set_querier(struct net_bridge_mcast *brmctx, unsigned long val)
 #endif
 
 unlock:
+	br_multicast_update_active(brmctx);
 	spin_unlock_bh(&brmctx->br->multicast_lock);
 
 	return 0;
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/9] net: bridge: mcast: export ip{4,6}_active state to netlink
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
                   ` (3 preceding siblings ...)
  2025-08-29  8:53 ` [PATCH 4/9] net: bridge: mcast: track active state, own MLD querier disappearance Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29  8:53 ` [PATCH 6/9] net: bridge: mcast: use combined active state in fast/data path Linus Lüssing
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

Export the new ip{4,6}_active variables to netlink, to be able to
check from userspace that they are updated as intended.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 include/uapi/linux/if_bridge.h |  2 ++
 include/uapi/linux/if_link.h   | 12 ++++++++++++
 net/bridge/br_netlink.c        | 10 +++++++++-
 net/bridge/br_vlan_options.c   | 10 +++++++++-
 net/core/rtnetlink.c           |  2 +-
 5 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 73876c0e2bba..c4d92df8d374 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -584,6 +584,8 @@ enum {
 	BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS,
 	BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_STATE,
 	BRIDGE_VLANDB_GOPTS_MSTI,
+	BRIDGE_VLANDB_GOPTS_MCAST_ACTIVE_V4,
+	BRIDGE_VLANDB_GOPTS_MCAST_ACTIVE_V6,
 	__BRIDGE_VLANDB_GOPTS_MAX
 };
 #define BRIDGE_VLANDB_GOPTS_MAX (__BRIDGE_VLANDB_GOPTS_MAX - 1)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 784ace3a519c..fcf2c42aa6c4 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -742,6 +742,16 @@ enum in6_addr_gen_mode {
  * @IFLA_BR_FDB_MAX_LEARNED
  *   Set the number of max dynamically learned FDB entries for the current
  *   bridge.
+ *
+ * @IFLA_BR_MCAST_ACTIVE_V4
+ *   Bridge IPv4 mcast active state, read only.
+ *
+ *   1 if an IGMP querier is present, 0 otherwise.
+ *
+ * @IFLA_BR_MCAST_ACTIVE_V6
+ *   Bridge IPv6 mcast active state, read only.
+ *
+ *   1 if an MLD querier is present, 0 otherwise.
  */
 enum {
 	IFLA_BR_UNSPEC,
@@ -794,6 +804,8 @@ enum {
 	IFLA_BR_MCAST_QUERIER_STATE,
 	IFLA_BR_FDB_N_LEARNED,
 	IFLA_BR_FDB_MAX_LEARNED,
+	IFLA_BR_MCAST_ACTIVE_V4,
+	IFLA_BR_MCAST_ACTIVE_V6,
 	__IFLA_BR_MAX,
 };
 
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 4e2d53b27221..5aa1721830d6 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1264,7 +1264,9 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {
 	[IFLA_BR_VLAN_STATS_ENABLED] = { .type = NLA_U8 },
 	[IFLA_BR_MCAST_STATS_ENABLED] = { .type = NLA_U8 },
 	[IFLA_BR_MCAST_IGMP_VERSION] = { .type = NLA_U8 },
+	[IFLA_BR_MCAST_ACTIVE_V4] = { .type = NLA_REJECT },
 	[IFLA_BR_MCAST_MLD_VERSION] = { .type = NLA_U8 },
+	[IFLA_BR_MCAST_ACTIVE_V6] = { .type = NLA_REJECT },
 	[IFLA_BR_VLAN_STATS_PER_PORT] = { .type = NLA_U8 },
 	[IFLA_BR_MULTI_BOOLOPT] =
 		NLA_POLICY_EXACT_LEN(sizeof(struct br_boolopt_multi)),
@@ -1625,7 +1627,9 @@ static size_t br_get_size(const struct net_device *brdev)
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_MCAST_QUERY_RESPONSE_INTVL */
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_MCAST_STARTUP_QUERY_INTVL */
 	       nla_total_size(sizeof(u8)) +	/* IFLA_BR_MCAST_IGMP_VERSION */
+	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_MCAST_ACTIVE_V4 */
 	       nla_total_size(sizeof(u8)) +	/* IFLA_BR_MCAST_MLD_VERSION */
+	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_MCAST_ACTIVE_V6 */
 	       br_multicast_querier_state_size() + /* IFLA_BR_MCAST_QUERIER_STATE */
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
@@ -1717,12 +1721,16 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
 			br->multicast_ctx.multicast_startup_query_count) ||
 	    nla_put_u8(skb, IFLA_BR_MCAST_IGMP_VERSION,
 		       br->multicast_ctx.multicast_igmp_version) ||
+	    nla_put_u8(skb, IFLA_BR_MCAST_ACTIVE_V4,
+		       br->multicast_ctx.ip4_active) ||
 	    br_multicast_dump_querier_state(skb, &br->multicast_ctx,
 					    IFLA_BR_MCAST_QUERIER_STATE))
 		return -EMSGSIZE;
 #if IS_ENABLED(CONFIG_IPV6)
 	if (nla_put_u8(skb, IFLA_BR_MCAST_MLD_VERSION,
-		       br->multicast_ctx.multicast_mld_version))
+		       br->multicast_ctx.multicast_mld_version) ||
+	    nla_put_u8(skb, IFLA_BR_MCAST_ACTIVE_V6,
+		       br->multicast_ctx.ip6_active))
 		return -EMSGSIZE;
 #endif
 	clockval = jiffies_to_clock_t(br->multicast_ctx.multicast_last_member_interval);
diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c
index 8fa89b04ee94..33f055d3a304 100644
--- a/net/bridge/br_vlan_options.c
+++ b/net/bridge/br_vlan_options.c
@@ -369,6 +369,8 @@ bool br_vlan_global_opts_fill(struct sk_buff *skb, u16 vid, u16 vid_range,
 		       !!(v_opts->priv_flags & BR_VLFLAG_GLOBAL_MCAST_ENABLED)) ||
 	    nla_put_u8(skb, BRIDGE_VLANDB_GOPTS_MCAST_IGMP_VERSION,
 		       v_opts->br_mcast_ctx.multicast_igmp_version) ||
+	    nla_put_u8(skb, BRIDGE_VLANDB_GOPTS_MCAST_ACTIVE_V4,
+		       v_opts->br_mcast_ctx.ip4_active) ||
 	    nla_put_u32(skb, BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_CNT,
 			v_opts->br_mcast_ctx.multicast_last_member_count) ||
 	    nla_put_u32(skb, BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_CNT,
@@ -423,7 +425,9 @@ bool br_vlan_global_opts_fill(struct sk_buff *skb, u16 vid, u16 vid_range,
 
 #if IS_ENABLED(CONFIG_IPV6)
 	if (nla_put_u8(skb, BRIDGE_VLANDB_GOPTS_MCAST_MLD_VERSION,
-		       v_opts->br_mcast_ctx.multicast_mld_version))
+		       v_opts->br_mcast_ctx.multicast_mld_version) ||
+	    nla_put_u8(skb, BRIDGE_VLANDB_GOPTS_MCAST_ACTIVE_V6,
+		       v_opts->br_mcast_ctx.ip4_active))
 		goto out_err;
 #endif
 #endif
@@ -448,7 +452,9 @@ static size_t rtnl_vlan_global_opts_nlmsg_size(const struct net_bridge_vlan *v)
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 		+ nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_GOPTS_MCAST_SNOOPING */
 		+ nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_GOPTS_MCAST_IGMP_VERSION */
+		+ nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_GOPTS_MCAST_ACTIVE_V4 */
 		+ nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_GOPTS_MCAST_MLD_VERSION */
+		+ nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_GOPTS_MCAST_ACTIVE_V6 */
 		+ nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_CNT */
 		+ nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_CNT */
 		+ nla_total_size(sizeof(u64)) /* BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_INTVL */
@@ -630,9 +636,11 @@ static const struct nla_policy br_vlan_db_gpol[BRIDGE_VLANDB_GOPTS_MAX + 1] = {
 	[BRIDGE_VLANDB_GOPTS_RANGE]	= { .type = NLA_U16 },
 	[BRIDGE_VLANDB_GOPTS_MCAST_SNOOPING]	= { .type = NLA_U8 },
 	[BRIDGE_VLANDB_GOPTS_MCAST_MLD_VERSION]	= { .type = NLA_U8 },
+	[BRIDGE_VLANDB_GOPTS_MCAST_ACTIVE_V6]	= { .type = NLA_REJECT },
 	[BRIDGE_VLANDB_GOPTS_MCAST_QUERY_INTVL]	= { .type = NLA_U64 },
 	[BRIDGE_VLANDB_GOPTS_MCAST_QUERIER]	= { .type = NLA_U8 },
 	[BRIDGE_VLANDB_GOPTS_MCAST_IGMP_VERSION]	= { .type = NLA_U8 },
+	[BRIDGE_VLANDB_GOPTS_MCAST_ACTIVE_V4]	= { .type = NLA_REJECT },
 	[BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_CNT]	= { .type = NLA_U32 },
 	[BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_CNT]	= { .type = NLA_U32 },
 	[BRIDGE_VLANDB_GOPTS_MCAST_LAST_MEMBER_INTVL]	= { .type = NLA_U64 },
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 094b085cff20..cb24370b719f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -62,7 +62,7 @@
 
 #include "dev.h"
 
-#define RTNL_MAX_TYPE		50
+#define RTNL_MAX_TYPE		52
 #define RTNL_SLAVE_MAX_TYPE	44
 
 struct rtnl_link {
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/9] net: bridge: mcast: use combined active state in fast/data path
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
                   ` (4 preceding siblings ...)
  2025-08-29  8:53 ` [PATCH 5/9] net: bridge: mcast: export ip{4,6}_active state to netlink Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29  8:53 ` [PATCH 7/9] net: bridge: mcast: active state, if snooping is enabled Linus Lüssing
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

As the multicast active state variable is now always up to date and
functionally equivalent to our manual, extensive checks in fast path
we can just use this state variable in fast path, too. This allows to
save some CPU cycles for every multicast packet in the fast/data path.

Next to using brmctx->ip4_active / brmctx->ip6_active in fast path this
mostly just moves some code around to not expose it via br_private.h
anymore. While at it now also passing the ethernet protocol number
directly, instead of a pointer into the ethernet header.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 net/bridge/br_device.c    |  2 +-
 net/bridge/br_input.c     |  2 +-
 net/bridge/br_multicast.c | 40 ++++++++++++++++++++++++++++++++++-----
 net/bridge/br_private.h   | 38 ++++++++-----------------------------
 4 files changed, 45 insertions(+), 37 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index a818fdc22da9..3cdf1c17108b 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -102,7 +102,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		mdst = br_mdb_entry_skb_get(brmctx, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(brmctx, eth_hdr(skb), mdst))
+		    br_multicast_snooping_active(brmctx, eth_hdr(skb)->h_proto, mdst))
 			br_multicast_flood(mdst, skb, brmctx, false, true);
 		else
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true, vid);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 5f6ac9bf1527..dfd49b309683 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -187,7 +187,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	case BR_PKT_MULTICAST:
 		mdst = br_mdb_entry_skb_get(brmctx, skb, vid);
 		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(brmctx, eth_hdr(skb), mdst)) {
+		    br_multicast_snooping_active(brmctx, eth_hdr(skb)->h_proto, mdst)) {
 			if ((mdst && mdst->host_joined) ||
 			    br_multicast_is_router(brmctx, skb) ||
 			    br->dev->flags & IFF_ALLMULTI) {
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 13840f8f2e5f..54163c74b9a9 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1069,6 +1069,26 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge_mcast *brm
 	return skb;
 }
 
+static bool
+__br_multicast_querier_exists(struct net_bridge_mcast *brmctx,
+			      struct bridge_mcast_other_query *querier,
+			      bool is_ipv6)
+{
+	bool own_querier_enabled;
+
+	if (brmctx->multicast_querier) {
+		if (is_ipv6 && !br_opt_get(brmctx->br, BROPT_HAS_IPV6_ADDR))
+			own_querier_enabled = false;
+		else
+			own_querier_enabled = true;
+	} else {
+		own_querier_enabled = false;
+	}
+
+	return !timer_pending(&querier->delay_timer) &&
+	       (own_querier_enabled || timer_pending(&querier->timer));
+}
+
 static bool br_ip4_multicast_querier_exists(struct net_bridge_mcast *brmctx)
 {
 	return __br_multicast_querier_exists(brmctx, &brmctx->ip4_other_query, false);
@@ -1081,6 +1101,20 @@ static bool br_ip6_multicast_querier_exists(struct net_bridge_mcast *brmctx)
 }
 #endif
 
+static bool br_multicast_querier_exists(struct net_bridge_mcast *brmctx, int proto)
+{
+	switch (proto) {
+	case (ETH_P_IP):
+		return br_ip4_multicast_querier_exists(brmctx);
+#if IS_ENABLED(CONFIG_IPV6)
+	case (ETH_P_IPV6):
+		return br_ip6_multicast_querier_exists(brmctx);
+#endif
+	default:
+		return false;
+	}
+}
+
 static void br_ip4_multicast_update_active(struct net_bridge_mcast *brmctx,
 					   bool force_inactive)
 {
@@ -5013,7 +5047,6 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto)
 {
 	struct net_bridge *br;
 	struct net_bridge_port *port;
-	struct ethhdr eth;
 	bool ret = false;
 
 	rcu_read_lock();
@@ -5026,10 +5059,7 @@ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto)
 
 	br = port->br;
 
-	memset(&eth, 0, sizeof(eth));
-	eth.h_proto = htons(proto);
-
-	ret = br_multicast_querier_exists(&br->multicast_ctx, &eth, NULL);
+	ret = br_multicast_querier_exists(&br->multicast_ctx, proto);
 
 unlock:
 	rcu_read_unlock();
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f83c24def595..05650af596ab 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1154,37 +1154,15 @@ br_multicast_is_router(struct net_bridge_mcast *brmctx, struct sk_buff *skb)
 }
 
 static inline bool
-__br_multicast_querier_exists(struct net_bridge_mcast *brmctx,
-			      struct bridge_mcast_other_query *querier,
-			      const bool is_ipv6)
+br_multicast_snooping_active(struct net_bridge_mcast *brmctx, __be16 eth_proto,
+			     const struct net_bridge_mdb_entry *mdb)
 {
-	bool own_querier_enabled;
-
-	if (brmctx->multicast_querier) {
-		if (is_ipv6 && !br_opt_get(brmctx->br, BROPT_HAS_IPV6_ADDR))
-			own_querier_enabled = false;
-		else
-			own_querier_enabled = true;
-	} else {
-		own_querier_enabled = false;
-	}
-
-	return !timer_pending(&querier->delay_timer) &&
-	       (own_querier_enabled || timer_pending(&querier->timer));
-}
-
-static inline bool br_multicast_querier_exists(struct net_bridge_mcast *brmctx,
-					       struct ethhdr *eth,
-					       const struct net_bridge_mdb_entry *mdb)
-{
-	switch (eth->h_proto) {
+	switch (eth_proto) {
 	case (htons(ETH_P_IP)):
-		return __br_multicast_querier_exists(brmctx,
-			&brmctx->ip4_other_query, false);
+		return brmctx->ip4_active;
 #if IS_ENABLED(CONFIG_IPV6)
 	case (htons(ETH_P_IPV6)):
-		return __br_multicast_querier_exists(brmctx,
-			&brmctx->ip6_other_query, true);
+		return brmctx->ip6_active;
 #endif
 	default:
 		return !!mdb && br_group_is_l2(&mdb->addr);
@@ -1439,9 +1417,9 @@ static inline bool br_multicast_is_router(struct net_bridge_mcast *brmctx,
 	return false;
 }
 
-static inline bool br_multicast_querier_exists(struct net_bridge_mcast *brmctx,
-					       struct ethhdr *eth,
-					       const struct net_bridge_mdb_entry *mdb)
+static inline bool
+br_multicast_snooping_active(struct net_bridge_mcast *brmctx, __be16 eth_proto,
+			     const struct net_bridge_mdb_entry *mdb)
 {
 	return false;
 }
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/9] net: bridge: mcast: active state, if snooping is enabled
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
                   ` (5 preceding siblings ...)
  2025-08-29  8:53 ` [PATCH 6/9] net: bridge: mcast: use combined active state in fast/data path Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29  8:53 ` [PATCH 8/9] net: bridge: mcast: track active state, bridge up/down Linus Lüssing
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

To be able to use the upcoming SWITCHDEV_ATTR_ID_BRIDGE_MC_ACTIVE
as a potential replacement for SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED
also check and toggle the active state if multicast snooping is enabled
or disabled. So that MC_ACTIVE not only checks the querier state, but
also if multicast snooping is enabled in general.

No functional change for the fast/data path yet.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 include/uapi/linux/if_link.h |  6 ++++--
 net/bridge/br_multicast.c    | 21 +++++++++++++++++++--
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index fcf2c42aa6c4..ef686ea17afe 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -746,12 +746,14 @@ enum in6_addr_gen_mode {
  * @IFLA_BR_MCAST_ACTIVE_V4
  *   Bridge IPv4 mcast active state, read only.
  *
- *   1 if an IGMP querier is present, 0 otherwise.
+ *   1 if *IFLA_BR_MCAST_SNOOPING* is enabled and an IGMP querier is present,
+ *   0 otherwise.
  *
  * @IFLA_BR_MCAST_ACTIVE_V6
  *   Bridge IPv6 mcast active state, read only.
  *
- *   1 if an MLD querier is present, 0 otherwise.
+ *   1 if *IFLA_BR_MCAST_SNOOPING* is enabled and an MLD querier is present,
+ *   0 otherwise.
  */
 enum {
 	IFLA_BR_UNSPEC,
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 54163c74b9a9..53720337a1e3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1145,6 +1145,7 @@ static void br_ip6_multicast_update_active(struct net_bridge_mcast *brmctx,
  *
  * The multicast active state is set, per protocol family, if:
  *
+ * - multicast snooping is enabled
  * - an IGMP/MLD querier is present
  * - for own IPv6 MLD querier: an IPv6 address is configured on the bridge
  *
@@ -1159,6 +1160,13 @@ static void br_multicast_update_active(struct net_bridge_mcast *brmctx)
 
 	lockdep_assert_held_once(&brmctx->br->multicast_lock);
 
+	if (!br_opt_get(brmctx->br, BROPT_MULTICAST_ENABLED))
+		force_inactive = true;
+
+	if (br_opt_get(brmctx->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED) &&
+	    br_multicast_ctx_vlan_disabled(brmctx))
+		force_inactive = true;
+
 	br_ip4_multicast_update_active(brmctx, force_inactive);
 	br_ip6_multicast_update_active(brmctx, force_inactive);
 }
@@ -1371,6 +1379,12 @@ static struct sk_buff *br_multicast_alloc_query(struct net_bridge_mcast *brmctx,
 	return NULL;
 }
 
+static void br_multicast_toggle_enabled(struct net_bridge *br, bool on)
+{
+	br_opt_toggle(br, BROPT_MULTICAST_ENABLED, on);
+	br_multicast_update_active(&br->multicast_ctx);
+}
+
 struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 						    struct br_ip *group)
 {
@@ -1384,7 +1398,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 	if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
 		trace_br_mdb_full(br->dev, group);
 		br_mc_disabled_update(br->dev, false, NULL);
-		br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
+		br_multicast_toggle_enabled(br, false);
 		return ERR_PTR(-E2BIG);
 	}
 
@@ -4452,6 +4466,7 @@ void br_multicast_toggle_one_vlan(struct net_bridge_vlan *vlan, bool on)
 
 		spin_lock_bh(&br->multicast_lock);
 		vlan->priv_flags ^= BR_VLFLAG_MCAST_ENABLED;
+		br_multicast_update_active(&vlan->br_mcast_ctx);
 		spin_unlock_bh(&br->multicast_lock);
 
 		if (on)
@@ -4472,6 +4487,7 @@ void br_multicast_toggle_one_vlan(struct net_bridge_vlan *vlan, bool on)
 			__br_multicast_enable_port_ctx(&vlan->port_mcast_ctx);
 		else
 			__br_multicast_disable_port_ctx(&vlan->port_mcast_ctx);
+		br_multicast_update_active(brmctx);
 		spin_unlock_bh(&br->multicast_lock);
 	}
 }
@@ -4792,7 +4808,8 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val,
 	if (err)
 		goto unlock;
 
-	br_opt_toggle(br, BROPT_MULTICAST_ENABLED, !!val);
+	br_multicast_toggle_enabled(br, !!val);
+
 	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED)) {
 		change_snoopers = true;
 		goto unlock;
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 8/9] net: bridge: mcast: track active state, bridge up/down
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
                   ` (6 preceding siblings ...)
  2025-08-29  8:53 ` [PATCH 7/9] net: bridge: mcast: active state, if snooping is enabled Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29  8:53 ` [PATCH 9/9] net: bridge: mcast: add inactive state assertions Linus Lüssing
  2025-08-29 15:47 ` [PATCH 0/9] net: bridge: reduce multicast checks in fast path Jakub Kicinski
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

This is mainly for switchdev and DSA later: To ensure that we switch
to inactive before destroying a bridge interface. A switchdev/DSA driver
might have allocated resources after we switched to an enabled multicast
active state. This gives switchdev/DSA drivers a chance to free these
resources again when we destroy the bridge (later).

Putting it into the ndo_stop / bridge interface down part instead of the
ndo_uninit / bridge destroy part though for a better semantic match. If
the bridge interface is down / stopped then it is also inactive.

No functional change for the fast/data path.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 include/uapi/linux/if_link.h |  8 ++++----
 net/bridge/br_device.c       |  3 +++
 net/bridge/br_multicast.c    | 26 ++++++++++++++++++++++----
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ef686ea17afe..76d15a07344d 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -746,14 +746,14 @@ enum in6_addr_gen_mode {
  * @IFLA_BR_MCAST_ACTIVE_V4
  *   Bridge IPv4 mcast active state, read only.
  *
- *   1 if *IFLA_BR_MCAST_SNOOPING* is enabled and an IGMP querier is present,
- *   0 otherwise.
+ *   1 if *IFLA_BR_MCAST_SNOOPING* is enabled, an IGMP querier is present
+ *   and the bridge interface is up, 0 otherwise.
  *
  * @IFLA_BR_MCAST_ACTIVE_V6
  *   Bridge IPv6 mcast active state, read only.
  *
- *   1 if *IFLA_BR_MCAST_SNOOPING* is enabled and an MLD querier is present,
- *   0 otherwise.
+ *   1 if *IFLA_BR_MCAST_SNOOPING* is enabled, an MLD querier is present
+ *   and the bridge interface is up, 0 otherwise.
  */
 enum {
 	IFLA_BR_UNSPEC,
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 3cdf1c17108b..efb5d35c2dd4 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -168,7 +168,10 @@ static int br_dev_open(struct net_device *dev)
 	netdev_update_features(dev);
 	netif_start_queue(dev);
 	br_stp_enable_bridge(br);
+
+	spin_lock_bh(&br->multicast_lock);
 	br_multicast_open(br);
+	spin_unlock_bh(&br->multicast_lock);
 
 	if (br_opt_get(br, BROPT_MULTICAST_ENABLED))
 		br_multicast_join_snoopers(br);
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 53720337a1e3..09e23e4d8b74 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1145,6 +1145,7 @@ static void br_ip6_multicast_update_active(struct net_bridge_mcast *brmctx,
  *
  * The multicast active state is set, per protocol family, if:
  *
+ * - the bridge interface is up
  * - multicast snooping is enabled
  * - an IGMP/MLD querier is present
  * - for own IPv6 MLD querier: an IPv6 address is configured on the bridge
@@ -1160,6 +1161,9 @@ static void br_multicast_update_active(struct net_bridge_mcast *brmctx)
 
 	lockdep_assert_held_once(&brmctx->br->multicast_lock);
 
+	if (!netif_running(brmctx->br->dev))
+		force_inactive = true;
+
 	if (!br_opt_get(brmctx->br, BROPT_MULTICAST_ENABLED))
 		force_inactive = true;
 
@@ -4379,6 +4383,9 @@ static void __br_multicast_open(struct net_bridge_mcast *brmctx)
 #if IS_ENABLED(CONFIG_IPV6)
 	__br_multicast_open_query(brmctx->br, &brmctx->ip6_own_query);
 #endif
+
+	/* bridge interface is up, maybe set multicast state to active */
+	br_multicast_update_active(brmctx);
 }
 
 void br_multicast_open(struct net_bridge *br)
@@ -4417,6 +4424,11 @@ static void __br_multicast_stop(struct net_bridge_mcast *brmctx)
 	timer_delete_sync(&brmctx->ip6_other_query.delay_timer);
 	timer_delete_sync(&brmctx->ip6_own_query.timer);
 #endif
+
+	spin_lock_bh(&brmctx->br->multicast_lock);
+	/* bridge interface is down, set multicast state to inactive */
+	br_multicast_update_active(brmctx);
+	spin_unlock_bh(&brmctx->br->multicast_lock);
 }
 
 void br_multicast_update_vlan_mcast_ctx(struct net_bridge_vlan *v, u8 state)
@@ -4469,10 +4481,13 @@ void br_multicast_toggle_one_vlan(struct net_bridge_vlan *vlan, bool on)
 		br_multicast_update_active(&vlan->br_mcast_ctx);
 		spin_unlock_bh(&br->multicast_lock);
 
-		if (on)
+		if (on) {
+			spin_lock_bh(&br->multicast_lock);
 			__br_multicast_open(&vlan->br_mcast_ctx);
-		else
+			spin_unlock_bh(&br->multicast_lock);
+		} else {
 			__br_multicast_stop(&vlan->br_mcast_ctx);
+		}
 	} else {
 		struct net_bridge_mcast *brmctx;
 
@@ -4534,10 +4549,13 @@ int br_multicast_toggle_vlan_snooping(struct net_bridge *br, bool on,
 	br_opt_toggle(br, BROPT_MCAST_VLAN_SNOOPING_ENABLED, on);
 
 	/* disable/enable non-vlan mcast contexts based on vlan snooping */
-	if (on)
+	if (on) {
 		__br_multicast_stop(&br->multicast_ctx);
-	else
+	} else {
+		spin_lock_bh(&br->multicast_lock);
 		__br_multicast_open(&br->multicast_ctx);
+		spin_unlock_bh(&br->multicast_lock);
+	}
 	list_for_each_entry(p, &br->port_list, list) {
 		if (on)
 			br_multicast_disable_port_ctx(&p->multicast_ctx);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 9/9] net: bridge: mcast: add inactive state assertions
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
                   ` (7 preceding siblings ...)
  2025-08-29  8:53 ` [PATCH 8/9] net: bridge: mcast: track active state, bridge up/down Linus Lüssing
@ 2025-08-29  8:53 ` Linus Lüssing
  2025-08-29 15:47 ` [PATCH 0/9] net: bridge: reduce multicast checks in fast path Jakub Kicinski
  9 siblings, 0 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-08-29  8:53 UTC (permalink / raw)
  To: bridge
  Cc: netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Jakub Kicinski,
	Eric Dumazet, David S . Miller, Kuniyuki Iwashima,
	Stanislav Fomichev, Xiao Liang, Linus Lüssing

To avoid packetloss and as it is very hard from a user's perspective to
debug multicast snooping related issues it is even more crucial to properly
switch from an active to an inactive multicast snooping state than the
other way around.

Therefore adding a few kernel warnings if any of our assertions to be in
an inactive state would fail.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
---
 net/bridge/br_multicast.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 09e23e4d8b74..46161e707fc5 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1383,10 +1383,29 @@ static struct sk_buff *br_multicast_alloc_query(struct net_bridge_mcast *brmctx,
 	return NULL;
 }
 
+static void br_ip4_multicast_assert_inactive(struct net_bridge_mcast *brmctx)
+{
+	WARN_ON(br_multicast_snooping_active(brmctx, htons(ETH_P_IP), NULL));
+}
+
+static void br_ip6_multicast_assert_inactive(struct net_bridge_mcast *brmctx)
+{
+	WARN_ON(br_multicast_snooping_active(brmctx, htons(ETH_P_IPV6), NULL));
+}
+
+static void br_multicast_assert_inactive(struct net_bridge_mcast *brmctx)
+{
+	br_ip4_multicast_assert_inactive(brmctx);
+	br_ip6_multicast_assert_inactive(brmctx);
+}
+
 static void br_multicast_toggle_enabled(struct net_bridge *br, bool on)
 {
 	br_opt_toggle(br, BROPT_MULTICAST_ENABLED, on);
 	br_multicast_update_active(&br->multicast_ctx);
+
+	if (!on)
+		br_multicast_assert_inactive(&br->multicast_ctx);
 }
 
 struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
@@ -1846,7 +1865,6 @@ static void br_ip6_multicast_local_router_expired(struct timer_list *t)
 static void br_multicast_querier_expired(struct net_bridge_mcast *brmctx,
 					 struct bridge_mcast_own_query *query)
 {
-	spin_lock(&brmctx->br->multicast_lock);
 	if (!netif_running(brmctx->br->dev) ||
 	    br_multicast_ctx_vlan_global_disabled(brmctx) ||
 	    !br_opt_get(brmctx->br, BROPT_MULTICAST_ENABLED))
@@ -1859,7 +1877,6 @@ static void br_multicast_querier_expired(struct net_bridge_mcast *brmctx,
 	 * if our own querier is disabled, too
 	 */
 	br_multicast_update_active(brmctx);
-	spin_unlock(&brmctx->br->multicast_lock);
 }
 
 static void br_ip4_multicast_querier_expired(struct timer_list *t)
@@ -1867,7 +1884,12 @@ static void br_ip4_multicast_querier_expired(struct timer_list *t)
 	struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t,
 							     ip4_other_query.timer);
 
+	spin_lock(&brmctx->br->multicast_lock);
 	br_multicast_querier_expired(brmctx, &brmctx->ip4_own_query);
+
+	if (!brmctx->multicast_querier)
+		br_ip4_multicast_assert_inactive(brmctx);
+	spin_unlock(&brmctx->br->multicast_lock);
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1876,7 +1898,12 @@ static void br_ip6_multicast_querier_expired(struct timer_list *t)
 	struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t,
 							     ip6_other_query.timer);
 
+	spin_lock(&brmctx->br->multicast_lock);
 	br_multicast_querier_expired(brmctx, &brmctx->ip6_own_query);
+
+	if (!brmctx->multicast_querier)
+		br_ip6_multicast_assert_inactive(brmctx);
+	spin_unlock(&brmctx->br->multicast_lock);
 }
 #endif
 
@@ -4428,6 +4455,7 @@ static void __br_multicast_stop(struct net_bridge_mcast *brmctx)
 	spin_lock_bh(&brmctx->br->multicast_lock);
 	/* bridge interface is down, set multicast state to inactive */
 	br_multicast_update_active(brmctx);
+	br_multicast_assert_inactive(brmctx);
 	spin_unlock_bh(&brmctx->br->multicast_lock);
 }
 
@@ -4479,6 +4507,8 @@ void br_multicast_toggle_one_vlan(struct net_bridge_vlan *vlan, bool on)
 		spin_lock_bh(&br->multicast_lock);
 		vlan->priv_flags ^= BR_VLFLAG_MCAST_ENABLED;
 		br_multicast_update_active(&vlan->br_mcast_ctx);
+		if (!on)
+			br_multicast_assert_inactive(&vlan->br_mcast_ctx);
 		spin_unlock_bh(&br->multicast_lock);
 
 		if (on) {
@@ -4503,6 +4533,8 @@ void br_multicast_toggle_one_vlan(struct net_bridge_vlan *vlan, bool on)
 		else
 			__br_multicast_disable_port_ctx(&vlan->port_mcast_ctx);
 		br_multicast_update_active(brmctx);
+		if (!on)
+			br_multicast_assert_inactive(brmctx);
 		spin_unlock_bh(&br->multicast_lock);
 	}
 }
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] net: bridge: reduce multicast checks in fast path
  2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
                   ` (8 preceding siblings ...)
  2025-08-29  8:53 ` [PATCH 9/9] net: bridge: mcast: add inactive state assertions Linus Lüssing
@ 2025-08-29 15:47 ` Jakub Kicinski
  2025-08-29 16:23   ` Nikolay Aleksandrov
  9 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-08-29 15:47 UTC (permalink / raw)
  To: Linus Lüssing
  Cc: bridge, netdev, linux-kernel, Nikolay Aleksandrov, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Eric Dumazet,
	David S . Miller, Kuniyuki Iwashima, Stanislav Fomichev,
	Xiao Liang

On Fri, 29 Aug 2025 10:53:41 +0200 Linus Lüssing wrote:
> This patchset introduces new state variables to combine and reduce the
> number of checks we would otherwise perform on every multicast packet
> in fast/data path.
>   
> The second reason for introducing these new, internal multicast active
> variables is to later propagate a safety mechanism which was introduced
> in b00589af3b04 ("bridge: disable snooping if there is no querier") to
> switchdev/DSA, too. That is to notify switchdev/DSA if multicast
> snooping can safely be applied without potential packet loss.

Please leave the git-generated diff stat in the cover letter.
Please include tree designation in the subject, per:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html

I'll leave the real review to the experts but this series appears
to make kselftests unhappy:

[  106.423894] WARNING: CPU: 3 PID: 1121 at net/bridge/br_multicast.c:1388 __br_multicast_stop+0xa0/0xc0 [bridge]
[  106.424022] Modules linked in: sch_ingress 8021q act_mirred cls_matchall sch_red dummy bridge stp llc sch_tbf vrf veth
[  106.424144] CPU: 3 UID: 0 PID: 1121 Comm: ip Not tainted 6.17.0-rc2-virtme #1 PREEMPT(voluntary) 
[  106.424235] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  106.424301] RIP: 0010:__br_multicast_stop+0xa0/0xc0 [bridge]
[  106.424371] Code: 89 df e8 f3 fd ff ff 80 bb 2c 01 00 00 00 75 19 80 bb 0c 02 00 00 00 75 1d 48 8b 3b 5b 48 81 c7 4c 03 00 00 e9 b1 c2 0b f8 90 <0f> 0b 90 80 bb 0c 02 00 00 00 74 e3 90 0f 0b 90 48 8b 3b 5b 48 81
[  106.424544] RSP: 0018:ffffb78500283888 EFLAGS: 00010202
[  106.424586] RAX: 0000000000000000 RBX: ffff979907d81af0 RCX: 0000000000000000
[  106.424665] RDX: ffff979907d81c50 RSI: 0000000000000001 RDI: ffff979907d819c0
[  106.424753] RBP: ffff979907d819c0 R08: ffffffffb94f0180 R09: ffffb78500283b40
[  106.424836] R10: ffff97990342d250 R11: ffff979907d81000 R12: ffff979907d819c0
[  106.424913] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000001
[  106.424992] FS:  00007f7e9de78800(0000) GS:ffff979985967000(0000) knlGS:0000000000000000
[  106.425073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.425144] CR2: 0000000000447b60 CR3: 0000000004d3b003 CR4: 0000000000772ef0
[  106.425226] PKRU: 55555554
[  106.425250] Call Trace:
[  106.425278]  <TASK>
[  106.425305]  br_multicast_toggle_vlan_snooping+0x1a9/0x1e0 [bridge]
[  106.425383]  br_boolopt_multi_toggle+0x54/0x90 [bridge]
[  106.425443]  br_changelink+0x4be/0x510 [bridge]
[  106.425510]  ? ns_capable+0x2d/0x60
[  106.425557]  rtnl_newlink+0x73f/0xbc0
[  106.425605]  ? rtnl_setlink+0x2c0/0x2c0
[  106.425633]  rtnetlink_rcv_msg+0x358/0x400
[  106.425676]  ? update_load_avg+0x6f/0x350
[  106.425726]  ? rtnl_calcit.isra.0+0x110/0x110
[  106.425778]  netlink_rcv_skb+0x57/0x100
[  106.425819]  netlink_unicast+0x252/0x380
[  106.425858]  ? __alloc_skb+0xdb/0x190
[  106.425904]  netlink_sendmsg+0x1be/0x3e0
[  106.425945]  ____sys_sendmsg+0x132/0x260
[  106.425984]  ? copy_msghdr_from_user+0x6c/0xa0
[  106.426039]  ___sys_sendmsg+0x87/0xd0
[  106.426083]  ? __handle_mm_fault+0xa41/0xe50
[  106.426144]  __sys_sendmsg+0x71/0xd0
[  106.426184]  do_syscall_64+0xa4/0x260
[  106.426224]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  106.426279] RIP: 0033:0x7f7e9e0451e7

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] net: bridge: reduce multicast checks in fast path
  2025-08-29 15:47 ` [PATCH 0/9] net: bridge: reduce multicast checks in fast path Jakub Kicinski
@ 2025-08-29 16:23   ` Nikolay Aleksandrov
  2025-09-02 20:16     ` Linus Lüssing
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Aleksandrov @ 2025-08-29 16:23 UTC (permalink / raw)
  To: Jakub Kicinski, Linus Lüssing
  Cc: bridge, netdev, linux-kernel, Ido Schimmel, Andrew Lunn,
	Simon Horman, Paolo Abeni, Eric Dumazet, David S . Miller,
	Kuniyuki Iwashima, Stanislav Fomichev, Xiao Liang

On 8/29/25 18:47, Jakub Kicinski wrote:
> On Fri, 29 Aug 2025 10:53:41 +0200 Linus Lüssing wrote:
>> This patchset introduces new state variables to combine and reduce the
>> number of checks we would otherwise perform on every multicast packet
>> in fast/data path.
>>    
>> The second reason for introducing these new, internal multicast active
>> variables is to later propagate a safety mechanism which was introduced
>> in b00589af3b04 ("bridge: disable snooping if there is no querier") to
>> switchdev/DSA, too. That is to notify switchdev/DSA if multicast
>> snooping can safely be applied without potential packet loss.
> 
> Please leave the git-generated diff stat in the cover letter.
> Please include tree designation in the subject, per:
> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
> 
> I'll leave the real review to the experts but this series appears
> to make kselftests unhappy:
> 

just fyi my email wasn't working for 2 days and unfortunately I missed this set
I took a look now on patchwork, I do have comments but it's difficult to reply as
I don't have the emails and have to do it manually to each, so I'd rather wait
for v2.

a few notes for v2:
- please use READ/WRTE_ONCE() for variables that are used without locking
- please make locking symmetric, I saw that br_multicast_open() expects the lock to be already held, while
   __br_multicast_stop() takes it itself
- target net-next
- is the mcast lock really necessary, would atomic ops do for this tracking?
- can you provide the full view somewhere, how would this tracking be used? I fear
   there might still be races.
- please add more details exactly what we save on the fast-path, I know but it'd be nice
   to have it in the commit message as well, all commits just say "reduce checks, save cycles"
   but there are no details what we save

I will try to give more detailed comments in v2.

Thank you,
  Nik


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] net: bridge: reduce multicast checks in fast path
  2025-08-29 16:23   ` Nikolay Aleksandrov
@ 2025-09-02 20:16     ` Linus Lüssing
  2025-09-02 23:00       ` Linus Lüssing
  2025-09-03 10:08       ` Nikolay Aleksandrov
  0 siblings, 2 replies; 16+ messages in thread
From: Linus Lüssing @ 2025-09-02 20:16 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jakub Kicinski, bridge, netdev, linux-kernel, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Eric Dumazet,
	David S . Miller, Kuniyuki Iwashima, Stanislav Fomichev,
	Xiao Liang

Hi Nik, thanks for the suggestions and review again!


On Fri, Aug 29, 2025 at 07:23:24PM +0300, Nikolay Aleksandrov wrote:
> 
> a few notes for v2:
> - please use READ/WRTE_ONCE() for variables that are used without locking

Just to understand the issue, otherwise the data path would assume
an old inactive or active state for a prolonged time after toggling?
Or what would happen?


> - please make locking symmetric, I saw that br_multicast_open() expects the lock to be already held, while
>   __br_multicast_stop() takes it itself

I think that's what I tried before submitting. Initially wanted
to have the locking inside, but then it would deadlock on
br_multicast_toggle()->br_multicast_open()->... as this is the one
place where br_multicast_open() is called with the multicast spinlock
already held.

On the other hand, moving the spinlock out of / around
__br_multicast_stop() would lead to a sleeping-while-atomic bug
when calling timer_delete_sync() in there. And if I were to change
these to a timer_delete() I guess there is no way to do the sync
part after unlocking? There is no equivalent to something like the
flush_workqueue() / drain_workqueue() for workqueues, but for
simple timers instead, right?

I would also love to go for the first approach, taking the
spinlock inside of __br_multicast_open(). But not quite sure how
to best and safely change br_multicast_toggle() then.


> - is the mcast lock really necessary, would atomic ops do for this tracking?

Hm, not sure. We'd be checking multiple variables before toggling
the new brmctx->ip{4,6}_active. As the checked variables can
change from multiple contexts couldn't the following happen then?


Start: ip*_active = false, snooping_enabled = false,
       vlan_snooping_enabled = true, vlan{id:42}->snooping_enabled = false

Thread A)                     Thread B)
--------------------------------------------------------------------------
                              br_multicast_toggle(br, 1, ...)
			      -> loads vlan{id:42}->snooping_enabled: false
--------------------------------------------------------------------------
br_multicast_toggle_one_vlan(vlan{id:42}, true)
-> vlan->priv_flags: set per-vlan-mc-snooping to true
-> br_multicast_update_active()
   checks snooping_enabled: false
   -> keeping vlan's ip*_active at false
--------------------------------------------------------------------------
                              -> sets snooping_enabled: true
                              -> br_multicast_update_active()
			         -> checks vlan{id:42}->snooping_enabled:
				    false (from earlier load above)
                                 -> keeping vlan's ip*_active at false

Result: vlan's ip*_active is still false even though it should be
true now?

.o(Or were netlink and sysfs handlers somehow ensured to never run in
parallel?)


I'm not that versed with atomic's and explicit memory barriers,
could that prevent something like that even if multiple variables
are involved? Only used lockless atomic_t's for atomic_inc()/_dec() so far.
And otherwise used explicit locking.



> - can you provide the full view somewhere, how would this tracking be used? I fear
>   there might still be races.

My original switchdev integration plan so far was roughly still the same
as in the previous pull-request:

https://patchwork.kernel.org/project/netdevbpf/patch/20250522195952.29265-5-linus.luessing@c0d3.blue/

And using it in rtl83xx in OpenWrt like this:
https://github.com/openwrt/openwrt/pull/18780/commits/708bbc4b73bc90cd13839c613e6634aa5faeeace#diff-b5c9a9cc66e207d77fea5d063dca2cce20cf4ae3a28fc0a5d5eebd47a024d5c3

But haven't updated these yet onto this PR, will do that later.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] net: bridge: reduce multicast checks in fast path
  2025-09-02 20:16     ` Linus Lüssing
@ 2025-09-02 23:00       ` Linus Lüssing
  2025-09-03 10:11         ` Nikolay Aleksandrov
  2025-09-03 10:08       ` Nikolay Aleksandrov
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Lüssing @ 2025-09-02 23:00 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jakub Kicinski, bridge, netdev, linux-kernel, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Eric Dumazet,
	David S . Miller, Kuniyuki Iwashima, Stanislav Fomichev,
	Xiao Liang

On Tue, Sep 02, 2025 at 10:16:04PM +0200, Linus Lüssing wrote:
> On the other hand, moving the spinlock out of / around
> __br_multicast_stop() would lead to a sleeping-while-atomic bug
> when calling timer_delete_sync() in there. And if I were to change
> these to a timer_delete() I guess there is no way to do the sync
> part after unlocking? There is no equivalent to something like the
> flush_workqueue() / drain_workqueue() for workqueues, but for
> simple timers instead, right?

I'm wondering if it would be sufficient to use timer_del() on
.ndo_stop->br_dev_stop()->br_multicast_stop().

And use timer_del_sync() only on
.ndo_uninit->br_dev_uninit()-> br_multicast_dev_del()->
br_multicast_ctx_deinit() and
br_vlan_put_master()->br_multicast_ctx_deinit().


So basically only sync / wait for timer callbacks to finish if
their context is about to be free'd, on bridge or VLAN destruction?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] net: bridge: reduce multicast checks in fast path
  2025-09-02 20:16     ` Linus Lüssing
  2025-09-02 23:00       ` Linus Lüssing
@ 2025-09-03 10:08       ` Nikolay Aleksandrov
  1 sibling, 0 replies; 16+ messages in thread
From: Nikolay Aleksandrov @ 2025-09-03 10:08 UTC (permalink / raw)
  To: Linus Lüssing
  Cc: Jakub Kicinski, bridge, netdev, linux-kernel, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Eric Dumazet,
	David S . Miller, Kuniyuki Iwashima, Stanislav Fomichev,
	Xiao Liang

On 9/2/25 23:16, Linus Lüssing wrote:
> Hi Nik, thanks for the suggestions and review again!
> 
> 
> On Fri, Aug 29, 2025 at 07:23:24PM +0300, Nikolay Aleksandrov wrote:
>>
>> a few notes for v2:
>> - please use READ/WRTE_ONCE() for variables that are used without locking
> 
> Just to understand the issue, otherwise the data path would assume
> an old inactive or active state for a prolonged time after toggling?
> Or what would happen?
>
> 

It's more about marking these as used without locking so KCSAN won't flag them and also
to clearly show people that intent.

>> - please make locking symmetric, I saw that br_multicast_open() expects the lock to be already held, while
>>    __br_multicast_stop() takes it itself
> 
> I think that's what I tried before submitting. Initially wanted
> to have the locking inside, but then it would deadlock on
> br_multicast_toggle()->br_multicast_open()->... as this is the one
> place where br_multicast_open() is called with the multicast spinlock
> already held.
> 
> On the other hand, moving the spinlock out of / around
> __br_multicast_stop() would lead to a sleeping-while-atomic bug
> when calling timer_delete_sync() in there. And if I were to change
> these to a timer_delete() I guess there is no way to do the sync
> part after unlocking? There is no equivalent to something like the
> flush_workqueue() / drain_workqueue() for workqueues, but for
> simple timers instead, right?
> 
> I would also love to go for the first approach, taking the
> spinlock inside of __br_multicast_open(). But not quite sure how
> to best and safely change br_multicast_toggle() then.
> 
> 

Well, this is not an easy one to solve, would require more thought and
changes to get the proper locking, but it certainly shouldn't be left
asymmetric - having one take the lock, and expecting that it's already taken
for the other, that is definitely unacceptable. Please spend more time on this
and think about how it can be changed. Right now I don't have the time to dig
in and make a proper proposal, but I'm happy to review and discuss proposed
solutions.

>> - is the mcast lock really necessary, would atomic ops do for this tracking?
> 
> Hm, not sure. We'd be checking multiple variables before toggling
> the new brmctx->ip{4,6}_active. As the checked variables can
> change from multiple contexts couldn't the following happen then?
> 
> 
> Start: ip*_active = false, snooping_enabled = false,
>         vlan_snooping_enabled = true, vlan{id:42}->snooping_enabled = false
> 
> Thread A)                     Thread B)
> --------------------------------------------------------------------------
>                                br_multicast_toggle(br, 1, ...)
> 			      -> loads vlan{id:42}->snooping_enabled: false
> --------------------------------------------------------------------------
> br_multicast_toggle_one_vlan(vlan{id:42}, true)
> -> vlan->priv_flags: set per-vlan-mc-snooping to true
> -> br_multicast_update_active()
>     checks snooping_enabled: false
>     -> keeping vlan's ip*_active at false
> --------------------------------------------------------------------------
>                                -> sets snooping_enabled: true
>                                -> br_multicast_update_active()
> 			         -> checks vlan{id:42}->snooping_enabled:
> 				    false (from earlier load above)
>                                   -> keeping vlan's ip*_active at false
> 
> Result: vlan's ip*_active is still false even though it should be
> true now?
> 
> .o(Or were netlink and sysfs handlers somehow ensured to never run in
> parallel?)
> 

They are, netlink and sysfs are supposed to take the same locks so they
cannot run in parallel changing options.

> 
> I'm not that versed with atomic's and explicit memory barriers,
> could that prevent something like that even if multiple variables
> are involved? Only used lockless atomic_t's for atomic_inc()/_dec() so far.
> And otherwise used explicit locking.
> 
> 
> 
>> - can you provide the full view somewhere, how would this tracking be used? I fear
>>    there might still be races.
> 
> My original switchdev integration plan so far was roughly still the same
> as in the previous pull-request:
> 
> https://patchwork.kernel.org/project/netdevbpf/patch/20250522195952.29265-5-linus.luessing@c0d3.blue/
> 
> And using it in rtl83xx in OpenWrt like this:
> https://github.com/openwrt/openwrt/pull/18780/commits/708bbc4b73bc90cd13839c613e6634aa5faeeace#diff-b5c9a9cc66e207d77fea5d063dca2cce20cf4ae3a28fc0a5d5eebd47a024d5c3
> 
> But haven't updated these yet onto this PR, will do that later.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] net: bridge: reduce multicast checks in fast path
  2025-09-02 23:00       ` Linus Lüssing
@ 2025-09-03 10:11         ` Nikolay Aleksandrov
  0 siblings, 0 replies; 16+ messages in thread
From: Nikolay Aleksandrov @ 2025-09-03 10:11 UTC (permalink / raw)
  To: Linus Lüssing
  Cc: Jakub Kicinski, bridge, netdev, linux-kernel, Ido Schimmel,
	Andrew Lunn, Simon Horman, Paolo Abeni, Eric Dumazet,
	David S . Miller, Kuniyuki Iwashima, Stanislav Fomichev,
	Xiao Liang

On 9/3/25 02:00, Linus Lüssing wrote:
> On Tue, Sep 02, 2025 at 10:16:04PM +0200, Linus Lüssing wrote:
>> On the other hand, moving the spinlock out of / around
>> __br_multicast_stop() would lead to a sleeping-while-atomic bug
>> when calling timer_delete_sync() in there. And if I were to change
>> these to a timer_delete() I guess there is no way to do the sync
>> part after unlocking? There is no equivalent to something like the
>> flush_workqueue() / drain_workqueue() for workqueues, but for
>> simple timers instead, right?
> 
> I'm wondering if it would be sufficient to use timer_del() on
> .ndo_stop->br_dev_stop()->br_multicast_stop().
> 
> And use timer_del_sync() only on
> .ndo_uninit->br_dev_uninit()-> br_multicast_dev_del()->
> br_multicast_ctx_deinit() and
> br_vlan_put_master()->br_multicast_ctx_deinit().
> 
> 
> So basically only sync / wait for timer callbacks to finish if
> their context is about to be free'd, on bridge or VLAN destruction?

You should make sure the state is sane if the callbacks can be executed after
br_multicast_stop(). There are many timers that can affect different aspects,
it might be doable but needs careful inspection and code ordering.


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-09-03 10:11 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-29  8:53 [PATCH 0/9] net: bridge: reduce multicast checks in fast path Linus Lüssing
2025-08-29  8:53 ` [PATCH 1/9] net: bridge: mcast: track active state, IGMP/MLD querier appearance Linus Lüssing
2025-08-29  8:53 ` [PATCH 2/9] net: bridge: mcast: track active state, foreign IGMP/MLD querier disappearance Linus Lüssing
2025-08-29  8:53 ` [PATCH 3/9] net: bridge: mcast: track active state, IPv6 address availability Linus Lüssing
2025-08-29  8:53 ` [PATCH 4/9] net: bridge: mcast: track active state, own MLD querier disappearance Linus Lüssing
2025-08-29  8:53 ` [PATCH 5/9] net: bridge: mcast: export ip{4,6}_active state to netlink Linus Lüssing
2025-08-29  8:53 ` [PATCH 6/9] net: bridge: mcast: use combined active state in fast/data path Linus Lüssing
2025-08-29  8:53 ` [PATCH 7/9] net: bridge: mcast: active state, if snooping is enabled Linus Lüssing
2025-08-29  8:53 ` [PATCH 8/9] net: bridge: mcast: track active state, bridge up/down Linus Lüssing
2025-08-29  8:53 ` [PATCH 9/9] net: bridge: mcast: add inactive state assertions Linus Lüssing
2025-08-29 15:47 ` [PATCH 0/9] net: bridge: reduce multicast checks in fast path Jakub Kicinski
2025-08-29 16:23   ` Nikolay Aleksandrov
2025-09-02 20:16     ` Linus Lüssing
2025-09-02 23:00       ` Linus Lüssing
2025-09-03 10:11         ` Nikolay Aleksandrov
2025-09-03 10:08       ` Nikolay Aleksandrov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).