* [PATCH net v2] bridge: mcast: Fix a false positive lockdep splat
@ 2026-04-30 16:26 Ido Schimmel
From: Ido Schimmel @ 2026-04-30 16:26 UTC (permalink / raw)
To: netdev, bridge
Cc: davem, kuba, pabeni, edumazet, razor, horms, herbert,
linus.luessing, petrm, Ido Schimmel
Connecting two bridges on the same system [1] can result in a lockdep
splat [2].

The report is a false positive. Multicast queries are built and
transmitted under the bridge multicast lock. When the outgoing port of
one bridge is configured on top of another bridge, the transmit path
re-enters bridge code and acquires the other bridge's multicast lock in
order to snoop the query. Both lock instances share a single lockdep
class, so lockdep flags the nested acquisition as an AA deadlock.

Giving each bridge its own lock class would not solve the problem: the
reverse topology would produce an ABBA splat with the same pair of
classes. It would also consume a lockdep key per bridge.

Instead, fix the problem by deferring the transmission of the queries to
a workqueue. Build the skb and update the querier state under the lock
as before, then enqueue the skb on a per-multicast-context queue and
schedule the work.

Flush the work when the multicast context is de-initialized. At that
stage the work cannot be requeued. There is no need to take a reference
on skb->dev since the work cannot outlive the bridge or the bridge port.

Use the high-priority workqueue to reduce the delay between enqueue and
transmission. With default settings (querier interval of 255 seconds,
query interval of 125 seconds) the extra delay should not be a problem.

Avoid the unlikely case of the queue growing endlessly by limiting it to
1,000 skbs, chosen simply because it is the default Tx queue length.
[1]
ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1
ip link add name br0 up type bridge mcast_snooping 1 mcast_querier 1
ip link add link br0 name br0.10 up master br1 type vlan id 10

[2]
WARNING: possible recursive locking detected
7.0.0-virtme-gb50c64a58a90 #1 Not tainted
[...]
ip/339 is trying to acquire lock:
ffff888104f0b480 (&br->multicast_lock){+.-.}-{3:3}, at: br_ip6_multicast_query (net/bridge/br_multicast.c:3584)

but task is already holding lock:
ffff888104f03480 (&br->multicast_lock){+.-.}-{3:3}, at: br_multicast_port_query_expired (net/bridge/br_multicast.c:1904)
[...]

Call Trace:
[...]
br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
br_multicast_ipv6_rcv (net/bridge/br_multicast.c:3988)
br_dev_xmit (net/bridge/br_device.c:98 (discriminator 1))
dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
__dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:131 (discriminator 1))
dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
__dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
br_dev_queue_push_xmit (net/bridge/br_forward.c:60)
__br_multicast_send_query (net/bridge/br_multicast.c:1811 (discriminator 1))
br_multicast_send_query (net/bridge/br_multicast.c:1889)
br_multicast_port_query_expired (./include/linux/spinlock.h:390 net/bridge/br_multicast.c:1914)
call_timer_fn (./arch/x86/include/asm/jump_label.h:37 ./include/trace/events/timer.h:127 kernel/time/timer.c:1749)
[...]

Fixes: eb1d16414339 ("bridge: Add core IGMP snooping support")
Reported-by: syzbot+d7b7f1412c02134efa6d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/000000000000c4c9d405f2643e01@google.com/
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
v2:
- Limit the queue to 1,000 skbs.
- Edit the trace to avoid checkpatch errors.
v1: https://lore.kernel.org/netdev/20260426133435.207006-1-idosch@nvidia.com/
---
net/bridge/br_multicast.c | 47 +++++++++++++++++++++++++++++++++++----
net/bridge/br_private.h | 4 ++++
2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 881d866d687a..e9f5fe01ff95 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1776,6 +1776,30 @@ static void br_multicast_select_own_querier(struct net_bridge_mcast *brmctx,
#endif
}
+static void br_multicast_port_query_queue_work(struct work_struct *work)
+{
+ struct net_bridge_mcast_port *pmctx;
+ struct sk_buff *skb;
+
+ pmctx = container_of(work, struct net_bridge_mcast_port,
+ query_queue_work);
+ while ((skb = skb_dequeue(&pmctx->query_queue)))
+ NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
+ NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
+}
+
+static void br_multicast_query_queue_work(struct work_struct *work)
+{
+ struct net_bridge_mcast *brmctx;
+ struct sk_buff *skb;
+
+ brmctx = container_of(work, struct net_bridge_mcast, query_queue_work);
+ while ((skb = skb_dequeue(&brmctx->query_queue)))
+ netif_rx(skb);
+}
+
+#define BR_MULTICAST_QUERY_QUEUE_LEN_MAX 1000
+
static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
struct net_bridge_mcast_port *pmctx,
struct net_bridge_port_group *pg,
@@ -1785,6 +1809,7 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
u8 sflag,
bool *need_rexmit)
{
+ struct sk_buff_head *queue;
bool over_lmqt = !!sflag;
struct sk_buff *skb;
u8 igmp_type;
@@ -1793,7 +1818,12 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
!br_multicast_ctx_matches_vlan_snooping(brmctx))
return;
+ queue = pmctx ? &pmctx->query_queue : &brmctx->query_queue;
+
again_under_lmqt:
+ if (skb_queue_len_lockless(queue) >= BR_MULTICAST_QUERY_QUEUE_LEN_MAX)
+ return;
+
skb = br_multicast_alloc_query(brmctx, pmctx, pg, ip_dst, group,
with_srcs, over_lmqt, sflag, &igmp_type,
need_rexmit);
@@ -1804,9 +1834,8 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
skb->dev = pmctx->port->dev;
br_multicast_count(brmctx->br, pmctx->port, skb, igmp_type,
BR_MCAST_DIR_TX);
- NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT,
- dev_net(pmctx->port->dev), NULL, skb, NULL, skb->dev,
- br_dev_queue_push_xmit);
+ skb_queue_tail(queue, skb);
+ queue_work(system_highpri_wq, &pmctx->query_queue_work);
if (over_lmqt && with_srcs && sflag) {
over_lmqt = false;
@@ -1816,7 +1845,8 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
br_multicast_select_own_querier(brmctx, group, skb);
br_multicast_count(brmctx->br, NULL, skb, igmp_type,
BR_MCAST_DIR_RX);
- netif_rx(skb);
+ skb_queue_tail(queue, skb);
+ queue_work(system_highpri_wq, &brmctx->query_queue_work);
}
}
@@ -1999,6 +2029,10 @@ void br_multicast_port_ctx_init(struct net_bridge_port *port,
pmctx->port = port;
pmctx->vlan = vlan;
pmctx->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
+
+ skb_queue_head_init(&pmctx->query_queue);
+ INIT_WORK(&pmctx->query_queue_work, br_multicast_port_query_queue_work);
+
timer_setup(&pmctx->ip4_mc_router_timer,
br_ip4_multicast_router_expired, 0);
timer_setup(&pmctx->ip4_own_query.timer,
@@ -2038,6 +2072,7 @@ void br_multicast_port_ctx_deinit(struct net_bridge_mcast_port *pmctx)
del |= br_ip4_multicast_rport_del(pmctx);
br_multicast_rport_del_notify(pmctx, del);
spin_unlock_bh(&br->multicast_lock);
+ flush_work(&pmctx->query_queue_work);
}
int br_multicast_add_port(struct net_bridge_port *port)
@@ -4111,6 +4146,9 @@ void br_multicast_ctx_init(struct net_bridge *br,
seqcount_spinlock_init(&brmctx->ip6_querier.seq, &br->multicast_lock);
#endif
+ skb_queue_head_init(&brmctx->query_queue);
+ INIT_WORK(&brmctx->query_queue_work, br_multicast_query_queue_work);
+
timer_setup(&brmctx->ip4_mc_router_timer,
br_ip4_multicast_local_router_expired, 0);
timer_setup(&brmctx->ip4_other_query.timer,
@@ -4134,6 +4172,7 @@ void br_multicast_ctx_init(struct net_bridge *br,
void br_multicast_ctx_deinit(struct net_bridge_mcast *brmctx)
{
__br_multicast_stop(brmctx);
+ flush_work(&brmctx->query_queue_work);
}
void br_multicast_init(struct net_bridge *br)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index bed1b1d9b282..31e317a3529c 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -131,6 +131,8 @@ struct net_bridge_mcast_port {
unsigned char multicast_router;
u32 mdb_n_entries;
u32 mdb_max_entries;
+ struct sk_buff_head query_queue;
+ struct work_struct query_queue_work;
#endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
};
@@ -167,6 +169,8 @@ struct net_bridge_mcast {
struct bridge_mcast_own_query ip6_own_query;
struct bridge_mcast_querier ip6_querier;
#endif /* IS_ENABLED(CONFIG_IPV6) */
+ struct sk_buff_head query_queue;
+ struct work_struct query_queue_work;
#endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
};
--
2.53.0
* Re: [PATCH net v2] bridge: mcast: Fix a false positive lockdep splat
From: Eric Dumazet @ 2026-04-30 17:10 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, bridge, davem, kuba, pabeni, razor, horms, herbert,
linus.luessing, petrm
On Thu, Apr 30, 2026 at 9:27 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> [...]
>
> +static void br_multicast_port_query_queue_work(struct work_struct *work)
> +{
> + struct net_bridge_mcast_port *pmctx;
> + struct sk_buff *skb;
> +
> + pmctx = container_of(work, struct net_bridge_mcast_port,
> + query_queue_work);
> + while ((skb = skb_dequeue(&pmctx->query_queue)))
> + NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
> + NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
> +}
> +
> +static void br_multicast_query_queue_work(struct work_struct *work)
> +{
> + struct net_bridge_mcast *brmctx;
> + struct sk_buff *skb;
> +
> + brmctx = container_of(work, struct net_bridge_mcast, query_queue_work);
> + while ((skb = skb_dequeue(&brmctx->query_queue)))
> + netif_rx(skb);
> +}
These two functions could loop forever under a flood.

Perhaps use skb_queue_splice_init() to limit each round to the ~1,000
skb limit you mentioned in the changelog, and return after the spliced
list has been processed.

Bonus: no more spinlock acquisition for each skb_dequeue().
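[For reference, a rough sketch of what this suggestion could look like
for the port variant. This is untested and only illustrates the shape of
the change; it reuses the patch's pmctx->query_queue and relies on the
enqueue-side BR_MULTICAST_QUERY_QUEUE_LEN_MAX check to bound each batch:]

```c
static void br_multicast_port_query_queue_work(struct work_struct *work)
{
	struct net_bridge_mcast_port *pmctx;
	struct sk_buff_head local;
	struct sk_buff *skb;
	unsigned long flags;

	pmctx = container_of(work, struct net_bridge_mcast_port,
			     query_queue_work);

	__skb_queue_head_init(&local);

	/* One lock acquisition moves the whole backlog (bounded at
	 * enqueue time) onto a local list; skb_queue_splice_init() is
	 * the unlocked helper, so take the queue lock around it.
	 */
	spin_lock_irqsave(&pmctx->query_queue.lock, flags);
	skb_queue_splice_init(&pmctx->query_queue, &local);
	spin_unlock_irqrestore(&pmctx->query_queue.lock, flags);

	/* Process only this batch and return; skbs queued after the
	 * splice are handled by the next work invocation, which the
	 * enqueuer schedules.
	 */
	while ((skb = __skb_dequeue(&local)))
		NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
			NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
}
```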