* [PATCH net] bridge: mcast: Fix a false positive lockdep splat
@ 2026-04-26 13:34 Ido Schimmel
2026-04-28 13:35 ` Simon Horman
2026-04-28 14:10 ` Paolo Abeni
0 siblings, 2 replies; 4+ messages in thread
From: Ido Schimmel @ 2026-04-26 13:34 UTC (permalink / raw)
To: netdev, bridge
Cc: davem, kuba, pabeni, edumazet, razor, horms, herbert,
linus.luessing, Ido Schimmel
Connecting two bridges on the same system [1] can result in a lockdep
splat [2].
The report is a false positive. Multicast queries are built and
transmitted under the bridge multicast lock. When the outgoing port of
one bridge is configured on top of another bridge, the transmit path
re-enters bridge code and acquires the other bridge's multicast lock in
order to snoop the query. Both lock instances share a single lockdep
class, so lockdep flags the nested acquisition as an AA deadlock.
Giving each bridge its own lock class will not solve the problem: the
reverse topology would produce an ABBA splat with the same pair of
classes. It also consumes a lockdep key per bridge.
Instead, fix the problem by deferring the transmission of the queries to
a workqueue. Build the skb and update querier state under the lock as
before, then enqueue the skb on a per-multicast-context queue and
schedule the work.
Flush the work when the multicast context is de-initialized. At this
stage the work cannot be requeued. There is no need to take a reference
on skb->dev since the work cannot outlive the bridge or the bridge port.
Use the high priority workqueue to reduce the delay between the enqueue
time and the transmission time. With default settings (i.e., querier
interval - 255 seconds, query interval - 125 seconds) the extra delay
should not be a problem.
[1]
ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1
ip link add name br0 up type bridge mcast_snooping 1 mcast_querier 1
ip link add link br0 name br0.10 up master br1 type vlan id 10
[2]
============================================
WARNING: possible recursive locking detected
7.0.0-virtme-gb50c64a58a90 #1 Not tainted
--------------------------------------------
ip/339 is trying to acquire lock:
ffff888104f0b480 (&br->multicast_lock){+.-.}-{3:3}, at: br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
but task is already holding lock:
ffff888104f03480 (&br->multicast_lock){+.-.}-{3:3}, at: br_multicast_port_query_expired (net/bridge/br_multicast.c:1904)
[...]
Call Trace:
[...]
br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
br_multicast_ipv6_rcv (net/bridge/br_multicast.c:3988)
br_dev_xmit (net/bridge/br_device.c:98 (discriminator 1))
dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
__dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:131 (discriminator 1))
dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
__dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
br_dev_queue_push_xmit (net/bridge/br_forward.c:60)
__br_multicast_send_query (net/bridge/br_multicast.c:1811 (discriminator 1))
br_multicast_send_query (net/bridge/br_multicast.c:1889)
br_multicast_port_query_expired (./include/linux/spinlock.h:390 net/bridge/br_multicast.c:1914)
call_timer_fn (./arch/x86/include/asm/jump_label.h:37 ./include/trace/events/timer.h:127 kernel/time/timer.c:1749)
[...]
Fixes: eb1d16414339 ("bridge: Add core IGMP snooping support")
Reported-by: syzbot+d7b7f1412c02134efa6d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/000000000000c4c9d405f2643e01@google.com/
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
net/bridge/br_multicast.c | 39 +++++++++++++++++++++++++++++++++++----
net/bridge/br_private.h | 4 ++++
2 files changed, 39 insertions(+), 4 deletions(-)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 881d866d687a..252c46977ed5 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1776,6 +1776,28 @@ static void br_multicast_select_own_querier(struct net_bridge_mcast *brmctx,
#endif
}
+static void br_multicast_port_query_queue_work(struct work_struct *work)
+{
+ struct net_bridge_mcast_port *pmctx;
+ struct sk_buff *skb;
+
+ pmctx = container_of(work, struct net_bridge_mcast_port,
+ query_queue_work);
+ while ((skb = skb_dequeue(&pmctx->query_queue)))
+ NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
+ NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
+}
+
+static void br_multicast_query_queue_work(struct work_struct *work)
+{
+ struct net_bridge_mcast *brmctx;
+ struct sk_buff *skb;
+
+ brmctx = container_of(work, struct net_bridge_mcast, query_queue_work);
+ while ((skb = skb_dequeue(&brmctx->query_queue)))
+ netif_rx(skb);
+}
+
static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
struct net_bridge_mcast_port *pmctx,
struct net_bridge_port_group *pg,
@@ -1804,9 +1826,8 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
skb->dev = pmctx->port->dev;
br_multicast_count(brmctx->br, pmctx->port, skb, igmp_type,
BR_MCAST_DIR_TX);
- NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT,
- dev_net(pmctx->port->dev), NULL, skb, NULL, skb->dev,
- br_dev_queue_push_xmit);
+ skb_queue_tail(&pmctx->query_queue, skb);
+ queue_work(system_highpri_wq, &pmctx->query_queue_work);
if (over_lmqt && with_srcs && sflag) {
over_lmqt = false;
@@ -1816,7 +1837,8 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
br_multicast_select_own_querier(brmctx, group, skb);
br_multicast_count(brmctx->br, NULL, skb, igmp_type,
BR_MCAST_DIR_RX);
- netif_rx(skb);
+ skb_queue_tail(&brmctx->query_queue, skb);
+ queue_work(system_highpri_wq, &brmctx->query_queue_work);
}
}
@@ -1999,6 +2021,10 @@ void br_multicast_port_ctx_init(struct net_bridge_port *port,
pmctx->port = port;
pmctx->vlan = vlan;
pmctx->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
+
+ skb_queue_head_init(&pmctx->query_queue);
+ INIT_WORK(&pmctx->query_queue_work, br_multicast_port_query_queue_work);
+
timer_setup(&pmctx->ip4_mc_router_timer,
br_ip4_multicast_router_expired, 0);
timer_setup(&pmctx->ip4_own_query.timer,
@@ -2038,6 +2064,7 @@ void br_multicast_port_ctx_deinit(struct net_bridge_mcast_port *pmctx)
del |= br_ip4_multicast_rport_del(pmctx);
br_multicast_rport_del_notify(pmctx, del);
spin_unlock_bh(&br->multicast_lock);
+ flush_work(&pmctx->query_queue_work);
}
int br_multicast_add_port(struct net_bridge_port *port)
@@ -4111,6 +4138,9 @@ void br_multicast_ctx_init(struct net_bridge *br,
seqcount_spinlock_init(&brmctx->ip6_querier.seq, &br->multicast_lock);
#endif
+ skb_queue_head_init(&brmctx->query_queue);
+ INIT_WORK(&brmctx->query_queue_work, br_multicast_query_queue_work);
+
timer_setup(&brmctx->ip4_mc_router_timer,
br_ip4_multicast_local_router_expired, 0);
timer_setup(&brmctx->ip4_other_query.timer,
@@ -4134,6 +4164,7 @@ void br_multicast_ctx_init(struct net_bridge *br,
void br_multicast_ctx_deinit(struct net_bridge_mcast *brmctx)
{
__br_multicast_stop(brmctx);
+ flush_work(&brmctx->query_queue_work);
}
void br_multicast_init(struct net_bridge *br)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 361a9b84451e..2695b9128705 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -131,6 +131,8 @@ struct net_bridge_mcast_port {
unsigned char multicast_router;
u32 mdb_n_entries;
u32 mdb_max_entries;
+ struct sk_buff_head query_queue;
+ struct work_struct query_queue_work;
#endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
};
@@ -167,6 +169,8 @@ struct net_bridge_mcast {
struct bridge_mcast_own_query ip6_own_query;
struct bridge_mcast_querier ip6_querier;
#endif /* IS_ENABLED(CONFIG_IPV6) */
+ struct sk_buff_head query_queue;
+ struct work_struct query_queue_work;
#endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
};
--
2.53.0
* Re: [PATCH net] bridge: mcast: Fix a false positive lockdep splat
2026-04-26 13:34 [PATCH net] bridge: mcast: Fix a false positive lockdep splat Ido Schimmel
@ 2026-04-28 13:35 ` Simon Horman
2026-04-28 14:10 ` Paolo Abeni
1 sibling, 0 replies; 4+ messages in thread
From: Simon Horman @ 2026-04-28 13:35 UTC (permalink / raw)
To: idosch
Cc: 'Simon Horman', netdev, bridge, davem, kuba, pabeni,
edumazet, razor, herbert, linus.luessing
From: 'Simon Horman' <horms@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
bridge: mcast: Fix a false positive lockdep splat
This commit defers the transmission of multicast queries to a high priority
workqueue to avoid a false positive lockdep splat caused by nested bridge
multicast lock acquisitions. It queues skbs to a per-context query_queue
and flushes them asynchronously.
Closes: https://lore.kernel.org/netdev/000000000000c4c9d405f2643e01@google.com/
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 881d866d687a..252c46977ed5 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
[ ... ]
> @@ -1804,9 +1826,8 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
> skb->dev = pmctx->port->dev;
> br_multicast_count(brmctx->br, pmctx->port, skb, igmp_type,
> BR_MCAST_DIR_TX);
> - NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT,
> - dev_net(pmctx->port->dev), NULL, skb, NULL, skb->dev,
> - br_dev_queue_push_xmit);
> + skb_queue_tail(&pmctx->query_queue, skb);
> + queue_work(system_highpri_wq, &pmctx->query_queue_work);
Can this code cause an out-of-memory condition if flooded with IGMP/MLD
Leave packets?
In the IGMP data path, br_multicast_leave_group() is triggered for each
received Leave packet. If the bridge is the active querier, it directly calls
__br_multicast_send_query() for each packet, which allocates a new query skb
and queues it here using skb_queue_tail().
Because the work item now absorbs the TX networking overhead, could softirq
RX packet processing outpace the workqueue's ability to dequeue the skbs?
Since skb_queue_tail() does not impose any limits on the queue length, it
looks like this might result in an unbounded accumulation of skbs in the
query queue, potentially bypassing standard qdisc backpressure mechanisms and
exhausting system memory.
>
> if (over_lmqt && with_srcs && sflag) {
> over_lmqt = false;
> @@ -1816,7 +1837,8 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
> br_multicast_select_own_querier(brmctx, group, skb);
> br_multicast_count(brmctx->br, NULL, skb, igmp_type,
> BR_MCAST_DIR_RX);
> - netif_rx(skb);
> + skb_queue_tail(&brmctx->query_queue, skb);
> + queue_work(system_highpri_wq, &brmctx->query_queue_work);
> }
> }
* Re: [PATCH net] bridge: mcast: Fix a false positive lockdep splat
2026-04-26 13:34 [PATCH net] bridge: mcast: Fix a false positive lockdep splat Ido Schimmel
2026-04-28 13:35 ` Simon Horman
@ 2026-04-28 14:10 ` Paolo Abeni
2026-04-29 7:32 ` Nikolay Aleksandrov
1 sibling, 1 reply; 4+ messages in thread
From: Paolo Abeni @ 2026-04-28 14:10 UTC (permalink / raw)
To: Ido Schimmel, netdev, bridge
Cc: davem, kuba, edumazet, razor, horms, herbert, linus.luessing
On 4/26/26 3:34 PM, Ido Schimmel wrote:
> Connecting two bridges on the same system [1] can result in a lockdep
> splat [2].
>
> The report is a false positive. Multicast queries are built and
> transmitted under the bridge multicast lock. When the outgoing port of
> one bridge is configured on top of another bridge, the transmit path
> re-enters bridge code and acquires the other bridge's multicast lock in
> order to snoop the query. Both lock instances share a single lockdep
> class, so lockdep flags the nested acquisition as an AA deadlock.
>
> Giving each bridge its own lock class will not solve the problem: the
> reverse topology would produce an ABBA splat with the same pair of
> classes. It also consumes a lockdep key per bridge.
>
> Instead, fix the problem by deferring the transmission of the queries to
> a workqueue. Build the skb and update querier state under the lock as
> before, then enqueue the skb on a per multicast context queue and
> schedule the work.
I must admit that introducing an additional WQ to fix a false positive
feels a bit overkill to me - even if I can't think of a better solution
off the top of my head.
> Flush the work when the multicast context is de-initialized. At this
> stage the work cannot be requeued. There is no need to take a reference
> on skb->dev since the work cannot outlive the bridge or the bridge port.
>
> Use the high priority workqueue to reduce the delay between the enqueue
> time and the transmission time. With default settings (i.e., querier
> interval - 255 seconds, query interval - 125 seconds) the extra delay
> should not be a problem.
>
> [1]
> ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1
> ip link add name br0 up type bridge mcast_snooping 1 mcast_querier 1
> ip link add link br0 name br0.10 up master br1 type vlan id 10
>
> [2]
> ============================================
> WARNING: possible recursive locking detected
> 7.0.0-virtme-gb50c64a58a90 #1 Not tainted
> --------------------------------------------
checkpatch reports that the above separator may break tools. Possibly
just remove it from the commit message.
> ip/339 is trying to acquire lock:
> ffff888104f0b480 (&br->multicast_lock){+.-.}-{3:3}, at: br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
>
> but task is already holding lock:
> ffff888104f03480 (&br->multicast_lock){+.-.}-{3:3}, at: br_multicast_port_query_expired (net/bridge/br_multicast.c:1904)
>
> [...]
>
> Call Trace:
> [...]
> br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
> br_multicast_ipv6_rcv (net/bridge/br_multicast.c:3988)
> br_dev_xmit (net/bridge/br_device.c:98 (discriminator 1))
> dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
> __dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
> vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:131 (discriminator 1))
> dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
> __dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
> br_dev_queue_push_xmit (net/bridge/br_forward.c:60)
> __br_multicast_send_query (net/bridge/br_multicast.c:1811 (discriminator 1))
> br_multicast_send_query (net/bridge/br_multicast.c:1889)
> br_multicast_port_query_expired (./include/linux/spinlock.h:390 net/bridge/br_multicast.c:1914)
> call_timer_fn (./arch/x86/include/asm/jump_label.h:37 ./include/trace/events/timer.h:127 kernel/time/timer.c:1749)
> [...]
>
> Fixes: eb1d16414339 ("bridge: Add core IGMP snooping support")
> Reported-by: syzbot+d7b7f1412c02134efa6d@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/netdev/000000000000c4c9d405f2643e01@google.com/
> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> net/bridge/br_multicast.c | 39 +++++++++++++++++++++++++++++++++++----
> net/bridge/br_private.h | 4 ++++
> 2 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 881d866d687a..252c46977ed5 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -1776,6 +1776,28 @@ static void br_multicast_select_own_querier(struct net_bridge_mcast *brmctx,
> #endif
> }
>
> +static void br_multicast_port_query_queue_work(struct work_struct *work)
> +{
> + struct net_bridge_mcast_port *pmctx;
> + struct sk_buff *skb;
> +
> + pmctx = container_of(work, struct net_bridge_mcast_port,
> + query_queue_work);
> + while ((skb = skb_dequeue(&pmctx->query_queue)))
> + NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
> + NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
> +}
> +
> +static void br_multicast_query_queue_work(struct work_struct *work)
> +{
> + struct net_bridge_mcast *brmctx;
> + struct sk_buff *skb;
> +
> + brmctx = container_of(work, struct net_bridge_mcast, query_queue_work);
> + while ((skb = skb_dequeue(&brmctx->query_queue)))
> + netif_rx(skb);
> +}
> +
> static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
> struct net_bridge_mcast_port *pmctx,
> struct net_bridge_port_group *pg,
> @@ -1804,9 +1826,8 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
> skb->dev = pmctx->port->dev;
> br_multicast_count(brmctx->br, pmctx->port, skb, igmp_type,
> BR_MCAST_DIR_TX);
> - NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT,
> - dev_net(pmctx->port->dev), NULL, skb, NULL, skb->dev,
> - br_dev_queue_push_xmit);
> + skb_queue_tail(&pmctx->query_queue, skb);
> + queue_work(system_highpri_wq, &pmctx->query_queue_work);
Also, the AI-reported concern about the unbounded queue length looks
relevant. Usually the RX path is slower than TX, but e.g. asymmetric
filtering rules could reverse the scenario.
/P
* Re: [PATCH net] bridge: mcast: Fix a false positive lockdep splat
2026-04-28 14:10 ` Paolo Abeni
@ 2026-04-29 7:32 ` Nikolay Aleksandrov
0 siblings, 0 replies; 4+ messages in thread
From: Nikolay Aleksandrov @ 2026-04-29 7:32 UTC (permalink / raw)
To: Paolo Abeni, Ido Schimmel, netdev, bridge
Cc: davem, kuba, edumazet, horms, herbert, linus.luessing
On 28/04/2026 17:10, Paolo Abeni wrote:
> On 4/26/26 3:34 PM, Ido Schimmel wrote:
>> Connecting two bridges on the same system [1] can result in a lockdep
>> splat [2].
>>
>> The report is a false positive. Multicast queries are built and
>> transmitted under the bridge multicast lock. When the outgoing port of
>> one bridge is configured on top of another bridge, the transmit path
>> re-enters bridge code and acquires the other bridge's multicast lock in
>> order to snoop the query. Both lock instances share a single lockdep
>> class, so lockdep flags the nested acquisition as an AA deadlock.
>>
>> Giving each bridge its own lock class will not solve the problem: the
>> reverse topology would produce an ABBA splat with the same pair of
>> classes. It also consumes a lockdep key per bridge.
>>
>> Instead, fix the problem by deferring the transmission of the queries to
>> a workqueue. Build the skb and update querier state under the lock as
>> before, then enqueue the skb on a per multicast context queue and
>> schedule the work.
>
> I must admit that introducing an additional WQ to fix a false positive
> feels a bit overkill to me - even if I can't think of a better solution
> on top of my head.
>
I don't have a simpler way to fix it either, and we've had multiple reports
of this false positive in the past. It's just another work item on the system
wq, and if Ido limits the queue, IMO it should be fine.
Cheers,
Nik