netdev.vger.kernel.org archive mirror
* Re: [PATCH net v2] net: bridge: fix soft lockup in br_multicast_query_expired()
  2025-08-12  9:18 [PATCH net v2] net: bridge: fix soft lockup in br_multicast_query_expired() Wang Liang
@ 2025-08-12  6:10 ` Ido Schimmel
  2025-08-13  1:09   ` Wang Liang
  0 siblings, 1 reply; 3+ messages in thread
From: Ido Schimmel @ 2025-08-12  6:10 UTC (permalink / raw)
  To: Wang Liang
  Cc: razor, davem, edumazet, kuba, pabeni, horms, bridge, netdev,
	linux-kernel, yuehaibing, zhangchangzhong

On Tue, Aug 12, 2025 at 05:18:18PM +0800, Wang Liang wrote:
> When multicast_query_interval is set to a large value, the local variable
> 'time' in br_multicast_send_query() may overflow. If 'time' ends up smaller
> than jiffies, the timer expires immediately and mod_timer() is called
> again, which creates a loop and may trigger the following soft lockup.
> 
>   watchdog: BUG: soft lockup - CPU#1 stuck for 221s! [rb_consumer:66]
>   CPU: 1 UID: 0 PID: 66 Comm: rb_consumer Not tainted 6.16.0+ #259 PREEMPT(none)
>   Call Trace:
>    <IRQ>
>    __netdev_alloc_skb+0x2e/0x3a0
>    br_ip6_multicast_alloc_query+0x212/0x1b70
>    __br_multicast_send_query+0x376/0xac0
>    br_multicast_send_query+0x299/0x510
>    br_multicast_query_expired.constprop.0+0x16d/0x1b0
>    call_timer_fn+0x3b/0x2a0
>    __run_timers+0x619/0x950
>    run_timer_softirq+0x11c/0x220
>    handle_softirqs+0x18e/0x560
>    __irq_exit_rcu+0x158/0x1a0
>    sysvec_apic_timer_interrupt+0x76/0x90
>    </IRQ>
> 
> This issue can be reproduced with:
>   ip link add br0 type bridge
>   echo 1 > /sys/class/net/br0/bridge/multicast_querier
>   echo 0xffffffffffffffff > /sys/class/net/br0/bridge/multicast_query_interval
>   ip link set dev br0 up
> 
> The multicast_startup_query_interval can also cause this issue. Similar to
> the commit 99b40610956a("net: bridge: mcast: add and enforce query interval
                         ^ missing space

> minimum"), add a check for the query interval maximum to fix this issue.
> 
> Link: https://lore.kernel.org/netdev/20250806094941.1285944-1-wangliang74@huawei.com/
> Fixes: 7e4df51eb35d ("bridge: netlink: add support for igmp's intervals")

Probably doesn't matter in practice given how old both commits are, but
I think you should blame d902eee43f19 ("bridge: Add multicast
count/interval sysfs entries") instead. The commit message also uses the
sysfs path and not the netlink one.

> Suggested-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Wang Liang <wangliang74@huawei.com>

Code looks fine to me.

* [PATCH net v2] net: bridge: fix soft lockup in br_multicast_query_expired()
@ 2025-08-12  9:18 Wang Liang
  2025-08-12  6:10 ` Ido Schimmel
  0 siblings, 1 reply; 3+ messages in thread
From: Wang Liang @ 2025-08-12  9:18 UTC (permalink / raw)
  To: razor, idosch, davem, edumazet, kuba, pabeni, horms
  Cc: bridge, netdev, linux-kernel, yuehaibing, zhangchangzhong,
	wangliang74

When multicast_query_interval is set to a large value, the local variable
'time' in br_multicast_send_query() may overflow. If 'time' ends up smaller
than jiffies, the timer expires immediately and mod_timer() is called
again, which creates a loop and may trigger the following soft lockup.

  watchdog: BUG: soft lockup - CPU#1 stuck for 221s! [rb_consumer:66]
  CPU: 1 UID: 0 PID: 66 Comm: rb_consumer Not tainted 6.16.0+ #259 PREEMPT(none)
  Call Trace:
   <IRQ>
   __netdev_alloc_skb+0x2e/0x3a0
   br_ip6_multicast_alloc_query+0x212/0x1b70
   __br_multicast_send_query+0x376/0xac0
   br_multicast_send_query+0x299/0x510
   br_multicast_query_expired.constprop.0+0x16d/0x1b0
   call_timer_fn+0x3b/0x2a0
   __run_timers+0x619/0x950
   run_timer_softirq+0x11c/0x220
   handle_softirqs+0x18e/0x560
   __irq_exit_rcu+0x158/0x1a0
   sysvec_apic_timer_interrupt+0x76/0x90
   </IRQ>
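
The wraparound described above can be modelled in user space. The sketch
below is only an illustration, assuming the stored interval ends up larger
than LONG_MAX jiffies; the helper names are hypothetical and are not the
kernel's own macros or functions.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * User-space model of a jiffies-style "a is before b" comparison: the
 * unsigned difference is reinterpreted as signed, so any distance larger
 * than LONG_MAX reads as "in the past". The unsigned-to-signed conversion
 * is implementation-defined in ISO C but wraps on mainstream compilers.
 * Hypothetical helper, not the kernel macro itself.
 */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Model of arming a timer at now + interval with wrapping unsigned
 * arithmetic. Returns true if the computed expiry already looks expired.
 */
static bool sketch_expires_immediately(unsigned long now,
				       unsigned long interval)
{
	unsigned long expires = now + interval; /* wraps on overflow */

	return sketch_time_before(expires, now);
}
```

With now = 1000, an interval of 125000 yields an expiry in the future, while
an interval of ULONG_MAX wraps to 999 and looks already expired. A handler
that re-arms with the same interval would then fire again right away.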

This issue can be reproduced with:
  ip link add br0 type bridge
  echo 1 > /sys/class/net/br0/bridge/multicast_querier
  echo 0xffffffffffffffff > /sys/class/net/br0/bridge/multicast_query_interval
  ip link set dev br0 up

The multicast_startup_query_interval can also cause this issue. Similar to
the commit 99b40610956a("net: bridge: mcast: add and enforce query interval
minimum"), add a check for the query interval maximum to fix this issue.
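
As a rough user-space illustration of the bounds check (not the kernel code
itself, which is in the diff below), the clamping can be sketched as follows.
The SKETCH_* names and the tick rate of 1000 are assumptions made for this
example only.

```c
/* Assumed tick rate for this user-space model; real kernels configure HZ. */
#define SKETCH_HZ 1000UL

/* Bounds in ticks: 1 second minimum, 24 hours maximum. */
#define SKETCH_INTVL_MIN (1UL * SKETCH_HZ)
#define SKETCH_INTVL_MAX (24UL * 60UL * 60UL * SKETCH_HZ)

/* Clamp a requested interval (in ticks) into [MIN, MAX]. */
static unsigned long sketch_clamp_query_intvl(unsigned long intvl_jiffies)
{
	if (intvl_jiffies < SKETCH_INTVL_MIN)
		return SKETCH_INTVL_MIN;
	if (intvl_jiffies > SKETCH_INTVL_MAX)
		return SKETCH_INTVL_MAX;
	return intvl_jiffies;
}
```

With the upper bound in place, the largest interval that can be stored stays
far below LONG_MAX, so adding it to the current tick count cannot produce an
expiry that looks like it is in the past.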

Link: https://lore.kernel.org/netdev/20250806094941.1285944-1-wangliang74@huawei.com/
Fixes: 7e4df51eb35d ("bridge: netlink: add support for igmp's intervals")
Suggested-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
---
 net/bridge/br_multicast.c | 16 ++++++++++++++++
 net/bridge/br_private.h   |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 1377f31b719c..8ce145938b02 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -4818,6 +4818,14 @@ void br_multicast_set_query_intvl(struct net_bridge_mcast *brmctx,
 		intvl_jiffies = BR_MULTICAST_QUERY_INTVL_MIN;
 	}
 
+	if (intvl_jiffies > BR_MULTICAST_QUERY_INTVL_MAX) {
+		br_info(brmctx->br,
+			"trying to set multicast query interval above maximum, setting to %lu (%ums)\n",
+			jiffies_to_clock_t(BR_MULTICAST_QUERY_INTVL_MAX),
+			jiffies_to_msecs(BR_MULTICAST_QUERY_INTVL_MAX));
+		intvl_jiffies = BR_MULTICAST_QUERY_INTVL_MAX;
+	}
+
 	brmctx->multicast_query_interval = intvl_jiffies;
 }
 
@@ -4834,6 +4842,14 @@ void br_multicast_set_startup_query_intvl(struct net_bridge_mcast *brmctx,
 		intvl_jiffies = BR_MULTICAST_STARTUP_QUERY_INTVL_MIN;
 	}
 
+	if (intvl_jiffies > BR_MULTICAST_STARTUP_QUERY_INTVL_MAX) {
+		br_info(brmctx->br,
+			"trying to set multicast startup query interval above maximum, setting to %lu (%ums)\n",
+			jiffies_to_clock_t(BR_MULTICAST_STARTUP_QUERY_INTVL_MAX),
+			jiffies_to_msecs(BR_MULTICAST_STARTUP_QUERY_INTVL_MAX));
+		intvl_jiffies = BR_MULTICAST_STARTUP_QUERY_INTVL_MAX;
+	}
+
 	brmctx->multicast_startup_query_interval = intvl_jiffies;
 }
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index b159aae594c0..8de0904b9627 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -31,6 +31,8 @@
 #define BR_MULTICAST_DEFAULT_HASH_MAX 4096
 #define BR_MULTICAST_QUERY_INTVL_MIN msecs_to_jiffies(1000)
 #define BR_MULTICAST_STARTUP_QUERY_INTVL_MIN BR_MULTICAST_QUERY_INTVL_MIN
+#define BR_MULTICAST_QUERY_INTVL_MAX msecs_to_jiffies(86400000) /* 24 hours */
+#define BR_MULTICAST_STARTUP_QUERY_INTVL_MAX BR_MULTICAST_QUERY_INTVL_MAX
 
 #define BR_HWDOM_MAX BITS_PER_LONG
 
-- 
2.33.0


* Re: [PATCH net v2] net: bridge: fix soft lockup in br_multicast_query_expired()
  2025-08-12  6:10 ` Ido Schimmel
@ 2025-08-13  1:09   ` Wang Liang
  0 siblings, 0 replies; 3+ messages in thread
From: Wang Liang @ 2025-08-13  1:09 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: razor, davem, edumazet, kuba, pabeni, horms, bridge, netdev,
	linux-kernel, yuehaibing, zhangchangzhong


On 2025/8/12 14:10, Ido Schimmel wrote:
> On Tue, Aug 12, 2025 at 05:18:18PM +0800, Wang Liang wrote:
>> When multicast_query_interval is set to a large value, the local variable
>> 'time' in br_multicast_send_query() may overflow. If 'time' ends up smaller
>> than jiffies, the timer expires immediately and mod_timer() is called
>> again, which creates a loop and may trigger the following soft lockup.
>>
>>    watchdog: BUG: soft lockup - CPU#1 stuck for 221s! [rb_consumer:66]
>>    CPU: 1 UID: 0 PID: 66 Comm: rb_consumer Not tainted 6.16.0+ #259 PREEMPT(none)
>>    Call Trace:
>>     <IRQ>
>>     __netdev_alloc_skb+0x2e/0x3a0
>>     br_ip6_multicast_alloc_query+0x212/0x1b70
>>     __br_multicast_send_query+0x376/0xac0
>>     br_multicast_send_query+0x299/0x510
>>     br_multicast_query_expired.constprop.0+0x16d/0x1b0
>>     call_timer_fn+0x3b/0x2a0
>>     __run_timers+0x619/0x950
>>     run_timer_softirq+0x11c/0x220
>>     handle_softirqs+0x18e/0x560
>>     __irq_exit_rcu+0x158/0x1a0
>>     sysvec_apic_timer_interrupt+0x76/0x90
>>     </IRQ>
>>
>> This issue can be reproduced with:
>>    ip link add br0 type bridge
>>    echo 1 > /sys/class/net/br0/bridge/multicast_querier
>>    echo 0xffffffffffffffff > /sys/class/net/br0/bridge/multicast_query_interval
>>    ip link set dev br0 up
>>
>> The multicast_startup_query_interval can also cause this issue. Similar to
>> the commit 99b40610956a("net: bridge: mcast: add and enforce query interval
>                           ^ missing space
>
>> minimum"), add a check for the query interval maximum to fix this issue.
>>
>> Link: https://lore.kernel.org/netdev/20250806094941.1285944-1-wangliang74@huawei.com/
>> Fixes: 7e4df51eb35d ("bridge: netlink: add support for igmp's intervals")
> Probably doesn't matter in practice given how old both commits are, but
> I think you should blame d902eee43f19 ("bridge: Add multicast
> count/interval sysfs entries") instead. The commit message also uses the
> sysfs path and not the netlink one.


Thanks for your suggestions!

The bug fix tag is really important. I will correct it and send a new 
patch later.

>> Suggested-by: Nikolay Aleksandrov <razor@blackwall.org>
>> Signed-off-by: Wang Liang <wangliang74@huawei.com>
> Code looks fine to me.
