* [PATCH v2 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core
@ 2026-05-12 2:21 Yizhou Zhao
2026-05-14 11:45 ` Paolo Abeni
0 siblings, 1 reply; 2+ messages in thread
From: Yizhou Zhao @ 2026-05-12 2:21 UTC (permalink / raw)
To: netdev
Cc: Yizhou Zhao, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stanislav Fomichev, Kuniyuki Iwashima,
Samiullah Khawaja, Hangbin Liu, Krishna Kumar, Yuxiang Yang,
Xuewei Feng, Qi Li, Ke Xu, stable
In __netif_receive_skb_core(), the another_round label can be reached
via a TC ingress redirect (bpf_redirect_peer returning -EAGAIN).
Across network namespaces, two BPF programs on peer devices can redirect
packets back and forth indefinitely, creating an unbounded loop that
monopolizes a CPU core in softirq context. This leads to RCU stalls,
soft lockups, and system-wide denial of service.
We reproduced it by creating a pair of TC BPF programs across two
network namespaces that redirect packets to each other; the RCU
subsystem then detected a stall:
```
[ 24.835219] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 24.835837] rcu: (detected by 0, t=21002 jiffies, g=-627, q=2 ncpus=1)
[ 24.835959] rcu: All QSes seen, last rcu_preempt kthread activity 21002 (4294691810-4294670808), jiffies_till_next_fqs=3, root ->qsmask 0x0
[ 24.836239] rcu: rcu_preempt kthread starved for 21002 jiffies! g-627 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 24.836362] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 24.836460] rcu: RCU grace-period kthread stack dump:
[ 24.836601] task:rcu_preempt state:R running task stack:15448 pid:15 tgid:15 ppid:2 task_flags:0x208040 flags:0x00080000
[ 24.837139] Call Trace:
[ 24.837568] <TASK>
[ 24.838008] __schedule+0x4ed/0xea0
[ 24.838934] schedule+0x22/0xd0
[ 24.839023] schedule_timeout+0x81/0x100
[ 24.839095] ? __pfx_process_timeout+0x10/0x10
[ 24.839165] rcu_gp_fqs_loop+0x11b/0x650
[ 24.839226] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 24.839282] rcu_gp_kthread+0x17e/0x210
[ 24.839333] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 24.839383] kthread+0xdd/0x110
[ 24.839433] ? __pfx_kthread+0x10/0x10
[ 24.839481] ret_from_fork+0x1aa/0x260
[ 24.839538] ? __pfx_kthread+0x10/0x10
[ 24.839585] ret_from_fork_asm+0x1a/0x30
[ 24.839686] </TASK>
......
```
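For reference, each of the two programs can be as small as the sketch
below (illustrative only, not the exact reproducer; it assumes the
program is attached to tc ingress of the local veth, so skb->ifindex
names that veth and the redirect lands on its cross-netns peer):

```
/* Attach one instance to each end of a cross-netns veth pair.
 * Every ingress pass redirects to the other peer's ingress,
 * so __netif_receive_skb_core() keeps looping via another_round. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tc")
int bounce(struct __sk_buff *skb)
{
	/* bpf_redirect_peer() switches netns ingress-to-ingress
	 * without going through the backlog queue. */
	return bpf_redirect_peer(skb->ifindex, 0);
}

char _license[] SEC("license") = "GPL";
```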
Fix this by adding a depth counter that is incremented each time the
function is about to jump back to the another_round label. When the
counter exceeds XMIT_RECURSION_LIMIT (8), the packet is dropped. This
follows the same pattern as dev_xmit_recursion(), which protects the
TX redirect path with the same limit.
Reuse SKB_DROP_REASON_TC_RECLASSIFY_LOOP for observability.
Fixes: 9aa1206e8f48 ("bpf: Add redirect_peer helper")
Cc: stable@vger.kernel.org
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Assisted-by: GLM:GLM-5.1
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
---
Changes in v2:
- Move the check just after `another` is set to true to avoid affecting the fast path
- Reuse SKB_DROP_REASON_TC_RECLASSIFY_LOOP to avoid adding new drop reason
- Link to v1: https://lore.kernel.org/netdev/20260511063005.38134-1-zhaoyz24@mails.tsinghua.edu.cn/
---
net/core/dev.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 831129f2a..bb9ae92f0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5958,6 +5958,7 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 	struct net_device *orig_dev;
 	bool deliver_exact = false;
 	int ret = NET_RX_DROP;
+	int redirect_depth = 0;
 	__be16 type;

 	net_timestamp_check(!READ_ONCE(net_hotdata.tstamp_prequeue), skb);

@@ -6031,8 +6032,16 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 		nf_skip_egress(skb, true);
 		skb = sch_handle_ingress(skb, &pt_prev, &ret, orig_dev,
 					 &another);
-		if (another)
+		if (another) {
+			if (unlikely(++redirect_depth > XMIT_RECURSION_LIMIT)) {
+				net_warn_ratelimited(
+					"%s: redirect loop limit reached, dropping (dev=%s)\n",
+					__func__, skb->dev->name);
+				drop_reason = SKB_DROP_REASON_TC_RECLASSIFY_LOOP;
+				goto drop;
+			}
 			goto another_round;
+		}

 		if (!skb)
 			goto out;
--
2.43.0
* Re: [PATCH v2 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core
2026-05-12 2:21 [PATCH v2 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core Yizhou Zhao
@ 2026-05-14 11:45 ` Paolo Abeni
0 siblings, 0 replies; 2+ messages in thread
From: Paolo Abeni @ 2026-05-14 11:45 UTC (permalink / raw)
To: Yizhou Zhao, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman,
Stanislav Fomichev, Kuniyuki Iwashima, Samiullah Khawaja,
Hangbin Liu, Krishna Kumar, Yuxiang Yang, Xuewei Feng, Qi Li,
Ke Xu, stable
On 5/12/26 4:21 AM, Yizhou Zhao wrote:
> In __netif_receive_skb_core(), the another_round label can be reached
> via a TC ingress redirect (bpf_redirect_peer returning -EAGAIN).
>
> Across network namespaces, two BPF programs on peer devices can redirect
> packets back and forth indefinitely, creating an unbounded loop that
> monopolizes a CPU core in softirq context. This leads to RCU stalls,
> soft lockups, and system-wide denial of service.
>
> We reproduced it by creating a pair of TC BPF programs across two
> network namespaces that redirect packets to each other; the RCU
> subsystem then detected a stall:
>
> ```
> [ 24.835219] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 24.835837] rcu: (detected by 0, t=21002 jiffies, g=-627, q=2 ncpus=1)
> [ 24.835959] rcu: All QSes seen, last rcu_preempt kthread activity 21002 (4294691810-4294670808), jiffies_till_next_fqs=3, root ->qsmask 0x0
> [ 24.836239] rcu: rcu_preempt kthread starved for 21002 jiffies! g-627 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [ 24.836362] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 24.836460] rcu: RCU grace-period kthread stack dump:
> [ 24.836601] task:rcu_preempt state:R running task stack:15448 pid:15 tgid:15 ppid:2 task_flags:0x208040 flags:0x00080000
> [ 24.837139] Call Trace:
> [ 24.837568] <TASK>
> [ 24.838008] __schedule+0x4ed/0xea0
> [ 24.838934] schedule+0x22/0xd0
> [ 24.839023] schedule_timeout+0x81/0x100
> [ 24.839095] ? __pfx_process_timeout+0x10/0x10
> [ 24.839165] rcu_gp_fqs_loop+0x11b/0x650
> [ 24.839226] ? __pfx_rcu_gp_kthread+0x10/0x10
> [ 24.839282] rcu_gp_kthread+0x17e/0x210
> [ 24.839333] ? __pfx_rcu_gp_kthread+0x10/0x10
> [ 24.839383] kthread+0xdd/0x110
> [ 24.839433] ? __pfx_kthread+0x10/0x10
> [ 24.839481] ret_from_fork+0x1aa/0x260
> [ 24.839538] ? __pfx_kthread+0x10/0x10
> [ 24.839585] ret_from_fork_asm+0x1a/0x30
> [ 24.839686] </TASK>
> ......
> ```
>
> Fix this by adding a depth counter that is incremented each time the
> function is about to jump back to the another_round label. When the
> counter exceeds XMIT_RECURSION_LIMIT (8), the packet is dropped. This
> follows the same pattern as dev_xmit_recursion(), which protects the
> TX redirect path with the same limit.
>
> Reuse SKB_DROP_REASON_TC_RECLASSIFY_LOOP for observability.
>
> Fixes: 9aa1206e8f48 ("bpf: Add redirect_peer helper")
> Cc: stable@vger.kernel.org
> Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
> Reported-by: Xuewei Feng <fengxw06@126.com>
> Reported-by: Qi Li <qli01@tsinghua.edu.cn>
> Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
> Assisted-by: GLM:GLM-5.1
> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> ---
> Changes in v2:
> - Move the check just after `another` is set to true to avoid affecting the fast path
> - Reuse SKB_DROP_REASON_TC_RECLASSIFY_LOOP to avoid adding new drop reason
> - Link to v1: https://lore.kernel.org/netdev/20260511063005.38134-1-zhaoyz24@mails.tsinghua.edu.cn/
> ---
> net/core/dev.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 831129f2a..bb9ae92f0 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5958,6 +5958,7 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
>  	struct net_device *orig_dev;
>  	bool deliver_exact = false;
>  	int ret = NET_RX_DROP;
> +	int redirect_depth = 0;
As reported by sashiko, the above will cause an unused-variable warning
when CONFIG_NET_INGRESS is not set; the declaration should be wrapped in
an #ifdef CONFIG_NET_INGRESS guard.
Also please respect the reverse christmas tree order above.
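Something like the following would address both points (untested sketch,
ordering declarations longest-first):

```
	struct net_device *orig_dev;
	bool deliver_exact = false;
#ifdef CONFIG_NET_INGRESS
	int redirect_depth = 0;
#endif
	int ret = NET_RX_DROP;
	__be16 type;
```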
/P