* [PATCH v3 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core
@ 2026-05-14 12:24 Yizhou Zhao
2026-05-18 23:09 ` Jakub Kicinski
From: Yizhou Zhao @ 2026-05-14 12:24 UTC
To: netdev
Cc: Yizhou Zhao, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stanislav Fomichev, Kuniyuki Iwashima,
Samiullah Khawaja, Hangbin Liu, Krishna Kumar, Yuxiang Yang,
Xuewei Feng, Qi Li, Ke Xu, stable
In __netif_receive_skb_core(), the another_round label can be reached
via a TC ingress redirect (bpf_redirect_peer returning -EAGAIN).
Across network namespaces, two BPF programs on peer devices can redirect
packets back and forth indefinitely, creating an unbounded loop that
monopolizes a CPU core in softirq context. This leads to RCU stalls,
soft lockups, and system-wide denial of service.
We reproduced this by attaching a pair of TC BPF programs, one in each of
two network namespaces, that redirect packets to each other. The RCU
subsystem then reports a stall:
```
[ 24.835219] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 24.835837] rcu: (detected by 0, t=21002 jiffies, g=-627, q=2 ncpus=1)
[ 24.835959] rcu: All QSes seen, last rcu_preempt kthread activity 21002 (4294691810-4294670808), jiffies_till_next_fqs=3, root ->qsmask 0x0
[ 24.836239] rcu: rcu_preempt kthread starved for 21002 jiffies! g-627 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 24.836362] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 24.836460] rcu: RCU grace-period kthread stack dump:
[ 24.836601] task:rcu_preempt state:R running task stack:15448 pid:15 tgid:15 ppid:2 task_flags:0x208040 flags:0x00080000
[ 24.837139] Call Trace:
[ 24.837568] <TASK>
[ 24.838008] __schedule+0x4ed/0xea0
[ 24.838934] schedule+0x22/0xd0
[ 24.839023] schedule_timeout+0x81/0x100
[ 24.839095] ? __pfx_process_timeout+0x10/0x10
[ 24.839165] rcu_gp_fqs_loop+0x11b/0x650
[ 24.839226] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 24.839282] rcu_gp_kthread+0x17e/0x210
[ 24.839333] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 24.839383] kthread+0xdd/0x110
[ 24.839433] ? __pfx_kthread+0x10/0x10
[ 24.839481] ret_from_fork+0x1aa/0x260
[ 24.839538] ? __pfx_kthread+0x10/0x10
[ 24.839585] ret_from_fork_asm+0x1a/0x30
[ 24.839686] </TASK>
......
```
Fix this by incrementing a depth counter just before jumping to the
another_round label. When the counter exceeds XMIT_RECURSION_LIMIT (8),
the packet is dropped. This follows the same pattern as
dev_xmit_recursion(), which protects the TX redirect path with the same
limit. Reuse SKB_DROP_REASON_TC_RECLASSIFY_LOOP for observability.
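For illustration only, the bounded reprocessing described above can be
modelled in userspace roughly as follows. This is a sketch, not kernel
code: struct model_dev and model_receive() are hypothetical names made up
for this example, and XMIT_RECURSION_LIMIT is redefined locally with the
value 8 cited above.

```c
#include <stdbool.h>

/* Local copy of the limit cited in the commit message (assumption: 8). */
#define XMIT_RECURSION_LIMIT 8

/* Hypothetical stand-in for a device with a TC ingress program attached. */
struct model_dev {
	const char *name;
	struct model_dev *peer;
	bool redirects_to_peer;	/* models a program calling bpf_redirect_peer() */
};

/*
 * Model of the receive loop. Returns the number of passes through the
 * another_round point and sets *dropped when the depth limit is exceeded.
 */
int model_receive(struct model_dev *dev, bool *dropped)
{
	int redirect_depth = 0;
	int rounds = 0;

	*dropped = false;
another_round:
	rounds++;
	if (dev->redirects_to_peer) {
		dev = dev->peer;	/* the packet now belongs to the peer */
		if (++redirect_depth > XMIT_RECURSION_LIMIT) {
			*dropped = true;	/* "goto drop" in the patch */
			return rounds;
		}
		goto another_round;
	}
	return rounds;	/* no further redirect: normal delivery */
}
```

With two model devices that each redirect to the other, model_receive()
stops after the ninth pass and reports a drop instead of spinning. If the
second device does not redirect, the packet is delivered on the second
pass and the counter never comes close to the limit.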
Fixes: 9aa1206e8f48 ("bpf: Add redirect_peer helper")
Cc: stable@vger.kernel.org
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Assisted-by: GLM:GLM-5.1
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
---
Changes in v3:
- Guard redirect_depth declaration with #ifdef CONFIG_NET_INGRESS to
avoid unused variable warning when CONFIG_NET_INGRESS is not set
- Reorder variable declarations to follow reverse christmas tree style
- Link to v2: https://lore.kernel.org/netdev/20260512022127.7818-1-zhaoyz24@mails.tsinghua.edu.cn/
Changes in v2:
- Move the check just after `another` is set to true to avoid affecting the fast path
- Reuse SKB_DROP_REASON_TC_RECLASSIFY_LOOP to avoid adding new drop reason
- Link to v1: https://lore.kernel.org/netdev/20260511063005.38134-1-zhaoyz24@mails.tsinghua.edu.cn/
---
net/core/dev.c | 12 ++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 831129f2a..c8e4a1d3f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5958,6 +5958,9 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 	struct net_device *orig_dev;
 	bool deliver_exact = false;
+#ifdef CONFIG_NET_INGRESS
+	int redirect_depth = 0;
+#endif
 	int ret = NET_RX_DROP;
 	__be16 type;
@@ -6031,8 +6034,16 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 		nf_skip_egress(skb, true);
 		skb = sch_handle_ingress(skb, &pt_prev, &ret, orig_dev,
 					 &another);
-		if (another)
+		if (another) {
+			if (unlikely(++redirect_depth > XMIT_RECURSION_LIMIT)) {
+				net_warn_ratelimited(
+					"%s: redirect loop limit reached, dropping (dev=%s)\n",
+					__func__, skb->dev->name);
+				drop_reason = SKB_DROP_REASON_TC_RECLASSIFY_LOOP;
+				goto drop;
+			}
 			goto another_round;
+		}
 		if (!skb)
 			goto out;
--
2.43.0
* Re: [PATCH v3 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core
From: Jakub Kicinski @ 2026-05-18 23:09 UTC
To: Yizhou Zhao
Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Stanislav Fomichev, Kuniyuki Iwashima, Samiullah Khawaja,
Hangbin Liu, Krishna Kumar, Yuxiang Yang, Xuewei Feng, Qi Li,
Ke Xu, stable
On Thu, 14 May 2026 20:24:41 +0800 Yizhou Zhao wrote:
> In __netif_receive_skb_core(), the another_round label can be reached
> via a TC ingress redirect (bpf_redirect_peer returning -EAGAIN).
Does not apply to netdev/net, please rebase+repost.
--
pw-bot: cr