* Re: [PATCH v2 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core
2026-05-12 2:21 [PATCH v2 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core Yizhou Zhao
@ 2026-05-14 11:45 ` Paolo Abeni
2026-05-14 22:11 ` kernel test robot
1 sibling, 0 replies; 3+ messages in thread
From: Paolo Abeni @ 2026-05-14 11:45 UTC (permalink / raw)
To: Yizhou Zhao, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman,
Stanislav Fomichev, Kuniyuki Iwashima, Samiullah Khawaja,
Hangbin Liu, Krishna Kumar, Yuxiang Yang, Xuewei Feng, Qi Li,
Ke Xu, stable
On 5/12/26 4:21 AM, Yizhou Zhao wrote:
> In __netif_receive_skb_core(), the another_round label can be reached
> via a TC ingress redirect (bpf_redirect_peer returning -EAGAIN).
>
> Across network namespaces, two BPF programs on peer devices can redirect
> packets back and forth indefinitely, creating an unbounded loop that
> monopolizes a CPU core in softirq context. This leads to RCU stalls,
> soft lockups, and system-wide denial of service.
>
> We reproduced this by creating a pair of TC BPF programs across two
> network namespaces that redirect packets to each other; the RCU
> subsystem then detected a stall:
>
> ```
> [ 24.835219] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 24.835837] rcu: (detected by 0, t=21002 jiffies, g=-627, q=2 ncpus=1)
> [ 24.835959] rcu: All QSes seen, last rcu_preempt kthread activity 21002 (4294691810-4294670808), jiffies_till_next_fqs=3, root ->qsmask 0x0
> [ 24.836239] rcu: rcu_preempt kthread starved for 21002 jiffies! g-627 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [ 24.836362] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [ 24.836460] rcu: RCU grace-period kthread stack dump:
> [ 24.836601] task:rcu_preempt state:R running task stack:15448 pid:15 tgid:15 ppid:2 task_flags:0x208040 flags:0x00080000
> [ 24.837139] Call Trace:
> [ 24.837568] <TASK>
> [ 24.838008] __schedule+0x4ed/0xea0
> [ 24.838934] schedule+0x22/0xd0
> [ 24.839023] schedule_timeout+0x81/0x100
> [ 24.839095] ? __pfx_process_timeout+0x10/0x10
> [ 24.839165] rcu_gp_fqs_loop+0x11b/0x650
> [ 24.839226] ? __pfx_rcu_gp_kthread+0x10/0x10
> [ 24.839282] rcu_gp_kthread+0x17e/0x210
> [ 24.839333] ? __pfx_rcu_gp_kthread+0x10/0x10
> [ 24.839383] kthread+0xdd/0x110
> [ 24.839433] ? __pfx_kthread+0x10/0x10
> [ 24.839481] ret_from_fork+0x1aa/0x260
> [ 24.839538] ? __pfx_kthread+0x10/0x10
> [ 24.839585] ret_from_fork_asm+0x1a/0x30
> [ 24.839686] </TASK>
> ......
> ```
>
> Fix this by adding a depth counter that is incremented each time a TC
> ingress redirect requests another round of processing. When the counter
> exceeds XMIT_RECURSION_LIMIT (8), the packet is dropped. This follows
> the same pattern as dev_xmit_recursion(), which protects the TX
> redirect path with the same limit.
>
> Reuse SKB_DROP_REASON_TC_RECLASSIFY_LOOP for observability.
>
> Fixes: 9aa1206e8f48 ("bpf: Add redirect_peer helper")
> Cc: stable@vger.kernel.org
> Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
> Reported-by: Xuewei Feng <fengxw06@126.com>
> Reported-by: Qi Li <qli01@tsinghua.edu.cn>
> Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
> Assisted-by: GLM:GLM-5.1
> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> ---
> Changes in v2:
> - Move the check just after `another` is set to true to avoid affecting the fast path
> - Reuse SKB_DROP_REASON_TC_RECLASSIFY_LOOP to avoid adding new drop reason
> - Link to v1: https://lore.kernel.org/netdev/20260511063005.38134-1-zhaoyz24@mails.tsinghua.edu.cn/
> ---
> net/core/dev.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 831129f2a..bb9ae92f0 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5958,6 +5958,7 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
> struct net_device *orig_dev;
> bool deliver_exact = false;
> int ret = NET_RX_DROP;
> + int redirect_depth = 0;
As reported by sashiko, the above will cause an unused-variable warning
when CONFIG_NET_INGRESS is disabled; the declaration should be protected
by an #ifdef CONFIG_NET_INGRESS guard.

Also please respect the reverse christmas tree order (longest declaration
first) in the variable block above.
/P
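Something along these lines (untested sketch, reusing the declarations
already in __netif_receive_skb_core()) should address both points:

```
	enum skb_drop_reason drop_reason = SKB_DROP_REASON_UNHANDLED_PROTO;
	struct packet_type *ptype, *pt_prev;
	rx_handler_func_t *rx_handler;
	struct sk_buff *skb = *pskb;
	struct net_device *orig_dev;
	bool deliver_exact = false;
#ifdef CONFIG_NET_INGRESS
	/* only used by the sch_handle_ingress() redirect path */
	int redirect_depth = 0;
#endif
	int ret = NET_RX_DROP;
	__be16 type;
```

The increment site is already inside the CONFIG_NET_INGRESS block, so
guarding the declaration the same way should silence the W=1 warning on
configs without ingress support.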
^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core
2026-05-12 2:21 [PATCH v2 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core Yizhou Zhao
2026-05-14 11:45 ` Paolo Abeni
@ 2026-05-14 22:11 ` kernel test robot
1 sibling, 0 replies; 3+ messages in thread
From: kernel test robot @ 2026-05-14 22:11 UTC (permalink / raw)
To: Yizhou Zhao, netdev
Cc: oe-kbuild-all, Yizhou Zhao, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stanislav Fomichev, Kuniyuki Iwashima,
Samiullah Khawaja, Hangbin Liu, Krishna Kumar, Yuxiang Yang,
Xuewei Feng, Qi Li, Ke Xu, stable
Hi Yizhou,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net/main]
[also build test WARNING on net-next/main linus/master horms-ipvs/master v7.1-rc3 next-20260508]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Yizhou-Zhao/net-core-dev-add-reprocess-depth-limit-for-another_round-in-__netif_receive_skb_core/20260514-205938
base: net/main
patch link: https://lore.kernel.org/r/20260512022127.7818-1-zhaoyz24%40mails.tsinghua.edu.cn
patch subject: [PATCH v2 net] net: core: dev: add reprocess depth limit for another_round in __netif_receive_skb_core
config: openrisc-defconfig (https://download.01.org/0day-ci/archive/20260515/202605150631.QDJOt3V7-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260515/202605150631.QDJOt3V7-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605150631.QDJOt3V7-lkp@intel.com/
All warnings (new ones prefixed by >>):
net/core/dev.c: In function '__netif_receive_skb_core':
>> net/core/dev.c:5982:13: warning: unused variable 'redirect_depth' [-Wunused-variable]
5982 | int redirect_depth = 0;
| ^~~~~~~~~~~~~~
vim +/redirect_depth +5982 net/core/dev.c
5971
5972 static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
5973 struct packet_type **ppt_prev)
5974 {
5975 enum skb_drop_reason drop_reason = SKB_DROP_REASON_UNHANDLED_PROTO;
5976 struct packet_type *ptype, *pt_prev;
5977 rx_handler_func_t *rx_handler;
5978 struct sk_buff *skb = *pskb;
5979 struct net_device *orig_dev;
5980 bool deliver_exact = false;
5981 int ret = NET_RX_DROP;
> 5982 int redirect_depth = 0;
5983 __be16 type;
5984
5985 net_timestamp_check(!READ_ONCE(net_hotdata.tstamp_prequeue), skb);
5986
5987 trace_netif_receive_skb(skb);
5988
5989 orig_dev = skb->dev;
5990
5991 skb_reset_network_header(skb);
5992 #if !defined(CONFIG_DEBUG_NET)
5993 /* We plan to no longer reset the transport header here.
5994 * Give some time to fuzzers and dev build to catch bugs
5995 * in network stacks.
5996 */
5997 if (!skb_transport_header_was_set(skb))
5998 skb_reset_transport_header(skb);
5999 #endif
6000 skb_reset_mac_len(skb);
6001
6002 pt_prev = NULL;
6003
6004 another_round:
6005 skb->skb_iif = skb->dev->ifindex;
6006
6007 __this_cpu_inc(softnet_data.processed);
6008
6009 if (static_branch_unlikely(&generic_xdp_needed_key)) {
6010 int ret2;
6011
6012 migrate_disable();
6013 ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog),
6014 &skb);
6015 migrate_enable();
6016
6017 if (ret2 != XDP_PASS) {
6018 ret = NET_RX_DROP;
6019 goto out;
6020 }
6021 }
6022
6023 if (eth_type_vlan(skb->protocol)) {
6024 skb = skb_vlan_untag(skb);
6025 if (unlikely(!skb))
6026 goto out;
6027 }
6028
6029 if (skb_skip_tc_classify(skb))
6030 goto skip_classify;
6031
6032 if (pfmemalloc)
6033 goto skip_taps;
6034
6035 list_for_each_entry_rcu(ptype, &dev_net_rcu(skb->dev)->ptype_all,
6036 list) {
6037 if (unlikely(pt_prev))
6038 ret = deliver_skb(skb, pt_prev, orig_dev);
6039 pt_prev = ptype;
6040 }
6041
6042 list_for_each_entry_rcu(ptype, &skb->dev->ptype_all, list) {
6043 if (unlikely(pt_prev))
6044 ret = deliver_skb(skb, pt_prev, orig_dev);
6045 pt_prev = ptype;
6046 }
6047
6048 skip_taps:
6049 #ifdef CONFIG_NET_INGRESS
6050 if (static_branch_unlikely(&ingress_needed_key)) {
6051 bool another = false;
6052
6053 nf_skip_egress(skb, true);
6054 skb = sch_handle_ingress(skb, &pt_prev, &ret, orig_dev,
6055 &another);
6056 if (another) {
6057 if (unlikely(++redirect_depth > XMIT_RECURSION_LIMIT)) {
6058 net_warn_ratelimited(
6059 "%s: redirect loop limit reached, dropping (dev=%s)\n",
6060 __func__, skb->dev->name);
6061 drop_reason = SKB_DROP_REASON_TC_RECLASSIFY_LOOP;
6062 goto drop;
6063 }
6064 goto another_round;
6065 }
6066 if (!skb)
6067 goto out;
6068
6069 nf_skip_egress(skb, false);
6070 if (nf_ingress(skb, &pt_prev, &ret, orig_dev) < 0)
6071 goto out;
6072 }
6073 #endif
6074 skb_reset_redirect(skb);
6075 skip_classify:
6076 if (pfmemalloc && !skb_pfmemalloc_protocol(skb)) {
6077 drop_reason = SKB_DROP_REASON_PFMEMALLOC;
6078 goto drop;
6079 }
6080
6081 if (skb_vlan_tag_present(skb)) {
6082 if (unlikely(pt_prev)) {
6083 ret = deliver_skb(skb, pt_prev, orig_dev);
6084 pt_prev = NULL;
6085 }
6086 if (vlan_do_receive(&skb))
6087 goto another_round;
6088 else if (unlikely(!skb))
6089 goto out;
6090 }
6091
6092 rx_handler = rcu_dereference(skb->dev->rx_handler);
6093 if (rx_handler) {
6094 if (unlikely(pt_prev)) {
6095 ret = deliver_skb(skb, pt_prev, orig_dev);
6096 pt_prev = NULL;
6097 }
6098 switch (rx_handler(&skb)) {
6099 case RX_HANDLER_CONSUMED:
6100 ret = NET_RX_SUCCESS;
6101 goto out;
6102 case RX_HANDLER_ANOTHER:
6103 goto another_round;
6104 case RX_HANDLER_EXACT:
6105 deliver_exact = true;
6106 break;
6107 case RX_HANDLER_PASS:
6108 break;
6109 default:
6110 BUG();
6111 }
6112 }
6113
6114 if (unlikely(skb_vlan_tag_present(skb)) && !netdev_uses_dsa(skb->dev)) {
6115 check_vlan_id:
6116 if (skb_vlan_tag_get_id(skb)) {
6117 /* Vlan id is non 0 and vlan_do_receive() above couldn't
6118 * find vlan device.
6119 */
6120 skb->pkt_type = PACKET_OTHERHOST;
6121 } else if (eth_type_vlan(skb->protocol)) {
6122 /* Outer header is 802.1P with vlan 0, inner header is
6123 * 802.1Q or 802.1AD and vlan_do_receive() above could
6124 * not find vlan dev for vlan id 0.
6125 */
6126 __vlan_hwaccel_clear_tag(skb);
6127 skb = skb_vlan_untag(skb);
6128 if (unlikely(!skb))
6129 goto out;
6130 if (vlan_do_receive(&skb))
6131 /* After stripping off 802.1P header with vlan 0
6132 * vlan dev is found for inner header.
6133 */
6134 goto another_round;
6135 else if (unlikely(!skb))
6136 goto out;
6137 else
6138 /* We have stripped outer 802.1P vlan 0 header.
6139 * But could not find vlan dev.
6140 * check again for vlan id to set OTHERHOST.
6141 */
6142 goto check_vlan_id;
6143 }
6144 /* Note: we might in the future use prio bits
6145 * and set skb->priority like in vlan_do_receive()
6146 * For the time being, just ignore Priority Code Point
6147 */
6148 __vlan_hwaccel_clear_tag(skb);
6149 }
6150
6151 type = skb->protocol;
6152
6153 /* deliver only exact match when indicated */
6154 if (likely(!deliver_exact)) {
6155 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
6156 &ptype_base[ntohs(type) &
6157 PTYPE_HASH_MASK]);
6158
6159 /* orig_dev and skb->dev could belong to different netns;
6160 * Even in such case we need to traverse only the list
6161 * coming from skb->dev, as the ptype owner (packet socket)
6162 * will use dev_net(skb->dev) to do namespace filtering.
6163 */
6164 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
6165 &dev_net_rcu(skb->dev)->ptype_specific);
6166 }
6167
6168 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
6169 &orig_dev->ptype_specific);
6170
6171 if (unlikely(skb->dev != orig_dev)) {
6172 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
6173 &skb->dev->ptype_specific);
6174 }
6175
6176 if (pt_prev) {
6177 *ppt_prev = pt_prev;
6178 } else {
6179 drop:
6180 if (!deliver_exact)
6181 dev_core_stats_rx_dropped_inc(skb->dev);
6182 else
6183 dev_core_stats_rx_nohandler_inc(skb->dev);
6184
6185 kfree_skb_reason(skb, drop_reason);
6186 /* Jamal, now you will not able to escape explaining
6187 * me how you were going to use this. :-)
6188 */
6189 ret = NET_RX_DROP;
6190 }
6191
6192 out:
6193 /* The invariant here is that if *ppt_prev is not NULL
6194 * then skb should also be non-NULL.
6195 *
6196 * Apparently *ppt_prev assignment above holds this invariant due to
6197 * skb dereferencing near it.
6198 */
6199 *pskb = skb;
6200 return ret;
6201 }
6202
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 3+ messages in thread