public inbox for linux-kernel@vger.kernel.org
* [PATCH net v2] net: fix __this_cpu_add() in preemptible code in dev_xmit_recursion_inc/dec
@ 2026-04-10  2:06 Jiayuan Chen
  2026-04-10  8:55 ` Eric Dumazet
  2026-04-12 16:33 ` Jakub Kicinski
  0 siblings, 2 replies; 3+ messages in thread
From: Jiayuan Chen @ 2026-04-10  2:06 UTC (permalink / raw)
  To: netdev
  Cc: Jiayuan Chen, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Weiming Shi,
	linux-kernel

dev_xmit_recursion_{inc,dec}() use __this_cpu_{inc,dec}(), which require
the caller to be non-preemptible in order to avoid CPU migration. However,
some callers like SCTP's UDP encapsulation path invoke iptunnel_xmit()
from process context without disabling BH or preemption:

  sctp_inet_connect -> __sctp_connect -> sctp_do_sm ->
  sctp_outq_flush -> sctp_packet_transmit -> sctp_v4_xmit ->
  udp_tunnel_xmit_skb -> iptunnel_xmit -> dev_xmit_recursion_inc

This triggers the following warning on PREEMPT(full) kernels:

  BUG: using __this_cpu_add() in preemptible [00000000]
  caller is dev_xmit_recursion_inc include/linux/netdevice.h:3595 [inline]
  caller is iptunnel_xmit+0x1cd/0xb80 net/ipv4/ip_tunnel_core.c:72
  Tainted: [L]=SOFTLOCKUP
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
   check_preemption_disabled+0xd8/0xe0 lib/smp_processor_id.c:47
   dev_xmit_recursion_inc include/linux/netdevice.h:3595 [inline]
   iptunnel_xmit+0x1cd/0xb80 net/ipv4/ip_tunnel_core.c:72
   sctp_v4_xmit+0x75f/0x1060 net/sctp/protocol.c:1073
   sctp_packet_transmit+0x22ec/0x3060 net/sctp/output.c:653
   sctp_packet_singleton+0x19e/0x370 net/sctp/outqueue.c:783
   sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline]
   sctp_outq_flush+0x315/0x3350 net/sctp/outqueue.c:1212
   sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1824 [inline]
   sctp_side_effects net/sctp/sm_sideeffect.c:1204 [inline]
   sctp_do_sm+0xce1/0x5be0 net/sctp/sm_sideeffect.c:1175
   sctp_primitive_ASSOCIATE+0x9c/0xd0 net/sctp/primitive.c:73
   __sctp_connect+0x9fc/0xc70 net/sctp/socket.c:1235
   sctp_connect net/sctp/socket.c:4818 [inline]
   sctp_inet_connect+0x15f/0x220 net/sctp/socket.c:4833
   __sys_connect_file+0x141/0x1a0 net/socket.c:2089
   __sys_connect+0x141/0x170 net/socket.c:2108
   __do_sys_connect net/socket.c:2114 [inline]
   __se_sys_connect net/socket.c:2111 [inline]
   __x64_sys_connect+0x72/0xb0 net/socket.c:2111
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

All other callers of dev_xmit_recursion_{inc,dec}() are fine: those in
net/core/dev.c and net/core/filter.c run under local_bh_disable(), and
lwtunnel_input() asserts in_softirq() context. Currently only
iptunnel_xmit() and ip6tunnel_xmit() can be reached from process
context via the SCTP UDP encapsulation path.

Fix this by adding guard(migrate)() at the top of iptunnel_xmit() and
ip6tunnel_xmit() to ensure dev_recursion_level(), dev_xmit_recursion_inc()
and dev_xmit_recursion_dec() all run on the same CPU.

Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
v1->v2: https://lore.kernel.org/netdev/20260409035344.214279-1-jiayuan.chen@linux.dev/
 - Move guard(migrate)() to iptunnel_xmit()/ip6tunnel_xmit() instead
   of dev_xmit_recursion_{inc,dec}(), so that dev_recursion_level() is
   also covered under the same migration protection.
 - Revert changes to include/linux/netdevice.h.
---
 include/net/ip6_tunnel.h  | 2 ++
 net/ipv4/ip_tunnel_core.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 359b595f1df9..3f877164233c 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -156,6 +156,8 @@ static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb,
 {
 	int pkt_len, err;
 
+	guard(migrate)();
+
 	if (unlikely(dev_recursion_level() > IP_TUNNEL_RECURSION_LIMIT)) {
 		if (dev) {
 			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 5683c328990f..808b8eaf7fad 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -58,6 +58,8 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	struct iphdr *iph;
 	int err;
 
+	guard(migrate)();
+
 	if (unlikely(dev_recursion_level() > IP_TUNNEL_RECURSION_LIMIT)) {
 		if (dev) {
 			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
-- 
2.43.0
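
A minimal userspace sketch of the hazard described in the commit message,
for illustration only. Every model_* name below is hypothetical and is not
a kernel API: an array stands in for the per-CPU recursion counter, and a
toy "scheduler" may move the task between the increment and the decrement
unless migration has been disabled.

```c
#include <assert.h>

#define MODEL_NR_CPUS 2

/* Hypothetical stand-ins for kernel state; none of this is kernel code. */
static int model_recursion[MODEL_NR_CPUS];	/* per-CPU recursion counter */
static int model_cpu;				/* CPU the task runs on */
static int model_migration_disabled;		/* migrate-disable nesting depth */

static void model_migrate_disable(void)
{
	model_migration_disabled++;
}

static void model_migrate_enable(void)
{
	assert(model_migration_disabled > 0);
	model_migration_disabled--;
}

/* The toy scheduler moves the task only while migration is allowed. */
static void model_try_migrate(void)
{
	if (model_migration_disabled == 0)
		model_cpu = 1 - model_cpu;
}

/* Models a tunnel xmit: increment, possible migration, decrement. */
static void model_tunnel_xmit(int pin)
{
	if (pin)
		model_migrate_disable();

	model_recursion[model_cpu]++;	/* counter of the CPU we started on */
	model_try_migrate();		/* a move may be attempted mid-xmit */
	model_recursion[model_cpu]--;	/* counter of the CPU we are on now */

	if (pin)
		model_migrate_enable();
}

static void model_reset(void)
{
	model_recursion[0] = 0;
	model_recursion[1] = 0;
	model_cpu = 0;
	model_migration_disabled = 0;
}
```

In this model, calling model_tunnel_xmit(0) after model_reset() leaves the
two counters at 1 and -1, while model_tunnel_xmit(1) leaves both at 0. That
is the imbalance the patch is meant to rule out by pinning the task for the
whole function.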



* Re: [PATCH net v2] net: fix __this_cpu_add() in preemptible code in dev_xmit_recursion_inc/dec
From: Eric Dumazet @ 2026-04-10  8:55 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Weiming Shi, linux-kernel

On Thu, Apr 9, 2026 at 7:06 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> dev_xmit_recursion_{inc,dec}() use __this_cpu_{inc,dec}(), which require
> the caller to be non-preemptible in order to avoid CPU migration. However,
> some callers like SCTP's UDP encapsulation path invoke iptunnel_xmit()
> from process context without disabling BH or preemption:
>
>   sctp_inet_connect -> __sctp_connect -> sctp_do_sm ->
>   sctp_outq_flush -> sctp_packet_transmit -> sctp_v4_xmit ->
>   udp_tunnel_xmit_skb -> iptunnel_xmit -> dev_xmit_recursion_inc
>
> This triggers the following warning on PREEMPT(full) kernels:
>
>

> All other callers of dev_xmit_recursion_{inc,dec}() are fine: those in
> net/core/dev.c and net/core/filter.c run under local_bh_disable(), and
> lwtunnel_input() asserts in_softirq() context. Currently only
> iptunnel_xmit() and ip6tunnel_xmit() can be reached from process
> context via the SCTP UDP encapsulation path.
>
> Fix this by adding guard(migrate)() at the top of iptunnel_xmit() and
> ip6tunnel_xmit() to ensure dev_recursion_level(), dev_xmit_recursion_inc()
> and dev_xmit_recursion_dec() all run on the same CPU.
>
> Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks!


* Re: [PATCH net v2] net: fix __this_cpu_add() in preemptible code in dev_xmit_recursion_inc/dec
From: Jakub Kicinski @ 2026-04-12 16:33 UTC (permalink / raw)
  To: Jiayuan Chen, Eric Dumazet
  Cc: netdev, David S. Miller, David Ahern, Paolo Abeni, Simon Horman,
	Weiming Shi, linux-kernel

On Fri, 10 Apr 2026 10:06:30 +0800 Jiayuan Chen wrote:
> dev_xmit_recursion_{inc,dec}() use __this_cpu_{inc,dec}(), which require
> the caller to be non-preemptible in order to avoid CPU migration. However,
> some callers like SCTP's UDP encapsulation path invoke iptunnel_xmit()
> from process context without disabling BH or preemption:
> 
>   sctp_inet_connect -> __sctp_connect -> sctp_do_sm ->
>   sctp_outq_flush -> sctp_packet_transmit -> sctp_v4_xmit ->
>   udp_tunnel_xmit_skb -> iptunnel_xmit -> dev_xmit_recursion_inc

Eric, weren't there also a bunch of RCU reports because of this path?
Should we perhaps take the RCU read lock here?

> +	guard(migrate)();

Sorry but I detest the guard() usage. Please use migrate_disable()

Quoting documentation:

  Use of ``guard()`` is discouraged within any function longer than 20
  lines, ``scoped_guard()`` is considered more readable. Using normal
  lock/unlock is still (weakly) preferred.

See:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#using-device-managed-and-cleanup-h-constructs
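
To make the two styles under discussion concrete, here is a rough userspace
sketch, not the requested kernel change. It assumes a compiler that supports
__attribute__((cleanup)) (GCC or Clang), which is the mechanism the kernel's
guard() is built on. All model_* and xmit_* names are hypothetical, and
MODEL_GUARD_MIGRATE() is a simplified stand-in for guard(migrate)().

```c
#include <assert.h>

/* Hypothetical model of the task's migrate-disable nesting depth. */
static int model_migration_disabled;

static void model_migrate_disable(void)
{
	model_migration_disabled++;
}

static void model_migrate_enable(void)
{
	assert(model_migration_disabled > 0);
	model_migration_disabled--;
}

/* Cleanup handler: the compiler calls it when the guard leaves scope. */
static void model_guard_exit(int *unused)
{
	(void)unused;
	model_migrate_enable();
}

/* Simplified stand-in for guard(migrate)(); not the kernel macro. */
#define MODEL_GUARD_MIGRATE() \
	int model_guard __attribute__((cleanup(model_guard_exit), unused)) = \
		(model_migrate_disable(), 0)

/* Scope-based form: every return path re-enables migration implicitly. */
static int xmit_with_guard(int recursion_level, int limit)
{
	MODEL_GUARD_MIGRATE();

	if (recursion_level > limit)
		return -1;	/* handler runs on this early return */

	/* ... transmit work would go here ... */
	return 0;		/* and on the normal return */
}

/* Explicit form: each exit path must reach model_migrate_enable(). */
static int xmit_explicit(int recursion_level, int limit)
{
	int err = 0;

	model_migrate_disable();

	if (recursion_level > limit) {
		err = -1;
		goto out;
	}

	/* ... transmit work would go here ... */
out:
	model_migrate_enable();
	return err;
}
```

Both functions leave the nesting depth at zero on every path. The difference
is visibility: the explicit form shows the enable call at each exit, which
is what the quoted netdev guidance prefers in longer functions.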

