* [PATCH net-next v8] l2tp: fix double dst_release() on sk_dst_cache race
@ 2025-12-15 14:55 Mikhail Lobanov
2025-12-16 16:35 ` Jakub Kicinski
0 siblings, 1 reply; 3+ messages in thread
From: Mikhail Lobanov @ 2025-12-15 14:55 UTC (permalink / raw)
To: David S. Miller
Cc: Mikhail Lobanov, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, David Bauer, James Chapman, netdev, linux-kernel,
lvc-project
A reproducible rcuref - imbalanced put() warning is observed under
IPv6 L2TP (pppol2tp) traffic with blackhole routes, indicating an
imbalance in dst reference counting for routes cached in
sk->sk_dst_cache and pointing to a subtle lifetime/synchronization
issue between the helpers that validate and drop cached dst entries.
rcuref - imbalanced put()
WARNING: CPU: 0 PID: 899 at lib/rcuref.c:266 rcuref_put_slowpath+0x1ce/0x240 lib/rcuref.>
Modules linked in:
CPSocket connected tcp:127.0.0.1:48148,server=on <-> 127.0.0.1:33750
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01>
RIP: 0010:rcuref_put_slowpath+0x1ce/0x240 lib/rcuref.c:266
Call Trace:
<TASK>
__rcuref_put include/linux/rcuref.h:97 [inline]
rcuref_put include/linux/rcuref.h:153 [inline]
dst_release+0x291/0x310 net/core/dst.c:167
__sk_dst_check+0x2d4/0x350 net/core/sock.c:604
__inet6_csk_dst_check net/ipv6/inet6_connection_sock.c:76 [inline]
inet6_csk_route_socket+0x6ed/0x10c0 net/ipv6/inet6_connection_sock.c:104
inet6_csk_xmit+0x12f/0x740 net/ipv6/inet6_connection_sock.c:121
l2tp_xmit_queue net/l2tp/l2tp_core.c:1214 [inline]
l2tp_xmit_core net/l2tp/l2tp_core.c:1309 [inline]
l2tp_xmit_skb+0x1404/0x1910 net/l2tp/l2tp_core.c:1325
pppol2tp_sendmsg+0x3ca/0x550 net/l2tp/l2tp_ppp.c:302
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg net/socket.c:744 [inline]
____sys_sendmsg+0xab2/0xc70 net/socket.c:2609
___sys_sendmsg+0x11d/0x1c0 net/socket.c:2663
__sys_sendmmsg+0x188/0x450 net/socket.c:2749
__do_sys_sendmmsg net/socket.c:2778 [inline]
__se_sys_sendmmsg net/socket.c:2775 [inline]
__x64_sys_sendmmsg+0x98/0x100 net/socket.c:2775
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x64/0x140 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fe6960ec719
</TASK>
The race is between the lockless UDPv6 transmit path
(udpv6_sendmsg() -> sk_dst_check()) and the locked L2TP/pppol2tp
transmit path (pppol2tp_sendmsg() -> l2tp_xmit_skb() ->
... -> inet6_csk_xmit() -> __sk_dst_check()) when both handle
the same obsolete dst from sk->sk_dst_cache. The UDPv6 side takes
an extra reference, then atomically steals and releases the cached
dst. The L2TP side, still holding a stale cached pointer, also
calls dst_release() on it. The combined result is one
dst_release() too many on that dst, which triggers
rcuref - imbalanced put().
The Race Condition:
Initial:
sk->sk_dst_cache = dst
ref(dst) = 1
Thread 1: sk_dst_check()                      Thread 2: __sk_dst_>
------------------------                      -------------------->
sk_dst_get(sk):
  rcu_read_lock()
  dst = rcu_dereference(sk->sk_dst_cache)
  rcuref_get(dst) succeeds
  rcu_read_unlock()
  // ref = 2
                                              dst = __sk_dst_>
                                              // reads same dst from >
                                              // ref still = 2 (no ex>
[both see dst obsolete & check() == NULL]
sk_dst_reset(sk):
  old = xchg(&sk->sk_dst_cache, NULL)
  // old = dst
  dst_release(old)
  // drop cached ref
  // ref: 2 -> 1
                                              RCU_INIT_POINTER(sk->sk_d>
                                              // cache already NULL aft>
                                              dst_release(dst)
                                              // ref: 1 -> 0
dst_release(dst)
// tries to drop its own ref after final put
// rcuref_put_slowpath() -> "rcuref - imbalanced put()"
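The interleaving above can be replayed deterministically in userspace.
What follows is a minimal sketch, not kernel code: struct toy_dst,
toy_cache, toy_release() and replay_race() are hypothetical names, a
plain C11 atomic counter stands in for rcuref, and the two threads are
run as one sequential trace in the order the diagram shows.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry: just a reference count. */
struct toy_dst {
	atomic_int refcnt;
};

/* Hypothetical stand-in for sk->sk_dst_cache. */
static _Atomic(struct toy_dst *) toy_cache;

/* Drop one reference. Returns false if no reference was left to drop. */
static bool toy_release(struct toy_dst *dst)
{
	int old = atomic_fetch_sub(&dst->refcnt, 1);

	if (old <= 0) {
		atomic_fetch_add(&dst->refcnt, 1);	/* undo the bad put */
		return false;
	}
	return true;
}

/* Replay the diagram step by step; returns the number of imbalanced puts. */
static int replay_race(void)
{
	struct toy_dst dst;
	struct toy_dst *t1, *t2, *old;
	int imbalanced = 0;

	atomic_init(&dst.refcnt, 1);		/* the cache owns one ref */
	atomic_store(&toy_cache, &dst);

	/* Thread 1, sk_dst_get()-like: read the cache, take its own ref. */
	t1 = atomic_load(&toy_cache);
	atomic_fetch_add(&t1->refcnt, 1);
	assert(atomic_load(&dst.refcnt) == 2);

	/* Thread 2, __sk_dst_get()-like: read the cache, take no ref. */
	t2 = atomic_load(&toy_cache);

	/* Thread 1, sk_dst_reset()-like: steal and drop the cached ref. */
	old = atomic_exchange(&toy_cache, (struct toy_dst *)NULL);
	if (old)
		imbalanced += !toy_release(old);	/* 2 -> 1 */

	/* Thread 2: plain store of NULL, then release the stale pointer. */
	atomic_store(&toy_cache, (struct toy_dst *)NULL);
	imbalanced += !toy_release(t2);			/* 1 -> 0 */

	/* Thread 1: drop its own ref, but the count is already zero. */
	imbalanced += !toy_release(t1);

	return imbalanced;
}
```

Calling replay_race() returns 1: the cache reference is dropped twice,
once by each side, so the last put finds no reference left.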
Make L2TP’s IPv6 transmit path stop using inet6_csk_xmit()
(and thus __sk_dst_check()) and instead open-code the same
routing and transmit sequence using ip6_sk_dst_lookup_flow()
and ip6_xmit(). The new code builds a flowi6 from the socket
fields in the same way as inet6_csk_route_socket(), then ca>
ip6_sk_dst_lookup_flow(), which internally relies on the lo>
sk_dst_check()/sk_dst_reset() pattern shared with UDPv6, and
attaches the resulting dst to the skb before invoking ip6_x>
This makes both the UDPv6 and L2TP IPv6 paths use the same
dst-cache handling logic for a given socket and removes the
possibility that sk_dst_check() and __sk_dst_check() concur>
drop the same cached dst and trigger the rcuref - imbalance>
warning under concurrent traffic.
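To illustrate why sharing one pattern helps, here is a minimal userspace
sketch, again not kernel code. It assumes both paths behave like
sk_dst_check(): each takes its own reference first and drops the cache
reference only through an xchg-based reset. The names toy_dst, toy_cache,
toy_get(), toy_reset() and replay_fixed() are hypothetical, and a plain
C11 atomic counter stands in for rcuref.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry: just a reference count. */
struct toy_dst {
	atomic_int refcnt;
};

/* Hypothetical stand-in for sk->sk_dst_cache. */
static _Atomic(struct toy_dst *) toy_cache;

/* Drop one reference. Returns false if no reference was left to drop. */
static bool toy_release(struct toy_dst *dst)
{
	int old = atomic_fetch_sub(&dst->refcnt, 1);

	if (old <= 0) {
		atomic_fetch_add(&dst->refcnt, 1);	/* undo the bad put */
		return false;
	}
	return true;
}

/* sk_dst_get()-like: return the cached entry with an extra reference. */
static struct toy_dst *toy_get(void)
{
	struct toy_dst *dst = atomic_load(&toy_cache);

	if (dst)
		atomic_fetch_add(&dst->refcnt, 1);
	return dst;
}

/* sk_dst_reset()-like: only the caller that wins the exchange drops
 * the cache-owned reference. Returns the number of imbalanced puts.
 */
static int toy_reset(void)
{
	struct toy_dst *old;

	old = atomic_exchange(&toy_cache, (struct toy_dst *)NULL);
	return old ? !toy_release(old) : 0;
}

/* Both sides use get + reset + release; returns imbalanced puts. */
static int replay_fixed(void)
{
	struct toy_dst dst;
	struct toy_dst *t1, *t2;
	int imbalanced = 0;

	atomic_init(&dst.refcnt, 1);		/* the cache owns one ref */
	atomic_store(&toy_cache, &dst);

	t1 = toy_get();				/* 1 -> 2 */
	t2 = toy_get();				/* 2 -> 3 */

	imbalanced += toy_reset();		/* thread 1 wins: 3 -> 2 */
	imbalanced += toy_reset();		/* thread 2 sees NULL: no put */

	imbalanced += !toy_release(t1);		/* 2 -> 1 */
	imbalanced += !toy_release(t2);		/* 1 -> 0 */

	assert(atomic_load(&dst.refcnt) == 0);
	return imbalanced;
}
```

Calling replay_fixed() returns 0: the exchange hands the cache reference
to exactly one caller, and every other put is backed by a reference the
caller took itself.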
Use a helper to pre-route IPv4 L2TP packets via sk_dst_check()
and ip_route_output_ports(), attach the resulting dst to the
skb, and then hand the skb to ip_queue_xmit(). With skb->dst
already set, __ip_queue_xmit() skips its __sk_dst_check()-based
dst cache handling, so IPv4 L2TP uses the same lockless
sk_dst_check() helper as UDPv4 for a given socket. This avoids
mixed sk_dst_check()/__sk_dst_check() users of sk->sk_dst_cache
and closes the same class of double dst_release() race on IPv4.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: b0270e91014d ("ipv4: add a sock pointer to ip_queue_xmit()")
Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru>
---
v2: move fix to L2TP as suggested by Eric Dumazet.
v3: dropped the lockless sk_dst_check() pre-validation
and the extra sk_dst_get() reference; instead, under
the socket lock, mirror __sk_dst_check()’s condition
and invalidate the cached dst via sk_dst_reset(sk) so
the cache-owned ref is released exactly once via the
xchg-based helper.
v4: switch L2TP IPv6 xmit to open-coded (using sk_dst_check())
and test with tools/testing/selftests/net/l2tp.sh.
https://lore.kernel.org/netdev/a601c049-0926-418b-aa54-31686eea0a78@redhat.com/T/#t
v5: use sk_uid(sk) and add READ_ONCE() for sk_mark and
sk_bound_dev_if as suggested by Eric Dumazet.
v6: move IPv6 L2TP xmit into an open-coded helper using
ip6_sk_dst_lookup_flow() and sk_dst_check(), and add an
analogous open-coded IPv4 helper mirroring __ip_queue_xmit()
but using sk_dst_check() so both IPv4 and IPv6 L2TP paths
stop calling __sk_dst_check() and share the UDP-style dst
cache handling.
v7: Rework IPv4 L2TP xmit to pre-route via sk_dst_check()
and hand pre-routed skb to ip_queue_xmit(), avoiding
__sk_dst_check() on this socket and keeping IP options
handling in the core IPv4 stack.
https://lore.kernel.org/lkml/20251202110805.765fa71d@kernel.org/
v8: Resent as non-RFC for review, no functional changes
https://lore.kernel.org/lkml/20251203182434.327964-1-m.lobanov@rosa.ru/
net/l2tp/l2tp_core.c | 101 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 98 insertions(+), 3 deletions(-)
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 0710281dd95a..342e65db6eb8 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1202,19 +1202,114 @@ static int l2tp_build_l2tpv3_header(struct l2tp_session *session, void *buf)
 	return bufp - optr;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static int l2tp_xmit_ipv6(struct sock *sk, struct sk_buff *skb)
+{
+	struct ipv6_pinfo *np = inet6_sk(sk);
+	struct inet_sock *inet = inet_sk(sk);
+	struct in6_addr *final_p, final;
+	struct ipv6_txoptions *opt;
+	struct dst_entry *dst;
+	struct flowi6 fl6;
+	int err;
+
+	memset(&fl6, 0, sizeof(fl6));
+	fl6.flowi6_proto = sk->sk_protocol;
+	fl6.daddr = sk->sk_v6_daddr;
+	fl6.saddr = np->saddr;
+	fl6.flowlabel = np->flow_label;
+	IP6_ECN_flow_xmit(sk, fl6.flowlabel);
+
+	fl6.flowi6_oif = READ_ONCE(sk->sk_bound_dev_if);
+	fl6.flowi6_mark = READ_ONCE(sk->sk_mark);
+	fl6.fl6_sport = inet->inet_sport;
+	fl6.fl6_dport = inet->inet_dport;
+	fl6.flowi6_uid = sk_uid(sk);
+
+	security_sk_classify_flow(sk, flowi6_to_flowi_common(&fl6));
+
+	rcu_read_lock();
+	opt = rcu_dereference(np->opt);
+	final_p = fl6_update_dst(&fl6, opt, &final);
+
+	dst = ip6_sk_dst_lookup_flow(sk, &fl6, final_p, true);
+	if (IS_ERR(dst)) {
+		rcu_read_unlock();
+		kfree_skb(skb);
+		return NET_XMIT_DROP;
+	}
+
+	skb_dst_set(skb, dst);
+	fl6.daddr = sk->sk_v6_daddr;
+
+	err = ip6_xmit(sk, skb, &fl6, READ_ONCE(sk->sk_mark),
+		       opt, np->tclass,
+		       READ_ONCE(sk->sk_priority));
+	rcu_read_unlock();
+	return err;
+}
+#endif
+
+static int l2tp_xmit_ipv4(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	struct net *net = sock_net(sk);
+	struct ip_options_rcu *inet_opt;
+	struct flowi4 *fl4;
+	struct rtable *rt;
+	__u8 tos;
+	int err;
+
+	rcu_read_lock();
+	inet_opt = rcu_dereference(inet->inet_opt);
+	fl4 = &fl->u.ip4;
+	tos = READ_ONCE(inet->tos);
+
+	rt = dst_rtable(sk_dst_check(sk, 0));
+	if (!rt) {
+		__be32 daddr = inet->inet_daddr;
+
+		if (inet_opt && inet_opt->opt.srr)
+			daddr = inet_opt->opt.faddr;
+
+		rt = ip_route_output_ports(net, fl4, sk,
+					   daddr, inet->inet_saddr,
+					   inet->inet_dport,
+					   inet->inet_sport,
+					   sk->sk_protocol,
+					   tos & INET_DSCP_MASK,
+					   READ_ONCE(sk->sk_bound_dev_if));
+		if (IS_ERR(rt)) {
+			rcu_read_unlock();
+			IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES);
+			kfree_skb_reason(skb, SKB_DROP_REASON_IP_OUTNOROUTES);
+			return -EHOSTUNREACH;
+		}
+
+		sk_setup_caps(sk, &rt->dst);
+	}
+
+	skb_dst_set_noref(skb, &rt->dst);
+	rcu_read_unlock();
+
+	err = ip_queue_xmit(sk, skb, fl);
+	return err;
+}
+
 /* Queue the packet to IP for output: tunnel socket lock must be held */
 static int l2tp_xmit_queue(struct l2tp_tunnel *tunnel, struct sk_buff *skb, struct flowi *fl)
 {
 	int err;
+	struct sock *sk = tunnel->sock;
 
 	skb->ignore_df = 1;
 	skb_dst_drop(skb);
 #if IS_ENABLED(CONFIG_IPV6)
-	if (l2tp_sk_is_v6(tunnel->sock))
-		err = inet6_csk_xmit(tunnel->sock, skb, NULL);
+	if (l2tp_sk_is_v6(sk))
+		err = l2tp_xmit_ipv6(sk, skb);
 	else
 #endif
-		err = ip_queue_xmit(tunnel->sock, skb, fl);
+		err = l2tp_xmit_ipv4(sk, skb, fl);
 
 	return err >= 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
 }
--
2.47.2
* Re: [PATCH net-next v8] l2tp: fix double dst_release() on sk_dst_cache race
2025-12-15 14:55 [PATCH net-next v8] l2tp: fix double dst_release() on sk_dst_cache race Mikhail Lobanov
@ 2025-12-16 16:35 ` Jakub Kicinski
2026-03-06 7:44 ` Li Xiasong
0 siblings, 1 reply; 3+ messages in thread
From: Jakub Kicinski @ 2025-12-16 16:35 UTC (permalink / raw)
To: Mikhail Lobanov
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
David Bauer, James Chapman, netdev, linux-kernel, lvc-project
On Mon, 15 Dec 2025 17:55:33 +0300 Mikhail Lobanov wrote:
> A reproducible rcuref - imbalanced put() warning is observed under
> IPv6 L2TP (pppol2tp) traffic with blackhole routes, indicating an
> imbalance in dst reference counting for routes cached in
> sk->sk_dst_cache and pointing to a subtle lifetime/synchronization
> issue between the helpers that validate and drop cached dst entries.
This seems to be causing a leak:
unreferenced object 0xffff888017774e40 (size 64):
comm "ip", pid 3486, jiffies 4298584595
hex dump (first 32 bytes):
10 00 00 00 e9 01 00 00 01 00 00 00 00 00 00 00 ................
ff ff ff ff 00 00 00 00 48 4e 77 17 80 88 ff ff ........HNw.....
backtrace (crc b523287):
__kmalloc_node_noprof+0x579/0x8c0
crypto_alloc_tfmmem.isra.0+0x2e/0xf0
crypto_create_tfm_node+0x81/0x2e0
crypto_spawn_tfm2+0x4e/0x80
crypto_rfc4106_init_tfm+0x41/0x190
crypto_create_tfm_node+0x108/0x2e0
crypto_spawn_tfm2+0x4e/0x80
aead_init_geniv+0x14c/0x2a0
crypto_create_tfm_node+0x108/0x2e0
crypto_alloc_tfm_node+0xe0/0x1d0
esp_init_aead.constprop.0+0xe4/0x340
esp_init_state+0x83/0x4c0
__xfrm_init_state+0x6f2/0x13d0
xfrm_state_construct+0x1455/0x2480 [xfrm_user]
xfrm_add_sa+0x137/0x3c0 [xfrm_user]
xfrm_user_rcv_msg+0x502/0x920 [xfrm_user]
* Re: [PATCH net-next v8] l2tp: fix double dst_release() on sk_dst_cache race
2025-12-16 16:35 ` Jakub Kicinski
@ 2026-03-06 7:44 ` Li Xiasong
0 siblings, 0 replies; 3+ messages in thread
From: Li Xiasong @ 2026-03-06 7:44 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
David Bauer, James Chapman, netdev, linux-kernel, lvc-project,
Mikhail Lobanov, weiyongjun1, yuehaibing, zhangchangzhong
Hi Jakub,
On 12/17/2025 12:35 AM, Jakub Kicinski wrote:
> On Mon, 15 Dec 2025 17:55:33 +0300 Mikhail Lobanov wrote:
>> A reproducible rcuref - imbalanced put() warning is observed under
>> IPv6 L2TP (pppol2tp) traffic with blackhole routes, indicating an
>> imbalance in dst reference counting for routes cached in
>> sk->sk_dst_cache and pointing to a subtle lifetime/synchronization
>> issue between the helpers that validate and drop cached dst entries.
>
> This seems to be causing a leak:
>
> unreferenced object 0xffff888017774e40 (size 64):
> comm "ip", pid 3486, jiffies 4298584595
> hex dump (first 32 bytes):
> 10 00 00 00 e9 01 00 00 01 00 00 00 00 00 00 00 ................
> ff ff ff ff 00 00 00 00 48 4e 77 17 80 88 ff ff ........HNw.....
> backtrace (crc b523287):
> __kmalloc_node_noprof+0x579/0x8c0
> crypto_alloc_tfmmem.isra.0+0x2e/0xf0
> crypto_create_tfm_node+0x81/0x2e0
> crypto_spawn_tfm2+0x4e/0x80
> crypto_rfc4106_init_tfm+0x41/0x190
> crypto_create_tfm_node+0x108/0x2e0
> crypto_spawn_tfm2+0x4e/0x80
> aead_init_geniv+0x14c/0x2a0
> crypto_create_tfm_node+0x108/0x2e0
> crypto_alloc_tfm_node+0xe0/0x1d0
> esp_init_aead.constprop.0+0xe4/0x340
> esp_init_state+0x83/0x4c0
> __xfrm_init_state+0x6f2/0x13d0
> xfrm_state_construct+0x1455/0x2480 [xfrm_user]
> xfrm_add_sa+0x137/0x3c0 [xfrm_user]
> xfrm_user_rcv_msg+0x502/0x920 [xfrm_user]
I'm writing to follow up on Mikhail's patch, which aims to fix a race
condition in sk_dst_cache.
I've encountered the exact same issue. Specifically, the problem occurs
when using L2TP with a UDP socket file descriptor specified via netlink.
Below is the call stack from our system where the issue manifested:
rcuref - imbalanced put()
WARNING: CPU: 13 PID: 1222 at lib/rcuref.c:266 rcuref_put_slowpath+0x10c/0x120
Modules linked in:
CPU: 13 PID: 1222 Comm: l2tp Not tainted 6.6.0+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:rcuref_put_slowpath+0x10c/0x120
Code: 83 e4 01 31 ff 44 89 e6 e8 21 bb 53 ff 45 84 e4 75 1a e8 f7 c4 53 ff
48 c7 c7 4e ee c8 86 c6 05 41 22 01 06 01 e8 a4 10 39 ff <0f> 0b e8 dd c4
53 ff c7 03 00 00 00 e0 e9 2d ff ff ff 66 90 90 90
RSP: 0018:ffffc90006b7b9d8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88811df01740 RCX: ffffffff8134b438
RDX: ffff8881223fd080 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 00000000dfffffff R08: 0000000000000000 R09: 6e616c61626d6920
R10: 0000000000000000 R11: 2928747570206465 R12: 0000000000000000
R13: ffff88811df01740 R14: 00000000ffffffea R15: 0000000000000000
FS: 00007f5dd7d22500(0000) GS:ffff88813b480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000ee11668 CR3: 000000012259e000 CR4: 00000000000006e0
Call Trace:
<TASK>
dst_release+0x141/0x160
ip6_dst_lookup_tail.constprop.0+0xa0/0x9b0
? sysvec_apic_timer_interrupt+0xf/0x90
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? sk_dst_check+0x1c/0x160
ip6_dst_lookup_flow+0x54/0xe0
ip6_sk_dst_lookup_flow+0x1f6/0x2d0
udpv6_sendmsg+0xc08/0x1760
? __pfx_ip_generic_getfrag+0x10/0x10
? sysvec_apic_timer_interrupt+0xf/0x90
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? update_curr+0x35/0x3e0
? __pfx_udpv6_sendmsg+0x10/0x10
? inet6_sendmsg+0xb2/0xd0
inet6_sendmsg+0xb2/0xd0
__sock_sendmsg+0x60/0x120
__sys_sendto+0x17a/0x230
? timerqueue_add+0xe1/0x140
? sched_clock+0x37/0x60
? sched_clock_cpu+0x11/0x1b0
? irqtime_account_irq+0x53/0x100
? handle_softirqs+0x189/0x320
__x64_sys_sendto+0x2d/0x40
do_syscall_64+0x5b/0x110
entry_SYSCALL_64_after_hwframe+0x78/0xe2
RIP: 0033:0x7f5dd7722c4d
Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 c1 dc 2c 00 41 89 ca
8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff
ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
RSP: 002b:00007ffcbf9e36d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5dd7722c4d
RDX: 0000000000000009 RSI: 000055cefbc01a57 RDI: 0000000000000003
RBP: 00007ffcbf9e4790 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000055cefbc00bf0
R13: 00007ffcbf9e4870 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Our investigation led us to the same root cause that Mikhail identified. I
have applied his patch and can confirm that it resolves the issue perfectly.
Thank you for looking into this previously. We noted your concern about a
potential memory leak during your review. We have been running the patch
but haven't been able to reproduce the leak ourselves.
Could you perhaps share the specific conditions or steps required to
trigger the memory leak? Any details you can provide would be very helpful
for us to further validate the patch or identify the root cause of the leak.
Thanks for your time and effort on this.
Best regards,
Li Xiasong