* [PATCH 0/7] pull request (net): ipsec 2026-06-22
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
1) xfrm: use compat translator only for u64 alignment mismatch
Gate the XFRM_USER_COMPAT translator on COMPAT_FOR_U64_ALIGNMENT
so 32-bit compat tasks on arches whose 32-bit ABI already matches
the native 64-bit layout are no longer rejected with -EOPNOTSUPP.
From Sanman Pradhan.
2) net: af_key: initialize alg_key_len for IPComp states
Initialize the alg_key_len to 0 in the IPComp branch of
pfkey_msg2xfrm_state() so an uninitialized value cannot drive
xfrm_alg_len() into a slab-out-of-bounds kmemdup during
XFRM_MSG_MIGRATE. From Zijing Yin.
3) xfrm: Fix dev use-after-free in xfrm async resumption
Stash the original skb->dev and extend the RCU critical section
across xfrm_rcv_cb() and transport_finish() to prevent a
tunnel-device UAF and original-device refcount leak when a
callback replaces skb->dev. From Dong Chenchen.
4) xfrm: Fix xfrm state cache insertion race
Move the state-validity check inside xfrm_state_lock in the
input state cache insertion path so a state cannot be killed
between the check and the insert. From Herbert Xu.
5) xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
Add READ_ONCE()/WRITE_ONCE() annotations on xfrm_policy_count
and xfrm_policy_default to silence the KCSAN data race reported
on net->xfrm.policy_count. From Eric Dumazet.
6) espintcp: use sk_msg_free_partial to fix partial send
Replace the manual skmsg accounting in espintcp with
sk_msg_free_partial() so the skmsg stays consistent on every
iteration and the partial-send accounting bugs go away.
From Sabrina Dubroca.
7) xfrm: validate selector family and prefixlen during match
Reject mismatched address families in xfrm_selector_match() and
bound prefixlen in addr4_match()/addr_match() to prevent the
shift-out-of-bounds syzbot reported when an AF_UNSPEC selector
with a large prefixlen is matched against an IPv4 flow.
From Eric Dumazet.
Please pull or let me know if there are problems.
Thanks!
The following changes since commit 9bf10032894f429b3e221de63cf95a8544511a90:
Merge branch 'tipc-fix-netlink-gate-and-receive-path-bugs' (2026-06-11 16:01:19 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git tags/ipsec-2026-06-22
for you to fetch changes up to 40f0b1047918539f0b0f795ac65e35336b4c2c78:
xfrm: validate selector family and prefixlen during match (2026-06-17 11:17:27 +0200)
----------------------------------------------------------------
ipsec-2026-06-22
----------------------------------------------------------------
Dong Chenchen (1):
xfrm: Fix dev use-after-free in xfrm async resumption
Eric Dumazet (2):
xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
xfrm: validate selector family and prefixlen during match
Herbert Xu (1):
xfrm: Fix xfrm state cache insertion race
Sabrina Dubroca (1):
espintcp: use sk_msg_free_partial to fix partial send
Sanman Pradhan (1):
xfrm: use compat translator only for u64 alignment mismatch
Zijing Yin (1):
net: af_key: initialize alg_key_len for IPComp states
include/net/xfrm.h | 15 +++++++++++----
net/ipv4/xfrm4_input.c | 2 --
net/ipv6/xfrm6_input.c | 2 --
net/key/af_key.c | 1 +
net/xfrm/espintcp.c | 34 +++++++---------------------------
net/xfrm/xfrm_input.c | 29 ++++++++++++++++-------------
net/xfrm/xfrm_policy.c | 27 +++++++++++++++------------
net/xfrm/xfrm_state.c | 23 +++++++++++++++--------
net/xfrm/xfrm_user.c | 20 ++++++++++----------
9 files changed, 75 insertions(+), 78 deletions(-)
* [PATCH 1/7] xfrm: use compat translator only for u64 alignment mismatch
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Sanman Pradhan <psanman@juniper.net>
The XFRM compat layer (CONFIG_XFRM_USER_COMPAT) translates 32-bit xfrm
netlink and setsockopt messages into the native 64-bit layout. It is
only needed on architectures where the 32-bit and 64-bit ABIs disagree
on u64 alignment, which the kernel encodes as COMPAT_FOR_U64_ALIGNMENT.
That symbol is defined only by arch/x86. XFRM_USER_COMPAT depends on it,
so the translator can never be built on any other architecture,
including arm64, which still provides a 32-bit compat ABI (CONFIG_COMPAT)
for AArch32 EL0 userspace. On arm64 the AArch32 EABI already aligns u64
to 8 bytes, identical to the AArch64 ABI, so no translation is required
and the native code path is correct for 32-bit tasks.
However, xfrm_user_rcv_msg() and xfrm_user_policy() gate on
in_compat_syscall() alone and then call xfrm_get_translator(), which
returns NULL when no translator is registered. On arm64 that is always
the case, so every xfrm netlink message and the XFRM_POLICY setsockopt
issued by a 32-bit task returns -EOPNOTSUPP. A 32-bit userspace process
on arm64 (and on any other arch with CONFIG_COMPAT but without
COMPAT_FOR_U64_ALIGNMENT) therefore cannot configure XFRM state or
policy through the XFRM_USER netlink API, and cannot use the XFRM_POLICY
setsockopt path, because both fail before reaching the native parser.
The translator series replaced the blanket compat rejection with a
translator lookup. That made the path usable on x86 when the translator
is available, but left architectures that cannot build the translator
permanently rejected even when their compat layout already matches the
native layout. Let those architectures use the native parser instead.
Gate the translator requirement on COMPAT_FOR_U64_ALIGNMENT instead of
on in_compat_syscall() alone. Gating on the ABI property rather than on
CONFIG_XFRM_USER_COMPAT is deliberate: on x86 with IA32_EMULATION=y but
XFRM_USER_COMPAT=n, a 32-bit task must still be rejected rather than
routed through the native parser, which would misread genuinely
4-byte-aligned x86-32 messages. COMPAT_FOR_U64_ALIGNMENT is the ABI
property that makes the XFRM translator mandatory.
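The u64 alignment mismatch behind COMPAT_FOR_U64_ALIGNMENT can be
illustrated in userspace. The following is only a sketch: the struct and
typedef names are hypothetical and are not xfrm UAPI definitions. It
assumes a GCC- or Clang-compatible compiler, since it relies on the
aligned attribute to model the 4-byte u64 alignment of the x86-32 ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a u64 laid out as on x86-32: 4-byte aligned. */
typedef uint64_t __attribute__((aligned(4))) compat_u64_demo;

/* Hypothetical message layout, not a real xfrm structure. */
struct msg_native {
	uint32_t id;
	uint64_t bytes;        /* naturally aligned u64 */
};

struct msg_compat32 {
	uint32_t id;
	compat_u64_demo bytes; /* u64 aligned to 4 bytes only */
};

static_assert(sizeof(uint64_t) == 8, "sketch assumes an 8-byte u64");

/* Offset of the u64 member when the ABI aligns u64 naturally. */
static size_t native_offset(void)
{
	return offsetof(struct msg_native, bytes);
}

/* Offset of the same member when u64 is only 4-byte aligned. */
static size_t compat_offset(void)
{
	return offsetof(struct msg_compat32, bytes);
}
```

On a host whose ABI aligns u64 to 8 bytes, the two offsets differ, which
is the situation that needs a translator. Where both ABIs align u64
identically (as described above for AArch32 EABI and AArch64), the two
layouts coincide and the native parser can read the message directly.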
Only the receive/input direction needs the guard. The send, dump and
notification paths already call the translator as "if (xtr) { ... }"
with no error on NULL, so on arches without a translator they no-op and
the kernel emits native 64-bit-layout messages, which is what an AArch32
task expects.
Tested on Juniper SRX hardware: with the fix, 32-bit IPsec userspace
netlink and XFRM_POLICY setsockopt operations that previously failed
with -EOPNOTSUPP now succeed; x86 behaviour is unchanged by inspection.
Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Fixes: 96392ee5a13b ("xfrm/compat: Translate 32-bit user_policy from sockptr")
Cc: stable@vger.kernel.org
Signed-off-by: Sanman Pradhan <psanman@juniper.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_state.c | 2 +-
net/xfrm/xfrm_user.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 589c3b6e4679..d8457ceaf28c 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2976,7 +2976,7 @@ int xfrm_user_policy(struct sock *sk, int optname, sockptr_t optval, int optlen)
if (IS_ERR(data))
return PTR_ERR(data);
- if (in_compat_syscall()) {
+ if (IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT) && in_compat_syscall()) {
struct xfrm_translator *xtr = xfrm_get_translator();
if (!xtr) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 71a4b7278eba..3b1cf29bc402 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3472,7 +3472,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
if (!netlink_net_capable(skb, CAP_NET_ADMIN))
return -EPERM;
- if (in_compat_syscall()) {
+ if (IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT) && in_compat_syscall()) {
struct xfrm_translator *xtr = xfrm_get_translator();
if (!xtr)
--
2.43.0
* [PATCH 2/7] net: af_key: initialize alg_key_len for IPComp states
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Zijing Yin <yzjaurora@gmail.com>
pfkey_msg2xfrm_state() handles the IPComp (SADB_X_SATYPE_IPCOMP) case by
allocating x->calg and copying only the algorithm name:
x->calg = kmalloc_obj(*x->calg);
if (!x->calg) {
err = -ENOMEM;
goto out;
}
strcpy(x->calg->alg_name, a->name);
x->props.calgo = sa->sadb_sa_encrypt;
Unlike the authentication (x->aalg) and encryption (x->ealg) branches of
the same function, the compression branch never initializes
calg->alg_key_len. IPComp carries no key and the allocation only
reserves sizeof(struct xfrm_algo) (i.e. no room for a key), so the field
is left containing uninitialized slab data.
calg->alg_key_len is later used as a length by xfrm_algo_clone() when an
IPComp state is cloned during XFRM_MSG_MIGRATE:
xfrm_state_migrate()
xfrm_state_clone_and_setup()
x->calg = xfrm_algo_clone(orig->calg);
kmemdup(orig, xfrm_alg_len(orig));
where xfrm_alg_len() returns sizeof(*alg) + (alg_key_len + 7) / 8. With
a non-zero garbage alg_key_len, kmemdup() reads past the end of the
68-byte calg object. Adding an IPComp SA via PF_KEY and then migrating
it triggers (net-next, KASAN, init_on_alloc=0):
BUG: KASAN: slab-out-of-bounds in kmemdup_noprof+0x44/0x60
Read of size 4164 at addr ff11000025a74980 by task diag2/9287
CPU: 3 UID: 0 PID: 9287 Comm: diag2 7.1.0-rc6-g903db046d557 #1
Call Trace:
<TASK>
dump_stack_lvl+0x10e/0x1f0
print_report+0xf7/0x600
kasan_report+0xe4/0x120
kasan_check_range+0x105/0x1b0
__asan_memcpy+0x23/0x60
kmemdup_noprof+0x44/0x60
xfrm_state_migrate+0x70a/0x1da0
xfrm_migrate+0x753/0x18a0
xfrm_do_migrate+0xb47/0xf10
xfrm_user_rcv_msg+0x411/0xb50
netlink_rcv_skb+0x158/0x420
xfrm_netlink_rcv+0x71/0x90
netlink_unicast+0x584/0x850
netlink_sendmsg+0x8b0/0xdc0
____sys_sendmsg+0x9f7/0xb90
___sys_sendmsg+0x134/0x1d0
__sys_sendmsg+0x16d/0x220
do_syscall_64+0x116/0x7d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Allocated by task 9287:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0xaa/0xb0
pfkey_add+0x2652/0x2ea0
pfkey_process+0x6d0/0x830
pfkey_sendmsg+0x42c/0x850
__sys_sendto+0x461/0x4b0
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0x116/0x7d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ff11000025a74980
which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
allocated 68-byte region [ff11000025a74980, ff11000025a749c4)
Depending on the uninitialized value the same field can instead request
an oversized kmemdup() allocation and make the migration clone fail.
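The length arithmetic can be checked in userspace. This is a sketch
under stated assumptions: struct algo_demo is a hypothetical mirror of
the 68-byte layout mentioned above (a 64-byte name followed by a 4-byte
key length in bits), and algo_demo_len() applies the formula quoted for
xfrm_alg_len().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the 68-byte object described above. */
struct algo_demo {
	char alg_name[64];
	unsigned int alg_key_len; /* key length in bits */
	char alg_key[];           /* no room reserved for IPComp */
};

static_assert(sizeof(struct algo_demo) == 68, "sketch assumes a 68-byte header");

/* sizeof(*alg) + (alg_key_len + 7) / 8, as quoted for xfrm_alg_len(). */
static size_t algo_demo_len(const struct algo_demo *alg)
{
	return sizeof(*alg) + (alg->alg_key_len + 7) / 8;
}
```

With alg_key_len == 0 the computed length equals the 68 bytes actually
allocated. A garbage value such as 32768 would give 68 + 4096 = 4164
bytes, which is consistent with the read size in the KASAN report; the
actual uninitialized value is not stated in the report.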
The XFRM netlink path is not affected: verify_one_alg() rejects an
XFRMA_ALG_COMP attribute shorter than xfrm_alg_len(), so a calg added via
XFRM_MSG_NEWSA is always self-consistent.
Initialize calg->alg_key_len to 0, matching the aalg/ealg branches.
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Cc: stable@vger.kernel.org
Signed-off-by: Zijing Yin <yzjaurora@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/key/af_key.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 9cffeef18cd9..3216f897a305 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1218,6 +1218,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
goto out;
}
strcpy(x->calg->alg_name, a->name);
+ x->calg->alg_key_len = 0;
x->props.calgo = sa->sadb_sa_encrypt;
} else {
int keysize = 0;
--
2.43.0
* [PATCH 3/7] xfrm: Fix dev use-after-free in xfrm async resumption
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Dong Chenchen <dongchenchen2@huawei.com>
xfrm async resumption holds a reference on skb->dev until after
transport_finish. However, xfrm_rcv_cb may change skb->dev to a tunnel
device without taking a reference on it, as vti_rcv_cb does. The
subsequent async resumption then drops a reference on the tunnel device
instead of the original one, which leads to a use-after-free of the
tunnel device and a refcount leak on the original device, as shown below:
unregister_netdevice: waiting for vti1 to become free. Usage count = -2
Stash the original skb->dev to fix the refcount imbalance. The new
skb->dev set by xfrm_rcv_cb can also race with device teardown, so extend
RCU protection over xfrm_rcv_cb and transport_finish to prevent that race.
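The refcount imbalance can be modelled with plain counters. This is a
simplified sketch, not kernel code: dev_demo, skb_demo and the functions
below are hypothetical, and the counter increments and decrements stand
in for dev_hold() and dev_put().

```c
#include <assert.h>
#include <stddef.h>

struct dev_demo { int refcnt; };
struct skb_demo { struct dev_demo *dev; };

/* Models a callback that retargets skb->dev without taking a reference. */
static void rcv_cb_demo(struct skb_demo *skb, struct dev_demo *tunnel)
{
	skb->dev = tunnel;
}

/* Old pattern: the put uses whatever skb->dev points at afterwards. */
static void resume_unstashed(struct skb_demo *skb, struct dev_demo *tunnel)
{
	assert(skb->dev != NULL);
	skb->dev->refcnt++;        /* hold taken on the original device */
	rcv_cb_demo(skb, tunnel);
	skb->dev->refcnt--;        /* put lands on the tunnel device */
}

/* Fixed pattern: hold and put both use the stashed original device. */
static void resume_stashed(struct skb_demo *skb, struct dev_demo *tunnel)
{
	struct dev_demo *dev = skb->dev;

	assert(dev != NULL);
	dev->refcnt++;
	rcv_cb_demo(skb, tunnel);
	dev->refcnt--;
}
```

In the unstashed variant the original device ends up one reference too
high and the tunnel device one too low, which corresponds to the leak
and the use-after-free described above. The sketch does not model the
RCU part of the fix.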
Fixes: 1c428b038400 ("xfrm: hold dev ref until after transport_finish NF_HOOK")
Reported-by: Xu Chunxiao <xuchunxiao3@huawei.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/ipv4/xfrm4_input.c | 2 --
net/ipv6/xfrm6_input.c | 2 --
net/xfrm/xfrm_input.c | 29 ++++++++++++++++-------------
3 files changed, 16 insertions(+), 17 deletions(-)
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index c2eac844bcdb..f6f2a8ef3f88 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -76,8 +76,6 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
dev_net(dev), NULL, skb, dev, NULL,
xfrm4_rcv_encap_finish);
- if (async)
- dev_put(dev);
return 0;
}
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 699a001ac166..89d0443b5307 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -71,8 +71,6 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
dev_net(dev), NULL, skb, dev, NULL,
xfrm6_transport_finish2);
- if (async)
- dev_put(dev);
return 0;
}
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index e4c2cd24936d..eecab337bd0a 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -467,6 +467,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
{
const struct xfrm_state_afinfo *afinfo;
struct net *net = dev_net(skb->dev);
+ struct net_device *dev = skb->dev;
int err;
__be32 seq;
__be32 seq_hi;
@@ -493,7 +494,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
LINUX_MIB_XFRMINSTATEINVALID);
if (encap_type == -1)
- dev_put(skb->dev);
+ dev_put(dev);
goto drop;
}
@@ -655,16 +656,16 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
if (!crypto_done) {
spin_unlock(&x->lock);
- dev_hold(skb->dev);
+ dev_hold(dev);
nexthdr = x->type->input(x, skb);
if (nexthdr == -EINPROGRESS) {
if (async)
- dev_put(skb->dev);
+ dev_put(dev);
return 0;
}
- dev_put(skb->dev);
+ dev_put(dev);
spin_lock(&x->lock);
}
resume:
@@ -699,7 +700,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
err = xfrm_inner_mode_input(x, skb);
if (err == -EINPROGRESS) {
if (async)
- dev_put(skb->dev);
+ dev_put(dev);
return 0;
} else if (err) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
@@ -726,9 +727,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
crypto_done = false;
} while (!err);
+ rcu_read_lock();
err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
- if (err)
+ if (err) {
+ rcu_read_unlock();
goto drop;
+ }
nf_reset_ct(skb);
@@ -739,8 +743,9 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
if (skb_valid_dst(skb))
skb_dst_drop(skb);
if (async)
- dev_put(skb->dev);
+ dev_put(dev);
gro_cells_receive(&gro_cells, skb);
+ rcu_read_unlock();
return 0;
} else {
xo = xfrm_offload(skb);
@@ -748,23 +753,21 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
xfrm_gro = xo->flags & XFRM_GRO;
err = -EAFNOSUPPORT;
- rcu_read_lock();
afinfo = xfrm_state_afinfo_get_rcu(x->props.family);
if (likely(afinfo))
err = afinfo->transport_finish(skb, xfrm_gro || async);
- rcu_read_unlock();
if (xfrm_gro) {
sp = skb_sec_path(skb);
if (sp)
sp->olen = 0;
if (skb_valid_dst(skb))
skb_dst_drop(skb);
- if (async)
- dev_put(skb->dev);
gro_cells_receive(&gro_cells, skb);
- return err;
}
+ if (async)
+ dev_put(dev);
+ rcu_read_unlock();
return err;
}
@@ -772,7 +775,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
spin_unlock(&x->lock);
drop:
if (async)
- dev_put(skb->dev);
+ dev_put(dev);
xfrm_rcv_cb(skb, family, x && x->type ? x->type->proto : nexthdr, -1);
kfree_skb(skb);
return 0;
--
2.43.0
* [PATCH 4/7] xfrm: Fix xfrm state cache insertion race
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Herbert Xu <herbert@gondor.apana.org.au>
The xfrm input state cache insertion code checks the validity of
the state before acquiring the global xfrm_state_lock. Thus it's
possible for someone else to kill the state after it passed the
validity check, and then the insertion will add the dead state
to the cache.
Fix this by moving the validity check inside the lock.
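The check-then-lock window can be shown with a deterministic,
single-threaded model. Everything here is hypothetical: state_demo
stands in for the xfrm state, and the race_hook_t callback simulates
another CPU killing the state just before the lock is obtained, on the
assumption that killing a state happens under the same lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { STATE_VALID_DEMO, STATE_DEAD_DEMO };

struct state_demo {
	int km_state;
	bool cached;
};

/* Simulates a concurrent actor that runs before we obtain the lock. */
typedef void (*race_hook_t)(struct state_demo *x);

static void kill_state_demo(struct state_demo *x)
{
	x->km_state = STATE_DEAD_DEMO;
	x->cached = false;
}

/* Models lock acquisition: the other actor finishes first, then we own it. */
static void lock_demo(struct state_demo *x, race_hook_t hook)
{
	if (hook)
		hook(x);
}

/* Old ordering: validity is checked before the lock is taken. */
static void insert_check_outside(struct state_demo *x, race_hook_t hook)
{
	assert(x != NULL);
	if (x->km_state != STATE_VALID_DEMO)
		return;
	lock_demo(x, hook);
	x->cached = true;          /* may cache a state that just died */
}

/* New ordering: validity is checked while the lock is held. */
static void insert_check_inside(struct state_demo *x, race_hook_t hook)
{
	assert(x != NULL);
	lock_demo(x, hook);
	if (x->km_state == STATE_VALID_DEMO)
		x->cached = true;
}
```

Passing kill_state_demo as the hook leaves a dead state in the cache
with the old ordering, while the new ordering leaves it uncached.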
This function is only called on the input path, where BH is already
disabled (its caller, xfrm_input(), for example, acquires its spinlocks
without disabling BH).
There is therefore no need to disable BH here or take the RCU read lock.
Remove both and replace them with an assertion that trips if BH
is accidentally enabled on some future calling path.
Fixes: 81a331a0e72d ("xfrm: Add an inbound percpu state cache.")
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_state.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index d8457ceaf28c..9e87f7028201 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1207,9 +1207,11 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
struct hlist_head *state_cache_input;
struct xfrm_state *x = NULL;
+ /* BH is always disabled on the input path. */
+ lockdep_assert_in_softirq();
+
state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
- rcu_read_lock();
hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
if (x->props.family != family ||
x->id.spi != spi ||
@@ -1227,20 +1229,25 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
xfrm_hash_ptrs_get(net, &state_ptrs);
x = __xfrm_state_lookup(&state_ptrs, mark, daddr, spi, proto, family);
-
- if (x && x->km.state == XFRM_STATE_VALID) {
- spin_lock_bh(&net->xfrm.xfrm_state_lock);
- if (hlist_unhashed(&x->state_cache_input)) {
+ if (x) {
+ spin_lock(&net->xfrm.xfrm_state_lock);
+ if (x->km.state != XFRM_STATE_VALID) {
+ /*
+ * The state is about to be destroyed.
+ *
+ * Don't add it to the cache but still
+ * return it to the caller.
+ */
+ } else if (hlist_unhashed(&x->state_cache_input)) {
hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
} else {
hlist_del_rcu(&x->state_cache_input);
hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
}
- spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+ spin_unlock(&net->xfrm.xfrm_state_lock);
}
out:
- rcu_read_unlock();
return x;
}
EXPORT_SYMBOL(xfrm_input_state_lookup);
--
2.43.0
* [PATCH 5/7] xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Eric Dumazet <edumazet@google.com>
KCSAN reported a data race involving net->xfrm.policy_count access.
Add missing READ_ONCE()/WRITE_ONCE() annotations on
xfrm_policy_count and xfrm_policy_default.
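For readers unfamiliar with the annotations, the pattern can be sketched
in userspace. The macros below are simplified stand-ins built on
volatile accesses, not the kernel's definitions, and they rely on the
GNU __typeof__ extension. All names ending in _demo are hypothetical.
The sketch assumes writers are serialized by a lock (not shown), so only
the lockless readers need protection against torn or repeated loads.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros (assumption, GNU C). */
#define READ_ONCE_DEMO(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_DEMO(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define DIR_MAX_DEMO 3

static unsigned int policy_count_demo[DIR_MAX_DEMO];

/* Writer side: assumed to run with writers serialized by a lock. */
static void policy_link_demo(int dir)
{
	assert(dir >= 0 && dir < DIR_MAX_DEMO);
	WRITE_ONCE_DEMO(policy_count_demo[dir], policy_count_demo[dir] + 1);
}

/* Lockless reader: a single marked load of the counter. */
static int no_policy_demo(int dir)
{
	assert(dir >= 0 && dir < DIR_MAX_DEMO);
	return !READ_ONCE_DEMO(policy_count_demo[dir]);
}
```

The marked accesses do not add ordering or atomic read-modify-write
semantics; they only document the intentional race and keep the compiler
from splitting or refetching the access.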
Fixes: 2518c7c2b3d7 ("[XFRM]: Hash policies when non-prefixed.")
Reported-by: syzbot+d85ba1c732720b9a4097@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a2b9e96.99669fcc.12a77b.0006.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
include/net/xfrm.h | 8 ++++----
net/xfrm/xfrm_policy.c | 24 ++++++++++++------------
net/xfrm/xfrm_user.c | 18 +++++++++---------
3 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 874409127e29..35a743129329 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1250,8 +1250,8 @@ int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb,
static inline bool __xfrm_check_nopolicy(struct net *net, struct sk_buff *skb,
int dir)
{
- if (!net->xfrm.policy_count[dir] && !secpath_exists(skb))
- return net->xfrm.policy_default[dir] == XFRM_USERPOLICY_ACCEPT;
+ if (!READ_ONCE(net->xfrm.policy_count[dir]) && !secpath_exists(skb))
+ return READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_ACCEPT;
return false;
}
@@ -1351,8 +1351,8 @@ static inline int xfrm_route_forward(struct sk_buff *skb, unsigned short family)
{
struct net *net = dev_net(skb->dev);
- if (!net->xfrm.policy_count[XFRM_POLICY_OUT] &&
- net->xfrm.policy_default[XFRM_POLICY_OUT] == XFRM_USERPOLICY_ACCEPT)
+ if (!READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT]) &&
+ READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]) == XFRM_USERPOLICY_ACCEPT)
return true;
return (skb_dst(skb)->flags & DST_NOXFRM) ||
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 959544425692..1f4afd580105 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -685,7 +685,7 @@ static void xfrm_byidx_resize(struct net *net)
static inline int xfrm_bydst_should_resize(struct net *net, int dir, int *total)
{
- unsigned int cnt = net->xfrm.policy_count[dir];
+ unsigned int cnt = READ_ONCE(net->xfrm.policy_count[dir]);
unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
if (total)
@@ -711,12 +711,12 @@ static inline int xfrm_byidx_should_resize(struct net *net, int total)
void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si)
{
- si->incnt = net->xfrm.policy_count[XFRM_POLICY_IN];
- si->outcnt = net->xfrm.policy_count[XFRM_POLICY_OUT];
- si->fwdcnt = net->xfrm.policy_count[XFRM_POLICY_FWD];
- si->inscnt = net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX];
- si->outscnt = net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX];
- si->fwdscnt = net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX];
+ si->incnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_IN]);
+ si->outcnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT]);
+ si->fwdcnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_FWD]);
+ si->inscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX]);
+ si->outscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX]);
+ si->fwdscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX]);
si->spdhcnt = net->xfrm.policy_idx_hmask;
si->spdhmcnt = xfrm_policy_hashmax;
}
@@ -2318,7 +2318,7 @@ static void __xfrm_policy_link(struct xfrm_policy *pol, int dir)
}
list_add(&pol->walk.all, &net->xfrm.policy_all);
- net->xfrm.policy_count[dir]++;
+ WRITE_ONCE(net->xfrm.policy_count[dir], net->xfrm.policy_count[dir] + 1);
xfrm_pol_hold(pol);
}
@@ -2337,7 +2337,7 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
}
list_del_init(&pol->walk.all);
- net->xfrm.policy_count[dir]--;
+ WRITE_ONCE(net->xfrm.policy_count[dir], net->xfrm.policy_count[dir] - 1);
return pol;
}
@@ -3222,7 +3222,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
/* To accelerate a bit... */
if (!if_id && ((dst_orig->flags & DST_NOXFRM) ||
- !net->xfrm.policy_count[XFRM_POLICY_OUT]))
+ !READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT])))
goto nopol;
xdst = xfrm_bundle_lookup(net, fl, family, dir, &xflo, if_id);
@@ -3296,7 +3296,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
nopol:
if ((!dst_orig->dev || !(dst_orig->dev->flags & IFF_LOOPBACK)) &&
- net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
+ READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_BLOCK) {
err = -EPERM;
goto error;
}
@@ -3750,7 +3750,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
const bool is_crypto_offload = sp &&
(xfrm_input_state(skb)->xso.type == XFRM_DEV_OFFLOAD_CRYPTO);
- if (net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
+ if (READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_BLOCK) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOPOLS);
return 0;
}
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3b1cf29bc402..61eb5de33b87 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2485,9 +2485,9 @@ static int xfrm_notify_userpolicy(struct net *net)
}
up = nlmsg_data(nlh);
- up->in = net->xfrm.policy_default[XFRM_POLICY_IN];
- up->fwd = net->xfrm.policy_default[XFRM_POLICY_FWD];
- up->out = net->xfrm.policy_default[XFRM_POLICY_OUT];
+ up->in = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN]);
+ up->fwd = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD]);
+ up->out = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]);
nlmsg_end(skb, nlh);
@@ -2511,13 +2511,13 @@ static int xfrm_set_default(struct sk_buff *skb, struct nlmsghdr *nlh,
struct xfrm_userpolicy_default *up = nlmsg_data(nlh);
if (xfrm_userpolicy_is_valid(up->in))
- net->xfrm.policy_default[XFRM_POLICY_IN] = up->in;
+ WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN], up->in);
if (xfrm_userpolicy_is_valid(up->fwd))
- net->xfrm.policy_default[XFRM_POLICY_FWD] = up->fwd;
+ WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD], up->fwd);
if (xfrm_userpolicy_is_valid(up->out))
- net->xfrm.policy_default[XFRM_POLICY_OUT] = up->out;
+ WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT], up->out);
rt_genid_bump_all(net);
@@ -2547,9 +2547,9 @@ static int xfrm_get_default(struct sk_buff *skb, struct nlmsghdr *nlh,
}
r_up = nlmsg_data(r_nlh);
- r_up->in = net->xfrm.policy_default[XFRM_POLICY_IN];
- r_up->fwd = net->xfrm.policy_default[XFRM_POLICY_FWD];
- r_up->out = net->xfrm.policy_default[XFRM_POLICY_OUT];
+ r_up->in = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN]);
+ r_up->fwd = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD]);
+ r_up->out = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]);
nlmsg_end(r_skb, r_nlh);
return nlmsg_unicast(xfrm_net_nlsk(net, skb), r_skb, portid);
--
2.43.0
* [PATCH 6/7] espintcp: use sk_msg_free_partial to fix partial send
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Sabrina Dubroca <sd@queasysnail.net>
sk_msg_free_partial() ensures consistency of the skmsg at every
iteration, without having to manually handle uncharges and offsets.
This simplifies the code, and fixes some bugs in skmsg accounting when
we don't send the full contents.
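The idea of consuming exactly the bytes that were sent on each iteration
can be sketched with a small model. This is not the skmsg API: msg_demo,
msg_free_partial_demo() and send_demo() are hypothetical names, memory
charging is reduced to a single size field, and the sender is assumed to
accept at most cap bytes per call.

```c
#include <assert.h>
#include <stddef.h>

#define NSEG_DEMO 3

struct seg_demo { size_t offset, length; };

struct msg_demo {
	struct seg_demo sg[NSEG_DEMO];
	size_t start; /* index of the first unsent segment */
	size_t size;  /* total unsent bytes */
};

/* Drop the first `bytes` unsent bytes, trimming or skipping segments. */
static void msg_free_partial_demo(struct msg_demo *m, size_t bytes)
{
	assert(bytes <= m->size);
	while (bytes && m->start < NSEG_DEMO) {
		struct seg_demo *sg = &m->sg[m->start];
		size_t n = bytes < sg->length ? bytes : sg->length;

		sg->offset += n;
		sg->length -= n;
		m->size -= n;
		bytes -= n;
		if (!sg->length)
			m->start++;
	}
}

/* Hypothetical sender that may accept fewer bytes than requested. */
static size_t send_demo(size_t len, size_t cap)
{
	return len < cap ? len : cap;
}

/* Send loop: re-read the head segment each time, free what was sent. */
static size_t send_all_demo(struct msg_demo *m, size_t cap)
{
	size_t calls = 0;

	assert(cap > 0);
	while (m->size) {
		struct seg_demo *sg;

		assert(m->start < NSEG_DEMO);
		sg = &m->sg[m->start];
		if (!sg->length) { /* skip an empty segment */
			m->start++;
			continue;
		}
		msg_free_partial_demo(m, send_demo(sg->length, cap));
		calls++;
	}
	return calls;
}
```

Because the message is brought up to date after every send, the loop
needs no separate offset or "done" bookkeeping, and a short send simply
leaves a trimmed head segment for the next iteration.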
Cc: stable@vger.kernel.org
Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Reported-by: Aaron Esau <aaron1esau@gmail.com>
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/espintcp.c | 34 +++++++---------------------------
1 file changed, 7 insertions(+), 27 deletions(-)
diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index d9035546375e..374e1b964438 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -212,43 +212,23 @@ static int espintcp_sendskmsg_locked(struct sock *sk,
struct sk_msg *skmsg = &emsg->skmsg;
bool more = flags & MSG_MORE;
struct scatterlist *sg;
- int done = 0;
int ret;
- sg = &skmsg->sg.data[skmsg->sg.start];
do {
struct bio_vec bvec;
- size_t size = sg->length - emsg->offset;
- int offset = sg->offset + emsg->offset;
- struct page *p;
-
- emsg->offset = 0;
+ sg = &skmsg->sg.data[skmsg->sg.start];
if (sg_is_last(sg) && !more)
msghdr.msg_flags &= ~MSG_MORE;
- p = sg_page(sg);
-retry:
- bvec_set_page(&bvec, p, size, offset);
- iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
- ret = tcp_sendmsg_locked(sk, &msghdr, size);
- if (ret < 0) {
- emsg->offset = offset - sg->offset;
- skmsg->sg.start += done;
+ bvec_set_page(&bvec, sg_page(sg), sg->length, sg->offset);
+ iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, sg->length);
+ ret = tcp_sendmsg_locked(sk, &msghdr, sg->length);
+ if (ret < 0)
return ret;
- }
-
- if (ret != size) {
- offset += ret;
- size -= ret;
- goto retry;
- }
- done++;
- put_page(p);
- sk_mem_uncharge(sk, sg->length);
- sg = sg_next(sg);
- } while (sg);
+ sk_msg_free_partial(sk, skmsg, ret);
+ } while (skmsg->sg.size);
memset(emsg, 0, sizeof(*emsg));
--
2.43.0
* [PATCH 7/7] xfrm: validate selector family and prefixlen during match
From: Steffen Klassert @ 2026-06-22 7:57 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Eric Dumazet <edumazet@google.com>
syzbot reported a shift-out-of-bounds in xfrm_selector_match()
caused by an AF_UNSPEC selector with a large prefixlen (e.g. 128) being
matched against an IPv4 flow (when XFRM_STATE_AF_UNSPEC is set).
Fix this by:
- Rejecting mismatched families in xfrm_selector_match.
- Returning false in addr4_match if prefixlen > 32.
- Returning false in addr_match if prefixlen > 128 (prevents overflow).
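The IPv4 bound can be illustrated with a userspace sketch.
addr4_match_demo() is a hypothetical re-implementation written for this
example, not the kernel helper: it builds the mask from a 32-bit
constant, so the shift count must stay within 0..31. A prefixlen of 128
would otherwise produce a negative shift count (32 - 128), which is
undefined behaviour in C.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare the leading `prefixlen` bits of two network-order addresses. */
static bool addr4_match_demo(uint32_t a1, uint32_t a2, uint8_t prefixlen)
{
	/* A zero-length prefix matches everything; avoids a shift by 32. */
	if (prefixlen == 0)
		return true;

	/* Out-of-range prefix: refuse instead of shifting by a bad count. */
	if (prefixlen > 32)
		return false;

	/* The shift count is now in 0..31, valid for a 32-bit operand. */
	return !((a1 ^ a2) & htonl(UINT32_MAX << (32 - prefixlen)));
}
```

For example, 192.168.1.1 and 192.168.1.254 match under a /24 prefix but
not under /32, and any prefixlen above 32 is reported as a mismatch.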
Fixes: 3f0ab59e6537 ("xfrm: validate new SA's prefixlen using SA family when sel.family is unset")
Reported-by: syzbot+9383b1ff0df4b29ca5e6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a2fbe35.be3f099c.2836ae.0018.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
include/net/xfrm.h | 7 +++++++
net/xfrm/xfrm_policy.c | 3 +++
2 files changed, 10 insertions(+)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 35a743129329..f8c909b0f0c3 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -943,6 +943,9 @@ static inline bool addr_match(const void *token1, const void *token2,
unsigned int pdw;
unsigned int pbi;
+ if (prefixlen > 128)
+ return false;
+
pdw = prefixlen >> 5; /* num of whole u32 in prefix */
pbi = prefixlen & 0x1f; /* num of bits in incomplete u32 in prefix */
@@ -967,6 +970,10 @@ static inline bool addr4_match(__be32 a1, __be32 a2, u8 prefixlen)
/* C99 6.5.7 (3): u32 << 32 is undefined behaviour */
if (sizeof(long) == 4 && prefixlen == 0)
return true;
+
+ if (prefixlen > 32)
+ return false;
+
return !((a1 ^ a2) & htonl(~0UL << (32 - prefixlen)));
}
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 1f4afd580105..639934f30016 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -242,6 +242,9 @@ __xfrm6_selector_match(const struct xfrm_selector *sel, const struct flowi *fl)
bool xfrm_selector_match(const struct xfrm_selector *sel, const struct flowi *fl,
unsigned short family)
{
+ if (family != sel->family && sel->family != AF_UNSPEC)
+ return false;
+
switch (family) {
case AF_INET:
return __xfrm4_selector_match(sel, fl);
--
2.43.0