* [PATCH net v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
@ 2026-05-30 4:26 Jason Xing
2026-05-31 4:27 ` sashiko-bot
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Jason Xing @ 2026-05-30 4:26 UTC
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>

The TX metadata area resides in the UMEM buffer which is memory-mapped
and concurrently writable by userspace. In xsk_skb_metadata(),
csum_start and csum_offset are read from shared memory for bounds
validation, then read again for skb assignment. A malicious userspace
application can race to overwrite these values between the two reads,
bypassing the bounds check and causing out-of-bounds memory access
during checksum computation in the transmit path.

Fix this by reading csum_start and csum_offset into local variables
once, then using the local copies for both validation and assignment.

Note that other metadata fields (flags, launch_time) and the cached
csum fields may be mutually inconsistent due to concurrent userspace
writes, but this is benign: the only security-critical invariant is
that each field's validated value is the same one used, which local
caching guarantees.

Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
V5
Link: https://lore.kernel.org/all/20260520004244.55663-1-kerneljasonxing@gmail.com/
1. send this patch only and wait for Maciej to come up with a solution
to adjust the selftests. I believe he will send it out along with the
rest of patches.
2. add reviewed-by tag.
V4
Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/
1. correct the description of xmit path in patch 3 (sashiko)
2. move set logic into xmit path in patch 3 (Stan)
V3
Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/
1. avoid breaking previous usage of sendto, and silently handle
overflow case (Stan, sashiko)
2. add one particular exception process in patch 4 (sashiko)
3. adjust the selftest to make sure it passes on either virtual or
physical machines, which includes adding usleep to support physical machines.
V2
Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/
1. adjust selftests (Jakub)
2. add READ_ONCE in patch 1 (Stan)
---
net/xdp/xsk.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5e5786cd9af5..f8c8a8c9dfba 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -802,6 +802,7 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
 			    u32 hr)
 {
 	struct xsk_tx_metadata *meta = NULL;
+	u16 csum_start, csum_offset;
 
 	if (unlikely(pool->tx_metadata_len == 0))
 		return -EINVAL;
@@ -811,13 +812,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
 		return -EINVAL;
 
 	if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) {
-		if (unlikely(meta->request.csum_start +
-			     meta->request.csum_offset +
+		csum_start = READ_ONCE(meta->request.csum_start);
+		csum_offset = READ_ONCE(meta->request.csum_offset);
+
+		if (unlikely(csum_start + csum_offset +
 			     sizeof(__sum16) > desc->len))
 			return -EINVAL;
 
-		skb->csum_start = hr + meta->request.csum_start;
-		skb->csum_offset = meta->request.csum_offset;
+		skb->csum_start = hr + csum_start;
+		skb->csum_offset = csum_offset;
 		skb->ip_summed = CHECKSUM_PARTIAL;
 
 		if (unlikely(pool->tx_sw_csum)) {
--
2.43.7
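
The double fetch described in the commit message can be reproduced
deterministically in userspace. What follows is a minimal sketch under stated
assumptions, not the kernel code: struct fake_meta, racer_fn, evil_racer(),
double_fetch(), single_fetch(), toctou_example() and the READ_ONCE_U16() macro
are hypothetical stand-ins, and the concurrent userspace writer is simulated by
a callback that runs between the bounds check and the assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(): one volatile load. */
#define READ_ONCE_U16(x) (*(const volatile uint16_t *)&(x))

/* Hypothetical, simplified mirror of the two checksum request fields. */
struct fake_meta {
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Simulates the racing userspace writer; runs between check and use. */
typedef void (*racer_fn)(struct fake_meta *meta);

static void evil_racer(struct fake_meta *meta)
{
	meta->csum_start = 60000;	/* far past the validated length */
}

/* Pre-patch pattern: csum_start is fetched twice from shared memory. */
static int double_fetch(struct fake_meta *meta, uint32_t len, racer_fn racer,
			uint16_t *out_start)
{
	if (meta->csum_start + meta->csum_offset + sizeof(uint16_t) > len)
		return -1;
	if (racer)
		racer(meta);		/* window in which userspace can write */
	*out_start = meta->csum_start;	/* second fetch: may be unvalidated */
	return 0;
}

/* Post-patch pattern: fetch once, then validate and use the local copies. */
static int single_fetch(struct fake_meta *meta, uint32_t len, racer_fn racer,
			uint16_t *out_start)
{
	uint16_t csum_start = READ_ONCE_U16(meta->csum_start);
	uint16_t csum_offset = READ_ONCE_U16(meta->csum_offset);

	if (csum_start + csum_offset + sizeof(uint16_t) > len)
		return -1;
	if (racer)
		racer(meta);		/* later writes cannot change the result */
	*out_start = csum_start;
	return 0;
}

/* Usage: a 64-byte descriptor with csum_start = 14, csum_offset = 16. */
static void toctou_example(void)
{
	struct fake_meta meta = { .csum_start = 14, .csum_offset = 16 };
	uint16_t used = 0;

	/* 14 + 16 + 2 <= 64 passes the check, yet 60000 is what gets used. */
	assert(double_fetch(&meta, 64, evil_racer, &used) == 0);
	assert(used == 60000);

	meta.csum_start = 14;
	assert(single_fetch(&meta, 64, evil_racer, &used) == 0);
	assert(used == 14);		/* the validated value */
}
```

The callback makes the race window explicit; in the real transmit path the
write would come from another thread at an arbitrary moment, which is why the
patch takes the snapshot with READ_ONCE() before any validation.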
* Re: [PATCH net v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
2026-05-30 4:26 [PATCH net v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
@ 2026-05-31 4:27 ` sashiko-bot
2026-05-31 5:00 ` Jason Xing
2026-06-01 15:18 ` Stanislav Fomichev
2026-06-04 1:14 ` patchwork-bot+netdevbpf
2 siblings, 1 reply; 5+ messages in thread
From: sashiko-bot @ 2026-05-31 4:27 UTC
To: Jason Xing; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] The `struct xsk_tx_metadata` UAPI structure has an inconsistent memory layout between 32-bit and 64-bit architectures, breaking compat mode due to implicit padding.
- [High] Silent 16-bit integer overflow when calculating `skb->csum_start` leads to a negative checksum offset and out-of-bounds hardware DMA faults.
--
commit 8339ac65115fadb8d99c0e54d06d57fab30b5e84
Author: Jason Xing <kerneljasonxing@gmail.com>
Subject: xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
This commit addresses a TOCTOU vulnerability where userspace could modify
csum_start and csum_offset between bounds validation and assignment. It fixes
this by using READ_ONCE() to cache the values into local variables, ensuring
that the same validated values are used for SKB assignment.
Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 5e5786cd9af55..f8c8a8c9dfba5 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -811,13 +812,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> return -EINVAL;
>
> if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) {
> - if (unlikely(meta->request.csum_start +
> - meta->request.csum_offset +
> + csum_start = READ_ONCE(meta->request.csum_start);
[Severity: High]
This isn't a bug introduced by this patch, but can the UAPI layout for
xsk_tx_metadata cause issues for 32-bit userspace?
include/uapi/linux/if_xdp.h:
struct xsk_tx_metadata {
	__u64 flags;

	union {
		struct {
			/* XDP_TXMD_FLAGS_CHECKSUM */

			/* Offset from desc->addr where checksumming should start. */
			__u16 csum_start;
			/* Offset from csum_start where checksum should be stored. */
			__u16 csum_offset;

			/* XDP_TXMD_FLAGS_LAUNCH_TIME */
			/* Launch time in nanosecond against the PTP HW Clock */
			__u64 launch_time;
		} request;

Because launch_time is an 8-byte type following two 2-byte types, on 64-bit
architectures there will be 4 bytes of implicit padding inserted after
csum_offset. On 32-bit architectures like x86, __u64 only requires 4-byte
alignment, so there is no padding.

Does this mean a 32-bit compat userspace will write launch_time to an offset
4 bytes earlier than where a 64-bit kernel will read it?
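
The padding difference in question can be checked with offsetof(). This is a
rough sketch only: struct request_native, struct request_i386, u64_align4,
launch_time_skew() and layout_example() are hypothetical names, the structs
mirror just the three request fields (offsets are relative to the start of
request, not of xsk_tx_metadata), and the i386 layout is emulated with the
GCC/Clang aligned attribute on a typedef rather than by building for 32-bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the request fields as the build target lays them out. */
struct request_native {
	uint16_t csum_start;
	uint16_t csum_offset;
	uint64_t launch_time;
};

/*
 * Emulates the i386 ABI, where a 64-bit integer inside a struct is only
 * 4-byte aligned. Lowering alignment through a typedef is a GCC/Clang
 * extension.
 */
typedef uint64_t u64_align4 __attribute__((aligned(4)));

struct request_i386 {
	uint16_t csum_start;
	uint16_t csum_offset;
	u64_align4 launch_time;
};

/* Distance between where each layout places launch_time. */
static size_t launch_time_skew(void)
{
	size_t native = offsetof(struct request_native, launch_time);
	size_t compat = offsetof(struct request_i386, launch_time);

	assert(native >= compat);	/* lower alignment never moves it later */
	return native - compat;
}

/* Usage: the emulated 32-bit layout has no padding after csum_offset. */
static void layout_example(void)
{
	assert(offsetof(struct request_i386, csum_offset) == 2);
	assert(offsetof(struct request_i386, launch_time) == 4);
}
```

On a target where uint64_t is 8-byte aligned, launch_time_skew() should report
the 4-byte difference described above; on a target that already uses 4-byte
alignment for uint64_t it reports 0.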
> + csum_offset = READ_ONCE(meta->request.csum_offset);
> +
> + if (unlikely(csum_start + csum_offset +
> sizeof(__sum16) > desc->len))
> return -EINVAL;
>
> - skb->csum_start = hr + meta->request.csum_start;
> - skb->csum_offset = meta->request.csum_offset;
> + skb->csum_start = hr + csum_start;
> + skb->csum_offset = csum_offset;
[Severity: High]
This is a pre-existing issue, but is it possible for skb->csum_start to silently
overflow here?

If desc->len is large (e.g. in unaligned_chunks mode or on platforms with >= 64KB
PAGE_SIZE) and csum_start is near 65535, could hr + csum_start exceed the 16-bit
skb->csum_start field?

For example, if hr is 256 and csum_start is 65500, hr + csum_start is 65756
which truncates to 220. Later, skb_checksum_start_offset(skb) calculates
skb->csum_start - skb_headroom(skb) (e.g. 220 - 256 = -36).

When hardware checksum offload is used, NIC drivers might use this negative
offset to compute header lengths and shift it into the TX descriptor,
potentially causing hardware DMA faults or PCIe completion timeouts.
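
The arithmetic in that example can be replayed in plain C. The sketch below is
illustrative only: store_csum_start(), checksum_start_offset(),
csum_start_fits() and overflow_example() are hypothetical helpers that mimic
the 16-bit store and the later subtraction, and csum_start_fits() is just one
possible shape for a guard, not the fix that was later proposed for the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Mimics "skb->csum_start = hr + csum_start" stored into a 16-bit field. */
static uint16_t store_csum_start(uint32_t hr, uint16_t csum_start)
{
	return (uint16_t)(hr + csum_start);	/* bits above 16 are dropped */
}

/* Mimics "skb->csum_start - skb_headroom(skb)" as a signed result. */
static int checksum_start_offset(uint16_t stored, uint32_t headroom)
{
	return (int)stored - (int)headroom;
}

/* Hypothetical guard: nonzero when hr + csum_start still fits in 16 bits. */
static int csum_start_fits(uint32_t hr, uint16_t csum_start)
{
	return (uint64_t)hr + csum_start <= UINT16_MAX;
}

/* Usage: the numbers from the review, hr = 256 and csum_start = 65500. */
static void overflow_example(void)
{
	uint16_t stored = store_csum_start(256, 65500);

	assert(stored == 220);				/* 65756 - 65536 */
	assert(checksum_start_offset(stored, 256) == -36);
	assert(!csum_start_fits(256, 65500));
}
```

Doing the range check in a wider type before the narrowing store is what keeps
the guard itself from wrapping.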
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260530042630.80626-1-kerneljasonxing@gmail.com?part=1
* Re: [PATCH net v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
2026-05-31 4:27 ` sashiko-bot
@ 2026-05-31 5:00 ` Jason Xing
0 siblings, 0 replies; 5+ messages in thread
From: Jason Xing @ 2026-05-31 5:00 UTC
To: sashiko-reviews; +Cc: bpf
On Sun, May 31, 2026 at 12:27 PM <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] The `struct xsk_tx_metadata` UAPI structure has an inconsistent memory layout between 32-bit and 64-bit architectures, breaking compat mode due to implicit padding.
> - [High] Silent 16-bit integer overflow when calculating `skb->csum_start` leads to a negative checksum offset and out-of-bounds hardware DMA faults.
> --
>
> commit 8339ac65115fadb8d99c0e54d06d57fab30b5e84
> Author: Jason Xing <kerneljasonxing@gmail.com>
> Subject: xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
>
> This commit addresses a TOCTOU vulnerability where userspace could modify
> csum_start and csum_offset between bounds validation and assignment. It fixes
> this by using READ_ONCE() to cache the values into local variables, ensuring
> that the same validated values are used for SKB assignment.
>
> Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
> Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 5e5786cd9af55..f8c8a8c9dfba5 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -811,13 +812,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> > return -EINVAL;
> >
> > if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) {
> > - if (unlikely(meta->request.csum_start +
> > - meta->request.csum_offset +
> > + csum_start = READ_ONCE(meta->request.csum_start);
>
> [Severity: High]
> This isn't a bug introduced by this patch, but can the UAPI layout for
> xsk_tx_metadata cause issues for 32-bit userspace?
>
> include/uapi/linux/if_xdp.h:
> struct xsk_tx_metadata {
> __u64 flags;
>
> union {
> struct {
> /* XDP_TXMD_FLAGS_CHECKSUM */
>
> /* Offset from desc->addr where checksumming should start. */
> __u16 csum_start;
> /* Offset from csum_start where checksum should be stored. */
> __u16 csum_offset;
>
> /* XDP_TXMD_FLAGS_LAUNCH_TIME */
> /* Launch time in nanosecond against the PTP HW Clock */
> __u64 launch_time;
> } request;
>
> Because launch_time is an 8-byte type following two 2-byte types, on 64-bit
> architectures there will be 4 bytes of implicit padding inserted after
> csum_offset. On 32-bit architectures like x86, __u64 only requires 4-byte
> alignment, so there is no padding.
>
> Does this mean a 32-bit compat userspace will write launch_time to an offset
> 4 bytes earlier than where a 64-bit kernel will read it?
>
> > + csum_offset = READ_ONCE(meta->request.csum_offset);
> > +
> > + if (unlikely(csum_start + csum_offset +
> > sizeof(__sum16) > desc->len))
> > return -EINVAL;
> >
> > - skb->csum_start = hr + meta->request.csum_start;
> > - skb->csum_offset = meta->request.csum_offset;
> > + skb->csum_start = hr + csum_start;
> > + skb->csum_offset = csum_offset;
>
> [Severity: High]
> This is a pre-existing issue, but is it possible for skb->csum_start to silently
> overflow here?
>
> If desc->len is large (e.g. in unaligned_chunks mode or on platforms with
> >= 64KB PAGE_SIZE) and csum_start is near 65535, could hr + csum_start exceed
> the 16-bit skb->csum_start field?
>
> For example, if hr is 256 and csum_start is 65500, hr + csum_start is 65756
> which truncates to 220. Later, skb_checksum_start_offset(skb) calculates
> skb->csum_start - skb_headroom(skb) (e.g. 220 - 256 = -36).
>
> When hardware checksum offload is used, NIC drivers might use this negative
> offset to compute header lengths and shift it into the TX descriptor,
> potentially causing hardware DMA faults or PCIe completion timeouts.
These were brought up a few times before... As I said, the second one will
be fixed shortly after this patch.
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260530042630.80626-1-kerneljasonxing@gmail.com?part=1
* Re: [PATCH net v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
2026-05-30 4:26 [PATCH net v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
2026-05-31 4:27 ` sashiko-bot
@ 2026-06-01 15:18 ` Stanislav Fomichev
2026-06-04 1:14 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 5+ messages in thread
From: Stanislav Fomichev @ 2026-06-01 15:18 UTC
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On 05/30, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> The TX metadata area resides in the UMEM buffer which is memory-mapped
> and concurrently writable by userspace. In xsk_skb_metadata(),
> csum_start and csum_offset are read from shared memory for bounds
> validation, then read again for skb assignment. A malicious userspace
> application can race to overwrite these values between the two reads,
> bypassing the bounds check and causing out-of-bounds memory access
> during checksum computation in the transmit path.
>
> Fix this by reading csum_start and csum_offset into local variables
> once, then using the local copies for both validation and assignment.
>
> Note that other metadata fields (flags, launch_time) and the cached
> csum fields may be mutually inconsistent due to concurrent userspace
> writes, but this is benign: the only security-critical invariant is
> that each field's validated value is the same one used, which local
> caching guarantees.
>
> Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
> Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
* Re: [PATCH net v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
2026-05-30 4:26 [PATCH net v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
2026-05-31 4:27 ` sashiko-bot
2026-06-01 15:18 ` Stanislav Fomichev
@ 2026-06-04 1:14 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-06-04 1:14 UTC
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, kernelxing
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Sat, 30 May 2026 12:26:30 +0800 you wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> The TX metadata area resides in the UMEM buffer which is memory-mapped
> and concurrently writable by userspace. In xsk_skb_metadata(),
> csum_start and csum_offset are read from shared memory for bounds
> validation, then read again for skb assignment. A malicious userspace
> application can race to overwrite these values between the two reads,
> bypassing the bounds check and causing out-of-bounds memory access
> during checksum computation in the transmit path.
>
> [...]
Here is the summary with links:
- [net,v5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
https://git.kernel.org/netdev/net/c/22ba97ea9cc1
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html