BPF List
From: Stanislav Fomichev <sdf.kernel@gmail.com>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	 pabeni@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com,
	 maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
	sdf@fomichev.me, ast@kernel.org,  daniel@iogearbox.net,
	hawk@kernel.org, john.fastabend@gmail.com, horms@kernel.org,
	 andrew+netdev@lunn.ch, bpf@vger.kernel.org,
	netdev@vger.kernel.org,  Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
Date: Wed, 13 May 2026 08:37:55 -0700
Message-ID: <agSaZtBh-v6isF_8@devvm7509.cco0.facebook.com>
In-Reply-To: <CAL+tcoDqtEAqX65-8crUkUko5T_0nq=LSxAbMg+5jCAHa_uC=A@mail.gmail.com>

On 05/13, Jason Xing wrote:
> On Wed, May 13, 2026 at 6:34 AM Stanislav Fomichev <sdf.kernel@gmail.com> wrote:
> >
> > On 05/12, Jason Xing wrote:
> > > On Mon, May 11, 2026 at 11:03 PM Stanislav Fomichev
> > > <sdf.kernel@gmail.com> wrote:
> > > >
> > > > On 05/10, Jason Xing wrote:
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > The TX metadata area resides in the UMEM buffer which is memory-mapped
> > > > > and concurrently writable by userspace. In xsk_skb_metadata(),
> > > > > csum_start and csum_offset are read from shared memory for bounds
> > > > > validation, then read again for skb assignment. A malicious userspace
> > > > > application can race to overwrite these values between the two reads,
> > > > > bypassing the bounds check and causing out-of-bounds memory access
> > > > > during checksum computation in the transmit path.
> > > > >
> > > > > Fix this by reading csum_start and csum_offset into local variables
> > > > > once, then using the local copies for both validation and assignment.
> > > > >
> > > > > Note that other metadata fields (flags, launch_time) and the cached
> > > > > csum fields may be mutually inconsistent due to concurrent userspace
> > > > > writes, but this is benign: the only security-critical invariant is
> > > > > that each field's validated value is the same one used, which local
> > > > > caching guarantees.
> > > > >
> > > > > Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
> > > > > Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
> > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > ---
> > > > >  net/xdp/xsk.c | 11 +++++++----
> > > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > index 6bcd77068e52..cd039e397018 100644
> > > > > --- a/net/xdp/xsk.c
> > > > > +++ b/net/xdp/xsk.c
> > > > > @@ -722,6 +722,7 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> > > > >                           u32 hr)
> > > > >  {
> > > > >       struct xsk_tx_metadata *meta = NULL;
> > > > > +     u16 csum_start, csum_offset;
> > > > >
> > > > >       if (unlikely(pool->tx_metadata_len == 0))
> > > > >               return -EINVAL;
> > > > > @@ -731,13 +732,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> > > > >               return -EINVAL;
> > > > >
> > > > >       if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) {
> > > > > -             if (unlikely(meta->request.csum_start +
> > > > > -                          meta->request.csum_offset +
> > > > > +             csum_start = meta->request.csum_start;
> > > > > +             csum_offset = meta->request.csum_offset;
> > > >
> > > > Wondering if it's better to READ_ONCE(x) these?
> > >
> > > I still chose not to use it after reading the suggestion from local
> > > AI. The reason is there is no WRITE_ONCE pair to make sure everything
> > > is no data-race. I also checked some existing implementations around
> > > the shared buffer (between userspace and kernel) and didn't manage to
> > > see the usage of XXXX_ONCE(). Does it make any sense to you :) ?
> >
> > Without READ_ONCE your patch relies on the compiler honoring exactly the
> > loads and stores as written. Which I don't think it does (hence that
> > whole WRITE_ONCE/READ_ONCE mess). IOW, it can pretty much generate
> > the same code (and read csum_start twice) even with your patch applied. Happy
> > to argue with your local AI if you give me the output :-D
> >
> > I grepped io_uring/ and I see similar pattern for reading user supplied
> > entries (via READ_ONCE).
> 
> I roughly understand what your meaning is here. My thought is the
> data-race condition in this case still happens even with the
> READ_ONCE() protection (because no such corresponding operation is
> performed on the writer side).
> 
> Actually the local AI said use READ_ONCE instead.
> 
> I can change this as you suggested for sure. Maybe related fields can
> be protected in the same way in another patch.

In general, it feels like we should be using READ_ONCE() on all
user-supplied descriptors (to make it apparent that userspace can change
them concurrently). In practice, it's probably too late, and too much
churn, to update everything at this point, so let's only fix the reads
that can lead to TOCTOU.
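
For illustration, here is a minimal userspace sketch of the
"snapshot once, then validate and use the snapshot" pattern discussed
above. It is not the actual net/xdp/xsk.c code: the READ_ONCE() macro is
a simplified stand-in for the kernel one, struct tx_meta_request,
struct csum_params and snapshot_csum() are hypothetical names, and the
bounds check is simplified.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Simplified userspace stand-in for the kernel's READ_ONCE() (assumption:
 * GCC/Clang, for __typeof__). The volatile access makes the compiler emit
 * exactly one load instead of re-reading the shared location later.
 */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical, reduced layout; not the real struct xsk_tx_metadata. */
struct tx_meta_request {
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Hypothetical holder for the values that passed validation. */
struct csum_params {
	uint16_t start;
	uint16_t offset;
};

/*
 * Read each shared field once, validate the local copies against len,
 * and hand back those same copies. A concurrent writer can still change
 * the shared memory, but it cannot change what was validated.
 */
static int snapshot_csum(struct tx_meta_request *shared, uint32_t len,
			 struct csum_params *out)
{
	uint16_t csum_start = READ_ONCE(shared->csum_start);
	uint16_t csum_offset = READ_ONCE(shared->csum_offset);

	/* The 2-byte checksum field must fit inside the buffer. */
	if ((uint32_t)csum_start + csum_offset + sizeof(uint16_t) > len)
		return -EINVAL;

	out->start = csum_start;
	out->offset = csum_offset;
	return 0;
}
```

With csum_start = 10 and csum_offset = 4, a 16-byte buffer is accepted
and a 15-byte one is rejected with -EINVAL. The fields can still be
mutually inconsistent if the writer races, which is the benign case
described in the commit message.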

Thread overview: 13+ messages
2026-05-10  1:23 [PATCH net 0/4] xsk: fix meta and publish of cq issues Jason Xing
2026-05-10  1:23 ` [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
2026-05-11 15:03   ` Stanislav Fomichev
2026-05-12 14:32     ` Jason Xing
2026-05-12 22:34       ` Stanislav Fomichev
2026-05-13 14:21         ` Jason Xing
2026-05-13 15:37           ` Stanislav Fomichev [this message]
2026-05-10  1:23 ` [PATCH net 2/4] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
2026-05-10  1:23 ` [PATCH net 3/4] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
2026-05-13 16:27   ` Stanislav Fomichev
2026-05-10  1:23 ` [PATCH net 4/4] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
2026-05-11 14:16 ` [PATCH net 0/4] xsk: fix meta and publish of cq issues Jakub Kicinski
2026-05-12 14:29   ` Jason Xing
