From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>,
Stanislav Fomichev <sdf@fomichev.me>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>, Kees Cook <kees@kernel.org>,
<nxne.cnse.osdt.itp.upstreaming@intel.com>, <bpf@vger.kernel.org>,
<netdev@vger.kernel.org>, <linux-hardening@vger.kernel.org>,
<stable@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation
Date: Thu, 9 Oct 2025 16:27:50 +0200
Message-ID: <aOfGZvSxC8X2h8Zb@boxer>
In-Reply-To: <20251008165659.4141318-1-aleksander.lobakin@intel.com>

On Wed, Oct 08, 2025 at 06:56:59PM +0200, Alexander Lobakin wrote:
> Turned out certain clearly invalid values passed in &xdp_desc from
> userspace can pass xp_{,un}aligned_validate_desc() and then lead
> to UBs or just invalid frames to be queued for xmit.
>
> desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
> can cause positive integer overflow and wraparound, the same way low
> enough desc->addr with a non-zero pool->tx_metadata_len can cause
> negative integer overflow. Both scenarios can then pass the
> validation successfully.

Hmm, when the underflow happens, addr would be enormous, so passing the
existing validation would be really rare. However, let us fix it while
at it.

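To make the two wraparounds concrete, here is a minimal userspace sketch
of the old arithmetic. The helper names are made up for illustration,
and the narrow type used for tx_metadata_len is an assumption of the
sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Old-style length: the sum is evaluated in 32 bits, then widened. */
static uint64_t old_total_len(uint32_t desc_len, uint8_t tx_metadata_len)
{
	return desc_len + tx_metadata_len;
}

/* Hardened length: widen to 64 bits first, so the sum cannot wrap. */
static uint64_t new_total_len(uint32_t desc_len, uint8_t tx_metadata_len)
{
	uint64_t len = desc_len;

	return len + tx_metadata_len;
}

/* Old-style address: unsigned subtraction wraps around below zero. */
static uint64_t old_meta_addr(uint64_t desc_addr, uint8_t tx_metadata_len)
{
	return desc_addr - tx_metadata_len;
}
```

With desc->len == U32_MAX and 8 bytes of metadata, the old length wraps
to a tiny value that fits any chunk, while an addr of 4 wraps to a value
near U64_MAX, which the existing addrs_cnt check would normally catch.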
> This doesn't happen with valid XSk applications, but can be used
> to perform attacks.
>
> Always promote desc->len to ``u64`` first to exclude positive
> overflows of it. Use explicit check_{add,sub}_overflow() when
> validating desc->addr (which is ``u64`` already).
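
A rough userspace sketch of the checked-arithmetic approach described
above, assuming GCC or Clang. The two macros are local stand-ins built
on the compiler builtins, and span_fits() is a hypothetical helper, not
a function from xsk_queue.h. Here len is the total length, metadata
included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's check_{sub,add}_overflow(). */
#define check_sub_overflow(a, b, d) __builtin_sub_overflow(a, b, d)
#define check_add_overflow(a, b, d) __builtin_add_overflow(a, b, d)

/*
 * Hypothetical helper: does [desc_addr - meta, desc_addr - meta + len)
 * lie entirely inside [0, limit)?
 */
static bool span_fits(uint64_t desc_addr, uint32_t meta, uint64_t len,
		      uint64_t limit)
{
	uint64_t start, end;

	/* Fails when desc_addr < meta instead of wrapping around. */
	if (check_sub_overflow(desc_addr, meta, &start))
		return false;

	if (start >= limit)
		return false;

	/* Fails when start + len does not fit into 64 bits. */
	if (check_add_overflow(start, len, &end) || end > limit)
		return false;

	return true;
}
```

The builtins report overflow through their return value, so each
rejection is an explicit branch rather than a comparison against an
already wrapped result.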
>
> bloat-o-meter reports a little growth of the code size:
>
> add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
> Function                                     old     new   delta
> xskq_cons_peek_desc                          299     330     +31
> xsk_tx_peek_release_desc_batch               973    1002     +29
> xsk_generic_xmit                            3148    3132     -16
>
> but hopefully this doesn't hurt the performance much.

Let us be fully transparent and link the previous discussion here?
I was commenting that breaking up a single statement into multiple
branches might subtly affect performance, as this code is executed for
each descriptor. Jason tested copy+aligned mode; let us see whether
zc+unaligned mode is affected.

<rant>
I am also thinking about the test side. XSk tx metadata came with a
separate test (xdp_hw_metadata), which was rather about testing positive
cases. That is probably a separate discussion, but negative metadata
tests should appear somewhere. I suppose xskxceiver would be a good fit,
but then, should we merge the existing logic from xdp_hw_metadata?
</rant>

>
> Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
> Cc: stable@vger.kernel.org # 6.8+
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
> net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 35 insertions(+), 10 deletions(-)
>
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index f16f390370dc..1eb8d9f8b104 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options)
>  static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
>  					    struct xdp_desc *desc)
>  {
> -	u64 addr = desc->addr - pool->tx_metadata_len;
> -	u64 len = desc->len + pool->tx_metadata_len;
> -	u64 offset = addr & (pool->chunk_size - 1);
> +	u64 len = desc->len;
> +	u64 addr, offset;
>
> -	if (!desc->len)
> +	if (!len)

This is yet another thing being fixed here, as for a non-zero
tx_metadata_len we were allowing 0-length descriptors... :< Overall, it
feels like we relied too much on the contract with userspace WRT
descriptor layout.

If zc perf is fine, then:
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

>  		return false;
>
> -	if (offset + len > pool->chunk_size)
> +	/* Can overflow if desc->addr < pool->tx_metadata_len */
> +	if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
> +		return false;
> +
> +	offset = addr & (pool->chunk_size - 1);
> +
> +	/*
> +	 * Can't overflow: @offset is guaranteed to be < ``U32_MAX``
> +	 * (pool->chunk_size is ``u32``), @len is guaranteed
> +	 * to be <= ``U32_MAX``.
> +	 */
> +	if (offset + len + pool->tx_metadata_len > pool->chunk_size)
>  		return false;
>
>  	if (addr >= pool->addrs_cnt)
> @@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
>
>  	if (xp_unused_options_set(desc->options))
>  		return false;
> +
>  	return true;
>  }
>
>  static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
>  					      struct xdp_desc *desc)
>  {
> -	u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
> -	u64 len = desc->len + pool->tx_metadata_len;
> +	u64 len = desc->len;
> +	u64 addr, end;
>
> -	if (!desc->len)
> +	if (!len)
>  		return false;
>
> +	/* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */
> +	len += pool->tx_metadata_len;
>  	if (len > pool->chunk_size)
>  		return false;
>
> -	if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt ||
> -	    xp_desc_crosses_non_contig_pg(pool, addr, len))
> +	/* Can overflow if desc->addr is close to 0 */
> +	if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr),
> +			       pool->tx_metadata_len, &addr))
> +		return false;
> +
> +	if (addr >= pool->addrs_cnt)
> +		return false;
> +
> +	/* Can overflow if pool->addrs_cnt is high enough */
> +	if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt)
> +		return false;
> +
> +	if (xp_desc_crosses_non_contig_pg(pool, addr, len))
>  		return false;
>
>  	if (xp_unused_options_set(desc->options))
>  		return false;
> +
>  	return true;
>  }
>
> --
> 2.51.0
>
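
For anyone who wants to poke at the aligned-mode checks outside the
kernel, the following is a trimmed-down userspace sketch of the hardened
logic quoted above. struct demo_pool, struct demo_desc and
demo_aligned_validate() are hypothetical stand-ins, the options check is
left out, and the field types are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct xsk_buff_pool. */
struct demo_pool {
	uint64_t addrs_cnt;       /* size of the addressable UMEM area */
	uint32_t chunk_size;      /* power of two in aligned mode */
	uint8_t  tx_metadata_len; /* assumed to be a narrow type */
};

/* Hypothetical, reduced stand-in for struct xdp_desc. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
};

static bool demo_aligned_validate(const struct demo_pool *pool,
				  const struct demo_desc *desc)
{
	uint64_t len = desc->len; /* widen before any arithmetic */
	uint64_t addr, offset;

	if (!len)
		return false;

	/* Reject desc->addr < tx_metadata_len instead of wrapping. */
	if (__builtin_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
		return false;

	offset = addr & (pool->chunk_size - 1);

	/* All three terms fit in 32 bits, so the 64-bit sum cannot wrap. */
	if (offset + len + pool->tx_metadata_len > pool->chunk_size)
		return false;

	if (addr >= pool->addrs_cnt)
		return false;

	return true;
}
```

Feeding it a descriptor with len == U32_MAX or with addr below the
metadata length should be rejected, while a descriptor placed right
after the metadata at the start of a chunk should pass.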