* [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation
@ 2025-10-08 16:56 Alexander Lobakin
2025-10-09 14:02 ` Jason Xing
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Alexander Lobakin @ 2025-10-08 16:56 UTC (permalink / raw)
To: Magnus Karlsson, Maciej Fijalkowski
Cc: Alexander Lobakin, Stanislav Fomichev, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Kees Cook, nxne.cnse.osdt.itp.upstreaming, bpf,
netdev, linux-hardening, stable, linux-kernel
Turned out certain clearly invalid values passed in &xdp_desc from
userspace can pass xp_{,un}aligned_validate_desc() and then lead
to UBs or just invalid frames to be queued for xmit.
desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
can cause positive integer overflow and wraparound, the same way low
enough desc->addr with a non-zero pool->tx_metadata_len can cause
negative integer overflow. Both scenarios can then pass the
validation successfully.
This doesn't happen with valid XSk applications, but can be used
to perform attacks.
Always promote desc->len to ``u64`` first to exclude positive
overflows of it. Use explicit check_{add,sub}_overflow() when
validating desc->addr (which is ``u64`` already).
bloat-o-meter reports a little growth of the code size:
add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
Function old new delta
xskq_cons_peek_desc 299 330 +31
xsk_tx_peek_release_desc_batch 973 1002 +29
xsk_generic_xmit 3148 3132 -16
but hopefully this doesn't hurt the performance much.
Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
Cc: stable@vger.kernel.org # 6.8+
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++----------
1 file changed, 35 insertions(+), 10 deletions(-)
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index f16f390370dc..1eb8d9f8b104 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options)
static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
- u64 addr = desc->addr - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
- u64 offset = addr & (pool->chunk_size - 1);
+ u64 len = desc->len;
+ u64 addr, offset;
- if (!desc->len)
+ if (!len)
return false;
- if (offset + len > pool->chunk_size)
+ /* Can overflow if desc->addr < pool->tx_metadata_len */
+ if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
+ return false;
+
+ offset = addr & (pool->chunk_size - 1);
+
+ /*
+ * Can't overflow: @offset is guaranteed to be < ``U32_MAX``
+ * (pool->chunk_size is ``u32``), @len is guaranteed
+ * to be <= ``U32_MAX``.
+ */
+ if (offset + len + pool->tx_metadata_len > pool->chunk_size)
return false;
if (addr >= pool->addrs_cnt)
@@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
if (xp_unused_options_set(desc->options))
return false;
+
return true;
}
static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
- u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
+ u64 len = desc->len;
+ u64 addr, end;
- if (!desc->len)
+ if (!len)
return false;
+ /* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */
+ len += pool->tx_metadata_len;
if (len > pool->chunk_size)
return false;
- if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt ||
- xp_desc_crosses_non_contig_pg(pool, addr, len))
+ /* Can overflow if desc->addr is close to 0 */
+ if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr),
+ pool->tx_metadata_len, &addr))
+ return false;
+
+ if (addr >= pool->addrs_cnt)
+ return false;
+
+ /* Can overflow if pool->addrs_cnt is high enough */
+ if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt)
+ return false;
+
+ if (xp_desc_crosses_non_contig_pg(pool, addr, len))
return false;
if (xp_unused_options_set(desc->options))
return false;
+
return true;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation
2025-10-08 16:56 [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation Alexander Lobakin
@ 2025-10-09 14:02 ` Jason Xing
2025-10-09 14:50 ` Alexander Lobakin
2025-10-09 14:27 ` Maciej Fijalkowski
2025-10-10 17:10 ` patchwork-bot+netdevbpf
2 siblings, 1 reply; 7+ messages in thread
From: Jason Xing @ 2025-10-09 14:02 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Magnus Karlsson, Maciej Fijalkowski, Stanislav Fomichev,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Kees Cook,
nxne.cnse.osdt.itp.upstreaming, bpf, netdev, linux-hardening,
stable, linux-kernel
On Thu, Oct 9, 2025 at 12:59 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> Turned out certain clearly invalid values passed in &xdp_desc from
> userspace can pass xp_{,un}aligned_validate_desc() and then lead
> to UBs or just invalid frames to be queued for xmit.
>
> desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
> can cause positive integer overflow and wraparound, the same way low
> enough desc->addr with a non-zero pool->tx_metadata_len can cause
> negative integer overflow. Both scenarios can then pass the
> validation successfully.
> This doesn't happen with valid XSk applications, but can be used
> to perform attacks.
>
> Always promote desc->len to ``u64`` first to exclude positive
> overflows of it. Use explicit check_{add,sub}_overflow() when
> validating desc->addr (which is ``u64`` already).
>
> bloat-o-meter reports a little growth of the code size:
>
> add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
> Function old new delta
> xskq_cons_peek_desc 299 330 +31
> xsk_tx_peek_release_desc_batch 973 1002 +29
> xsk_generic_xmit 3148 3132 -16
>
> but hopefully this doesn't hurt the performance much.
I don't see an evident point that might affect the performance. Since
you said that, I tested by running './xdpsock -i eth1 -t -S -s 64' and
didn't spot any degradation.
>
> Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
> Cc: stable@vger.kernel.org # 6.8+
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Thanks for the fix!
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 35 insertions(+), 10 deletions(-)
>
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index f16f390370dc..1eb8d9f8b104 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options)
> static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
> struct xdp_desc *desc)
> {
> - u64 addr = desc->addr - pool->tx_metadata_len;
> - u64 len = desc->len + pool->tx_metadata_len;
> - u64 offset = addr & (pool->chunk_size - 1);
> + u64 len = desc->len;
> + u64 addr, offset;
>
> - if (!desc->len)
> + if (!len)
> return false;
>
> - if (offset + len > pool->chunk_size)
> + /* Can overflow if desc->addr < pool->tx_metadata_len */
> + if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
> + return false;
> +
> + offset = addr & (pool->chunk_size - 1);
> +
> + /*
> + * Can't overflow: @offset is guaranteed to be < ``U32_MAX``
> + * (pool->chunk_size is ``u32``), @len is guaranteed
> + * to be <= ``U32_MAX``.
> + */
> + if (offset + len + pool->tx_metadata_len > pool->chunk_size)
> return false;
>
> if (addr >= pool->addrs_cnt)
> @@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
>
> if (xp_unused_options_set(desc->options))
> return false;
> +
nit?
> return true;
> }
>
> static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
> struct xdp_desc *desc)
> {
> - u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
> - u64 len = desc->len + pool->tx_metadata_len;
> + u64 len = desc->len;
> + u64 addr, end;
>
> - if (!desc->len)
> + if (!len)
> return false;
>
> + /* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */
> + len += pool->tx_metadata_len;
> if (len > pool->chunk_size)
> return false;
>
> - if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt ||
> - xp_desc_crosses_non_contig_pg(pool, addr, len))
> + /* Can overflow if desc->addr is close to 0 */
> + if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr),
> + pool->tx_metadata_len, &addr))
> + return false;
> +
> + if (addr >= pool->addrs_cnt)
> + return false;
> +
> + /* Can overflow if pool->addrs_cnt is high enough */
> + if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt)
> + return false;
> +
> + if (xp_desc_crosses_non_contig_pg(pool, addr, len))
> return false;
>
> if (xp_unused_options_set(desc->options))
> return false;
> +
> return true;
> }
>
> --
> 2.51.0
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation
2025-10-08 16:56 [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation Alexander Lobakin
2025-10-09 14:02 ` Jason Xing
@ 2025-10-09 14:27 ` Maciej Fijalkowski
2025-10-09 15:05 ` Alexander Lobakin
2025-10-10 6:51 ` Jason Xing
2025-10-10 17:10 ` patchwork-bot+netdevbpf
2 siblings, 2 replies; 7+ messages in thread
From: Maciej Fijalkowski @ 2025-10-09 14:27 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Magnus Karlsson, Stanislav Fomichev, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Kees Cook, nxne.cnse.osdt.itp.upstreaming, bpf,
netdev, linux-hardening, stable, linux-kernel
On Wed, Oct 08, 2025 at 06:56:59PM +0200, Alexander Lobakin wrote:
> Turned out certain clearly invalid values passed in &xdp_desc from
> userspace can pass xp_{,un}aligned_validate_desc() and then lead
> to UBs or just invalid frames to be queued for xmit.
>
> desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
> can cause positive integer overflow and wraparound, the same way low
> enough desc->addr with a non-zero pool->tx_metadata_len can cause
> negative integer overflow. Both scenarios can then pass the
> validation successfully.
Hmm, when underflow happens the addr would be enormous, passing
existing validation would really be rare. However let us fix it while at
it.
> This doesn't happen with valid XSk applications, but can be used
> to perform attacks.
>
> Always promote desc->len to ``u64`` first to exclude positive
> overflows of it. Use explicit check_{add,sub}_overflow() when
> validating desc->addr (which is ``u64`` already).
>
> bloat-o-meter reports a little growth of the code size:
>
> add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
> Function old new delta
> xskq_cons_peek_desc 299 330 +31
> xsk_tx_peek_release_desc_batch 973 1002 +29
> xsk_generic_xmit 3148 3132 -16
>
> but hopefully this doesn't hurt the performance much.
Let us be fully transparent and link the previous discussion here?
I was commenting that breaking up single statement to multiple branches
might affect subtly performance as this code is executed per each
descriptor. Jason tested copy+aligned mode, let us see if zc+unaligned
mode is affected.
<rant>
I am also thinking about test side, but xsk tx metadata came with a
separate test (xdp_hw_metadata), which was rather about testing positive
cases. That is probably a separate discussion, but metadata negative
tests should appear somewhere, I suppose xskxceiver would be a good fit,
but then, should we merge the existing logic from xdp_hw_metadata?
</rant>
>
> Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
> Cc: stable@vger.kernel.org # 6.8+
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
> net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 35 insertions(+), 10 deletions(-)
>
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index f16f390370dc..1eb8d9f8b104 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options)
> static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
> struct xdp_desc *desc)
> {
> - u64 addr = desc->addr - pool->tx_metadata_len;
> - u64 len = desc->len + pool->tx_metadata_len;
> - u64 offset = addr & (pool->chunk_size - 1);
> + u64 len = desc->len;
> + u64 addr, offset;
>
> - if (!desc->len)
> + if (!len)
This is yet another thing being fixed here as for non-zero tx_metadata_len
we were allowing 0 length descriptors... :< overall feels like we relied
too much on contract with userspace WRT descriptor layout.
If zc perf is fine, then:
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> return false;
>
> - if (offset + len > pool->chunk_size)
> + /* Can overflow if desc->addr < pool->tx_metadata_len */
> + if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
> + return false;
> +
> + offset = addr & (pool->chunk_size - 1);
> +
> + /*
> + * Can't overflow: @offset is guaranteed to be < ``U32_MAX``
> + * (pool->chunk_size is ``u32``), @len is guaranteed
> + * to be <= ``U32_MAX``.
> + */
> + if (offset + len + pool->tx_metadata_len > pool->chunk_size)
> return false;
>
> if (addr >= pool->addrs_cnt)
> @@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
>
> if (xp_unused_options_set(desc->options))
> return false;
> +
> return true;
> }
>
> static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
> struct xdp_desc *desc)
> {
> - u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
> - u64 len = desc->len + pool->tx_metadata_len;
> + u64 len = desc->len;
> + u64 addr, end;
>
> - if (!desc->len)
> + if (!len)
> return false;
>
> + /* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */
> + len += pool->tx_metadata_len;
> if (len > pool->chunk_size)
> return false;
>
> - if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt ||
> - xp_desc_crosses_non_contig_pg(pool, addr, len))
> + /* Can overflow if desc->addr is close to 0 */
> + if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr),
> + pool->tx_metadata_len, &addr))
> + return false;
> +
> + if (addr >= pool->addrs_cnt)
> + return false;
> +
> + /* Can overflow if pool->addrs_cnt is high enough */
> + if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt)
> + return false;
> +
> + if (xp_desc_crosses_non_contig_pg(pool, addr, len))
> return false;
>
> if (xp_unused_options_set(desc->options))
> return false;
> +
> return true;
> }
>
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation
2025-10-09 14:02 ` Jason Xing
@ 2025-10-09 14:50 ` Alexander Lobakin
0 siblings, 0 replies; 7+ messages in thread
From: Alexander Lobakin @ 2025-10-09 14:50 UTC (permalink / raw)
To: Jason Xing
Cc: Magnus Karlsson, Maciej Fijalkowski, Stanislav Fomichev,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Kees Cook,
nxne.cnse.osdt.itp.upstreaming, bpf, netdev, linux-hardening,
stable, linux-kernel
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Thu, 9 Oct 2025 22:02:23 +0800
> On Thu, Oct 9, 2025 at 12:59 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> Turned out certain clearly invalid values passed in &xdp_desc from
>> userspace can pass xp_{,un}aligned_validate_desc() and then lead
>> to UBs or just invalid frames to be queued for xmit.
>>
>> desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
>> can cause positive integer overflow and wraparound, the same way low
>> enough desc->addr with a non-zero pool->tx_metadata_len can cause
>> negative integer overflow. Both scenarios can then pass the
>> validation successfully.
>> This doesn't happen with valid XSk applications, but can be used
>> to perform attacks.
>>
>> Always promote desc->len to ``u64`` first to exclude positive
>> overflows of it. Use explicit check_{add,sub}_overflow() when
>> validating desc->addr (which is ``u64`` already).
>>
>> bloat-o-meter reports a little growth of the code size:
>>
>> add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
>> Function old new delta
>> xskq_cons_peek_desc 299 330 +31
>> xsk_tx_peek_release_desc_batch 973 1002 +29
>> xsk_generic_xmit 3148 3132 -16
>>
>> but hopefully this doesn't hurt the performance much.
>
> I don't see an evident point that might affect the performance. Since
> you said that, I tested by running './xdpsock -i eth1 -t -S -s 64' and
> didn't spot any degradation.
Thanks for testing!
>
>>
>> Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
>> Cc: stable@vger.kernel.org # 6.8+
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>
> Thanks for the fix!
>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
>
>> ---
>> net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++----------
>> 1 file changed, 35 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
>> index f16f390370dc..1eb8d9f8b104 100644
>> --- a/net/xdp/xsk_queue.h
>> +++ b/net/xdp/xsk_queue.h
>> @@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options)
>> static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
>> struct xdp_desc *desc)
>> {
>> - u64 addr = desc->addr - pool->tx_metadata_len;
>> - u64 len = desc->len + pool->tx_metadata_len;
>> - u64 offset = addr & (pool->chunk_size - 1);
>> + u64 len = desc->len;
>> + u64 addr, offset;
>>
>> - if (!desc->len)
>> + if (!len)
>> return false;
>>
>> - if (offset + len > pool->chunk_size)
>> + /* Can overflow if desc->addr < pool->tx_metadata_len */
>> + if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
>> + return false;
>> +
>> + offset = addr & (pool->chunk_size - 1);
>> +
>> + /*
>> + * Can't overflow: @offset is guaranteed to be < ``U32_MAX``
>> + * (pool->chunk_size is ``u32``), @len is guaranteed
>> + * to be <= ``U32_MAX``.
>> + */
>> + if (offset + len + pool->tx_metadata_len > pool->chunk_size)
>> return false;
>>
>> if (addr >= pool->addrs_cnt)
>> @@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
>>
>> if (xp_unused_options_set(desc->options))
>> return false;
>> +
>
> nit?
Yes, probably doesn't fit well for this particular fix. I have no strong
preference and can remove it if the community wishes.
>
>> return true;
>> }
>>
>> static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
>> struct xdp_desc *desc)
>> {
>> - u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
>> - u64 len = desc->len + pool->tx_metadata_len;
>> + u64 len = desc->len;
>> + u64 addr, end;
>>
>> - if (!desc->len)
>> + if (!len)
>> return false;
>>
>> + /* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */
>> + len += pool->tx_metadata_len;
>> if (len > pool->chunk_size)
>> return false;
>>
>> - if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt ||
>> - xp_desc_crosses_non_contig_pg(pool, addr, len))
>> + /* Can overflow if desc->addr is close to 0 */
>> + if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr),
>> + pool->tx_metadata_len, &addr))
>> + return false;
>> +
>> + if (addr >= pool->addrs_cnt)
>> + return false;
>> +
>> + /* Can overflow if pool->addrs_cnt is high enough */
>> + if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt)
>> + return false;
>> +
>> + if (xp_desc_crosses_non_contig_pg(pool, addr, len))
>> return false;
>>
>> if (xp_unused_options_set(desc->options))
>> return false;
>> +
Same here.
>> return true;
>> }
Thanks,
Olek
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation
2025-10-09 14:27 ` Maciej Fijalkowski
@ 2025-10-09 15:05 ` Alexander Lobakin
2025-10-10 6:51 ` Jason Xing
1 sibling, 0 replies; 7+ messages in thread
From: Alexander Lobakin @ 2025-10-09 15:05 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Magnus Karlsson, Stanislav Fomichev, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Kees Cook, nxne.cnse.osdt.itp.upstreaming, bpf,
netdev, linux-hardening, stable, linux-kernel
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Thu, 9 Oct 2025 16:27:50 +0200
> On Wed, Oct 08, 2025 at 06:56:59PM +0200, Alexander Lobakin wrote:
>> Turned out certain clearly invalid values passed in &xdp_desc from
>> userspace can pass xp_{,un}aligned_validate_desc() and then lead
>> to UBs or just invalid frames to be queued for xmit.
>>
>> desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
>> can cause positive integer overflow and wraparound, the same way low
>> enough desc->addr with a non-zero pool->tx_metadata_len can cause
>> negative integer overflow. Both scenarios can then pass the
>> validation successfully.
>
> Hmm, when underflow happens the addr would be enormous, passing
> existing validation would really be rare. However let us fix it while at
> it.
It depends on how big pool->addrs_cnt can be. I haven't dug deep into
the internals, is this value also userspace-supplied or generated by the
core code and is always valid?
Also see below (xp_aligned_validate_desc()).
>
>> This doesn't happen with valid XSk applications, but can be used
>> to perform attacks.
>>
>> Always promote desc->len to ``u64`` first to exclude positive
>> overflows of it. Use explicit check_{add,sub}_overflow() when
>> validating desc->addr (which is ``u64`` already).
>>
>> bloat-o-meter reports a little growth of the code size:
>>
>> add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
>> Function old new delta
>> xskq_cons_peek_desc 299 330 +31
>> xsk_tx_peek_release_desc_batch 973 1002 +29
>> xsk_generic_xmit 3148 3132 -16
>>
>> but hopefully this doesn't hurt the performance much.
>
> Let us be fully transparent and link the previous discussion here?
As per a quick discussion with the maintainers yesterday, we would like
to not mention FSB-sponsored code/companies in any way...
>
> I was commenting that breaking up single statement to multiple branches
> might affect subtly performance as this code is executed per each
The compilers successfully merge such stuff.
The only overhead introduced is the calls to
__builtin_{add,sub}_overflow(), they are definitely inlined (compiler
intrinsics), also check_{add,sub}_overflow() are wrapped with
unlikely(), but still may take a couple instructions.
> descriptor. Jason tested copy+aligned mode, let us see if zc+unaligned
> mode is affected.
>
> <rant>
> I am also thinking about test side, but xsk tx metadata came with a
> separate test (xdp_hw_metadata), which was rather about testing positive
> cases. That is probably a separate discussion, but metadata negative
> tests should appear somewhere, I suppose xskxceiver would be a good fit,
> but then, should we merge the existing logic from xdp_hw_metadata?
> </rant>
I'd love to write a test that would prove that invalid descriptors are
successfully rejected, but rather separately from this particular fix.
[...]
>> @@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options)
>> static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
>> struct xdp_desc *desc)
>> {
>> - u64 addr = desc->addr - pool->tx_metadata_len;
>> - u64 len = desc->len + pool->tx_metadata_len;
>> - u64 offset = addr & (pool->chunk_size - 1);
>> + u64 len = desc->len;
>> + u64 addr, offset;
>>
>> - if (!desc->len)
>> + if (!len)
>
> This is yet another thing being fixed here as for non-zero tx_metadata_len
> we were allowing 0 length descriptors... :< overall feels like we relied
> too much on contract with userspace WRT descriptor layout.
>
> If zc perf is fine, then:
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>
>> return false;
>>
>> - if (offset + len > pool->chunk_size)
>> + /* Can overflow if desc->addr < pool->tx_metadata_len */
>> + if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
>> + return false;
>> +
>> + offset = addr & (pool->chunk_size - 1);
If there's an overflow and @addr went crazy, @offset can still be valid
as it's capped by pool->chunk_size here.
>> +
>> + /*
>> + * Can't overflow: @offset is guaranteed to be < ``U32_MAX``
>> + * (pool->chunk_size is ``u32``), @len is guaranteed
>> + * to be <= ``U32_MAX``.
>> + */
>> + if (offset + len + pool->tx_metadata_len > pool->chunk_size)
>> return false;
>>
>> if (addr >= pool->addrs_cnt)
But if pool->addrs_cnt is always valid, insanely big @addr would be
rejected here, right.
Thanks,
Olek
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation
2025-10-09 14:27 ` Maciej Fijalkowski
2025-10-09 15:05 ` Alexander Lobakin
@ 2025-10-10 6:51 ` Jason Xing
1 sibling, 0 replies; 7+ messages in thread
From: Jason Xing @ 2025-10-10 6:51 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Alexander Lobakin, Magnus Karlsson, Stanislav Fomichev,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Kees Cook,
nxne.cnse.osdt.itp.upstreaming, bpf, netdev, linux-hardening,
stable, linux-kernel
On Thu, Oct 9, 2025 at 10:28 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Wed, Oct 08, 2025 at 06:56:59PM +0200, Alexander Lobakin wrote:
> > Turned out certain clearly invalid values passed in &xdp_desc from
> > userspace can pass xp_{,un}aligned_validate_desc() and then lead
> > to UBs or just invalid frames to be queued for xmit.
> >
> > desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
> > can cause positive integer overflow and wraparound, the same way low
> > enough desc->addr with a non-zero pool->tx_metadata_len can cause
> > negative integer overflow. Both scenarios can then pass the
> > validation successfully.
>
> Hmm, when underflow happens the addr would be enormous, passing
> existing validation would really be rare. However let us fix it while at
> it.
>
> > This doesn't happen with valid XSk applications, but can be used
> > to perform attacks.
> >
> > Always promote desc->len to ``u64`` first to exclude positive
> > overflows of it. Use explicit check_{add,sub}_overflow() when
> > validating desc->addr (which is ``u64`` already).
> >
> > bloat-o-meter reports a little growth of the code size:
> >
> > add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
> > Function old new delta
> > xskq_cons_peek_desc 299 330 +31
> > xsk_tx_peek_release_desc_batch 973 1002 +29
> > xsk_generic_xmit 3148 3132 -16
> >
> > but hopefully this doesn't hurt the performance much.
>
> Let us be fully transparent and link the previous discussion here?
>
> I was commenting that breaking up single statement to multiple branches
> might affect subtly performance as this code is executed per each
> descriptor. Jason tested copy+aligned mode, let us see if zc+unaligned
> mode is affected.
>
> <rant>
> I am also thinking about test side, but xsk tx metadata came with a
> separate test (xdp_hw_metadata), which was rather about testing positive
> cases. That is probably a separate discussion, but metadata negative
> tests should appear somewhere, I suppose xskxceiver would be a good fit,
> but then, should we merge the existing logic from xdp_hw_metadata?
> </rant>
>
> >
> > Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
> > Cc: stable@vger.kernel.org # 6.8+
> > Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> > ---
> > net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++----------
> > 1 file changed, 35 insertions(+), 10 deletions(-)
> >
> > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > index f16f390370dc..1eb8d9f8b104 100644
> > --- a/net/xdp/xsk_queue.h
> > +++ b/net/xdp/xsk_queue.h
> > @@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options)
> > static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
> > struct xdp_desc *desc)
> > {
> > - u64 addr = desc->addr - pool->tx_metadata_len;
> > - u64 len = desc->len + pool->tx_metadata_len;
> > - u64 offset = addr & (pool->chunk_size - 1);
> > + u64 len = desc->len;
> > + u64 addr, offset;
> >
> > - if (!desc->len)
> > + if (!len)
>
> This is yet another thing being fixed here as for non-zero tx_metadata_len
> we were allowing 0 length descriptors... :< overall feels like we relied
> too much on contract with userspace WRT descriptor layout.
>
> If zc perf is fine, then:
Testing on IXGBE that can reach 10Gb/sec line rate, I didn't see any
impact either. This time I used -u to cover the unaligned case.
What I did was just like below:
# sysctl -w vm.nr_hugepages=1024
# taskset -c 1 ./xdpsock -i eth1 -t -z -u -s 64
Thanks,
Jason
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>
> > return false;
> >
> > - if (offset + len > pool->chunk_size)
> > + /* Can overflow if desc->addr < pool->tx_metadata_len */
> > + if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr))
> > + return false;
> > +
> > + offset = addr & (pool->chunk_size - 1);
> > +
> > + /*
> > + * Can't overflow: @offset is guaranteed to be < ``U32_MAX``
> > + * (pool->chunk_size is ``u32``), @len is guaranteed
> > + * to be <= ``U32_MAX``.
> > + */
> > + if (offset + len + pool->tx_metadata_len > pool->chunk_size)
> > return false;
> >
> > if (addr >= pool->addrs_cnt)
> > @@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
> >
> > if (xp_unused_options_set(desc->options))
> > return false;
> > +
> > return true;
> > }
> >
> > static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
> > struct xdp_desc *desc)
> > {
> > - u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
> > - u64 len = desc->len + pool->tx_metadata_len;
> > + u64 len = desc->len;
> > + u64 addr, end;
> >
> > - if (!desc->len)
> > + if (!len)
> > return false;
> >
> > + /* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */
> > + len += pool->tx_metadata_len;
> > if (len > pool->chunk_size)
> > return false;
> >
> > - if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt ||
> > - xp_desc_crosses_non_contig_pg(pool, addr, len))
> > + /* Can overflow if desc->addr is close to 0 */
> > + if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr),
> > + pool->tx_metadata_len, &addr))
> > + return false;
> > +
> > + if (addr >= pool->addrs_cnt)
> > + return false;
> > +
> > + /* Can overflow if pool->addrs_cnt is high enough */
> > + if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt)
> > + return false;
> > +
> > + if (xp_desc_crosses_non_contig_pg(pool, addr, len))
> > return false;
> >
> > if (xp_unused_options_set(desc->options))
> > return false;
> > +
> > return true;
> > }
> >
> > --
> > 2.51.0
> >
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation
2025-10-08 16:56 [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation Alexander Lobakin
2025-10-09 14:02 ` Jason Xing
2025-10-09 14:27 ` Maciej Fijalkowski
@ 2025-10-10 17:10 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-10-10 17:10 UTC (permalink / raw)
To: Alexander Lobakin
Cc: magnus.karlsson, maciej.fijalkowski, sdf, ast, daniel, hawk,
john.fastabend, davem, edumazet, kuba, pabeni, horms, kees,
nxne.cnse.osdt.itp.upstreaming, bpf, netdev, linux-hardening,
stable, linux-kernel
Hello:
This patch was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Wed, 8 Oct 2025 18:56:59 +0200 you wrote:
> Turned out certain clearly invalid values passed in &xdp_desc from
> userspace can pass xp_{,un}aligned_validate_desc() and then lead
> to UBs or just invalid frames to be queued for xmit.
>
> desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
> can cause positive integer overflow and wraparound, the same way low
> enough desc->addr with a non-zero pool->tx_metadata_len can cause
> negative integer overflow. Both scenarios can then pass the
> validation successfully.
> This doesn't happen with valid XSk applications, but can be used
> to perform attacks.
>
> [...]
Here is the summary with links:
- [bpf] xsk: harden userspace-supplied &xdp_desc validation
https://git.kernel.org/bpf/bpf/c/07ca98f906a4
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-10-10 17:10 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-08 16:56 [PATCH bpf] xsk: harden userspace-supplied &xdp_desc validation Alexander Lobakin
2025-10-09 14:02 ` Jason Xing
2025-10-09 14:50 ` Alexander Lobakin
2025-10-09 14:27 ` Maciej Fijalkowski
2025-10-09 15:05 ` Alexander Lobakin
2025-10-10 6:51 ` Jason Xing
2025-10-10 17:10 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).