Linux io-uring development
* [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations
@ 2026-06-07 11:41 Federico Brasili
  2026-06-07 19:07 ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Federico Brasili @ 2026-06-07 11:41 UTC
  To: io-uring; +Cc: linux-kernel

Hi,

I found a reproducible io_uring provided-buffer ring issue on Ubuntu
kernel 7.0.0-22-generic.

A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
ring can persistently shrink the user-visible buffer descriptor
length. The modified length is not rolled back when the receive fails
with -EAGAIN/no data, and a later unrelated io_uring operation, such
as IORING_OP_READ from a pipe, consumes the corrupted length.

This is not a demonstrated privilege escalation. The demonstrated
impact is deterministic unprivileged provided-buffer ring metadata
corruption across unrelated io_uring operations.

Tested kernel:

Linux ubuntu 7.0.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Mon May
25 15:54:34 UTC 2026 x86_64 GNU/Linux

Summary:

1. Create an io_uring instance as an unprivileged user.

2. Register a non-INC provided-buffer ring with two buffers:

   entry0.len = 4096
   entry1.len = 4096

3. Submit IORING_OP_RECV with:

   IOSQE_BUFFER_SELECT
   IORING_RECVSEND_BUNDLE
   req_len = 1
   MSG_DONTWAIT
   empty AF_UNIX SOCK_DGRAM socket

4. The receive fails with -EAGAIN, but entry0.len is changed from 4096 to 1.

5. Submit a later unrelated IORING_OP_READ from a pipe using the same
   provided-buffer group with req_len = 4096.

6. The READ returns only 1 byte, because it uses the previously corrupted
   entry0.len.

7. A second READ then consumes entry1 normally and returns 4096 bytes,
   showing that head/bid accounting remains coherent and the corruption
   is localized to the poisoned descriptor.

Observed output from clean unprivileged reproduction:

[INIT] uid=1002 entry0.len=4096 entry1.len=4096 tail=2
[STEP1] RECV BUNDLE on empty socket, req_len=1, expected CQE=-EAGAIN
[CQE_RECV_BUNDLE] res=-11 flags=0x0 user=0x1111
[AFTER_RECV_BUNDLE] entry0.len=1 entry1.len=4096 changed_buf0=0 changed_buf1=0 guard_before=0 guard_after=0
[STEP2] write pipe bytes=4096, then IORING_OP_READ req_len=4096 using same pbuf group
[CQE_READ1] res=1 flags=0x1 user=0x6666
[AFTER_READ1] entry0.len=1 entry1.len=4096 changed_buf0=1 changed_buf1=0 guard_before=0 guard_after=0
[STEP3] write second pipe bytes=4096, then second IORING_OP_READ req_len=4096 without republish
[CQE_READ2] res=4096 flags=0x10001 user=0x7777
[AFTER_READ2] entry0.len=1 entry1.len=4096 changed_buf0=1 changed_buf1=4096 guard_before=0 guard_after=0
[RESULT] PASS: unprivileged RECV_BUNDLE -EAGAIN poisoned pbuf len and later IORING_OP_READ consumed the corrupted len.
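
For reading the flags values in the output above: in the io_uring UAPI,
bit 0 of cqe->flags is IORING_CQE_F_BUFFER and the selected buffer ID
sits in the upper 16 bits (IORING_CQE_BUFFER_SHIFT is 16). A minimal
sketch of that decoding follows; the macro and helper names are local
stand-ins chosen for illustration, not liburing functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the UAPI values from linux/io_uring.h. */
#define CQE_F_BUFFER		(1U << 0)	/* IORING_CQE_F_BUFFER */
#define CQE_BUFFER_SHIFT	16		/* IORING_CQE_BUFFER_SHIFT */

/* Hypothetical helper: true if the CQE reports a selected buffer. */
static bool cqe_has_buffer(uint32_t flags)
{
	return (flags & CQE_F_BUFFER) != 0;
}

/* Hypothetical helper: buffer ID stored in the upper 16 bits of flags. */
static unsigned cqe_buffer_id(uint32_t flags)
{
	return flags >> CQE_BUFFER_SHIFT;
}
```

Read this way, flags=0x0 on the failed RECV means no buffer was
reported, flags=0x1 on the first READ means bid 0, and flags=0x10001 on
the second READ means bid 1.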

Why this looks like a bug:

The failed receive should not persistently alter the provided-buffer
descriptor in a way that affects future unrelated operations. In this
case, a no-data/-EAGAIN RECV_BUNDLE changes entry0.len from 4096 to 1,
and that corrupted length is later consumed by IORING_OP_READ from a
pipe.

The suspected root cause is in the non-INC provided-buffer ring BUNDLE
selection path:

io_ring_buffers_peek():

	if (len > arg->max_len) {
		len = arg->max_len;
		if (!(bl->flags & IOBL_INC)) {
			arg->partial_map = 1;
			if (iov != arg->iovs)
				break;
			WRITE_ONCE(buf->len, len);
		}
	}

The descriptor length is modified during buffer selection/peek before
the receive operation has completed successfully. If the receive later
fails with -EAGAIN/no data, the buffer is recycled but the modified
buf->len is not restored.
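
The sequence can be modelled in plain userspace C. This is only a sketch
of the behavior described above, not kernel code: struct model_buf,
model_peek() and model_failed_recv() are hypothetical names, and the
model keeps just the clamp and the missing rollback.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for one provided-buffer ring entry (hypothetical). */
struct model_buf {
	uint32_t len;
};

/*
 * Models the non-INC clamp from the snippet above: when the entry is
 * longer than the request, the shared descriptor itself is shortened.
 */
static uint32_t model_peek(struct model_buf *buf, uint32_t max_len)
{
	uint32_t len = buf->len;

	if (len > max_len) {
		len = max_len;
		buf->len = len;	/* side effect outlives this request */
	}
	return len;
}

/*
 * Models a request that selects a buffer and then fails with -EAGAIN.
 * Nothing here restores buf->len, mirroring the reported behavior.
 */
static int model_failed_recv(struct model_buf *buf, uint32_t req_len)
{
	(void)model_peek(buf, req_len);
	return -EAGAIN;
}
```

Starting from len = 4096, one model_failed_recv() with req_len = 1
leaves len at 1, so a later model_peek() with max_len = 4096 can only
return 1, which matches the 1-byte READ in the output.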

Additional observations:

The issue reproduces as an unprivileged user.

The effect crosses io_uring operations: RECV affects a later READ.

The effect crosses subsystems: socket receive affects pipe read.

The second READ correctly uses entry1 and returns 4096 bytes, so this
does not appear to be a head/bid desync in the tested case.

No kernel crash, OOB write, UAF, or privilege escalation has been demonstrated.

Expected behavior:

If IORING_RECVSEND_BUNDLE fails with -EAGAIN/no data, the
provided-buffer ring descriptor should not be persistently modified,
or the original len should be restored during recycle/rollback.

Actual behavior:

The failed BUNDLE receive leaves entry0.len shortened to the requested
length, and later unrelated operations using the same provided-buffer
group consume that corrupted length.

I can provide the minimal C reproducer and full output if useful.

Thanks,
Federico

* Re: [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations
  2026-06-07 11:41 [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations Federico Brasili
@ 2026-06-07 19:07 ` Jens Axboe
  2026-06-07 20:08   ` Federico Brasili
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2026-06-07 19:07 UTC
  To: Federico Brasili, io-uring; +Cc: linux-kernel

On 6/7/26 5:41 AM, Federico Brasili wrote:
> Hi,
> 
> I found a reproducible io_uring provided-buffer ring issue on Ubuntu
> kernel 7.0.0-22-generic.
> 
> A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
> ring can persistently shrink the user-visible buffer descriptor
> length. The modified length is not rolled back when the receive fails
> with -EAGAIN/no data, and a later unrelated io_uring operation, such
> as IORING_OP_READ from a pipe, consumes the corrupted length.
> 
> This is not a demonstrated privilege escalation. The demonstrated
> impact is deterministic unprivileged provided-buffer ring metadata
> corruption across unrelated io_uring operations.
> 
> Tested kernel:
> 
> Linux ubuntu 7.0.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Mon May
> 25 15:54:34 UTC 2026 x86_64 GNU/Linux
> 
> Summary:
> 
> Create an io_uring instance as an unprivileged user.
> 
> Register a non-INC provided-buffer ring with two buffers:
> 
> entry0.len = 4096
> 
> entry1.len = 4096
> 
> Submit IORING_OP_RECV with:
> 
> IOSQE_BUFFER_SELECT
> 
> IORING_RECVSEND_BUNDLE
> 
> req_len = 1
> 
> MSG_DONTWAIT
> 
> empty AF_UNIX SOCK_DGRAM socket
> 
> The receive fails with -EAGAIN, but entry0.len is changed from 4096 to 1.
> 
> Submit a later unrelated IORING_OP_READ from a pipe using the same
> provided-buffer group with req_len = 4096.
> 
> The READ returns only 1 byte, because it uses the previously corrupted
> entry0.len.
> 
> A second READ then consumes entry1 normally and returns 4096 bytes,
> showing that head/bid accounting remains coherent and the corruption
> is localized to the poisoned descriptor.
> 
> Observed output from clean unprivileged reproduction:
> 
> [INIT] uid=1002 entry0.len=4096 entry1.len=4096 tail=2
> [STEP1] RECV BUNDLE on empty socket, req_len=1, expected CQE=-EAGAIN
> [CQE_RECV_BUNDLE] res=-11 flags=0x0 user=0x1111
> [AFTER_RECV_BUNDLE] entry0.len=1 entry1.len=4096 changed_buf0=0
> changed_buf1=0 guard_before=0 guard_after=0
> [STEP2] write pipe bytes=4096, then IORING_OP_READ req_len=4096 using
> same pbuf group
> [CQE_READ1] res=1 flags=0x1 user=0x6666
> [AFTER_READ1] entry0.len=1 entry1.len=4096 changed_buf0=1
> changed_buf1=0 guard_before=0 guard_after=0
> [STEP3] write second pipe bytes=4096, then second IORING_OP_READ
> req_len=4096 without republish
> [CQE_READ2] res=4096 flags=0x10001 user=0x7777
> [AFTER_READ2] entry0.len=1 entry1.len=4096 changed_buf0=1
> changed_buf1=4096 guard_before=0 guard_after=0
> [RESULT] PASS: unprivileged RECV_BUNDLE -EAGAIN poisoned pbuf len and
> later IORING_OP_READ consumed the corrupted len.
> 
> Why this looks like a bug:
> 
> The failed receive should not persistently alter the provided-buffer
> descriptor in a way that affects future unrelated operations. In this
> case, a no-data/-EAGAIN RECV_BUNDLE changes entry0.len from 4096 to 1,
> and that corrupted length is later consumed by IORING_OP_READ from a
> pipe.
> 
> The suspected root cause is in the non-INC provided-buffer ring BUNDLE
> selection path:
> 
> io_ring_buffers_peek():
>
> 	if (len > arg->max_len) {
> 		len = arg->max_len;
> 		if (!(bl->flags & IOBL_INC)) {
> 			arg->partial_map = 1;
> 			if (iov != arg->iovs)
> 				break;
> 			WRITE_ONCE(buf->len, len);
> 		}
> 	}
> 
> The descriptor length is modified during buffer selection/peek before
> the receive operation has completed successfully. If the receive later
> fails with -EAGAIN/no data, the buffer is recycled but the modified
> buf->len is not restored.
> 
> Additional observations:
> 
> The issue reproduces as an unprivileged user.
> 
> The effect crosses io_uring operations: RECV affects a later READ.
> 
> The effect crosses subsystems: socket receive affects pipe read.
> 
> The second READ correctly uses entry1 and returns 4096 bytes, so this
> does not appear to be a head/bid desync in the tested case.
> 
> No kernel crash, OOB write, UAF, or privilege escalation has been demonstrated.
> 
> Expected behavior:
> 
> If IORING_RECVSEND_BUNDLE fails with -EAGAIN/no data, the
> provided-buffer ring descriptor should not be persistently modified,
> or the original len should be restored during recycle/rollback.
> 
> Actual behavior:
> 
> The failed BUNDLE receive leaves entry0.len shortened to the requested
> length, and later unrelated operations using the same provided-buffer
> group consume that corrupted length.
> 
> I can provide the minimal C reproducer and full output if useful.

Please do, no point in me recreating one for it. Then it can also get
turned into a regression test for liburing. Reproducers also mean more
than a thousand words in an email: they tell us exactly what is being run
and what is going wrong. Or in some cases, what the wrong expectations
are.

-- 
Jens Axboe

* Re: [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations
  2026-06-07 19:07 ` Jens Axboe
@ 2026-06-07 20:08   ` Federico Brasili
  2026-06-07 21:22     ` Nyakundi Emmanuel
  2026-06-07 21:38     ` [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations Jens Axboe
  0 siblings, 2 replies; 9+ messages in thread
From: Federico Brasili @ 2026-06-07 20:08 UTC
  To: Jens Axboe; +Cc: io-uring, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5881 bytes --]

Hi Jens,

Sure, attaching the minimal reproducer and the output from my Ubuntu
7.0.0-22-generic test system.

The reproducer runs unprivileged and demonstrates:

1. non-INC provided-buffer ring with entry0.len = 4096 and entry1.len = 4096
2. IORING_OP_RECV + IOSQE_BUFFER_SELECT + IORING_RECVSEND_BUNDLE on an
empty SOCK_DGRAM socket
3. CQE returns -EAGAIN, but entry0.len is changed from 4096 to 1
4. a later unrelated IORING_OP_READ from a pipe using the same buffer
group returns 1 byte instead of 4096
5. a second READ uses entry1 and returns 4096, so head/bid accounting
appears coherent in this repro

I am not claiming privilege escalation from this. The demonstrated
issue is persistent provided-buffer descriptor length corruption after
a failed/no-data RECV_BUNDLE, affecting a later READ operation.

Thanks,
Federico

On Sun, 7 Jun 2026 at 21:07, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 6/7/26 5:41 AM, Federico Brasili wrote:
> > Hi,
> >
> > I found a reproducible io_uring provided-buffer ring issue on Ubuntu
> > kernel 7.0.0-22-generic.
> >
> > A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
> > ring can persistently shrink the user-visible buffer descriptor
> > length. The modified length is not rolled back when the receive fails
> > with -EAGAIN/no data, and a later unrelated io_uring operation, such
> > as IORING_OP_READ from a pipe, consumes the corrupted length.
> >
> > This is not a demonstrated privilege escalation. The demonstrated
> > impact is deterministic unprivileged provided-buffer ring metadata
> > corruption across unrelated io_uring operations.
> >
> > Tested kernel:
> >
> > Linux ubuntu 7.0.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Mon May
> > 25 15:54:34 UTC 2026 x86_64 GNU/Linux
> >
> > Summary:
> >
> > Create an io_uring instance as an unprivileged user.
> >
> > Register a non-INC provided-buffer ring with two buffers:
> >
> > entry0.len = 4096
> >
> > entry1.len = 4096
> >
> > Submit IORING_OP_RECV with:
> >
> > IOSQE_BUFFER_SELECT
> >
> > IORING_RECVSEND_BUNDLE
> >
> > req_len = 1
> >
> > MSG_DONTWAIT
> >
> > empty AF_UNIX SOCK_DGRAM socket
> >
> > The receive fails with -EAGAIN, but entry0.len is changed from 4096 to 1.
> >
> > Submit a later unrelated IORING_OP_READ from a pipe using the same
> > provided-buffer group with req_len = 4096.
> >
> > The READ returns only 1 byte, because it uses the previously corrupted
> > entry0.len.
> >
> > A second READ then consumes entry1 normally and returns 4096 bytes,
> > showing that head/bid accounting remains coherent and the corruption
> > is localized to the poisoned descriptor.
> >
> > Observed output from clean unprivileged reproduction:
> >
> > [INIT] uid=1002 entry0.len=4096 entry1.len=4096 tail=2
> > [STEP1] RECV BUNDLE on empty socket, req_len=1, expected CQE=-EAGAIN
> > [CQE_RECV_BUNDLE] res=-11 flags=0x0 user=0x1111
> > [AFTER_RECV_BUNDLE] entry0.len=1 entry1.len=4096 changed_buf0=0
> > changed_buf1=0 guard_before=0 guard_after=0
> > [STEP2] write pipe bytes=4096, then IORING_OP_READ req_len=4096 using
> > same pbuf group
> > [CQE_READ1] res=1 flags=0x1 user=0x6666
> > [AFTER_READ1] entry0.len=1 entry1.len=4096 changed_buf0=1
> > changed_buf1=0 guard_before=0 guard_after=0
> > [STEP3] write second pipe bytes=4096, then second IORING_OP_READ
> > req_len=4096 without republish
> > [CQE_READ2] res=4096 flags=0x10001 user=0x7777
> > [AFTER_READ2] entry0.len=1 entry1.len=4096 changed_buf0=1
> > changed_buf1=4096 guard_before=0 guard_after=0
> > [RESULT] PASS: unprivileged RECV_BUNDLE -EAGAIN poisoned pbuf len and
> > later IORING_OP_READ consumed the corrupted len.
> >
> > Why this looks like a bug:
> >
> > The failed receive should not persistently alter the provided-buffer
> > descriptor in a way that affects future unrelated operations. In this
> > case, a no-data/-EAGAIN RECV_BUNDLE changes entry0.len from 4096 to 1,
> > and that corrupted length is later consumed by IORING_OP_READ from a
> > pipe.
> >
> > The suspected root cause is in the non-INC provided-buffer ring BUNDLE
> > selection path:
> >
> > io_ring_buffers_peek():
> >
> > 	if (len > arg->max_len) {
> > 		len = arg->max_len;
> > 		if (!(bl->flags & IOBL_INC)) {
> > 			arg->partial_map = 1;
> > 			if (iov != arg->iovs)
> > 				break;
> > 			WRITE_ONCE(buf->len, len);
> > 		}
> > 	}
> >
> > The descriptor length is modified during buffer selection/peek before
> > the receive operation has completed successfully. If the receive later
> > fails with -EAGAIN/no data, the buffer is recycled but the modified
> > buf->len is not restored.
> >
> > Additional observations:
> >
> > The issue reproduces as an unprivileged user.
> >
> > The effect crosses io_uring operations: RECV affects a later READ.
> >
> > The effect crosses subsystems: socket receive affects pipe read.
> >
> > The second READ correctly uses entry1 and returns 4096 bytes, so this
> > does not appear to be a head/bid desync in the tested case.
> >
> > No kernel crash, OOB write, UAF, or privilege escalation has been demonstrated.
> >
> > Expected behavior:
> >
> > If IORING_RECVSEND_BUNDLE fails with -EAGAIN/no data, the
> > provided-buffer ring descriptor should not be persistently modified,
> > or the original len should be restored during recycle/rollback.
> >
> > Actual behavior:
> >
> > The failed BUNDLE receive leaves entry0.len shortened to the requested
> > length, and later unrelated operations using the same provided-buffer
> > group consume that corrupted length.
> >
> > I can provide the minimal C reproducer and full output if useful.
>
> Please do, no point in me recreating one for it. Then it can also get
> turned into a regression test for liburing. Reproducers also mean more
> than a thousand words in an email: they tell us exactly what is being run
> and what is going wrong. Or in some cases, what the wrong expectations
> are.
>
> --
> Jens Axboe

[-- Attachment #2: iouring_pbuf_reproducer_for_jens.tar.gz --]
[-- Type: application/x-gzip, Size: 2865 bytes --]

* Re: [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations
  2026-06-07 20:08   ` Federico Brasili
@ 2026-06-07 21:22     ` Nyakundi Emmanuel
  2026-06-07 21:39       ` Jens Axboe
  2026-06-07 21:38     ` [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations Jens Axboe
  1 sibling, 1 reply; 9+ messages in thread
From: Nyakundi Emmanuel @ 2026-06-07 21:22 UTC
  To: federico.brasili; +Cc: axboe, io-uring, linux-kernel, Nyakundi Emmanuel

On Sun, 7 Jun 2026, Federico Brasili wrote:
> I found a reproducible io_uring provided-buffer ring issue on Ubuntu
> kernel 7.0.0-22-generic.
>
> A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
> ring can persistently shrink the user-visible buffer descriptor length.

Confirmed reproducible on:

  Linux archlinux 7.0.11-arch1-1 #1 SMP PREEMPT_DYNAMIC
  Tue, 02 Jun 2026 18:26:58 +0000 x86_64
  Arch Linux (rolling)

Output from your reproducer, run unprivileged:

  [INIT] entry0 len=4096 bid=0 entry1 len=4096 bid=1 tail=2
  [STEP1] poison empty socket: BUNDLE len=1 expect -EAGAIN but entry0 len may truncate
  [CQE1] res=-11 flags=0x0 user=0x1111
  [AFTER1] entry0 len=1 entry1 len=4096 tail=2 changed_buf0=0 changed_buf1=0 guard_before=0 guard_after=0
  [STEP2] wrote pipe bytes=4096, now IORING_OP_READ len=4096 after recv-BUNDLE poisoning
  [CQE_READ] res=1 flags=0x1 user=0x6666
  [AFTER_READ] entry0 len=1 entry1 len=4096 tail=2 changed_buf0=1 changed_buf1=0 guard_before=0 guard_after=0
  [STEP3] wrote second pipe chunk bytes=4096, second IORING_OP_READ len=4096 without republish
  [CQE_READ2] res=4096 flags=0x10001 user=0x7777
  [AFTER_READ2] entry0 len=1 entry1 len=4096 tail=2 changed_buf0=1 changed_buf1=4096 guard_before=0 guard_after=0

entry0.len persistently corrupted 4096 -> 1 after -EAGAIN RECV_BUNDLE.
Subsequent IORING_OP_READ consumed the poisoned length as reported.

This confirms the issue is not Ubuntu-specific and reproduces on a
stock upstream-tracking kernel.

Nyakundi Emmanuel

* Re: [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations
  2026-06-07 20:08   ` Federico Brasili
  2026-06-07 21:22     ` Nyakundi Emmanuel
@ 2026-06-07 21:38     ` Jens Axboe
  2026-06-07 21:52       ` Jens Axboe
  1 sibling, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2026-06-07 21:38 UTC
  To: Federico Brasili; +Cc: io-uring, linux-kernel

On 6/7/26 2:08 PM, Federico Brasili wrote:
> Hi Jens,
> 
> Sure, attaching the minimal reproducer and the output from my Ubuntu
> 7.0.0-22-generic test system.

Great, thanks, I'll take a look. For the record, please don't top-post
when replying. It makes a mess of conversations on the mailing list.

> The reproducer runs unprivileged and demonstrates:
> 
> 1. non-INC provided-buffer ring with entry0.len = 4096 and entry1.len = 4096
> 2. IORING_OP_RECV + IOSQE_BUFFER_SELECT + IORING_RECVSEND_BUNDLE on an
> empty SOCK_DGRAM socket
> 3. CQE returns -EAGAIN, but entry0.len is changed from 4096 to 1
> 4. a later unrelated IORING_OP_READ from a pipe using the same buffer
> group returns 1 byte instead of 4096
> 5. a second READ uses entry1 and returns 4096, so head/bid accounting
> appears coherent in this repro
> 
> I am not claiming privilege escalation from this. The demonstrated
> issue is persistent provided-buffer descriptor length corruption after
> a failed/no-data RECV_BUNDLE, affecting a later READ operation.

Right, I believe you already mentioned that in the first email. It's just
a bug that can cause the app to (rightfully) get confused about the
state of a buffer.

And it's not a corruption in the sense that something else writes
to this buffer length field; the kernel is deliberately writing
to that valid piece of memory. It just misses restoring it when
the operation fails.

-- 
Jens Axboe


* Re: [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations
  2026-06-07 21:22     ` Nyakundi Emmanuel
@ 2026-06-07 21:39       ` Jens Axboe
  2026-06-07 22:10         ` [PATCH] test/recv-bundle-pbuf-len-poison: add regression test for pbuf len corruption Nyakundi Emmanuel
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2026-06-07 21:39 UTC
  To: Nyakundi Emmanuel, federico.brasili; +Cc: io-uring, linux-kernel

On 6/7/26 3:22 PM, Nyakundi Emmanuel wrote:
> On Sun, 7 Jun 2026, Federico Brasili wrote:
>> I found a reproducible io_uring provided-buffer ring issue on Ubuntu
>> kernel 7.0.0-22-generic.
>>
>> A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
>> ring can persistently shrink the user-visible buffer descriptor length.
> 
> Confirmed reproducible on:
> 
>   Linux archlinux 7.0.11-arch1-1 #1 SMP PREEMPT_DYNAMIC
>   Tue, 02 Jun 2026 18:26:58 +0000 x86_64
>   Arch Linux (rolling)
> 
> Output from your reproducer, run unprivileged:
> 
>   [INIT] entry0 len=4096 bid=0 entry1 len=4096 bid=1 tail=2
>   [STEP1] poison empty socket: BUNDLE len=1 expect -EAGAIN but entry0 len may truncate
>   [CQE1] res=-11 flags=0x0 user=0x1111
>   [AFTER1] entry0 len=1 entry1 len=4096 tail=2 changed_buf0=0 changed_buf1=0 guard_before=0 guard_after=0
>   [STEP2] wrote pipe bytes=4096, now IORING_OP_READ len=4096 after recv-BUNDLE poisoning
>   [CQE_READ] res=1 flags=0x1 user=0x6666
>   [AFTER_READ] entry0 len=1 entry1 len=4096 tail=2 changed_buf0=1 changed_buf1=0 guard_before=0 guard_after=0
>   [STEP3] wrote second pipe chunk bytes=4096, second IORING_OP_READ len=4096 without republish
>   [CQE_READ2] res=4096 flags=0x10001 user=0x7777
>   [AFTER_READ2] entry0 len=1 entry1 len=4096 tail=2 changed_buf0=1 changed_buf1=4096 guard_before=0 guard_after=0
> 
> entry0.len persistently corrupted 4096 -> 1 after -EAGAIN RECV_BUNDLE.
> Subsequent IORING_OP_READ consumed the poisoned length as reported.
> 
> This confirms the issue is not Ubuntu-specific and reproduces on a
> stock upstream-tracking kernel.

Which is entirely expected, it's just a generic kernel bug and I doubt
that Ubuntu is shipping any specific patches here that aren't already
in stable or upstream.

-- 
Jens Axboe


* Re: [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations
  2026-06-07 21:38     ` [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations Jens Axboe
@ 2026-06-07 21:52       ` Jens Axboe
  0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2026-06-07 21:52 UTC
  To: Federico Brasili; +Cc: io-uring, linux-kernel

On 6/7/26 3:38 PM, Jens Axboe wrote:
>> The reproducer runs unprivileged and demonstrates:
>>
>> 1. non-INC provided-buffer ring with entry0.len = 4096 and entry1.len = 4096
>> 2. IORING_OP_RECV + IOSQE_BUFFER_SELECT + IORING_RECVSEND_BUNDLE on an
>> empty SOCK_DGRAM socket
>> 3. CQE returns -EAGAIN, but entry0.len is changed from 4096 to 1
>> 4. a later unrelated IORING_OP_READ from a pipe using the same buffer
>> group returns 1 byte instead of 4096
>> 5. a second READ uses entry1 and returns 4096, so head/bid accounting
>> appears coherent in this repro
>>
>> I am not claiming privilege escalation from this. The demonstrated
>> issue is persistent provided-buffer descriptor length corruption after
>> a failed/no-data RECV_BUNDLE, affecting a later READ operation.
> 
> Right, I believe you already mentioned in the first email. It's just
> a bug that can cause the app to (rightfully) get confused about the
> state of a buffer.
> 
> And it's not a corruption in the sense that something else writes
> to this buffer length field, the kernel is deliberately writing
> to that valid piece of memory. It just misses restoring it when
> the operation fails.

IOW, it's a consistency issue. Words like unprivileged are tossed around
here, but the app could've just written this memory itself without
involving the kernel at all; it's application memory. There's absolutely
nothing privileged going on here: the kernel isn't touching anything that
the application couldn't have modified on its own. The kernel _should_ not
do it for this case, that's the bug. And from a quick look, the fix would
just be to remove that buf->len assignment in this case. For the normal
case of e.g. wanting to read 32 bytes, where the length would've been
truncated to 32 bytes in the buffer, it should be fine to leave it at 4096
or whatever size it is. For bundles, userspace must iterate the buffers
when it gets a completion for X bytes. But the iteration should always be:

	unsigned this_len = min(buf->len, left);

and hence it should not matter if buf->len remains at the untouched
length, for a truncated end buffer.
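
As an illustration of that iteration, here is a minimal self-contained
sketch. struct ring_buf and walk_bundle() are hypothetical names that
only mirror the fields userspace reads from a ring entry; this is not
liburing API, and it assumes the ring size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the fields userspace reads from a ring entry. */
struct ring_buf {
	uint64_t addr;
	uint32_t len;
	uint16_t bid;
};

/*
 * Walk the buffers covered by a bundle completion of `total` bytes,
 * starting at ring index `head`. nr_entries must be a power of two.
 * Stores the bytes used in each buffer into used[] and returns the
 * number of buffers consumed.
 */
static unsigned walk_bundle(const struct ring_buf *bufs, unsigned nr_entries,
			    unsigned head, uint32_t total, uint32_t *used)
{
	unsigned mask = nr_entries - 1;
	uint32_t left = total;
	unsigned n = 0;

	assert(nr_entries && (nr_entries & mask) == 0);

	while (left && n < nr_entries) {
		const struct ring_buf *buf = &bufs[(head + n) & mask];
		/* same clamp as above: min(buf->len, left) */
		uint32_t this_len = buf->len < left ? buf->len : left;

		used[n++] = this_len;
		left -= this_len;
	}
	return n;
}
```

With two 4096-byte entries, a completion of 32 bytes consumes one buffer
with 32 bytes used even though buf->len still reads 4096, and a
completion of 4128 bytes consumes two buffers with 4096 and 32 bytes.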

-- 
Jens Axboe

* [PATCH] test/recv-bundle-pbuf-len-poison: add regression test for pbuf len corruption
  2026-06-07 21:39       ` Jens Axboe
@ 2026-06-07 22:10         ` Nyakundi Emmanuel
  2026-06-07 22:16           ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Nyakundi Emmanuel @ 2026-06-07 22:10 UTC
  To: axboe; +Cc: federico.brasili, io-uring, linux-kernel, Nyakundi Emmanuel

A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
ring can persistently corrupt the buffer descriptor length. When the
receive fails with -EAGAIN, the kernel writes the requested length into
buf->len during buffer selection but never restores it on failure.

A later unrelated IORING_OP_READ using the same buffer group then
consumes the corrupted length, returning fewer bytes than expected.

This test reproduces the issue as reported by Federico Brasili.

Reported-by: Federico Brasili <federico.brasili@gmail.com>
Link: https://lore.kernel.org/io-uring/CAAEr8jbY60noGj1fw_k91UJRBkyiRVoS6=nLhZ7Svwidjn4CAA@mail.gmail.com/
Signed-off-by: Nyakundi Emmanuel <nyariboemmanuel8@gmail.com>
---
 test/recv-bundle-pbuf-len-poison.c | 146 +++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 test/recv-bundle-pbuf-len-poison.c

diff --git a/test/recv-bundle-pbuf-len-poison.c b/test/recv-bundle-pbuf-len-poison.c
new file mode 100644
index 00000000..90fafff4
--- /dev/null
+++ b/test/recv-bundle-pbuf-len-poison.c
@@ -0,0 +1,146 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Regression test for io_uring provided-buffer ring length corruption.
+ *
+ * A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
+ * ring can persistently shrink the user-visible buffer descriptor length.
+ * The modified length is not rolled back when the receive fails with
+ * -EAGAIN, and a later unrelated IORING_OP_READ from a pipe consumes
+ * the corrupted length.
+ *
+ * Reported-by: Federico Brasili <federico.brasili@gmail.com>
+ */
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+
+#include "liburing.h"
+#include "helpers.h"
+
+#define BGID		8
+#define BUF_SIZE	4096
+#define NR_BUFS		2
+
+static int test(void)
+{
+	struct io_uring_buf_ring *br;
+	struct io_uring_cqe *cqe;
+	struct io_uring_sqe *sqe;
+	struct io_uring ring;
+	struct io_uring_buf *buf_entry;
+	int sockfd, pipefds[2], ret;
+	void *buf;
+	char pipe_data[BUF_SIZE];
+
+	ret = io_uring_queue_init(8, &ring, 0);
+	if (ret) {
+		fprintf(stderr, "queue init failed: %d\n", ret);
+		return T_EXIT_FAIL;
+	}
+
+	if (posix_memalign(&buf, 4096, BUF_SIZE * NR_BUFS))
+		return T_EXIT_FAIL;
+
+	/* set up non-INC provided buffer ring with 2 buffers of BUF_SIZE */
+	br = io_uring_setup_buf_ring(&ring, NR_BUFS, BGID, 0, &ret);
+	if (!br) {
+		if (ret == -EINVAL)
+			return T_EXIT_SKIP;
+		fprintf(stderr, "buf ring setup failed: %d\n", ret);
+		return T_EXIT_FAIL;
+	}
+
+	io_uring_buf_ring_add(br, buf,             BUF_SIZE, 0, NR_BUFS - 1, 0);
+	io_uring_buf_ring_add(br, buf + BUF_SIZE,  BUF_SIZE, 1, NR_BUFS - 1, 1);
+	io_uring_buf_ring_advance(br, NR_BUFS);
+
+	/* create an empty SOCK_DGRAM socket to trigger -EAGAIN */
+	sockfd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (sockfd < 0) {
+		perror("socket");
+		return T_EXIT_FAIL;
+	}
+
+	/* submit RECV_BUNDLE on empty socket — expects -EAGAIN */
+	sqe = io_uring_get_sqe(&ring);
+	io_uring_prep_recv(sqe, sockfd, NULL, 1, MSG_DONTWAIT);
+	sqe->ioprio |= IORING_RECVSEND_BUNDLE;
+	sqe->flags  |= IOSQE_BUFFER_SELECT;
+	sqe->buf_group = BGID;
+	sqe->user_data  = 0x1111;
+	io_uring_submit(&ring);
+
+	ret = io_uring_wait_cqe(&ring, &cqe);
+	if (ret) {
+		fprintf(stderr, "wait cqe failed: %d\n", ret);
+		return T_EXIT_FAIL;
+	}
+	if (cqe->res != -EAGAIN) {
+		fprintf(stderr, "expected -EAGAIN, got %d\n", cqe->res);
+		io_uring_cqe_seen(&ring, cqe);
+		return T_EXIT_FAIL;
+	}
+	io_uring_cqe_seen(&ring, cqe);
+
+	/* check entry0.len — must still be BUF_SIZE after failed RECV */
+	buf_entry = &br->bufs[0];
+	if (buf_entry->len != BUF_SIZE) {
+		fprintf(stderr,
+			"FAIL: entry0.len corrupted after -EAGAIN RECV_BUNDLE: "
+			"got %u, expected %u\n",
+			buf_entry->len, BUF_SIZE);
+		return T_EXIT_FAIL;
+	}
+
+	/* now do a pipe READ using the same buffer group */
+	if (pipe(pipefds)) {
+		perror("pipe");
+		return T_EXIT_FAIL;
+	}
+
+	memset(pipe_data, 'A', BUF_SIZE);
+	if (write(pipefds[1], pipe_data, BUF_SIZE) != BUF_SIZE) {
+		fprintf(stderr, "pipe write failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	sqe = io_uring_get_sqe(&ring);
+	io_uring_prep_read(sqe, pipefds[0], NULL, BUF_SIZE, 0);
+	sqe->flags    |= IOSQE_BUFFER_SELECT;
+	sqe->buf_group = BGID;
+	sqe->user_data  = 0x6666;
+	io_uring_submit(&ring);
+
+	ret = io_uring_wait_cqe(&ring, &cqe);
+	if (ret) {
+		fprintf(stderr, "wait read cqe failed: %d\n", ret);
+		return T_EXIT_FAIL;
+	}
+	if (cqe->res != BUF_SIZE) {
+		fprintf(stderr,
+			"FAIL: READ got %d bytes, expected %d — "
+			"pbuf len was poisoned by failed RECV_BUNDLE\n",
+			cqe->res, BUF_SIZE);
+		io_uring_cqe_seen(&ring, cqe);
+		return T_EXIT_FAIL;
+	}
+	io_uring_cqe_seen(&ring, cqe);
+
+	close(sockfd);
+	close(pipefds[0]);
+	close(pipefds[1]);
+	io_uring_queue_exit(&ring);
+	free(buf);
+	return T_EXIT_PASS;
+}
+
+int main(int argc, char *argv[])
+{
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	return test();
+}
-- 
2.54.0


* Re: [PATCH] test/recv-bundle-pbuf-len-poison: add regression test for pbuf len corruption
  2026-06-07 22:10         ` [PATCH] test/recv-bundle-pbuf-len-poison: add regression test for pbuf len corruption Nyakundi Emmanuel
@ 2026-06-07 22:16           ` Jens Axboe
  0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2026-06-07 22:16 UTC
  To: Nyakundi Emmanuel; +Cc: federico.brasili, io-uring, linux-kernel

On 6/7/26 4:10 PM, Nyakundi Emmanuel wrote:
> A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
> ring can persistently corrupt the buffer descriptor length. When the
> receive fails with -EAGAIN, the kernel writes the requested length into
> buf->len during buffer selection but never restores it on failure.
> 
> A later unrelated IORING_OP_READ using the same buffer group then
> consumes the corrupted length, returning fewer bytes than expected.
> 
> This test reproduces the issue as reported by Federico Brasili.

Thanks, but I already wrote one, which also tests the much more
important aspect of the kernel change - that the reported CQE
completion reports the right amount without truncating the
buffer length when no bytes have been transferred.

And once again, it's not _corrupting_ the buffer length. It's
shrinking it, which is unexpected and should not happen, but there's
no corruption taking place.

I'm dubious on how much AI koolaid was used in reproducing the
test case and report? That said, it is something we should fix,
as the kernel should not be changing the buffer length for this
case.

-- 
Jens Axboe


Thread overview: 9+ messages
2026-06-07 11:41 [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations Federico Brasili
2026-06-07 19:07 ` Jens Axboe
2026-06-07 20:08   ` Federico Brasili
2026-06-07 21:22     ` Nyakundi Emmanuel
2026-06-07 21:39       ` Jens Axboe
2026-06-07 22:10         ` [PATCH] test/recv-bundle-pbuf-len-poison: add regression test for pbuf len corruption Nyakundi Emmanuel
2026-06-07 22:16           ` Jens Axboe
2026-06-07 21:38     ` [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations Jens Axboe
2026-06-07 21:52       ` Jens Axboe
