From: Jens Axboe <axboe@kernel.dk>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: io-uring@vger.kernel.org, Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCHSET v2 0/8] Add support for mixed sized CQEs
Date: Thu, 21 Aug 2025 11:46:03 -0600
Message-ID: <670929ea-b614-40cf-b5cc-929a39d9e59d@kernel.dk>
In-Reply-To: <CADUfDZpPP2FR1X9hVSkhbtQs=2wtXkeXRBjPDXA9ShSCU0PM2w@mail.gmail.com>
On 8/21/25 11:41 AM, Caleb Sander Mateos wrote:
> On Thu, Aug 21, 2025 at 10:12 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 8/21/25 11:02 AM, Caleb Sander Mateos wrote:
>>> On Thu, Aug 21, 2025 at 7:28 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Currently io_uring supports two modes for CQEs:
>>>>
>>>> 1) The standard mode, where 16b CQEs are used
>>>> 2) Setting IORING_SETUP_CQE32, which makes all CQEs posted 32b
>>>>
>>>> Certain features need to pass more information back than just a single
>>>> 32-bit res field, and hence mandate the use of CQE32 to be able to work.
>>>> Examples of that include passthrough or other uses of ->uring_cmd() like
>>>> socket option getting and setting, including timestamps.
>>>>
>>>> This patchset adds support for IORING_SETUP_CQE_MIXED, which allows
>>>> posting both 16b and 32b CQEs on the same CQ ring. The idea here is that
>>>> we need not waste twice the space for CQ rings, or use twice the space
>>>> per CQE posted, if only some of the CQEs posted require the use of 32b
>>>> CQEs. On a ring setup in CQE mixed mode, 32b posted CQEs will have
>>>> IORING_CQE_F_32 set in cqe->flags to tell the application (or liburing)
>>>> about this fact.
>>>
>>> This makes a lot of sense. Have you considered something analogous for
>>> SQEs? Requiring all SQEs to be 128 bytes when an io_uring is used for
>>> a mix of 64-byte and 128-byte SQEs also wastes memory, probably even
>>> more since SQEs are 4x larger than CQEs.
>>
>> Adding Keith, as he and I literally just talked about that. My answer
>> was that the case is a bit different in that 32b CQEs can be useful in
>> cases that are predominately 16b in the first place. For example,
>> a networking workload doing send/recv/etc and the occasional
>> get/setsockopt kind of thing. Or maybe a mix of normal recv and zero
>> copy rx.
>>
>> For the SQE case, I think it's a bit different. At least the cases I
>> know of, it's mostly 100% 64b SQEs or 128b SQEs. I'm certainly willing
>> to be told otherwise! Because that is kind of the key question that
>> needs answering before even thinking about doing that kind of work.
>
> We certainly have a use case that mixes the two on the same io_uring:
> ublk commit/buffer register/unregister commands (64 byte SQEs) and
> NVMe passthru commands (128 byte SQEs). I could also imagine an
> application issuing both normal read/write commands and NVMe passthru
> commands. But you're probably right that this isn't a super common use
> case.
Yes that's a good point, and that would roughly be 50/50 in terms of 64b
vs 128b SQEs?
And yes, I can imagine other use cases too, but I'm also having a hard
time justifying those as likely. On the other hand, people do the
weirdest things...
>> But yes, it could be supported, and Keith (kind of) signed himself up to
>> do that. One oddity I see on that side is that while with CQE32 the
>> kernel can manage the potential wrap-around gap, for SQEs that's
>> obviously on the application to do. That could just be a NOP or
>> something like that, but you do need something to fill/skip that space.
>> I guess that could be as simple as having an opcode that is simply "skip
>> me", so on the kernel side it'd be easy as it'd just drop it on the
>> floor. You still need the app side to fill one, however, and then deal
>> with "oops SQ ring is now full" too.
>
> Sure, of course userspace would need to handle a misaligned big SQE at
> the end of the SQ analogously to mixed CQE sizes. I assume liburing
> should be able to do that mostly transparently, that logic could all
> be encapsulated by io_uring_get_sqe().
Yep I think so, we'd need a new helper to return the kind of SQE you
want, and it'd just need to get a 64b one and mark it with the SKIP
opcode first if being asked for a 128b one and we're one off from
wrapping around.
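
A rough sketch of that helper logic, written as a self-contained model
of an SQ ring of 64b slots rather than as liburing code. Everything
named here (struct sq_model, sq_get(), OP_SKIP and its value) is made up
for illustration; no such opcode or helper exists in this patchset. It
only shows the wrap handling: if a 128b SQE is requested and the next
free slot is the last one in the ring, that slot is filled with a skip
entry first, and the "ring is now full" case is checked up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_BYTES 64u
#define OP_SKIP    0xffu  /* hypothetical "skip me" opcode, value is arbitrary */

/* One 64b SQ slot; a 128b SQE occupies two contiguous slots. */
struct slot {
	uint8_t opcode;
	uint8_t pad[SLOT_BYTES - 1];
};

/* Simplified model of the SQ ring, not the real io_uring layout. */
struct sq_model {
	struct slot *slots;
	unsigned entries;  /* number of 64b slots, power of two */
	unsigned head;     /* consumer index, free-running */
	unsigned tail;     /* producer index, free-running */
};

static unsigned sq_free(const struct sq_model *sq)
{
	return sq->entries - (sq->tail - sq->head);
}

/*
 * Return the first slot of a 64b SQE (want_big == 0) or of a 128b SQE
 * (want_big != 0), or NULL if the ring cannot hold it right now.
 */
static struct slot *sq_get(struct sq_model *sq, int want_big)
{
	unsigned mask = sq->entries - 1;
	unsigned idx = sq->tail & mask;
	unsigned need = want_big ? 2 : 1;
	/* A 128b SQE starting in the last slot would straddle the wrap. */
	unsigned skip = (want_big && idx == mask) ? 1 : 0;

	assert(sq->entries >= 2 && (sq->entries & mask) == 0);

	/* Check space for the skip entry and the SQE before touching anything. */
	if (sq_free(sq) < need + skip)
		return NULL;

	if (skip) {
		memset(&sq->slots[idx], 0, sizeof(struct slot));
		sq->slots[idx].opcode = OP_SKIP;
		sq->tail++;
		idx = sq->tail & mask;
	}

	sq->tail += need;
	return &sq->slots[idx];
}
```

Checking for space before writing the skip entry means a failed request
leaves the ring untouched, so the caller can submit what it has and
retry.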
--
Jens Axboe
Thread overview: 16+ messages
2025-08-21 14:18 [PATCHSET v2 0/8] Add support for mixed sized CQEs Jens Axboe
2025-08-21 14:18 ` [PATCH 1/8] io_uring: remove io_ctx_cqe32() helper Jens Axboe
2025-08-21 14:18 ` [PATCH 2/8] io_uring: add UAPI definitions for mixed CQE postings Jens Axboe
2025-08-21 14:18 ` [PATCH 3/8] io_uring/fdinfo: handle mixed sized CQEs Jens Axboe
2025-08-21 14:18 ` [PATCH 4/8] io_uring/trace: support completion tracing of mixed 32b CQEs Jens Axboe
2025-08-21 14:18 ` [PATCH 5/8] io_uring: add support for IORING_SETUP_CQE_MIXED Jens Axboe
2025-08-21 14:18 ` [PATCH 6/8] io_uring/nop: " Jens Axboe
2025-08-21 14:18 ` [PATCH 7/8] io_uring/uring_cmd: " Jens Axboe
2025-08-21 14:18 ` [PATCH 8/8] io_uring/zcrx: " Jens Axboe
2025-08-21 17:02 ` [PATCHSET v2 0/8] Add support for mixed sized CQEs Caleb Sander Mateos
2025-08-21 17:12 ` Jens Axboe
2025-08-21 17:40 ` Keith Busch
2025-08-21 17:47 ` Jens Axboe
2025-08-21 17:41 ` Caleb Sander Mateos
2025-08-21 17:46 ` Jens Axboe [this message]
2025-08-21 18:19 ` Caleb Sander Mateos