* [PATCH V3] io_uring: uring_cmd: add multishot support
@ 2025-08-19 15:00 Ming Lei
2025-08-19 16:00 ` Jens Axboe
From: Ming Lei @ 2025-08-19 15:00 UTC
To: Jens Axboe, io-uring, Pavel Begunkov; +Cc: Caleb Sander Mateos, Ming Lei
Add UAPI flag IORING_URING_CMD_MULTISHOT to support multishot
uring_cmd operations with provided buffers.
This enables drivers to post multiple completion events from a single
uring_cmd submission, which is useful for:
- Notifying userspace of device events (e.g., interrupt handling)
- Supporting devices with multiple event sources (e.g., multi-queue devices)
- Avoiding the need for device poll() support when events originate
from multiple sources device-wide
The implementation adds two new APIs:
- io_uring_cmd_select_buffer(): selects a buffer from the provided
buffer group for multishot uring_cmd
- io_uring_mshot_cmd_post_cqe(): posts a CQE after event data is
pushed to the provided buffer
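To illustrate how the two APIs are meant to interact, here is a userspace simulation of the driver-side flow. It is a hypothetical sketch: `sim_select_buffer()` and `sim_post_cqe()` only mimic the return conventions described in this patch for io_uring_cmd_select_buffer() (-ENOBUFS when the group is empty) and io_uring_mshot_cmd_post_cqe() (true when the command must be completed), while the buffer group, CQE struct and buffer ID handling are simplified stand-ins, not kernel structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define IORING_CQE_F_MORE (1U << 1)	/* same value as the UAPI flag */

#define NR_BUFS  2
#define BUF_SIZE 16

/* Hypothetical stand-in for a provided buffer group. */
struct sim_buf_group {
	char bufs[NR_BUFS][BUF_SIZE];
	unsigned int head;		/* next buffer to hand out */
};

/* Hypothetical stand-in for a CQE. */
struct sim_cqe {
	int res;
	unsigned int flags;
	unsigned int bid;
};

/* Mimics io_uring_cmd_select_buffer(): -ENOBUFS when the group is empty. */
static int sim_select_buffer(struct sim_buf_group *bg, char **buf,
			     size_t *len, unsigned int *bid)
{
	if (bg->head == NR_BUFS)
		return -ENOBUFS;
	*bid = bg->head++;
	*buf = bg->bufs[*bid];
	*len = BUF_SIZE;
	return 0;
}

/*
 * Mimics io_uring_mshot_cmd_post_cqe(): a positive result posts a CQE with
 * IORING_CQE_F_MORE and keeps the command alive (returns false); any other
 * result means the command must be completed (returns true).
 */
static bool sim_post_cqe(struct sim_cqe *cqe, int ret, unsigned int bid)
{
	cqe->res = ret;
	if (ret > 0) {
		cqe->flags = IORING_CQE_F_MORE;
		cqe->bid = bid;
		return false;
	}
	cqe->flags = 0;
	cqe->bid = 0;
	return true;
}

/* One device event: select a buffer, copy the payload, post a CQE. */
static bool sim_handle_event(struct sim_buf_group *bg, struct sim_cqe *cqe,
			     const char *payload)
{
	char *buf;
	size_t len, n = strlen(payload);
	unsigned int bid = 0;
	int ret = sim_select_buffer(bg, &buf, &len, &bid);

	if (ret == 0) {
		if (n > len)
			n = len;	/* truncate to the selected buffer */
		memcpy(buf, payload, n);
		ret = (int)n;
	}
	return sim_post_cqe(cqe, ret, bid);
}
```

With two buffers in the group, the first two events each post a CQE flagged IORING_CQE_F_MORE, and the third finds the group empty, so the command has to be completed with -ENOBUFS.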
Multishot uring_cmd must be used with buffer select (IOSQE_BUFFER_SELECT)
and is mutually exclusive with IORING_URING_CMD_FIXED for now.
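The validation rules above, as enforced by the io_uring_cmd_prep() hunk in this patch, can be modeled as a standalone C function. This is a hypothetical sketch: `check_uring_cmd_flags()` is an invented name, the `buffer_select` argument stands in for REQ_F_BUFFER_SELECT (set via IOSQE_BUFFER_SELECT), and only the flag values are copied from the UAPI hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Values mirror the UAPI definitions added by this patch. */
#define IORING_URING_CMD_FIXED     (1U << 0)
#define IORING_URING_CMD_MULTISHOT (1U << 1)
#define IORING_URING_CMD_MASK \
	(IORING_URING_CMD_FIXED | IORING_URING_CMD_MULTISHOT)

/*
 * Hypothetical model of the flag checks in io_uring_cmd_prep().
 * Returns 0 if the combination is accepted, -EINVAL otherwise.
 */
static int check_uring_cmd_flags(unsigned int cmd_flags, bool buffer_select)
{
	if (cmd_flags & ~IORING_URING_CMD_MASK)
		return -EINVAL;			/* unknown flag bits */
	if ((cmd_flags & IORING_URING_CMD_FIXED) &&
	    (cmd_flags & IORING_URING_CMD_MULTISHOT))
		return -EINVAL;			/* mutually exclusive for now */
	if (cmd_flags & IORING_URING_CMD_MULTISHOT)
		return buffer_select ? 0 : -EINVAL; /* multishot needs buffer select */
	return buffer_select ? -EINVAL : 0;	/* buffer select only with multishot */
}
```

Note that buffer select without IORING_URING_CMD_MULTISHOT is rejected too, so plain uring_cmd users cannot accidentally pick up a provided buffer.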
The ublk driver will be the first user of this functionality:
https://github.com/ming1/linux/commits/ublk-devel/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V3:
- Enhanced buffer select check (Jens)
V2:
- Fixed static inline return type
- Updated UAPI comments: Clarified that IORING_URING_CMD_MULTISHOT must be used with buffer select
- Refactored validation checks: Moved the mutual exclusion checks into the individual flag validation
sections for better code organization
- Added missing req_set_fail(): Added the missing failure handling in io_uring_mshot_cmd_post_cqe
- Improved commit message: Rewrote the commit message to be clearer, more technical, and better explain
the use cases and API changes
include/linux/io_uring/cmd.h | 28 ++++++++++++++
include/uapi/linux/io_uring.h | 6 ++-
io_uring/opdef.c | 1 +
io_uring/uring_cmd.c | 70 ++++++++++++++++++++++++++++++++++-
4 files changed, 103 insertions(+), 2 deletions(-)
diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index cfa6d0c0c322..72832757f8ef 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -70,6 +70,22 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
/* Execute the request from a blocking context */
void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
+/*
+ * Select a buffer from the provided buffer group for multishot uring_cmd.
+ * On success, returns 0 and stores the buffer address and length in @buf/@len.
+ */
+int io_uring_cmd_select_buffer(struct io_uring_cmd *ioucmd,
+ unsigned buf_group,
+ void __user **buf, size_t *len,
+ unsigned int issue_flags);
+
+/*
+ * Post a CQE for a multishot uring_cmd event after its data has been copied
+ * to the selected buffer. Returns true if the command must now be completed.
+ */
+bool io_uring_mshot_cmd_post_cqe(struct io_uring_cmd *ioucmd,
+ ssize_t ret, unsigned int issue_flags);
+
#else
static inline int
io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
@@ -102,6 +118,18 @@ static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
static inline void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
{
}
+static inline int io_uring_cmd_select_buffer(struct io_uring_cmd *ioucmd,
+ unsigned buf_group,
+ void __user **buf, size_t *len,
+ unsigned int issue_flags)
+{
+ return -EOPNOTSUPP;
+}
+static inline bool io_uring_mshot_cmd_post_cqe(struct io_uring_cmd *ioucmd,
+ ssize_t ret, unsigned int issue_flags)
+{
+ return true;
+}
#endif
/*
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 6957dc539d83..1e935f8901c5 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -298,9 +298,13 @@ enum io_uring_op {
* sqe->uring_cmd_flags top 8bits aren't available for userspace
* IORING_URING_CMD_FIXED use registered buffer; pass this flag
* along with setting sqe->buf_index.
+ * IORING_URING_CMD_MULTISHOT must be used with buffer select, like other
+ * multishot commands. Not compatible with
+ * IORING_URING_CMD_FIXED, for now.
*/
#define IORING_URING_CMD_FIXED (1U << 0)
-#define IORING_URING_CMD_MASK IORING_URING_CMD_FIXED
+#define IORING_URING_CMD_MULTISHOT (1U << 1)
+#define IORING_URING_CMD_MASK (IORING_URING_CMD_FIXED | IORING_URING_CMD_MULTISHOT)
/*
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9568785810d9..932319633eac 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -413,6 +413,7 @@ const struct io_issue_def io_issue_defs[] = {
#endif
},
[IORING_OP_URING_CMD] = {
+ .buffer_select = 1,
.needs_file = 1,
.plug = 1,
.iopoll = 1,
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 053bac89b6c0..0f1920771b6f 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -11,6 +11,7 @@
#include "io_uring.h"
#include "alloc_cache.h"
#include "rsrc.h"
+#include "kbuf.h"
#include "uring_cmd.h"
#include "poll.h"
@@ -194,8 +195,21 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (ioucmd->flags & ~IORING_URING_CMD_MASK)
return -EINVAL;
- if (ioucmd->flags & IORING_URING_CMD_FIXED)
+ if (ioucmd->flags & IORING_URING_CMD_FIXED) {
+ if (ioucmd->flags & IORING_URING_CMD_MULTISHOT)
+ return -EINVAL;
req->buf_index = READ_ONCE(sqe->buf_index);
+ }
+
+ if (ioucmd->flags & IORING_URING_CMD_MULTISHOT) {
+ if (ioucmd->flags & IORING_URING_CMD_FIXED)
+ return -EINVAL;
+ if (!(req->flags & REQ_F_BUFFER_SELECT))
+ return -EINVAL;
+ } else {
+ if (req->flags & REQ_F_BUFFER_SELECT)
+ return -EINVAL;
+ }
ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
@@ -251,6 +265,11 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
}
ret = file->f_op->uring_cmd(ioucmd, issue_flags);
+ if (ioucmd->flags & IORING_URING_CMD_MULTISHOT) {
+ if (ret >= 0)
+ return IOU_ISSUE_SKIP_COMPLETE;
+ io_kbuf_recycle(req, issue_flags);
+ }
if (ret == -EAGAIN) {
ioucmd->flags |= IORING_URING_CMD_REISSUE;
return ret;
@@ -333,3 +352,52 @@ bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
return false;
return io_req_post_cqe32(req, cqe);
}
+
+int io_uring_cmd_select_buffer(struct io_uring_cmd *ioucmd,
+ unsigned buf_group,
+ void __user **buf, size_t *len,
+ unsigned int issue_flags)
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+ void __user *ubuf;
+
+ if (!(ioucmd->flags & IORING_URING_CMD_MULTISHOT))
+ return -EINVAL;
+
+ ubuf = io_buffer_select(req, len, buf_group, issue_flags);
+ if (!ubuf)
+ return -ENOBUFS;
+
+ *buf = ubuf;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_select_buffer);
+
+/*
+ * Returns true if this multishot uring_cmd needs to be completed by the
+ * caller; returns false once the event CQE has been posted successfully
+ * and the command stays armed.
+ *
+ * Should only be used from task_work context.
+ */
+bool io_uring_mshot_cmd_post_cqe(struct io_uring_cmd *ioucmd,
+ ssize_t ret, unsigned int issue_flags)
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+ unsigned int cflags = 0;
+
+ if (!(ioucmd->flags & IORING_URING_CMD_MULTISHOT))
+ return true;
+
+ if (ret > 0) {
+ cflags = io_put_kbuf(req, ret, issue_flags);
+ if (io_req_post_cqe(req, ret, cflags | IORING_CQE_F_MORE))
+ return false;
+ }
+
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_set_res(req, ret, cflags);
+ return true;
+}
+EXPORT_SYMBOL_GPL(io_uring_mshot_cmd_post_cqe);
--
2.47.0
* Re: [PATCH V3] io_uring: uring_cmd: add multishot support
2025-08-19 15:00 [PATCH V3] io_uring: uring_cmd: add multishot support Ming Lei
@ 2025-08-19 16:00 ` Jens Axboe
2025-08-20 11:08 ` Ming Lei
From: Jens Axboe @ 2025-08-19 16:00 UTC
To: Ming Lei, io-uring, Pavel Begunkov; +Cc: Caleb Sander Mateos
On 8/19/25 9:00 AM, Ming Lei wrote:
> @@ -251,6 +265,11 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
> }
>
> ret = file->f_op->uring_cmd(ioucmd, issue_flags);
> + if (ioucmd->flags & IORING_URING_CMD_MULTISHOT) {
> + if (ret >= 0)
> + return IOU_ISSUE_SKIP_COMPLETE;
> + io_kbuf_recycle(req, issue_flags);
> + }
> if (ret == -EAGAIN) {
> ioucmd->flags |= IORING_URING_CMD_REISSUE;
> return ret;
Final comment on this part... uring_cmd is unique in the sense that it'd
be the first potentially pollable file type that supports buffer
selection AND can return -EIOCBQUEUED. For non-pollable, the buffer
would get committed upfront. For pollable, we'd either finish and put it
within this same execution context, or we'd drop it entirely when
returning -EAGAIN.
So what happens if we get -EIOCBQUEUED with a selected buffer from
provided buffer ring, and someone malicious unregisters and frees the
buffer ring before that request completes?
--
Jens Axboe
* Re: [PATCH V3] io_uring: uring_cmd: add multishot support
2025-08-19 16:00 ` Jens Axboe
@ 2025-08-20 11:08 ` Ming Lei
2025-08-20 13:11 ` Jens Axboe
From: Ming Lei @ 2025-08-20 11:08 UTC
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Caleb Sander Mateos
On Tue, Aug 19, 2025 at 10:00:36AM -0600, Jens Axboe wrote:
> On 8/19/25 9:00 AM, Ming Lei wrote:
> > @@ -251,6 +265,11 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
> > }
> >
> > ret = file->f_op->uring_cmd(ioucmd, issue_flags);
> > + if (ioucmd->flags & IORING_URING_CMD_MULTISHOT) {
> > + if (ret >= 0)
> > + return IOU_ISSUE_SKIP_COMPLETE;
> > + io_kbuf_recycle(req, issue_flags);
> > + }
> > if (ret == -EAGAIN) {
> > ioucmd->flags |= IORING_URING_CMD_REISSUE;
> > return ret;
>
> Final comment on this part... uring_cmd is unique in the sense that it'd
> be the first potentially pollable file type that supports buffer
> selection AND can return -EIOCBQUEUED. For non-pollable, the buffer
> would get committed upfront. For pollable, we'd either finish and put it
> within this same execution context, or we'd drop it entirely when
> returning -EAGAIN.
>
> So what happens if we get -EIOCBQUEUED with a selected buffer from
> provided buffer ring, and someone malicious unregisters and frees the
> buffer ring before that request completes?
This looks like a real problem for IORING_URING_CMD_MULTISHOT.
For pollable multishot, ->issue() runs in the submitter's task_work context
and completes synchronously, so ctx->uring_lock protects the buffer list and
unregistration can't happen concurrently. That is probably one reason why
polled multishot can't run in io-wq context.
But now that ->issue() can return -EIOCBQUEUED, we lose ->uring_lock's
protection for req->buf_list. One idea is to reference-count the buffer
list so that unregistration fails while there is any active consumer.
Do you have suggestions for this problem?
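The reference-counting idea floated above can be sketched in a few lines of single-threaded userspace C. This is a hypothetical illustration of the proposal, not what the thread settled on: `struct sim_buf_list` and the `sim_buf_list_*()` helpers are invented names, and a real kernel version would need atomic refcounts and locking against concurrent unregistration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical buffer list with a count of in-flight consumers. */
struct sim_buf_list {
	unsigned int refs;
	bool registered;
};

/* Taken when a request keeps a selected buffer across -EIOCBQUEUED. */
static int sim_buf_list_get(struct sim_buf_list *bl)
{
	if (!bl->registered)
		return -ENOENT;
	bl->refs++;
	return 0;
}

/* Dropped when the request completes and commits or recycles its buffer. */
static void sim_buf_list_put(struct sim_buf_list *bl)
{
	assert(bl->refs > 0);
	bl->refs--;
}

/* Unregistration fails while any request still references the list. */
static int sim_buf_list_unregister(struct sim_buf_list *bl)
{
	if (!bl->registered)
		return -ENOENT;
	if (bl->refs)
		return -EBUSY;
	bl->registered = false;
	return 0;
}
```

The cost of this approach is that a misbehaving or long-running command can block unregistration indefinitely.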
Thanks,
Ming
* Re: [PATCH V3] io_uring: uring_cmd: add multishot support
2025-08-20 11:08 ` Ming Lei
@ 2025-08-20 13:11 ` Jens Axboe
2025-08-20 15:39 ` Ming Lei
From: Jens Axboe @ 2025-08-20 13:11 UTC
To: Ming Lei; +Cc: io-uring, Pavel Begunkov, Caleb Sander Mateos
On 8/20/25 5:08 AM, Ming Lei wrote:
> On Tue, Aug 19, 2025 at 10:00:36AM -0600, Jens Axboe wrote:
>> On 8/19/25 9:00 AM, Ming Lei wrote:
>>> @@ -251,6 +265,11 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
>>> }
>>>
>>> ret = file->f_op->uring_cmd(ioucmd, issue_flags);
>>> + if (ioucmd->flags & IORING_URING_CMD_MULTISHOT) {
>>> + if (ret >= 0)
>>> + return IOU_ISSUE_SKIP_COMPLETE;
>>> + io_kbuf_recycle(req, issue_flags);
>>> + }
>>> if (ret == -EAGAIN) {
>>> ioucmd->flags |= IORING_URING_CMD_REISSUE;
>>> return ret;
>>
>> Final comment on this part... uring_cmd is unique in the sense that it'd
>> be the first potentially pollable file type that supports buffer
>> selection AND can return -EIOCBQUEUED. For non-pollable, the buffer
>> would get committed upfront. For pollable, we'd either finish and put it
>> within this same execution context, or we'd drop it entirely when
>> returning -EAGAIN.
>>
>> So what happens if we get -EIOCBQUEUED with a selected buffer from
>> provided buffer ring, and someone malicious unregisters and frees the
>> buffer ring before that request completes?
>
> This looks like a real problem for IORING_URING_CMD_MULTISHOT.
>
> For pollable multishot, ->issue() runs in the submitter's task_work context
> and completes synchronously, so ctx->uring_lock protects the buffer list and
> unregistration can't happen concurrently. That is probably one reason why
> polled multishot can't run in io-wq context.
>
> But now that ->issue() can return -EIOCBQUEUED, we lose ->uring_lock's
> protection for req->buf_list. One idea is to reference-count the buffer
> list so that unregistration fails while there is any active consumer.
>
> Do you have suggestions for this problem?
Just commit the buffer upfront, rather than grab it at issue time and
commit when you get the completion callback? Yes that will pin the
buffer for the duration of the IO, but that should not be an issue,
nobody else can use it anyway. Avoiding the pin for pollable files with
potentially infinite IO times (eg pipe that never gets written to, or
socket that never gets data) is a key concept for those kinds of
workloads, but for finite completion times or single use cases like
yours here, that doesn't really matter.
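The "commit upfront" suggestion can be modeled with a toy provided buffer ring: selection and commit happen in one step, so the request walks away holding only a user address and buffer ID rather than a pointer into the ring. This is a hypothetical sketch; `struct sim_buf_ring`, `struct sim_req` and `sim_select_commit()` are illustrative names, not io_uring internals, and the slot index doubles as the buffer ID for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 4	/* power of two, like a real buffer ring */

/* Hypothetical model of a provided buffer ring (kernel-side view). */
struct sim_buf_ring {
	uint64_t addr[RING_ENTRIES];	/* user addresses supplied by the app */
	unsigned int head;		/* consumer index, advanced on commit */
	unsigned int tail;		/* producer index, set by the app */
};

/* What a request keeps after selecting with an upfront commit. */
struct sim_req {
	uint64_t buf_addr;		/* plain user address, no ring pointer */
	unsigned int bid;
	bool has_buf;
};

/*
 * Select and commit in one step: the ring head advances immediately,
 * so the request no longer needs the ring once this returns.
 */
static bool sim_select_commit(struct sim_buf_ring *br, struct sim_req *req)
{
	if (br->head == br->tail)
		return false;		/* ring empty: -ENOBUFS in the kernel */
	req->bid = br->head % RING_ENTRIES;
	req->buf_addr = br->addr[req->bid];
	req->has_buf = true;
	br->head++;
	return true;
}
```

Once `sim_select_commit()` returns, unregistering and freeing the ring cannot leave the request with a dangling ring pointer; the price is that the buffer stays consumed until the command completes, which is the pinning trade-off described above.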
I've got a bit of a side project making the provided buffer selection a
bit more foolproof in the sense that it makes it explicit that the scope
of it is the issue context, not across executions. One current problem
is req->buf_list, which for provided buffer rings really is local scope,
yet it's in the io_kiocb. I'll be moving that somewhere else and out of
io_kiocb. Just a side note, because it's currently easy to get this
wrong even if you know what you are doing, as per your patch.
--
Jens Axboe
* Re: [PATCH V3] io_uring: uring_cmd: add multishot support
2025-08-20 13:11 ` Jens Axboe
@ 2025-08-20 15:39 ` Ming Lei
From: Ming Lei @ 2025-08-20 15:39 UTC
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Caleb Sander Mateos
On Wed, Aug 20, 2025 at 07:11:52AM -0600, Jens Axboe wrote:
> On 8/20/25 5:08 AM, Ming Lei wrote:
> > On Tue, Aug 19, 2025 at 10:00:36AM -0600, Jens Axboe wrote:
> >> On 8/19/25 9:00 AM, Ming Lei wrote:
> >>> @@ -251,6 +265,11 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
> >>> }
> >>>
> >>> ret = file->f_op->uring_cmd(ioucmd, issue_flags);
> >>> + if (ioucmd->flags & IORING_URING_CMD_MULTISHOT) {
> >>> + if (ret >= 0)
> >>> + return IOU_ISSUE_SKIP_COMPLETE;
> >>> + io_kbuf_recycle(req, issue_flags);
> >>> + }
> >>> if (ret == -EAGAIN) {
> >>> ioucmd->flags |= IORING_URING_CMD_REISSUE;
> >>> return ret;
> >>
> >> Final comment on this part... uring_cmd is unique in the sense that it'd
> >> be the first potentially pollable file type that supports buffer
> >> selection AND can return -EIOCBQUEUED. For non-pollable, the buffer
> >> would get committed upfront. For pollable, we'd either finish and put it
> >> within this same execution context, or we'd drop it entirely when
> >> returning -EAGAIN.
> >>
> >> So what happens if we get -EIOCBQUEUED with a selected buffer from
> >> provided buffer ring, and someone malicious unregisters and frees the
> >> buffer ring before that request completes?
> >
> > This looks like a real problem for IORING_URING_CMD_MULTISHOT.
> >
> > For pollable multishot, ->issue() runs in the submitter's task_work context
> > and completes synchronously, so ctx->uring_lock protects the buffer list and
> > unregistration can't happen concurrently. That is probably one reason why
> > polled multishot can't run in io-wq context.
> >
> > But now that ->issue() can return -EIOCBQUEUED, we lose ->uring_lock's
> > protection for req->buf_list. One idea is to reference-count the buffer
> > list so that unregistration fails while there is any active consumer.
> >
> > Do you have suggestions for this problem?
>
> Just commit the buffer upfront, rather than grab it at issue time and
> commit when you get the completion callback? Yes that will pin the
> buffer for the duration of the IO, but that should not be an issue,
> nobody else can use it anyway. Avoiding the pin for pollable files with
> potentially infinite IO times (eg pipe that never gets written to, or
> socket that never gets data) is a key concept for those kinds of
> workloads, but for finite completion times or single use cases like
> yours here, that doesn't really matter.
OK, I will send V4, documenting the "commit the buffer upfront" usage.
>
> I've got a bit of a side project making the provided buffer selection a
> bit more foolproof in the sense that it makes it explicit that the scope
> of it is the issue context, not across executions. One current problem
> is req->buf_list, which for provided buffer rings really is local scope,
> yet it's in the io_kiocb. I'll be moving that somewhere else and out of
> io_kiocb. Just a side note, because it's currently easy to get this
> wrong even if you know what you are doing, as per your patch.
Thanks for the clarification!
Thanks,
Ming