* [PATCHSET RFC 0/3] uring_cmd copy avoidance
From: Jens Axboe @ 2025-06-05 16:30 UTC
To: io-uring; +Cc: csander
Hi,
Currently uring_cmd unconditionally copies the SQE at prep time, as it
has no other choice - the SQE data must remain stable after submit.
This can consume excessive memory bandwidth, as passthrough will often
use 128b SQEs, and the copies burn a fair amount of CPU cycles as well.
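For context, the 128b figure comes from how a passthrough capable ring
is set up in the first place. A minimal liburing sketch (not part of
this series, setup_passthru_ring() is just a local helper here):

#include <liburing.h>

/* NVMe passthrough wants IORING_SETUP_SQE128 so the driver command
 * payload (eg struct nvme_uring_cmd) fits inside the SQE, and
 * IORING_SETUP_CQE32 for the bigger completion. Every SQE in such a
 * ring is hence 128 bytes, whether the copy was needed or not.
 */
static int setup_passthru_ring(struct io_uring *ring, unsigned int entries)
{
	return io_uring_queue_init(entries, ring,
				   IORING_SETUP_SQE128 | IORING_SETUP_CQE32);
}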
As a quick test, I ran the current -git kernel on a box with 23 NVMe
drives doing passthrough IO. memcpy() is the highest cycle user at
9.05%, all of it from the uring_cmd prep path. The test case is a
512b random read, which runs at 91-92M IOPS.
With these patches, memcpy() is gone from the profiles, and it runs
at 98-99M IOPS, or about 7-8% faster.
Even for lower IOPS production testing, Caleb reports that memcpy()
overhead is in the realm of 1.1% of CPU time.
This patch series attempts to mark requests in the io_uring core with
REQ_F_ASYNC_ISSUE when we know the issue will happen out-of-line. A
helper is added to check for this, as REQ_F_FORCE_ASYNC should be
factored in as well and I did not feel like also setting ASYNC_ISSUE
in those locations.
io_uring_cmd_sqe_verify() is added on the uring_cmd side to verify
that the core did not mess this up - if it did, a warning is triggered
and -EFAULT is returned for the request.
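For background on why the stable copy matters at all: a driver's
->uring_cmd() handler reads its command payload straight out of
ioucmd->sqe at issue time, and if that issue happens out-of-line, the
original SQE slot in the ring may long since have been reused. Roughly
like the below, for a hypothetical foo driver (io_uring_sqe_cmd() is
the real helper, everything foo_* is made up):

static int foo_uring_cmd(struct io_uring_cmd *ioucmd,
			 unsigned int issue_flags)
{
	/* points into the SQE - only valid while the SQE is stable */
	const struct foo_cmd *cmd = io_uring_sqe_cmd(ioucmd->sqe);

	if (issue_flags & IO_URING_F_NONBLOCK) {
		if (!foo_can_submit_nowait(cmd))
			return -EAGAIN;	/* core copies SQE, retries async */
	}
	return foo_submit(cmd);
}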
Still not fully in love with stashing an io_uring_sqe pointer from
prep to issue - it would be much nicer if we kept passing it around...
Suggestions certainly more than welcome!
 include/linux/io_uring_types.h |  3 ++
 io_uring/io_uring.c            |  1 +
 io_uring/io_uring.h            |  5 +++
 io_uring/uring_cmd.c           | 68 ++++++++++++++++++++++------------
 4 files changed, 53 insertions(+), 24 deletions(-)
--
Jens Axboe
* [PATCH 1/3] io_uring/uring_cmd: get rid of io_uring_cmd_prep_setup()
From: Jens Axboe @ 2025-06-05 16:30 UTC
To: io-uring; +Cc: csander, Jens Axboe
It's a pretty pointless helper that just allocates and copies data.
Fold it into io_uring_cmd_prep().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
io_uring/uring_cmd.c | 34 +++++++++++++---------------------
1 file changed, 13 insertions(+), 21 deletions(-)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 929cad6ee326..e204f4941d72 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -181,8 +181,7 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_done);
 
-static int io_uring_cmd_prep_setup(struct io_kiocb *req,
-				   const struct io_uring_sqe *sqe)
+int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
 	struct io_async_cmd *ac;
@@ -190,6 +189,18 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req,
 	/* see io_uring_cmd_get_async_data() */
 	BUILD_BUG_ON(offsetof(struct io_async_cmd, data) != 0);
 
+	if (sqe->__pad1)
+		return -EINVAL;
+
+	ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
+	if (ioucmd->flags & ~IORING_URING_CMD_MASK)
+		return -EINVAL;
+
+	if (ioucmd->flags & IORING_URING_CMD_FIXED)
+		req->buf_index = READ_ONCE(sqe->buf_index);
+
+	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
+
 	ac = io_uring_alloc_async_data(&req->ctx->cmd_cache, req);
 	if (!ac)
 		return -ENOMEM;
@@ -207,25 +218,6 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req,
 	return 0;
 }
 
-int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
-	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
-
-	if (sqe->__pad1)
-		return -EINVAL;
-
-	ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
-	if (ioucmd->flags & ~IORING_URING_CMD_MASK)
-		return -EINVAL;
-
-	if (ioucmd->flags & IORING_URING_CMD_FIXED)
-		req->buf_index = READ_ONCE(sqe->buf_index);
-
-	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
-
-	return io_uring_cmd_prep_setup(req, sqe);
-}
-
 int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
--
2.49.0
* [PATCH 2/3] io_uring: mark requests that will go async with REQ_F_ASYNC_ISSUE
From: Jens Axboe @ 2025-06-05 16:30 UTC
To: io-uring; +Cc: csander, Jens Axboe
REQ_F_ASYNC_ISSUE is set on a request if the core parts of io_uring
either know that it will be issued from an out-of-line context, or
that there's a risk it will be. As REQ_F_FORCE_ASYNC will similarly
force async issue of a request, add an io_req_will_async_issue()
helper that callers can use without needing to worry about the
internal details.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h | 3 +++
 io_uring/io_uring.c            | 1 +
 io_uring/io_uring.h            | 5 +++++
 3 files changed, 9 insertions(+)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 2922635986f5..3b17c1da1ab0 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -502,6 +502,7 @@ enum {
 	REQ_F_BUF_NODE_BIT,
 	REQ_F_HAS_METADATA_BIT,
 	REQ_F_IMPORT_BUFFER_BIT,
+	REQ_F_ASYNC_ISSUE_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -591,6 +592,8 @@ enum {
 	 * For SEND_ZC, whether to import buffers (i.e. the first issue).
 	 */
 	REQ_F_IMPORT_BUFFER	= IO_REQ_FLAG(REQ_F_IMPORT_BUFFER_BIT),
+	/* request will be issued async */
+	REQ_F_ASYNC_ISSUE	= IO_REQ_FLAG(REQ_F_ASYNC_ISSUE_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, io_tw_token_t tw);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cf759c172083..8f431a9a7812 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2197,6 +2197,7 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		trace_io_uring_link(req, link->last);
 		link->last->link = req;
 		link->last = req;
+		req->flags |= REQ_F_ASYNC_ISSUE;
 
 		if (req->flags & IO_REQ_LINK_FLAGS)
 			return 0;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d59c12277d58..ff6b577c18f5 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -524,4 +524,9 @@ static inline bool io_has_work(struct io_ring_ctx *ctx)
 	return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
 	       io_local_work_pending(ctx);
 }
+
+static inline bool io_req_will_async_issue(struct io_kiocb *req)
+{
+	return (req->flags & (REQ_F_FORCE_ASYNC | REQ_F_ASYNC_ISSUE));
+}
 #endif
--
2.49.0
* [PATCH 3/3] io_uring/uring_cmd: copy SQE only when needed
From: Jens Axboe @ 2025-06-05 16:30 UTC
To: io-uring; +Cc: csander, Jens Axboe
If the request is flagged with REQ_F_FORCE_ASYNC or REQ_F_ASYNC_ISSUE,
then there's a chance that it will get issued out-of-line. For that
case, the SQE must be copied.

Add an SQE copy helper, and use it on the prep side if the request is
flagged as such, and from the main issue path if -EAGAIN is returned
when attempting to issue the request.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
io_uring/uring_cmd.c | 50 ++++++++++++++++++++++++++++++++++----------
1 file changed, 39 insertions(+), 11 deletions(-)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index e204f4941d72..76c6b91d249f 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -181,6 +181,17 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_done);
 
+static void io_uring_cmd_sqe_copy(struct io_kiocb *req)
+{
+	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	struct io_async_cmd *ac = req->async_data;
+
+	if (ioucmd->sqe != ac->sqes) {
+		memcpy(ac->sqes, ioucmd->sqe, uring_sqe_size(req->ctx));
+		ioucmd->sqe = ac->sqes;
+	}
+}
+
 int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
@@ -205,19 +216,29 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (!ac)
 		return -ENOMEM;
 	ac->data.op_data = NULL;
+	ioucmd->sqe = sqe;
 
-	/*
-	 * Unconditionally cache the SQE for now - this is only needed for
-	 * requests that go async, but prep handlers must ensure that any
-	 * sqe data is stable beyond prep. Since uring_cmd is special in
-	 * that it doesn't read in per-op data, play it safe and ensure that
-	 * any SQE data is stable beyond prep. This can later get relaxed.
-	 */
-	memcpy(ac->sqes, sqe, uring_sqe_size(req->ctx));
-	ioucmd->sqe = ac->sqes;
+	if (io_req_will_async_issue(req))
+		io_uring_cmd_sqe_copy(req);
 	return 0;
 }
 
+/*
+ * Basic SQE validity check - should never trigger, can be removed later on
+ */
+static bool io_uring_cmd_sqe_verify(struct io_kiocb *req,
+				    struct io_uring_cmd *ioucmd,
+				    unsigned int issue_flags)
+{
+	struct io_async_cmd *ac = req->async_data;
+
+	if (ioucmd->sqe == ac->sqes)
+		return true;
+	if (WARN_ON_ONCE(issue_flags & (IO_URING_F_IOWQ | IO_URING_F_UNLOCKED)))
+		return false;
+	return true;
+}
+
 int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
@@ -232,6 +253,9 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 	if (ret)
 		return ret;
 
+	if (unlikely(!io_uring_cmd_sqe_verify(req, ioucmd, issue_flags)))
+		return -EFAULT;
+
 	if (ctx->flags & IORING_SETUP_SQE128)
 		issue_flags |= IO_URING_F_SQE128;
 	if (ctx->flags & IORING_SETUP_CQE32)
@@ -251,8 +275,12 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 	}
 
 	ret = file->f_op->uring_cmd(ioucmd, issue_flags);
-	if (ret == -EAGAIN || ret == -EIOCBQUEUED)
-		return ret;
+	if (ret == -EAGAIN) {
+		io_uring_cmd_sqe_copy(req);
+		return -EAGAIN;
+	} else if (ret == -EIOCBQUEUED) {
+		return -EIOCBQUEUED;
+	}
 	if (ret < 0)
 		req_set_fail(req);
 	io_req_uring_cleanup(req, issue_flags);
--
2.49.0
* Re: [PATCH 2/3] io_uring: mark requests that will go async with REQ_F_ASYNC_ISSUE
From: Jens Axboe @ 2025-06-05 17:06 UTC
To: io-uring; +Cc: csander
On 6/5/25 10:30 AM, Jens Axboe wrote:
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index cf759c172083..8f431a9a7812 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2197,6 +2197,7 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  		trace_io_uring_link(req, link->last);
>  		link->last->link = req;
>  		link->last = req;
> +		req->flags |= REQ_F_ASYNC_ISSUE;
>  
>  		if (req->flags & IO_REQ_LINK_FLAGS)
>  			return 0;
This is set too late - it obviously must be set before ->prep() runs...
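Eg something along these lines - purely illustrative, not the actual
fix - marking it in io_init_req() while the link state is already
known, ahead of the ->prep() call at the end of that function:

static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
		       const struct io_uring_sqe *sqe)
{
	/* ... existing opcode/flags/personality setup ... */

	/* appended to an existing link -> will be issued out-of-line */
	if (ctx->submit_state.link.head)
		req->flags |= REQ_F_ASYNC_ISSUE;

	return def->prep(req, sqe);
}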
--
Jens Axboe