Re: [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf


From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
	Ming Lei <ming.lei@redhat.com>,
	io-uring@vger.kernel.org
Cc: linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>,
	Akilesh Kailash <akailash@google.com>
Subject: Re: [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf
Date: Tue, 29 Oct 2024 14:06:14 -0600
Message-ID: <e76d9742-5693-4057-b925-3917943c7441@kernel.dk>
In-Reply-To: <bc44d3c0-41e8-425c-957f-bad70aedcc50@kernel.dk>

On 10/29/24 1:18 PM, Jens Axboe wrote:
> Now, this implementation requires a user buffer, and as far as I'm told,
> you currently have kernel buffers on the ublk side. There's absolutely
> no reason why kernel buffers cannot work, we'd most likely just need to
> add an IORING_RSRC_KBUFFER type to handle that. My question here is how
> hard is this requirement? Reason I ask is that it's much simpler to work
> with userspace buffers. Yes, the current implementation maps them
> every time. We could certainly change that, but I don't see this
> being an issue. It's really no different than O_DIRECT, and you only
> need to map them once for a read + whatever number of writes you'd need
> to do. If a 'tag' is provided for LOCAL_BUF, it'll post a CQE whenever
> that buffer is unmapped. This is a notification for the application that
> it's done using the buffer. For a pure kernel buffer, we'd either need
> to be able to reference it (so that we KNOW it's not going away) and/or
> have a callback associated with the buffer.

Just to expand on this: if a kernel buffer is absolutely required, for
example if you're inheriting pages from the page cache or other
locations you cannot control, we would need to add something along the
lines of the below:

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 9621ba533b35..b0258eb37681 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -474,6 +474,10 @@ void io_free_rsrc_node(struct io_rsrc_node *node)
 		if (node->buf)
 			io_buffer_unmap(node->ctx, node);
 		break;
+	case IORING_RSRC_KBUFFER:
+		if (node->buf)
+			node->kbuf_fn(node->buf);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		break;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index be9b490c400e..8d00460d47ff 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -11,6 +11,7 @@
 enum {
 	IORING_RSRC_FILE		= 0,
 	IORING_RSRC_BUFFER		= 1,
+	IORING_RSRC_KBUFFER		= 2,
 };

 struct io_rsrc_node {
@@ -19,6 +20,7 @@ struct io_rsrc_node {
 	u16				type;

 	u64 tag;
+	void (*kbuf_fn)(struct io_mapped_ubuf *);
 	union {
 		unsigned long file_ptr;
 		struct io_mapped_ubuf *buf;

and provide a helper that allocates an io_rsrc_node, sets it to type
IORING_RSRC_KBUFFER, and assigns a ->kbuf_fn() that gets a callback when
the final put of the node happens. Whichever ublk command wants to do
zero copy would call this helper at prep time and set the
io_submit_state buffer to be used.

Likewise, probably provide an io_rsrc helper that can be called by
kbuf_fn as well to do final cleanup, so that the callback itself is only
tasked with whatever it needs to do once it's received the data.

For this to work, we'll absolutely need the provider to guarantee that
the pages mapped will remain persistent until that callback is received.
Or have a way to reference the data inside rsrc.c. I'm imagining this is
just stacking the IO, so eg you get a read with some data already in
there that you don't control, and you don't complete this read until
some other IO is done. That other IO is what is using the buffer
provided here.
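
To make the lifetime rule concrete, here is a minimal userspace model of
the idea, not kernel code and not the actual io_uring implementation.
The field names mirror the diff above, but struct mapped_buf and its
priv cookie, struct stacked_read, rsrc_node_alloc_kbuf(),
rsrc_node_get(), rsrc_node_put() and complete_stacked_read() are all
hypothetical names made up for illustration. As a simplification, the
model frees the node itself after the callback rather than offering a
separate cleanup helper.

```c
/* Userspace model only: illustrates "callback on final put", nothing more. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum {
	RSRC_FILE	= 0,
	RSRC_BUFFER	= 1,
	RSRC_KBUFFER	= 2,
};

/* Stand-in for struct io_mapped_ubuf; priv is an assumed provider cookie. */
struct mapped_buf {
	void *priv;
};

/* Hypothetical provider state: a read that must not complete early. */
struct stacked_read {
	bool completed;
};

struct rsrc_node {
	int refs;
	uint16_t type;
	uint64_t tag;
	void (*kbuf_fn)(struct mapped_buf *);
	struct mapped_buf *buf;
};

/* Hypothetical helper: wrap a provider-owned buffer in a KBUFFER node. */
static struct rsrc_node *rsrc_node_alloc_kbuf(struct mapped_buf *buf,
					      void (*fn)(struct mapped_buf *))
{
	struct rsrc_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;
	node->refs = 1;
	node->type = RSRC_KBUFFER;
	node->buf = buf;
	node->kbuf_fn = fn;
	return node;
}

/* Each consumer of the buffer (eg each write) takes its own reference. */
static void rsrc_node_get(struct rsrc_node *node)
{
	node->refs++;
}

/* Drop one reference; only the final put runs the provider callback. */
static void rsrc_node_put(struct rsrc_node *node)
{
	assert(node->refs > 0);
	if (--node->refs)
		return;
	if (node->type == RSRC_KBUFFER && node->buf)
		node->kbuf_fn(node->buf);
	free(node);
}

/* Provider callback: the pages are no longer in use, finish the read. */
static void complete_stacked_read(struct mapped_buf *buf)
{
	struct stacked_read *rd = buf->priv;

	rd->completed = true;
}
```

In this model the provider would call rsrc_node_alloc_kbuf() with
complete_stacked_read() as the callback, every consumer would pair
rsrc_node_get() with rsrc_node_put(), and the stacked read is only
marked completed once the last reference is dropped. Until then the
provider has to keep the pages alive.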

Anyway, just a suggestion: if user-provided memory is a no-go, there
are certainly ways we can make this trivially work with memory that you
cannot control and that is received from inside the kernel, without a
lot of additions.

-- 
Jens Axboe
