public inbox for io-uring@vger.kernel.org
From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>, io-uring@vger.kernel.org
Subject: Re: [PATCH 5/7] io_uring: add ability for provided buffer to index registered buffers
Date: Thu, 24 Oct 2024 09:57:16 -0600
Message-ID: <c44ef9b3-bea7-45f5-b050-9c74ff1e0344@kernel.dk>
In-Reply-To: <34d4cfb3-e605-4d37-b104-03b8b1a892f1@gmail.com>

On 10/24/24 9:44 AM, Pavel Begunkov wrote:
> On 10/23/24 17:07, Jens Axboe wrote:
>> This just adds the necessary shifts that define what a provided buffer
>> that is merely an index into a registered buffer looks like. A provided
>> buffer looks like the following:
>>
>> struct io_uring_buf {
>>     __u64    addr;
>>     __u32    len;
>>     __u16    bid;
>>     __u16    resv;
>> };
>>
>> where 'addr' holds a userspace address, 'len' is the length of the
>> buffer, and 'bid' is the buffer ID identifying the buffer. This works
>> fine for a virtual address, but it cannot be used to efficiently denote
>> a registered buffer. Registered buffers are pre-mapped into the kernel
>> for more efficient IO, avoiding a get_user_pages() and page(s) inc+dec,
>> and are used for things like O_DIRECT on storage and zero copy send.
>>
>> Particularly for the send case, it'd be useful to support a mix of
>> provided and registered buffers. This enables using a
>> provided ring buffer to serialize sends, and also enables the use of
>> send bundles, where a send can pick multiple buffers and send them all
>> at once.
>>
>> If provided buffers are used as an index into registered buffers, the
>> meaning of buf->addr changes. If registered buffer index 'regbuf_index'
>> is desired, with a length of 'len' and the offset 'regbuf_offset' from
>> the start of the buffer, then the application would fill out the entry
>> as follows:
>>
>> buf->addr = ((__u64) regbuf_offset << IOU_BUF_OFFSET_BITS) | regbuf_index;
>> buf->len = len;
>>
>> and otherwise add it to the buffer ring as usual. The kernel will then
>> first pick a buffer from the desired buffer group ID, and then decode
>> which registered buffer to use for the transfer.
>>
>> This provides a way to use both registered and provided buffers at the
>> same time.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>   include/uapi/linux/io_uring.h | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index 86cb385fe0b5..eef88d570cb4 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -733,6 +733,14 @@ struct io_uring_buf_ring {
>>       };
>>   };
>> +/*
>> + * When provided buffers are used as indices into registered buffers, the
>> + * lower IOU_BUF_REGBUF_BITS indicate the index into the registered buffers,
>> + * and the upper IOU_BUF_OFFSET_BITS indicate the offset into that buffer.
>> + */
>> +#define IOU_BUF_REGBUF_BITS    (32ULL)
>> +#define IOU_BUF_OFFSET_BITS    (32ULL)
> 
> 32 bits is fine for the IO size but not enough to store offsets; it
> can only address registered buffers under 4GB.

I did think about that - at least as it stands, registered buffers are
limited to 1GB in size. That's how it's been since that got added. Now,
for the future, we may obviously lift that limitation, and yeah then
32 bits would not necessarily be enough for the offset.
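
To make the 32/32 split concrete, here is a minimal sketch of how an application might pack and unpack buf->addr with the values from the quoted patch. The helper names (regbuf_addr_encode() and friends) and REGBUF_MAX_SIZE are made up for illustration and are not part of the patch or any header; the 1GB value is just the current registered buffer limit mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Values proposed in the quoted patch; not in any released uapi header. */
#define IOU_BUF_REGBUF_BITS	(32ULL)
#define IOU_BUF_OFFSET_BITS	(32ULL)

/* Assumption for illustration: the current 1GB registered buffer limit. */
#define REGBUF_MAX_SIZE		(1ULL << 30)

/* Hypothetical helper: pack a registered buffer index and offset into addr. */
static inline uint64_t regbuf_addr_encode(uint32_t regbuf_index,
					  uint64_t regbuf_offset)
{
	/* The offset has to fit in the upper 32 bits of addr. */
	assert(regbuf_offset < (1ULL << IOU_BUF_OFFSET_BITS));
	/*
	 * Mirrors the expression in the patch description. Both widths are
	 * 32, so this shift equals the width of the index field.
	 */
	return (regbuf_offset << IOU_BUF_OFFSET_BITS) | regbuf_index;
}

/* Hypothetical helper: the lower IOU_BUF_REGBUF_BITS hold the index. */
static inline uint32_t regbuf_addr_index(uint64_t addr)
{
	return (uint32_t)(addr & ((1ULL << IOU_BUF_REGBUF_BITS) - 1));
}

/* Hypothetical helper: the remaining upper bits hold the offset. */
static inline uint64_t regbuf_addr_offset(uint64_t addr)
{
	return addr >> IOU_BUF_REGBUF_BITS;
}
```

With a 1GB cap, the largest valid offset is REGBUF_MAX_SIZE - 1, which needs 30 bits, so it round-trips through the 32-bit offset field with room to spare. Anything at or above 4GB would trip the assert.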

For Linux, the max read/write size has always been INT_MAX & PAGE_MASK,
so we could make do with 31 bits for the size, which would bump the
offset to 33 bits, or 8G. That'd leave enough room for, at least, 8G
buffers, or 8x what we support now. Which is probably fine, you'd just
split your buffer registrations into 8G chunks, if you want to register
more than 8G of memory.
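
A rough sketch of what a 31/33 split could look like, purely as an illustration. The exact bit placement is an assumption, nothing in the patch defines it: here the 33rd offset bit is borrowed from the top bit of the length, and every hyp_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical widths; the posted patch uses 32 bits for both fields. */
#define HYP_LEN_BITS	31u
#define HYP_OFFSET_BITS	33u
#define HYP_LEN_MASK	((1u << HYP_LEN_BITS) - 1)
#define HYP_CHUNK_SIZE	(1ULL << HYP_OFFSET_BITS)	/* 8G */

/* Stand-in for the addr/len pair of a provided buffer entry. */
struct hyp_buf {
	uint64_t addr;
	uint32_t len;
};

/* Pack index, a 33-bit offset, and a 31-bit length (assumed layout). */
static inline struct hyp_buf hyp_pack(uint32_t index, uint64_t offset,
				      uint32_t len)
{
	struct hyp_buf b;

	assert(offset < HYP_CHUNK_SIZE);
	assert(len <= HYP_LEN_MASK);
	/* Low 32 offset bits in the upper half of addr, index in the lower. */
	b.addr = ((offset & 0xffffffffULL) << 32) | index;
	/* Offset bit 32 is stored in the otherwise unused top bit of len. */
	b.len = len | ((uint32_t)(offset >> 32) << HYP_LEN_BITS);
	return b;
}

static inline uint32_t hyp_index(struct hyp_buf b)
{
	return (uint32_t)b.addr;
}

static inline uint64_t hyp_offset(struct hyp_buf b)
{
	return (b.addr >> 32) | ((uint64_t)(b.len >> HYP_LEN_BITS) << 32);
}

static inline uint32_t hyp_len(struct hyp_buf b)
{
	return b.len & HYP_LEN_MASK;
}

/* Number of registrations needed to cover 'total' bytes in 8G chunks. */
static inline uint64_t hyp_chunks_needed(uint64_t total)
{
	return (total + HYP_CHUNK_SIZE - 1) / HYP_CHUNK_SIZE;
}
```

For example, a 5G offset into registered buffer 2 needs offset bit 32 set, which the 32/32 layout cannot express, and registering 20G of memory would take three registrations with 8G chunks.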

-- 
Jens Axboe
