From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, io-uring@vger.kernel.org
Subject: Re: [PATCH 5/7] io_uring: add ability for provided buffer to index registered buffers
Date: Thu, 24 Oct 2024 17:17:37 +0100
Message-ID: <c51938c8-8bb4-44d1-8394-14aeebd58ba2@gmail.com>
In-Reply-To: <c44ef9b3-bea7-45f5-b050-9c74ff1e0344@kernel.dk>
On 10/24/24 16:57, Jens Axboe wrote:
> On 10/24/24 9:44 AM, Pavel Begunkov wrote:
>> On 10/23/24 17:07, Jens Axboe wrote:
>>> This just adds the necessary shifts that define what a provided buffer
>>> that is merely an index into a registered buffer looks like. A provided
>>> buffer looks like the following:
>>>
>>> struct io_uring_buf {
>>> 	__u64	addr;
>>> 	__u32	len;
>>> 	__u16	bid;
>>> 	__u16	resv;
>>> };
>>>
>>> where 'addr' holds a userspace address, 'len' is the length of the
>>> buffer, and 'bid' is the buffer ID identifying the buffer. This works
>>> fine for a virtual address, but it cannot be used to efficiently denote
>>> a registered buffer. Registered buffers are pre-mapped into the kernel
>>> for more efficient IO, avoiding a get_user_pages() and page(s) inc+dec,
>>> and are used for things like O_DIRECT on storage and zero copy send.
>>>
>>> Particularly for the send case, it'd be useful to support a mix of
>>> provided and registered buffers. This enables using a provided ring
>>> buffer to serialize sends, and also enables the use of
>>> send bundles, where a send can pick multiple buffers and send them all
>>> at once.
>>>
>>> If provided buffers are used as an index into registered buffers, the
>>> meaning of buf->addr changes. If registered buffer index 'regbuf_index'
>>> is desired, with a length of 'len' and the offset 'regbuf_offset' from
>>> the start of the buffer, then the application would fill out the entry
>>> as follows:
>>>
>>> buf->addr = ((__u64) regbuf_offset << IOU_BUF_OFFSET_BITS) | regbuf_index;
>>> buf->len = len;
>>>
>>> and otherwise add it to the buffer ring as usual. The kernel will then
>>> first pick a buffer from the desired buffer group ID, and then decode
>>> which registered buffer to use for the transfer.
>>>
>>> This provides a way to use both registered and provided buffers at the
>>> same time.
>>>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>> ---
>>> include/uapi/linux/io_uring.h | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index 86cb385fe0b5..eef88d570cb4 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -733,6 +733,14 @@ struct io_uring_buf_ring {
>>> };
>>> };
>>> +/*
>>> + * When provided buffers are used as indices into registered buffers, the
>>> + * lower IOU_BUF_REGBUF_BITS indicate the index into the registered buffers,
>>> + * and the upper IOU_BUF_OFFSET_BITS indicate the offset into that buffer.
>>> + */
>>> +#define IOU_BUF_REGBUF_BITS (32ULL)
>>> +#define IOU_BUF_OFFSET_BITS (32ULL)
>>
>> 32 bits is fine for the IO size but not enough to store offsets: it
>> can only address registered buffers of up to 4GB.
>
> I did think about that - at least as it stands, registered buffers are
> limited to 1GB in size. That's how it's been since that got added. Now,
> for the future, we may obviously lift that limitation, and yeah then
> 32-bits would not necessarily be enough for the offset.
Right, and I don't think lifting it is unreasonable considering how
much memory systems have nowadays, and we do think that one large
registered buffer is a good thing.
> For linux, the max read/write value has always been INT_MAX & PAGE_MASK,
> so we could make do with 31 bits for the size, which would bump the
> offset to 33-bits, or 8G. That'd leave enough room for, at least, 8G
> buffers, or 8x what we support now. Which is probably fine, you'd just
> split your buffer registrations into 8G chunks, if you want to register
> more than 8G of memory.
That's why I mentioned IO size: you can register a very large buffer
and do IO with a small subchunk of it, even if that "small" is 4G,
but the subchunk's offset still needs to be addressable. I think we
need at least an order of magnitude or two more space for it to last
for a bit.
Can it steal bits from IOU_BUF_REGBUF_BITS?
--
Pavel Begunkov
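
For illustration only, the encoding discussed above can be sketched as
follows. The two macros are the values from the quoted patch; the helper
names (regbuf_encode, regbuf_decode_index, regbuf_decode_offset,
regbuf_max_size) and the 'index_bits' parameter are hypothetical and not
part of the patch or of any kernel or liburing API. The sketch shifts the
offset by the width of the index field, which matches the patch's shift
only because both widths are currently 32.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the quoted patch: a 32/32 split of the 64-bit 'addr' field. */
#define IOU_BUF_REGBUF_BITS	(32ULL)
#define IOU_BUF_OFFSET_BITS	(32ULL)

/*
 * Hypothetical helper: pack an offset and a registered buffer index into
 * one 64-bit value. The low 'index_bits' bits hold the index, the
 * remaining high bits hold the offset into that registered buffer.
 */
static inline uint64_t regbuf_encode(uint64_t offset, uint32_t index,
				     unsigned int index_bits)
{
	assert(index_bits > 0 && index_bits <= 32);
	assert((uint64_t)index < (1ULL << index_bits));
	assert(offset < (1ULL << (64 - index_bits)));
	return (offset << index_bits) | index;
}

/* Hypothetical helper: recover the registered buffer index. */
static inline uint32_t regbuf_decode_index(uint64_t addr,
					   unsigned int index_bits)
{
	return (uint32_t)(addr & ((1ULL << index_bits) - 1));
}

/* Hypothetical helper: recover the offset into the registered buffer. */
static inline uint64_t regbuf_decode_offset(uint64_t addr,
					    unsigned int index_bits)
{
	return addr >> index_bits;
}

/* Number of distinct byte offsets the split can express per buffer. */
static inline uint64_t regbuf_max_size(unsigned int index_bits)
{
	assert(index_bits > 0 && index_bits <= 32);
	return 1ULL << (64 - index_bits);
}
```

With index_bits set to IOU_BUF_REGBUF_BITS (32), regbuf_max_size() is
2^32 bytes, i.e. the 4GB ceiling raised above. Taking bits from the index
field, for example an assumed 24-bit index, would leave 40 offset bits
(1TB per registered buffer) at the cost of allowing fewer registered
buffer indices.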