From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, hexue <xue01.he@samsung.com>
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8] io_uring: releasing CPU resources when polling
Date: Thu, 24 Oct 2024 15:49:01 +0100 [thread overview]
Message-ID: <8e0f74c1-aa11-4036-ba20-6f4dc0c40333@gmail.com> (raw)
In-Reply-To: <b50ce7d2-b2a8-4552-8246-0464602bfd84@kernel.dk>
On 10/24/24 15:40, Jens Axboe wrote:
> On 10/24/24 8:26 AM, Pavel Begunkov wrote:
>> On 10/24/24 15:18, Jens Axboe wrote:
>>> On 10/23/24 8:38 PM, hexue wrote:
>>>> On 9/25/2024 12:12, Pavel Begunkov wrote:
>> ...
>>>> When the number of threads exceeds the number of CPU cores,the
>>>> database throughput does not increase significantly. However,
>>>> hybrid polling can releasing some CPU resources during the polling
>>>> process, so that part of the CPU time can be used for frequent
>>>> data processing and other operations, which speeds up the reading
>>>> process, thereby improving throughput and optimizaing database
>>>> performance.I tried different compression strategies and got
>>>> results similar to the above table.(~30% throughput improvement)
>>>>
>>>> As more database applications adapt to the io_uring engine, I think
>>>> the application of hybrid poll may have potential in some scenarios.
>>>
>>> Thanks for posting some numbers on that part, that's useful. I do
>>> think the feature is useful as well, but I still have some issues
>>> with the implementation. Below is an incremental patch on top of
>>> yours to resolve some of those, potentially. Issues:
>>>
>>> 1) The patch still reads a bit like a hack, in that it doesn't seem to
>>> care about following the current style. This reads a bit lazy/sloppy
>>> or unfinished. I've fixed that up.
>>>
>>> 2) Appropriate member and function naming.
>>>
>>> 3) Same as above, it doesn't care about proper placement of additions to
>>> structs. Again this is a bit lazy and wasteful, attention should be
>>> paid to where additions are placed to not needlessly bloat
>>> structures, or place members in cache unfortunate locations. For
>>> example, available_time is just placed at the end of io_ring_ctx,
>>> why? It's a submission side member, and there's room with other
>>> related members. Not only is the placement now where you'd want it to
>>> be, memory wise, it also doesn't add 8 bytes to io_uring_ctx.
>>>
>>> 4) Like the above, the io_kiocb bloat is, by far, the worst. Seems to me
>>> that we can share space with the polling hash_node. This obviously
>>> needs careful vetting, haven't done that yet. IOPOLL setups should
>>> not be using poll at all. This needs extra checking. The poll_state
>>> can go with cancel_seq_set, as there's a hole there any. And just
>>> like that, rather than add 24b to io_kiocb, it doesn't take any extra
>>> space at all.
>>>
>>> 5) HY_POLL is a terrible name. It's associated with IOPOLL, and so let's
>>> please use a name related to that. And require IOPOLL being set with
>>> HYBRID_IOPOLL, as it's really a variant of that. Makes it clear that
>>> HYBRID_IOPOLL is really just a mode of operation for IOPOLL, and it
>>> can't exist without that.
>>>
>>> Please take a look at this incremental and test it, and then post a v9
>>> that looks a lot more finished. Caveat - I haven't tested this one at
>>> all. Thanks!
>>>
>>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>>> index c79ee9fe86d4..6cf6a45835e5 100644
>>> --- a/include/linux/io_uring_types.h
>>> +++ b/include/linux/io_uring_types.h
>>> @@ -238,6 +238,8 @@ struct io_ring_ctx {
>>> struct io_rings *rings;
>>> struct percpu_ref refs;
>>> + u64 poll_available_time;
>>> +
>>> clockid_t clockid;
>>> enum tk_offsets clock_offset;
>>> @@ -433,9 +435,6 @@ struct io_ring_ctx {
>>> struct page **sqe_pages;
>>> struct page **cq_wait_page;
>>> -
>>> - /* for io_uring hybrid poll*/
>>> - u64 available_time;
>>> };
>>> struct io_tw_state {
>>> @@ -647,9 +646,22 @@ struct io_kiocb {
>>> atomic_t refs;
>>> bool cancel_seq_set;
>>> + bool poll_state;
>>
>> As mentioned briefly before, that can be just a req->flags flag
>
> That'd be even better, I generally despise random bool addition.
>
>>> struct io_task_work io_task_work;
>>> - /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
>>> - struct hlist_node hash_node;
>>> + union {
>>> + /*
>>> + * for polled requests, i.e. IORING_OP_POLL_ADD and async armed
>>> + * poll
>>> + */
>>> + struct hlist_node hash_node;
>>> + /*
>>> + * For IOPOLL setup queues, with hybrid polling
>>> + */
>>> + struct {
>>> + u64 iopoll_start;
>>> + u64 iopoll_end;
>>
>> And IIRC it doesn't need to store the end as it's used immediately
>> after it's set in the same function.
>
> Nice, that opens up the door for less esoteric sharing as well. And
> yeah, I'd just use:
>
> runtime = ktime_get_ns() - req->iopoll_start - sleep_time;
>
> in io_uring_hybrid_poll() and kill it entirely, doesn't even need a
> local variable there. And then shove iopoll_start into the union with
> comp_list/apoll_events.
That's with what the current request is hooked into the list,
IOW such aliasing will corrupt the request
--
Pavel Begunkov
next prev parent reply other threads:[~2024-10-24 14:48 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20240918021016epcas5p14d6e771ee39bee5dddf253c39b84110c@epcas5p1.samsung.com>
2024-09-18 2:10 ` [PATCH v8 RESEND] io_uring: releasing CPU resources when polling hexue
2024-09-25 8:29 ` [PATCH V8] " hexue
2024-09-25 12:12 ` Pavel Begunkov
2024-10-24 2:38 ` Re: [PATCH v8] " hexue
2024-10-24 14:18 ` Jens Axboe
2024-10-24 14:26 ` Pavel Begunkov
2024-10-24 14:40 ` Jens Axboe
2024-10-24 14:49 ` Pavel Begunkov [this message]
2024-10-24 14:49 ` Jens Axboe
2024-10-25 6:10 ` hexue
[not found] <CGME20240819025446epcas5p43dde8e55435917ecd0175400b0b7cc62@epcas5p4.samsung.com>
2024-08-19 2:54 ` [PATCH V8] " hexue
[not found] <CGME20240812015950epcas5p26c0eb254964abfe8a7ff05de0ead05f8@epcas5p2.samsung.com>
2024-08-12 1:59 ` [PATCH v8] " hexue
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8e0f74c1-aa11-4036-ba20-6f4dc0c40333@gmail.com \
--to=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=xue01.he@samsung.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox