From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCH] io_uring/sqpoll: Increase task_work submission batch size
Date: Mon, 07 Apr 2025 11:47:17 -0400
Message-ID: <87o6x8as8q.fsf@mailhost.krisman.be>
In-Reply-To: <87friod8rs.fsf@mailhost.krisman.be> (Gabriel Krisman Bertazi's message of "Thu, 03 Apr 2025 21:18:15 -0400")
Gabriel Krisman Bertazi <krisman@suse.de> writes:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> On 4/3/25 1:56 PM, Gabriel Krisman Bertazi wrote:
>>> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
>>> #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
>>> -#define IORING_TW_CAP_ENTRIES_VALUE 8
>>> +#define IORING_TW_CAP_ENTRIES_VALUE 1024
>>
>> That's a huge bump! This should not be a submission side thing, it's
>> purely running the task work. For this test case, I'm assuming you don't
>> see any io-wq activity, and hence everything is done purely inline from
>> the SQPOLL thread?
>> This confuses me a bit, as this should not be driving
>> the queue depth at all, as submissions would be done by
>> __io_sq_thread().
>
> Indeed, the submission happens fully inside __io_sq_thread, and I can
> confirm that from the profile. What is interesting is that, once I lift
> the cap, we end up spending more time inside io_submit_sqes, which means
> it is able to drive more requests.
I think I have more input on what's happening.
Regarding the tw batch not driving the submission: this is a typical
submission with IORING_TW_CAP_ENTRIES_VALUE = 8:
254,0 1 49927 0.016024812 5977 Q R 2061024 + 8 [iou-sqp-5976]
254,0 1 49928 0.016025044 5977 G R 2061024 + 8 [iou-sqp-5976]
254,0 1 49929 0.016025116 5977 P N [iou-sqp-5976]
254,0 1 49930 0.016025594 5977 Q R 1132240 + 8 [iou-sqp-5976]
254,0 1 49931 0.016025713 5977 G R 1132240 + 8 [iou-sqp-5976]
254,0 1 49932 0.016026205 5977 Q R 1187696 + 8 [iou-sqp-5976]
254,0 1 49933 0.016026317 5977 G R 1187696 + 8 [iou-sqp-5976]
254,0 1 49934 0.016026811 5977 Q R 1716272 + 8 [iou-sqp-5976]
254,0 1 49935 0.016026927 5977 G R 1716272 + 8 [iou-sqp-5976]
254,0 1 49936 0.016027447 5977 Q R 276336 + 8 [iou-sqp-5976]
254,0 1 49937 0.016027565 5977 G R 276336 + 8 [iou-sqp-5976]
254,0 1 49938 0.016028005 5977 Q R 1672040 + 8 [iou-sqp-5976]
254,0 1 49939 0.016028116 5977 G R 1672040 + 8 [iou-sqp-5976]
254,0 1 49940 0.016028551 5977 Q R 1770880 + 8 [iou-sqp-5976]
254,0 1 49941 0.016028685 5977 G R 1770880 + 8 [iou-sqp-5976]
254,0 1 49942 0.016028795 5977 U N [iou-sqp-5976] 7
We plug 7 requests and flush them all together. With
IORING_TW_CAP_ENTRIES_VALUE=1024, submissions generally look like this:
254,0 1 4931 0.001414021 3145 P N [iou-sqp-3144]
254,0 1 4932 0.001414415 3145 Q R 1268736 + 8 [iou-sqp-3144]
254,0 1 4933 0.001414584 3145 G R 1268736 + 8 [iou-sqp-3144]
254,0 1 4934 0.001414990 3145 Q R 1210304 + 8 [iou-sqp-3144]
254,0 1 4935 0.001415145 3145 G R 1210304 + 8 [iou-sqp-3144]
254,0 1 4936 0.001415553 3145 Q R 1476352 + 8 [iou-sqp-3144]
254,0 1 4937 0.001415722 3145 G R 1476352 + 8 [iou-sqp-3144]
254,0 1 4938 0.001416130 3145 Q R 1291752 + 8 [iou-sqp-3144]
254,0 1 4939 0.001416302 3145 G R 1291752 + 8 [iou-sqp-3144]
254,0 1 4940 0.001416763 3145 Q R 1171664 + 8 [iou-sqp-3144]
254,0 1 4941 0.001416928 3145 G R 1171664 + 8 [iou-sqp-3144]
254,0 1 4942 0.001417444 3145 Q R 197424 + 8 [iou-sqp-3144]
254,0 1 4943 0.001417602 3145 G R 197424 + 8 [iou-sqp-3144]
[...]
[...]
254,0 1 4993 0.001432191 3145 G R 371656 + 8 [iou-sqp-3144]
254,0 1 4994 0.001432601 3145 Q R 1864408 + 8 [iou-sqp-3144]
254,0 1 4995 0.001432771 3145 G R 1864408 + 8 [iou-sqp-3144]
254,0 1 4996 0.001432872 3145 U N [iou-sqp-3144] 32
So I'm able to drive way more I/O per plug with my patch.
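
To compare plug sizes across a whole trace rather than one hand-picked
batch, the unplug depth can be pulled out of each "U" line. The following
is a minimal sketch, assuming the trace is in the same whitespace-separated
blkparse layout as the excerpts above (action in the sixth column, unplug
depth as the last field). The function name unplug_depths is made up for
illustration.

```python
from typing import Iterable, List


def unplug_depths(lines: Iterable[str]) -> List[int]:
    """Return the request count reported by each unplug ("U") event.

    Assumes the column layout seen in the excerpts above:
    dev cpu seq timestamp pid action rwbs [process] depth
    """
    depths = []
    for line in lines:
        fields = line.split()
        # Skip anything too short to carry an action column.
        if len(fields) < 7:
            continue
        # Column 6 (index 5) is the action; "U" marks an unplug.
        if fields[5] != "U":
            continue
        # The last field of an unplug line is the number of plugged requests.
        if fields[-1].isdigit():
            depths.append(int(fields[-1]))
    return depths


if __name__ == "__main__":
    import sys

    # Usage: python unplug_depths.py < trace.txt
    found = unplug_depths(sys.stdin)
    if found:
        print(f"unplugs={len(found)} mean depth={sum(found) / len(found):.1f}")
```

Feeding it the two unplug lines quoted above yields depths 7 and 32,
matching what is visible in the excerpts.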
If I plot the histogram of the to_submit argument of io_submit_sqes,
which is exactly io_sqring_entries(ctx) since I have only one ctx, I
see that I get far fewer I/Os to submit in the ring in the first place.
So, because sqpoll is spinning more (and going to sleep more often), it
completes fewer I/Os, causing us to submit less from fio, as suggested by
the smaller io_sqring_entries? Does that make any sense?
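
For reference, a histogram like this can be built with power-of-two
buckets once the to_submit values have been collected. The sketch below
only shows the bucketing step; how the samples are gathered is not covered
here, and the function name log2_histogram is an assumption for
illustration, not something from the kernel or fio.

```python
from collections import Counter
from typing import Dict, Iterable


def log2_histogram(values: Iterable[int]) -> Dict[int, int]:
    """Bucket non-negative integers by the largest power of two <= value.

    Zero gets its own bucket (key 0); 1 maps to 1, 2-3 to 2, 4-7 to 4,
    and so on.
    """
    hist = Counter()
    for v in values:
        if v < 0:
            raise ValueError("to_submit samples must be non-negative")
        # bit_length() - 1 is floor(log2(v)) for v >= 1.
        bucket = 0 if v == 0 else 1 << (v.bit_length() - 1)
        hist[bucket] += 1
    # Return buckets in ascending order for easy printing.
    return dict(sorted(hist.items()))
```

Comparing the output for the cap=8 and cap=1024 runs side by side would
show whether the distribution of to_submit shifts toward the smaller
buckets.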
To retest, I fully dropped the accounting code and I can reproduce the
same submission pattern. It really seems to depend on whether we go to
sleep after completing a small tw batch.
This is what I got from existing logs. I'm a bit limited with testing at the
moment, as I lost the machine where I could reproduce it (my other machine
yields the same I/O pattern, but no numerical regression). But I thought
it might be worth sharing in case I'm being silly and you can call me
out immediately. I'll reproduce it in the next few days, once I get more
time on the shared machine.
--
Gabriel Krisman Bertazi