From: Christian Loehle <christian.loehle@arm.com>
To: Jens Axboe <axboe@kernel.dk>, linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, tglx@linutronix.de,
Vincent Guittot <vincent.guittot@linaro.org>,
Qais Yousef <qyousef@layalina.io>,
"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [PATCHSET v6 0/4] Split iowait into two states
Date: Wed, 21 Aug 2024 15:54:31 +0100 [thread overview]
Message-ID: <c8cd6339-c168-4409-8cc4-e85e7ad92914@arm.com> (raw)
In-Reply-To: <20240819154259.215504-1-axboe@kernel.dk>
On 8/19/24 16:39, Jens Axboe wrote:
> Hi,
>
> This is v6 of the patchset where the current in_iowait state is split
> into two parts:
>
> 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
> in terms of sleep and wakeup latencies.
> 2) The above, and also accounted as such in the iowait stats.
>
> The current ->in_iowait covers both, this series splits it into two types
> of state so that each can be controlled seperately.
Hi Jens,
I wanted to give a brief update on where I think we're at in terms
of iowait behavior regarding cpuidle and cpufreq.
I'm still working on getting both removed, given the discussions had
on the list [0] and at OSPM [1] this seems realistic and the best way
forward IMO.
That would then naturally make this series and the iowait workaround in
io_uring/io_uring.c unnecessary.
1. For cpuidle:
Main issue with relying on nr_iowaiters is that there is no guarantee
whatsoever that these tasks will wakeup where they went to sleep so if
we can achieve the same throughput without nr_iowaiters it shouldn't
be relevant.
I spent quite some time in fixing teo [2], because untangling nr_iowaiters
from menu seems hard, essentially nobody has worked on menu seriously for
a while now. Thus the plan here is to replace menu by teo eventually.
For your io_uring workloads I see throughput on par for teo (doesn't rely
on iowait) and menu.
# echo teo > /sys/devices/system/cpu/cpuidle/current_governor
# ./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1
submitter=0, tid=206, file=/dev/nvme0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=1
Engine=preadv2
IOPS=22500, BW=87MiB/s, IOS/call=0/0
IOPS=21916, BW=85MiB/s, IOS/call=1/0
IOPS=21774, BW=85MiB/s, IOS/call=1/0
IOPS=22467, BW=87MiB/s, IOS/call=1/0
Exiting on timeout
Maximum IOPS=22500
# echo menu > /sys/devices/system/cpu/cpuidle/current_governor
[ 178.754571] cpuidle: using governor menu
# ./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1
submitter=0, tid=209, file=/dev/nvme0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=1
Engine=preadv2
IOPS=21452, BW=83MiB/s, IOS/call=0/0
IOPS=21778, BW=85MiB/s, IOS/call=1/0
IOPS=21120, BW=82MiB/s, IOS/call=1/0
IOPS=20903, BW=81MiB/s, IOS/call=1/0
Exiting on timeout
Maximum IOPS=21778
Please do give it a try for yourself as well!
2. For cpufreq:
Main issue for IO-bound workloads with iowait boosting is we're punishing
the 'good' workloads (that don't have iowait sleeps in their throughput-critical
part, which is already bad because of the scheduling overhead induced) by
making them energy-inefficient to make synthetic benchmarks happy.
A study of more realistic workloads show that they don't suffer from a problem
of building up utilization, not util_est anyway, so they don't actually benefit
from a cpufreq boost.
This leads me to the conclusion that cpufreq iowait boosting can be scrapped
altogether if we accept some degradation of benchmarks like
./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1
or
fio --name=fio --rw=randread --bs=4k --runtime=5 --time_based --filename=/dev/nvme0n1 --iodepth=1 --numjobs=1
(non-io_uring) for that matter.
For io_uring where the expected case is probably not single-threaded sync IO
(or iodepth=1) the cpufreq iowait boost is just hurting use-cases by pushing
it to less efficient frequencies that might not be needed.
I know you want your problem (io_uring showing up as 100% busy even though it's
just sleeping) to be solved like yesterday and my opinion on a future timeline
might not be enough to convince you of much. I wanted to share it anyway.
I don't see an issue with the actual code you're proposing, but it does feel
like a step in the wrong direction to me.
[0] https://lore.kernel.org/lkml/20240304201625.100619-1-christian.loehle@arm.com/
v2: https://lore.kernel.org/lkml/20240518113947.2127802-1-christian.loehle@arm.com/
[1] https://www.youtube.com/watch?v=MSQGEsSziZ4
[2] https://lore.kernel.org/lkml/20240628095955.34096-1-christian.loehle@arm.com/
Regards,
Christian
> [snip]
next prev parent reply other threads:[~2024-08-21 14:54 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-19 15:39 [PATCHSET v6 0/4] Split iowait into two states Jens Axboe
2024-08-19 15:39 ` [PATCH 1/4] sched/core: add helpers for iowait handling Jens Axboe
2024-08-19 15:39 ` [PATCH 2/4] sched/core: change rq->nr_iowait type to atomic_long_t Jens Axboe
2024-08-20 2:14 ` Zhang Qiao
2024-08-19 15:39 ` [PATCH 3/4] sched/core: have io_schedule_prepare() return a long Jens Axboe
2024-09-05 9:57 ` Peter Zijlstra
2024-08-19 15:39 ` [PATCH 4/4] sched/core: split iowait state into two states Jens Axboe
2024-09-05 10:55 ` Peter Zijlstra
2024-08-21 14:54 ` Christian Loehle [this message]
2024-08-21 15:04 ` [PATCHSET v6 0/4] Split iowait " Jens Axboe
2024-08-21 15:57 ` Christian Loehle
2024-08-24 15:34 ` Jens Axboe
2024-09-04 14:28 ` Peter Zijlstra
2024-09-04 14:41 ` Jens Axboe
2024-09-04 14:49 ` Jens Axboe
2024-09-05 9:51 ` Peter Zijlstra
2024-09-04 14:42 ` Rafael J. Wysocki
2024-09-04 15:18 ` Rafael J. Wysocki
2024-09-05 9:29 ` Christian Loehle
2024-09-05 10:40 ` Rafael J. Wysocki
2024-09-05 9:36 ` Peter Zijlstra
2024-09-05 10:31 ` Christian Loehle
2024-09-05 11:00 ` Peter Zijlstra
2024-09-05 11:09 ` Christian Loehle
2025-03-31 9:02 ` Pavel Begunkov
2025-03-31 10:33 ` Christian Loehle
2025-04-01 8:21 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c8cd6339-c168-4409-8cc4-e85e7ad92914@arm.com \
--to=christian.loehle@arm.com \
--cc=axboe@kernel.dk \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=qyousef@layalina.io \
--cc=rafael@kernel.org \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox