From: Jens Axboe <axboe@kernel.dk>
To: Christian Loehle <christian.loehle@arm.com>,
linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, tglx@linutronix.de,
Vincent Guittot <vincent.guittot@linaro.org>,
Qais Yousef <qyousef@layalina.io>,
"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [PATCHSET v6 0/4] Split iowait into two states
Date: Wed, 21 Aug 2024 09:04:12 -0600 [thread overview]
Message-ID: <9a0f2192-b897-4952-b4ea-7fe229f33001@kernel.dk> (raw)
In-Reply-To: <c8cd6339-c168-4409-8cc4-e85e7ad92914@arm.com>
On 8/21/24 8:54 AM, Christian Loehle wrote:
> On 8/19/24 16:39, Jens Axboe wrote:
>> Hi,
>>
>> This is v6 of the patchset where the current in_iowait state is split
>> into two parts:
>>
>> 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
>> in terms of sleep and wakeup latencies.
>> 2) The above, and also accounted as such in the iowait stats.
>>
>> The current ->in_iowait covers both, this series splits it into two types
>> of state so that each can be controlled separately.
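[Editor's note: a toy Python model of the proposed split, to make the two
states concrete. The counter names `nr_iowait` / `nr_iowait_acct` follow the
naming in the patches; the classes and everything else are illustrative, not
the actual kernel C.]

```python
class RunQueue:
    """Toy model of the split: every accounted iowait sleeper also gets the
    cpufreq/cpuidle hint, but not every hinted sleeper is accounted."""
    def __init__(self):
        self.nr_iowait = 0       # tasks wanting cpufreq/cpuidle treatment
        self.nr_iowait_acct = 0  # subset also counted in the iowait stats

    def sleep(self, task):
        if task.in_iowait:
            self.nr_iowait += 1
            if task.in_iowait_acct:
                self.nr_iowait_acct += 1

    def wake(self, task):
        if task.in_iowait:
            self.nr_iowait -= 1
            if task.in_iowait_acct:
                self.nr_iowait_acct -= 1

class Task:
    def __init__(self, in_iowait=False, in_iowait_acct=False):
        # accounted iowait implies the plain hint, mirroring the series
        assert not (in_iowait_acct and not in_iowait)
        self.in_iowait = in_iowait
        self.in_iowait_acct = in_iowait_acct

rq = RunQueue()
io_uring_waiter = Task(in_iowait=True, in_iowait_acct=False)  # state 1 above
sync_reader = Task(in_iowait=True, in_iowait_acct=True)       # state 2 above
rq.sleep(io_uring_waiter)
rq.sleep(sync_reader)
# Stats consumers (/proc/stat iowait, vmstat) would see only the accounted
# sleeper, while cpufreq/cpuidle hints still see both.
print(rq.nr_iowait, rq.nr_iowait_acct)  # -> 2 1
```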
>
> Hi Jens,
> I wanted to give a brief update on where I think we're at in terms
> of iowait behavior regarding cpuidle and cpufreq.
> I'm still working on getting both removed; given the discussions on
> the list [0] and at OSPM [1], this seems realistic and the best way
> forward IMO.
> That would then naturally make this series and the iowait workaround in
> io_uring/io_uring.c unnecessary.
>
> 1. For cpuidle:
> The main issue with relying on nr_iowaiters is that there is no
> guarantee whatsoever that these tasks will wake up where they went to
> sleep, so if we can achieve the same throughput without nr_iowaiters,
> it shouldn't be relevant.
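[Editor's note: the "wakeups don't land where the task slept" point can be
shown with a toy two-CPU simulation, purely illustrative and not kernel code.
The per-CPU iowaiter count an idle governor consults on the sleep CPU says
nothing about which CPU the completion actually runs the task on.]

```python
import random

random.seed(0)
NR_CPUS = 2
nr_iowaiters = [0] * NR_CPUS  # per-CPU count, as an idle governor sees it

def io_sleep(cpu):
    nr_iowaiters[cpu] += 1

def io_wake(sleep_cpu):
    # The completion can pull the task anywhere (IRQ affinity, wake_affine,
    # load balancing); model that as a random target CPU.
    wake_cpu = random.randrange(NR_CPUS)
    nr_iowaiters[sleep_cpu] -= 1
    return wake_cpu

migrations = 0
for _ in range(1000):
    io_sleep(0)              # task always goes to sleep on CPU 0...
    if io_wake(0) != 0:      # ...but often resumes on the other CPU
        migrations += 1
print(migrations)  # a large fraction of wakeups land elsewhere
```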
> I spent quite some time fixing teo [2], because untangling nr_iowaiters
> from menu seems hard; essentially nobody has worked on menu seriously
> in a while now. Thus the plan here is to replace menu with teo eventually.
> For your io_uring workloads I see throughput on par between teo (which
> doesn't rely on iowait) and menu.
>
> # echo teo > /sys/devices/system/cpu/cpuidle/current_governor
> # ./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1
> submitter=0, tid=206, file=/dev/nvme0n1, node=-1
> polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=1
> Engine=preadv2
> IOPS=22500, BW=87MiB/s, IOS/call=0/0
> IOPS=21916, BW=85MiB/s, IOS/call=1/0
> IOPS=21774, BW=85MiB/s, IOS/call=1/0
> IOPS=22467, BW=87MiB/s, IOS/call=1/0
> Exiting on timeout
> Maximum IOPS=22500
> # echo menu > /sys/devices/system/cpu/cpuidle/current_governor
> [ 178.754571] cpuidle: using governor menu
> # ./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1
> submitter=0, tid=209, file=/dev/nvme0n1, node=-1
> polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=1
> Engine=preadv2
> IOPS=21452, BW=83MiB/s, IOS/call=0/0
> IOPS=21778, BW=85MiB/s, IOS/call=1/0
> IOPS=21120, BW=82MiB/s, IOS/call=1/0
> IOPS=20903, BW=81MiB/s, IOS/call=1/0
> Exiting on timeout
> Maximum IOPS=21778
>
> Please do give it a try for yourself as well!
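[Editor's note: for reference, the gap between the two single runs quoted
above works out as follows; a quick back-of-the-envelope script over the
IOPS samples, with the usual caveat that one run per governor is well within
benchmark noise.]

```python
# IOPS samples from the two runs quoted above
teo  = [22500, 21916, 21774, 22467]
menu = [21452, 21778, 21120, 20903]

mean_teo = sum(teo) / len(teo)     # 22164.25
mean_menu = sum(menu) / len(menu)  # 21313.25
delta_pct = 100 * (mean_teo - mean_menu) / mean_menu
print(f"teo ahead of menu by {delta_pct:.1f}% on mean IOPS")  # ~4.0%
```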
>
> 2. For cpufreq:
> The main issue with iowait boosting for IO-bound workloads is that we're
> punishing the 'good' workloads (those without iowait sleeps in their
> throughput-critical part, sleeps which are already bad because of the
> scheduling overhead they induce) by making them energy-inefficient just
> to make synthetic benchmarks happy.
> A study of more realistic workloads shows that they don't suffer from a
> problem of building up utilization (not util_est, anyway), so they don't
> actually benefit from a cpufreq boost.
> This leads me to the conclusion that cpufreq iowait boosting can be scrapped
> altogether if we accept some degradation of benchmarks like
> ./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1
> or
> fio --name=fio --rw=randread --bs=4k --runtime=5 --time_based --filename=/dev/nvme0n1 --iodepth=1 --numjobs=1
> (non-io_uring) for that matter.
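[Editor's note: the boosting being argued about works roughly as follows.
schedutil's iowait boost starts at a small fraction of max capacity, doubles
on consecutive iowait wakeups up to the maximum, and halves again once the
iowait wakeups stop. A simplified standalone model; the constants mirror the
kernel's SCHED_CAPACITY_SCALE of 1024, but the update/decay timing details
are simplified relative to kernel/sched/cpufreq_schedutil.c.]

```python
SCHED_CAPACITY_SCALE = 1024
IOWAIT_BOOST_MIN = SCHED_CAPACITY_SCALE // 8  # 128

class IowaitBoost:
    """Simplified model of cpufreq iowait boosting: double on each iowait
    wakeup, halve on updates that arrive without one."""
    def __init__(self):
        self.boost = 0

    def update(self, iowait_wakeup):
        if iowait_wakeup:
            # ramp up: 128 -> 256 -> 512 -> 1024 on consecutive wakeups
            self.boost = min(max(self.boost * 2, IOWAIT_BOOST_MIN),
                             SCHED_CAPACITY_SCALE)
        else:
            # decay: halve, dropping the boost entirely below the minimum
            self.boost //= 2
            if self.boost < IOWAIT_BOOST_MIN:
                self.boost = 0
        return self.boost

b = IowaitBoost()
ramp = [b.update(True) for _ in range(4)]
print(ramp)   # [128, 256, 512, 1024]
decay = [b.update(False) for _ in range(4)]
print(decay)  # [512, 256, 128, 0]
```

This ramp is why a QD=1 storage benchmark gains from the boost (each sleep
would otherwise drag utilization, and thus frequency, down) and why a
workload that never sleeps in iowait pays for it in efficiency.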
The original iowait addition came about because a big regression was seen
compared to not setting iowait, around 20% iirc. That's big, and not in
the realm of "some degradation" that would be acceptable. And it will
largely depend on the system being used: on some systems it'll be less,
and on some it'll be more.
> For io_uring, where the expected case is probably not single-threaded
> sync IO (or iodepth=1), the cpufreq iowait boost just hurts use-cases
> by pushing them to less efficient frequencies that might not be needed.
People do all sorts of things, and sync (or low queue depth) IO is
certainly one of the use cases. In fact, that's where the above report
came from, on the postgres aio side.
> I know you want your problem (io_uring showing up as 100% busy even
> though it's just sleeping) to be solved like yesterday and my opinion
> on a future timeline might not be enough to convince you of much. I
> wanted to share it anyway. I don't see an issue with the actual code
> you're proposing, but it does feel like a step in the wrong direction
> to me.
As mentioned in my original reply, I view this as entirely orthogonal,
and while I appreciate your efforts in this area, I'm a little tired of
it being brought up as a gatekeeping metric when the replacement isn't
there yet. If we can eliminate iowait for boosting down the line, then
I'm all for it. But this has now been pending for > 6 months, and I
don't think it's fair to keep stringing it along on a future promise.
This isn't a lot of code, and it solves the issue for now; if it gets
removed down the line as not needed, that's certainly fine. For now, we
need it.
--
Jens Axboe
Thread overview: 27+ messages
2024-08-19 15:39 [PATCHSET v6 0/4] Split iowait into two states Jens Axboe
2024-08-19 15:39 ` [PATCH 1/4] sched/core: add helpers for iowait handling Jens Axboe
2024-08-19 15:39 ` [PATCH 2/4] sched/core: change rq->nr_iowait type to atomic_long_t Jens Axboe
2024-08-20 2:14 ` Zhang Qiao
2024-08-19 15:39 ` [PATCH 3/4] sched/core: have io_schedule_prepare() return a long Jens Axboe
2024-09-05 9:57 ` Peter Zijlstra
2024-08-19 15:39 ` [PATCH 4/4] sched/core: split iowait state into two states Jens Axboe
2024-09-05 10:55 ` Peter Zijlstra
2024-08-21 14:54 ` [PATCHSET v6 0/4] Split iowait " Christian Loehle
2024-08-21 15:04 ` Jens Axboe [this message]
2024-08-21 15:57 ` Christian Loehle
2024-08-24 15:34 ` Jens Axboe
2024-09-04 14:28 ` Peter Zijlstra
2024-09-04 14:41 ` Jens Axboe
2024-09-04 14:49 ` Jens Axboe
2024-09-05 9:51 ` Peter Zijlstra
2024-09-04 14:42 ` Rafael J. Wysocki
2024-09-04 15:18 ` Rafael J. Wysocki
2024-09-05 9:29 ` Christian Loehle
2024-09-05 10:40 ` Rafael J. Wysocki
2024-09-05 9:36 ` Peter Zijlstra
2024-09-05 10:31 ` Christian Loehle
2024-09-05 11:00 ` Peter Zijlstra
2024-09-05 11:09 ` Christian Loehle
2025-03-31 9:02 ` Pavel Begunkov
2025-03-31 10:33 ` Christian Loehle
2025-04-01 8:21 ` Pavel Begunkov