public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Christian Loehle <christian.loehle@arm.com>
To: Jens Axboe <axboe@kernel.dk>, linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, tglx@linutronix.de,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Qais Yousef <qyousef@layalina.io>,
	"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [PATCHSET v6 0/4] Split iowait into two states
Date: Wed, 21 Aug 2024 15:54:31 +0100	[thread overview]
Message-ID: <c8cd6339-c168-4409-8cc4-e85e7ad92914@arm.com> (raw)
In-Reply-To: <20240819154259.215504-1-axboe@kernel.dk>

On 8/19/24 16:39, Jens Axboe wrote:
> Hi,
> 
> This is v6 of the patchset where the current in_iowait state is split
> into two parts:
> 
> 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
>    in terms of sleep and wakeup latencies.
> 2) The above, and also accounted as such in the iowait stats.
> 
> The current ->in_iowait covers both, this series splits it into two types
> of state so that each can be controlled seperately.

Hi Jens,
I wanted to give a brief update on where I think we're at in terms
of iowait behavior regarding cpuidle and cpufreq.
I'm still working on getting both removed, given the discussions had
on the list [0] and at OSPM [1] this seems realistic and the best way
forward IMO.
That would then naturally make this series and the iowait workaround in
io_uring/io_uring.c unnecessary.

1. For cpuidle:
Main issue with relying on nr_iowaiters is that there is no guarantee
whatsoever that these tasks will wakeup where they went to sleep so if
we can achieve the same throughput without nr_iowaiters it shouldn't
be relevant.
I spent quite some time in fixing teo [2], because untangling nr_iowaiters
from menu seems hard, essentially nobody has worked on menu seriously for
a while now. Thus the plan here is to replace menu by teo eventually.
For your io_uring workloads I see throughput on par for teo (doesn't rely
on iowait) and menu.

# echo teo > /sys/devices/system/cpu/cpuidle/current_governor
#  ./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1 
submitter=0, tid=206, file=/dev/nvme0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=1
Engine=preadv2
IOPS=22500, BW=87MiB/s, IOS/call=0/0
IOPS=21916, BW=85MiB/s, IOS/call=1/0
IOPS=21774, BW=85MiB/s, IOS/call=1/0
IOPS=22467, BW=87MiB/s, IOS/call=1/0
Exiting on timeout
Maximum IOPS=22500
# echo menu > /sys/devices/system/cpu/cpuidle/current_governor
[  178.754571] cpuidle: using governor menu
#  ./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1 
submitter=0, tid=209, file=/dev/nvme0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=1
Engine=preadv2
IOPS=21452, BW=83MiB/s, IOS/call=0/0
IOPS=21778, BW=85MiB/s, IOS/call=1/0
IOPS=21120, BW=82MiB/s, IOS/call=1/0
IOPS=20903, BW=81MiB/s, IOS/call=1/0
Exiting on timeout
Maximum IOPS=21778

Please do give it a try for yourself as well!

2. For cpufreq:
Main issue for IO-bound workloads with iowait boosting is we're punishing
the 'good' workloads (that don't have iowait sleeps in their throughput-critical
part, which is already bad because of the scheduling overhead induced) by
making them energy-inefficient to make synthetic benchmarks happy.
A study of more realistic workloads show that they don't suffer from a problem
of building up utilization, not util_est anyway, so they don't actually benefit
from a cpufreq boost.
This leads me to the conclusion that cpufreq iowait boosting can be scrapped
altogether if we accept some degradation of benchmarks like
./io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S 1 -R 0 /dev/nvme0n1
or
fio --name=fio --rw=randread --bs=4k --runtime=5 --time_based --filename=/dev/nvme0n1 --iodepth=1 --numjobs=1
(non-io_uring) for that matter.

For io_uring where the expected case is probably not single-threaded sync IO
(or iodepth=1) the cpufreq iowait boost is just hurting use-cases by pushing
it to less efficient frequencies that might not be needed.


I know you want your problem (io_uring showing up as 100% busy even though it's
just sleeping) to be solved like yesterday and my opinion on a future timeline
might not be enough to convince you of much. I wanted to share it anyway.
I don't see an issue with the actual code you're proposing, but it does feel
like a step in the wrong direction to me.

[0] https://lore.kernel.org/lkml/20240304201625.100619-1-christian.loehle@arm.com/
v2: https://lore.kernel.org/lkml/20240518113947.2127802-1-christian.loehle@arm.com/
[1] https://www.youtube.com/watch?v=MSQGEsSziZ4
[2] https://lore.kernel.org/lkml/20240628095955.34096-1-christian.loehle@arm.com/

Regards,
Christian

> [snip]

  parent reply	other threads:[~2024-08-21 14:54 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-19 15:39 [PATCHSET v6 0/4] Split iowait into two states Jens Axboe
2024-08-19 15:39 ` [PATCH 1/4] sched/core: add helpers for iowait handling Jens Axboe
2024-08-19 15:39 ` [PATCH 2/4] sched/core: change rq->nr_iowait type to atomic_long_t Jens Axboe
2024-08-20  2:14   ` Zhang Qiao
2024-08-19 15:39 ` [PATCH 3/4] sched/core: have io_schedule_prepare() return a long Jens Axboe
2024-09-05  9:57   ` Peter Zijlstra
2024-08-19 15:39 ` [PATCH 4/4] sched/core: split iowait state into two states Jens Axboe
2024-09-05 10:55   ` Peter Zijlstra
2024-08-21 14:54 ` Christian Loehle [this message]
2024-08-21 15:04   ` [PATCHSET v6 0/4] Split iowait " Jens Axboe
2024-08-21 15:57     ` Christian Loehle
2024-08-24 15:34       ` Jens Axboe
2024-09-04 14:28 ` Peter Zijlstra
2024-09-04 14:41   ` Jens Axboe
2024-09-04 14:49     ` Jens Axboe
2024-09-05  9:51     ` Peter Zijlstra
2024-09-04 14:42   ` Rafael J. Wysocki
2024-09-04 15:18     ` Rafael J. Wysocki
2024-09-05  9:29       ` Christian Loehle
2024-09-05 10:40         ` Rafael J. Wysocki
2024-09-05  9:36       ` Peter Zijlstra
2024-09-05 10:31         ` Christian Loehle
2024-09-05 11:00           ` Peter Zijlstra
2024-09-05 11:09             ` Christian Loehle
2025-03-31  9:02 ` Pavel Begunkov
2025-03-31 10:33   ` Christian Loehle
2025-04-01  8:21     ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c8cd6339-c168-4409-8cc4-e85e7ad92914@arm.com \
    --to=christian.loehle@arm.com \
    --cc=axboe@kernel.dk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=qyousef@layalina.io \
    --cc=rafael@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox