From: Alan Jenkins <alan.christopher.jenkins@gmail.com>
To: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org
Cc: Doug Smythies <dsmythies@telus.net>, linux-pm@vger.kernel.org
Subject: Re: iowait v.s. idle accounting is "inconsistent" - iowait is too low
Date: Fri, 5 Jul 2019 12:25:46 +0100
Message-ID: <e82b9d7c-81e5-dd80-b9c0-f5f065344e2f@gmail.com>
In-Reply-To: <000001d531a8$8931b2a0$9b9517e0$@net>
Hi, scheduler experts!
My cpu "iowait" time appears to be reported incorrectly. Do you know
why this could happen?
Doug helped me - it was he who noticed different behaviour on v4.15 vs
v4.16 vs v4.17+. So I have some confirmation of this. I don't think
Doug mentioned what hardware he ran the kernels on. lscpu says my
hardware is "Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz".
I tested using "dd" :-
dd if=bigfile bs=1M iflag=direct of=/dev/null
(1) E.g. I get the expected result if I pin "dd" to the "right" cpu.
Which cpu varies; it has often been cpu2. At the moment I have booted
5.2-rc5-ish. Here I run "dd" with "taskset -c 0", and "iowait" worked
as expected:
top - 11:01:47 up 15:10, 2 users, load average: 1.07, 0.86, 0.86
Tasks: 288 total, 1 running, 287 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.3 us, 3.3 sy, 0.0 ni, 0.0 id, 94.3 wa, 0.7 hi, 1.3 si, 0.0 st
%Cpu1 : 0.3 us, 0.7 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu2 : 1.0 us, 0.7 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu3 : 0.7 us, 0.3 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7854.0 total, 432.2 free, 4616.4 used, 2805.4 buff/cache
MiB Swap: 2048.0 total, 1978.2 free, 69.8 used. 2498.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31849 alan-sy+ 20 0 216052 2836 1800 D 3.0 0.0 0:00.58 dd
24220 alan-sy+ 20 0 3339828 232160 126720 S 0.7 2.9 1:53.14 gnome-shell
...
(I have also used "atop" and "vmstat 3". "atop" shows both total and
per-cpu iowait, idle, etc. "vmstat 3" just shows a total, but all the
old values stay on-screen).
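For anyone who wants to cross-check the tools, here is a minimal sketch of how the per-cpu percentages can be derived from two snapshots of /proc/stat. It is an illustration only, not how "top" or "atop" are actually implemented. The helper names are made up for this example, and it assumes the first eight counters on each "cpuN" line are user, nice, system, idle, iowait, irq, softirq and steal.

```python
# Hypothetical helpers: derive per-cpu percentages from /proc/stat snapshots.
# Assumption: the first eight counters on each "cpuN" line are
# user, nice, system, idle, iowait, irq, softirq, steal.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {"cpu0": [counters...], ...} for the per-cpu lines only."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-cpu lines start with "cpuN"; the bare "cpu" line is the total.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(value) for value in fields[1:]]
    return counters


def percentages(before, after):
    """Return {"cpu0": {"idle": ..., "iowait": ..., ...}, ...} in percent."""
    result = {}
    for cpu, new in after.items():
        old = before[cpu]
        deltas = [n - o for n, o in zip(new[:len(FIELDS)], old[:len(FIELDS)])]
        total = sum(deltas)
        if total <= 0:
            continue  # no ticks elapsed for this cpu; skip it
        result[cpu] = {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}
    return result
```

To use it, read /proc/stat twice a few seconds apart, pass each snapshot through parse_proc_stat(), and compare the "iowait" and "idle" entries returned by percentages().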
(2) But compare running "dd" with "taskset -c 1":
%Cpu0 : 0.3 us, 0.3 sy, 0.0 ni, 97.3 id, 0.0 wa, 0.7 hi, 1.3 si, 0.0 st
%Cpu1 : 0.3 us, 3.0 sy, 0.0 ni, 83.7 id, 12.6 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu2 : 1.0 us, 0.3 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 1.3 us, 0.7 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
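To repeat the comparison between (1) and (2) on other hardware, something like the following sketch could run the same "dd" pinned to each cpu in turn. The function names are invented for this example; "bigfile" is the same placeholder as in the command above, and the set of cpus to try is left to the caller.

```python
# Hypothetical helper: run the same direct-I/O "dd" pinned to each cpu in turn.
import subprocess


def pinned_dd_command(cpu, path="bigfile"):
    """Build the "taskset -c CPU dd ..." command line used in the tests."""
    return ["taskset", "-c", str(cpu),
            "dd", f"if={path}", "bs=1M", "iflag=direct", "of=/dev/null"]


def run_on_each_cpu(cpus, path="bigfile", runner=subprocess.run):
    """Run the pinned dd once per cpu; watch iowait in another terminal."""
    for cpu in cpus:
        runner(pinned_dd_command(cpu, path), check=True)
```

For example, run_on_each_cpu(range(4)) would cover the four cpus shown here, while "top" or "vmstat 3" runs alongside.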
(3) If I don't use "taskset", "dd" generally doesn't sit on the
"right" cpu, and so I don't see the right "iowait". Here's "top -d
30". Over this longer interval, "dd" appears to spend a quarter of its
time on the "right" cpu:
%Cpu0 : 0.5 us, 1.2 sy, 0.0 ni, 74.4 id, 22.2 wa, 0.5 hi, 1.3 si, 0.0 st
%Cpu1 : 0.7 us, 1.3 sy, 0.0 ni, 92.3 id, 5.7 wa, 0.1 hi, 0.0 si, 0.0 st
%Cpu2 : 0.5 us, 0.9 sy, 0.0 ni, 95.1 id, 3.3 wa, 0.1 hi, 0.0 si, 0.0 st
%Cpu3 : 0.7 us, 0.9 sy, 0.0 ni, 94.6 id, 3.6 wa, 0.1 hi, 0.0 si, 0.0 st
This point (3) does not apply to 4.15. On 4.15, it seems "dd" naturally
sits on the "right" cpu, so I get the "right" iowait. But if I pin "dd"
to a different cpu, I get the "wrong" iowait again.
I bisected 4.15-4.16. The first "bad" commit was 806486c377e3
"sched/fair: Do not migrate if the prev_cpu is idle"
(4) I can get the "right" iowait regardless of which cpu, if I boot
with "nohz=off", or if I suppress nohz by dynamically disabling all
cpuidle states except for state0 (POLL).
This point (4) does not apply to 4.16. On 4.16, suppressing nohz does
not help. (So far, I have not tested this on 4.15).
I bisected 4.16-4.17. The first "new" commit was 554c8aa8ecad "sched:
idle: Select idle state before stopping the tick"
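The cpuidle workaround in point (4) can be sketched as below. This is only an illustration, not the exact procedure used for the tests. It assumes the usual sysfs layout, /sys/devices/system/cpu/cpuN/cpuidle/stateM/disable, where writing "1" disables a state, and it assumes state0 is POLL as on this laptop. The function name is made up, and it needs root when pointed at the real sysfs.

```python
# Hypothetical helper: disable every cpuidle state except state0 (POLL here).
# Assumes the sysfs layout cpuN/cpuidle/stateM/disable under sysfs_cpu_root.
from pathlib import Path


def disable_deep_idle_states(sysfs_cpu_root="/sys/devices/system/cpu",
                             keep="state0"):
    """Write "1" to each state's "disable" file, skipping the kept state."""
    changed = []
    pattern = "cpu[0-9]*/cpuidle/state[0-9]*/disable"
    for disable in sorted(Path(sysfs_cpu_root).glob(pattern)):
        if disable.parent.name == keep:
            continue  # leave state0 enabled
        disable.write_text("1")
        changed.append(str(disable))
    return changed
```

Writing "0" back to the same files should re-enable the states afterwards.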
(5) I seem to get the "right" iowait regardless of which cpu, if I run
inside a virtual machine. I tested stock Fedora v5.1.? inside a KVM
(virt-manager) virtual machine, which also had 4 cpus. Whereas
un-virtualized Fedora v5.1.? on my laptop, behaves as per points 1-4.
I read the documented limitations for "iowait" time. As far as I
understand them, they don't explain such inconsistent values.
> - iowait: In a word, iowait stands for waiting for I/O to complete. But there
> are several problems:
>
> 1. Cpu will not wait for I/O to complete, iowait is the time that a task is
> waiting for I/O to complete. When cpu goes into idle state for
> outstanding task io, another task will be scheduled on this CPU.
>
> 2. In a multi-core CPU, the task waiting for I/O to complete is not running
> on any CPU, so the iowait of each CPU is difficult to calculate.
>
> 3. The value of iowait field in /proc/stat will decrease in certain
> conditions
Thanks for all the kernels
Alan