From: Frederic Weisbecker <frederic@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Alexey Dobriyan <adobriyan@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Wei Li <liwei391@huawei.com>,
Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>,
Yu Liao <liaoyu15@huawei.com>, Hillf Danton <hdanton@sina.com>,
Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 4/7] timers/nohz: Add a comment about broken iowait counter update race
Date: Mon, 20 Feb 2023 13:41:26 +0100
Message-ID: <20230220124129.519477-5-frederic@kernel.org>
In-Reply-To: <20230220124129.519477-1-frederic@kernel.org>
The per-CPU iowait task counter is incremented locally when a task goes
to sleep. But since the task can be woken up on (and by) another CPU, the
counter may then be decremented remotely. This is the source of a race
between the readers and the writer of the idle/iowait sleep time.
The following scenario shows an example where a /proc/stat reader
observes a pending sleep time as IO, whereas that same pending sleep
time later gets accounted as non-IO.
CPU 0                      CPU 1                              CPU 2
-----                      -----                              ------
//io_schedule() TASK A
current->in_iowait = 1
rq(0)->nr_iowait++
//switch to idle
                           // READ /proc/stat
                           // See nr_iowait_cpu(0) == 1
                           return ts->iowait_sleeptime +
                                  ktime_sub(ktime_get(), ts->idle_entrytime)

                                                              //try_to_wake_up(TASK A)
                                                              rq(0)->nr_iowait--
//idle exit
// See nr_iowait_cpu(0) == 0
ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)
As a result, subsequent reads of /proc/stat may see the iowait time go
backward. This is unfortunately hard to fix, so just add a comment about
that condition.
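For illustration only, the scenario above can be replayed deterministically
in userspace. The sketch below is not kernel code: struct sim_cpu and the
sim_*() helpers are hypothetical stand-ins for rq(0)->nr_iowait and the
tick_sched fields, the timestamps are arbitrary units, and the three CPUs
are modelled as sequential steps rather than real concurrency.

```c
#include <stdint.h>

/* Hypothetical, simplified model of CPU 0's accounting state. */
struct sim_cpu {
	int nr_iowait;             /* models rq(0)->nr_iowait */
	int idle;                  /* nonzero while CPU 0 is idle */
	uint64_t idle_entrytime;   /* models ts->idle_entrytime */
	uint64_t idle_sleeptime;   /* models ts->idle_sleeptime */
	uint64_t iowait_sleeptime; /* models ts->iowait_sleeptime */
};

/*
 * Reader side: the pending sleep time is added to the iowait total
 * only if the CPU is idle and the iowait counter is currently nonzero.
 */
static uint64_t sim_read_iowait(const struct sim_cpu *c, uint64_t now)
{
	uint64_t t = c->iowait_sleeptime;

	if (c->idle && c->nr_iowait > 0)
		t += now - c->idle_entrytime;
	return t;
}

/*
 * Writer side: at idle exit the whole sleep period is accounted to
 * whichever bucket the counter selects at that moment.
 */
static void sim_idle_exit(struct sim_cpu *c, uint64_t now)
{
	uint64_t delta = now - c->idle_entrytime;

	if (c->nr_iowait > 0)
		c->iowait_sleeptime += delta;
	else
		c->idle_sleeptime += delta;
	c->idle = 0;
}

/*
 * Replay the interleaving from the scenario. Stores the two iowait
 * values a reader would observe and returns 1 if the second one is
 * smaller than the first.
 */
static int sim_replay(uint64_t *first, uint64_t *second)
{
	struct sim_cpu cpu0 = { 0 };

	/* t=100, "CPU 0": task A does io_schedule(), CPU 0 goes idle */
	cpu0.nr_iowait++;
	cpu0.idle = 1;
	cpu0.idle_entrytime = 100;

	/* t=150, "CPU 1": reader sees nr_iowait == 1, counts 50 as IO */
	*first = sim_read_iowait(&cpu0, 150);

	/* t=160, "CPU 2": wakeup of task A decrements the counter remotely */
	cpu0.nr_iowait--;

	/* t=170, "CPU 0": idle exit sees nr_iowait == 0, accounts non-IO */
	sim_idle_exit(&cpu0, 170);

	/* t=180: second read, nothing pending and nothing accounted as IO */
	*second = sim_read_iowait(&cpu0, 180);

	return *second < *first;
}
```

In this model, calling sim_replay() yields a first read of 50 and a second
read of 0: the 50 units the reader already reported as iowait end up in the
idle bucket instead, which is the backward step described above.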
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/time/tick-sched.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 90d9b7b29875..edd6e9f26d16 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -705,7 +705,10 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
* counters if NULL.
*
* Return the cumulative idle time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
*
* This time is measured via accounting rather than sampling,
* and is as accurate as ktime_get() is.
@@ -728,7 +731,10 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
* counters if NULL.
*
* Return the cumulative iowait time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
*
* This time is measured via accounting rather than sampling,
* and is as accurate as ktime_get() is.
--
2.34.1
Thread overview: 11+ messages
2023-02-20 12:41 [PATCH 0/7] timers/nohz: Fixes and cleanups v2 Frederic Weisbecker
2023-02-20 12:41 ` [PATCH 1/7] timers/nohz: Restructure and reshuffle struct tick_sched Frederic Weisbecker
2023-02-20 12:41 ` [PATCH 2/7] timers/nohz: Only ever update sleeptime from idle exit Frederic Weisbecker
2023-02-20 12:41 ` [PATCH 3/7] timers/nohz: Protect idle/iowait sleep time under seqcount Frederic Weisbecker
2023-02-20 12:41 ` Frederic Weisbecker [this message]
2023-02-20 12:41 ` [PATCH 5/7] timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick() Frederic Weisbecker
2023-02-20 12:41 ` [PATCH 6/7] MAINTAINERS: Remove stale email address Frederic Weisbecker
2023-02-20 12:41 ` [PATCH 7/7] selftests/proc: Remove idle time monotonicity assertions Frederic Weisbecker
2023-02-20 20:16 ` Alexey Dobriyan
2023-02-20 21:53 ` Thomas Gleixner
2023-02-22 14:48 ` Frederic Weisbecker