* [PATCH 1/4] nohz: Only update sleeptime stats locally
@ 2014-05-07 13:41 Denys Vlasenko
2014-05-07 13:41 ` [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock Denys Vlasenko
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Denys Vlasenko @ 2014-05-07 13:41 UTC (permalink / raw)
To: linux-kernel
Cc: Frederic Weisbecker, Denys Vlasenko, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven,
Oleg Nesterov
From: Frederic Weisbecker <fweisbec@gmail.com>
The idle and io sleeptime stats can be updated concurrently from callers
of get_cpu_idle_time_us(), get_cpu_iowait_time_us() and
tick_nohz_stop_idle().
Updaters can easily race and corrupt internal data coherency,
for example when a governor calls a get_cpu_*_time_us() API while the
target CPU exits idle at the same time, because no locking whatsoever
protects against this concurrency.
To fix this, let's only update the sleeptime stats locally when the CPU
exits from idle. This is the only place where a real update is truly
needed. The callers of get_cpu_*_time_us() can simply add up the pending
sleep time delta to the last sleeptime snapshot in order to get a coherent
result. There is no need for them to also update the stats.
Rebased to Linus' tree by Denys Vlasenko.
Reported-by: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/time/tick-sched.c | 63 ++++++++++++++++--------------------------------
1 file changed, 21 insertions(+), 42 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6558b7a..7d86a54 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -406,31 +406,16 @@ static void tick_nohz_update_jiffies(ktime_t now)
touch_softlockup_watchdog();
}
-/*
- * Updates the per cpu time idle statistics counters
- */
-static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
{
ktime_t delta;
- if (ts->idle_active) {
- delta = ktime_sub(now, ts->idle_entrytime);
- if (nr_iowait_cpu(cpu) > 0)
- ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
- else
- ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- ts->idle_entrytime = now;
- }
-
- if (last_update_time)
- *last_update_time = ktime_to_us(now);
-
-}
-
-static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
-{
- update_ts_time_stats(smp_processor_id(), ts, now, NULL);
+ /* Updates the per cpu time idle statistics counters */
+ delta = ktime_sub(now, ts->idle_entrytime);
+ if (nr_iowait_cpu(smp_processor_id()) > 0)
+ ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+ else
+ ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
ts->idle_active = 0;
sched_clock_idle_wakeup_event(0);
@@ -469,17 +454,14 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
return -1;
now = ktime_get();
- if (last_update_time) {
- update_ts_time_stats(cpu, ts, now, last_update_time);
- idle = ts->idle_sleeptime;
- } else {
- if (ts->idle_active && !nr_iowait_cpu(cpu)) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ if (last_update_time)
+ *last_update_time = ktime_to_us(now);
- idle = ktime_add(ts->idle_sleeptime, delta);
- } else {
- idle = ts->idle_sleeptime;
- }
+ if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+ ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ idle = ktime_add(ts->idle_sleeptime, delta);
+ } else {
+ idle = ts->idle_sleeptime;
}
return ktime_to_us(idle);
@@ -510,17 +492,14 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
return -1;
now = ktime_get();
- if (last_update_time) {
- update_ts_time_stats(cpu, ts, now, last_update_time);
- iowait = ts->iowait_sleeptime;
- } else {
- if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ if (last_update_time)
+ *last_update_time = ktime_to_us(now);
- iowait = ktime_add(ts->iowait_sleeptime, delta);
- } else {
- iowait = ts->iowait_sleeptime;
- }
+ if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+ ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ iowait = ktime_add(ts->iowait_sleeptime, delta);
+ } else {
+ iowait = ts->iowait_sleeptime;
}
return ktime_to_us(iowait);
--
1.8.1.4
* [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock
2014-05-07 13:41 [PATCH 1/4] nohz: Only update sleeptime stats locally Denys Vlasenko
@ 2014-05-07 13:41 ` Denys Vlasenko
2014-05-07 13:41 ` [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards Denys Vlasenko
2014-05-07 13:41 ` [PATCH 4/4 v2] nohz: Fix iowait overcounting if iowait task migrates Denys Vlasenko
2 siblings, 0 replies; 16+ messages in thread
From: Denys Vlasenko @ 2014-05-07 13:41 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Frederic Weisbecker, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven,
Oleg Nesterov
This patch is adapted from a very similar patch
by Frederic Weisbecker. The commit message text is also mostly
from Frederic's patch.
When some call site uses get_cpu_*_time_us() to read a sleeptime
stat, it deduces the total sleeptime by adding the pending time
to the last sleeptime snapshot if the CPU target is idle.
Namely this sums up to:

    sleeptime = ts($CPU)->idle_sleeptime;
    if (ts($CPU)->idle_active)
        sleeptime += NOW() - ts($CPU)->idle_entrytime
But this only works if idle_sleeptime, idle_entrytime and idle_active are
read and updated in a disciplined order.
Let's consider the following scenario:

             CPU 0                                      CPU 1

    (seq 1)  ts(CPU 0)->idle_active = 1
             ts(CPU 0)->idle_entrytime = NOW()

    (seq 2)  sleeptime = NOW() - ts(CPU 0)->idle_entrytime
             ts(CPU 0)->idle_sleeptime += sleeptime     sleeptime = ts(CPU 0)->idle_sleeptime;
             ts(CPU 0)->idle_entrytime = NOW()          if (ts(CPU 0)->idle_active)
                                                            sleeptime += NOW() - ts(CPU 0)->idle_entrytime
The resulting value of sleeptime in CPU 1 can vary depending on the
ordering:
* If it sees the value of idle_entrytime after seq 1 and the value of idle_sleeptime
after seq 2, the value of sleeptime will be buggy because it accounts the delta twice,
so it will be too high.
* If it sees the value of idle_entrytime after seq 2 and the value of idle_sleeptime
after seq 1, the value of sleeptime will be buggy because it misses the delta, so it
will be too low.
* If it sees the value of idle_entrytime and idle_sleeptime, both as seen after seq 1 or 2,
the value will be correct.
More tricky scenarios can also happen if the idle_active value is read from a former sequence.
Hence we must honour the following constraints:
- idle_sleeptime, idle_active and idle_entrytime must be updated and read
under some correctly enforced SMP ordering
- The three variable values as read by CPU 1 must belong to the same update
sequences from CPU 0. The update sequences must be delimited such that the
resulting three values after a sequence completion produce a coherent result
together when read from the CPU 1.
- We need to prevent from fetching middle-state sequence values.
The ideal solution to implement this synchronization is a seqcount. Let's
use one here around these three values to enforce sequence synchronization
between updates and reads.
This fixes potential cpufreq governor bugs.
It also works towards fixing the reported bug where non-monotonic sleeptime stats
are returned by /proc/stat when it is frequently read.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
include/linux/tick.h | 1 +
kernel/time/tick-sched.c | 37 ++++++++++++++++++++++++++-----------
2 files changed, 27 insertions(+), 11 deletions(-)
diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773c..4de1f9e 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -67,6 +67,7 @@ struct tick_sched {
ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t iowait_sleeptime;
+ seqcount_t idle_sleeptime_seq;
ktime_t sleep_length;
unsigned long last_jiffies;
unsigned long next_jiffies;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7d86a54..8f0f2ee 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -411,12 +411,14 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
ktime_t delta;
/* Updates the per cpu time idle statistics counters */
+ write_seqcount_begin(&ts->idle_sleeptime_seq);
delta = ktime_sub(now, ts->idle_entrytime);
if (nr_iowait_cpu(smp_processor_id()) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
else
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
ts->idle_active = 0;
+ write_seqcount_end(&ts->idle_sleeptime_seq);
sched_clock_idle_wakeup_event(0);
}
@@ -425,9 +427,13 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
{
ktime_t now = ktime_get();
+ write_seqcount_begin(&ts->idle_sleeptime_seq);
ts->idle_entrytime = now;
ts->idle_active = 1;
+ write_seqcount_end(&ts->idle_sleeptime_seq);
+
sched_clock_idle_sleep_event();
+
return now;
}
@@ -449,6 +455,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
ktime_t now, idle;
+ unsigned int seq;
if (!tick_nohz_active)
return -1;
@@ -457,15 +464,18 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
if (last_update_time)
*last_update_time = ktime_to_us(now);
- if (ts->idle_active && !nr_iowait_cpu(cpu)) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
- idle = ktime_add(ts->idle_sleeptime, delta);
- } else {
+ do {
+ ktime_t delta;
+
+ seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
idle = ts->idle_sleeptime;
- }
+ if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+ delta = ktime_sub(now, ts->idle_entrytime);
+ idle = ktime_add(idle, delta);
+ }
+ } while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
return ktime_to_us(idle);
-
}
EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
@@ -487,6 +497,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
ktime_t now, iowait;
+ unsigned int seq;
if (!tick_nohz_active)
return -1;
@@ -495,12 +506,16 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
if (last_update_time)
*last_update_time = ktime_to_us(now);
- if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
- iowait = ktime_add(ts->iowait_sleeptime, delta);
- } else {
+ do {
+ ktime_t delta;
+
+ seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
iowait = ts->iowait_sleeptime;
- }
+ if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+ delta = ktime_sub(now, ts->idle_entrytime);
+ iowait = ktime_add(ts->iowait_sleeptime, delta);
+ }
+ } while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
return ktime_to_us(iowait);
}
--
1.8.1.4
* [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards
2014-05-07 13:41 [PATCH 1/4] nohz: Only update sleeptime stats locally Denys Vlasenko
2014-05-07 13:41 ` [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock Denys Vlasenko
@ 2014-05-07 13:41 ` Denys Vlasenko
2014-05-07 14:23 ` Peter Zijlstra
2014-05-07 13:41 ` [PATCH 4/4 v2] nohz: Fix iowait overcounting if iowait task migrates Denys Vlasenko
2 siblings, 1 reply; 16+ messages in thread
From: Denys Vlasenko @ 2014-05-07 13:41 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Frederic Weisbecker, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven,
Oleg Nesterov
With this change, "iowait-ness" of every idle period is decided
at the moment it starts:
if this CPU's run-queue had tasks waiting on I/O, then this idle
period's duration will be added to iowait_sleeptime.
This fixes the bug where iowait and/or idle counts could go backwards,
but iowait accounting is not precise (it can show more iowait
than there really is).
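The classify-at-entry rule can be sketched as follows. This is a hypothetical userspace model of the accounting (plain nanosecond counters, no seqcount), meant only to show that entry and exit now agree on which counter an idle period feeds.

```c
#include <stdint.h>
#include <assert.h>

enum { NOT_IDLE = 0, TRUE_IDLE = 1, IOWAIT_IDLE = 2 };

/* Hypothetical userspace model of the per-CPU idle stats. */
struct idle_model {
	int     idle_active;      /* NOT_IDLE, TRUE_IDLE or IOWAIT_IDLE */
	int64_t idle_entrytime;
	int64_t idle_sleeptime;
	int64_t iowait_sleeptime;
};

/* Classify the period once, at idle entry. */
static void start_idle(struct idle_model *m, int nr_iowait, int64_t now)
{
	m->idle_entrytime = now;
	m->idle_active = nr_iowait > 0 ? IOWAIT_IDLE : TRUE_IDLE;
}

/* At idle exit, the whole period goes to the counter chosen at entry. */
static void stop_idle(struct idle_model *m, int64_t now)
{
	int64_t delta = now - m->idle_entrytime;

	if (m->idle_active == IOWAIT_IDLE)
		m->iowait_sleeptime += delta;
	else
		m->idle_sleeptime += delta;
	m->idle_active = NOT_IDLE;
}
```

Since nr_iowait is sampled exactly once per period, readers and the exit path can no longer disagree about the period's classification, which is what made the counters go backwards.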
v2: Use symbolic constants for TRUE_IDLE and IOWAIT_IDLE.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/time/tick-sched.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8f0f2ee..7d0e14a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -406,6 +406,11 @@ static void tick_nohz_update_jiffies(ktime_t now)
touch_softlockup_watchdog();
}
+enum {
+ TRUE_IDLE = 1,
+ IOWAIT_IDLE = 2,
+};
+
static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
{
ktime_t delta;
@@ -413,7 +418,7 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
/* Updates the per cpu time idle statistics counters */
write_seqcount_begin(&ts->idle_sleeptime_seq);
delta = ktime_sub(now, ts->idle_entrytime);
- if (nr_iowait_cpu(smp_processor_id()) > 0)
+ if (ts->idle_active == IOWAIT_IDLE)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
else
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
@@ -429,7 +434,7 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
write_seqcount_begin(&ts->idle_sleeptime_seq);
ts->idle_entrytime = now;
- ts->idle_active = 1;
+ ts->idle_active = nr_iowait_cpu(smp_processor_id()) ? IOWAIT_IDLE : TRUE_IDLE;
write_seqcount_end(&ts->idle_sleeptime_seq);
sched_clock_idle_sleep_event();
@@ -469,7 +474,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
idle = ts->idle_sleeptime;
- if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+ if (ts->idle_active == TRUE_IDLE) {
delta = ktime_sub(now, ts->idle_entrytime);
idle = ktime_add(idle, delta);
}
@@ -511,7 +516,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
iowait = ts->iowait_sleeptime;
- if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+ if (ts->idle_active == IOWAIT_IDLE) {
delta = ktime_sub(now, ts->idle_entrytime);
iowait = ktime_add(ts->iowait_sleeptime, delta);
}
--
1.8.1.4
* Re: [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards
2014-05-07 13:41 ` [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards Denys Vlasenko
@ 2014-05-07 14:23 ` Peter Zijlstra
2014-05-07 16:49 ` Denys Vlasenko
0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-07 14:23 UTC (permalink / raw)
To: Denys Vlasenko
Cc: linux-kernel, Frederic Weisbecker, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Andrew Morton, Arjan van de Ven, Oleg Nesterov
On Wed, May 07, 2014 at 03:41:33PM +0200, Denys Vlasenko wrote:
> With this change, "iowait-ness" of every idle period is decided
> at the moment it starts:
> if this CPU's run-queue had tasks waiting on I/O, then this idle
> period's duration will be added to iowait_sleeptime.
>
> This fixes the bug where iowait and/or idle counts could go backwards,
> but iowait accounting is not precise (it can show more iowait
> that there really is).
>
NAK on this, the thing going backwards is a symptom of the bug, not an
actual bug itself.
* Re: [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards
2014-05-07 14:23 ` Peter Zijlstra
@ 2014-05-07 16:49 ` Denys Vlasenko
2014-05-07 16:56 ` Peter Zijlstra
0 siblings, 1 reply; 16+ messages in thread
From: Denys Vlasenko @ 2014-05-07 16:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Frederic Weisbecker, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Andrew Morton, Arjan van de Ven, Oleg Nesterov
On 05/07/2014 04:23 PM, Peter Zijlstra wrote:
> On Wed, May 07, 2014 at 03:41:33PM +0200, Denys Vlasenko wrote:
>> With this change, "iowait-ness" of every idle period is decided
>> at the moment it starts:
>> if this CPU's run-queue had tasks waiting on I/O, then this idle
>> period's duration will be added to iowait_sleeptime.
>>
>> This fixes the bug where iowait and/or idle counts could go backwards,
>> but iowait accounting is not precise (it can show more iowait
>> that there really is).
>>
>
> NAK on this, the thing going backwards is a symptom of the bug, not an
> actual bug itself.
This patch does fix that bug.
The bug is that in nohz_stop_idle(),
we base the decision to add accumulated time to
ts->iowait_sleeptime or to ts->idle_sleeptime
on nr_iowait_cpu(cpu):
if (nr_iowait_cpu(smp_processor_id()) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
else
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
and we use the same nr_iowait_cpu(cpu)
to calculate result of get_cpu_iowait_time_us():
if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
delta = ktime_sub(now, ts->idle_entrytime);
iowait = ktime_add(ts->iowait_sleeptime, delta);
}
This is wrong because nr_iowait_cpu(cpu) is not stable.
It could be != 0 in get_cpu_iowait_time_us() but later
become 0 when we are in nohz_stop_idle().
We must use consistent logic in these two functions.
If nr_iowait_cpu() made get_cpu_iowait_time_us() add time to the iowait
counter, the same must be done in nohz_stop_idle().
* Re: [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards
2014-05-07 16:49 ` Denys Vlasenko
@ 2014-05-07 16:56 ` Peter Zijlstra
2014-05-07 18:24 ` Denys Vlasenko
0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-07 16:56 UTC (permalink / raw)
To: Denys Vlasenko
Cc: linux-kernel, Frederic Weisbecker, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Andrew Morton, Arjan van de Ven, Oleg Nesterov
On Wed, May 07, 2014 at 06:49:47PM +0200, Denys Vlasenko wrote:
> On 05/07/2014 04:23 PM, Peter Zijlstra wrote:
> > On Wed, May 07, 2014 at 03:41:33PM +0200, Denys Vlasenko wrote:
> >> With this change, "iowait-ness" of every idle period is decided
> >> at the moment it starts:
> >> if this CPU's run-queue had tasks waiting on I/O, then this idle
> >> period's duration will be added to iowait_sleeptime.
> >>
> >> This fixes the bug where iowait and/or idle counts could go backwards,
> >> but iowait accounting is not precise (it can show more iowait
> >> that there really is).
> >>
> >
> > NAK on this, the thing going backwards is a symptom of the bug, not an
> > actual bug itself.
>
> This patch does fix that bug.
Which bug, there's two here:
1) that NOHZ and !NOHZ iowait accounting aren't identical
2) that iowait accounting in general is a steaming pile of crap
Those are the only two bugs around. If you want easy, you get to fix 1,
if you want to do hard you try and fix 2.
iowait going backwards is a symptom of NOHZ iowait accounting being
broken -- as compared to !NOHZ.
Just stopping it from going backwards is not a fix for anything except
papering over symptoms, and if you think that's a good approach to
software engineering then I'll welcome you to my killfile.
* Re: [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards
2014-05-07 16:56 ` Peter Zijlstra
@ 2014-05-07 18:24 ` Denys Vlasenko
2014-05-07 19:06 ` Peter Zijlstra
0 siblings, 1 reply; 16+ messages in thread
From: Denys Vlasenko @ 2014-05-07 18:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Frederic Weisbecker, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Andrew Morton, Arjan van de Ven, Oleg Nesterov
On 05/07/2014 06:56 PM, Peter Zijlstra wrote:
> On Wed, May 07, 2014 at 06:49:47PM +0200, Denys Vlasenko wrote:
>> On 05/07/2014 04:23 PM, Peter Zijlstra wrote:
>>> On Wed, May 07, 2014 at 03:41:33PM +0200, Denys Vlasenko wrote:
>>>> With this change, "iowait-ness" of every idle period is decided
>>>> at the moment it starts:
>>>> if this CPU's run-queue had tasks waiting on I/O, then this idle
>>>> period's duration will be added to iowait_sleeptime.
>>>>
>>>> This fixes the bug where iowait and/or idle counts could go backwards,
>>>> but iowait accounting is not precise (it can show more iowait
>>>> that there really is).
>>>>
>>>
>>> NAK on this, the thing going backwards is a symptom of the bug, not an
>>> actual bug itself.
>>
>> This patch does fix that bug.
>
> Which bug, there's two here:
>
> 1) that NOHZ and !NOHZ iowait accounting aren't identical
They can hardly be identical, considering how different these modes are.
And they don't have to be identical, in fact.
It is enough if they give similar numbers for similar usage scenarios.
E.g. if I run dd </dev/sda >/dev/null, I expect the iowait
counter to increase while the idle counter stands almost still
(on the CPU where dd runs, of course).
> 2) that iowait accounting in general is a steaming pile of crap
If you want to nuke iowait (for example, make its counter constant 0),
I personally won't object. Can't guarantee others won't...
* Re: [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards
2014-05-07 18:24 ` Denys Vlasenko
@ 2014-05-07 19:06 ` Peter Zijlstra
0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-07 19:06 UTC (permalink / raw)
To: Denys Vlasenko
Cc: linux-kernel, Frederic Weisbecker, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Andrew Morton, Arjan van de Ven, Oleg Nesterov
On Wed, May 07, 2014 at 08:24:25PM +0200, Denys Vlasenko wrote:
> On 05/07/2014 06:56 PM, Peter Zijlstra wrote:
> > On Wed, May 07, 2014 at 06:49:47PM +0200, Denys Vlasenko wrote:
> >> On 05/07/2014 04:23 PM, Peter Zijlstra wrote:
> >>> On Wed, May 07, 2014 at 03:41:33PM +0200, Denys Vlasenko wrote:
> >>>> With this change, "iowait-ness" of every idle period is decided
> >>>> at the moment it starts:
> >>>> if this CPU's run-queue had tasks waiting on I/O, then this idle
> >>>> period's duration will be added to iowait_sleeptime.
> >>>>
> >>>> This fixes the bug where iowait and/or idle counts could go backwards,
> >>>> but iowait accounting is not precise (it can show more iowait
> >>>> that there really is).
> >>>>
> >>>
> >>> NAK on this, the thing going backwards is a symptom of the bug, not an
> >>> actual bug itself.
> >>
> >> This patch does fix that bug.
> >
> > Which bug, there's two here:
> >
> > 1) that NOHZ and !NOHZ iowait accounting aren't identical
>
> They can hardly be identical, considering how different these modes are.
They can, we've managed it for pretty much everything else, although it's
not always easy.
And if you look at the patch I sent, that provides the exact moment the
task wakes up, so you can round that to the nearest jiffy boundary and
account appropriately as if it were accounted on the per-cpu timer tick.
Now, there's likely fun corner cases which need more TLC, see
kernel/sched/proc.c for the fun times we had with the global load avg.
> And they don't have to be identical, in fact.
Yes they have to, per definition. CONFIG_NOHZ should have no user-visible
difference (except of course the obvious: fewer interrupts and ideally
lower energy usage).
> > 2) that iowait accounting in general is a steaming pile of crap
>
> If you want to nuke iowait (for example, make its counter constant 0),
> I personally won't object. Can't guarantee others won't...
I won't object to a constant 0, but then we have to do it irrespective
of NOHZ. But not necessarily, I think we can have a coherent definition
of iowait, just most likely not per-cpu.
So for UP we have the very simple definition that any idle cycle while
there is a task waiting for io is accounted to iowait.
This definition can be 'trivially' extended to a global iowait,
expensive to compute though.
However, one can argue its not correct to do that trivial extension,
since if there's only 1 task waiting for io, it could at most tie up 1
CPU's worth of idle time (but very emphatically not a specific cpu).
So somewhere in that space is I think a viable way to account iowait,
but the straight fwd implementation (in as far as the eventual
definition will be straight fwd to begin with) will likely be
prohibitively expensive to compute.
Then again, if you look at kernel/sched/proc.c (again) and look at the
bloody mess we had to make for the global load avg accounting to work
for NOHZ there might (or might just not) be a shimmer of hope we can
pull this off in a scalable manner.
Like said, 'fun' problem :-)
* [PATCH 4/4 v2] nohz: Fix iowait overcounting if iowait task migrates
2014-05-07 13:41 [PATCH 1/4] nohz: Only update sleeptime stats locally Denys Vlasenko
2014-05-07 13:41 ` [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock Denys Vlasenko
2014-05-07 13:41 ` [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards Denys Vlasenko
@ 2014-05-07 13:41 ` Denys Vlasenko
2 siblings, 0 replies; 16+ messages in thread
From: Denys Vlasenko @ 2014-05-07 13:41 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Frederic Weisbecker, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven,
Oleg Nesterov
Before this change, if the last IO-blocked task wakes up
on a different CPU, the original CPU may stay idle much longer,
and the entire time it stays idle is accounted as iowait time.
This change adds struct tick_sched::iowait_exittime member.
On entry to idle, it is set to KTIME_MAX.
The last IO-blocked task, if migrated, sets it to the current time.
Note that this can happen only once per idle period:
new iowaiting tasks can't magically appear on an idle CPU's rq.
If iowait_exittime is set, then (iowait_exittime - idle_entrytime)
gets accounted as iowait, and the remaining (now - iowait_exittime)
as "true" idle.
v2:
* Made iowait_exittime atomic64_t. This way, no locking
is necessary when setting/accessing it.
* Do more paranoid checking of iowait_exittime before using it:
now code checks that idle_entrytime <= iowait_exittime
and iowait_exittime <= now, and uses iowait_exittime only
if both are true.
Run-tested: /proc/stat counters no longer go backwards.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
include/linux/tick.h | 2 ++
kernel/sched/core.c | 19 ++++++++--
kernel/time/tick-sched.c | 91 +++++++++++++++++++++++++++++++++++++++++-------
3 files changed, 98 insertions(+), 14 deletions(-)
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 4de1f9e..49f8b29 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -67,6 +67,7 @@ struct tick_sched {
ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t iowait_sleeptime;
+ atomic64_t iowait_exittime;
seqcount_t idle_sleeptime_seq;
ktime_t sleep_length;
unsigned long last_jiffies;
@@ -140,6 +141,7 @@ extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
+extern void tick_nohz_iowait_to_idle(int cpu);
# else /* !CONFIG_NO_HZ_COMMON */
static inline int tick_nohz_tick_stopped(void)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..3137980 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4208,6 +4208,21 @@ EXPORT_SYMBOL_GPL(yield_to);
* This task is about to go to sleep on IO. Increment rq->nr_iowait so
* that process accounting knows that this is a task in IO wait state.
*/
+#ifdef CONFIG_NO_HZ_COMMON
+static __sched void io_wait_end(struct rq *rq)
+{
+ if (atomic_dec_and_test(&rq->nr_iowait)) {
+ if (raw_smp_processor_id() != cpu_of(rq))
+ tick_nohz_iowait_to_idle(cpu_of(rq));
+ }
+}
+#else
+static inline void io_wait_end(struct rq *rq)
+{
+ atomic_dec(&rq->nr_iowait);
+}
+#endif
+
void __sched io_schedule(void)
{
struct rq *rq = raw_rq();
@@ -4218,7 +4233,7 @@ void __sched io_schedule(void)
current->in_iowait = 1;
schedule();
current->in_iowait = 0;
- atomic_dec(&rq->nr_iowait);
+ io_wait_end(rq);
delayacct_blkio_end();
}
EXPORT_SYMBOL(io_schedule);
@@ -4234,7 +4249,7 @@ long __sched io_schedule_timeout(long timeout)
current->in_iowait = 1;
ret = schedule_timeout(timeout);
current->in_iowait = 0;
- atomic_dec(&rq->nr_iowait);
+ io_wait_end(rq);
delayacct_blkio_end();
return ret;
}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7d0e14a..f3b214d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -411,17 +411,41 @@ enum {
IOWAIT_IDLE = 2,
};
+/* We always access iowait_exittime atomically. */
+static inline ktime_t fetch_iowait_exittime(struct tick_sched *ts)
+{
+ ktime_t v;
+
+ v.tv64 = atomic64_read(&ts->iowait_exittime);
+ return v;
+}
+
static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
{
- ktime_t delta;
+ ktime_t start, delta, end;
/* Updates the per cpu time idle statistics counters */
write_seqcount_begin(&ts->idle_sleeptime_seq);
- delta = ktime_sub(now, ts->idle_entrytime);
- if (ts->idle_active == IOWAIT_IDLE)
+ start = ts->idle_entrytime;
+ delta = ktime_sub(now, start);
+
+ if (ts->idle_active == IOWAIT_IDLE) {
+ /*
+ * If last iowaiting task on our rq wakes up on another
+ * CPU, it sets iowait_exittime.
+ * It's the only case it can satisfy "start <= end <= now".
+ */
+ end = fetch_iowait_exittime(ts);
+ if (ktime_compare(start, end) <= 0 && ktime_compare(end, now) <= 0) {
+ /* [end, now] goes to "true idle" counter */
+ delta = ktime_sub(now, end);
+ ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+ delta = ktime_sub(end, start);
+ }
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
- else
+ } else {
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+ }
ts->idle_active = 0;
write_seqcount_end(&ts->idle_sleeptime_seq);
@@ -435,6 +459,7 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
write_seqcount_begin(&ts->idle_sleeptime_seq);
ts->idle_entrytime = now;
ts->idle_active = nr_iowait_cpu(smp_processor_id()) ? IOWAIT_IDLE : TRUE_IDLE;
+ atomic64_set(&ts->iowait_exittime, KTIME_MAX);
write_seqcount_end(&ts->idle_sleeptime_seq);
sched_clock_idle_sleep_event();
@@ -442,6 +467,14 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
return now;
}
+void tick_nohz_iowait_to_idle(int cpu)
+{
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
+ ktime_t now = ktime_get();
+
+ atomic64_set(&ts->iowait_exittime, now.tv64);
+}
+
/**
* get_cpu_idle_time_us - get the total idle time of a cpu
* @cpu: CPU number to query
@@ -458,7 +491,7 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
*/
u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts;
ktime_t now, idle;
unsigned int seq;
@@ -469,14 +502,35 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
if (last_update_time)
*last_update_time = ktime_to_us(now);
+ ts = &per_cpu(tick_cpu_sched, cpu);
+
do {
- ktime_t delta;
+ ktime_t start, delta, iowait_exit;
seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
idle = ts->idle_sleeptime;
- if (ts->idle_active == TRUE_IDLE) {
- delta = ktime_sub(now, ts->idle_entrytime);
+
+ if (ts->idle_active /* either IOWAIT_IDLE or TRUE_IDLE */) {
+ start = ts->idle_entrytime;
+
+ if (ts->idle_active == IOWAIT_IDLE) {
+ /* This idle period started as iowait */
+
+ iowait_exit = fetch_iowait_exittime(ts);
+ if (ktime_compare(start, iowait_exit) > 0 ||
+ ktime_compare(iowait_exit, now) > 0) {
+ /* And it still is (iowait_exit isn't set) */
+ goto skip;
+ }
+ /*
+ * This CPU used to be "iowait idle", but iowait task
+ * has migrated. The rest of idle time is "true idle":
+ */
+ start = iowait_exit;
+ }
+ delta = ktime_sub(now, start);
idle = ktime_add(idle, delta);
+ skip: ;
}
} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
@@ -500,7 +554,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
*/
u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts;
ktime_t now, iowait;
unsigned int seq;
@@ -511,14 +565,27 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
if (last_update_time)
*last_update_time = ktime_to_us(now);
+ ts = &per_cpu(tick_cpu_sched, cpu);
+
do {
- ktime_t delta;
+ ktime_t start, delta, iowait_exit;
seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
iowait = ts->iowait_sleeptime;
+
if (ts->idle_active == IOWAIT_IDLE) {
- delta = ktime_sub(now, ts->idle_entrytime);
- iowait = ktime_add(ts->iowait_sleeptime, delta);
+ start = ts->idle_entrytime;
+ iowait_exit = fetch_iowait_exittime(ts);
+ /*
+ * Did last iowaiting task on our rq wake up on other CPU
+ * sometime in the past, and updated ts->iowait_exittime?
+ */
+ if (ktime_compare(start, iowait_exit) > 0 ||
+ ktime_compare(iowait_exit, now) > 0) {
+ iowait_exit = now; /* no */
+ }
+ delta = ktime_sub(iowait_exit, start);
+ iowait = ktime_add(iowait, delta);
}
} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
--
1.8.1.4
* [PATCH 1/4] nohz: Only update sleeptime stats locally
@ 2014-04-24 18:45 Denys Vlasenko
0 siblings, 0 replies; 16+ messages in thread
From: Denys Vlasenko @ 2014-04-24 18:45 UTC (permalink / raw)
To: linux-kernel
Cc: Frederic Weisbecker, Denys Vlasenko, Hidetoshi Seto,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven,
Oleg Nesterov
From: Frederic Weisbecker <fweisbec@gmail.com>
The idle and io sleeptime stats can be updated concurrently from callers
of get_cpu_idle_time_us(), get_cpu_iowait_time_us() and
tick_nohz_stop_idle().
Updaters can easily race and corrupt the internal data coherency,
for example when a governor calls a get_cpu_*_time_us() API and the
target CPU exits idle at the same time, because there is no locking
whatsoever to protect against this concurrency.
To fix this, let's only update the sleeptime stats locally when the CPU
exits from idle. This is the only place where a real update is truly
needed. The callers of get_cpu_*_time_us() can simply add up the pending
sleep time delta to the last sleeptime snapshot in order to get a coherent
result. There is no need for them to also update the stats.
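The scheme described here (only the idle-exit path writes the counters; readers add the pending delta to the last snapshot, retrying under the seqcount that the follow-up patch introduces) can be sketched as a simplified userspace model. The struct and function names below are invented for illustration and are not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: one writer (the CPU leaving idle), lock-free readers. */
struct sleep_stats {
	unsigned int seq;	/* even = stable, odd = write in progress */
	uint64_t sleeptime;	/* accumulated, updated only on idle exit */
	uint64_t entrytime;	/* when the current idle period began */
	int idle_active;
};

/* Writer: runs only on the CPU exiting idle, the single updater. */
static void stop_idle(struct sleep_stats *s, uint64_t now)
{
	s->seq++;				/* open write section */
	s->sleeptime += now - s->entrytime;
	s->idle_active = 0;
	s->seq++;				/* close write section */
}

/* Reader: never writes; adds the pending delta to the last snapshot. */
static uint64_t read_sleeptime(struct sleep_stats *s, uint64_t now)
{
	unsigned int seq;
	uint64_t total;

	do {
		seq = s->seq;			/* read_seqcount_begin() */
		total = s->sleeptime;
		if (s->idle_active)
			total += now - s->entrytime; /* pending delta */
	} while (s->seq != seq || (seq & 1));	/* read_seqcount_retry() */
	return total;
}
```

Because the reader only recomputes a local total and retries on a torn read, remote CPUs never need to write the per-cpu counters, which is exactly what removes the update race.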
Rebased to Linus' tree by Denys Vlasenko.
Reported-by: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/time/tick-sched.c | 63 ++++++++++++++++--------------------------------
1 file changed, 21 insertions(+), 42 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6558b7a..7d86a54 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -406,31 +406,16 @@ static void tick_nohz_update_jiffies(ktime_t now)
touch_softlockup_watchdog();
}
-/*
- * Updates the per cpu time idle statistics counters
- */
-static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
{
ktime_t delta;
- if (ts->idle_active) {
- delta = ktime_sub(now, ts->idle_entrytime);
- if (nr_iowait_cpu(cpu) > 0)
- ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
- else
- ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- ts->idle_entrytime = now;
- }
-
- if (last_update_time)
- *last_update_time = ktime_to_us(now);
-
-}
-
-static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
-{
- update_ts_time_stats(smp_processor_id(), ts, now, NULL);
+ /* Updates the per cpu time idle statistics counters */
+ delta = ktime_sub(now, ts->idle_entrytime);
+ if (nr_iowait_cpu(smp_processor_id()) > 0)
+ ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+ else
+ ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
ts->idle_active = 0;
sched_clock_idle_wakeup_event(0);
@@ -469,17 +454,14 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
return -1;
now = ktime_get();
- if (last_update_time) {
- update_ts_time_stats(cpu, ts, now, last_update_time);
- idle = ts->idle_sleeptime;
- } else {
- if (ts->idle_active && !nr_iowait_cpu(cpu)) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ if (last_update_time)
+ *last_update_time = ktime_to_us(now);
- idle = ktime_add(ts->idle_sleeptime, delta);
- } else {
- idle = ts->idle_sleeptime;
- }
+ if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+ ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ idle = ktime_add(ts->idle_sleeptime, delta);
+ } else {
+ idle = ts->idle_sleeptime;
}
return ktime_to_us(idle);
@@ -510,17 +492,14 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
return -1;
now = ktime_get();
- if (last_update_time) {
- update_ts_time_stats(cpu, ts, now, last_update_time);
- iowait = ts->iowait_sleeptime;
- } else {
- if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ if (last_update_time)
+ *last_update_time = ktime_to_us(now);
- iowait = ktime_add(ts->iowait_sleeptime, delta);
- } else {
- iowait = ts->iowait_sleeptime;
- }
+ if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+ ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ iowait = ktime_add(ts->iowait_sleeptime, delta);
+ } else {
+ iowait = ts->iowait_sleeptime;
}
return ktime_to_us(iowait);
--
1.8.1.4
* [PATCH RESEND 0/4] nohz: Fix racy sleeptime stats
@ 2013-08-16 15:42 Frederic Weisbecker
2013-08-16 15:42 ` [PATCH 1/4] nohz: Only update sleeptime stats locally Frederic Weisbecker
0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2013-08-16 15:42 UTC (permalink / raw)
To: Oleg Nesterov
Cc: LKML, Frederic Weisbecker, Fernando Luis Vazquez Cao,
Tetsuo Handa, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Andrew Morton, Arjan van de Ven
Hi,
This is a resend to include Oleg in Cc; no modification since the last version:
https://lkml.org/lkml/2013/8/8/638.
Tree is still:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/nohz-sleeptime-fix
Thanks.
Frederic Weisbecker (4):
nohz: Only update sleeptime stats locally
nohz: Synchronize sleep time stats with seqlock
nohz: Consolidate sleep time stats read code
nohz: Convert a few places to use local per cpu accesses
include/linux/tick.h | 2 +
kernel/time/tick-sched.c | 124 +++++++++++++++++++--------------------------
2 files changed, 54 insertions(+), 72 deletions(-)
--
1.7.5.4
* [PATCH 1/4] nohz: Only update sleeptime stats locally
2013-08-16 15:42 [PATCH RESEND 0/4] nohz: Fix racy sleeptime stats Frederic Weisbecker
@ 2013-08-16 15:42 ` Frederic Weisbecker
2013-08-18 16:49 ` Oleg Nesterov
2013-08-18 17:04 ` Oleg Nesterov
0 siblings, 2 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2013-08-16 15:42 UTC (permalink / raw)
To: Oleg Nesterov
Cc: LKML, Frederic Weisbecker, Fernando Luis Vazquez Cao,
Tetsuo Handa, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Andrew Morton, Arjan van de Ven
The idle and io sleeptime stats can be updated concurrently from callers
of get_cpu_idle_time_us(), get_cpu_iowait_time_us() and tick_nohz_stop_idle().
Updaters can easily race and corrupt the internal data coherency, for example
when a governor calls a get_cpu_*_time_us() API and the target CPU exits idle at
the same time, because there is no locking whatsoever to protect against
this concurrency.
To fix this, let's only update the sleeptime stats locally when the CPU
exits from idle. This is the only place where a real update is truly
needed. The callers of get_cpu_*_time_us() can simply add up the pending
sleep time delta to the last sleeptime snapshot in order to get a coherent
result. There is no need for them to also update the stats.
Reported-by: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
kernel/time/tick-sched.c | 65 +++++++++++++++------------------------------
1 files changed, 22 insertions(+), 43 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e77edc9..ede0405 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -408,33 +408,18 @@ static void tick_nohz_update_jiffies(ktime_t now)
touch_softlockup_watchdog();
}
-/*
- * Updates the per cpu time idle statistics counters
- */
-static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
-{
- ktime_t delta;
-
- if (ts->idle_active) {
- delta = ktime_sub(now, ts->idle_entrytime);
- if (nr_iowait_cpu(cpu) > 0)
- ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
- else
- ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- ts->idle_entrytime = now;
- }
-
- if (last_update_time)
- *last_update_time = ktime_to_us(now);
-
-}
-
static void tick_nohz_stop_idle(int cpu, ktime_t now)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ ktime_t delta;
- update_ts_time_stats(cpu, ts, now, NULL);
+ /* Updates the per cpu time idle statistics counters */
+ delta = ktime_sub(now, ts->idle_entrytime);
+ if (nr_iowait_cpu(cpu) > 0)
+ ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+ else
+ ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+ ts->idle_entrytime = now;
ts->idle_active = 0;
sched_clock_idle_wakeup_event(0);
@@ -473,17 +458,14 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
return -1;
now = ktime_get();
- if (last_update_time) {
- update_ts_time_stats(cpu, ts, now, last_update_time);
- idle = ts->idle_sleeptime;
- } else {
- if (ts->idle_active && !nr_iowait_cpu(cpu)) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ if (last_update_time)
+ *last_update_time = ktime_to_us(now);
- idle = ktime_add(ts->idle_sleeptime, delta);
- } else {
- idle = ts->idle_sleeptime;
- }
+ if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+ ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ idle = ktime_add(ts->idle_sleeptime, delta);
+ } else {
+ idle = ts->idle_sleeptime;
}
return ktime_to_us(idle);
@@ -514,17 +496,14 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
return -1;
now = ktime_get();
- if (last_update_time) {
- update_ts_time_stats(cpu, ts, now, last_update_time);
- iowait = ts->iowait_sleeptime;
- } else {
- if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ if (last_update_time)
+ *last_update_time = ktime_to_us(now);
- iowait = ktime_add(ts->iowait_sleeptime, delta);
- } else {
- iowait = ts->iowait_sleeptime;
- }
+ if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+ ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ iowait = ktime_add(ts->iowait_sleeptime, delta);
+ } else {
+ iowait = ts->iowait_sleeptime;
}
return ktime_to_us(iowait);
--
1.7.5.4
* Re: [PATCH 1/4] nohz: Only update sleeptime stats locally
2013-08-16 15:42 ` [PATCH 1/4] nohz: Only update sleeptime stats locally Frederic Weisbecker
@ 2013-08-18 16:49 ` Oleg Nesterov
2013-08-18 21:38 ` Frederic Weisbecker
2013-08-18 17:04 ` Oleg Nesterov
1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2013-08-18 16:49 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven
On 08/16, Frederic Weisbecker wrote:
>
> To fix this, lets only update the sleeptime stats locally when the CPU
> exits from idle.
I am in no position to ack the changes in this area, but I like this
change very much. Because, as a code reader, I was totally confused by
if (last_update_time)
update_ts_time_stats()
code and it looks "obviously wrong".
I added more cc's. It seems to me that 9366d840 "cpufreq: governors:
Calculate iowait time only when necessary" doesn't realize what
- u64 idle_time = get_cpu_idle_time_us(cpu, NULL);
+ u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);
actually means. OTOH, get_cpu_iowait_time_us() was called with
last_update_time != NULL even before this patch...
In short. This looks like the clear fix to me, but I do not understand
this code enough, and I think that cpufreq should know about this change.
> static void tick_nohz_stop_idle(int cpu, ktime_t now)
> {
> struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> + ktime_t delta;
>
> - update_ts_time_stats(cpu, ts, now, NULL);
> + /* Updates the per cpu time idle statistics counters */
> + delta = ktime_sub(now, ts->idle_entrytime);
> + if (nr_iowait_cpu(cpu) > 0)
> + ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> + else
> + ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> + ts->idle_entrytime = now;
> ts->idle_active = 0;
With or without this change, why do we update ->idle_entrytime in this case?
Looks harmless, but a bit confusing.
While this doesn't really matter, we could probably even kill ->idle_active
and use !!ts->idle_entrytime instead.
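A minimal sketch of what that alternative might look like, assuming a zero entrytime can stand in for the flag (hypothetical names, a simplified userspace model rather than kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: drop the explicit idle_active flag and treat a nonzero
 * entrytime as "idle period in progress". Illustration only. */
struct ts_model {
	uint64_t entrytime;	/* 0 == not idle; nonzero == idle since then */
	uint64_t sleeptime;
};

static void model_start_idle(struct ts_model *ts, uint64_t now)
{
	ts->entrytime = now;
}

static void model_stop_idle(struct ts_model *ts, uint64_t now)
{
	if (ts->entrytime) {			/* replaces ts->idle_active */
		ts->sleeptime += now - ts->entrytime;
		ts->entrytime = 0;		/* replaces idle_active = 0 */
	}
}
```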
> @@ -473,17 +458,14 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
And I think that we should kill this last_update_time argument later.
Oleg.
2013-08-18 16:49 ` Oleg Nesterov
@ 2013-08-18 21:38 ` Frederic Weisbecker
0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2013-08-18 21:38 UTC (permalink / raw)
To: Oleg Nesterov
Cc: LKML, Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven,
Stratos Karafotis, Viresh Kumar, Rafael J. Wysocki
On Sun, Aug 18, 2013 at 06:49:37PM +0200, Oleg Nesterov wrote:
> On 08/16, Frederic Weisbecker wrote:
> >
> > To fix this, lets only update the sleeptime stats locally when the CPU
> > exits from idle.
>
> I am in no position to ack the changes in this area, but I like this
> change very much. Because, as a code reader, I was totally confused by
>
> if (last_update_time)
> update_ts_time_stats()
>
> code and it looks "obviously wrong".
>
> I added more cc's. It seems to me that 9366d840 "cpufreq: governors:
> Calculate iowait time only when necessary" doesn't realize what
>
> - u64 idle_time = get_cpu_idle_time_us(cpu, NULL);
> + u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);
>
> actually means. OTOH, get_cpu_iowait_time_us() was called with
> last_update_time != NULL even before this patch...
>
> In short. This looks like the clear fix to me, but I do not understand
> this code enough, and I think that cpufreq should know about this change.
Good point, and this time I'm really adding the Cc :)
>
> > static void tick_nohz_stop_idle(int cpu, ktime_t now)
> > {
> > struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> > + ktime_t delta;
> >
> > - update_ts_time_stats(cpu, ts, now, NULL);
> > + /* Updates the per cpu time idle statistics counters */
> > + delta = ktime_sub(now, ts->idle_entrytime);
> > + if (nr_iowait_cpu(cpu) > 0)
> > + ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> > + else
> > + ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> > + ts->idle_entrytime = now;
> > ts->idle_active = 0;
>
> With or without this change, why we update ->idle_entrytime in this case?
> Looks harmless, but a bit confusing.
Oh indeed I missed that. It's a leftover from the copy-paste of update_ts_time_stats()
content.
Well spotted, I'll fix.
>
> While this doesn't really matter, we could probably even kill ->idle_active
> and use !!ts->idle_entrytime instead.
We could, but it would add slightly more overhead in the irq entry path (cf. tick_check_nohz()),
and it also makes the code a little bit harder to review IMHO.
>
> > @@ -473,17 +458,14 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
>
> And I think that we should kill this last_update_time argument later.
Agreed.
Thanks.
* Re: [PATCH 1/4] nohz: Only update sleeptime stats locally
2013-08-16 15:42 ` [PATCH 1/4] nohz: Only update sleeptime stats locally Frederic Weisbecker
2013-08-18 16:49 ` Oleg Nesterov
@ 2013-08-18 17:04 ` Oleg Nesterov
2013-08-19 18:05 ` Stratos Karafotis
1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2013-08-18 17:04 UTC (permalink / raw)
To: Frederic Weisbecker, Rafael J. Wysocki, Viresh Kumar,
Stratos Karafotis
Cc: LKML, Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven
Sorry for the double post; forgot to cc the cpufreq maintainers.
On 08/16, Frederic Weisbecker wrote:
>
> To fix this, lets only update the sleeptime stats locally when the CPU
> exits from idle.
I am in no position to ack the changes in this area, but I like this
change very much. Because, as a code reader, I was totally confused by
if (last_update_time)
update_ts_time_stats()
code and it looks "obviously wrong".
I added more cc's. It seems to me that 9366d840 "cpufreq: governors:
Calculate iowait time only when necessary" doesn't realize what
- u64 idle_time = get_cpu_idle_time_us(cpu, NULL);
+ u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);
actually means. OTOH, get_cpu_iowait_time_us() was called with
last_update_time != NULL even before this patch...
In short. This looks like the clear fix to me, but I do not understand
this code enough, and I think that cpufreq should know about this change.
> static void tick_nohz_stop_idle(int cpu, ktime_t now)
> {
> struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> + ktime_t delta;
>
> - update_ts_time_stats(cpu, ts, now, NULL);
> + /* Updates the per cpu time idle statistics counters */
> + delta = ktime_sub(now, ts->idle_entrytime);
> + if (nr_iowait_cpu(cpu) > 0)
> + ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> + else
> + ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> + ts->idle_entrytime = now;
> ts->idle_active = 0;
With or without this change, why do we update ->idle_entrytime in this case?
Looks harmless, but a bit confusing.
While this doesn't really matter, we could probably even kill ->idle_active
and use !!ts->idle_entrytime instead.
> @@ -473,17 +458,14 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
And I think that we should kill this last_update_time argument later.
Oleg.
* Re: [PATCH 1/4] nohz: Only update sleeptime stats locally
2013-08-18 17:04 ` Oleg Nesterov
@ 2013-08-19 18:05 ` Stratos Karafotis
0 siblings, 0 replies; 16+ messages in thread
From: Stratos Karafotis @ 2013-08-19 18:05 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Frederic Weisbecker, Rafael J. Wysocki, Viresh Kumar, LKML,
Fernando Luis Vazquez Cao, Tetsuo Handa, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra, Andrew Morton, Arjan van de Ven
On 08/18/2013 08:04 PM, Oleg Nesterov wrote:
> Sorry for double post. forgot to cc cpufreq maintainers.
>
> On 08/16, Frederic Weisbecker wrote:
>>
>> To fix this, lets only update the sleeptime stats locally when the CPU
>> exits from idle.
>
> I am in no position to ack the changes in this area, but I like this
> change very much. Because, as a code reader, I was totally confused by
>
> if (last_update_time)
> update_ts_time_stats()
>
> code and it looks "obviously wrong".
>
> I added more cc's. It seems to me that 9366d840 "cpufreq: governors:
> Calculate iowait time only when necessary" doesn't realize what
>
> - u64 idle_time = get_cpu_idle_time_us(cpu, NULL);
> + u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);
>
> actually means. OTOH, get_cpu_iowait_time_us() was called with
> last_update_time != NULL even before this patch...
To be honest, I am unfamiliar with tick-sched code.
With patch 9366d840, I was trying to avoid duplicate calls to
get_cpu_iowait_time_us function. I just saw that the original
code was calling update_ts_time_stats within get_cpu_idle_time_us
and get_cpu_iowait_time_us and I thought that I should keep calling
these functions with a non-NULL parameter to update the time stats.
In fact the original patch submission was without this:
- u64 idle_time = get_cpu_idle_time_us(cpu, NULL);
+ u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);
and the idle time calculation was wrong (ondemand couldn't increase to max freq).
For your convenience the call paths before and after this patch:
Before patch
get_cpu_idle_time(j, &cur_wall_time);
u64 idle_time = get_cpu_idle_time_us(cpu, NULL);
idle_time += get_cpu_iowait_time_us(cpu, wall);
update_ts_time_stats(cpu, ts, now, last_update_time);
...
get_cpu_iowait_time_us(j, &cur_wall_time);
update_ts_time_stats(cpu, ts, now, last_update_time);
After patch (io_busy = 1)
cur_idle_time = get_cpu_idle_time(j, &cur_wall_time, io_busy);
u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);
update_ts_time_stats(cpu, ts, now, last_update_time);
After patch (io_busy = 0)
cur_idle_time = get_cpu_idle_time(j, &cur_wall_time, io_busy);
u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);
idle_time += get_cpu_iowait_time_us(cpu, wall);
update_ts_time_stats(cpu, ts, now, last_update_time);
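In other words, the io_busy flag decides whether iowait time counts as busy time or is folded into idle time. A rough sketch of that distinction, with a hypothetical helper standing in for the governor logic (the real code calls get_cpu_idle_time_us()/get_cpu_iowait_time_us() directly):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: cpu_idle_us and cpu_iowait_us stand in for the
 * values returned by get_cpu_idle_time_us()/get_cpu_iowait_time_us(). */
static uint64_t effective_idle_us(uint64_t cpu_idle_us,
				  uint64_t cpu_iowait_us, int io_busy)
{
	/* io_busy: iowait is treated as busy, so only pure idle counts.
	 * Otherwise iowait is folded into the idle total. */
	return io_busy ? cpu_idle_us : cpu_idle_us + cpu_iowait_us;
}
```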
Regards,
Stratos
* [PATCH 0/4] nohz: Fix racy sleeptime stats
@ 2013-08-09 0:54 Frederic Weisbecker
2013-08-09 0:54 ` [PATCH 1/4] nohz: Only update sleeptime stats locally Frederic Weisbecker
0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2013-08-09 0:54 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Fernando Luis Vazquez Cao, Tetsuo Handa,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andrew Morton,
Arjan van de Ven
Hi,
Here is a proposition to fix the bug described here:
http://lkml.kernel.org/r/1363660703.4993.3.camel@nexus
http://lkml.kernel.org/r/201304012205.DFC60784.HVOMJSFFLFtOOQ@I-love.SAKURA.ne.jp
You can try the following branch:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/nohz-sleeptime-fix
Thanks,
Frederic
---
Frederic Weisbecker (4):
nohz: Only update sleeptime stats locally
nohz: Synchronize sleep time stats with seqlock
nohz: Consolidate sleep time stats read code
nohz: Convert a few places to use local per cpu accesses
include/linux/tick.h | 2 +
kernel/time/tick-sched.c | 124 +++++++++++++++++++--------------------------
2 files changed, 54 insertions(+), 72 deletions(-)
* [PATCH 1/4] nohz: Only update sleeptime stats locally
2013-08-09 0:54 [PATCH 0/4] nohz: Fix racy sleeptime stats Frederic Weisbecker
@ 2013-08-09 0:54 ` Frederic Weisbecker
0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2013-08-09 0:54 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Fernando Luis Vazquez Cao, Tetsuo Handa,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andrew Morton,
Arjan van de Ven
The idle and io sleeptime stats can be updated concurrently from callers
of get_cpu_idle_time_us(), get_cpu_iowait_time_us() and tick_nohz_stop_idle().
Updaters can easily race and corrupt the internal data coherency, for example
when a governor calls a get_cpu_*_time_us() API and the target CPU exits idle at
the same time, because there is no locking whatsoever to protect against
this concurrency.
To fix this, let's only update the sleeptime stats locally when the CPU
exits from idle. This is the only place where a real update is truly
needed. The callers of get_cpu_*_time_us() can simply add up the pending
sleep time delta to the last sleeptime snapshot in order to get a coherent
result. There is no need for them to also update the stats.
Reported-by: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
kernel/time/tick-sched.c | 65 +++++++++++++++------------------------------
1 files changed, 22 insertions(+), 43 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e77edc9..ede0405 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -408,33 +408,18 @@ static void tick_nohz_update_jiffies(ktime_t now)
touch_softlockup_watchdog();
}
-/*
- * Updates the per cpu time idle statistics counters
- */
-static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
-{
- ktime_t delta;
-
- if (ts->idle_active) {
- delta = ktime_sub(now, ts->idle_entrytime);
- if (nr_iowait_cpu(cpu) > 0)
- ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
- else
- ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- ts->idle_entrytime = now;
- }
-
- if (last_update_time)
- *last_update_time = ktime_to_us(now);
-
-}
-
static void tick_nohz_stop_idle(int cpu, ktime_t now)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ ktime_t delta;
- update_ts_time_stats(cpu, ts, now, NULL);
+ /* Updates the per cpu time idle statistics counters */
+ delta = ktime_sub(now, ts->idle_entrytime);
+ if (nr_iowait_cpu(cpu) > 0)
+ ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+ else
+ ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+ ts->idle_entrytime = now;
ts->idle_active = 0;
sched_clock_idle_wakeup_event(0);
@@ -473,17 +458,14 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
return -1;
now = ktime_get();
- if (last_update_time) {
- update_ts_time_stats(cpu, ts, now, last_update_time);
- idle = ts->idle_sleeptime;
- } else {
- if (ts->idle_active && !nr_iowait_cpu(cpu)) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ if (last_update_time)
+ *last_update_time = ktime_to_us(now);
- idle = ktime_add(ts->idle_sleeptime, delta);
- } else {
- idle = ts->idle_sleeptime;
- }
+ if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+ ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ idle = ktime_add(ts->idle_sleeptime, delta);
+ } else {
+ idle = ts->idle_sleeptime;
}
return ktime_to_us(idle);
@@ -514,17 +496,14 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
return -1;
now = ktime_get();
- if (last_update_time) {
- update_ts_time_stats(cpu, ts, now, last_update_time);
- iowait = ts->iowait_sleeptime;
- } else {
- if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ if (last_update_time)
+ *last_update_time = ktime_to_us(now);
- iowait = ktime_add(ts->iowait_sleeptime, delta);
- } else {
- iowait = ts->iowait_sleeptime;
- }
+ if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+ ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+ iowait = ktime_add(ts->iowait_sleeptime, delta);
+ } else {
+ iowait = ts->iowait_sleeptime;
}
return ktime_to_us(iowait);
--
1.7.5.4
end of thread, other threads:[~2014-05-07 19:07 UTC | newest]
Thread overview: 16+ messages
2014-05-07 13:41 [PATCH 1/4] nohz: Only update sleeptime stats locally Denys Vlasenko
2014-05-07 13:41 ` [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock Denys Vlasenko
2014-05-07 13:41 ` [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards Denys Vlasenko
2014-05-07 14:23 ` Peter Zijlstra
2014-05-07 16:49 ` Denys Vlasenko
2014-05-07 16:56 ` Peter Zijlstra
2014-05-07 18:24 ` Denys Vlasenko
2014-05-07 19:06 ` Peter Zijlstra
2014-05-07 13:41 ` [PATCH 4/4 v2] nohz: Fix iowait overcounting if iowait task migrates Denys Vlasenko
-- strict thread matches above, loose matches on Subject: below --
2014-04-24 18:45 [PATCH 1/4] nohz: Only update sleeptime stats locally Denys Vlasenko
2013-08-16 15:42 [PATCH RESEND 0/4] nohz: Fix racy sleeptime stats Frederic Weisbecker
2013-08-16 15:42 ` [PATCH 1/4] nohz: Only update sleeptime stats locally Frederic Weisbecker
2013-08-18 16:49 ` Oleg Nesterov
2013-08-18 21:38 ` Frederic Weisbecker
2013-08-18 17:04 ` Oleg Nesterov
2013-08-19 18:05 ` Stratos Karafotis
2013-08-09 0:54 [PATCH 0/4] nohz: Fix racy sleeptime stats Frederic Weisbecker
2013-08-09 0:54 ` [PATCH 1/4] nohz: Only update sleeptime stats locally Frederic Weisbecker