* [REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67]
@ 2008-08-31 15:43 Avi Kivity
2008-08-31 18:09 ` Parag Warudkar
2008-09-01 14:58 ` [PATCH] sched_clock: fix NOHZ interaction Peter Zijlstra
0 siblings, 2 replies; 6+ messages in thread
From: Avi Kivity @ 2008-08-31 15:43 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, KVM list
Running an idle Windows VM on Linux 2.6.26+ with kvm, one sees high
values for the kvm process in top (30%-70% cpu), where one would
normally expect 0%-1%. Surprisingly, the per-cpu system counters show
almost 100% idle, leading me to believe this is an accounting error and
that the process does not actually consume this much cpu.
I bisected this to a scheduler change, namely
commit 3e51f33fcc7f55e6df25d15b55ed10c8b4da84cd
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Sat May 3 18:29:28 2008 +0200
sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
this replaces the rq->clock stuff (and possibly cpu_clock()).
- architectures that have an 'imperfect' hardware clock can set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
- the 'jiffie' window might be superfulous when we update tick_gtod
before the __update_sched_clock() call in sched_clock_tick()
- cpu_clock() might be implemented as:
sched_clock_cpu(smp_processor_id())
if the accuracy proves good enough - how far can TSC drift in a
single jiffie when considering the filtering and idle hooks?
[ mingo@elte.hu: various fixes and cleanups ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Which is a bit too complex for me to work out.
Further information:
- the kvm thread which has the incorrect counter is the one that
actually executes guest code
- this thread mostly sleeps in schedule(), as one would expect
- it is periodically woken up by a timer; perhaps the problem is that
the process is sampled using the same timer, so it always shows as
running (though I'd expect it to report 100% cpu in that case).
Any help will be appreciated (or provided).
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67]
2008-08-31 15:43 [REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67] Avi Kivity
@ 2008-08-31 18:09 ` Parag Warudkar
2008-09-01 9:16 ` Avi Kivity
2008-09-01 14:58 ` [PATCH] sched_clock: fix NOHZ interaction Peter Zijlstra
1 sibling, 1 reply; 6+ messages in thread
From: Parag Warudkar @ 2008-08-31 18:09 UTC (permalink / raw)
To: Avi Kivity; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, KVM list
On Sun, Aug 31, 2008 at 11:43 AM, Avi Kivity <avi@qumranet.com> wrote:
> Running an idle Windows VM on Linux 2.6.26+ with kvm, one sees high values
> for the kvm process in top (30%-70% cpu), where one would normally expect
> 0%-1%. Surprisingly, the per-cpu system counters show almost 100% idle,
> leading me to believe this is an accounting error and that the process does
> not actually consume this much cpu.
Busted process accounting - This looks the same as
http://bugzilla.kernel.org/show_bug.cgi?id=11209 .
Please verify. Peter's patch in latest git stops showing "incorrect
looking" CPU usage but at least the process times are still wrong,
horribly.
In fact the CPU usage thing in -rc5 is likely also incorrect but I
need to analyze that bit a little more.
>From Today's Git -
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12961 parag 20 0 83000 8908 6628 R 0 0.1 5124415h npviewer.bin
>
> I bisected this to a scheduler change, namely
>
> commit 3e51f33fcc7f55e6df25d15b55ed10c8b4da84cd
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Sat May 3 18:29:28 2008 +0200
>
> sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> this replaces the rq->clock stuff (and possibly cpu_clock()).
> - architectures that have an 'imperfect' hardware clock can set
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> - the 'jiffie' window might be superfulous when we update tick_gtod
> before the __update_sched_clock() call in sched_clock_tick()
> - cpu_clock() might be implemented as:
> sched_clock_cpu(smp_processor_id())
> if the accuracy proves good enough - how far can TSC drift in a
> single jiffie when considering the filtering and idle hooks?
> [ mingo@elte.hu: various fixes and cleanups ]
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
That patch sounds like it had open questions?
Really giving this is a long standing bad regression, all the
offending patches should be reverted in absence of a fix, no?
Parag
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67]
2008-08-31 18:09 ` Parag Warudkar
@ 2008-09-01 9:16 ` Avi Kivity
0 siblings, 0 replies; 6+ messages in thread
From: Avi Kivity @ 2008-09-01 9:16 UTC (permalink / raw)
To: Parag Warudkar; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, KVM list
Parag Warudkar wrote:
> On Sun, Aug 31, 2008 at 11:43 AM, Avi Kivity <avi@qumranet.com> wrote:
>
>> Running an idle Windows VM on Linux 2.6.26+ with kvm, one sees high values
>> for the kvm process in top (30%-70% cpu), where one would normally expect
>> 0%-1%. Surprisingly, the per-cpu system counters show almost 100% idle,
>> leading me to believe this is an accounting error and that the process does
>> not actually consume this much cpu.
>>
>
> Busted process accounting - This looks the same as
> http://bugzilla.kernel.org/show_bug.cgi?id=11209 .
> Please verify. Peter's patch in latest git stops showing "incorrect
> looking" CPU usage but at least the process times are still wrong,
> horribly.
> In fact the CPU usage thing in -rc5 is likely also incorrect but I
> need to analyze that bit a little more.
>
> From Today's Git -
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>
> 12961 parag 20 0 83000 8908 6628 R 0 0.1 5124415h npviewer.bin
>
>
Yes, it looks very similar. In my tests, %CPU is consistent with TIME;
it's just not consistent with what's actually happening and with the
global statistics.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] sched_clock: fix NOHZ interaction
2008-08-31 15:43 [REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67] Avi Kivity
2008-08-31 18:09 ` Parag Warudkar
@ 2008-09-01 14:58 ` Peter Zijlstra
2008-09-01 16:17 ` Avi Kivity
1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2008-09-01 14:58 UTC (permalink / raw)
To: Avi Kivity; +Cc: Ingo Molnar, linux-kernel, KVM list, Thomas Gleixner
Avi, the below fixes it for me..
---
Subject: sched_clock: fix NOHZ interaction
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Sep 01 16:44:23 CEST 2008
If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
actual process times. Fix this by re-calibrating the clock against GTOD when
leaving nohz mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/time/tick-sched.c | 3 +++
1 file changed, 3 insertions(+)
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -162,6 +162,8 @@ void tick_nohz_stop_idle(int cpu)
ts->idle_lastupdate = now;
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
ts->idle_active = 0;
+
+ sched_clock_idle_wakeup_event(0);
}
}
@@ -177,6 +179,7 @@ static ktime_t tick_nohz_start_idle(stru
}
ts->idle_entrytime = now;
ts->idle_active = 1;
+ sched_clock_idle_sleep_event();
return now;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-09-05 16:14 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-31 15:43 [REGRESSION] High, likely incorrect process cpu usage counters with kvm and 2.6.2[67] Avi Kivity
2008-08-31 18:09 ` Parag Warudkar
2008-09-01 9:16 ` Avi Kivity
2008-09-01 14:58 ` [PATCH] sched_clock: fix NOHZ interaction Peter Zijlstra
2008-09-01 16:17 ` Avi Kivity
2008-09-05 16:13 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox