* time accounting problem in pvops kernel
@ 2010-08-17 17:29 Paolo Bonzini
2010-08-17 22:51 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2010-08-17 17:29 UTC (permalink / raw)
To: xen-devel@lists.xensource.com; +Cc: Glauber Costa
Hi,
while experimenting a bit with time.c we found a bug in time
accounting. Basically, /proc/stat counts idle time twice for PV guests
running a pvops kernel.
To reproduce, try this command in an unloaded guest:
grep cpu0 /proc/stat; sleep 20 ; grep cpu0 /proc/stat
and watch the fourth number on the cpu0 line (idle) increase by approximately
4000 for a kernel with USER_HZ == 100, about twice the 2000 ticks that 20
seconds should produce. If you instead run these commands (you need an
otherwise unloaded machine for these):
grep cpu0 /proc/stat; timeout 20s yes > /dev/null ; grep cpu0 /proc/stat
grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null ; grep cpu0 /proc/stat
the first and third numbers in /proc/stat (user and system) increase by only about 2000.
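The before/after comparison above can be scripted. The following is a minimal sketch, not part of the original report: the helper names (parse_cpu_line, delta, expected_ticks, measure) are made up for illustration, and it assumes the usual /proc/stat layout in which the first four counters on a cpu line are user, nice, system and idle, in USER_HZ ticks.

```python
import time

# First four counters on a "cpuN" line of /proc/stat, in USER_HZ ticks.
FIELDS = ("user", "nice", "system", "idle")


def parse_cpu_line(stat_text, cpu="cpu0"):
    """Return the user/nice/system/idle counters for one CPU line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == cpu:
            return dict(zip(FIELDS, (int(v) for v in parts[1:5])))
    raise ValueError("no %s line found" % cpu)


def delta(before, after):
    """Per-field difference between two parsed samples."""
    return {name: after[name] - before[name] for name in FIELDS}


def expected_ticks(seconds, user_hz=100):
    """Ticks one fully accounted CPU should accumulate in `seconds`."""
    return seconds * user_hz


def measure(seconds=20, cpu="cpu0", path="/proc/stat"):
    """Sample the file, sleep, sample again, and return the deltas."""
    with open(path) as f:
        before = parse_cpu_line(f.read(), cpu)
    time.sleep(seconds)
    with open(path) as f:
        after = parse_cpu_line(f.read(), cpu)
    return delta(before, after)
```

On an idle guest, an idle delta from measure(20) close to 2 * expected_ticks(20) would match the behaviour described here. The real USER_HZ can be read with os.sysconf("SC_CLK_TCK") rather than assuming 100.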
The reason for this seems to be that in xen_timer_interrupt Linux's
normal timer accounting is called (via evt->event_handler) and this
calls account_idle_time. However, idle ticks are also added from
do_stolen_accounting, so that overall they're counted twice.
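As a rough illustration of the double counting described above, here is a toy model. It is not kernel code: the function simulate and its parameters are hypothetical, and it assumes that on an idle guest both the generic tick path and the do_stolen_accounting path add one idle tick per timer tick.

```python
def simulate(seconds, hz=100, busy=False, stolen_path_adds_idle=True):
    """Toy model of per-tick accounting on one vCPU (not kernel code)."""
    counters = {"user": 0, "idle": 0}
    for _ in range(seconds * hz):
        # Path 1: stand-in for the generic timer accounting reached
        # via evt->event_handler.
        if busy:
            counters["user"] += 1
        else:
            counters["idle"] += 1
        # Path 2: stand-in for do_stolen_accounting. Assumption: while
        # the guest is idle, this path adds the same interval as idle
        # ticks a second time.
        if not busy and stolen_path_adds_idle:
            counters["idle"] += 1
    return counters
```

Under these assumptions, simulate(20) gives an idle count of 4000 while simulate(20, busy=True) gives a user count of 2000, the same 2:1 ratio as in the measurements. Passing stolen_path_adds_idle=False brings idle back to 2000.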
Related to this, it looks like stolen tick accounting is subtly
wrong. Even if only part of a tick is stolen by the hypervisor, Linux's
time accounting will add a whole tick to the user/system/idle time. In
a dynticks kernel (or maybe even if the scheduling quanta have some
kind of resonance with the guest's timer interrupt?) the sum of the
four components user+sys+idle+steal will then be larger than the wall
time. In fact, I found experimentally that steal time is usually 20%
off from wall-user-sys-idle when the machine is under moderate load
(e.g. 5 domains at 100% CPU usage, on a 4-CPU machine). Of course I used
the correct, divided-by-2 idle time to do this computation.
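The whole-tick effect can be sketched with integer arithmetic. This is a simplified model under stated assumptions, not the kernel's algorithm: the function account is hypothetical, every timer tick is assumed to charge one full tick to user/system/idle, and the stolen nanoseconds are converted to steal ticks separately. The 2.5 ms stolen per tick is an arbitrary illustrative value, not a measurement.

```python
TICK_NS = 10_000_000  # one tick in nanoseconds at USER_HZ == 100


def account(num_ticks, stolen_ns_per_tick):
    """Toy model: whole ticks are charged even when part was stolen."""
    charged_ticks = 0
    stolen_ns = 0
    for _ in range(num_ticks):
        # A full tick goes to user/system/idle regardless of how much
        # of the interval the vCPU actually ran.
        charged_ticks += 1
        # The stolen part of the same interval is tracked separately.
        stolen_ns += stolen_ns_per_tick
    steal_ticks = stolen_ns // TICK_NS
    return {
        "wall": num_ticks,
        "charged": charged_ticks,
        "steal": steal_ticks,
        "sum": charged_ticks + steal_ticks,
    }
```

In this model, account(2000, 2_500_000) reports a sum of 2500 ticks against 2000 ticks of wall time, so user+sys+idle+steal overshoots by exactly the stolen amount.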
Paolo
* Re: time accounting problem in pvops kernel
From: Jeremy Fitzhardinge @ 2010-08-17 22:51 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Glauber Costa, xen-devel@lists.xensource.com
On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
> Hi,
>
> while experimenting a bit with time.c we found a bug in time
> accounting. Basically, /proc/stat counts idle time twice for PV guests
> running a pvops kernel.
What version? Upstream and stable kernels contain the changeset "xen:
drop xen_sched_clock in favour of using plain wallclock time" which
should fix a lot of timekeeping/scheduling problems.
Thanks,
J
>
> To reproduce, try this command in an unloaded guest:
>
> grep cpu0 /proc/stat; sleep 20 ; grep cpu0 /proc/stat
>
> and watch the fourth number on the cpu0 line (idle) increase by approximately
> 4000 for a kernel with USER_HZ == 100, about twice the 2000 ticks that 20
> seconds should produce. If you instead run these commands (you need an
> otherwise unloaded machine for these):
>
> grep cpu0 /proc/stat; timeout 20s yes > /dev/null ; grep cpu0 /proc/stat
> grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null ; grep cpu0 /proc/stat
>
> the first and third numbers in /proc/stat (user and system) increase by only about 2000.
>
> The reason for this seems to be that in xen_timer_interrupt Linux's
> normal timer accounting is called (via evt->event_handler) and this
> calls account_idle_time. However, idle ticks are also added from
> do_stolen_accounting, so that overall they're counted twice.
>
> Related to this, it looks like stolen tick accounting is subtly
> wrong. Even if only part of a tick is stolen by the hypervisor, Linux's
> time accounting will add a whole tick to the user/system/idle time. In
> a dynticks kernel (or maybe even if the scheduling quanta have some
> kind of resonance with the guest's timer interrupt?) the sum of the
> four components user+sys+idle+steal will then be larger than the wall
> time. In fact, I found experimentally steal time to be usually 20%
> off from wall-user-sys-idle when the machine is under moderate load
> (e.g. 5 domains at 100% CPU usage, on a 4-CPU machine). Of course I used
> the correct, divided-by-2 idle time to do this computation.
>
> Paolo
* Re: time accounting problem in pvops kernel
From: Paolo Bonzini @ 2010-08-18 7:49 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Glauber Costa, xen-devel@lists.xensource.com
On 08/18/2010 12:51 AM, Jeremy Fitzhardinge wrote:
> On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
>> Hi,
>>
>> while experimenting a bit with time.c we found a bug in time
>> accounting. Basically, /proc/stat counts idle time twice for PV guests
>> running a pvops kernel
>
> What version?
I was using the latest RHEL6 snapshot + the 16-patch blkfront series
(i.e. without the patch you pointed out).
> Upstream and stable kernels contain the changeset "xen:
> drop xen_sched_clock in favour of using plain wallclock time" which
> should fix a lot of timekeeping/scheduling problems.
I'll try this patch; however, offhand I don't see how it fixes the
problem of calling account_idle_ticks twice.
Paolo
* Re: time accounting problem in pvops kernel
From: Paolo Bonzini @ 2010-08-18 14:15 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Glauber Costa, xen-devel@lists.xensource.com
On 08/18/2010 09:49 AM, Paolo Bonzini wrote:
>
>> Upstream and stable kernels contain the changeset "xen:
>> drop xen_sched_clock in favour of using plain wallclock time" which
>> should fix a lot of timekeeping/scheduling problems.
>
> I'll try this patch; however, offhand I don't see how it fixes the
> problem of calling account_idle_ticks twice.
It doesn't. :)
Paolo
* Re: time accounting problem in pvops kernel
From: Jed Smith @ 2010-08-18 14:17 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: xen-devel
On Aug 18, 2010, at 3:49 AM, Paolo Bonzini wrote:
> On 08/18/2010 12:51 AM, Jeremy Fitzhardinge wrote:
>> On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
>>> Hi,
>>>
>>> while experimenting a bit with time.c we found a bug in time
>>> accounting. Basically, /proc/stat counts idle time twice for PV guests
>>> running a pvops kernel
>>
>> What version?
>
> I was using the latest RHEL6 snapshot + the 16-patch blkfront series (i.e. without the patch you pointed out).
>
>> Upstream and stable kernels contain the changeset "xen:
>> drop xen_sched_clock in favour of using plain wallclock time" which
>> should fix a lot of timekeeping/scheduling problems.
>
> I'll try this patch; however, offhand I don't see how it fixes the problem of calling account_idle_ticks twice.
I saw this too, even with said patch applied. To avoid this being simply a
'me too!' message, I noticed that it aggravated Munin quite a bit. The CPU
plugin detects 800% of idle on a 4-core machine, but only idle time is off.
Regards,
Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com
* Re: time accounting problem in pvops kernel
From: Jeremy Fitzhardinge @ 2010-08-18 16:06 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Glauber Costa, xen-devel@lists.xensource.com
On 08/18/2010 07:15 AM, Paolo Bonzini wrote:
> On 08/18/2010 09:49 AM, Paolo Bonzini wrote:
>>
>>> Upstream and stable kernels contain the changeset "xen:
>>> drop xen_sched_clock in favour of using plain wallclock time" which
>>> should fix a lot of timekeeping/scheduling problems.
>>
>> I'll try this patch; however, offhand I don't see how it fixes the
>> problem of calling account_idle_ticks twice.
>
> It doesn't. :)
OK. To be honest, I didn't look at the detail of your report. I just
wanted to make sure it wasn't something we'd already addressed.
J