* time accounting problem in pvops kernel
@ 2010-08-17 17:29 Paolo Bonzini
  2010-08-17 22:51 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2010-08-17 17:29 UTC
  To: xen-devel@lists.xensource.com; +Cc: Glauber Costa

Hi,

while experimenting a bit with time.c we found a bug in time
accounting.  Basically, /proc/stat counts idle time twice for PV guests
running a pvops kernel.

To reproduce, try this command in an unloaded guest:

grep cpu0 /proc/stat; sleep 20 ; grep cpu0 /proc/stat

and see the fourth number in /proc/stat (idle) increase by approximately
4000 for a kernel with USER_HZ == 100, i.e. twice the 20 * 100 = 2000
ticks that actually elapsed. If you try these commands instead (you need
an otherwise unloaded machine for these):

grep cpu0 /proc/stat; timeout 20s yes > /dev/null ; grep cpu0 /proc/stat
grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null ; grep cpu0 /proc/stat

the first and third numbers in /proc/stat (user and system) increase by
only 2000.
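
For anyone who wants to compare the two samples without doing the
subtraction by hand, here is a minimal sketch. The sample lines are
made-up values that only illustrate the behaviour described above, not
measurements, and the helper names are hypothetical.

```python
# Sketch: compare two "cpu0" lines from /proc/stat, as printed by the
# grep commands above. The sample lines are made-up values.
FIELDS = ("user", "nice", "system", "idle")

def parse_cpu_line(line):
    """Return the first four counters of a /proc/stat cpu line as a dict."""
    parts = line.split()
    return dict(zip(FIELDS, (int(x) for x in parts[1:5])))

def delta(before, after):
    """Per-field increase, in USER_HZ ticks, between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    return {name: a[name] - b[name] for name in FIELDS}

USER_HZ = 100        # as assumed in the text
INTERVAL_S = 20      # length of the sleep
expected_ticks = USER_HZ * INTERVAL_S   # wall time on one CPU, in ticks

# Hypothetical samples for the "sleep 20" case on an idle guest.
before = "cpu0 1200 0 800 500000 0 0 0 0"
after = "cpu0 1201 0 802 504000 0 0 0 0"

d = delta(before, after)
print(d, "idle/expected =", d["idle"] / expected_ticks)
```

A ratio near 2 for idle, and near 1 for user or system in the loaded
cases, would match what is reported here.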

The reason for this seems to be that in xen_timer_interrupt Linux's
normal timer accounting is called (via evt->event_handler) and this
calls account_idle_time. However, idle ticks are also added from
do_stolen_accounting, so that overall they're counted twice.
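
As a toy model of the double counting (this is not the kernel code; the
function and its arguments are hypothetical and only mirror the two
paths named above), assume a fully idle CPU where every timer tick is
seen once by the generic handler and once more by the stolen/blocked
accounting:

```python
def simulate_idle_ticks(ticks, also_count_in_stolen_path):
    """Count idle ticks reported for a CPU that is idle the whole time.

    Toy model only: one increment stands for the generic timer path
    (reached via evt->event_handler), the optional second increment
    stands for idle ticks added again from do_stolen_accounting.
    """
    idle = 0
    for _ in range(ticks):
        idle += 1                      # generic tick accounting
        if also_count_in_stolen_path:
            idle += 1                  # same idle tick added a second time
    return idle

elapsed = 20 * 100                     # 20 s at USER_HZ == 100
print(simulate_idle_ticks(elapsed, False), simulate_idle_ticks(elapsed, True))
```

With both paths active the model reports twice the elapsed ticks, which
is the factor of two seen in the sleep test.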

Related to this, it looks like stolen tick accounting is subtly
wrong. Even if only part of a tick is stolen by the hypervisor, Linux's
time accounting will add a whole tick to the user/system/idle time. In
a dynticks kernel (or maybe even if the scheduling quanta have some
kind of resonance with the guest's timer interrupt?) the sum of the
four components user+sys+idle+steal will then be larger than the wall
time. In fact, I found experimentally steal time to be usually 20%
off from wall-user-sys-idle when the machine is under moderate load
(e.g. 5 domains at 100% CPU usage, on a 4-CPU machine). Of course I used
the correct, divided-by-2 idle time to do this computation.
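
The check itself is simple arithmetic; a sketch follows. The numbers
are invented for illustration, and taking the error relative to the
expected steal time is only one way to read "20% off".

```python
def steal_check(wall, user, system, idle_reported, steal_reported):
    """Compare reported steal time with wall - user - sys - idle.

    All values are in ticks. idle_reported is halved first to undo the
    double counting of idle time described earlier in this message.
    """
    idle = idle_reported / 2
    steal_expected = wall - user - system - idle
    relative_error = (steal_reported - steal_expected) / steal_expected
    return steal_expected, relative_error

# Hypothetical sample: 20 s of wall time at USER_HZ == 100.
expected, err = steal_check(wall=2000, user=900, system=100,
                            idle_reported=800, steal_reported=720)
print(expected, err)
```

If the four components were accounted consistently, the relative error
would stay close to zero regardless of load.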

Paolo


* Re: time accounting problem in pvops kernel
  2010-08-17 17:29 time accounting problem in pvops kernel Paolo Bonzini
@ 2010-08-17 22:51 ` Jeremy Fitzhardinge
  2010-08-18  7:49   ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-17 22:51 UTC
  To: Paolo Bonzini; +Cc: Glauber Costa, xen-devel@lists.xensource.com

 On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
> Hi,
>
> while experimenting a bit with time.c we found a bug in time
> accounting.  Basically, /proc/stat counts idle time twice for PV guests
> running a pvops kernel

What version?  Upstream and stable kernels contain the changeset "xen:
drop xen_sched_clock in favour of using plain wallclock time" which
should fix a lot of timekeeping/scheduling problems.

Thanks,
    J

> .
>
> To reproduce, try this command in an unloaded guest:
>
> grep cpu0 /proc/stat; sleep 20 ; grep cpu0 /proc/stat
>
> and see the fourth number in /proc/stat (idle) increasing by approximately
> 4000 for a kernel with USER_HZ == 100. If you try these commands
> instead (you need an otherwise unloaded machine for these):
>
> grep cpu0 /proc/stat; timeout 20s yes > /dev/null ; grep cpu0 /proc/stat
> grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null ; grep cpu0 /proc/stat
>
> the first and third number in /proc/stat increase instead by 2000 only.
>
> The reason for this seems to be that in xen_timer_interrupt Linux's
> normal timer accounting is called (via evt->event_handler) and this
> calls account_idle_time. However, idle ticks are also added from
> do_stolen_accounting, so that overall they're counted twice.
>
> Related to this, it looks like stolen tick accounting is subtly
> wrong. Even if only part of a tick is stolen by the hypervisor, Linux's
> time accounting will add a whole tick to the user/system/idle time. In
> a dynticks kernel (or maybe even if the scheduling quanta have some
> kind of resonance with the guest's timer interrupt?) the sum of the
> four components user+sys+idle+steal will then be larger than the wall
> time. In fact, I found experimentally steal time to be usually 20%
> off from wall-user-sys-idle when the machine is under moderate load
> (e.g. 5 domains at 100% CPU usage, on a 4-CPU machine). Of course I used
> the correct, divided-by-2 idle time to do this computation.
>
> Paolo
>


* Re: time accounting problem in pvops kernel
  2010-08-17 22:51 ` Jeremy Fitzhardinge
@ 2010-08-18  7:49   ` Paolo Bonzini
  2010-08-18 14:15     ` Paolo Bonzini
  2010-08-18 14:17     ` Jed Smith
  0 siblings, 2 replies; 6+ messages in thread
From: Paolo Bonzini @ 2010-08-18  7:49 UTC
  To: Jeremy Fitzhardinge; +Cc: Glauber Costa, xen-devel@lists.xensource.com

On 08/18/2010 12:51 AM, Jeremy Fitzhardinge wrote:
>   On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
>> Hi,
>>
>> while experimenting a bit with time.c we found a bug in time
>> accounting.  Basically, /proc/stat counts idle time twice for PV guests
>> running a pvops kernel
>
> What version?

I was using the latest RHEL6 snapshot + the 16-patch blkfront series 
(i.e. without the patch you pointed out).

> Upstream and stable kernels contain the changeset "xen:
> drop xen_sched_clock in favour of using plain wallclock time" which
> should fix a lot of timekeeping/scheduling problems.

I'll try this patch; however, offhand I don't see how it fixes the 
problem of calling account_idle_ticks twice.

Paolo


* Re: time accounting problem in pvops kernel
  2010-08-18  7:49   ` Paolo Bonzini
@ 2010-08-18 14:15     ` Paolo Bonzini
  2010-08-18 16:06       ` Jeremy Fitzhardinge
  2010-08-18 14:17     ` Jed Smith
  1 sibling, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2010-08-18 14:15 UTC
  To: Jeremy Fitzhardinge; +Cc: Glauber Costa, xen-devel@lists.xensource.com

On 08/18/2010 09:49 AM, Paolo Bonzini wrote:
>
>> Upstream and stable kernels contain the changeset "xen:
>> drop xen_sched_clock in favour of using plain wallclock time" which
>> should fix a lot of timekeeping/scheduling problems.
>
> I'll try this patch; however, offhand I don't see how it fixes the
> problem of calling account_idle_ticks twice.

It doesn't. :)

Paolo


* Re: time accounting problem in pvops kernel
  2010-08-18  7:49   ` Paolo Bonzini
  2010-08-18 14:15     ` Paolo Bonzini
@ 2010-08-18 14:17     ` Jed Smith
  1 sibling, 0 replies; 6+ messages in thread
From: Jed Smith @ 2010-08-18 14:17 UTC
  To: Paolo Bonzini; +Cc: xen-devel

On Aug 18, 2010, at 3:49 AM, Paolo Bonzini wrote:

> On 08/18/2010 12:51 AM, Jeremy Fitzhardinge wrote:
>>  On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
>>> Hi,
>>> 
>>> while experimenting a bit with time.c we found a bug in time
>>> accounting.  Basically, /proc/stat counts idle time twice for PV guests
>>> running a pvops kernel
>> 
>> What version?
> 
> I was using the latest RHEL6 snapshot + the 16-patch blkfront series (i.e. without the patch you pointed out).
> 
>> Upstream and stable kernels contain the changeset "xen:
>> drop xen_sched_clock in favour of using plain wallclock time" which
>> should fix a lot of timekeeping/scheduling problems.
> 
> I'll try this patch; however, offhand I don't see how it fixes the problem of calling account_idle_ticks twice.

I saw this too, even with said patch applied.  To avoid this being simply a
'me too!' message, I noticed that it aggravated Munin quite a bit.  The CPU
plugin detects 800% of idle on a 4-core machine, but only idle time is off.

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com


* Re: time accounting problem in pvops kernel
  2010-08-18 14:15     ` Paolo Bonzini
@ 2010-08-18 16:06       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2010-08-18 16:06 UTC
  To: Paolo Bonzini; +Cc: Glauber Costa, xen-devel@lists.xensource.com

 On 08/18/2010 07:15 AM, Paolo Bonzini wrote:
> On 08/18/2010 09:49 AM, Paolo Bonzini wrote:
>>
>>> Upstream and stable kernels contain the changeset "xen:
>>> drop xen_sched_clock in favour of using plain wallclock time" which
>>> should fix a lot of timekeeping/scheduling problems.
>>
>> I'll try this patch; however, offhand I don't see how it fixes the
>> problem of calling account_idle_ticks twice.
>
> It doesn't. :)

OK.  To be honest, I didn't look at the detail of your report.  I just
wanted to make sure it wasn't something we'd already addressed.

    J


end of thread, other threads:[~2010-08-18 16:06 UTC | newest]

Thread overview: 6+ messages
2010-08-17 17:29 time accounting problem in pvops kernel Paolo Bonzini
2010-08-17 22:51 ` Jeremy Fitzhardinge
2010-08-18  7:49   ` Paolo Bonzini
2010-08-18 14:15     ` Paolo Bonzini
2010-08-18 16:06       ` Jeremy Fitzhardinge
2010-08-18 14:17     ` Jed Smith
