From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: time accounting problem in pvops kernel
Date: Tue, 17 Aug 2010 15:51:34 -0700
Message-ID: <4C6B1276.9060203@goop.org>
References: <4C6AC705.1080904@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <4C6AC705.1080904@redhat.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Paolo Bonzini
Cc: Glauber Costa, "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
> Hi,
>
> while experimenting a bit with time.c we found a bug in time
> accounting. Basically, /proc/stat counts idle time twice for PV guests
> running a pvops kernel

What version? Upstream and stable kernels contain the changeset "xen:
drop xen_sched_clock in favour of using plain wallclock time" which
should fix a lot of timekeeping/scheduling problems.

Thanks,
    J

> .
>
> To reproduce, try this command in an unloaded guest:
>
>   grep cpu0 /proc/stat; sleep 20 ; grep cpu0 /proc/stat
>
> and see the fourth number in /proc/stat (idle) increasing by approximately
> 4000 for a kernel with USER_HZ == 100. If you try these commands
> instead (you need an otherwise unloaded machine for these):
>
>   grep cpu0 /proc/stat; timeout 20s yes > /dev/null ; grep cpu0 /proc/stat
>   grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null ; grep cpu0 /proc/stat
>
> the first and third numbers in /proc/stat increase by only 2000.
>
> The reason for this seems to be that in xen_timer_interrupt Linux's
> normal timer accounting is called (via evt->event_handler) and this
> calls account_idle_time. However, idle ticks are also added from
> do_stolen_accounting, so that overall they're counted twice.
>
> Related to this, it looks like stolen tick accounting is subtly
> wrong. Even if only part of a tick is stolen by the hypervisor, Linux's
> time accounting will add a whole tick to the user/system/idle time. In
> a dynticks kernel (or maybe even if the scheduling quanta have some
> kind of resonance with the guest's timer interrupt?) the sum of the
> four components user+sys+idle+steal will then be larger than the wall
> time. In fact, I found experimentally steal time to be usually 20%
> off from wall-user-sys-idle when the machine is under moderate load
> (e.g. 5 domains at 100% CPU usage, on a 4-CPU machine). Of course I used
> the correct, divided-by-2 idle time to do this computation.
>
> Paolo
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
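
For anyone who wants to check whether a given kernel still shows the
double-counted idle time, the comparison in the quoted report can be
sketched in Python. This is only an illustration and not code from either
message: the helper names (parse_cpu_line, tick_deltas, accounting_ratio,
sample) are hypothetical, and the field order assumes the usual /proc/stat
layout (user nice system idle iowait irq softirq steal). The guest fields
are left out on the assumption that guest time is already included in user.

```python
import os
import time

# Assumed column order of a per-CPU line in /proc/stat (guest fields omitted).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text, cpu="cpu0"):
    """Return the named tick counters for one CPU from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == cpu:
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError(f"{cpu} not found in stat text")


def tick_deltas(before, after):
    """Per-field difference between two samples, in USER_HZ ticks."""
    return {name: after[name] - before[name] for name in before}


def accounting_ratio(deltas, elapsed_seconds, user_hz=100):
    """Accounted ticks divided by wall-clock ticks.

    Roughly 1.0 when accounting is consistent; roughly 2.0 on an idle
    guest if idle ticks are counted twice, as the report describes.
    """
    return sum(deltas.values()) / (elapsed_seconds * user_hz)


def sample(interval=20.0, cpu="cpu0", path="/proc/stat"):
    """Take two samples `interval` seconds apart and return (deltas, ratio)."""
    user_hz = os.sysconf("SC_CLK_TCK")  # USER_HZ as seen from user space
    with open(path) as f:
        before = parse_cpu_line(f.read(), cpu)
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read(), cpu)
    elapsed = time.monotonic() - start
    deltas = tick_deltas(before, after)
    return deltas, accounting_ratio(deltas, elapsed, user_hz)
```

Calling sample() in an otherwise unloaded guest corresponds to the
"sleep 20" test above. A ratio well above 1.0 would point at the same
over-accounting, though on a multi-CPU guest only the chosen CPU is
examined.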