From: Paolo Bonzini
Subject: time accounting problem in pvops kernel
Date: Tue, 17 Aug 2010 19:29:41 +0200
Message-ID: <4C6AC705.1080904@redhat.com>
To: "xen-devel@lists.xensource.com"
Cc: Glauber Costa

Hi,

while experimenting a bit with time.c we found a bug in time accounting. Basically, /proc/stat counts idle time twice for PV guests running a pvops kernel. To reproduce, run this command in an unloaded guest:

    grep cpu0 /proc/stat; sleep 20 ; grep cpu0 /proc/stat

and watch the fourth number in /proc/stat (idle) increase by approximately 4000 for a kernel with USER_HZ == 100. That is twice the 2000 ticks that 20 seconds of wall time should produce.

If instead you try these commands (you need an otherwise unloaded machine for these):

    grep cpu0 /proc/stat; timeout 20s yes > /dev/null ; grep cpu0 /proc/stat
    grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null ; grep cpu0 /proc/stat

the first (user) and third (system) numbers in /proc/stat increase by only 2000.

The reason for this seems to be that xen_timer_interrupt calls Linux's normal timer accounting (via evt->event_handler), and this calls account_idle_time. However, idle ticks are also added by do_stolen_accounting, so overall they are counted twice.

Related to this, it looks like stolen tick accounting is subtly wrong. Even if only part of a tick is stolen by the hypervisor, Linux's time accounting will add a whole tick to the user/system/idle time. In a dynticks kernel (or maybe even if the scheduling quanta have some kind of resonance with the guest's timer interrupt?), the sum of the four components user+sys+idle+steal will then be larger than the wall time.
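To make the two effects concrete, here is a rough Python model of the per-tick accounting described above. It is only a sketch, not the kernel code: the names simulate, CpuTimes, cpu_delta and TICK_NS are made up for illustration, it assumes HZ == USER_HZ == 100, and it treats an idle vCPU as blocked for every whole tick period. cpu_delta simply diffs two "cpu0" lines from /proc/stat, as in the reproduction commands.

```python
from dataclasses import dataclass

# Hypothetical model, NOT kernel code. Assumes HZ == USER_HZ == 100 (10 ms ticks).
TICK_NS = 10_000_000


@dataclass
class CpuTimes:
    """Subset of the per-CPU counters shown in /proc/stat, in ticks."""
    user: int = 0
    system: int = 0
    idle: int = 0
    steal: int = 0

    def total(self) -> int:
        return self.user + self.system + self.idle + self.steal


def simulate(ticks: int, state: str, stolen_ns_per_tick: int = 0,
             double_count_idle: bool = True) -> CpuTimes:
    """Model `ticks` timer interrupts on one vCPU (hypothetical helper).

    state: "user", "system" or "idle" for the whole run.
    stolen_ns_per_tick: time stolen by the hypervisor per tick period, in ns.
    double_count_idle: also charge blocked time as idle from the
        steal-accounting path, as described for do_stolen_accounting.
    """
    if state not in ("user", "system", "idle"):
        raise ValueError(f"unknown state: {state}")
    t = CpuTimes()
    stolen_residue_ns = 0
    for _ in range(ticks):
        # Steal path: report only whole stolen ticks, carry the remainder.
        stolen_residue_ns += stolen_ns_per_tick
        t.steal += stolen_residue_ns // TICK_NS
        stolen_residue_ns %= TICK_NS
        # Generic tick path (evt->event_handler): always charges a whole
        # tick, no matter how much of the period was stolen.
        if state == "idle":
            t.idle += 1
            if double_count_idle:
                # Simplification: the idle vCPU was blocked for the whole
                # period, and that blocked tick is charged to idle again.
                t.idle += 1
        elif state == "user":
            t.user += 1
        else:
            t.system += 1
    return t


def cpu_delta(before: str, after: str) -> list[int]:
    """Field-wise difference of two 'cpuN ...' lines read from /proc/stat."""
    old = [int(x) for x in before.split()[1:]]
    new = [int(x) for x in after.split()[1:]]
    return [n - o for o, n in zip(old, new)]


if __name__ == "__main__":
    wall_ticks = 20 * 100  # `sleep 20` at USER_HZ == 100
    print(simulate(wall_ticks, "idle").idle)  # 4000 instead of 2000
    busy = simulate(wall_ticks, "user", stolen_ns_per_tick=2_500_000,
                    double_count_idle=False)
    print(busy.user, busy.steal, busy.total())  # 2000 500 2500
```

In this model the steal path carries the sub-tick remainder forward while the generic path always charges a full tick, which is what lets user+sys+idle+steal exceed the 2000 wall-clock ticks when a quarter of every tick is stolen.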
In fact, I found experimentally that steal time is usually 20% off from wall-user-sys-idle when the machine is under moderate load (e.g. 5 domains at 100% CPU usage on a 4-CPU machine). Of course, I used the correct, divided-by-2 idle time for this computation.

Paolo