From: Andrew Theurer
Subject: Re: CPU Utilization
Date: Tue, 13 Dec 2005 09:35:35 -0600
Message-ID: <439EEA47.7020700@us.ibm.com>
In-Reply-To: <5440A5A36B8CED4B9F54524343CB6B68FD1535@xmb-rtp-215.amer.cisco.com>
To: "Dave Thompson (davetho)"
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Dave Thompson (davetho) wrote:

>>-----Original Message-----
>>From: Andrew Theurer [mailto:habanero@us.ibm.com]
>>Sent: Monday, December 12, 2005 9:24 PM
>>To: Dave Thompson (davetho)
>>Cc: Anthony Liguori; xen-devel@lists.xensource.com
>>Subject: Re: [Xen-devel] CPU Utilization
>>
>>
>>>But what else is running? In this case I only have dom0 configured,
>>>there is no domU. The only other possibility would be the hypervisor
>>>and I hope the hypervisor is not accounting for the other 30%.
>>>
>>>
>>If xend is started, you may have the software bridge running
>>which can use as much as 10% cpu.
>>
>
>But I would think that the bridge activity should be showing up
>in the top CPU summary as well. It is running on domain 0 after all.
>I know one person suggested that kernel activity is not represented
>in the top CPU util output. But I don't see how that can be right.
>If so, where else is that time accounted for? It seems to be all
>there (in the sy, hi, and si values).
>
>
>>Also, I don't see soft ints in that top output.
>>That could also be another ~7% cpu.
>>
>
>Soft interrupt time is accounted for in the si field (15%) of the
>summary. I believe that is where most (if not all) of the TCP
>processing is performed.
>Here is the top CPU summary display again:
>
>Cpu(s): 1.0% us, 7.3% sy, 0.0% ni, 73.3% id, 0.0% wa, 3.3% hi,
>15.0% si
>
>

Sorry, I overlooked the si.

>>Also xen is doing some work, receiving the real interrupts
>>and generating virtual interrupts to dom0, so with all this,
>>it is possible that you are using another 30% unseen
>>in top.
>>
>
>But aren't the hypervisor calls actually still being accounted for
>by the domain since clock ticks are not lost but made up for in the
>timer_interrupt() function of arch/xen/i386/kernel/time.c? The
>only issue is really when a domain is preempted by another domain
>by the xen scheduler and this is actually a problem in the other
>direction. The swapped out domain will still account for the
>time in whichever time bucket it was using when the domain was
>preempted (so the same time is accounted for by both domains).
>Basically the aggregated CPU time for all domains on a CPU could
>add greater than 100% because of this. If the domain is
>re-scheduled because of a SCHEDOP_block in the idle loop, the time
>will be properly accounted for as idle time.
>

I wonder if this works in all situations. This problem seems
familiar. Before the kernel accounted for si and hi properly, we had a
very similar situation with this type of workload: lots of cpu time
unaccounted for because the interrupt processing happened mostly when
the system was idle, and the timer tick did not account for this
properly. I wonder if we have a similar problem in xen/linux. If lost
ticks are "queued up" but all charged to just one mode, then I think
we could be way off in some situations like this.

>
>However, none of this really matters for my case since I am
>only running domain 0, there is no guest domain. I just want
>a good explanation why 'xm top' is reporting 30% more CPU utilization
>than top in this case.
>
>
>>Best way to confirm this would be to use xenoprofile.
>>
>
>Xenoprof is great for seeing which kernel functions are taking
>the majority of time but does it really help with CPU utilization?
>It counts (in the default case) unhalted clock cycles and in the
>xen idle loop the processor is halted (to save power) so the
>clock cycles are not accounted for. Is this right or am I
>missing something?
>

I guess I was hoping to find a smoking gun in xen :). The only other
thing I think we could do is count the total number of samples we got
over x seconds and compare this with the number of samples we would
get in the same time period on a 100% busy system. We should then be
able to figure out what percentage of the time the cpu was halted.

-Andrew
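
The sample-count comparison suggested above can be sketched roughly as
follows. This is only an illustration, not xenoprof output or tooling:
the function name halted_fraction and the sample counts are made up,
and it assumes both runs use the same event, the same sampling period,
and the same x-second window.

```python
def halted_fraction(observed_samples: int, busy_samples: int) -> float:
    """Estimate the fraction of time the cpu was halted.

    observed_samples: unhalted-cycle samples collected during the run
                      being measured (hypothetical number).
    busy_samples:     samples collected over the same time period on a
                      100% busy system (hypothetical number).
    """
    if busy_samples <= 0:
        raise ValueError("busy_samples must be positive")
    if observed_samples < 0:
        raise ValueError("observed_samples must not be negative")
    # Samples only arrive while the cpu is unhalted, so the ratio of
    # the two counts approximates the unhalted share of the period.
    # Clamp at 1.0 in case measurement noise pushes it slightly over.
    unhalted = min(observed_samples / busy_samples, 1.0)
    return 1.0 - unhalted


# Made-up counts, purely for illustration.
print(f"halted: {halted_fraction(42_000, 100_000):.1%}")
```

One minus that result would be the utilization to compare against what
top and 'xm top' each report.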