From: Sebastian Hetze
Subject: Re: Strange CPU usage pattern in SMP guest
Date: Mon, 22 Mar 2010 13:51:20 +0100
Message-ID: <20100322125120.DE032A0015@mail.linux-ag.de>
References: <20100321001304.B8EAF30301DA@mail.linux-ag.de> <4BA5F03C.1020900@redhat.com> <20100321120236.55228A0015@mail.linux-ag.de> <4BA60EDC.6080202@redhat.com> <20100321145548.CC027A0015@mail.linux-ag.de> <4BA63892.6090006@redhat.com>
In-Reply-To: <4BA63892.6090006@redhat.com>
To: Avi Kivity
Cc: Sebastian Hetze, kvm@vger.kernel.org, Marcelo Tosatti

On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
> On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
>> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
>>
>>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
>>>
>>>> 12:46:02  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
>>>> 12:46:03  all    0,20   11,35   10,96    8,96    0,40    2,99    0,00    0,00   65,14
>>>> 12:46:03    0    1,00   11,00    7,00   15,00    0,00    1,00    0,00    0,00   65,00
>>>> 12:46:03    1    0,00    7,14    2,04    6,12    1,02   11,22    0,00    0,00   72,45
>>>> 12:46:03    2    0,00   15,00    1,00   12,00    0,00    1,00    0,00    0,00   71,00
>>>> 12:46:03    3    0,00   11,00   23,00    8,00    0,00    0,00    0,00    0,00   58,00
>>>> 12:46:03    4    0,00    0,00   50,00    0,00    0,00    0,00    0,00    0,00   50,00
>>>> 12:46:03    5    0,00   13,00   20,00    4,00    0,00    1,00    0,00    0,00   62,00
>>>>
>>>> So it is only CPU4 that is showing this strange behaviour.
>>>>
>>>>
>>> Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
>>> /proc/interrupts' from that guest.
>>>
>>> Most likely the timer interrupt for cpu4 died.
>>>
>> I've added two keys +/- to your irqtop to focus up and down
>> in the row of available CPUs.
>> The irqtop for CPU4 shows a constant number of 6 local timer interrupts
>> per update, while the other CPUs show various higher values:
>>
>> irqtop for cpu 4
>>
>>  eth0                        188
>>  Rescheduling interrupts     162
>>  Local timer interrupts        6
>>  ata_piix                      3
>>  TLB shootdowns                1
>>  Spurious interrupts           0
>>  Machine check exceptions      0
>>
>>
>> irqtop for cpu 5
>>
>>  eth0                        257
>>  Local timer interrupts      251
>>  Rescheduling interrupts     237
>>  Spurious interrupts           0
>>  Machine check exceptions      0
>>
>> So the timer interrupt for cpu4 is not completely dead but somehow
>> broken.
>
> That is incredibly weird.
>
>> What can cause this problem? Any way to speed it up again?
>>
>
> The host has 8 cpus and is only running this 6 vcpu guest, yes?
>
> Can you confirm the other vcpus are ticking at 250 Hz?
>
> What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a
> last-used-cpu field in the display.
>
> Marcelo, any ideas?

Just to let you know, right after startup, all vcpus work fine.

The following message might be related to the problem:

hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns

The guest is a 32-bit system running on a 64-bit host.
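
A quick sanity check on whether the hrtimer message and the slow tick on cpu4 could be connected: if consecutive timer events must be at least 165954639 ns apart, the timer cannot fire more than about six times per second. The sketch below does that arithmetic and also derives per-CPU local timer rates from two snapshots of /proc/interrupts. It rests on assumptions not stated in the thread: that irqtop refreshes roughly once per second (which the 251 on cpu5 and the 250 Hz figure suggest), and that the guest exposes the usual x86 "LOC:" row. The helper names are made up for illustration, and a matching rate is only a hint, not proof that the min delta is the cause.

```python
# Sketch under stated assumptions; helper names are illustrative only.
NSEC_PER_SEC = 1_000_000_000


def max_timer_rate_hz(min_delta_ns: int) -> float:
    """Upper bound on timer events per second when consecutive events
    must be at least min_delta_ns nanoseconds apart."""
    return NSEC_PER_SEC / min_delta_ns


def loc_counts(proc_interrupts_text: str) -> list[int]:
    """Extract the per-CPU 'LOC:' (local timer interrupt) counters from
    the text of /proc/interrupts. Assumes the x86 layout, where the row
    is 'LOC:' followed by one counter per CPU and a description."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "LOC:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description text
                counts.append(int(field))
            return counts
    raise ValueError("no 'LOC:' row found")


def per_cpu_timer_rate(before: str, after: str, interval_s: float) -> list[float]:
    """Local timer interrupts per second for each CPU, computed from two
    /proc/interrupts snapshots taken interval_s seconds apart."""
    first, second = loc_counts(before), loc_counts(after)
    return [(b - a) / interval_s for a, b in zip(first, second)]


if __name__ == "__main__":
    # Value taken from the hrtimer message quoted above.
    print(f"{max_timer_rate_hz(165954639):.2f} events/s at most")
```

With the value from the log this gives a ceiling of roughly 6 events per second, which lines up with the 6 local timer interrupts per update seen on cpu4 if the one-second refresh assumption holds. To apply per_cpu_timer_rate to the guest, read /proc/interrupts twice with a known delay in between and pass both texts in.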