From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sebastian Hetze
Subject: Re: Strange CPU usage pattern in SMP guest
Date: Sun, 21 Mar 2010 15:55:48 +0100
Message-ID: <20100321145548.EBC95A0017@mail.linux-ag.de>
References: <20100321001304.B8EAF30301DA@mail.linux-ag.de> <4BA5F03C.1020900@redhat.com> <20100321120236.55228A0015@mail.linux-ag.de> <4BA60EDC.6080202@redhat.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="zhXaljGHf11kAtnf"
Cc: Sebastian Hetze , kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from ironport.linux-ag.com ([62.245.157.240]:36532 "EHLO ironport.linux-ag.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752130Ab0CUOzv (ORCPT ); Sun, 21 Mar 2010 10:55:51 -0400
Received: from localhost (mail.linux-ag.de [62.245.157.206]) by mail.linux-ag.de (Postfix) with ESMTP id EBC95A0017 for ; Sun, 21 Mar 2010 15:55:48 +0100 (CET)
Content-Disposition: inline
In-Reply-To: <4BA60EDC.6080202@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

--zhXaljGHf11kAtnf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
>>
>> 12:46:02  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
>> 12:46:03  all   0,20  11,35  10,96    8,96   0,40   2,99   0,00   0,00  65,14
>> 12:46:03    0   1,00  11,00   7,00   15,00   0,00   1,00   0,00   0,00  65,00
>> 12:46:03    1   0,00   7,14   2,04    6,12   1,02  11,22   0,00   0,00  72,45
>> 12:46:03    2   0,00  15,00   1,00   12,00   0,00   1,00   0,00   0,00  71,00
>> 12:46:03    3   0,00  11,00  23,00    8,00   0,00   0,00   0,00   0,00  58,00
>> 12:46:03    4   0,00   0,00  50,00    0,00   0,00   0,00   0,00   0,00  50,00
>> 12:46:03    5   0,00  13,00  20,00    4,00   0,00   1,00   0,00   0,00  62,00
>>
>> So it is only CPU4 that is showing this strange behaviour.
>>
>
> Can you adjust irqtop to only count cpu4? or even just post a few 'cat
> /proc/interrupts' from that guest.
>
> Most likely the timer interrupt for cpu4 died.

I've added two keys +/- to your irqtop to focus up and down
in the row of available CPUs.
The irqtop for CPU4 shows a constant number of 6 local timer interrupts
per update, while the other CPUs show various higher values:

irqtop for cpu 4

 eth0                              188
 Rescheduling interrupts           162
 Local timer interrupts              6
 ata_piix                            3
 TLB shootdowns                      1
 Spurious interrupts                 0
 Machine check exceptions            0


irqtop for cpu 5

 eth0                              257
 Local timer interrupts            251
 Rescheduling interrupts           237
 Spurious interrupts                 0
 Machine check exceptions            0

So the timer interrupt for cpu4 is not completely dead but somehow
broken. What can cause this problem? Any way to speed it up again?


--zhXaljGHf11kAtnf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=irqtop

#!/usr/bin/python

import curses
import sys, os, time, optparse

def read_interrupts():
    global target
    irq = {}
    proc = file('/proc/interrupts')
    nrcpu = len(proc.readline().split())
    if target < 0:
        target = 0;
    if target > nrcpu:
        target = nrcpu
    for line in proc.readlines():
        vec, data = line.strip().split(':', 1)
        if vec in ('ERR', 'MIS'):
            continue
        counts = data.split(None, nrcpu)
        counts, rest = (counts[:-1], counts[-1])
        if target == 0:
            count = sum([int(x) for x in counts])
        else:
            count = int(counts[target-1])
        try:
            v = int(vec)
            name = rest.split(None, 1)[1]
        except:
            name = rest
        irq[name] = count
    return irq

def delta_interrupts():
    old = read_interrupts()
    while True:
        irq = read_interrupts()
        delta = {}
        for key in irq.keys():
            delta[key] = irq[key] - old[key]
        yield delta
        old = irq

target = 0
label_width = 35
number_width = 10

def tui(screen):
    curses.use_default_colors()
    global target
    curses.noecho()
    def getcount(x):
        return x[1]
    def refresh(irq):
        screen.erase()
        if target > 0:
            title = "irqtop for cpu %d"%(target-1)
        else:
            title = "irqtop sum for all cpu's"
        screen.addstr(0, 0, title)
        row = 2
        for name, count in sorted(irq.items(), key = getcount, reverse = True):
            if row >= screen.getmaxyx()[0]:
                break
            col = 1
            screen.addstr(row, col, name)
            col += label_width
            screen.addstr(row, col, '%10d' % (count,))
            row += 1
        screen.refresh()

    for irqs in delta_interrupts():
        refresh(irqs)
        curses.halfdelay(10)
        try:
            c = screen.getkey()
            if c == 'q':
                break
            if c == '+':
                target = target+1
            if c == '-':
                target = target-1
        except KeyboardInterrupt:
            break
        except curses.error:
            continue

import curses.wrapper
curses.wrapper(tui)

--zhXaljGHf11kAtnf--
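
For comparing the per-CPU local timer rate without the curses front end, something like the following might do. It is only a sketch under a few assumptions: Python 3 (the attached irqtop is Python 2), the usual /proc/interrupts layout with a LOC row for the local timer, and a made-up threshold of one tenth of the median rate. The names parse_interrupts, per_cpu_delta, lagging_cpus and sample_twice are hypothetical helpers for illustration and are not part of irqtop.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: [count per CPU]}."""
    lines = text.strip().splitlines()
    nrcpu = len(lines[0].split())      # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, data = line.split(':', 1)
        label = label.strip()
        # ERR and MIS carry one global counter, not one per CPU.
        if label in ('ERR', 'MIS'):
            continue
        counts = data.split()[:nrcpu]
        # Skip any other row that does not have a full set of counters.
        if len(counts) < nrcpu or not all(c.isdigit() for c in counts):
            continue
        table[label] = [int(c) for c in counts]
    return table


def per_cpu_delta(before, after, label='LOC'):
    """Per-CPU increase of one interrupt row between two samples."""
    return [new - old for old, new in zip(before[label], after[label])]


def lagging_cpus(deltas, ratio=0.1):
    """CPU indices whose delta is below ratio times the median delta.

    The ratio of 0.1 is an arbitrary assumption, not a kernel constant.
    """
    ordered = sorted(deltas)
    median = ordered[len(ordered) // 2]
    return [cpu for cpu, d in enumerate(deltas) if d < ratio * median]


def sample_twice(path='/proc/interrupts', interval=1.0):
    """Read the file twice, interval seconds apart, and return LOC deltas."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return per_cpu_delta(before, after)
```

Running lagging_cpus(sample_twice()) inside the guest would list the CPUs whose local timer count barely moved during the interval. With the numbers above (6 per update on cpu4 against 251 on cpu5), cpu4 would be expected to show up.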