From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
Date: Wed, 11 May 2011 14:41:32 +0100
Message-ID: <4DCAAE2C020000780004101B@vpn.id2.novell.com>
References: <20110511003347.GA29851@dumpdata.com> <4DCA62150200007800040EEA@vpn.id2.novell.com> <20110511131058.GA4130@dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <20110511131058.GA4130@dumpdata.com>
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com, crrodriguez@suse.de, nhorman@tuxdriver.com, bwalle@suse.de, pbrobinson@gmail.com, notting@redhat.com, arjan@linux.intel.com, anibal@debian.org
List-Id: xen-devel@lists.xenproject.org

>>> On 11.05.11 at 15:10, Konrad Rzeszutek Wilk wrote:
> On Wed, May 11, 2011 at 09:16:53AM +0100, Jan Beulich wrote:
>> >>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk wrote:
>> > The reason behind it is that irqbalance parses the /proc/interrupts
>> > and whenever it hits something it can't understand:
>> >
>> > RES: 191614137 73904910 Rescheduling interrupts
>> >
>> > It will count the number of interrupts towards the IRQ 0. That IRQ does
>> > exist when the kernel boots under baremetal:
>> >
>> > 0: 46 0 IO-APIC-edge timer
>> >
>> > but under Xen, the timer interrupts are initialized much later:
>> >
>> > 272: 41197188 0 xen-percpu-virq timer0
>> >
>> > and the first IRQ that is used is not zero, but rather one:
>> >
>> > 1: 73037 0 0 0 0 0 xen-pirq-ioapic-edge i8042
>> >
>> > so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
>> > it fails and segfaults.
>> > The attached patch fixes it for whoever else is
>> > hitting this problem.
>>
>> In the svn snapshot I have, I see
>>
>> 	/* lines with letters in front are special, like NMI count. Ignore */
>> 	if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
>> 		break;
>>
>> which I would think should be taking care of your problem (or
>> I mis-read your description), and which was there already before
>
> Not anymore. In kernels 2.6.37:
>
>       CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
>
> .. snip.
> NMI: 0 0 0 0 Non-maskable interrupts
> LOC: 12413629 12858323 16296183 11098466 Local timer interrupts
>
> In 2.6.38 and later:
>       CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
>
>  TRM: 0 0 0 0 0 0 Thermal event interrupts
>  THR: 0 0 0 0 0 0 Threshold APIC interrupts
>  MCE: 0 0 0 0 0 0 Machine check exceptions
>
> They added in a space before the name. The check you mentioned
> above could be augmented for this of course, as another solution
> for this.

Not generally - this depends on your configuration. I just checked on
my laptop, and there is no extra space there. It's presumably indeed
what I wrote here:

>> 0.56. Or are you perhaps having the problem because you have
>> 1000+ interrupts, thus causing even the non-numeric strings to
>> get space padded on their left? In that case I'd rather think above
>> check should be either improved or removed (replaced by your
>> solution).

... and this left padding had been introduced a lot earlier than
.37 iirc.

Jan

>> > I am not sure who the upstream maintainer is for this so
>> > I am sending this patch to the different distros as well.
>>
>> Copying Neil and Arjan.
>>
>> Jan
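
For illustration only, here is a minimal sketch of how the check quoted above might be "improved" so that it no longer depends on whether the label is left padded. It is not code from irqbalance and not the attached patch; the helper name is_numbered_irq_line is hypothetical. The idea is to skip any leading blanks first and only then look at whether the first real character is a digit:

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper (an assumption, not part of irqbalance).
 *
 * Returns true if a /proc/interrupts line describes a numbered IRQ,
 * i.e. its first non-blank character is a digit. Lines such as
 * "NMI:", " RES:" or the "CPU0 CPU1 ..." header return false no
 * matter how much padding precedes them.
 */
bool is_numbered_irq_line(const char *line)
{
	/* Skip the left padding used to right-align the first column. */
	while (*line == ' ' || *line == '\t')
		line++;

	/* The cast avoids undefined behaviour for negative char values. */
	return isdigit((unsigned char)*line) != 0;
}
```

A caller could use this in place of the line[0] test and ignore any line for which it returns false, so that counts from RES:, TRM: and similar lines are never attributed to IRQ 0.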