From: "Jan Beulich"
Subject: Re: irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
Date: Wed, 11 May 2011 09:16:53 +0100
Message-ID: <4DCA62150200007800040EEA@vpn.id2.novell.com>
References: <20110511003347.GA29851@dumpdata.com>
In-Reply-To: <20110511003347.GA29851@dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com, crrodriguez@suse.de, nhorman@tuxdriver.com, bwalle@suse.de, pbrobinson@gmail.com, notting@redhat.com, arjan@linux.intel.com, anibal@debian.org
List-Id: xen-devel@lists.xenproject.org

>>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk wrote:
> The reason behind it is that irqbalance parses the /proc/interrupts
> and whenever it hits something it can't understand:
>
>  RES: 191614137 73904910 Rescheduling interrupts
>
> It will count the number of interrupts towards the IRQ 0. That IRQ does
> exist
> when the kernel boots under baremetal:
>
>    0: 46 0 IO-APIC-edge timer
>
> but under Xen, the timer interrupts are initialized much later:
>
>  272: 41197188 0 xen-percpu-virq timer0
>
> and the first IRQ that is used is not zero, but rather one:
>
>    1: 73037 0 0 0 0 0
> xen-pirq-ioapic-edge i8042
>
> so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
> it fails and segfaults. The attached patch fixes it for whoever else is
> hitting this problem.

In the svn snapshot I have, I see

		/* lines with letters in front are special, like NMI count. Ignore */
		if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
			break;

which I would think should be taking care of your problem (or I
mis-read your description), and which was there already before
0.56. Or are you perhaps having the problem because you have
1000+ interrupts, thus causing even the non-numeric strings to
get space padded on their left? In that case I'd rather think above
check should be either improved or removed (replaced by your
solution).

> I am not sure who the upstream maintainer is for this so
> I am sending this patch to the different distros as well.

Copying Neil and Arjan.

Jan

>
> --- irqbalance-0.56.orig/procinterrupts.c	2010-06-10 10:45:55.000000000 -0400
> +++ irqbalance-0.56/procinterrupts.c	2011-05-10 20:22:06.897465003 -0400
> @@ -50,7 +50,7 @@ void parse_proc_interrupts(void)
>  		int cpunr;
>  		int number;
>  		uint64_t count;
> -		char *c, *c2;
> +		char *c, *c2, *err;
>  
>  		if (getline(&line, &size, file)==0)
>  			break;
> @@ -64,7 +64,11 @@ void parse_proc_interrupts(void)
>  			continue;
>  		*c = 0;
>  		c++;
> -		number = strtoul(line, NULL, 10);
> +		number = strtoul(line, &err, 10);
> +		/* Man page says that if that happens and number == 0, then it
> +		 * failed to parse. */
> +		if (err == line && number == 0)
> +			continue;
>  		count = 0;
>  		cpunr = 0;
>  
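
For illustration only, and not taken from the irqbalance tree: a minimal sketch of the end-pointer check the quoted patch relies on. The helper name parse_irq_label() is hypothetical. It assumes the label has already been cut off at the ':' (as the "*c = 0" in the patch context does), and it is somewhat stricter than the patch in that it also rejects trailing non-digits and out-of-range values. Because strtoul() skips leading whitespace, a space-padded label such as " RES" is rejected here even though its first character would pass the line[0]==' ' test above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption for this sketch, not irqbalance code):
 * parse the text that precedes the ':' on a /proc/interrupts line.
 * Returns 0 and stores the IRQ number in *irq when the label is numeric,
 * or -1 (leaving *irq untouched) for labels such as "NMI" or " RES".
 */
static int parse_irq_label(const char *label, unsigned long *irq)
{
	char *end;
	unsigned long value;

	assert(label != NULL && irq != NULL);

	errno = 0;
	value = strtoul(label, &end, 10);

	if (end == label)	/* no digits were consumed at all */
		return -1;
	if (*end != '\0')	/* digits followed by something else */
		return -1;
	if (errno == ERANGE)	/* value does not fit in unsigned long */
		return -1;

	*irq = value;
	return 0;
}
```

A caller in the parsing loop would skip the line whenever the helper returns -1, instead of accounting the counts to IRQ 0.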