* irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
@ 2011-05-11 0:33 Konrad Rzeszutek Wilk
From: Konrad Rzeszutek Wilk @ 2011-05-11 0:33 UTC
To: xen-devel, anibal, notting, pbrobinson, crrodriguez, bwalle
The reason is that irqbalance parses /proc/interrupts, and whenever it
hits a line it can't understand, such as:
RES: 191614137 73904910 Rescheduling interrupts
it counts those interrupts towards IRQ 0. That IRQ does exist when the
kernel boots on bare metal:
0: 46 0 IO-APIC-edge timer
but under Xen, the timer interrupts are initialized much later:
272: 41197188 0 xen-percpu-virq timer0
and the first IRQ in use is not zero but one:
1: 73037 0 0 0 0 0 xen-pirq-ioapic-edge i8042
so when irqbalance tries to account the 'RES' interrupts to IRQ 0, it
fails and segfaults. The attached patch fixes this for anyone else
hitting the problem. I am not sure who the upstream maintainer is, so
I am sending this patch to the different distros as well.
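To illustrate the failure mode described above, here is a minimal C sketch. The names struct irq_stat, find_irq() and account_irq() are hypothetical and are not irqbalance's actual data structures. Under Xen the table has no entry for IRQ 0, so the lookup returns NULL; a caller that dereferenced that result unconditionally would crash, whereas this sketch checks for NULL and reports the miss instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-IRQ record; irqbalance's real structures differ. */
struct irq_stat {
    int number;      /* IRQ number as shown in /proc/interrupts */
    uint64_t count;  /* accumulated interrupt count */
};

/* Hypothetical lookup: returns NULL when no such IRQ exists in the table. */
static struct irq_stat *find_irq(struct irq_stat *table, size_t n, int number)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].number == number)
            return &table[i];
    return NULL;
}

/* Adds 'count' to IRQ 'number'. Returns 0 on success and -1 if the IRQ is
 * unknown (e.g. IRQ 0 under Xen), instead of dereferencing a NULL result. */
static int account_irq(struct irq_stat *table, size_t n, int number,
                       uint64_t count)
{
    struct irq_stat *irq = find_irq(table, n, number);

    if (irq == NULL)
        return -1;
    irq->count += count;
    return 0;
}
```

With a Xen-like table holding only IRQs 1 and 272, accounting to IRQ 0 returns -1 rather than touching memory that does not exist.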
--- irqbalance-0.56.orig/procinterrupts.c 2010-06-10 10:45:55.000000000 -0400
+++ irqbalance-0.56/procinterrupts.c 2011-05-10 20:22:06.897465003 -0400
@@ -50,7 +50,7 @@ void parse_proc_interrupts(void)
int cpunr;
int number;
uint64_t count;
- char *c, *c2;
+ char *c, *c2, *err;

if (getline(&line, &size, file)==0)
break;
@@ -64,7 +64,11 @@ void parse_proc_interrupts(void)
continue;
*c = 0;
c++;
- number = strtoul(line, NULL, 10);
+ number = strtoul(line, &err, 10);
+ /* Per strtoul(3): if no digits were found, err is set to line and
+ * 0 is returned, so this is not an IRQ line (e.g. "RES"). */
+ if (err == line && number == 0)
+ continue;
count = 0;
cpunr = 0;
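The endptr test in the patch above can be exercised on its own. The sketch below (parse_irq_label() is a hypothetical helper, not part of irqbalance) relies on the documented strtoul() behaviour: leading white space is skipped, and if no digits follow, endptr is set to the start of the input and 0 is returned. Because a failed conversion always returns 0, checking end == label alone is equivalent to the patch's err == line && number == 0 condition.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the IRQ number in a /proc/interrupts label
 * (the text before ':'), such as "  1" or "272", or -1 when the label is
 * not numeric, such as "RES" or " TRM". */
static long parse_irq_label(const char *label)
{
    char *end;
    unsigned long number = strtoul(label, &end, 10);

    /* strtoul() skips leading white space; if no digits follow, it stores
     * the original pointer in *end and returns 0. */
    if (end == label)
        return -1;
    return (long)number;
}
```

With this check, "RES" and " TRM" map to -1 instead of being counted as IRQ 0, while "  1" and "272" parse as 1 and 272.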
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
@ 2011-05-11 8:16 Jan Beulich
From: Jan Beulich @ 2011-05-11 8:16 UTC
To: Konrad Rzeszutek Wilk
Cc: xen-devel, crrodriguez, nhorman, bwalle, pbrobinson, notting,
arjan, anibal
>>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> The reason behind it is that irqbalance parses the /proc/interrupts
> and whenever it hits something it can't understand:
>
> RES: 191614137 73904910 Rescheduling interrupts
>
> It will count the number of interrupts towards the IRQ 0. That IRQ does
> exist
> when the kernel boots under baremetal:
>
> 0: 46 0 IO-APIC-edge timer
>
> but under Xen, the timer interrupts are initialized much later:
>
> 272: 41197188 0 xen-percpu-virq timer0
>
> and the first IRQ that is used is not zero, but rather one:
>
> 1: 73037 0 0 0 0 0
> xen-pirq-ioapic-edge i8042
>
> so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
> it fails and segfaults. The attached patch fixes it for whoever else is
> hitting this problem.
In the svn snapshot I have, I see
/* lines with letters in front are special, like NMI count. Ignore */
if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
break;
which I would think should take care of your problem (unless I
misread your description), and which was already there before 0.56.
Or are you perhaps hitting the problem because you have 1000+
interrupts, causing even the non-numeric labels to get space-padded
on their left? In that case I'd rather think the above check should
be either improved or removed (replaced by your solution).
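To see why left padding would defeat the check quoted above, it can be lifted into a standalone predicate (svn_check_accepts() is a hypothetical name; the condition is copied from the quoted snippet). An unpadded "RES:" line is rejected, but a space-padded " RES:" starts with ' ' and is accepted exactly like a padded IRQ line such as "  1:".

```c
#include <stdbool.h>

/* The condition from the quoted svn snapshot: a line is treated as an IRQ
 * line if it starts with a space or a digit; anything else is "special"
 * (NMI, LOC, ...) and stops parsing. */
static bool svn_check_accepts(const char *line)
{
    return line[0] == ' ' || (line[0] >= '0' && line[0] <= '9');
}
```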
> I am not sure who the upstream maintainer is for this so
> I am sending this patch to the different distros as well.
Copying Neil and Arjan.
Jan
* Re: irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
@ 2011-05-11 13:10 Konrad Rzeszutek Wilk
From: Konrad Rzeszutek Wilk @ 2011-05-11 13:10 UTC
To: Jan Beulich
Cc: xen-devel, crrodriguez, nhorman, bwalle, pbrobinson, notting,
arjan, anibal
On Wed, May 11, 2011 at 09:16:53AM +0100, Jan Beulich wrote:
> >>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > The reason behind it is that irqbalance parses the /proc/interrupts
> > and whenever it hits something it can't understand:
> >
> > RES: 191614137 73904910 Rescheduling interrupts
> >
> > It will count the number of interrupts towards the IRQ 0. That IRQ does
> > exist
> > when the kernel boots under baremetal:
> >
> > 0: 46 0 IO-APIC-edge timer
> >
> > but under Xen, the timer interrupts are initialized much later:
> >
> > 272: 41197188 0 xen-percpu-virq timer0
> >
> > and the first IRQ that is used is not zero, but rather one:
> >
> > 1: 73037 0 0 0 0 0
> > xen-pirq-ioapic-edge i8042
> >
> > so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
> > it fails and segfaults. The attached patch fixes it for whoever else is
> > hitting this problem.
>
> In the svn snapshot I have, I see
>
> /* lines with letters in front are special, like NMI count. Ignore */
> if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
> break;
>
> which I would think should be taking care of your problem (or
> I mis-read your description), and which was there already before
Not anymore. In 2.6.37 kernels:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
.. snip.
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 12413629 12858323 16296183 11098466 Local timer interrupts
In 2.6.38 and later:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
TRM: 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 Machine check exceptions
A space was added before the name. The check you mentioned above
could of course be augmented to handle this, as an alternative
solution.
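One way to augment that check is sketched below, under the assumption that IRQ lines are exactly those whose first non-blank character is a digit (is_irq_line() is a hypothetical name, not irqbalance code). It rejects " TRM:" and "MCE:" whether or not the kernel pads the label, and accepts "  1:" and "272:".

```c
#include <ctype.h>
#include <stdbool.h>

/* Skips leading blanks, then requires a digit, so the result does not
 * depend on how far the kernel pads the first column. */
static bool is_irq_line(const char *line)
{
    while (*line == ' ')
        line++;
    return isdigit((unsigned char)*line) != 0;
}
```

A predicate like this could stand in for the quoted condition, provided the CPU header line is consumed beforehand; that is an assumption, since the quoted code does not show how the header is handled.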
> 0.56. Or are you perhaps having the problem because you have
> 1000+ interrupts, thus causing even the non-numeric strings to
> get space padded on their left? In that case I'd rather think above
> check should be either improved or removed (replaced by your
> solution).
* Re: irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
@ 2011-05-11 13:41 Jan Beulich
From: Jan Beulich @ 2011-05-11 13:41 UTC
To: Konrad Rzeszutek Wilk
Cc: xen-devel, crrodriguez, nhorman, bwalle, pbrobinson, notting,
arjan, anibal
>>> On 11.05.11 at 15:10, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, May 11, 2011 at 09:16:53AM +0100, Jan Beulich wrote:
>> >>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > The reason behind it is that irqbalance parses the /proc/interrupts
>> > and whenever it hits something it can't understand:
>> >
>> > RES: 191614137 73904910 Rescheduling interrupts
>> >
>> > It will count the number of interrupts towards the IRQ 0. That IRQ does
>> > exist
>> > when the kernel boots under baremetal:
>> >
>> > 0: 46 0 IO-APIC-edge timer
>> >
>> > but under Xen, the timer interrupts are initialized much later:
>> >
>> > 272: 41197188 0 xen-percpu-virq timer0
>> >
>> > and the first IRQ that is used is not zero, but rather one:
>> >
>> > 1: 73037 0 0 0 0 0
>> > xen-pirq-ioapic-edge i8042
>> >
>> > so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
>> > it fails and segfaults. The attached patch fixes it for whoever else is
>> > hitting this problem.
>>
>> In the svn snapshot I have, I see
>>
>> /* lines with letters in front are special, like NMI count. Ignore */
>> if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
>> break;
>>
>> which I would think should be taking care of your problem (or
>> I mis-read your description), and which was there already before
>
> Not anymore. In kernels 2.6.37:
>
> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
>
> .. snip.
> NMI: 0 0 0 0 Non-maskable interrupts
> LOC: 12413629 12858323 16296183 11098466 Local timer interrupts
>
> In 2.6.38 and later:
> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
>
> TRM: 0 0 0 0 0 0
> Thermal event interrupts
> THR: 0 0 0 0 0 0
> Threshold APIC interrupts
> MCE: 0 0 0 0 0 0
> Machine check exceptions
>
> They added in a space before the name. The check you mentioned
> above could be augmented for this of course, as another solution
> for this.
Not in general; this depends on your configuration. I just checked on
my laptop, and there is no extra space there. It is presumably
indeed what I wrote here:
>> 0.56. Or are you perhaps having the problem because you have
>> 1000+ interrupts, thus causing even the non-numeric strings to
>> get space padded on their left? In that case I'd rather think above
>> check should be either improved or removed (replaced by your
>> solution).
... and this left padding was introduced a lot earlier than .37,
IIRC.
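Jan's padding hypothesis can be illustrated with a small sketch. It assumes, as we understand show_interrupts() in kernel/irq/proc.c of that era, that the first column is as wide as the digit count of nr_irqs (minimum 3) and that architecture lines such as RES are printed right-aligned with a "%*s: " format. label_width() and format_label() are hypothetical helpers; the real kernel code may differ.

```c
#include <stdio.h>

/* Hypothetical helper: width of the label column, assumed to be the digit
 * count of nr_irqs with a minimum of 3. long long avoids overflowing j. */
static int label_width(int nr_irqs)
{
    int prec = 3;
    long long j;

    for (j = 1000; prec < 10 && j <= nr_irqs; j *= 10)
        prec++;
    return prec;
}

/* Hypothetical helper: right-aligns an architecture label such as "RES"
 * to that width, the way a "%*s:" format would. */
static void format_label(char *buf, size_t size, const char *name, int nr_irqs)
{
    snprintf(buf, size, "%*s:", label_width(nr_irqs), name);
}
```

Under these assumptions, nr_irqs = 256 yields "RES:", but once nr_irqs reaches 1000 the label becomes " RES:", which would produce the leading space Konrad observed and explains why it depends on the configuration.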
Jan