xen-devel.lists.xenproject.org archive mirror
* irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
@ 2011-05-11  0:33 Konrad Rzeszutek Wilk
  2011-05-11  8:16 ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-11  0:33 UTC (permalink / raw)
  To: xen-devel, anibal, notting, pbrobinson, crrodriguez, bwalle

[-- Attachment #1: Type: text/plain, Size: 1596 bytes --]

The reason is that irqbalance parses /proc/interrupts, and whenever it
hits a line it can't understand, such as:

 RES:  191614137   73904910    Rescheduling interrupts

it counts those interrupts towards IRQ 0. That IRQ does exist when the
kernel boots on bare metal:

  0:         46          0       IO-APIC-edge      timer

but under Xen, the timer interrupts are initialized much later:

 272:   41197188          0        xen-percpu-virq      timer0

and the first IRQ in use is not zero but one:

   1:      73037          0          0          0          0          0  xen-pirq-ioapic-edge  i8042

So when irqbalance tries to account the 'RES' line to IRQ 0, the lookup
fails and irqbalance segfaults. The attached patch fixes this for anyone
else hitting the problem. I am not sure who the upstream maintainer is,
so I am also sending this patch to the various distros.


[-- Attachment #2: irqbalance.patch --]
[-- Type: text/x-diff, Size: 673 bytes --]

--- irqbalance-0.56.orig/procinterrupts.c	2010-06-10 10:45:55.000000000 -0400
+++ irqbalance-0.56/procinterrupts.c	2011-05-10 20:22:06.897465003 -0400
@@ -50,7 +50,7 @@ void parse_proc_interrupts(void)
 		int cpunr;
 		int	 number;
 		uint64_t count;
-		char *c, *c2;
+		char *c, *c2, *err;
 
 		if (getline(&line, &size, file)==0)
 			break;
@@ -64,7 +64,11 @@ void parse_proc_interrupts(void)
 			continue;
 		*c = 0;
 		c++;
-		number = strtoul(line, NULL, 10);
+		number = strtoul(line, &err, 10);
+		/* strtoul(3): if no digits are found (e.g. " RES"), err is set to
+		 * line and 0 is returned, so this is not an IRQ line; skip it. */
+		if (err == line && number == 0)
+			continue;
 		count = 0;
 		cpunr = 0;
 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
  2011-05-11  0:33 irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor Konrad Rzeszutek Wilk
@ 2011-05-11  8:16 ` Jan Beulich
  2011-05-11 13:10   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2011-05-11  8:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, crrodriguez, nhorman, bwalle, pbrobinson, notting,
	arjan, anibal

>>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> The reason behind it is that irqbalance parses the /proc/interrupts
> and whenever it hits something it can't understand:
> 
>  RES:  191614137   73904910    Rescheduling interrupts
> 
> It will count the number of interrupts towards the IRQ 0. That IRQ does
> exist when the kernel boots under baremetal:
> 
>   0:         46          0       IO-APIC-edge      timer
> 
> but under Xen, the timer interrupts are initialized much later:
> 
>  272:   41197188          0        xen-percpu-virq      timer0
> 
> and the first IRQ that is used is not zero, but rather one:
> 
>    1:      73037          0          0          0          0          0  xen-pirq-ioapic-edge  i8042
> 
> so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
> it fails and segfaults. The attached patch fixes it for whoever else is
> hitting this problem.

In the svn snapshot I have, I see

		/* lines with letters in front are special, like NMI count. Ignore */
		if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
			break;

which I would think should take care of your problem (unless I
misread your description), and which was already there before
0.56. Or are you perhaps hitting the problem because you have
1000+ interrupts, so that even the non-numeric labels get
space-padded on the left? In that case I'd rather think the above
check should be either improved or removed (and replaced by your
solution).

> I am not sure who the upstream maintainer is for this so
> I am sending this patch to the different distros as well.

Copying Neil and Arjan.

Jan



* Re: irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
  2011-05-11  8:16 ` Jan Beulich
@ 2011-05-11 13:10   ` Konrad Rzeszutek Wilk
  2011-05-11 13:41     ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-11 13:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, crrodriguez, nhorman, bwalle, pbrobinson, notting,
	arjan, anibal

On Wed, May 11, 2011 at 09:16:53AM +0100, Jan Beulich wrote:
> >>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > The reason behind it is that irqbalance parses the /proc/interrupts
> > and whenever it hits something it can't understand:
> > 
> >  RES:  191614137   73904910    Rescheduling interrupts
> > 
> > It will count the number of interrupts towards the IRQ 0. That IRQ does
> > exist when the kernel boots under baremetal:
> > 
> >   0:         46          0       IO-APIC-edge      timer
> > 
> > but under Xen, the timer interrupts are initialized much later:
> > 
> >  272:   41197188          0        xen-percpu-virq      timer0
> > 
> > and the first IRQ that is used is not zero, but rather one:
> > 
> >    1:      73037          0          0          0          0          0  xen-pirq-ioapic-edge  i8042
> > 
> > so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
> > it fails and segfaults. The attached patch fixes it for whoever else is
> > hitting this problem.
> 
> In the svn snapshot I have, I see
> 
> 		/* lines with letters in front are special, like NMI count. Ignore */
> 		if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
> 			break;
> 
> which I would think should be taking care of your problem (or
> I mis-read your description), and which was there already before

Not anymore. In 2.6.37 kernels:

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       
.. snip.
NMI:          0          0          0          0   Non-maskable interrupts
LOC:   12413629   12858323   16296183   11098466   Local timer interrupts

In 2.6.38 and later:
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       
 TRM:          0          0          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0          0          0   Machine check exceptions

They added a space before the name. The check you mentioned above
could of course be augmented to handle this, as an alternative
solution.

> 0.56. Or are you perhaps having the problem because you have
> 1000+ interrupts, thus causing even the non-numeric strings to
> get space padded on their left? In that case I'd rather think above
> check should be either improved or removed (replaced by your
> solution).


* Re: irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor
  2011-05-11 13:10   ` Konrad Rzeszutek Wilk
@ 2011-05-11 13:41     ` Jan Beulich
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2011-05-11 13:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, crrodriguez, nhorman, bwalle, pbrobinson, notting,
	arjan, anibal

>>> On 11.05.11 at 15:10, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, May 11, 2011 at 09:16:53AM +0100, Jan Beulich wrote:
>> >>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > The reason behind it is that irqbalance parses the /proc/interrupts
>> > and whenever it hits something it can't understand:
>> > 
>> >  RES:  191614137   73904910    Rescheduling interrupts
>> > 
>> > It will count the number of interrupts towards the IRQ 0. That IRQ does
>> > exist when the kernel boots under baremetal:
>> > 
>> >   0:         46          0       IO-APIC-edge      timer
>> > 
>> > but under Xen, the timer interrupts are initialized much later:
>> > 
>> >  272:   41197188          0        xen-percpu-virq      timer0
>> > 
>> > and the first IRQ that is used is not zero, but rather one:
>> > 
>> >    1:      73037          0          0          0          0          0  xen-pirq-ioapic-edge  i8042
>> > 
>> > so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
>> > it fails and segfaults. The attached patch fixes it for whoever else is
>> > hitting this problem.
>> 
>> In the svn snapshot I have, I see
>> 
>> 		/* lines with letters in front are special, like NMI count. Ignore */
>> 		if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
>> 			break;
>> 
>> which I would think should be taking care of your problem (or
>> I mis-read your description), and which was there already before
> 
> Not anymore. In kernels 2.6.37:
> 
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
> .. snip.
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:   12413629   12858323   16296183   11098466   Local timer interrupts
> 
> In 2.6.38 and later:
>             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
>  TRM:          0          0          0          0          0          0   Thermal event interrupts
>  THR:          0          0          0          0          0          0   Threshold APIC interrupts
>  MCE:          0          0          0          0          0          0   Machine check exceptions
> 
> They added in a space before the name. The check you mentioned
> above could be augmented for this of course, as another solution
> for this.

Not in general; this depends on your configuration. I just checked on
my laptop, and there is no extra space there. It is presumably
what I wrote here:

>> 0.56. Or are you perhaps having the problem because you have
>> 1000+ interrupts, thus causing even the non-numeric strings to
>> get space padded on their left? In that case I'd rather think above
>> check should be either improved or removed (replaced by your
>> solution).

... and this left padding was introduced well before 2.6.37, IIRC.

Jan



Thread overview: 4+ messages
2011-05-11  0:33 irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor Konrad Rzeszutek Wilk
2011-05-11  8:16 ` Jan Beulich
2011-05-11 13:10   ` Konrad Rzeszutek Wilk
2011-05-11 13:41     ` Jan Beulich
