i.MX31 kernel panic and irq

linux-arm-kernel.lists.infradead.org archive mirror

* i.MX31 kernel panic and irq
@ 2009-10-06 15:08 Wolf, Rene, HRO-GP
  2009-10-06 15:43 ` Bill Gatliff
  0 siblings, 1 reply; 8+ messages in thread
From: Wolf, Rene, HRO-GP @ 2009-10-06 15:08 UTC (permalink / raw)
  To: linux-arm-kernel

Hi @ all :-)

This is about a kernel panic I'm experiencing / causing.
Setup: The system is a DENX QONG EVB-Light. It consists of an i.MX31
(ARM11) plus some flash and an FPGA doing Ethernet. I use a rootfs over NFS
and the kernel is loaded via TFTP. Version 2.6.31 (pulled from
DENX, which should be equal to the one from kernel.org).
Inside my kernel module I do this:
	
unsigned char my_dev; /* dev_id cookie needed for request_irq() */
static int __init mod_init(void){
	int gpio_1_irq=-1;
	/* route the pad to its GPIO function */
	if ( mxc_iomux_alloc_pin( IOMUX_MODE(MX31_PIN_CSI_D4,IOMUX_CONFIG_GPIO),
			"GPIO" ) !=0 ){
		printk( " [MX31_PIN_CSI_D4=>GPIO:FAILED]\n");
		MY_EXIT(-EIO);
	}
	/* work around -> */
	gpio_request(IOMUX_TO_GPIO(MX31_PIN_CSI_D4), "GPIO" );
	/* <- work around */

	if ( gpio_direction_input(IOMUX_TO_GPIO(MX31_PIN_CSI_D4)) !=0 ){
		printk(" [MX31_PIN_CSI_D4=>IN:FAILED]\n");
		MY_EXIT(-EIO);
	}

	gpio_1_irq = gpio_to_irq(IOMUX_TO_GPIO(MX31_PIN_CSI_D4));
	printk( "Requesting GPIO-IRQ %d ... ", gpio_1_irq );
	if ( request_irq(gpio_1_irq,my_isr,IRQF_DISABLED,MY_DRV_NAME,&my_dev) ){
		printk( "failed!\n" );
		MY_EXIT(-EIO);
	}
	return 0; /* init succeeded */
}

The 'my_isr' handler looks like this:

static irqreturn_t my_isr( int irq, void * dev_id){
	return IRQ_HANDLED;
}

Next I tested the irq by running `watch cat /proc/interrupts` and shorting the
pin to GND with a wire. -> works fine :-)

But I get a kernel panic when touching the pin with bare hands
(immediately after touching it). This happens every time I touch that pin.
If I unload the module (which frees the irq in the deinit routine)
I can touch that pin freely without crashing the kernel:


------------[ cut here ]------------
WARNING: at arch/arm/kernel/process.c:171 cpu_idle+0x74/0x88()
Modules linked in: test_drv
[<c002c904>] (unwind_backtrace+0x0/0xe8) from [<c00417d4>] (warn_slowpath_fmt+0x6c/0x90)
[<c00417d4>] (warn_slowpath_fmt+0x6c/0x90) from [<c0028230>] (cpu_idle+0x74/0x88)
[<c0028230>] (cpu_idle+0x74/0x88) from [<c00089a8>] (start_kernel+0x1f0/0x2cc)
[<c00089a8>] (start_kernel+0x1f0/0x2cc) from [<80008034>] (0x80008034)
---[ end trace caa39301c0064903 ]---
Unable to handle kernel paging request at virtual address 60000013
pgd = c0004000
[60000013] *pgd=00000000
Internal error: Oops: 5 [#1]
Modules linked in: test_drv
CPU: 0    Tainted: G        W   (2.6.31-mx31-spi #29)
PC is at cpu_idle+0x28/0x88
LR is at cpu_idle+0x74/0x88
pc : [<c00281e4>]    lr : [<c0028230>]    psr: 40000093
sp : c0339fc8  ip : 80000093  fp : 00000000
r10: 80020a40  r9 : 4107b364  r8 : 80020a74
r7 : c033c360  r6 : c033c36c  r5 : 60000013  r4 : c0028308
r3 : f1080080  r2 : 00000002  r1 : c03599ac  r0 : 00000009
Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 00c5387d  Table: 8fa10000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc0338268)
Stack: (0xc0339fc8 to 0xc033a000)
9fc0:                   c037c99c c0357ad0 c0022e10 c00089a8 c0008350 00000000 
9fe0: 00000000 c0022e10 00c5387d c0357b40 c0023214 80008034 00000000 00000000 
[<c00281e4>] (cpu_idle+0x28/0x88) from [<c00089a8>] (start_kernel+0x1f0/0x2cc)
[<c00089a8>] (start_kernel+0x1f0/0x2cc) from [<80008034>] (0x80008034)
Code: e5943000 e3130002 1a000007 f10c0080 (e5953000) 
---[ end trace caa39301c0064904 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
[<c002c904>] (unwind_backtrace+0x0/0xe8) from [<c004184c>] (panic+0x44/0x124)
[<c004184c>] (panic+0x44/0x124) from [<c004494c>] (do_exit+0x550/0x5d4)
[<c004494c>] (do_exit+0x550/0x5d4) from [<c002ac24>] (die+0x124/0x144)
[<c002ac24>] (die+0x124/0x144) from [<c002d7cc>] (__do_kernel_fault+0x70/0x80)
[<c002d7cc>] (__do_kernel_fault+0x70/0x80) from [<c002d934>] (do_page_fault+0x158/0x24c)
[<c002d934>] (do_page_fault+0x158/0x24c) from [<c0026250>] (do_DataAbort+0x34/0x98)
[<c0026250>] (do_DataAbort+0x34/0x98) from [<c00269cc>] (__dabt_svc+0x4c/0x60)
Exception stack(0xc0339f80 to 0xc0339fc8)
9f80: 00000009 c03599ac 00000002 f1080080 c0028308 60000013 c033c36c c033c360 
9fa0: 80020a74 4107b364 80020a40 00000000 80000093 c0339fc8 c0028230 c00281e4 
9fc0: 40000093 ffffffff                                                       
[<c00269cc>] (__dabt_svc+0x4c/0x60) from [<c00281e4>] (cpu_idle+0x28/0x88)
[<c00281e4>] (cpu_idle+0x28/0x88) from [<c00089a8>] (start_kernel+0x1f0/0x2cc)
[<c00089a8>] (start_kernel+0x1f0/0x2cc) from [<80008034>] (0x80008034)
Rebooting in 1 seconds..


After that I fed in a clock signal from a function generator.
I turned the frequency up, and at around 500kHz the kernel crashes
after 5 sec. or so (roughly 3M irq events). That looks like this:


Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in: test_drv
CPU: 0    Not tainted  (2.6.31-mx31-spi #29)
PC is at flush_thread+0x4/0x54
LR is at flush_thread+0x8/0x54
pc : [<c0028318>]    lr : [<c002831c>]    psr: 60000093
sp : c0339fb8  ip : 00000000  fp : 00000000
r10: 80020a40  r9 : 4107b364  r8 : 80020a74
r7 : c033c360  r6 : c033c36c  r5 : c0357b0c  r4 : c0338000
r3 : c0338000  r2 : 00000000  r1 : 00000000  r0 : c033f820
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 00c5387d  Table: 8fa10000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc0338268)
Stack: (0xc0339fb8 to 0xc033a000)
9fa0:                                                       c027ead4 c0028308 
9fc0: 40000013 c0028210 c037c99c c0357ad0 c0022e10 c00089a8 c0008350 00000000 
9fe0: 00000000 c0022e10 00c5387d c0357b40 c0023214 80008034 00000000 00000000 
[<c0028318>] (flush_thread+0x4/0x54) from [<c0028210>] (cpu_idle+0x54/0x88)
[<c0028210>] (cpu_idle+0x54/0x88) from [<c00089a8>] (start_kernel+0x1f0/0x2cc)
[<c00089a8>] (start_kernel+0x1f0/0x2cc) from [<80008034>] (0x80008034)
Code: f1080080 e28dd004 e8bd8000 e92d4030 (e24dd004) 
---[ end trace 93fe427f69eb77d1 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
[<c002c904>] (unwind_backtrace+0x0/0xe8) from [<c004184c>] (panic+0x44/0x124)
[<c004184c>] (panic+0x44/0x124) from [<c004494c>] (do_exit+0x550/0x5d4)
[<c004494c>] (do_exit+0x550/0x5d4) from [<c002ac24>] (die+0x124/0x144)
[<c002ac24>] (die+0x124/0x144) from [<c002d7cc>] (__do_kernel_fault+0x70/0x80)
[<c002d7cc>] (__do_kernel_fault+0x70/0x80) from [<c002d934>] (do_page_fault+0x158/0x24c)
[<c002d934>] (do_page_fault+0x158/0x24c) from [<c0026250>] (do_DataAbort+0x34/0x98)
[<c0026250>] (do_DataAbort+0x34/0x98) from [<c00269cc>] (__dabt_svc+0x4c/0x60)
Exception stack(0xc0339f70 to 0xc0339fb8)
9f60:                                     c033f820 00000000 00000000 c0338000 
9f80: c0338000 c0357b0c c033c36c c033c360 80020a74 4107b364 80020a40 00000000 
9fa0: 00000000 c0339fb8 c002831c c0028318 60000093 ffffffff                   
[<c00269cc>] (__dabt_svc+0x4c/0x60) from [<c0028318>] (flush_thread+0x4/0x54)
[<c0028318>] (flush_thread+0x4/0x54) from [<c0028210>] (cpu_idle+0x54/0x88)
[<c0028210>] (cpu_idle+0x54/0x88) from [<c00089a8>] (start_kernel+0x1f0/0x2cc)
[<c00089a8>] (start_kernel+0x1f0/0x2cc) from [<80008034>] (0x80008034)
Rebooting in 1 seconds..


But there are cases where the crash appears with fewer than 3M irqs.
Once, at around 100kHz, it took about 100 sec. (10M irqs) for it to crash,
and sometimes it also looks like this:


Internal error: Oops - undefined instruction: 0 [#1]
Modules linked in: test_drv
CPU: 0    Not tainted  (2.6.31-mx31-spi #29)
Unable to handle kernel paging request at virtual address e1a020e0
Unable to handle kernel paging request at virtual address 1b00001b
Unable to handle kernel paging request at virtual address e1520027
...10 more times the '1b00001b' and 'e1520027' lines ...
Unable to handle kernel paging request at virtual address 1b00001b
Unable to handle kernel NULL pointer dereference at virtual address 000000d8
pgd = c0004000
[000000d8] *pgd=40000193(bad)
Internal error: Oops: 17 [#2]
Modules linked in: test_drv


So am I requesting the irq the wrong way? I also tried it
with the 'IRQF_SHARED' option (the irq should not be shared anyway);
the crashes (from touching the pin with a bare hand) now look like this:


Internal error: Oops - undefined instruction: 0 [#1]
Modules linked in: test_drv
CPU: 0    Not tainted  (2.6.31-mx31-spi #29)


I looked at the
'WARNING: at arch/arm/kernel/process.c:171 cpu_idle+0x74/0x88()'
but could not get my head around the idle_task stuff that's happening
there. Maybe I tinkered with the '.config' switches too much?
Or is there something missing in the setup of the GPIO ports?

As a next step I will switch to a different pin and check whether other
pins behave the same way.

Any suggestions welcome :-)

Cheers
Rene



Rene Wolf
LFK-Lenkflugkörpersysteme GmbH
Human Resources Operations & Policy, HRO
Landshuter Straße 26, 85716 Unterschleißheim, GERMANY
Phone: +49 89 3179 8337
Fax: +49 8252 99 8964
E-Mail: rene.wolf at mbda-systems.de

http://www.mbda.net

Chairman of the Supervisory Board: Antoine Bouvier
Managing Director: Werner Kaltenegger
Registered Office: Schrobenhausen
Commercial Register: Amtsgericht Ingolstadt, HRB 4365


* i.MX31 kernel panic and irq
  2009-10-06 15:08 i.MX31 kernel panic and irq Wolf, Rene, HRO-GP
@ 2009-10-06 15:43 ` Bill Gatliff
  2009-10-07  7:57   ` Wolf, Rene, HRO-GP
  2009-10-07 13:20   ` Russell King - ARM Linux
  0 siblings, 2 replies; 8+ messages in thread
From: Bill Gatliff @ 2009-10-06 15:43 UTC (permalink / raw)
  To: linux-arm-kernel

Wolf, Rene, HRO-GP wrote:
> Hi @ all :-)
>
> This is about a kernel panic I'm experiencing / causing.
> Setup: The system is a DENX QONG EVB-Light. It consists of an i.MX31
> (ARM11) + some flash and an FPGA doing eth. I use a rootfs over NFS
> and the kernel is loaded from tftp. Version 2.6.31 (pulled from
> DENX, which should be equal to the one from kernel.org)
> So inside my kernel module I do that:
>   

The OOPS messages suggest that the machine has run off into stuff that 
isn't code, which would be consistent with the stack pointer getting 
blown out of the stack memory.

I don't know if the i.mx31 kernel does any low-level throttling of 
incoming interrupts, but if it doesn't then a reason why your hand 
gripping the wire might trigger the OOPS is because you are holding the 
pin at an invalid signal level, thereby causing a burst of interrupt 
events that blow up the stack.  I would ignore the results of this test 
case.

I would expect bursts of 100kHz interrupts to be manageable, but not 
sustainable.  So the failure there might be for the same reasons as 
above.  Do you see problems with 10kHz inputs?

This is all speculation, of course...

b.g.

-- 
Bill Gatliff
bgat at billgatliff.com


* i.MX31 kernel panic and irq
  2009-10-06 15:43 ` Bill Gatliff
@ 2009-10-07  7:57   ` Wolf, Rene, HRO-GP
  2009-10-07 12:45     ` Bill Gatliff
  2009-10-07 13:20   ` Russell King - ARM Linux
  1 sibling, 1 reply; 8+ messages in thread
From: Wolf, Rene, HRO-GP @ 2009-10-07  7:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Bill.

Thanks for your fast reply!

> low-level throttling of incoming interrupts 

Hmm, I did test it with the option 'IRQF_DISABLED'. That should disable
IRQs while in the ISR, so all IRQs during the ISR should be ignored.
If they are ignored this should not lead to stack overflows, right?
Or is the stack jumble happening during the time the kernel needs to
disable the IRQs?

I did test the setup with 10kHz and it went haywire, too:
1st test: crash after  700k irqs
2nd test: crash after 4200k irqs
3rd test: crash after  200k irqs

So it seems quite random to me. I also tried with no irq events: the system
ran for 20 min. without a crash.
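
To put the three 10kHz runs on a time axis, the interrupt counts can be converted to seconds. This is only a back-of-the-envelope sketch: the helper name `seconds_until_crash` is made up for illustration, and it assumes the function generator held a steady rate for the whole run.

```c
#include <assert.h>

/* Hypothetical helper (not from the driver): seconds elapsed before the
 * crash, assuming edges arrived at a constant rate of rate_hz. */
double seconds_until_crash(unsigned long irq_count, unsigned long rate_hz)
{
	assert(rate_hz > 0);	/* a zero rate would divide by zero */
	return (double)irq_count / (double)rate_hz;
}
```

With the counts listed above, `seconds_until_crash(700000, 10000)` is 70 s, 4200k irqs correspond to 420 s and 200k irqs to 20 s, so the time to failure differed by roughly a factor of 20 between runs.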

About the application: the IRQs will come at a frequency of around 200kHz,
in bursts of several dozen. So I'm not quite sure if that is going
to work, will have to try :-)

Thanks again!

Cheers
Rene




* i.MX31 kernel panic and irq
  2009-10-07  7:57   ` Wolf, Rene, HRO-GP
@ 2009-10-07 12:45     ` Bill Gatliff
  0 siblings, 0 replies; 8+ messages in thread
From: Bill Gatliff @ 2009-10-07 12:45 UTC (permalink / raw)
  To: linux-arm-kernel

Wolf, Rene, HRO-GP wrote:
> Hi Bill.
>
> Thanks for your fast reply!
>
>   
>> low-level throttling of incoming interrupts 
>>     
>
> Hmm, I did test it with the option 'IRQF_DISABLED'. That should disable
> IRQs while in the ISR, so all IRQs during the ISR should be ignored.
> If they are ignored this should not lead to stack overflows, right?
> Or is the stack jumble happening during the time the kernel needs to
> disable the IRQs?
>   

I'd have to look in detail at the code for the IRQ handler, but IIRC it 
re-enables IRQs at some point before even the first IRQF_DISABLED 
handler would get called.  So your explanation could be accurate.

> I did test the setup with 10kHz and it went haywire, too:
> 1st test: crash after  700k irqs
> 2nd test: crash after 4200k irqs
> 3rd test: crash after  200k irqs
>
> So it seems quite random to me. Also tried with no irq events: system
> was running for 20 min. without crash.
>   

Well, "random" when you have limited information.  :)  If it is a stack 
overflow, it would be quite predictable--- at the moment the stack 
overflows!  But you don't know how to reliably set up a condition where 
the stack will overflow, because it will be dependent on whatever else 
the system is busy doing.

I wonder how it fares at 1kHz?

> About the application: the IRQs will come with a freq. of around 200kHz,
> in bursts of several dozens. So I'm not quite sure if that is going
> to work, will have to try :-)
>   

Boy, I sure hope you don't also need a predictable interrupt latency.  
It sounds like the system will definitely fall behind during those bursts.

If your function generator has an amplitude modulation capability, you 
could use that along with a very low-duty-cycle pulse to create 
controlled bursts of any duration and count.  You might need two 
function generators to do it, or perhaps just a couple of 555 timers and 
a soldering iron.

In your previous posts you said that the system could process a few 
200kHz interrupts before it died.  If that number is always greater than 
"several dozens", then you might still be OK.  Otherwise, you might 
need to bring some additional hardware into the design or switch to a 
different CPU (PPC machines tend to have pretty decent performance at 
such high loads).

Another alternative would be to disable the interrupt source at the 
moment the first interrupt is detected, and then use a polled mode until 
the burst is over.  In fact, if you were to implement that as part of 
your testing and find that your system no longer dies, that's another 
indication that stack overflow might be the root cause.
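
The mask-then-poll idea can be tried out as a small user-space model before touching the driver. The sketch below is not kernel code and none of its names are kernel APIs; `sim_line`, `sim_edge` and `sim_poll` are invented here purely to show the control flow: the first edge of a burst enters the handler, the handler masks the source, and the remaining edges are drained by polling until a poll finds the line quiet.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only: every name here is hypothetical. */
struct sim_line {
	bool irq_enabled;       /* is the interrupt source unmasked? */
	unsigned pending_edges; /* edges seen while masked, not yet consumed */
	unsigned isr_calls;     /* how often the handler was entered */
	unsigned edges_handled; /* edges consumed, by handler or by polling */
};

void sim_init(struct sim_line *l)
{
	l->irq_enabled = true;
	l->pending_edges = 0;
	l->isr_calls = 0;
	l->edges_handled = 0;
}

/* An edge arrives on the pin. If the source is unmasked the handler runs:
 * it consumes this edge, masks the source and leaves the rest to polling. */
void sim_edge(struct sim_line *l)
{
	l->pending_edges++;
	if (l->irq_enabled) {
		l->isr_calls++;
		l->irq_enabled = false; /* mask until the burst is over */
		l->pending_edges--;
		l->edges_handled++;
	}
}

/* One polling pass: drain whatever accumulated; if nothing new arrived
 * since the last pass, treat the burst as over and unmask the source. */
void sim_poll(struct sim_line *l)
{
	assert(!l->irq_enabled); /* polling only happens while masked */
	if (l->pending_edges == 0) {
		l->irq_enabled = true;
		return;
	}
	l->edges_handled += l->pending_edges;
	l->pending_edges = 0;
}
```

Feeding a burst of 36 edges through `sim_edge()` with a `sim_poll()` pass every few edges enters the handler only once, while all 36 edges are still accounted for. In a real driver the per-edge interrupt entry is the part this removes.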

b.g.

-- 
Bill Gatliff
bgat at billgatliff.com


* i.MX31 kernel panic and irq
  2009-10-06 15:43 ` Bill Gatliff
  2009-10-07  7:57   ` Wolf, Rene, HRO-GP
@ 2009-10-07 13:20   ` Russell King - ARM Linux
  2009-10-07 14:44     ` Bill Gatliff
  1 sibling, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2009-10-07 13:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 06, 2009 at 10:43:26AM -0500, Bill Gatliff wrote:
> The OOPS messages suggest that the machine has run off into stuff that  
> isn't code, which would be consistent with the stack pointer getting  
> blown out of the stack memory.

I don't follow your line of reasoning.  The oops dump was:

Unable to handle kernel paging request at virtual address 60000013
pgd = c0004000
[60000013] *pgd=00000000
Internal error: Oops: 5 [#1]
Modules linked in: test_drv
CPU: 0    Tainted: G        W   (2.6.31-mx31-spi #29)
PC is at cpu_idle+0x28/0x88
LR is at cpu_idle+0x74/0x88
pc : [<c00281e4>]    lr : [<c0028230>]    psr: 40000093
sp : c0339fc8  ip : 80000093  fp : 00000000
r10: 80020a40  r9 : 4107b364  r8 : 80020a74
r7 : c033c360  r6 : c033c36c  r5 : 60000013  r4 : c0028308
r3 : f1080080  r2 : 00000002  r1 : c03599ac  r0 : 00000009
Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 00c5387d  Table: 8fa10000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc0338268)
Stack: (0xc0339fc8 to 0xc033a000)
9fc0:                   c037c99c c0357ad0 c0022e10 c00089a8 c0008350 00000000
9fe0: 00000000 c0022e10 00c5387d c0357b40 c0023214 80008034 00000000 00000000
[<c00281e4>] (cpu_idle+0x28/0x88) from [<c00089a8>] (start_kernel+0x1f0/0x2cc)
[<c00089a8>] (start_kernel+0x1f0/0x2cc) from [<80008034>] (0x80008034)
Code: e5943000 e3130002 1a000007 f10c0080 (e5953000)

If we look at this, we can see the following:

1. sp is pointing inside the kernel's direct mapped memory, as it should.
2. it is on an odd-number of pages, which means there's potentially more
   than 4K of space available to the stack.  Plus it's above the stack
   limit.
3. the process name is correct.  This is significant, because it means
   that (sp & ~0x1fff) ends up pointing at a valid thread_info structure,
   which then points at a valid task_struct structure.
4. the stack trace is consistent with pid 0's trace, which is basically
   the kernel boot and idle thread - in other words, it hasn't been
   overwritten by something running down into this page.
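
The address arithmetic behind points 1 to 3 can be reproduced with the numbers from the dump. This is a user-space sketch with made-up helper names; it assumes the 8 KiB kernel stack implied by the 0x1fff mask and 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_AREA_SIZE 0x2000u /* 8 KiB stack, as implied by the 0x1fff mask */
#define PAGE_SIZE_4K    0x1000u /* assumed page size */

/* Base of the 8 KiB stack area, where thread_info lives: sp & ~0x1fff. */
uint32_t thread_info_base(uint32_t sp)
{
	return sp & ~(STACK_AREA_SIZE - 1u);
}

/* Bytes between the current sp and the reported stack limit. */
uint32_t stack_headroom(uint32_t sp, uint32_t stack_limit)
{
	assert(sp >= stack_limit); /* otherwise the stack has already overflowed */
	return sp - stack_limit;
}

/* 1 if sp lies on an odd-numbered page, i.e. the upper half of the area. */
int sp_on_odd_page(uint32_t sp)
{
	return (int)((sp / PAGE_SIZE_4K) & 1u);
}
```

For sp = 0xc0339fc8 and stack limit = 0xc0338268 this gives a thread_info base of 0xc0338000, 0x1d60 (7520) bytes of headroom, and an sp on an odd-numbered page, so the whole lower 4 KiB page is still unused.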

To me, it looks like somehow r5 got spuriously corrupted - I think it
should be a pointer to 'hlt_counter', but for some reason it's a PSR
value.

   0:	e5943000 	ldr	r3, [r4]
   4:	e3130002 	tst	r3, #2	; 0x2
   8:	1a000007 	bne	0x2c
   c:	f10c0080 	cpsid	i
  10:	e5953000 	ldr	r3, [r5]

which corresponds to:

                while (!need_resched()) {
                        local_irq_disable();
                        if (hlt_counter) { <== faulting

The question, therefore, is why r5 would be corrupted.
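
Whether 0x60000013 really looks like a PSR value can be checked by pulling it apart into its fields. A minimal user-space sketch, using the ARM CPSR bit layout (condition flags in bits 31..28, the I and F mask bits in bits 7 and 6, the mode in bits 4..0); the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct psr_fields {
	int n, z, c, v;  /* condition flags, bits 31..28 */
	int irq_masked;  /* I bit (bit 7): 1 means IRQs are disabled */
	int fiq_masked;  /* F bit (bit 6): 1 means FIQs are disabled */
	unsigned mode;   /* mode field, bits 4..0; 0x13 is SVC */
};

/* Split a 32-bit ARM program status register value into its fields. */
struct psr_fields decode_psr(uint32_t psr)
{
	struct psr_fields f;

	f.n = (int)((psr >> 31) & 1u);
	f.z = (int)((psr >> 30) & 1u);
	f.c = (int)((psr >> 29) & 1u);
	f.v = (int)((psr >> 28) & 1u);
	f.irq_masked = (int)((psr >> 7) & 1u);
	f.fiq_masked = (int)((psr >> 6) & 1u);
	f.mode = psr & 0x1fu;
	assert(f.mode <= 0x1fu); /* the mode field is only five bits wide */
	return f;
}
```

The psr reported in the dump, 0x40000093, decodes to nZcv, IRQs off, FIQs on, mode 0x13 (SVC), which matches the "Flags:" line. The value in r5, 0x60000013, decodes to nZCv, IRQs on, FIQs on, mode 0x13, so it is at least a well-formed SVC-mode PSR with interrupts enabled.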


* i.MX31 kernel panic and irq
  2009-10-07 13:20   ` Russell King - ARM Linux
@ 2009-10-07 14:44     ` Bill Gatliff
  2009-10-08  1:21       ` Brian Hutchinson
  2009-10-08  8:16       ` Wolf, Rene, HRO-GP
  0 siblings, 2 replies; 8+ messages in thread
From: Bill Gatliff @ 2009-10-07 14:44 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Tue, Oct 06, 2009 at 10:43:26AM -0500, Bill Gatliff wrote:
>   
>> The OOPS messages suggest that the machine has run off into stuff that  
>> isn't code, which would be consistent with the stack pointer getting  
>> blown out of the stack memory.
>>     
>
> I don't follow your line of reasoning.

I protected myself by saying I was "speculating".  :)

I think some of the other dumps I saw from him were less clear as to the 
problem, which combined with his descriptions of how he was triggering 
the event made me think about stack corruption as a meta-problem.

Stack corruption could result in register value corruption, for sure.  
And the fact that his board doesn't fail in the same way each time makes 
me look more at the pattern of OOPS dumps rather than the individual ones.

But you've definitely looked at this dump in more detail than I did.  So 
I could be speculating in the wrong direction altogether.  I was just 
trying to distract him until you solved the problem.  :)

b.g.

-- 
Bill Gatliff
bgat at billgatliff.com


* i.MX31 kernel panic and irq
  2009-10-07 14:44     ` Bill Gatliff
@ 2009-10-08  1:21       ` Brian Hutchinson
  2009-10-08  8:16       ` Wolf, Rene, HRO-GP
  1 sibling, 0 replies; 8+ messages in thread
From: Brian Hutchinson @ 2009-10-08  1:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 7, 2009 at 10:44 AM, Bill Gatliff <bgat@billgatliff.com> wrote:

> Russell King - ARM Linux wrote:
>
>> On Tue, Oct 06, 2009 at 10:43:26AM -0500, Bill Gatliff wrote:
>>
>>
>>> The OOPS messages suggest that the machine has run off into stuff that
>>>  isn't code, which would be consistent with the stack pointer getting  blown
>>> out of the stack memory.
>>>
>>>
>>
>> I don't follow your line of reasoning.
>>
>
> I protected myself by saying I was "speculating".  :)
>
> I think some of the other dumps I saw from him were less clear as to the
> problem, which combined with his descriptions of how he was triggering the
> event made me think about stack corruption as a meta-problem.
>
> Stack corruption could result in register value corruption, for sure.  And
> the fact that his board doesn't fail in the same way each time makes me look
> more at the pattern of OOPS dumps rather than the individual ones.
>
> But you've definitely looked at this dump in more detail than I did.  So I
> could be speculating in the wrong direction altogether.  I was just trying
> to distract him until you solved the problem.  :)
>
Ha! Bill gives the Jedi wave "this isn't the bug you've been looking for".


* i.MX31 kernel panic and irq
  2009-10-07 14:44     ` Bill Gatliff
  2009-10-08  1:21       ` Brian Hutchinson
@ 2009-10-08  8:16       ` Wolf, Rene, HRO-GP
  1 sibling, 0 replies; 8+ messages in thread
From: Wolf, Rene, HRO-GP @ 2009-10-08  8:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi @all.

@Bill

> Well, "random" when you have limited information.  :) 

You're right of course :-)

> I wonder how it fares at 1kHz?

Results at 1kHz:
20 min. and still running ...
I also produced some load: make on a small project plus
top over ssh with d=0.1. No crash.

> Boy, I sure hope you don't also need a predictable interrupt latency.  
> It sounds like the system will definitely fall behind during those
> bursts.

Well.... sort of. But I failed to mention that this project is also an
investigation into how fast / well the planned stuff works. So if the
result is that it's reliable only with a pre-divider (of, say, 512 ->
200kHz/512), well, then that's the result :-)
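
For reference, the rate left over after such a pre-divider is a one-line calculation. The helper name below is made up, and the divider value of 512 is just the example figure from above.

```c
#include <assert.h>

/* Hypothetical helper: interrupt rate seen by the CPU after a hardware
 * pre-divider that passes on every divider-th edge. */
double divided_rate_hz(double input_hz, unsigned divider)
{
	assert(divider > 0); /* a divider of 0 makes no sense */
	return input_hz / (double)divider;
}
```

`divided_rate_hz(200000.0, 512)` comes out at 390.625 Hz, which is below the 1kHz rate that ran for 20 minutes without a crash.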

What I was really wondering about is the kernel panic. For a non
real-time system (I didn't change anything in that direction in my kernel)
I would have expected it to 'randomly' (under load) miss IRQs,
but not to crash the kernel?

@Russell

> The question, therefore, is why r5 would be corrupted.

Is that something I could have broken with some sort of '.config' switch?

Thanks for your help :-)

Cheers
Rene 

