[Adeos-main] NULL interrupt handler "ipd->irqs[irq].handler" in __ipipe_run_irq()

* [Adeos-main] NULL interrupt handler "ipd->irqs[irq].handler" in __ipipe_run_irq()
From: Tom Evans @ 2011-09-01  2:22 UTC
  To: adeos-main

This problem has probably been solved years ago, but Google and 
searching this list didn't find me anything.

I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 with the 
Adeos patches on an MPC5200 (ppc).

Every now and then when I stress the system it crashes because 
"ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid irq on this 
system) in this code:

kernel/include/asm/ipipe.h::

#define __ipipe_run_isr(ipd, irq, cpuid)  \
do {                                      \
     if (ipd == ipipe_root_domain) {       \
         /*                                \
          * Linux handlers are called w/ hw interrupts on so \
          * that they could not defer interrupts for higher  \
          * priority domains.                                \
          */                                                 \
         local_irq_enable_hw();                              \
         ((void (*)(unsigned, struct pt_regs *))             \
          ipd->irqs[irq].handler) (irq, __ipipe_tick_regs + cpuid); \
         local_irq_disable_hw();                             \
     } else {                                                \
         __clear_bit(IPIPE_SYNC_FLAG, &cpudata->status);     \
         ipd->irqs[irq].handler(irq,ipd->irqs[irq].cookie);  \
         __set_bit(IPIPE_SYNC_FLAG, &cpudata->status);       \
     }                                                       \
} while(0)

If I add code to printk() when there's a NULL handler and also add a 
printk() to ipipe_virtualize_irq() to detail all interrupt registrations 
and de-registrations I get the following:

[   53.32] 1080:closing...
[   53.32] ipipe_virtualize_irq(256, 0x00000000)
[   53.32] ipipe_virtualize_irq(56, 0x00000000)
[   53.34] 1463:mscan_hwrelease out
[   53.34] ipipe_virtualize_irq(57, 0x00000000)
[   53.34] 1463:mscan_hwrelease out
[   53.35] pcan: pccard_release()
[   53.35] ipipe_virtualize_irq(1, 0x00000000)
[   53.36] __ipipe_run_isr(, 1, ) handler is NULL! #######

So it looks like the interrupt is happening in hardware and being queued 
and THEN it is being deregistered (with the handler being set to zero in 
ipipe_virtualize_irq()) and then it is being pulled from the pipe, run 
and (usually) crashes.

I've checked all the Adeos patches I can find for all architectures up 
to the current date, and none of them have had changes made to check for 
the condition of a NULL interrupt handler in the pipe.

Simply adding a test in __ipipe_run_isr() to ignore these entries seems 
to fix this problem for me.
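
A minimal sketch of that kind of guard, written as a user-space C model rather than as a patch to the Adeos macro. All names here (guard_table, guard_log_irq, guard_run_isr, guard_count_isr) are hypothetical and exist only for illustration; the real __ipipe_run_isr() also toggles hardware interrupts and per-CPU status bits, which this model leaves out.

```c
#include <stddef.h>

#define GUARD_NR_IRQS 4   /* hypothetical table size for this sketch */

typedef void (*guard_handler_t)(unsigned irq, void *cookie);

struct guard_slot {
    guard_handler_t handler;  /* NULL once the IRQ has been unregistered */
    void *cookie;
};

static struct guard_slot guard_table[GUARD_NR_IRQS];
static unsigned long guard_pending;   /* bit n set: IRQ n logged, not yet run */

/* Example handler: counts invocations through its cookie. */
static void guard_count_isr(unsigned irq, void *cookie)
{
    (void)irq;
    ++*(int *)cookie;
}

/* Record an IRQ occurrence, as the low-level entry code would. */
static void guard_log_irq(unsigned irq)
{
    guard_pending |= 1UL << irq;
}

/*
 * Play one logged IRQ. Returns 1 if a handler ran, 0 if the entry was
 * stale (the handler was removed after the IRQ had been logged).
 */
static int guard_run_isr(unsigned irq)
{
    guard_handler_t handler = guard_table[irq].handler;  /* read once */

    guard_pending &= ~(1UL << irq);
    if (handler == NULL)
        return 0;   /* ignore the stale entry instead of jumping to 0 */
    handler(irq, guard_table[irq].cookie);
    return 1;
}
```

The guard only hides the symptom: an occurrence that was logged before deregistration is silently dropped, which is harmless if the device is going away anyway.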

The other solution I can think of would be to make 
ipipe_virtualize_irq() smarter so on deregistration it removes any 
pending interrupts from the pipelines. Has that been done in any newer 
versions?
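
For comparison, a rough sketch of this second idea, again as a user-space C model with hypothetical names (sketch_virtualize_irq, sketch_log_irq, sketch_sync_log, sketch_count_isr); it is not taken from any Adeos release. Deregistration clears the pending bit for that IRQ, so the log can never hold an entry whose handler is NULL. A real implementation would presumably have to do this with hardware interrupts off and for every CPU's log, which is not modelled here.

```c
#include <stddef.h>

#define SKETCH_NR_IRQS 8   /* hypothetical table size */

typedef void (*sketch_handler_t)(unsigned irq, void *cookie);

static sketch_handler_t sketch_handlers[SKETCH_NR_IRQS];
static void *sketch_cookies[SKETCH_NR_IRQS];
static unsigned long irq_log;   /* bit n set: IRQ n logged, not yet run */

/* Example handler: counts invocations through its cookie. */
static void sketch_count_isr(unsigned irq, void *cookie)
{
    (void)irq;
    ++*(int *)cookie;
}

/* Record an IRQ occurrence in the log. */
static void sketch_log_irq(unsigned irq)
{
    irq_log |= 1UL << irq;
}

/* Register (handler != NULL) or unregister (handler == NULL) an IRQ. */
static void sketch_virtualize_irq(unsigned irq, sketch_handler_t handler,
                                  void *cookie)
{
    sketch_handlers[irq] = handler;
    sketch_cookies[irq] = cookie;
    if (handler == NULL)
        irq_log &= ~(1UL << irq);   /* purge anything logged earlier */
}

/*
 * Play every logged IRQ, lowest number first, and return how many
 * handlers ran. Assumes only registered IRQs are ever logged, so a
 * set bit always has a non-NULL handler behind it.
 */
static int sketch_sync_log(void)
{
    int ran = 0;

    for (unsigned irq = 0; irq < SKETCH_NR_IRQS; irq++) {
        if (!(irq_log & (1UL << irq)))
            continue;
        irq_log &= ~(1UL << irq);
        sketch_handlers[irq](irq, sketch_cookies[irq]);
        ran++;
    }
    return ran;
}
```

Note that the purge does nothing about an interrupt that fires after deregistration, so it would not remove the need to silence the source first.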

This problem might match the old (2007) and long running (40 messages) 
bug report "Re: Xenomai and MSI enabled crashes kernel" listed here:

http://thread.gmane.org/gmane.linux.real-time.xenomai.users/3643/focus=3657

I'd be interested in any observations, comments or pointers to the "real 
cause" and any other "real fixes".

Tom Evans


* Re: [Adeos-main] NULL interrupt handler "ipd->irqs[irq].handler" in __ipipe_run_irq()
From: Philippe Gerum @ 2011-09-01  9:56 UTC
  To: Tom Evans; +Cc: adeos-main

On Thu, 2011-09-01 at 12:22 +1000, Tom Evans wrote:
> This problem has probably been solved years ago, but Google and 
> searching this list didn't find me anything.
> 
> I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 with the 
> Adeos patches on an MPC5200 (ppc).
> 
> Every now and then when I stress the system it crashes because 
> "ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid irq on this 
> system) in this code:

<snip>

> So it looks like the interrupt is happening in hardware and being queued 
> and THEN it is being deregistered (with the handler being set to zero in 
> ipipe_virtualize_irq()) and then it is being pulled from the pipe, run 
> and (usually) crashes.
> 
> I've checked all the Adeos patches I can find for all architectures up 
> to the current date, and none of them have had changes made to check for 
> the condition of a NULL interrupt handler in the pipe.
> 
> Simply adding a test in __ipipe_run_isr() to ignore these entries seems 
> to fix this problem for me.
> 
> The other solution I can think of would be to make 
> ipipe_virtualize_irq() smarter so on deregistration it removes any 
> pending interrupts from the pipelines. Has that been done in any newer 
> versions?
> 
> This problem might match the old (2007) and long running (40 messages) 
> bug report "Re: Xenomai and MSI enabled crashes kernel" listed here:
> 
> http://thread.gmane.org/gmane.linux.real-time.xenomai.users/3643/focus=3657
> 

Actually, the issue discussed in this thread is MSI+x86 specific,
related to the interrupt namespace, so this does not apply to your case.

> I'd be interested in any observations, comments or pointers to the "real 
> cause" and any other "real fixes".

ipipe_virtualize_irq() is an internal service which should be called for
unregistering an IRQ only after the source was shut at device level, and
possibly masked on the interrupt controller. It must be called with
interrupts enabled for the domain which owns the unregistered handler.
On uniprocessor systems, these two conditions are enough to make sure
that no IRQ is lingering in the interrupt log after the handler was
nullified.

I can't spot the routines appearing in the backtrace you sent in the
vanilla linux/xenomai code I have at hand, but if this is a real-time
CAN stack, you may want to check whether the device is properly quiesced
and the IRQ line masked prior to unregistering the interrupt in the
pipeline.
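
As a rough illustration of that ordering, here is a user-space C model with hypothetical names (struct sim, sim_raise, sim_sync, sim_teardown, sim_noop_isr); it is not Adeos or Xenomai code. The device-level enable, the controller mask and the pipeline log are each reduced to a single flag, and the "interrupts enabled for the domain" condition is modelled by playing the log once before the handler is cleared.

```c
#include <stddef.h>

typedef void (*sim_isr_t)(unsigned irq, void *cookie);

struct sim {
    int dev_irq_enabled;   /* interrupt enable at device level */
    int line_masked;       /* mask bit on the interrupt controller */
    int logged;            /* IRQ recorded in the pipeline log */
    sim_isr_t handler;     /* NULL means unregistered */
};

/* Example handler that does nothing. */
static void sim_noop_isr(unsigned irq, void *cookie)
{
    (void)irq;
    (void)cookie;
}

/* The hardware asserts the line: logged only if nothing blocks it. */
static void sim_raise(struct sim *s)
{
    if (s->dev_irq_enabled && !s->line_masked)
        s->logged = 1;
}

/*
 * Play the log. Returns 1 if the handler ran, 0 if nothing was logged,
 * and -1 if a logged IRQ has no handler (the crash case in this thread).
 */
static int sim_sync(struct sim *s)
{
    if (!s->logged)
        return 0;
    s->logged = 0;
    if (s->handler == NULL)
        return -1;
    s->handler(1, NULL);
    return 1;
}

/* Teardown in the order described above. */
static void sim_teardown(struct sim *s)
{
    s->dev_irq_enabled = 0;   /* 1. shut the source at device level */
    s->line_masked = 1;       /* 2. mask the line on the controller */
    (void)sim_sync(s);        /* 3. let anything already logged run */
    s->handler = NULL;        /* 4. only now unregister the handler */
}
```

In this model, clearing the handler while an occurrence is still logged makes sim_sync() return -1, whereas after sim_teardown() a later sim_raise() is never logged at all.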

-- 
Philippe.


* Re: [Adeos-main] NULL interrupt handler "ipd->irqs[irq].handler" in __ipipe_run_irq()
From: Tom Evans @ 2011-09-05 23:36 UTC
  To: adeos-main; +Cc: Philippe Gerum

Philippe Gerum wrote:
> On Thu, 2011-09-01 at 12:22 +1000, Tom Evans wrote:
>> This problem has probably been solved years ago, but Google and 
>> searching this list didn't find me anything.
>>
>> I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 with the 
>> Adeos patches on an MPC5200 (ppc).
>>
>> Every now and then when I stress the system it crashes because 
>> "ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid irq on this 
>> system) in this code:
> 
> <snip>
> 
>> So it looks like the interrupt is happening in hardware and being queued 
>> and THEN it is being deregistered (with the handler being set to zero in 
>> ipipe_virtualize_irq()) and then it is being pulled from the pipe, run 
>> and (usually) crashes.

<snip>

>> I'd be interested in any observations, comments or pointers to the "real 
>> cause" and any other "real fixes".
> 
> ipipe_virtualize_irq() is an internal service which should be called for
> unregistering an IRQ only after the source was shut at device level, and
> possibly masked on the interrupt controller. It must be called with
> interrupts enabled for the domain which owns the unregistered handler.
> On uniprocessor systems, these two conditions are enough to make sure
> that no IRQ is lingering in the interrupt log after the handler was
> nullified.
> 
> I can't spot the routines appearing in the backtrace you sent in the
> vanilla linux/xenomai code I have at hand, but if this is a real-time
> CAN stack, you may want to check whether the device is properly quiesced
> and the IRQ line masked prior to unregistering the interrupt in the
> pipeline.

Thanks for your prompt and detailed reply.

Yes, it is a real time CAN stack. It supports Philips SJA1000 CAN chips 
on Peak Systems PCMCIA cards connected through TI PCI1520 PCI bridge 
chips (using the "Yenta" drivers) through a Freescale MPC5200's PCI 
interface. The four CAN chips and the PCMCIA Bridge all use a single 
shared interrupt. The CAN chip interrupts are real-time, but if they 
find none of the CAN chips are responsible the interrupt is handballed 
(via XN_ISR_PROPAGATE) to the Linux-based PCMCIA code to see if it was a 
card insert event.
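
The shared-line dispatch described above could be sketched roughly as follows. This is a simplified user-space C model, not the actual driver: sk_can_chip, sk_shared_isr and the SK_ISR_* result codes are hypothetical stand-ins (SK_ISR_PROPAGATE plays the role of XN_ISR_PROPAGATE), and a chip's interrupt status is reduced to one flag.

```c
#include <stddef.h>

/* Hypothetical result codes standing in for the real-time ISR status. */
enum sk_isr_result { SK_ISR_HANDLED, SK_ISR_PROPAGATE };

#define SK_NR_CAN_CHIPS 4

struct sk_can_chip {
    int irq_pending;       /* chip is asserting the shared line */
    int frames_serviced;   /* how many times this chip was serviced */
};

/*
 * Real-time handler for the shared line: service every CAN chip that
 * is asserting it. If none of them is, hand the interrupt on so the
 * Linux PCMCIA code can check for a card event.
 */
static enum sk_isr_result sk_shared_isr(struct sk_can_chip *chips)
{
    int claimed = 0;

    for (size_t i = 0; i < SK_NR_CAN_CHIPS; i++) {
        if (chips[i].irq_pending) {
            chips[i].irq_pending = 0;      /* acknowledge at the chip */
            chips[i].frames_serviced++;
            claimed = 1;
        }
    }
    return claimed ? SK_ISR_HANDLED : SK_ISR_PROPAGATE;
}
```

With this arrangement the line has two owners, which is why silencing the CAN chips alone does not guarantee that the shared IRQ is quiet.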

There's a lot to go wrong. Frankly it is amazing it works as well as it 
does. I can't guarantee that "all interrupt sources are shut down" at 
the time of the ipipe_virtualize_irq() because of the PCMCIA sharing.

I've found that checking for a null interrupt vector and ignoring it 
solves my problem, and makes the code more robust against any other 
corner cases.

Tom
