From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4E655CE9.7080006@domain.hid>
Date: Tue, 06 Sep 2011 09:36:09 +1000
From: Tom Evans
MIME-Version: 1.0
References: <4E5EEC62.2020307@domain.hid> <1314871016.18201.14.camel@domain.hid>
In-Reply-To: <1314871016.18201.14.camel@domain.hid>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Adeos-main] NULL interrupt handler "ipd->irqs[irq].handler" in __ipipe_run_irq()
List-Id: General discussion about Adeos
To: adeos-main@gna.org
Cc: Philippe Gerum

Philippe Gerum wrote:
> On Thu, 2011-09-01 at 12:22 +1000, Tom Evans wrote:
>> This problem has probably been solved years ago, but Google and
>> searching this list didn't find me anything.
>>
>> I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 with the
>> Adeos patches on an MPC5200 (ppc).
>>
>> Every now and then when I stress the system it crashes because
>> "ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid irq on this
>> system) in this code:
>
>
>
>> So it looks like the interrupt is happening in hardware and being queued
>> and THEN it is being deregistered (with the handler being set to zero in
>> ipipe_virtualize_irq()) and then it is being pulled from the pipe, run
>> and (usually) crashes.
>> I'd be interested in any observations, comments or pointers to the "real
>> cause" and any other "real fixes".
>
> ipipe_virtualize_irq() is an internal service which should be called for
> unregistering an IRQ only after the source was shut at device level, and
> possibly masked on the interrupt controller. It must be called with
> interrupt enabled for the domain which owns the unregistered handler.
> On uniprocessor systems, these two conditions are enough to make sure
> that no IRQ is lingering in the interrupt log after the handler was
> nullified.
>
> I can't spot the routines appearing in the backtrace you sent in the
> vanilla linux/xenomai code I have at hand, but if this is a real-time
> CAN stack, you may want to check whether the device is properly quiesced
> and the IRQ line masked prior to unregistering the interrupt in the
> pipeline.

Thanks for your prompt and detailed reply.

Yes, it is a real-time CAN stack. It supports Philips SJA1000 CAN chips
on Peak Systems PCMCIA cards connected through TI PCI1520 PCI bridge
chips (using the "Yenta" drivers) through a Freescale MPC5200's PCI
interface.

The four CAN chips and the PCMCIA bridge all use a single shared
interrupt. The CAN chip interrupts are real-time, but if the real-time
handler finds that none of the CAN chips is responsible, the interrupt
is handballed (via XN_ISR_PROPAGATE) to the Linux-based PCMCIA code to
see if it was a card insert event.

There's a lot to go wrong. Frankly it is amazing it works as well as it
does.

I can't guarantee that "all interrupt sources are shut down" at the time
of the ipipe_virtualize_irq() call because of the PCMCIA sharing.

I've found that checking for a NULL interrupt handler and ignoring it
solves my problem, and makes the code more robust against any other
corner cases.

Tom
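
For illustration only, the workaround described above (skip a logged IRQ whose handler has been cleared in the meantime) can be sketched as a small user-space C model. This is not the Adeos/I-pipe code: `struct sketch_domain`, `sketch_set_handler()`, `sketch_log_irq()`, `sketch_sync_pending()` and `NR_SKETCH_IRQS` are hypothetical names invented for this sketch, and the real `__ipipe_run_irq()` path has locking and per-CPU state that are left out here.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SKETCH_IRQS 8	/* hypothetical table size for this sketch */

typedef void (*sketch_handler_t)(unsigned irq);

/* Simplified stand-in for a per-domain handler table plus interrupt log. */
struct sketch_domain {
	sketch_handler_t handler[NR_SKETCH_IRQS];
	unsigned pending;	/* bit n set: IRQ n is logged but not yet run */
	unsigned dropped;	/* logged IRQs discarded for lack of a handler */
};

/* Install a handler, or remove it by passing NULL. */
static void sketch_set_handler(struct sketch_domain *d, unsigned irq,
			       sketch_handler_t h)
{
	assert(irq < NR_SKETCH_IRQS);
	d->handler[irq] = h;
}

/* Simulated hardware entry path: only record the IRQ in the log. */
static void sketch_log_irq(struct sketch_domain *d, unsigned irq)
{
	assert(irq < NR_SKETCH_IRQS);
	d->pending |= 1u << irq;
}

/*
 * Drain the log. A NULL handler here means the IRQ was unregistered
 * after it had been logged, so it is counted and skipped instead of
 * being called. Returns the number of handlers actually run.
 */
static unsigned sketch_sync_pending(struct sketch_domain *d)
{
	unsigned ran = 0;

	for (unsigned irq = 0; irq < NR_SKETCH_IRQS; irq++) {
		if (!(d->pending & (1u << irq)))
			continue;
		d->pending &= ~(1u << irq);

		sketch_handler_t h = d->handler[irq];	/* read once */
		if (h == NULL) {
			d->dropped++;
			continue;
		}
		h(irq);
		ran++;
	}
	return ran;
}

/* Example handler that only counts its invocations. */
static unsigned sketch_hits;
static void sketch_can_isr(unsigned irq)
{
	(void)irq;
	sketch_hits++;
}
```

Logging IRQ 1 and then clearing its handler before draining the log reproduces the sequence from the original report. With the guard in place the stale entry is dropped rather than dereferenced. The guard only hides the race; the ordering Philippe describes (quiesce the device, mask the line, then unregister) is still what removes it where that is possible.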