From mboxrd@z Thu Jan 1 00:00:00 1970 From: Philippe Gerum In-Reply-To: <20101008070148.GB2255@domain.hid> References: <20101007115728.GA24500@domain.hid> <4CADBDC2.8080600@domain.hid> <20101008070148.GB2255@domain.hid> Content-Type: text/plain; charset="UTF-8" Date: Fri, 08 Oct 2010 10:17:28 +0200 Message-ID: <1286525848.13186.93.camel@domain.hid> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-help] kernel oopses when killing realtime task List-Id: Help regarding installation and common use of Xenomai List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Pavel Machek Cc: Jan Kiszka , xenomai@xenomai.org On Fri, 2010-10-08 at 09:01 +0200, Pavel Machek wrote: > Hi! > > > > I have... quite an interesting setup here. > > > > > > SMP machine, with special PCI card; that card has GPIOs and serial > > > ports. Unfortunately, there's only one interrupt, shared between > > > serials and GPIO pins, and serials are way too complex to be handled > > > by realtime layer. > > > > > > So I ended up with > > > > > > // we also have an interrupt handler: > > > ret = rtdm_irq_request(&my_context->irq_handle, > > > gpio_rt_config.irq, demo_interrupt, > > > RTDM_IRQTYPE_SHARED, > > > context->device->proc_name, my_context); > > > > > > and > > > > > > static int demo_interrupt(rtdm_irq_t *irq_context) > > > { > > > struct demodrv_context *ctx; > > > int dev_id; > > > int ret = RTDM_IRQ_HANDLED; // usual return value > > > unsigned pending, output; > > > > > > ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context); > > > dev_id = ctx->dev_id; > > > > > > if (!ctx->ready) { > > > printk(KERN_CRIT "Unexpected interrupt\n"); > > > return XN_ISR_PROPAGATE; > > > > Who sets ready and when? Looks racy. > > Debugging aid; yes, this one is racy. > > > > rtdm_lock_put(&ctx->lock); > > > > > > /* We need to propagate the interrupt, so that PMC-6L serials > > > work. Result is that interrupt latencies can't be > > > guaranteed when serials are in use. */ > > > > > > return RTDM_IRQ_HANDLED; > > > } > > > > > > Unregistration is: > > > my_context->ready = 0; > > > rtdm_irq_disable(&my_context->irq_handle); > > > > Where is rtdm_irq_free? Again, this ready flag looks racy. > > Aha, sorry, I quoted wrong snippet. rtdm_irq_free() follows > immediately, like this: > > int demo_close_rt(struct rtdm_dev_context *context, > rtdm_user_info_t *user_info) > { > struct demodrv_context *my_context; > rtdm_lockctx_t lock_ctx; > // get the context > my_context = (struct demodrv_context *)context->dev_private; > > // if we need to do some stuff with preemption disabled: > rtdm_lock_get_irqsave(&my_context->lock, lock_ctx); > > my_context->ready = 0; > rtdm_irq_disable(&my_context->irq_handle); > > > // free irq in RTDM > rtdm_irq_free(&my_context->irq_handle); > > // destroy our interrupt signal/event > rtdm_event_destroy(&my_context->irq_event); > > // other stuff here > rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx); > > return 0; > } > > Now... I'm aware that lock_get/put around irq_free should be > unneccessary, as should be irq_disable and my ->ready flag. Those were > my attempts to work around the problem. I'll attach the full source at > the end. > > > > Unfortunately, when the userspace app is ran and killed repeatedly (so > > > that interrupt is registered/unregistered all the time), I get > > > oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL > > > pointer. > > > > > > I decided that "wired" interrupt when the source is shared between > > > Linux and Xenomai, is wrong thing, so I disable "wired" interrupts > > > altogether, but that only moved oops to __virq_end. > > > > This is wrong. The only way to get a determistically shared IRQs across > > domains is via the wired path, either using the pattern Gilles cited or, > > in a slight variation, signaling down via a separate rtdm_nrtsig. > > For now, I'm trying to get it not to oops; deterministic latencies are > the next topic :-(. The main issue is that we don't lock our IRQ descriptors (the pipeline ones) when running the handlers, so another CPU clearing them via ipipe_virtualize_irq() may well sink the boat... The unwritten rule has always been to assume that drivers would stop _and_ drain interrupts on all CPUs before unregistering handlers, then exiting the code. Granted, that's a bit much. -- Philippe.