From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4CAED93C.20500@domain.hid> Date: Fri, 08 Oct 2010 10:41:32 +0200 From: Jan Kiszka MIME-Version: 1.0 References: <20101007115728.GA24500@domain.hid> <4CADBDC2.8080600@domain.hid> <20101008070148.GB2255@domain.hid> <1286525848.13186.93.camel@domain.hid> In-Reply-To: <1286525848.13186.93.camel@domain.hid> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-help] kernel oopses when killing realtime task List-Id: Help regarding installation and common use of Xenomai List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Philippe Gerum Cc: "xenomai@xenomai.org" Am 08.10.2010 10:17, Philippe Gerum wrote: > On Fri, 2010-10-08 at 09:01 +0200, Pavel Machek wrote: >> Hi! >> >>>> I have... quite an interesting setup here. >>>> >>>> SMP machine, with special PCI card; that card has GPIOs and serial >>>> ports. Unfortunately, there's only one interrupt, shared between >>>> serials and GPIO pins, and serials are way too complex to be handled >>>> by realtime layer. >>>> >>>> So I ended up with >>>> >>>> // we also have an interrupt handler: >>>> ret = rtdm_irq_request(&my_context->irq_handle, >>>> gpio_rt_config.irq, demo_interrupt, >>>> RTDM_IRQTYPE_SHARED, >>>> context->device->proc_name, my_context); >>>> >>>> and >>>> >>>> static int demo_interrupt(rtdm_irq_t *irq_context) >>>> { >>>> struct demodrv_context *ctx; >>>> int dev_id; >>>> int ret = RTDM_IRQ_HANDLED; // usual return value >>>> unsigned pending, output; >>>> >>>> ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context); >>>> dev_id = ctx->dev_id; >>>> >>>> if (!ctx->ready) { >>>> printk(KERN_CRIT "Unexpected interrupt\n"); >>>> return XN_ISR_PROPAGATE; >>> >>> Who sets ready and when? Looks racy. >> >> Debugging aid; yes, this one is racy. >> >>>> rtdm_lock_put(&ctx->lock); >>>> >>>> /* We need to propagate the interrupt, so that PMC-6L serials >>>> work. Result is that interrupt latencies can't be >>>> guaranteed when serials are in use. */ >>>> >>>> return RTDM_IRQ_HANDLED; >>>> } >>>> >>>> Unregistration is: >>>> my_context->ready = 0; >>>> rtdm_irq_disable(&my_context->irq_handle); >>> >>> Where is rtdm_irq_free? Again, this ready flag looks racy. >> >> Aha, sorry, I quoted wrong snippet. rtdm_irq_free() follows >> immediately, like this: >> >> int demo_close_rt(struct rtdm_dev_context *context, >> rtdm_user_info_t *user_info) >> { >> struct demodrv_context *my_context; >> rtdm_lockctx_t lock_ctx; >> // get the context >> my_context = (struct demodrv_context *)context->dev_private; >> >> // if we need to do some stuff with preemption disabled: >> rtdm_lock_get_irqsave(&my_context->lock, lock_ctx); >> >> my_context->ready = 0; >> rtdm_irq_disable(&my_context->irq_handle); >> >> >> // free irq in RTDM >> rtdm_irq_free(&my_context->irq_handle); >> >> // destroy our interrupt signal/event >> rtdm_event_destroy(&my_context->irq_event); >> >> // other stuff here >> rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx); >> >> return 0; >> } >> >> Now... I'm aware that lock_get/put around irq_free should be >> unneccessary, as should be irq_disable and my ->ready flag. Those were >> my attempts to work around the problem. I'll attach the full source at >> the end. >> >>>> Unfortunately, when the userspace app is ran and killed repeatedly (so >>>> that interrupt is registered/unregistered all the time), I get >>>> oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL >>>> pointer. >>>> >>>> I decided that "wired" interrupt when the source is shared between >>>> Linux and Xenomai, is wrong thing, so I disable "wired" interrupts >>>> altogether, but that only moved oops to __virq_end. >>> >>> This is wrong. The only way to get a determistically shared IRQs across >>> domains is via the wired path, either using the pattern Gilles cited or, >>> in a slight variation, signaling down via a separate rtdm_nrtsig. >> >> For now, I'm trying to get it not to oops; deterministic latencies are >> the next topic :-(. > > The main issue is that we don't lock our IRQ descriptors (the pipeline > ones) when running the handlers, so another CPU clearing them via > ipipe_virtualize_irq() may well sink the boat... > > The unwritten rule has always been to assume that drivers would stop > _and_ drain interrupts on all CPUs before unregistering handlers, then > exiting the code. Granted, that's a bit much. IIRC, we drain at nucleus-level if statistic are enabled. I guess we should make this unconditional. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux