[Xenomai-help] kernel oopses when killing realtime task

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Xenomai-help] kernel oopses when killing realtime task
@ 2010-10-07 11:57 Pavel Machek
  2010-10-07 12:11 ` Gilles Chanteperdrix
                   ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Pavel Machek @ 2010-10-07 11:57 UTC (permalink / raw)
  To: xenomai

Hi!

I have... quite an interesting setup here.

SMP machine, with special PCI card; that card has GPIOs and serial
ports. Unfortunately, there's only one interrupt, shared between
serials and GPIO pins, and serials are way too complex to be handled
by realtime layer.

So I ended up with

        // we also have an interrupt handler:                                                                                                                 
        ret = rtdm_irq_request(&my_context->irq_handle,
        gpio_rt_config.irq, demo_interrupt,
                               RTDM_IRQTYPE_SHARED,
        context->device->proc_name, my_context);

and 

static int demo_interrupt(rtdm_irq_t *irq_context)
{
        struct demodrv_context *ctx;
        int           dev_id;
        int           ret = RTDM_IRQ_HANDLED; // usual return value                                                                                           
        unsigned pending, output;

        ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
        dev_id    = ctx->dev_id;

        if (!ctx->ready) {
                printk(KERN_CRIT "Unexpected interrupt\n");
                return XN_ISR_PROPAGATE;
        }               

        rtdm_lock_get(&ctx->lock);

        pending = pgread(GPIO_IRQ_ID);
        pgwrite(0, GPIO_IRQ_ID);
        output = pgread(GPIO_STATUS_OUT);
        output ^= (1<<3);
        pgwrite(output, GPIO_STATUS_OUT);

        // do stuff                                                                                                                                           
        if (events > 1000) {
                rtdm_event_signal(&ctx->irq_event);
                events = 0;
        }
        events++;
        
        rtdm_lock_put(&ctx->lock);
 
        /* We need to propagate the interrupt, so that PMC-6L serials                                                                                         
           work. Result is that interrupt latencies can't be                                                                                                  
           guaranteed when serials are in use.  */

         return RTDM_IRQ_HANDLED;
}

Unregistration is:
        my_context->ready = 0;
        rtdm_irq_disable(&my_context->irq_handle);


Unfortunately, when the userspace app is ran and killed repeatedly (so
that interrupt is registered/unregistered all the time), I get
oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
pointer.

I decided that "wired" interrupt when the source is shared between
Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
altogether, but that only moved oops to __virq_end. 

I'm using 2.6.27.21-ELinOS-46 with xenomai-2.4.7 . Problem does go
away if I boot with maxcpus=1.

Any ideas? (Besides using non-historic kernel; but that's
unfortunately not exactly easy here.)

								Pavel



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-07 11:57 [Xenomai-help] kernel oopses when killing realtime task Pavel Machek
@ 2010-10-07 12:11 ` Gilles Chanteperdrix
  2010-10-07 13:00   ` Gilles Chanteperdrix
  2010-10-07 12:32 ` Jan Kiszka
  2010-10-07 14:07 ` Philippe Gerum
  2 siblings, 1 reply; 51+ messages in thread
From: Gilles Chanteperdrix @ 2010-10-07 12:11 UTC (permalink / raw)
  To: Pavel Machek; +Cc: xenomai

Pavel Machek wrote:
> Hi!
> 
> I have... quite an interesting setup here.
> 
> SMP machine, with special PCI card; that card has GPIOs and serial
> ports. Unfortunately, there's only one interrupt, shared between
> serials and GPIO pins, and serials are way too complex to be handled
> by realtime layer.

If the configuration is fixed, you may handle the shared interrupt in
the way described here, and you should not have the loss of real-time
guarantees:
https://mail.gna.org/public/xenomai-core/2008-07/msg00025.html

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-07 11:57 [Xenomai-help] kernel oopses when killing realtime task Pavel Machek
  2010-10-07 12:11 ` Gilles Chanteperdrix
@ 2010-10-07 12:32 ` Jan Kiszka
  2010-10-08  7:01   ` Pavel Machek
  2010-10-07 14:07 ` Philippe Gerum
  2 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-07 12:32 UTC (permalink / raw)
  To: Pavel Machek; +Cc: xenomai

Am 07.10.2010 13:57, Pavel Machek wrote:
> Hi!
> 
> I have... quite an interesting setup here.
> 
> SMP machine, with special PCI card; that card has GPIOs and serial
> ports. Unfortunately, there's only one interrupt, shared between
> serials and GPIO pins, and serials are way too complex to be handled
> by realtime layer.
> 
> So I ended up with
> 
>         // we also have an interrupt handler:                                                                                                                 
>         ret = rtdm_irq_request(&my_context->irq_handle,
>         gpio_rt_config.irq, demo_interrupt,
>                                RTDM_IRQTYPE_SHARED,
>         context->device->proc_name, my_context);
> 
> and 
> 
> static int demo_interrupt(rtdm_irq_t *irq_context)
> {
>         struct demodrv_context *ctx;
>         int           dev_id;
>         int           ret = RTDM_IRQ_HANDLED; // usual return value                                                                                           
>         unsigned pending, output;
> 
>         ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
>         dev_id    = ctx->dev_id;
> 
>         if (!ctx->ready) {
>                 printk(KERN_CRIT "Unexpected interrupt\n");
>                 return XN_ISR_PROPAGATE;

Who sets ready and when? Looks racy.

>         }               
> 
>         rtdm_lock_get(&ctx->lock);
> 
>         pending = pgread(GPIO_IRQ_ID);
>         pgwrite(0, GPIO_IRQ_ID);
>         output = pgread(GPIO_STATUS_OUT);
>         output ^= (1<<3);
>         pgwrite(output, GPIO_STATUS_OUT);
> 
>         // do stuff                                                                                                                                           
>         if (events > 1000) {
>                 rtdm_event_signal(&ctx->irq_event);
>                 events = 0;
>         }
>         events++;
>         
>         rtdm_lock_put(&ctx->lock);
>  
>         /* We need to propagate the interrupt, so that PMC-6L serials                                                                                         
>            work. Result is that interrupt latencies can't be                                                                                                  
>            guaranteed when serials are in use.  */
> 
>          return RTDM_IRQ_HANDLED;
> }
> 
> Unregistration is:
>         my_context->ready = 0;
>         rtdm_irq_disable(&my_context->irq_handle);

Where is rtdm_irq_free? Again, this ready flag looks racy.

> 
> 
> Unfortunately, when the userspace app is ran and killed repeatedly (so
> that interrupt is registered/unregistered all the time), I get
> oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> pointer.
> 
> I decided that "wired" interrupt when the source is shared between
> Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> altogether, but that only moved oops to __virq_end. 

This is wrong. The only way to get a determistically shared IRQs across
domains is via the wired path, either using the pattern Gilles cited or,
in a slight variation, signaling down via a separate rtdm_nrtsig.

> 
> I'm using 2.6.27.21-ELinOS-46 with xenomai-2.4.7 . Problem does go
> away if I boot with maxcpus=1.
> 
> Any ideas? (Besides using non-historic kernel; but that's
> unfortunately not exactly easy here.)
> 
> 								Pavel

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-07 12:11 ` Gilles Chanteperdrix
@ 2010-10-07 13:00   ` Gilles Chanteperdrix
  0 siblings, 0 replies; 51+ messages in thread
From: Gilles Chanteperdrix @ 2010-10-07 13:00 UTC (permalink / raw)
  To: Pavel Machek; +Cc: xenomai

Gilles Chanteperdrix wrote:
> Pavel Machek wrote:
>> Hi!
>>
>> I have... quite an interesting setup here.
>>
>> SMP machine, with special PCI card; that card has GPIOs and serial
>> ports. Unfortunately, there's only one interrupt, shared between
>> serials and GPIO pins, and serials are way too complex to be handled
>> by realtime layer.
> 
> If the configuration is fixed, you may handle the shared interrupt in
> the way described here, and you should not have the loss of real-time
> guarantees:
> https://mail.gna.org/public/xenomai-core/2008-07/msg00025.html

That code is a bit sketchy, and probably does not even work correctly on
SMP...

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-07 11:57 [Xenomai-help] kernel oopses when killing realtime task Pavel Machek
  2010-10-07 12:11 ` Gilles Chanteperdrix
  2010-10-07 12:32 ` Jan Kiszka
@ 2010-10-07 14:07 ` Philippe Gerum
  2 siblings, 0 replies; 51+ messages in thread
From: Philippe Gerum @ 2010-10-07 14:07 UTC (permalink / raw)
  To: Pavel Machek; +Cc: xenomai

On Thu, 2010-10-07 at 13:57 +0200, Pavel Machek wrote:
> Hi!
> 
> I have... quite an interesting setup here.
> 
> SMP machine, with special PCI card; that card has GPIOs and serial
> ports. Unfortunately, there's only one interrupt, shared between
> serials and GPIO pins, and serials are way too complex to be handled
> by realtime layer.
> 

[snip]

> 
> Unregistration is:
>         my_context->ready = 0;
>         rtdm_irq_disable(&my_context->irq_handle);
> 
> 
> Unfortunately, when the userspace app is ran and killed repeatedly (so
> that interrupt is registered/unregistered all the time), I get
> oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> pointer.

rtdm_irq_disable() will only mask the IRQ in the PIC, not unregister the
handler from the pipeline core. In case your handler is pulled out from
the kernel as part of a module, this may explain this behavior.
rtdm_irq_free() is something you likely look after.

> 
> I decided that "wired" interrupt when the source is shared between
> Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> altogether, but that only moved oops to __virq_end. 
> 

"wired" just means "deliver to topmost domain unconditionally", this is
a way to optimize the IRQ delivery path to the real-time handler,
without going through the entire domain propagation logic. As you
already found out, this is not related to registration issues.

> I'm using 2.6.27.21-ELinOS-46 with xenomai-2.4.7 . Problem does go
> away if I boot with maxcpus=1.
> 
> Any ideas? (Besides using non-historic kernel; but that's
> unfortunately not exactly easy here.)

The I-pipe code for x86 had a number of fixes since 2.6.27, including
rather serious ones. At the very least, I would recommend to check
whether this one is in your tree:
http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=80a79bb5b0e74d75fa4a511d7b9a08a015e37f46

This may decrease worst case latency in SMP quite significantly under
load.

> 
> 								Pavel
> 
> 
> _______________________________________________
> Xenomai-help mailing list
> Xenomai-help@domain.hid
> https://mail.gna.org/listinfo/xenomai-help

-- 
Philippe.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-07 12:32 ` Jan Kiszka
@ 2010-10-08  7:01   ` Pavel Machek
  2010-10-08  7:20     ` Gilles Chanteperdrix
                       ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Pavel Machek @ 2010-10-08  7:01 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Hi!

> > I have... quite an interesting setup here.
> > 
> > SMP machine, with special PCI card; that card has GPIOs and serial
> > ports. Unfortunately, there's only one interrupt, shared between
> > serials and GPIO pins, and serials are way too complex to be handled
> > by realtime layer.
> > 
> > So I ended up with
> > 
> >         // we also have an interrupt handler:                                                                                                                 
> >         ret = rtdm_irq_request(&my_context->irq_handle,
> >         gpio_rt_config.irq, demo_interrupt,
> >                                RTDM_IRQTYPE_SHARED,
> >         context->device->proc_name, my_context);
> > 
> > and 
> > 
> > static int demo_interrupt(rtdm_irq_t *irq_context)
> > {
> >         struct demodrv_context *ctx;
> >         int           dev_id;
> >         int           ret = RTDM_IRQ_HANDLED; // usual return value                                                                                           
> >         unsigned pending, output;
> > 
> >         ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
> >         dev_id    = ctx->dev_id;
> > 
> >         if (!ctx->ready) {
> >                 printk(KERN_CRIT "Unexpected interrupt\n");
> >                 return XN_ISR_PROPAGATE;
> 
> Who sets ready and when? Looks racy.

Debugging aid; yes, this one is racy.

> >         rtdm_lock_put(&ctx->lock);
> >  
> >         /* We need to propagate the interrupt, so that PMC-6L serials                                                                                         
> >            work. Result is that interrupt latencies can't be                                                                                                  
> >            guaranteed when serials are in use.  */
> > 
> >          return RTDM_IRQ_HANDLED;
> > }
> > 
> > Unregistration is:
> >         my_context->ready = 0;
> >         rtdm_irq_disable(&my_context->irq_handle);
> 
> Where is rtdm_irq_free? Again, this ready flag looks racy.

Aha, sorry, I quoted wrong snippet. rtdm_irq_free() follows
immediately, like this:

int demo_close_rt(struct rtdm_dev_context   *context,
                  rtdm_user_info_t          *user_info)
{
        struct demodrv_context  *my_context;
        rtdm_lockctx_t          lock_ctx;
        // get the context                                                                                                                                    
        my_context = (struct demodrv_context *)context->dev_private;

        // if we need to do some stuff with preemption disabled:                                                                                              
        rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);

        my_context->ready = 0;
        rtdm_irq_disable(&my_context->irq_handle);


        // free irq in RTDM                                                                                                                                   
        rtdm_irq_free(&my_context->irq_handle);

        // destroy our interrupt signal/event                                                                                                                 
        rtdm_event_destroy(&my_context->irq_event);

        // other stuff here                                                                                                                                   
        rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);

        return 0;
}

Now... I'm aware that lock_get/put around irq_free should be
unneccessary, as should be irq_disable and my ->ready flag. Those were
my attempts to work around the problem. I'll attach the full source at
the end.

> > Unfortunately, when the userspace app is ran and killed repeatedly (so
> > that interrupt is registered/unregistered all the time), I get
> > oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> > pointer.
> > 
> > I decided that "wired" interrupt when the source is shared between
> > Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> > altogether, but that only moved oops to __virq_end. 
> 
> This is wrong. The only way to get a determistically shared IRQs across
> domains is via the wired path, either using the pattern Gilles cited or,
> in a slight variation, signaling down via a separate rtdm_nrtsig.

For now, I'm trying to get it not to oops; deterministic latencies are
the next topic :-(.

The hardware is little unusual that it generates interrupts at 1kHz,
whether realtime driver is loaded or not.

								Pavel

								
/**********************************************************/
/*    HARD REAL TIME RTDM Driver Skeleton V1.1            */
/*                                                        */
/*    Based on code of Jan Kiszka - thanks Jan!           */
/*    (C) 2006 Jan Kiszka, www.captain.at                 */
/*    (C) 2010 Pavel Machek, sysgo                        */
/*    License: GPL                                        */
/**********************************************************/

/* Note: Interrupt is shared between serial lines and GPIOs. That
   means that interrupt latencies can't be guaranteed when serials are
   in use.
*/

#include <linux/mman.h>
#include <rtdm/rtdm_driver.h>
#include "pmc6l-rt.h"

char dummy_buffer[BUFSIZE];
int  events;

// our driver context struct: used for storing various information
struct demodrv_context {
	rtdm_irq_t              irq_handle;
	rtdm_lock_t             lock;
	int                     dev_id;
	u64                     last_timestamp;
	rtdm_event_t            irq_event;
	volatile unsigned long  irq_event_lock;
	volatile int            irq_events;
	int64_t                 timeout;
	void                    *buf;
	int			ready;
};

#define GPIO_CONF_LOW 0xb0
#define GPIO_CONF_HIGH 0xb2
#define GPIO_STATUS_IN 0xb4
#define GPIO_STATUS_OUT 0xb6
#define GPIO_IRQ_ID 0x2a

#define iowrite(bits, val, adr) iowrite##bits(cpu_to_be##bits(val), adr)
#define ioread(bits, adr) be##bits##_to_cpu(ioread##bits(adr))

#define pgwrite(dat, adr)	iowrite(16, (dat), gpio_rt_config.mmio+(adr))
#define pgread(adr)		ioread(16, gpio_rt_config.mmio+(adr))

/**********************************************************/
/*            INTERRUPT HANDLING                          */
/**********************************************************/

/*
 * Demo interrupt code; when GPIO #2 is changed, GPIO #4 is toggled.
 */
static int demo_interrupt(rtdm_irq_t *irq_context)
{
	struct demodrv_context *ctx;
	int           dev_id;
	int           ret = RTDM_IRQ_HANDLED; // usual return value
	unsigned pending, output;

	ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
	dev_id    = ctx->dev_id;

	if (!ctx->ready) {
		printk(KERN_CRIT "Unexpected interrupt\n");
		return XN_ISR_PROPAGATE;
	}		

#if 0
	rtdm_lock_get(&ctx->lock);

	pending = pgread(GPIO_IRQ_ID);
	pgwrite(0, GPIO_IRQ_ID);
	output = pgread(GPIO_STATUS_OUT);
	output ^= (1<<3);
	pgwrite(output, GPIO_STATUS_OUT);

	// do stuff
	if (events > 1000) {
		rtdm_event_signal(&ctx->irq_event);
		events = 0;
	}
	events++;
        
	rtdm_lock_put(&ctx->lock);
	// those return values were dropped from the RTDM
	// ret = RTDM_IRQ_ENABLE | RTDM_IRQ_PROPAGATE;

	/* We need to propagate the interrupt, so that PMC-6L serials
	   work. Result is that interrupt latencies can't be
	   guaranteed when serials are in use.  */

	//	return XN_ISR_PROPAGATE;
	return RTDM_IRQ_HANDLED;
#else
	pending = pgread(GPIO_IRQ_ID);
	pgwrite(0, GPIO_IRQ_ID);
	output = pgread(GPIO_STATUS_OUT);
	output = output & (~(1<<3));
	if (pgread(GPIO_STATUS_IN) & (1<<1))
		output |= (1<<3);
	pgwrite(output, GPIO_STATUS_OUT);

	// those return values were dropped from the RTDM
	// ret = RTDM_IRQ_ENABLE | RTDM_IRQ_PROPAGATE;

	/* We need to propagate the interrupt, so that PMC-6L serials
	   work. Result is that interrupt latencies can't be
	   guaranteed when serials are in use.  */

	//	return XN_ISR_PROPAGATE;
	return RTDM_IRQ_HANDLED;
#endif
}


/**********************************************************/
/*            DRIVER OPEN                                 */
/**********************************************************/
int demo_open_rt(struct rtdm_dev_context    *context,
                 rtdm_user_info_t           *user_info,
                 int                        oflags)
{
	struct demodrv_context  *my_context;
	int dev_id = context->device->device_id;
	int ret;
    
	// get the context for our driver - used to store driver info
	my_context = (struct demodrv_context *)context->dev_private;

	// we also have an interrupt handler:
	ret = rtdm_irq_request(&my_context->irq_handle, gpio_rt_config.irq, demo_interrupt,
			       RTDM_IRQTYPE_SHARED, context->device->proc_name, my_context);

	if (ret < 0)
		return ret;

	/* IPC initialisation - cannot fail with used parameters */
 	rtdm_lock_init(&my_context->lock);
	rtdm_event_init(&my_context->irq_event, 0);
	my_context->dev_id         = dev_id;

	my_context->irq_events     = 0;
	my_context->irq_event_lock = 0;
	my_context->ready = 1;
	my_context->timeout = 0; // wait INFINITE

	// enable interrupt in RTDM
	rtdm_irq_enable(&my_context->irq_handle);
	return 0;
}

/**********************************************************/
/*            DRIVER CLOSE                                */
/**********************************************************/
int demo_close_rt(struct rtdm_dev_context   *context,
                  rtdm_user_info_t          *user_info)
{
	struct demodrv_context  *my_context;
	rtdm_lockctx_t          lock_ctx;
	// get the context
	my_context = (struct demodrv_context *)context->dev_private;

	// if we need to do some stuff with preemption disabled:
	rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);

	my_context->ready = 0;
	rtdm_irq_disable(&my_context->irq_handle);


	// free irq in RTDM
	rtdm_irq_free(&my_context->irq_handle);

	// destroy our interrupt signal/event
	rtdm_event_destroy(&my_context->irq_event);

	// other stuff here
	rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);

	return 0;
}

/**********************************************************/
/*            DRIVER READ                                 */
/**********************************************************/
int demo_read_rt(struct rtdm_dev_context *context,
		 rtdm_user_info_t *user_info, void *buf, size_t nbyte)
{
	struct demodrv_context *ctx;
	int                     dev_id;
	char                    *out_pos = (char *)buf;
	rtdm_toseq_t            timeout_seq;
	int                     ret;

	// zero bytes requested ? return!
	if (nbyte == 0)
		return 0;

	// check if R/W actions to user-space are allowed
	if (user_info && !rtdm_rw_user_ok(user_info, buf, nbyte))
		return -EFAULT;

	ctx    = (struct demodrv_context *)context->dev_private;
	dev_id = ctx->dev_id;

	// wait: if ctx->timeout = 0, it will block infintely until
	//       rtdm_event_signal(&ctx->irq_event); is called from our
	//       interrupt routine
	ret = rtdm_event_timedwait(&ctx->irq_event, ctx->timeout, &timeout_seq);

	// now write the requested stuff to user-space
	if (rtdm_copy_to_user(user_info, out_pos,
			      dummy_buffer, BUFSIZE) != 0) {
		ret = -EFAULT;
	} else {
		ret = BUFSIZE;
	}
	return ret;
}

/**********************************************************/
/*            DRIVER OPERATIONS                           */
/**********************************************************/
static struct rtdm_device demo_device = {
struct_version:     RTDM_DEVICE_STRUCT_VER,

device_flags:       RTDM_NAMED_DEVICE,
context_size:       sizeof(struct demodrv_context),
device_name:        DEV_FILE,

/* open and close functions are not real-time safe due kmalloc
   and kfree. If you do not use kmalloc and kfree, and you made
   sure that there is no syscall in the open/close handler, you
   can declare the open_rt and close_rt handler.
*/
open_rt:            NULL,
open_nrt:           demo_open_rt,

ops: {
        close_rt:       NULL,
        close_nrt:      demo_close_rt,

        ioctl_rt:       NULL,
        ioctl_nrt:      NULL,

        read_rt:        demo_read_rt,
        read_nrt:       NULL,

        write_rt:       NULL,
        write_nrt:      NULL,

        recvmsg_rt:     NULL,
        recvmsg_nrt:    NULL,

        sendmsg_rt:     NULL,
        sendmsg_nrt:    NULL,
	},

device_class:       RTDM_CLASS_EXPERIMENTAL,
device_sub_class:   222,
driver_name:        DRV_NAME,
peripheral_name:    DEV_FILE_NAME,
provider_name:      "-",
proc_name:          demo_device.device_name,
};

/**********************************************************/
/*            INIT DRIVER                                 */
/**********************************************************/
int init_module(void)
{
	pmc_gpio_rt_config();
	printk("realtime driver at irq %d mmio %x\n", 
	       gpio_rt_config.irq, gpio_rt_config.mmio);
	return rtdm_dev_register(&demo_device);
}

/**********************************************************/
/*            CLEANUP DRIVER                              */
/**********************************************************/
void cleanup_module(void)
{
	rtdm_dev_unregister(&demo_device, 1000);
}

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("RTDM Real Time Driver Example");

--- user app ----

/**********************************************************/
/*    HARD REAL TIME RTDM Driver User-Space Application   */
/*                                                        */
/*    Based on code of Jan Kiszka - thanks Jan!           */
/*    (C) 2006 Jan Kiszka, www.captain.at                 */
/*    (C) 2010 Pavel Machek, sysgo                        */
/*    License: GPL                                        */
/**********************************************************/

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>

#include <native/task.h>
#include <native/timer.h>
#include <rtdm/rtdm.h>
#include "pmc6l-rt.h"

unsigned int my_state = 0;
int timer_started = 0;
int my_fd = -1;
int shutdownnow = 0;
RT_TASK my_task;
//                        --s-ms-us-ns
RTIME my_task_period_ns =   1000000000llu;


/**********************************************************/
/*            CLOSE RT DRIVER                             */
/**********************************************************/
static int close_file( int fd, unsigned char *name) {
	int ret,i=0;
	do {
		i++;
		ret = rt_dev_close(fd);
		switch(-ret){
		case EBADF:   printf("%s -> invalid fd or context\n",name);
			break;
		case EAGAIN:  printf("%s -> EAGAIN (%d times)\n",name,i);
			rt_task_sleep(50000); // wait 50us
			break;
		case 0:       printf("%s -> closed\n",name);
			break;
		default:      printf("%s -> ???\n",name);
			break;
		}
	} while (ret == -EAGAIN && i < 10);
	return ret;
}

/**********************************************************/
/*            CLEANING UP                                 */
/**********************************************************/
void cleanup_all(void) {
	if (my_state & STATE_FILE_OPENED) {
		close_file( my_fd, DEV_FILE " (user)");
		my_state &= ~STATE_FILE_OPENED;
	}
	if (my_state & STATE_TASK_CREATED) {
		printf("delete my_task\n");
		rt_task_delete(&my_task);
		my_state &= ~STATE_TASK_CREATED;
	}
	if (timer_started) {
		printf("stop timer\n");
		rt_timer_stop();
	}
}

void catch_signal(int sig) {
	shutdownnow = 1;
	cleanup_all();
	printf("exit\n");
	return;
}

/**********************************************************/
/*            REAL TIME TASK                              */
/**********************************************************/
void my_task_proc(void *arg) {
	int ret;
	ssize_t sz = sizeof(RTIME);
	ssize_t written = 0;
	ssize_t read = 0;
	int counter = 0;
	int readbackcounter;
	unsigned char buf[17] = "CAPTAIN WAS HERE\0";
	unsigned char buf2[17] = "XXXXXXXXXXXXXXXX\0";
	/* no periodic task, due blocking read from the RT driver (see below too)
	   ret = rt_task_set_periodic(NULL, TM_NOW, rt_timer_ns2ticks(my_task_period_ns));
	   if (ret) {
	   printf("error while set periodic, code %d\n",ret);
	   goto exit_my_task;
	   }
	*/

	while (1) {
		sprintf(buf, "CAPTAIN %d", counter);
		counter++;
		if (counter > 100) counter = 0;
		/* switch to primary mode */
		ret = rt_task_set_mode(0, T_PRIMARY, NULL);
		if (ret) {
			printf("error while rt_task_set_mode, code %d\n",ret);
			goto exit_my_task;
		}
    
		sz = sizeof(buf2);
		read = rt_dev_read(my_fd, &buf2, sizeof(buf2));
		if (read == sz ) {
			printf("READ: read=%s\n",buf2);
		} else {
			if (read < 0 ) {
				printf("error while rt_dev_read, code %d\n",read);
			} else {
				printf("only %d / %d byte received \n",read,sz);
			}
		}

		// read blocks, so check if user hit CTRL-C meanwhile
		//   otherwise we segfault when mmap'ing
		if (shutdownnow == 1) break;
	}
exit_my_task:
	if (my_state & STATE_FILE_OPENED) {
		if (!close_file( my_fd, DEV_FILE " (write)")) {
			my_state &= ~STATE_FILE_OPENED;
		}
	}
	printf("exit\n");
}

/**********************************************************/
/*            MAIN: mainly RT task initialization         */
/**********************************************************/
int main(int argc, char* argv[]) {
	int ret = 0;
	signal(SIGTERM, catch_signal);
	signal(SIGINT, catch_signal);

	printf("PRESS CTRL-C to EXIT\n");
	/* no memory-swapping for this programm */
	mlockall(MCL_CURRENT | MCL_FUTURE);
	/* start timer */
	ret = rt_timer_start(TM_ONESHOT);
	switch (ret) {
	case 0:       printf("timer started\n");
		timer_started = 1;
		break;
	case -EBUSY:  printf("timer is running\n");
		timer_started = 0;
		break;
	case -ENOSYS: printf("can't start timer\n");
		return ret;
	}

	/* open DEV_FILE */
	my_fd = rt_dev_open( DEV_FILE, 0);
	if (my_fd < 0) {
		printf("can't open %s\n", DEV_FILE);
		goto error;
	}
	my_state |= STATE_FILE_OPENED;
	printf("%s opened\n", DEV_FILE);

	/* create my_task */
	ret = rt_task_create(&my_task,"my_task",0,50,0);
	if (ret) {
		printf("failed to create my_task, code %d\n",ret);
		goto error;
	}
	my_state |= STATE_TASK_CREATED;
	printf("my_task created\n");

	/* start my_task */
	printf("starting my_task\n");
	ret = rt_task_start(&my_task,&my_task_proc,NULL);
	if (ret) {
		printf("failed to start my_task, code %d\n",ret);
		goto error;
	}

	pause();
	return 0;

error:
	cleanup_all();
	return ret;
}


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-08  7:01   ` Pavel Machek
@ 2010-10-08  7:20     ` Gilles Chanteperdrix
  2010-10-08  8:17     ` Philippe Gerum
  2010-10-08  9:41     ` Philippe Gerum
  2 siblings, 0 replies; 51+ messages in thread
From: Gilles Chanteperdrix @ 2010-10-08  7:20 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jan Kiszka, xenomai

Pavel Machek wrote:
> Hi!
> 
>>> I have... quite an interesting setup here.
>>>
>>> SMP machine, with special PCI card; that card has GPIOs and serial
>>> ports. Unfortunately, there's only one interrupt, shared between
>>> serials and GPIO pins, and serials are way too complex to be handled
>>> by realtime layer.
>>>
>>> So I ended up with
>>>
>>>         // we also have an interrupt handler:                                                                                                                 
>>>         ret = rtdm_irq_request(&my_context->irq_handle,
>>>         gpio_rt_config.irq, demo_interrupt,
>>>                                RTDM_IRQTYPE_SHARED,
>>>         context->device->proc_name, my_context);
>>>
>>> and 
>>>
>>> static int demo_interrupt(rtdm_irq_t *irq_context)
>>> {
>>>         struct demodrv_context *ctx;
>>>         int           dev_id;
>>>         int           ret = RTDM_IRQ_HANDLED; // usual return value                                                                                           
>>>         unsigned pending, output;
>>>
>>>         ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
>>>         dev_id    = ctx->dev_id;
>>>
>>>         if (!ctx->ready) {
>>>                 printk(KERN_CRIT "Unexpected interrupt\n");
>>>                 return XN_ISR_PROPAGATE;
>> Who sets ready and when? Looks racy.
> 
> Debugging aid; yes, this one is racy.
> 
>>>         rtdm_lock_put(&ctx->lock);
>>>  
>>>         /* We need to propagate the interrupt, so that PMC-6L serials                                                                                         
>>>            work. Result is that interrupt latencies can't be                                                                                                  
>>>            guaranteed when serials are in use.  */
>>>
>>>          return RTDM_IRQ_HANDLED;
>>> }
>>>
>>> Unregistration is:
>>>         my_context->ready = 0;
>>>         rtdm_irq_disable(&my_context->irq_handle);
>> Where is rtdm_irq_free? Again, this ready flag looks racy.
> 
> Aha, sorry, I quoted wrong snippet. rtdm_irq_free() follows
> immediately, like this:
> 
> int demo_close_rt(struct rtdm_dev_context   *context,
>                   rtdm_user_info_t          *user_info)
> {
>         struct demodrv_context  *my_context;
>         rtdm_lockctx_t          lock_ctx;
>         // get the context                                                                                                                                    
>         my_context = (struct demodrv_context *)context->dev_private;
> 
>         // if we need to do some stuff with preemption disabled:                                                                                              
>         rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);
> 
>         my_context->ready = 0;
>         rtdm_irq_disable(&my_context->irq_handle);
> 
> 
>         // free irq in RTDM                                                                                                                                   
>         rtdm_irq_free(&my_context->irq_handle);
> 
>         // destroy our interrupt signal/event                                                                                                                 
>         rtdm_event_destroy(&my_context->irq_event);
> 
>         // other stuff here                                                                                                                                   
>         rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);
> 
>         return 0;
> }
> 
> Now... I'm aware that lock_get/put around irq_free should be
> unneccessary, as should be irq_disable and my ->ready flag. Those were
> my attempts to work around the problem. I'll attach the full source at
> the end.
> 
>>> Unfortunately, when the userspace app is ran and killed repeatedly (so
>>> that interrupt is registered/unregistered all the time), I get
>>> oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
>>> pointer.
>>>
>>> I decided that "wired" interrupt when the source is shared between
>>> Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
>>> altogether, but that only moved oops to __virq_end. 
>> This is wrong. The only way to get a determistically shared IRQs across
>> domains is via the wired path, either using the pattern Gilles cited or,
>> in a slight variation, signaling down via a separate rtdm_nrtsig.
> 
> For now, I'm trying to get it not to oops; deterministic latencies are
> the next topic :-(.

Another remark, Xenomai releases of a given branch are ABI and API
compatible, so, upgrade to the latest version of the 2.4 branch, which
is 2.4.9.1 will not hurt. In the same vein, I-pipe patches are backward
compatibles, so, upgrade to
http://download.gna.org/adeos/patches/v2.6/x86/older/adeos-ipipe-2.6.27-x86-2.2-03.patch

is recommended if it is not the I-pipe you are using.

It would be nice if you could at least test these versions, if they
work, we would know whether the issue you have has been solved since the
release of these versions.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-08  7:01   ` Pavel Machek
  2010-10-08  7:20     ` Gilles Chanteperdrix
@ 2010-10-08  8:17     ` Philippe Gerum
  2010-10-08  8:41       ` Jan Kiszka
  2010-10-08  9:41     ` Philippe Gerum
  2 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-08  8:17 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jan Kiszka, xenomai

On Fri, 2010-10-08 at 09:01 +0200, Pavel Machek wrote:
> Hi!
> 
> > > I have... quite an interesting setup here.
> > > 
> > > SMP machine, with special PCI card; that card has GPIOs and serial
> > > ports. Unfortunately, there's only one interrupt, shared between
> > > serials and GPIO pins, and serials are way too complex to be handled
> > > by realtime layer.
> > > 
> > > So I ended up with
> > > 
> > >         // we also have an interrupt handler:                                                                                                                 
> > >         ret = rtdm_irq_request(&my_context->irq_handle,
> > >         gpio_rt_config.irq, demo_interrupt,
> > >                                RTDM_IRQTYPE_SHARED,
> > >         context->device->proc_name, my_context);
> > > 
> > > and 
> > > 
> > > static int demo_interrupt(rtdm_irq_t *irq_context)
> > > {
> > >         struct demodrv_context *ctx;
> > >         int           dev_id;
> > >         int           ret = RTDM_IRQ_HANDLED; // usual return value                                                                                           
> > >         unsigned pending, output;
> > > 
> > >         ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
> > >         dev_id    = ctx->dev_id;
> > > 
> > >         if (!ctx->ready) {
> > >                 printk(KERN_CRIT "Unexpected interrupt\n");
> > >                 return XN_ISR_PROPAGATE;
> > 
> > Who sets ready and when? Looks racy.
> 
> Debugging aid; yes, this one is racy.
> 
> > >         rtdm_lock_put(&ctx->lock);
> > >  
> > >         /* We need to propagate the interrupt, so that PMC-6L serials                                                                                         
> > >            work. Result is that interrupt latencies can't be                                                                                                  
> > >            guaranteed when serials are in use.  */
> > > 
> > >          return RTDM_IRQ_HANDLED;
> > > }
> > > 
> > > Unregistration is:
> > >         my_context->ready = 0;
> > >         rtdm_irq_disable(&my_context->irq_handle);
> > 
> > Where is rtdm_irq_free? Again, this ready flag looks racy.
> 
> Aha, sorry, I quoted wrong snippet. rtdm_irq_free() follows
> immediately, like this:
> 
> int demo_close_rt(struct rtdm_dev_context   *context,
>                   rtdm_user_info_t          *user_info)
> {
>         struct demodrv_context  *my_context;
>         rtdm_lockctx_t          lock_ctx;
>         // get the context                                                                                                                                    
>         my_context = (struct demodrv_context *)context->dev_private;
> 
>         // if we need to do some stuff with preemption disabled:                                                                                              
>         rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);
> 
>         my_context->ready = 0;
>         rtdm_irq_disable(&my_context->irq_handle);
> 
> 
>         // free irq in RTDM                                                                                                                                   
>         rtdm_irq_free(&my_context->irq_handle);
> 
>         // destroy our interrupt signal/event                                                                                                                 
>         rtdm_event_destroy(&my_context->irq_event);
> 
>         // other stuff here                                                                                                                                   
>         rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);
> 
>         return 0;
> }
> 
> Now... I'm aware that lock_get/put around irq_free should be
> unneccessary, as should be irq_disable and my ->ready flag. Those were
> my attempts to work around the problem. I'll attach the full source at
> the end.
> 
> > > Unfortunately, when the userspace app is ran and killed repeatedly (so
> > > that interrupt is registered/unregistered all the time), I get
> > > oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> > > pointer.
> > > 
> > > I decided that "wired" interrupt when the source is shared between
> > > Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> > > altogether, but that only moved oops to __virq_end. 
> > 
> > This is wrong. The only way to get a determistically shared IRQs across
> > domains is via the wired path, either using the pattern Gilles cited or,
> > in a slight variation, signaling down via a separate rtdm_nrtsig.
> 
> For now, I'm trying to get it not to oops; deterministic latencies are
> the next topic :-(.

The main issue is that we don't lock our IRQ descriptors (the pipeline
ones) when running the handlers, so another CPU clearing them via
ipipe_virtualize_irq() may well sink the boat...

The unwritten rule has always been to assume that drivers would stop
_and_ drain interrupts on all CPUs before unregistering handlers, then
exiting the code. Granted, that's a bit much.

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-08  8:17     ` Philippe Gerum
@ 2010-10-08  8:41       ` Jan Kiszka
  2010-10-08  8:57         ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-08  8:41 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai@xenomai.org

Am 08.10.2010 10:17, Philippe Gerum wrote:
> On Fri, 2010-10-08 at 09:01 +0200, Pavel Machek wrote:
>> Hi!
>>
>>>> I have... quite an interesting setup here.
>>>>
>>>> SMP machine, with special PCI card; that card has GPIOs and serial
>>>> ports. Unfortunately, there's only one interrupt, shared between
>>>> serials and GPIO pins, and serials are way too complex to be handled
>>>> by realtime layer.
>>>>
>>>> So I ended up with
>>>>
>>>>         // we also have an interrupt handler:                                                                                                                 
>>>>         ret = rtdm_irq_request(&my_context->irq_handle,
>>>>         gpio_rt_config.irq, demo_interrupt,
>>>>                                RTDM_IRQTYPE_SHARED,
>>>>         context->device->proc_name, my_context);
>>>>
>>>> and 
>>>>
>>>> static int demo_interrupt(rtdm_irq_t *irq_context)
>>>> {
>>>>         struct demodrv_context *ctx;
>>>>         int           dev_id;
>>>>         int           ret = RTDM_IRQ_HANDLED; // usual return value                                                                                           
>>>>         unsigned pending, output;
>>>>
>>>>         ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
>>>>         dev_id    = ctx->dev_id;
>>>>
>>>>         if (!ctx->ready) {
>>>>                 printk(KERN_CRIT "Unexpected interrupt\n");
>>>>                 return XN_ISR_PROPAGATE;
>>>
>>> Who sets ready and when? Looks racy.
>>
>> Debugging aid; yes, this one is racy.
>>
>>>>         rtdm_lock_put(&ctx->lock);
>>>>  
>>>>         /* We need to propagate the interrupt, so that PMC-6L serials                                                                                         
>>>>            work. Result is that interrupt latencies can't be                                                                                                  
>>>>            guaranteed when serials are in use.  */
>>>>
>>>>          return RTDM_IRQ_HANDLED;
>>>> }
>>>>
>>>> Unregistration is:
>>>>         my_context->ready = 0;
>>>>         rtdm_irq_disable(&my_context->irq_handle);
>>>
>>> Where is rtdm_irq_free? Again, this ready flag looks racy.
>>
>> Aha, sorry, I quoted wrong snippet. rtdm_irq_free() follows
>> immediately, like this:
>>
>> int demo_close_rt(struct rtdm_dev_context   *context,
>>                   rtdm_user_info_t          *user_info)
>> {
>>         struct demodrv_context  *my_context;
>>         rtdm_lockctx_t          lock_ctx;
>>         // get the context                                                                                                                                    
>>         my_context = (struct demodrv_context *)context->dev_private;
>>
>>         // if we need to do some stuff with preemption disabled:                                                                                              
>>         rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);
>>
>>         my_context->ready = 0;
>>         rtdm_irq_disable(&my_context->irq_handle);
>>
>>
>>         // free irq in RTDM                                                                                                                                   
>>         rtdm_irq_free(&my_context->irq_handle);
>>
>>         // destroy our interrupt signal/event                                                                                                                 
>>         rtdm_event_destroy(&my_context->irq_event);
>>
>>         // other stuff here                                                                                                                                   
>>         rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);
>>
>>         return 0;
>> }
>>
>> Now... I'm aware that lock_get/put around irq_free should be
>> unneccessary, as should be irq_disable and my ->ready flag. Those were
>> my attempts to work around the problem. I'll attach the full source at
>> the end.
>>
>>>> Unfortunately, when the userspace app is ran and killed repeatedly (so
>>>> that interrupt is registered/unregistered all the time), I get
>>>> oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
>>>> pointer.
>>>>
>>>> I decided that "wired" interrupt when the source is shared between
>>>> Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
>>>> altogether, but that only moved oops to __virq_end. 
>>>
>>> This is wrong. The only way to get a determistically shared IRQs across
>>> domains is via the wired path, either using the pattern Gilles cited or,
>>> in a slight variation, signaling down via a separate rtdm_nrtsig.
>>
>> For now, I'm trying to get it not to oops; deterministic latencies are
>> the next topic :-(.
> 
> The main issue is that we don't lock our IRQ descriptors (the pipeline
> ones) when running the handlers, so another CPU clearing them via
> ipipe_virtualize_irq() may well sink the boat...
> 
> The unwritten rule has always been to assume that drivers would stop
> _and_ drain interrupts on all CPUs before unregistering handlers, then
> exiting the code. Granted, that's a bit much.

IIRC, we drain at nucleus-level if statistic are enabled. I guess we
should make this unconditional.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-08  8:41       ` Jan Kiszka
@ 2010-10-08  8:57         ` Philippe Gerum
  2010-10-08  9:00           ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-08  8:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Fri, 2010-10-08 at 10:41 +0200, Jan Kiszka wrote:
> Am 08.10.2010 10:17, Philippe Gerum wrote:
> > On Fri, 2010-10-08 at 09:01 +0200, Pavel Machek wrote:
> >> Hi!
> >>
> >>>> I have... quite an interesting setup here.
> >>>>
> >>>> SMP machine, with special PCI card; that card has GPIOs and serial
> >>>> ports. Unfortunately, there's only one interrupt, shared between
> >>>> serials and GPIO pins, and serials are way too complex to be handled
> >>>> by realtime layer.
> >>>>
> >>>> So I ended up with
> >>>>
> >>>>         // we also have an interrupt handler:                                                                                                                 
> >>>>         ret = rtdm_irq_request(&my_context->irq_handle,
> >>>>         gpio_rt_config.irq, demo_interrupt,
> >>>>                                RTDM_IRQTYPE_SHARED,
> >>>>         context->device->proc_name, my_context);
> >>>>
> >>>> and 
> >>>>
> >>>> static int demo_interrupt(rtdm_irq_t *irq_context)
> >>>> {
> >>>>         struct demodrv_context *ctx;
> >>>>         int           dev_id;
> >>>>         int           ret = RTDM_IRQ_HANDLED; // usual return value                                                                                           
> >>>>         unsigned pending, output;
> >>>>
> >>>>         ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
> >>>>         dev_id    = ctx->dev_id;
> >>>>
> >>>>         if (!ctx->ready) {
> >>>>                 printk(KERN_CRIT "Unexpected interrupt\n");
> >>>>                 return XN_ISR_PROPAGATE;
> >>>
> >>> Who sets ready and when? Looks racy.
> >>
> >> Debugging aid; yes, this one is racy.
> >>
> >>>>         rtdm_lock_put(&ctx->lock);
> >>>>  
> >>>>         /* We need to propagate the interrupt, so that PMC-6L serials                                                                                         
> >>>>            work. Result is that interrupt latencies can't be                                                                                                  
> >>>>            guaranteed when serials are in use.  */
> >>>>
> >>>>          return RTDM_IRQ_HANDLED;
> >>>> }
> >>>>
> >>>> Unregistration is:
> >>>>         my_context->ready = 0;
> >>>>         rtdm_irq_disable(&my_context->irq_handle);
> >>>
> >>> Where is rtdm_irq_free? Again, this ready flag looks racy.
> >>
> >> Aha, sorry, I quoted wrong snippet. rtdm_irq_free() follows
> >> immediately, like this:
> >>
> >> int demo_close_rt(struct rtdm_dev_context   *context,
> >>                   rtdm_user_info_t          *user_info)
> >> {
> >>         struct demodrv_context  *my_context;
> >>         rtdm_lockctx_t          lock_ctx;
> >>         // get the context                                                                                                                                    
> >>         my_context = (struct demodrv_context *)context->dev_private;
> >>
> >>         // if we need to do some stuff with preemption disabled:                                                                                              
> >>         rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);
> >>
> >>         my_context->ready = 0;
> >>         rtdm_irq_disable(&my_context->irq_handle);
> >>
> >>
> >>         // free irq in RTDM                                                                                                                                   
> >>         rtdm_irq_free(&my_context->irq_handle);
> >>
> >>         // destroy our interrupt signal/event                                                                                                                 
> >>         rtdm_event_destroy(&my_context->irq_event);
> >>
> >>         // other stuff here                                                                                                                                   
> >>         rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);
> >>
> >>         return 0;
> >> }
> >>
> >> Now... I'm aware that lock_get/put around irq_free should be
> >> unneccessary, as should be irq_disable and my ->ready flag. Those were
> >> my attempts to work around the problem. I'll attach the full source at
> >> the end.
> >>
> >>>> Unfortunately, when the userspace app is ran and killed repeatedly (so
> >>>> that interrupt is registered/unregistered all the time), I get
> >>>> oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> >>>> pointer.
> >>>>
> >>>> I decided that "wired" interrupt when the source is shared between
> >>>> Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> >>>> altogether, but that only moved oops to __virq_end. 
> >>>
> >>> This is wrong. The only way to get a determistically shared IRQs across
> >>> domains is via the wired path, either using the pattern Gilles cited or,
> >>> in a slight variation, signaling down via a separate rtdm_nrtsig.
> >>
> >> For now, I'm trying to get it not to oops; deterministic latencies are
> >> the next topic :-(.
> > 
> > The main issue is that we don't lock our IRQ descriptors (the pipeline
> > ones) when running the handlers, so another CPU clearing them via
> > ipipe_virtualize_irq() may well sink the boat...
> > 
> > The unwritten rule has always been to assume that drivers would stop
> > _and_ drain interrupts on all CPUs before unregistering handlers, then
> > exiting the code. Granted, that's a bit much.
> 
> IIRC, we drain at nucleus-level if statistic are enabled. I guess we
> should make this unconditional.

Draining is currently performed after the descriptor release via
rthal_irq_release() in this code, and it depends on the stat counters to
determine whether the IRQ handler is still running on any CPU it seems.
A saner way would be to define a draining service in the pipeline, and
have rtdm_irq_free() invoke it early.

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-08  8:57         ` Philippe Gerum
@ 2010-10-08  9:00           ` Philippe Gerum
  0 siblings, 0 replies; 51+ messages in thread
From: Philippe Gerum @ 2010-10-08  9:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Fri, 2010-10-08 at 10:57 +0200, Philippe Gerum wrote:
> On Fri, 2010-10-08 at 10:41 +0200, Jan Kiszka wrote:
> > Am 08.10.2010 10:17, Philippe Gerum wrote:
> > > On Fri, 2010-10-08 at 09:01 +0200, Pavel Machek wrote:
> > >> Hi!
> > >>
> > >>>> I have... quite an interesting setup here.
> > >>>>
> > >>>> SMP machine, with special PCI card; that card has GPIOs and serial
> > >>>> ports. Unfortunately, there's only one interrupt, shared between
> > >>>> serials and GPIO pins, and serials are way too complex to be handled
> > >>>> by realtime layer.
> > >>>>
> > >>>> So I ended up with
> > >>>>
> > >>>>         // we also have an interrupt handler:                                                                                                                 
> > >>>>         ret = rtdm_irq_request(&my_context->irq_handle,
> > >>>>         gpio_rt_config.irq, demo_interrupt,
> > >>>>                                RTDM_IRQTYPE_SHARED,
> > >>>>         context->device->proc_name, my_context);
> > >>>>
> > >>>> and 
> > >>>>
> > >>>> static int demo_interrupt(rtdm_irq_t *irq_context)
> > >>>> {
> > >>>>         struct demodrv_context *ctx;
> > >>>>         int           dev_id;
> > >>>>         int           ret = RTDM_IRQ_HANDLED; // usual return value                                                                                           
> > >>>>         unsigned pending, output;
> > >>>>
> > >>>>         ctx = rtdm_irq_get_arg(irq_context, struct demodrv_context);
> > >>>>         dev_id    = ctx->dev_id;
> > >>>>
> > >>>>         if (!ctx->ready) {
> > >>>>                 printk(KERN_CRIT "Unexpected interrupt\n");
> > >>>>                 return XN_ISR_PROPAGATE;
> > >>>
> > >>> Who sets ready and when? Looks racy.
> > >>
> > >> Debugging aid; yes, this one is racy.
> > >>
> > >>>>         rtdm_lock_put(&ctx->lock);
> > >>>>  
> > >>>>         /* We need to propagate the interrupt, so that PMC-6L serials                                                                                         
> > >>>>            work. Result is that interrupt latencies can't be                                                                                                  
> > >>>>            guaranteed when serials are in use.  */
> > >>>>
> > >>>>          return RTDM_IRQ_HANDLED;
> > >>>> }
> > >>>>
> > >>>> Unregistration is:
> > >>>>         my_context->ready = 0;
> > >>>>         rtdm_irq_disable(&my_context->irq_handle);
> > >>>
> > >>> Where is rtdm_irq_free? Again, this ready flag looks racy.
> > >>
> > >> Aha, sorry, I quoted wrong snippet. rtdm_irq_free() follows
> > >> immediately, like this:
> > >>
> > >> int demo_close_rt(struct rtdm_dev_context   *context,
> > >>                   rtdm_user_info_t          *user_info)
> > >> {
> > >>         struct demodrv_context  *my_context;
> > >>         rtdm_lockctx_t          lock_ctx;
> > >>         // get the context                                                                                                                                    
> > >>         my_context = (struct demodrv_context *)context->dev_private;
> > >>
> > >>         // if we need to do some stuff with preemption disabled:                                                                                              
> > >>         rtdm_lock_get_irqsave(&my_context->lock, lock_ctx);
> > >>
> > >>         my_context->ready = 0;
> > >>         rtdm_irq_disable(&my_context->irq_handle);
> > >>
> > >>
> > >>         // free irq in RTDM                                                                                                                                   
> > >>         rtdm_irq_free(&my_context->irq_handle);
> > >>
> > >>         // destroy our interrupt signal/event                                                                                                                 
> > >>         rtdm_event_destroy(&my_context->irq_event);
> > >>
> > >>         // other stuff here                                                                                                                                   
> > >>         rtdm_lock_put_irqrestore(&my_context->lock, lock_ctx);
> > >>
> > >>         return 0;
> > >> }
> > >>
> > >> Now... I'm aware that lock_get/put around irq_free should be
> > >> unneccessary, as should be irq_disable and my ->ready flag. Those were
> > >> my attempts to work around the problem. I'll attach the full source at
> > >> the end.
> > >>
> > >>>> Unfortunately, when the userspace app is ran and killed repeatedly (so
> > >>>> that interrupt is registered/unregistered all the time), I get
> > >>>> oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> > >>>> pointer.
> > >>>>
> > >>>> I decided that "wired" interrupt when the source is shared between
> > >>>> Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> > >>>> altogether, but that only moved oops to __virq_end. 
> > >>>
> > >>> This is wrong. The only way to get a determistically shared IRQs across
> > >>> domains is via the wired path, either using the pattern Gilles cited or,
> > >>> in a slight variation, signaling down via a separate rtdm_nrtsig.
> > >>
> > >> For now, I'm trying to get it not to oops; deterministic latencies are
> > >> the next topic :-(.
> > > 
> > > The main issue is that we don't lock our IRQ descriptors (the pipeline
> > > ones) when running the handlers, so another CPU clearing them via
> > > ipipe_virtualize_irq() may well sink the boat...
> > > 
> > > The unwritten rule has always been to assume that drivers would stop
> > > _and_ drain interrupts on all CPUs before unregistering handlers, then
> > > exiting the code. Granted, that's a bit much.
> > 
> > IIRC, we drain at nucleus-level if statistic are enabled. I guess we
> > should make this unconditional.
> 
> Draining is currently performed after the descriptor release via
> rthal_irq_release() in this code, and it depends on the stat counters to
> determine whether the IRQ handler is still running on any CPU it seems.
> A saner way would be to define a draining service in the pipeline, and
> have rtdm_irq_free() invoke it early.

s,rtdm_irq_free,xnintr_detach,

> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-08  7:01   ` Pavel Machek
  2010-10-08  7:20     ` Gilles Chanteperdrix
  2010-10-08  8:17     ` Philippe Gerum
@ 2010-10-08  9:41     ` Philippe Gerum
  2010-10-13  9:03       ` Pavel Machek
  2 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-08  9:41 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jan Kiszka, xenomai

On Fri, 2010-10-08 at 09:01 +0200, Pavel Machek wrote:
> Hi!

[snip]

> Now... I'm aware that lock_get/put around irq_free should be
> unneccessary, as should be irq_disable and my ->ready flag. Those were
> my attempts to work around the problem. I'll attach the full source at
> the end.
> 
> > > Unfortunately, when the userspace app is ran and killed repeatedly (so
> > > that interrupt is registered/unregistered all the time), I get
> > > oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> > > pointer.
> > > 
> > > I decided that "wired" interrupt when the source is shared between
> > > Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> > > altogether, but that only moved oops to __virq_end. 
> > 
> > This is wrong. The only way to get a determistically shared IRQs across
> > domains is via the wired path, either using the pattern Gilles cited or,
> > in a slight variation, signaling down via a separate rtdm_nrtsig.
> 
> For now, I'm trying to get it not to oops; deterministic latencies are
> the next topic :-(.

Could you try the patches below? Absolutely untested as required by the
breaktiquette, but might help. Hopefully.

This series is on top of I-pipe 2.6.27-x86:

diff --git a/include/linux/ipipe.h b/include/linux/ipipe.h
index a12f431..b0e59e6 100644
--- a/include/linux/ipipe.h
+++ b/include/linux/ipipe.h
@@ -353,6 +353,15 @@ int ipipe_register_domain(struct ipipe_domain *ipd,
 
 int ipipe_unregister_domain(struct ipipe_domain *ipd);
 
+#ifdef CONFIG_SMP
+void ipipe_drain_interrupt(struct ipipe_domain *ipd,
+			   unsigned int irq);
+#else
+static inline void ipipe_drain_interrupt(struct ipipe_domain *ipd,
+					 unsigned int irq)
+{}
+#endif
+
 void ipipe_suspend_domain(void);
 
 int ipipe_virtualize_irq(struct ipipe_domain *ipd,
diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c
index 97e6aae..ac4addc 100644
--- a/kernel/ipipe/core.c
+++ b/kernel/ipipe/core.c
@@ -1250,6 +1250,32 @@ int ipipe_unregister_domain(struct ipipe_domain
*ipd)
 	return 0;
 }
 
+void ipipe_drain_interrupt(struct ipipe_domain *ipd, unsigned int irq)
+{
+	struct ipipe_percpu_domain_data *p;
+	unsigned long flags;
+	int cpu;
+
+	flags = ipipe_critical_enter(NULL);
+	clear_bit(IPIPE_HANDLE_FLAG, &ipd->irqs[irq].control);
+	clear_bit(IPIPE_STICKY_FLAG, &ipd->irqs[irq].control);
+	set_bit(IPIPE_PASS_FLAG, &ipd->irqs[irq].control);
+	ipipe_critical_exit(flags);
+
+	for_each_online_cpu(cpu) {
+		local_irq_save_hw(flags);
+		p = ipipe_percpudom_ptr(ipd, cpu);
+		do {
+			local_irq_restore_hw(flags);
+			cpu_relax();
+			local_irq_save_hw(flags);
+		} while (test_bit(irq, p->irqpend_lomask) ||
+			 test_bit(IPIPE_STALL_FLAG, &p->status));
+		local_irq_restore_hw(flags);
+	}
+}
+EXPORT_SYMBOL_GPL(ipipe_drain_interrupt);
+

This series goes on top of Xenomai 2.5.x:

 /*
  * ipipe_propagate_irq() -- Force a given IRQ propagation on behalf of
  * a running interrupt handler to the next domain down the pipeline.
diff --git a/include/asm-generic/bits/intr.h
b/include/asm-generic/bits/intr.h
index 19b35d6..9452932 100644
--- a/include/asm-generic/bits/intr.h
+++ b/include/asm-generic/bits/intr.h
@@ -57,6 +57,11 @@ static inline void xnarch_chain_irq (unsigned irq)
     rthal_irq_host_pend(irq);
 }
 
+static inline void xnarch_drain_irq(unsigned irq)
+{
+    rthal_irq_drain(irq);
+}
+
 static inline xnarch_cpumask_t xnarch_set_irq_affinity (unsigned irq,
 							xnarch_cpumask_t affinity)
 {
diff --git a/include/asm-generic/hal.h b/include/asm-generic/hal.h
index 34f8ea1..81aef5e 100644
--- a/include/asm-generic/hal.h
+++ b/include/asm-generic/hal.h
@@ -504,6 +504,11 @@ int rthal_irq_disable(unsigned irq);
 
 int rthal_irq_end(unsigned irq);
 
+static inline void rthal_irq_drain(unsigned int irq)
+{
+	ipipe_drain_interrupt(&rthal_domain, irq);
+}
+
 int rthal_irq_host_request(unsigned irq,
                            rthal_irq_host_handler_t handler,
 			   char *name,
diff --git a/ksrc/nucleus/intr.c b/ksrc/nucleus/intr.c
index a6de4ea..1b79874 100644
--- a/ksrc/nucleus/intr.c
+++ b/ksrc/nucleus/intr.c
@@ -792,6 +792,8 @@ int xnintr_detach(xnintr_t *intr)
 		goto out;
 	}
 
+	xnarch_drain_irq(intr->irq);
+
 	__clrbits(intr->flags, XN_ISR_ATTACHED);
 
 	ret = xnintr_irq_detach(intr);

-- 
Philippe.




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-08  9:41     ` Philippe Gerum
@ 2010-10-13  9:03       ` Pavel Machek
  2010-10-13  9:16         ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2010-10-13  9:03 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Hi!

> > Now... I'm aware that lock_get/put around irq_free should be
> > unneccessary, as should be irq_disable and my ->ready flag. Those were
> > my attempts to work around the problem. I'll attach the full source at
> > the end.
> > 
> > > > Unfortunately, when the userspace app is ran and killed repeatedly (so
> > > > that interrupt is registered/unregistered all the time), I get
> > > > oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> > > > pointer.
> > > > 
> > > > I decided that "wired" interrupt when the source is shared between
> > > > Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> > > > altogether, but that only moved oops to __virq_end. 
> > > 
> > > This is wrong. The only way to get a determistically shared IRQs across
> > > domains is via the wired path, either using the pattern Gilles cited or,
> > > in a slight variation, signaling down via a separate rtdm_nrtsig.
> > 
> > For now, I'm trying to get it not to oops; deterministic latencies are
> > the next topic :-(.
> 
> Could you try the patches below? Absolutely untested as required by the
> breaktiquette, but might help. Hopefully.

Thanks!

I hope I applied the patch correctly, it was slightly
tricky... Unfortunately, it causes hang at first attempt to kill
realtime task :-(.

> @@ -1250,6 +1250,32 @@ int ipipe_unregister_domain(struct ipipe_domain
> *ipd)
>  	return 0;
>  }
>  
> +void ipipe_drain_interrupt(struct ipipe_domain *ipd, unsigned int irq)
> +{
> +	struct ipipe_percpu_domain_data *p;
> +	unsigned long flags;
> +	int cpu;
> +
> +	flags = ipipe_critical_enter(NULL);
> +	clear_bit(IPIPE_HANDLE_FLAG, &ipd->irqs[irq].control);
> +	clear_bit(IPIPE_STICKY_FLAG, &ipd->irqs[irq].control);
> +	set_bit(IPIPE_PASS_FLAG, &ipd->irqs[irq].control);
> +	ipipe_critical_exit(flags);
> +
> +	for_each_online_cpu(cpu) {
> +		local_irq_save_hw(flags);
> +		p = ipipe_percpudom_ptr(ipd, cpu);
> +		do {
> +			local_irq_restore_hw(flags);
> +			cpu_relax();
> +			local_irq_save_hw(flags);
> +		} while (test_bit(irq, p->irqpend_lomask) ||
> +			 test_bit(IPIPE_STALL_FLAG, &p->status));
> +		local_irq_restore_hw(flags);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(ipipe_drain_interrupt);
> +

This is the interesting place, right? I'll stick printks there and see
what happens.

								Pavel


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-13  9:03       ` Pavel Machek
@ 2010-10-13  9:16         ` Philippe Gerum
  2010-10-13  9:26           ` Pavel Machek
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-13  9:16 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jan Kiszka, xenomai

On Wed, 2010-10-13 at 11:03 +0200, Pavel Machek wrote:
> Hi!
> 
> > > Now... I'm aware that lock_get/put around irq_free should be
> > > unneccessary, as should be irq_disable and my ->ready flag. Those were
> > > my attempts to work around the problem. I'll attach the full source at
> > > the end.
> > > 
> > > > > Unfortunately, when the userspace app is ran and killed repeatedly (so
> > > > > that interrupt is registered/unregistered all the time), I get
> > > > > oopses in __ipipe_dispatch_wired() -- it seems to call into the NULL
> > > > > pointer.
> > > > > 
> > > > > I decided that "wired" interrupt when the source is shared between
> > > > > Linux and Xenomai, is wrong thing, so I disable "wired" interrupts
> > > > > altogether, but that only moved oops to __virq_end. 
> > > > 
> > > > This is wrong. The only way to get a determistically shared IRQs across
> > > > domains is via the wired path, either using the pattern Gilles cited or,
> > > > in a slight variation, signaling down via a separate rtdm_nrtsig.
> > > 
> > > For now, I'm trying to get it not to oops; deterministic latencies are
> > > the next topic :-(.
> > 
> > Could you try the patches below? Absolutely untested as required by the
> > breaktiquette, but might help. Hopefully.
> 
> Thanks!
> 
> I hope I applied the patch correctly, it was slightly
> tricky... Unfortunately, it causes hang at first attempt to kill
> realtime task :-(.

Ah. Weird. The IRQ drain code is supposed to be called only when an
interrupt is unregistered from the Xenomai layer, which is not directly
related to killing a task. Maybe some of the task cleanup code involves
unregistering IRQs though, so this might explain.

> 
> > @@ -1250,6 +1250,32 @@ int ipipe_unregister_domain(struct ipipe_domain
> > *ipd)
> >  	return 0;
> >  }
> >  
> > +void ipipe_drain_interrupt(struct ipipe_domain *ipd, unsigned int irq)
> > +{
> > +	struct ipipe_percpu_domain_data *p;
> > +	unsigned long flags;
> > +	int cpu;
> > +
> > +	flags = ipipe_critical_enter(NULL);
> > +	clear_bit(IPIPE_HANDLE_FLAG, &ipd->irqs[irq].control);
> > +	clear_bit(IPIPE_STICKY_FLAG, &ipd->irqs[irq].control);
> > +	set_bit(IPIPE_PASS_FLAG, &ipd->irqs[irq].control);
> > +	ipipe_critical_exit(flags);
> > +
> > +	for_each_online_cpu(cpu) {
> > +		local_irq_save_hw(flags);
> > +		p = ipipe_percpudom_ptr(ipd, cpu);
> > +		do {
> > +			local_irq_restore_hw(flags);
> > +			cpu_relax();
> > +			local_irq_save_hw(flags);
> > +		} while (test_bit(irq, p->irqpend_lomask) ||
> > +			 test_bit(IPIPE_STALL_FLAG, &p->status));
> > +		local_irq_restore_hw(flags);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(ipipe_drain_interrupt);
> > +
> 
> This is the interesting place, right?

Yes. I messed up. We need the test on the STALL bit to make 100% sure
that we are not about clearing the function pointer under the feet of
the wired IRQ dispatcher, but at the same time, we enter the drain
routine with the STALL bit set for the target domain. So this code is
plain silly (patent not granted, way too much prior art).

As a quick check, you could try removing the second test_bit from the
end loop condition.

>  I'll stick printks there and see
> what happens.
> 
> 								Pavel

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-13  9:16         ` Philippe Gerum
@ 2010-10-13  9:26           ` Pavel Machek
  2010-10-13 14:52             ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2010-10-13  9:26 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Hi!

> > Thanks!
> > 
> > I hope I applied the patch correctly, it was slightly
> > tricky... Unfortunately, it causes hang at first attempt to kill
> > realtime task :-(.
> 
> Ah. Weird. The IRQ drain code is supposed to be called only when an
> interrupt is unregistered from the Xenomai layer, which is not directly
> related to killing a task. Maybe some of the task cleanup code involves
> unregistering IRQs though, so this might explain.

Task cleanup definitely unregisters IRQ, yes.

> > > +void ipipe_drain_interrupt(struct ipipe_domain *ipd, unsigned int irq)
> > > +{
> > > +	struct ipipe_percpu_domain_data *p;
> > > +	unsigned long flags;
> > > +	int cpu;
> > > +
> > > +	flags = ipipe_critical_enter(NULL);
> > > +	clear_bit(IPIPE_HANDLE_FLAG, &ipd->irqs[irq].control);
> > > +	clear_bit(IPIPE_STICKY_FLAG, &ipd->irqs[irq].control);
> > > +	set_bit(IPIPE_PASS_FLAG, &ipd->irqs[irq].control);
> > > +	ipipe_critical_exit(flags);
> > > +
> > > +	for_each_online_cpu(cpu) {
> > > +		local_irq_save_hw(flags);
> > > +		p = ipipe_percpudom_ptr(ipd, cpu);
> > > +		do {
> > > +			local_irq_restore_hw(flags);
> > > +			cpu_relax();
> > > +			local_irq_save_hw(flags);
> > > +		} while (test_bit(irq, p->irqpend_lomask) ||
> > > +			 test_bit(IPIPE_STALL_FLAG, &p->status));
> > > +		local_irq_restore_hw(flags);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL_GPL(ipipe_drain_interrupt);
> > > +
> > 
> > This is the interesting place, right?
> 
> Yes. I messed up. We need the test on the STALL bit to make 100% sure
> that we are not about clearing the function pointer under the feet of
> the wired IRQ dispatcher, but at the same time, we enter the drain
> routine with the STALL bit set for the target domain. So this code is
> plain silly (patent not granted, way too much prior art).

:-).

> As a quick check, you could try removing the second test_bit from the
> end loop condition.

Yes, with IPIPE_STALL_FLAG test removed it "works", but I assume that
at that point it is placebo patch ;-).

Should we test IPIPE_STALL_FLAG on all but current CPUs?
								Pavel

								


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-13  9:26           ` Pavel Machek
@ 2010-10-13 14:52             ` Philippe Gerum
  2010-10-25 16:48               ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-13 14:52 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jan Kiszka, xenomai

On Wed, 2010-10-13 at 11:26 +0200, Pavel Machek wrote:
> Hi!
> 
> > > Thanks!
> > > 
> > > I hope I applied the patch correctly, it was slightly
> > > tricky... Unfortunately, it causes hang at first attempt to kill
> > > realtime task :-(.
> > 
> > Ah. Weird. The IRQ drain code is supposed to be called only when an
> > interrupt is unregistered from the Xenomai layer, which is not directly
> > related to killing a task. Maybe some of the task cleanup code involves
> > unregistering IRQs though, so this might explain.
> 
> Task cleanup definitely unregisters IRQ, yes.
> 
> > > > +void ipipe_drain_interrupt(struct ipipe_domain *ipd, unsigned int irq)
> > > > +{
> > > > +	struct ipipe_percpu_domain_data *p;
> > > > +	unsigned long flags;
> > > > +	int cpu;
> > > > +
> > > > +	flags = ipipe_critical_enter(NULL);
> > > > +	clear_bit(IPIPE_HANDLE_FLAG, &ipd->irqs[irq].control);
> > > > +	clear_bit(IPIPE_STICKY_FLAG, &ipd->irqs[irq].control);
> > > > +	set_bit(IPIPE_PASS_FLAG, &ipd->irqs[irq].control);
> > > > +	ipipe_critical_exit(flags);
> > > > +
> > > > +	for_each_online_cpu(cpu) {
> > > > +		local_irq_save_hw(flags);
> > > > +		p = ipipe_percpudom_ptr(ipd, cpu);
> > > > +		do {
> > > > +			local_irq_restore_hw(flags);
> > > > +			cpu_relax();
> > > > +			local_irq_save_hw(flags);
> > > > +		} while (test_bit(irq, p->irqpend_lomask) ||
> > > > +			 test_bit(IPIPE_STALL_FLAG, &p->status));
> > > > +		local_irq_restore_hw(flags);
> > > > +	}
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(ipipe_drain_interrupt);
> > > > +
> > > 
> > > This is the interesting place, right?
> > 
> > Yes. I messed up. We need the test on the STALL bit to make 100% sure
> > that we are not about clearing the function pointer under the feet of
> > the wired IRQ dispatcher, but at the same time, we enter the drain
> > routine with the STALL bit set for the target domain. So this code is
> > plain silly (patent not granted, way too much prior art).
> 
> :-).
> 
> > As a quick check, you could try removing the second test_bit from the
> > end loop condition.
> 
> Yes, with IPIPE_STALL_FLAG test removed it "works", but I assume that
> at that point it is placebo patch ;-).

Basically, yes, since we have problems with wired IRQs, so...

> 
> Should we test IPIPE_STALL_FLAG on all but current CPUs?

That would solve this particular issue, but we should drain the pipeline
out of any Xenomai critical section. The way it is done now may induce a
deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
for CPU0 to release the Xenomai lock that annoys us right now).

I'll come up with something hopefully better and tested in the next
days.

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-13 14:52             ` Philippe Gerum
@ 2010-10-25 16:48               ` Philippe Gerum
  2010-10-25 18:10                 ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-25 16:48 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jan Kiszka, xenomai

On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
> > 
> > Should we test IPIPE_STALL_FLAG on all but current CPUs?
> 
> That would solve this particular issue, but we should drain the pipeline
> out of any Xenomai critical section. The way it is done now may induce a
> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
> for CPU0 to release the Xenomai lock that annoys us right now).
> 
> I'll come up with something hopefully better and tested in the next
> days.
> 

Sorry for the lag. In case that helps, here is another approach, based
on telling the pipeline to ignore the irq about to be detached, so that
it passes all further occurrences down to the next domain, without
handling any on the target domain anymore. Once this operation is done,
we are safe, and no handler for this interrupt could run in parallel on
other CPUs for the given domain, while we attempt to clear it in the
domain's IRQ descriptor. Said differently, the inserted code tells the
pipeline to skip the target domain when handling the interrupt.

For this to work, we have to ignore the irq before we enter the irqs off
section which disables the interrupt descriptor in Xenomai (breaking the
section across the IRQ unregistration call - i.e. xnarch_release_irq() -
would introduce a race). Patches against 2.6.35.7 and Xenomai 2.5.5.2,
but the backport should be reasonably straightforward.

HTH,

Cherry-pick these commits for the I-pipe part:
http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=b786b2c5963d5884d60f265a6efc40276d8487e2
http://git.denx.de/?p=ipipe-2.6.git;a=commit;h=ca5f69024fec66c7b25d9b5b6507501c4a47c3a9

And the rest is on top of Xenomai 2.5.5.2:
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=737e3bb09e2364658cfc59a0037f5e39a81fb799
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=5ee600768743906342a350ecbca51a01ca5689b4

-- 
Philippe.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 16:48               ` Philippe Gerum
@ 2010-10-25 18:10                 ` Jan Kiszka
  2010-10-25 19:08                   ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-25 18:10 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1223 bytes --]

Am 25.10.2010 18:48, Philippe Gerum wrote:
> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
>>>
>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
>>
>> That would solve this particular issue, but we should drain the pipeline
>> out of any Xenomai critical section. The way it is done now may induce a
>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
>> for CPU0 to release the Xenomai lock that annoys us right now).
>>
>> I'll come up with something hopefully better and tested in the next
>> days.
>>
> 
> Sorry for the lag. In case that helps, here is another approach, based
> on telling the pipeline to ignore the irq about to be detached, so that
> it passes all further occurrences down to the next domain, without

Err, won't this irritate that next domain, ie. won't Linux dump warnings
about a spurious/unhandled IRQ? I think either the old handler shall
receive the last event or no one.

Why this complex solution, why not simply draining (via critical_enter
or whatever) - but _after_ xnintr_irq_detach, ie. while the related
resources are still valid?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 18:10                 ` Jan Kiszka
@ 2010-10-25 19:08                   ` Philippe Gerum
  2010-10-25 19:11                     ` Philippe Gerum
  2010-10-25 19:15                     ` Jan Kiszka
  0 siblings, 2 replies; 51+ messages in thread
From: Philippe Gerum @ 2010-10-25 19:08 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
> Am 25.10.2010 18:48, Philippe Gerum wrote:
> > On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
> >>>
> >>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
> >>
> >> That would solve this particular issue, but we should drain the pipeline
> >> out of any Xenomai critical section. The way it is done now may induce a
> >> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
> >> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
> >> for CPU0 to release the Xenomai lock that annoys us right now).
> >>
> >> I'll come up with something hopefully better and tested in the next
> >> days.
> >>
> > 
> > Sorry for the lag. In case that helps, here is another approach, based
> > on telling the pipeline to ignore the irq about to be detached, so that
> > it passes all further occurrences down to the next domain, without
> 
> Err, won't this irritate that next domain, ie. won't Linux dump warnings
> about a spurious/unhandled IRQ? I think either the old handler shall
> receive the last event or no one.

Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
you that guarantee. You are supposed to have disabled the irq line
before detaching, and critical IPIs cannot be acknowledged until all
CPUs have re-enabled interrupts at some point. Therefore, there are only
two scenarii:

- irq was disabled before delivery, and a pending interrupt is masked by
the PIC and never delivered to the CPU.

- an interrupt sneaked in before disabling, it is currently processed by
the pipeline in the low handler on some CPU, in which case interrupts
are off, so a critical IPI could be acked yet, and the irq mode bits
still allow dispatching to the target domain on that CPU. The assumption
which is happily made is that only head domains are interested in
un-virtualizing irqs, so the dispatch will happen immediately, while the
handler is still valid (actually, we are not allowed to un-virtualize
root irqs, and intermediate Adeos domains are already considered as
endangered species, so this is fine).

> 
> Why this complex solution, why not simply draining (via critical_enter
> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
> resources are still valid?
> 

Because it's already too late. You have cleared the handler pointer when
un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
the log syncer on another CPU could then branch to eip $0.

And the solution is - reasonably - complex because xnintr_detach has
quite a few inter-deps. Typically, you may not drop the lock Xenomai
holds on the irq descriptor before calling xnarch_release_irq, to avoid
a race with xnintr_irq_handler in SMP (you could get a NULL cookie
there).

I would have preferred to have ipipe_virtualize_irq drain the
interrupts, but you just can't rely on a critical IPI while holding a
lock other CPUs might spin on irqs off. And you do need this code to
happen in a critical enter section, to act as a barrier wrt IRQ
dispatching. So the operation is unfold, the irq barrier first with irqs
on, then un-virtualizing the irq (for the relevant domain) with irqs
off.

> Jan
> 

-- 
Philippe.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 19:08                   ` Philippe Gerum
@ 2010-10-25 19:11                     ` Philippe Gerum
  2010-10-25 19:15                     ` Jan Kiszka
  1 sibling, 0 replies; 51+ messages in thread
From: Philippe Gerum @ 2010-10-25 19:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Mon, 2010-10-25 at 21:08 +0200, Philippe Gerum wrote:
> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
> > Am 25.10.2010 18:48, Philippe Gerum wrote:
> > > On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
> > >>>
> > >>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
> > >>
> > >> That would solve this particular issue, but we should drain the pipeline
> > >> out of any Xenomai critical section. The way it is done now may induce a
> > >> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
> > >> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
> > >> for CPU0 to release the Xenomai lock that annoys us right now).
> > >>
> > >> I'll come up with something hopefully better and tested in the next
> > >> days.
> > >>
> > > 
> > > Sorry for the lag. In case that helps, here is another approach, based
> > > on telling the pipeline to ignore the irq about to be detached, so that
> > > it passes all further occurrences down to the next domain, without
> > 
> > Err, won't this irritate that next domain, ie. won't Linux dump warnings
> > about a spurious/unhandled IRQ? I think either the old handler shall
> > receive the last event or no one.
> 
> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
> you that guarantee. You are supposed to have disabled the irq line
> before detaching, and critical IPIs cannot be acknowledged until all
> CPUs have re-enabled interrupts at some point. Therefore, there are only
> two scenarii:
> 
> - irq was disabled before delivery, and a pending interrupt is masked by
> the PIC and never delivered to the CPU.
> 
> - an interrupt sneaked in before disabling, it is currently processed by
> the pipeline in the low handler on some CPU, in which case interrupts
> are off, so a critical IPI could

s,,not,

>  be acked yet, and the irq mode bits
> still allow dispatching to the target domain on that CPU. The assumption
> which is happily made is that only head domains are interested in
> un-virtualizing irqs, so the dispatch will happen immediately, while the
> handler is still valid (actually, we are not allowed to un-virtualize
> root irqs, and intermediate Adeos domains are already considered as
> endangered species, so this is fine).
> 
> > 
> > Why this complex solution, why not simply draining (via critical_enter
> > or whatever) - but _after_ xnintr_irq_detach, ie. while the related
> > resources are still valid?
> > 
> 
> Because it's already too late. You have cleared the handler pointer when
> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
> the log syncer on another CPU could then branch to eip $0.
> 
> And the solution is - reasonably - complex because xnintr_detach has
> quite a few inter-deps. Typically, you may not drop the lock Xenomai
> holds on the irq descriptor before calling xnarch_release_irq, to avoid
> a race with xnintr_irq_handler in SMP (you could get a NULL cookie
> there).
> 
> I would have preferred to have ipipe_virtualize_irq drain the
> interrupts, but you just can't rely on a critical IPI while holding a
> lock other CPUs might spin on irqs off. And you do need this code to
> happen in a critical enter section, to act as a barrier wrt IRQ
> dispatching. So the operation is unfold, the irq barrier first with irqs
> on, then un-virtualizing the irq (for the relevant domain) with irqs
> off.
> 
> > Jan
> > 
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 19:08                   ` Philippe Gerum
  2010-10-25 19:11                     ` Philippe Gerum
@ 2010-10-25 19:15                     ` Jan Kiszka
  2010-10-25 19:20                       ` Philippe Gerum
  1 sibling, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-25 19:15 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 3510 bytes --]

Am 25.10.2010 21:08, Philippe Gerum wrote:
> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
>> Am 25.10.2010 18:48, Philippe Gerum wrote:
>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
>>>>>
>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
>>>>
>>>> That would solve this particular issue, but we should drain the pipeline
>>>> out of any Xenomai critical section. The way it is done now may induce a
>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
>>>> for CPU0 to release the Xenomai lock that annoys us right now).
>>>>
>>>> I'll come up with something hopefully better and tested in the next
>>>> days.
>>>>
>>>
>>> Sorry for the lag. In case that helps, here is another approach, based
>>> on telling the pipeline to ignore the irq about to be detached, so that
>>> it passes all further occurrences down to the next domain, without
>>
>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
>> about a spurious/unhandled IRQ? I think either the old handler shall
>> receive the last event or no one.
> 
> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
> you that guarantee. You are supposed to have disabled the irq line
> before detaching, and critical IPIs cannot be acknowledged until all
> CPUs have re-enabled interrupts at some point. Therefore, there are only
> two scenarii:
> 
> - irq was disabled before delivery, and a pending interrupt is masked by
> the PIC and never delivered to the CPU.
> 
> - an interrupt sneaked in before disabling, it is currently processed by
> the pipeline in the low handler on some CPU, in which case interrupts
> are off, so a critical IPI could be acked yet, and the irq mode bits
> still allow dispatching to the target domain on that CPU. The assumption
> which is happily made is that only head domains are interested in
> un-virtualizing irqs, so the dispatch will happen immediately, while the
> handler is still valid (actually, we are not allowed to un-virtualize
> root irqs, and intermediate Adeos domains are already considered as
> endangered species, so this is fine).
> 
>>
>> Why this complex solution, why not simply draining (via critical_enter
>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
>> resources are still valid?
>>
> 
> Because it's already too late. You have cleared the handler pointer when
> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
> the log syncer on another CPU could then branch to eip $0.

Just make ipipe_virtualize_irq install a nop handler instead of NULL.

Jan

> 
> And the solution is - reasonably - complex because xnintr_detach has
> quite a few inter-deps. Typically, you may not drop the lock Xenomai
> holds on the irq descriptor before calling xnarch_release_irq, to avoid
> a race with xnintr_irq_handler in SMP (you could get a NULL cookie
> there).
> 
> I would have preferred to have ipipe_virtualize_irq drain the
> interrupts, but you just can't rely on a critical IPI while holding a
> lock other CPUs might spin on irqs off. And you do need this code to
> happen in a critical enter section, to act as a barrier wrt IRQ
> dispatching. So the operation is unfold, the irq barrier first with irqs
> on, then un-virtualizing the irq (for the relevant domain) with irqs
> off.
> 
>> Jan
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 19:15                     ` Jan Kiszka
@ 2010-10-25 19:20                       ` Philippe Gerum
  2010-10-25 19:22                         ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-25 19:20 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
> Am 25.10.2010 21:08, Philippe Gerum wrote:
> > On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
> >> Am 25.10.2010 18:48, Philippe Gerum wrote:
> >>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
> >>>>>
> >>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
> >>>>
> >>>> That would solve this particular issue, but we should drain the pipeline
> >>>> out of any Xenomai critical section. The way it is done now may induce a
> >>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
> >>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
> >>>> for CPU0 to release the Xenomai lock that annoys us right now).
> >>>>
> >>>> I'll come up with something hopefully better and tested in the next
> >>>> days.
> >>>>
> >>>
> >>> Sorry for the lag. In case that helps, here is another approach, based
> >>> on telling the pipeline to ignore the irq about to be detached, so that
> >>> it passes all further occurrences down to the next domain, without
> >>
> >> Err, won't this irritate that next domain, ie. won't Linux dump warnings
> >> about a spurious/unhandled IRQ? I think either the old handler shall
> >> receive the last event or no one.
> > 
> > Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
> > you that guarantee. You are supposed to have disabled the irq line
> > before detaching, and critical IPIs cannot be acknowledged until all
> > CPUs have re-enabled interrupts at some point. Therefore, there are only
> > two scenarii:
> > 
> > - irq was disabled before delivery, and a pending interrupt is masked by
> > the PIC and never delivered to the CPU.
> > 
> > - an interrupt sneaked in before disabling, it is currently processed by
> > the pipeline in the low handler on some CPU, in which case interrupts
> > are off, so a critical IPI could be acked yet, and the irq mode bits
> > still allow dispatching to the target domain on that CPU. The assumption
> > which is happily made is that only head domains are interested in
> > un-virtualizing irqs, so the dispatch will happen immediately, while the
> > handler is still valid (actually, we are not allowed to un-virtualize
> > root irqs, and intermediate Adeos domains are already considered as
> > endangered species, so this is fine).
> > 
> >>
> >> Why this complex solution, why not simply draining (via critical_enter
> >> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
> >> resources are still valid?
> >>
> > 
> > Because it's already too late. You have cleared the handler pointer when
> > un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
> > the log syncer on another CPU could then branch to eip $0.
> 
> Just make ipipe_virtualize_irq install a nop handler instead of NULL.

This does not solve the issue of the last interrupt which should be
processed. You don't want to miss it.

> 
> Jan
> 
> > 
> > And the solution is - reasonably - complex because xnintr_detach has
> > quite a few inter-deps. Typically, you may not drop the lock Xenomai
> > holds on the irq descriptor before calling xnarch_release_irq, to avoid
> > a race with xnintr_irq_handler in SMP (you could get a NULL cookie
> > there).
> > 
> > I would have preferred to have ipipe_virtualize_irq drain the
> > interrupts, but you just can't rely on a critical IPI while holding a
> > lock other CPUs might spin on irqs off. And you do need this code to
> > happen in a critical enter section, to act as a barrier wrt IRQ
> > dispatching. So the operation is unfold, the irq barrier first with irqs
> > on, then un-virtualizing the irq (for the relevant domain) with irqs
> > off.
> > 
> >> Jan
> >>
> > 
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 19:20                       ` Philippe Gerum
@ 2010-10-25 19:22                         ` Jan Kiszka
  2010-10-25 21:12                           ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-25 19:22 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 3251 bytes --]

Am 25.10.2010 21:20, Philippe Gerum wrote:
> On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
>> Am 25.10.2010 21:08, Philippe Gerum wrote:
>>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
>>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
>>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
>>>>>>>
>>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
>>>>>>
>>>>>> That would solve this particular issue, but we should drain the pipeline
>>>>>> out of any Xenomai critical section. The way it is done now may induce a
>>>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
>>>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
>>>>>> for CPU0 to release the Xenomai lock that annoys us right now).
>>>>>>
>>>>>> I'll come up with something hopefully better and tested in the next
>>>>>> days.
>>>>>>
>>>>>
>>>>> Sorry for the lag. In case that helps, here is another approach, based
>>>>> on telling the pipeline to ignore the irq about to be detached, so that
>>>>> it passes all further occurrences down to the next domain, without
>>>>
>>>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
>>>> about a spurious/unhandled IRQ? I think either the old handler shall
>>>> receive the last event or no one.
>>>
>>> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
>>> you that guarantee. You are supposed to have disabled the irq line
>>> before detaching, and critical IPIs cannot be acknowledged until all
>>> CPUs have re-enabled interrupts at some point. Therefore, there are only
>>> two scenarii:
>>>
>>> - irq was disabled before delivery, and a pending interrupt is masked by
>>> the PIC and never delivered to the CPU.
>>>
>>> - an interrupt sneaked in before disabling, it is currently processed by
>>> the pipeline in the low handler on some CPU, in which case interrupts
>>> are off, so a critical IPI could be acked yet, and the irq mode bits
>>> still allow dispatching to the target domain on that CPU. The assumption
>>> which is happily made is that only head domains are interested in
>>> un-virtualizing irqs, so the dispatch will happen immediately, while the
>>> handler is still valid (actually, we are not allowed to un-virtualize
>>> root irqs, and intermediate Adeos domains are already considered as
>>> endangered species, so this is fine).
>>>
>>>>
>>>> Why this complex solution, why not simply draining (via critical_enter
>>>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
>>>> resources are still valid?
>>>>
>>>
>>> Because it's already too late. You have cleared the handler pointer when
>>> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
>>> the log syncer on another CPU could then branch to eip $0.
>>
>> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
> 
> This does not solve the issue of the last interrupt which should be
> processed. You don't want to miss it.

Don't understand. No interrupt is supposed to arrive anymore on
deregistration, the last user is supposed to be down by now. We just
need to catch though that slipped through.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 19:22                         ` Jan Kiszka
@ 2010-10-25 21:12                           ` Philippe Gerum
  2010-10-25 21:22                             ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-25 21:12 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Mon, 2010-10-25 at 21:22 +0200, Jan Kiszka wrote:
> Am 25.10.2010 21:20, Philippe Gerum wrote:
> > On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
> >> Am 25.10.2010 21:08, Philippe Gerum wrote:
> >>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
> >>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
> >>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
> >>>>>>>
> >>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
> >>>>>>
> >>>>>> That would solve this particular issue, but we should drain the pipeline
> >>>>>> out of any Xenomai critical section. The way it is done now may induce a
> >>>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
> >>>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
> >>>>>> for CPU0 to release the Xenomai lock that annoys us right now).
> >>>>>>
> >>>>>> I'll come up with something hopefully better and tested in the next
> >>>>>> days.
> >>>>>>
> >>>>>
> >>>>> Sorry for the lag. In case that helps, here is another approach, based
> >>>>> on telling the pipeline to ignore the irq about to be detached, so that
> >>>>> it passes all further occurrences down to the next domain, without
> >>>>
> >>>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
> >>>> about a spurious/unhandled IRQ? I think either the old handler shall
> >>>> receive the last event or no one.
> >>>
> >>> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
> >>> you that guarantee. You are supposed to have disabled the irq line
> >>> before detaching, and critical IPIs cannot be acknowledged until all
> >>> CPUs have re-enabled interrupts at some point. Therefore, there are only
> >>> two scenarii:
> >>>
> >>> - irq was disabled before delivery, and a pending interrupt is masked by
> >>> the PIC and never delivered to the CPU.
> >>>
> >>> - an interrupt sneaked in before disabling, it is currently processed by
> >>> the pipeline in the low handler on some CPU, in which case interrupts
> >>> are off, so a critical IPI could be acked yet, and the irq mode bits
> >>> still allow dispatching to the target domain on that CPU. The assumption
> >>> which is happily made is that only head domains are interested in
> >>> un-virtualizing irqs, so the dispatch will happen immediately, while the
> >>> handler is still valid (actually, we are not allowed to un-virtualize
> >>> root irqs, and intermediate Adeos domains are already considered as
> >>> endangered species, so this is fine).
> >>>
> >>>>
> >>>> Why this complex solution, why not simply draining (via critical_enter
> >>>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
> >>>> resources are still valid?
> >>>>
> >>>
> >>> Because it's already too late. You have cleared the handler pointer when
> >>> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
> >>> the log syncer on another CPU could then branch to eip $0.
> >>
> >> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
> > 
> > This does not solve the issue of the last interrupt which should be
> > processed. You don't want to miss it.
> 
> Don't understand. No interrupt is supposed to arrive anymore on
> deregistration, the last user is supposed to be down by now. We just
> need to catch though that slipped through.

No, we are handling the case when an interrupt is currently handled on a
CPU which is not the one that unregisters the IRQ, and which managed to
sneak in while the irq source was about to be masked in the PIC. This is
purely asynchronous for us in SMP, since we don't have irq descriptor
locks for the pipeline, we only have them at Xenomai level, which is one
step too far to protected our low level Adeos handler against that kind
of race. Logically speaking, there is no reason why you would accept to
leave that irq unhandled if it is there and known from a CPU (it was
actually the first point you raised).

The proper way to fix this issue would have been to fix xnintr_detach in
the first place, because calling ipipe_virtualize_irq() while holding a
lock with irqs off is wrong. We could have then drained the pipeline
from the unregistering code. I'm rather going for a decent solution
which is not prone to regression for 2.x.

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 21:12                           ` Philippe Gerum
@ 2010-10-25 21:22                             ` Jan Kiszka
  2010-10-25 21:40                               ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-25 21:22 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 5236 bytes --]

Am 25.10.2010 23:12, Philippe Gerum wrote:
> On Mon, 2010-10-25 at 21:22 +0200, Jan Kiszka wrote:
>> Am 25.10.2010 21:20, Philippe Gerum wrote:
>>> On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
>>>> Am 25.10.2010 21:08, Philippe Gerum wrote:
>>>>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
>>>>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
>>>>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
>>>>>>>>>
>>>>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
>>>>>>>>
>>>>>>>> That would solve this particular issue, but we should drain the pipeline
>>>>>>>> out of any Xenomai critical section. The way it is done now may induce a
>>>>>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
>>>>>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
>>>>>>>> for CPU0 to release the Xenomai lock that annoys us right now).
>>>>>>>>
>>>>>>>> I'll come up with something hopefully better and tested in the next
>>>>>>>> days.
>>>>>>>>
>>>>>>>
>>>>>>> Sorry for the lag. In case that helps, here is another approach, based
>>>>>>> on telling the pipeline to ignore the irq about to be detached, so that
>>>>>>> it passes all further occurrences down to the next domain, without
>>>>>>
>>>>>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
>>>>>> about a spurious/unhandled IRQ? I think either the old handler shall
>>>>>> receive the last event or no one.
>>>>>
>>>>> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
>>>>> you that guarantee. You are supposed to have disabled the irq line
>>>>> before detaching, and critical IPIs cannot be acknowledged until all
>>>>> CPUs have re-enabled interrupts at some point. Therefore, there are only
>>>>> two scenarii:
>>>>>
>>>>> - irq was disabled before delivery, and a pending interrupt is masked by
>>>>> the PIC and never delivered to the CPU.
>>>>>
>>>>> - an interrupt sneaked in before disabling, it is currently processed by
>>>>> the pipeline in the low handler on some CPU, in which case interrupts
>>>>> are off, so a critical IPI could be acked yet, and the irq mode bits
>>>>> still allow dispatching to the target domain on that CPU. The assumption
>>>>> which is happily made is that only head domains are interested in
>>>>> un-virtualizing irqs, so the dispatch will happen immediately, while the
>>>>> handler is still valid (actually, we are not allowed to un-virtualize
>>>>> root irqs, and intermediate Adeos domains are already considered as
>>>>> endangered species, so this is fine).
>>>>>
>>>>>>
>>>>>> Why this complex solution, why not simply draining (via critical_enter
>>>>>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
>>>>>> resources are still valid?
>>>>>>
>>>>>
>>>>> Because it's already too late. You have cleared the handler pointer when
>>>>> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
>>>>> the log syncer on another CPU could then branch to eip $0.
>>>>
>>>> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
>>>
>>> This does not solve the issue of the last interrupt which should be
>>> processed. You don't want to miss it.
>>
>> Don't understand. No interrupt is supposed to arrive anymore on
>> deregistration, the last user is supposed to be down by now. We just
>> need to catch though that slipped through.
> 
> No, we are handling the case when an interrupt is currently handled on a
> CPU which is not the one that unregisters the IRQ, and which managed to
> sneak in while the irq source was about to be masked in the PIC. This is
> purely asynchronous for us in SMP, since we don't have irq descriptor
> locks for the pipeline, we only have them at Xenomai level, which is one
> step too far to protected our low level Adeos handler against that kind
> of race. Logically speaking, there is no reason why you would accept to
> leave that irq unhandled if it is there and known from a CPU (it was
> actually the first point you raised).

If that handler is already running, the IRQ will get handled, we just
need to wait for the handler to finish after we returned from
ipipe_virtualize_irq => thus we need a barrier here.

If the handler was about to run but we deregistered it a bit quicker,
the IRQ need not be addressed at device level anymore. Reason: we
already switched off any IRQ assertions in the device before we entered
ipipe_virtualize_irq. So no harm is caused, that IRQ line is deasserted
already (IOW: the IRQ became a spurious one while cleaning up).

> 
> The proper way to fix this issue would have been to fix xnintr_detach in
> the first place, because calling ipipe_virtualize_irq() while holding a
> lock with irqs off is wrong. We could have then drained the pipeline
> from the unregistering code. I'm rather going for a decent solution
> which is not prone to regression for 2.x.
> 

Again, I see no reason for a more complex solution than avoiding that
NULL pointer dereference at ipipe level as I suggested and adding a very
simply system wide barrier right after dropping nklock in xnintr_detach.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 21:22                             ` Jan Kiszka
@ 2010-10-25 21:40                               ` Philippe Gerum
  2010-10-25 21:47                                 ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-25 21:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Mon, 2010-10-25 at 23:22 +0200, Jan Kiszka wrote:
> Am 25.10.2010 23:12, Philippe Gerum wrote:
> > On Mon, 2010-10-25 at 21:22 +0200, Jan Kiszka wrote:
> >> Am 25.10.2010 21:20, Philippe Gerum wrote:
> >>> On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
> >>>> Am 25.10.2010 21:08, Philippe Gerum wrote:
> >>>>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
> >>>>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
> >>>>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
> >>>>>>>>>
> >>>>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
> >>>>>>>>
> >>>>>>>> That would solve this particular issue, but we should drain the pipeline
> >>>>>>>> out of any Xenomai critical section. The way it is done now may induce a
> >>>>>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
> >>>>>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
> >>>>>>>> for CPU0 to release the Xenomai lock that annoys us right now).
> >>>>>>>>
> >>>>>>>> I'll come up with something hopefully better and tested in the next
> >>>>>>>> days.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Sorry for the lag. In case that helps, here is another approach, based
> >>>>>>> on telling the pipeline to ignore the irq about to be detached, so that
> >>>>>>> it passes all further occurrences down to the next domain, without
> >>>>>>
> >>>>>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
> >>>>>> about a spurious/unhandled IRQ? I think either the old handler shall
> >>>>>> receive the last event or no one.
> >>>>>
> >>>>> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
> >>>>> you that guarantee. You are supposed to have disabled the irq line
> >>>>> before detaching, and critical IPIs cannot be acknowledged until all
> >>>>> CPUs have re-enabled interrupts at some point. Therefore, there are only
> >>>>> two scenarii:
> >>>>>
> >>>>> - irq was disabled before delivery, and a pending interrupt is masked by
> >>>>> the PIC and never delivered to the CPU.
> >>>>>
> >>>>> - an interrupt sneaked in before disabling, it is currently processed by
> >>>>> the pipeline in the low handler on some CPU, in which case interrupts
> >>>>> are off, so a critical IPI could be acked yet, and the irq mode bits
> >>>>> still allow dispatching to the target domain on that CPU. The assumption
> >>>>> which is happily made is that only head domains are interested in
> >>>>> un-virtualizing irqs, so the dispatch will happen immediately, while the
> >>>>> handler is still valid (actually, we are not allowed to un-virtualize
> >>>>> root irqs, and intermediate Adeos domains are already considered as
> >>>>> endangered species, so this is fine).
> >>>>>
> >>>>>>
> >>>>>> Why this complex solution, why not simply draining (via critical_enter
> >>>>>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
> >>>>>> resources are still valid?
> >>>>>>
> >>>>>
> >>>>> Because it's already too late. You have cleared the handler pointer when
> >>>>> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
> >>>>> the log syncer on another CPU could then branch to eip $0.
> >>>>
> >>>> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
> >>>
> >>> This does not solve the issue of the last interrupt which should be
> >>> processed. You don't want to miss it.
> >>
> >> Don't understand. No interrupt is supposed to arrive anymore on
> >> deregistration, the last user is supposed to be down by now. We just
> >> need to catch though that slipped through.
> > 
> > No, we are handling the case when an interrupt is currently handled on a
> > CPU which is not the one that unregisters the IRQ, and which managed to
> > sneak in while the irq source was about to be masked in the PIC. This is
> > purely asynchronous for us in SMP, since we don't have irq descriptor
> > locks for the pipeline, we only have them at Xenomai level, which is one
> > step too far to protected our low level Adeos handler against that kind
> > of race. Logically speaking, there is no reason why you would accept to
> > leave that irq unhandled if it is there and known from a CPU (it was
> > actually the first point you raised).
> 
> If that handler is already running, the IRQ will get handled, we just
> need to wait for the handler to finish after we returned from
> ipipe_virtualize_irq => thus we need a barrier here.
> 

So, we let ipipe_virtualize_irq clear the handler pointer any remote CPU
could use in parallel, and we...synchronize on the outcome? The notion
of "handler" may explain why we don't sync yet: I'm not taking about the
Xenomai entry for interrupts, I'm dealing with interrupts in the early
code of ipipe_handle_irq, before you hit the wired irq dispatcher or the
sync_stage loop. 

> If the handler was about to run but we deregistered it a bit quicker,
> the IRQ need not be addressed at device level anymore. Reason: we
> already switched off any IRQ assertions in the device before we entered
> ipipe_virtualize_irq. So no harm is caused, that IRQ line is deasserted
> already (IOW: the IRQ became a spurious one while cleaning up).

You can attempt to disable the irq line on one CPU, and have a relevant
irq entering the low level pipeline handler on another one at exactly
the same time. We have _no_ sync here.

> 
> > 
> > The proper way to fix this issue would have been to fix xnintr_detach in
> > the first place, because calling ipipe_virtualize_irq() while holding a
> > lock with irqs off is wrong. We could have then drained the pipeline
> > from the unregistering code. I'm rather going for a decent solution
> > which is not prone to regression for 2.x.
> > 
> 
> Again, I see no reason for a more complex solution than avoiding that
> NULL pointer dereference at ipipe level as I suggested and adding a very
> simply system wide barrier right after dropping nklock in xnintr_detach.
> 

You cannot drop that lock without rewriting the xnintr layer in rather
touchy areas, that is the point.

> Jan
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 21:40                               ` Philippe Gerum
@ 2010-10-25 21:47                                 ` Jan Kiszka
  2010-10-26  4:43                                   ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-25 21:47 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 6725 bytes --]

Am 25.10.2010 23:40, Philippe Gerum wrote:
> On Mon, 2010-10-25 at 23:22 +0200, Jan Kiszka wrote:
>> Am 25.10.2010 23:12, Philippe Gerum wrote:
>>> On Mon, 2010-10-25 at 21:22 +0200, Jan Kiszka wrote:
>>>> Am 25.10.2010 21:20, Philippe Gerum wrote:
>>>>> On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
>>>>>> Am 25.10.2010 21:08, Philippe Gerum wrote:
>>>>>>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
>>>>>>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
>>>>>>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
>>>>>>>>>>>
>>>>>>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
>>>>>>>>>>
>>>>>>>>>> That would solve this particular issue, but we should drain the pipeline
>>>>>>>>>> out of any Xenomai critical section. The way it is done now may induce a
>>>>>>>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
>>>>>>>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
>>>>>>>>>> for CPU0 to release the Xenomai lock that annoys us right now).
>>>>>>>>>>
>>>>>>>>>> I'll come up with something hopefully better and tested in the next
>>>>>>>>>> days.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sorry for the lag. In case that helps, here is another approach, based
>>>>>>>>> on telling the pipeline to ignore the irq about to be detached, so that
>>>>>>>>> it passes all further occurrences down to the next domain, without
>>>>>>>>
>>>>>>>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
>>>>>>>> about a spurious/unhandled IRQ? I think either the old handler shall
>>>>>>>> receive the last event or no one.
>>>>>>>
>>>>>>> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
>>>>>>> you that guarantee. You are supposed to have disabled the irq line
>>>>>>> before detaching, and critical IPIs cannot be acknowledged until all
>>>>>>> CPUs have re-enabled interrupts at some point. Therefore, there are only
>>>>>>> two scenarii:
>>>>>>>
>>>>>>> - irq was disabled before delivery, and a pending interrupt is masked by
>>>>>>> the PIC and never delivered to the CPU.
>>>>>>>
>>>>>>> - an interrupt sneaked in before disabling, it is currently processed by
>>>>>>> the pipeline in the low handler on some CPU, in which case interrupts
>>>>>>> are off, so a critical IPI could be acked yet, and the irq mode bits
>>>>>>> still allow dispatching to the target domain on that CPU. The assumption
>>>>>>> which is happily made is that only head domains are interested in
>>>>>>> un-virtualizing irqs, so the dispatch will happen immediately, while the
>>>>>>> handler is still valid (actually, we are not allowed to un-virtualize
>>>>>>> root irqs, and intermediate Adeos domains are already considered as
>>>>>>> endangered species, so this is fine).
>>>>>>>
>>>>>>>>
>>>>>>>> Why this complex solution, why not simply draining (via critical_enter
>>>>>>>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
>>>>>>>> resources are still valid?
>>>>>>>>
>>>>>>>
>>>>>>> Because it's already too late. You have cleared the handler pointer when
>>>>>>> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
>>>>>>> the log syncer on another CPU could then branch to eip $0.
>>>>>>
>>>>>> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
>>>>>
>>>>> This does not solve the issue of the last interrupt which should be
>>>>> processed. You don't want to miss it.
>>>>
>>>> Don't understand. No interrupt is supposed to arrive anymore on
>>>> deregistration, the last user is supposed to be down by now. We just
>>>> need to catch though that slipped through.
>>>
>>> No, we are handling the case when an interrupt is currently handled on a
>>> CPU which is not the one that unregisters the IRQ, and which managed to
>>> sneak in while the irq source was about to be masked in the PIC. This is
>>> purely asynchronous for us in SMP, since we don't have irq descriptor
>>> locks for the pipeline, we only have them at Xenomai level, which is one
>>> step too far to protected our low level Adeos handler against that kind
>>> of race. Logically speaking, there is no reason why you would accept to
>>> leave that irq unhandled if it is there and known from a CPU (it was
>>> actually the first point you raised).
>>
>> If that handler is already running, the IRQ will get handled, we just
>> need to wait for the handler to finish after we returned from
>> ipipe_virtualize_irq => thus we need a barrier here.
>>
> 
> So, we let ipipe_virtualize_irq clear the handler pointer any remote CPU
> could use in parallel, and we...synchronize on the outcome? The notion
> of "handler" may explain why we don't sync yet: I'm not taking about the
> Xenomai entry for interrupts, I'm dealing with interrupts in the early
> code of ipipe_handle_irq, before you hit the wired irq dispatcher or the
> sync_stage loop. 

/me too.

> 
>> If the handler was about to run but we deregistered it a bit quicker,
>> the IRQ need not be addressed at device level anymore. Reason: we
>> already switched off any IRQ assertions in the device before we entered
>> ipipe_virtualize_irq. So no harm is caused, that IRQ line is deasserted
>> already (IOW: the IRQ became a spurious one while cleaning up).
> 
> You can attempt to disable the irq line on one CPU, and have a relevant
> irq entering the low level pipeline handler on another one at exactly
> the same time. We have _no_ sync here.

That is not the problem here. I'm not talking about the line, I'm
talking now about the device that drives it. Silencing the IRQ there is
important, but is async /wrt to other cores and needs a barrier.

Actually, playing with the IRQ line is no driver business anyway (except
for very few special cases).


> 
>>
>>>
>>> The proper way to fix this issue would have been to fix xnintr_detach in
>>> the first place, because calling ipipe_virtualize_irq() while holding a
>>> lock with irqs off is wrong. We could have then drained the pipeline
>>> from the unregistering code. I'm rather going for a decent solution
>>> which is not prone to regression for 2.x.
>>>
>>
>> Again, I see no reason for a more complex solution than avoiding that
>> NULL pointer dereference at ipipe level as I suggested and adding a very
>> simply system wide barrier right after dropping nklock in xnintr_detach.
>>
> 
> You cannot drop that lock without rewriting the xnintr layer in rather
> touchy areas, that is the point.

I meant intrlock - we do not call xnintr_detach with nklock acquired
(your pre-detach synch would not work as well if we did).

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-25 21:47                                 ` Jan Kiszka
@ 2010-10-26  4:43                                   ` Philippe Gerum
  2010-10-26  5:22                                     ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-26  4:43 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Mon, 2010-10-25 at 23:47 +0200, Jan Kiszka wrote:
> Am 25.10.2010 23:40, Philippe Gerum wrote:
> > On Mon, 2010-10-25 at 23:22 +0200, Jan Kiszka wrote:
> >> Am 25.10.2010 23:12, Philippe Gerum wrote:
> >>> On Mon, 2010-10-25 at 21:22 +0200, Jan Kiszka wrote:
> >>>> Am 25.10.2010 21:20, Philippe Gerum wrote:
> >>>>> On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
> >>>>>> Am 25.10.2010 21:08, Philippe Gerum wrote:
> >>>>>>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
> >>>>>>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
> >>>>>>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
> >>>>>>>>>>>
> >>>>>>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
> >>>>>>>>>>
> >>>>>>>>>> That would solve this particular issue, but we should drain the pipeline
> >>>>>>>>>> out of any Xenomai critical section. The way it is done now may induce a
> >>>>>>>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
> >>>>>>>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
> >>>>>>>>>> for CPU0 to release the Xenomai lock that annoys us right now).
> >>>>>>>>>>
> >>>>>>>>>> I'll come up with something hopefully better and tested in the next
> >>>>>>>>>> days.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Sorry for the lag. In case that helps, here is another approach, based
> >>>>>>>>> on telling the pipeline to ignore the irq about to be detached, so that
> >>>>>>>>> it passes all further occurrences down to the next domain, without
> >>>>>>>>
> >>>>>>>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
> >>>>>>>> about a spurious/unhandled IRQ? I think either the old handler shall
> >>>>>>>> receive the last event or no one.
> >>>>>>>
> >>>>>>> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
> >>>>>>> you that guarantee. You are supposed to have disabled the irq line
> >>>>>>> before detaching, and critical IPIs cannot be acknowledged until all
> >>>>>>> CPUs have re-enabled interrupts at some point. Therefore, there are only
> >>>>>>> two scenarii:
> >>>>>>>
> >>>>>>> - irq was disabled before delivery, and a pending interrupt is masked by
> >>>>>>> the PIC and never delivered to the CPU.
> >>>>>>>
> >>>>>>> - an interrupt sneaked in before disabling, it is currently processed by
> >>>>>>> the pipeline in the low handler on some CPU, in which case interrupts
> >>>>>>> are off, so a critical IPI could be acked yet, and the irq mode bits
> >>>>>>> still allow dispatching to the target domain on that CPU. The assumption
> >>>>>>> which is happily made is that only head domains are interested in
> >>>>>>> un-virtualizing irqs, so the dispatch will happen immediately, while the
> >>>>>>> handler is still valid (actually, we are not allowed to un-virtualize
> >>>>>>> root irqs, and intermediate Adeos domains are already considered as
> >>>>>>> endangered species, so this is fine).
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Why this complex solution, why not simply draining (via critical_enter
> >>>>>>>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
> >>>>>>>> resources are still valid?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Because it's already too late. You have cleared the handler pointer when
> >>>>>>> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
> >>>>>>> the log syncer on another CPU could then branch to eip $0.
> >>>>>>
> >>>>>> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
> >>>>>
> >>>>> This does not solve the issue of the last interrupt which should be
> >>>>> processed. You don't want to miss it.
> >>>>
> >>>> Don't understand. No interrupt is supposed to arrive anymore on
> >>>> deregistration, the last user is supposed to be down by now. We just
> >>>> need to catch though that slipped through.
> >>>
> >>> No, we are handling the case when an interrupt is currently handled on a
> >>> CPU which is not the one that unregisters the IRQ, and which managed to
> >>> sneak in while the irq source was about to be masked in the PIC. This is
> >>> purely asynchronous for us in SMP, since we don't have irq descriptor
> >>> locks for the pipeline, we only have them at Xenomai level, which is one
> >>> step too far to protected our low level Adeos handler against that kind
> >>> of race. Logically speaking, there is no reason why you would accept to
> >>> leave that irq unhandled if it is there and known from a CPU (it was
> >>> actually the first point you raised).
> >>
> >> If that handler is already running, the IRQ will get handled, we just
> >> need to wait for the handler to finish after we returned from
> >> ipipe_virtualize_irq => thus we need a barrier here.
> >>
> > 
> > So, we let ipipe_virtualize_irq clear the handler pointer any remote CPU
> > could use in parallel, and we...synchronize on the outcome? The notion
> > of "handler" may explain why we don't sync yet: I'm not taking about the
> > Xenomai entry for interrupts, I'm dealing with interrupts in the early
> > code of ipipe_handle_irq, before you hit the wired irq dispatcher or the
> > sync_stage loop. 
> 
> /me too.
> 
> > 
> >> If the handler was about to run but we deregistered it a bit quicker,
> >> the IRQ need not be addressed at device level anymore. Reason: we
> >> already switched off any IRQ assertions in the device before we entered
> >> ipipe_virtualize_irq. So no harm is caused, that IRQ line is deasserted
> >> already (IOW: the IRQ became a spurious one while cleaning up).
> > 
> > You can attempt to disable the irq line on one CPU, and have a relevant
> > irq entering the low level pipeline handler on another one at exactly
> > the same time. We have _no_ sync here.
> 
> That is not the problem here. I'm not talking about the line, I'm
> talking now about the device that drives it. Silencing the IRQ there is
> important, but is async /wrt to other cores and needs a barrier.
> 
> Actually, playing with the IRQ line is no driver business anyway (except
> for very few special cases).

It seems we are rehashing the same thing using different words: we have
a lingering interrupt, we have nothing to synchronize the
disable-drain-unregister sequence with respect to dispatching that
interrupt, and this code adds the missing bits. There is nothing complex
or overkill compared to the constraints we have in xnintr. So let's move
on.

> 
> 
> > 
> >>
> >>>
> >>> The proper way to fix this issue would have been to fix xnintr_detach in
> >>> the first place, because calling ipipe_virtualize_irq() while holding a
> >>> lock with irqs off is wrong. We could have then drained the pipeline
> >>> from the unregistering code. I'm rather going for a decent solution
> >>> which is not prone to regression for 2.x.
> >>>
> >>
> >> Again, I see no reason for a more complex solution than avoiding that
> >> NULL pointer dereference at ipipe level as I suggested and adding a very
> >> simply system wide barrier right after dropping nklock in xnintr_detach.
> >>
> > 
> > You cannot drop that lock without rewriting the xnintr layer in rather
> > touchy areas, that is the point.
> 
> I meant intrlock - we do not call xnintr_detach with nklock acquired
> (your pre-detach synch would not work as well if we did).

nklock has never been in the picture, no need to bring it in. If you
drop intrlock across the release, you will end up with a race between
attach and detach, and you may well unvirtualize the irq which was in
the process of being attached from another CPU. That is part of the
"touchy areas" I was mentioning lately. xnintr locking is entangled and
assumes that unvirtualizing irq while holding a lock is fine, which is
not. We have to live with it for now, and go for the solution with the
smaller potential for regression. I will obviously welcome any update to
xnintr which would alleviate those contraints, but rather aimed at
-forge, not 2.x.

> 
> Jan
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-26  4:43                                   ` Philippe Gerum
@ 2010-10-26  5:22                                     ` Jan Kiszka
  2010-10-26 19:33                                       ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-26  5:22 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 8288 bytes --]

Am 26.10.2010 06:43, Philippe Gerum wrote:
> On Mon, 2010-10-25 at 23:47 +0200, Jan Kiszka wrote:
>> Am 25.10.2010 23:40, Philippe Gerum wrote:
>>> On Mon, 2010-10-25 at 23:22 +0200, Jan Kiszka wrote:
>>>> Am 25.10.2010 23:12, Philippe Gerum wrote:
>>>>> On Mon, 2010-10-25 at 21:22 +0200, Jan Kiszka wrote:
>>>>>> Am 25.10.2010 21:20, Philippe Gerum wrote:
>>>>>>> On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
>>>>>>>> Am 25.10.2010 21:08, Philippe Gerum wrote:
>>>>>>>>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
>>>>>>>>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
>>>>>>>>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
>>>>>>>>>>>>>
>>>>>>>>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
>>>>>>>>>>>>
>>>>>>>>>>>> That would solve this particular issue, but we should drain the pipeline
>>>>>>>>>>>> out of any Xenomai critical section. The way it is done now may induce a
>>>>>>>>>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
>>>>>>>>>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
>>>>>>>>>>>> for CPU0 to release the Xenomai lock that annoys us right now).
>>>>>>>>>>>>
>>>>>>>>>>>> I'll come up with something hopefully better and tested in the next
>>>>>>>>>>>> days.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the lag. In case that helps, here is another approach, based
>>>>>>>>>>> on telling the pipeline to ignore the irq about to be detached, so that
>>>>>>>>>>> it passes all further occurrences down to the next domain, without
>>>>>>>>>>
>>>>>>>>>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
>>>>>>>>>> about a spurious/unhandled IRQ? I think either the old handler shall
>>>>>>>>>> receive the last event or no one.
>>>>>>>>>
>>>>>>>>> Flipping the IRQ modes within a ipipe_critical_enter/exit section gives
>>>>>>>>> you that guarantee. You are supposed to have disabled the irq line
>>>>>>>>> before detaching, and critical IPIs cannot be acknowledged until all
>>>>>>>>> CPUs have re-enabled interrupts at some point. Therefore, there are only
>>>>>>>>> two scenarii:
>>>>>>>>>
>>>>>>>>> - irq was disabled before delivery, and a pending interrupt is masked by
>>>>>>>>> the PIC and never delivered to the CPU.
>>>>>>>>>
>>>>>>>>> - an interrupt sneaked in before disabling, it is currently processed by
>>>>>>>>> the pipeline in the low handler on some CPU, in which case interrupts
>>>>>>>>> are off, so a critical IPI could be acked yet, and the irq mode bits
>>>>>>>>> still allow dispatching to the target domain on that CPU. The assumption
>>>>>>>>> which is happily made is that only head domains are interested in
>>>>>>>>> un-virtualizing irqs, so the dispatch will happen immediately, while the
>>>>>>>>> handler is still valid (actually, we are not allowed to un-virtualize
>>>>>>>>> root irqs, and intermediate Adeos domains are already considered as
>>>>>>>>> endangered species, so this is fine).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why this complex solution, why not simply draining (via critical_enter
>>>>>>>>>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
>>>>>>>>>> resources are still valid?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Because it's already too late. You have cleared the handler pointer when
>>>>>>>>> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
>>>>>>>>> the log syncer on another CPU could then branch to eip $0.
>>>>>>>>
>>>>>>>> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
>>>>>>>
>>>>>>> This does not solve the issue of the last interrupt which should be
>>>>>>> processed. You don't want to miss it.
>>>>>>
>>>>>> Don't understand. No interrupt is supposed to arrive anymore on
>>>>>> deregistration, the last user is supposed to be down by now. We just
>>>>>> need to catch though that slipped through.
>>>>>
>>>>> No, we are handling the case when an interrupt is currently handled on a
>>>>> CPU which is not the one that unregisters the IRQ, and which managed to
>>>>> sneak in while the irq source was about to be masked in the PIC. This is
>>>>> purely asynchronous for us in SMP, since we don't have irq descriptor
>>>>> locks for the pipeline, we only have them at Xenomai level, which is one
>>>>> step too far to protected our low level Adeos handler against that kind
>>>>> of race. Logically speaking, there is no reason why you would accept to
>>>>> leave that irq unhandled if it is there and known from a CPU (it was
>>>>> actually the first point you raised).
>>>>
>>>> If that handler is already running, the IRQ will get handled, we just
>>>> need to wait for the handler to finish after we returned from
>>>> ipipe_virtualize_irq => thus we need a barrier here.
>>>>
>>>
>>> So, we let ipipe_virtualize_irq clear the handler pointer any remote CPU
>>> could use in parallel, and we...synchronize on the outcome? The notion
>>> of "handler" may explain why we don't sync yet: I'm not taking about the
>>> Xenomai entry for interrupts, I'm dealing with interrupts in the early
>>> code of ipipe_handle_irq, before you hit the wired irq dispatcher or the
>>> sync_stage loop. 
>>
>> /me too.
>>
>>>
>>>> If the handler was about to run but we deregistered it a bit quicker,
>>>> the IRQ need not be addressed at device level anymore. Reason: we
>>>> already switched off any IRQ assertions in the device before we entered
>>>> ipipe_virtualize_irq. So no harm is caused, that IRQ line is deasserted
>>>> already (IOW: the IRQ became a spurious one while cleaning up).
>>>
>>> You can attempt to disable the irq line on one CPU, and have a relevant
>>> irq entering the low level pipeline handler on another one at exactly
>>> the same time. We have _no_ sync here.
>>
>> That is not the problem here. I'm not talking about the line, I'm
>> talking now about the device that drives it. Silencing the IRQ there is
>> important, but is async /wrt to other cores and needs a barrier.
>>
>> Actually, playing with the IRQ line is no driver business anyway (except
>> for very few special cases).
> 
> It seems we are rehashing the same thing using different words: we have
> a lingering interrupt, we have nothing to synchronize the
> disable-drain-unregister sequence with respect to dispatching that
> interrupt, and this code adds the missing bits. There is nothing complex
> or overkill compared to the constraints we have in xnintr. So let's move
> on.
> 
>>
>>
>>>
>>>>
>>>>>
>>>>> The proper way to fix this issue would have been to fix xnintr_detach in
>>>>> the first place, because calling ipipe_virtualize_irq() while holding a
>>>>> lock with irqs off is wrong. We could have then drained the pipeline
>>>>> from the unregistering code. I'm rather going for a decent solution
>>>>> which is not prone to regression for 2.x.
>>>>>
>>>>
>>>> Again, I see no reason for a more complex solution than avoiding that
>>>> NULL pointer dereference at ipipe level as I suggested and adding a very
>>>> simply system wide barrier right after dropping nklock in xnintr_detach.
>>>>
>>>
>>> You cannot drop that lock without rewriting the xnintr layer in rather
>>> touchy areas, that is the point.
>>
>> I meant intrlock - we do not call xnintr_detach with nklock acquired
>> (your pre-detach synch would not work as well if we did).
> 
> nklock has never been in the picture, no need to bring it in. If you
> drop intrlock across the release, you will end up with a race between
> attach and detach, and you may well unvirtualize the irq which was in
> the process of being attached from another CPU. That is part of the
> "touchy areas" I was mentioning lately. xnintr locking is entangled and
> assumes that unvirtualizing irq while holding a lock is fine, which is
> not. We have to live with it for now, and go for the solution with the
> smaller potential for regression. I will obviously welcome any update to
> xnintr which would alleviate those contraints, but rather aimed at
> -forge, not 2.x.

Will come up with two patches for stable, one for I-pipe and one for
Xenomai, later today. Then we can discuss which cases I'm missing.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-26  5:22                                     ` Jan Kiszka
@ 2010-10-26 19:33                                       ` Jan Kiszka
  2010-10-28  5:17                                         ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-10-26 19:33 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1071 bytes --]

Am 26.10.2010 07:22, Jan Kiszka wrote:
> Will come up with two patches for stable, one for I-pipe and one for
> Xenomai, later today. Then we can discuss which cases I'm missing.

While meditating over my approach (which turned out to be less trivial
as expected - of course), I also reconsidered your current patches. The
concerns I had (forwarding of spurious IRQ to Linux) turned out to be
harmless (Linux will ignore such few spurious events).

Still, the approach to sync via shutting down the line for the current
domain before xnintr_irq_detach doesn't work for us. It only works if
xnintr_irq_detach actually detaches from the line, but it breaks if
there are users remaining.

We need intrlock to check if we are the last user while removing
ourselves from the list. And we cannot postpone line detaching after the
critical section as we may otherwise race with the next registration on
that line. IOW, I don't see how to solve the issue without moving the
drain after the detach and making the detach safer instead.

Do you agree?

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-26 19:33                                       ` Jan Kiszka
@ 2010-10-28  5:17                                         ` Philippe Gerum
  2010-10-28  7:31                                           ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-28  5:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Tue, 2010-10-26 at 21:33 +0200, Jan Kiszka wrote:
> Am 26.10.2010 07:22, Jan Kiszka wrote:
> > Will come up with two patches for stable, one for I-pipe and one for
> > Xenomai, later today. Then we can discuss which cases I'm missing.
> 
> While meditating over my approach (which turned out to be less trivial
> as expected - of course), I also reconsidered your current patches. The
> concerns I had (forwarding of spurious IRQ to Linux) turned out to be
> harmless (Linux will ignore such few spurious events).
> 

That is not even an issue if you consider the sequence to be
xnarch_disable_irq then ipipe_control (new version, doing a critical
entry to flip the irq mode).

> Still, the approach to sync via shutting down the line for the current
> domain before xnintr_irq_detach doesn't work for us. It only works if
> xnintr_irq_detach actually detaches from the line, but it breaks if
> there are users remaining.
> 
> We need intrlock to check if we are the last user while removing
> ourselves from the list. And we cannot postpone line detaching after the
> critical section as we may otherwise race with the next registration on
> that line. IOW, I don't see how to solve the issue without moving the
> drain after the detach and making the detach safer instead.
> 
> Do you agree?
> 

I agree this is not trivial, for sure. To keep things simple, I would
introduce a new "teardown" flag to freeze the descriptor, thus avoiding
further attachments, while xnintr_detach can probe the shared list for
lingering users, and eventually call xnarch_disable_irq
+xnarch_ignore_irq+xnarch_release_irq in sequence with all locks
dropped, if empty.

The only adverse effect I can see ATM would be some concurrent caller of
xnintr_detach() blocked on the teardown flag on another CPU, albeit it
_could_ have joined the bandwagon, attaching the irq, in case the shared
list proved to remain active (and thus xnarch_release_irq was not
called). But this may also look like a simple way to prevent live
locking of interrupt descriptors. YMMV.

> Jan
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-28  5:17                                         ` Philippe Gerum
@ 2010-10-28  7:31                                           ` Jan Kiszka
  2010-10-28  7:38                                             ` Jan Kiszka
  2010-10-28  7:46                                             ` Philippe Gerum
  0 siblings, 2 replies; 51+ messages in thread
From: Jan Kiszka @ 2010-10-28  7:31 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 2375 bytes --]

Am 28.10.2010 07:17, Philippe Gerum wrote:
> On Tue, 2010-10-26 at 21:33 +0200, Jan Kiszka wrote:
>> Am 26.10.2010 07:22, Jan Kiszka wrote:
>>> Will come up with two patches for stable, one for I-pipe and one for
>>> Xenomai, later today. Then we can discuss which cases I'm missing.
>>
>> While meditating over my approach (which turned out to be less trivial
>> as expected - of course), I also reconsidered your current patches. The
>> concerns I had (forwarding of spurious IRQ to Linux) turned out to be
>> harmless (Linux will ignore such few spurious events).
>>
> 
> That is not even an issue if you consider the sequence to be
> xnarch_disable_irq then ipipe_control (new version, doing a critical
> entry to flip the irq mode).

When you want to support shared IRQs, xnarch_disable_irq is tabu. I
suppose you meant some my_device_disable_irqs().

> 
>> Still, the approach to sync via shutting down the line for the current
>> domain before xnintr_irq_detach doesn't work for us. It only works if
>> xnintr_irq_detach actually detaches from the line, but it breaks if
>> there are users remaining.
>>
>> We need intrlock to check if we are the last user while removing
>> ourselves from the list. And we cannot postpone line detaching after the
>> critical section as we may otherwise race with the next registration on
>> that line. IOW, I don't see how to solve the issue without moving the
>> drain after the detach and making the detach safer instead.
>>
>> Do you agree?
>>
> 
> I agree this is not trivial, for sure. To keep things simple, I would
> introduce a new "teardown" flag to freeze the descriptor, thus avoiding
> further attachments, while xnintr_detach can probe the shared list for
> lingering users, and eventually call xnarch_disable_irq
> +xnarch_ignore_irq+xnarch_release_irq in sequence with all locks
> dropped, if empty.
> 
> The only adverse effect I can see ATM would be some concurrent caller of
> xnintr_detach() blocked on the teardown flag on another CPU, albeit it
> _could_ have joined the bandwagon, attaching the irq, in case the shared
> list proved to remain active (and thus xnarch_release_irq was not
> called). But this may also look like a simple way to prevent live
> locking of interrupt descriptors. YMMV.

This sounds like it's best discussed based on patches.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-28  7:31                                           ` Jan Kiszka
@ 2010-10-28  7:38                                             ` Jan Kiszka
  2010-10-28  7:46                                             ` Philippe Gerum
  1 sibling, 0 replies; 51+ messages in thread
From: Jan Kiszka @ 2010-10-28  7:38 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 3780 bytes --]

Am 28.10.2010 09:31, Jan Kiszka wrote:
> This sounds like it's best discussed based on patches.
> 

BTW, this is my current idea to make deregistration via
ipipe_virtualize_irq safe against ongoing IRQs, ie. the risk to invoke
a NULL handler:


From: Jan Kiszka <jan.kiszka@domain.hid>

ipipe: Catch ongoing IRQs while deregistering from the line

If ipipe_virtualize_irq is called for an IRQ that may be under
processing on a different CPU, we must not clear the handler to avoid
NULL pointer failures. However, this IRQ-on-the-fly already became
spurious before the caller invoked ipipe_virtualize_irq. So install a
dummy handler that swallows the event silently.

Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
---
 kernel/ipipe/core.c |   30 ++++++++++++++++++++++++------
 1 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c
index b3a8139..bf43bb8 100644
--- a/kernel/ipipe/core.c
+++ b/kernel/ipipe/core.c
@@ -277,6 +277,18 @@ void __init ipipe_init(void)
 	       IPIPE_VERSION_STRING);
 }
 
+static void ipipe_final_irq_handler(unsigned irq, struct irq_desc *desc)
+{
+	/*
+	 * De-virtualization of an IRQ may race with ongoing IRQ delivery on a
+	 * different CPU. This handler catches the delivery and ends the
+	 * request. It is assumed to be the final event as the caller of
+	 * ipipe_virtualize_irq() is supposed to have disabled any related IRQ
+	 * source at device level before deregistering the interrupt.
+	 */
+	desc->ipipe_end(irq, desc);
+}
+
 void __ipipe_init_stage(struct ipipe_domain *ipd)
 {
 	struct ipipe_percpu_domain_data *p;
@@ -292,7 +304,7 @@ void __ipipe_init_stage(struct ipipe_domain *ipd)
 
 	for (n = 0; n < IPIPE_NR_IRQS; n++) {
 		ipd->irqs[n].acknowledge = NULL;
-		ipd->irqs[n].handler = NULL;
+		ipd->irqs[n].handler = ipipe_final_irq_handler;
 		ipd->irqs[n].control = IPIPE_PASS_MASK;	/* Pass but don't handle */
 	}
 
@@ -924,14 +936,20 @@ int ipipe_virtualize_irq(struct ipipe_domain *ipd,
 		    ~(IPIPE_HANDLE_MASK | IPIPE_STICKY_MASK |
 		      IPIPE_EXCLUSIVE_MASK | IPIPE_WIRED_MASK);
 
-		ipd->irqs[irq].handler = NULL;
+		ipd->irqs[irq].handler = ipipe_final_irq_handler;
+		/*
+		 * Make sure we never call the old handler with an invalid
+		 * cookie. The dummy handler ignores the cookie. Set it first.
+		 */
+		smp_mb();
 		ipd->irqs[irq].cookie = NULL;
-		ipd->irqs[irq].acknowledge = NULL;
+		ipd->irqs[irq].acknowledge =
+			ipipe_root_domain->irqs[irq].acknowledge;
 		ipd->irqs[irq].control = modemask;
 
 		if (irq < NR_IRQS && !ipipe_virtual_irq_p(irq)) {
 			desc = irq_to_desc(irq);
-			if (old_handler && desc)
+			if (old_handler != ipipe_final_irq_handler && desc)
 				__ipipe_disable_irqdesc(ipd, irq);
 		}
 
@@ -941,7 +959,7 @@ int ipipe_virtualize_irq(struct ipipe_domain *ipd,
 	if (handler == IPIPE_SAME_HANDLER) {
 		cookie = ipd->irqs[irq].cookie;
 		handler = old_handler;
-		if (handler == NULL) {
+		if (handler == ipipe_final_irq_handler) {
 			ret = -EINVAL;
 			goto unlock_and_exit;
 		}
@@ -1024,7 +1042,7 @@ int ipipe_control_irq(struct ipipe_domain *ipd, unsigned int irq,
 		goto out;
 	}
 
-	if (ipd->irqs[irq].handler == NULL)
+	if (ipd->irqs[irq].handler == ipipe_final_irq_handler)
 		setmask &= ~(IPIPE_HANDLE_MASK | IPIPE_STICKY_MASK);
 
 	if ((setmask & IPIPE_STICKY_MASK) != 0)
-- 
1.7.1


I'm not your 100% happy with the fact that the old acknowledge handler
may be paired with ipipe_finale_irq_handler this way, but I see no
better approach so far.

So this patch just had to be paired with a barrier after the
deregistration (in Xenomai) to ensure all users of the old handlers and
cookies are done.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-28  7:31                                           ` Jan Kiszka
  2010-10-28  7:38                                             ` Jan Kiszka
@ 2010-10-28  7:46                                             ` Philippe Gerum
  2010-11-07 15:15                                               ` Philippe Gerum
  1 sibling, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-10-28  7:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Thu, 2010-10-28 at 09:31 +0200, Jan Kiszka wrote:
> Am 28.10.2010 07:17, Philippe Gerum wrote:
> > On Tue, 2010-10-26 at 21:33 +0200, Jan Kiszka wrote:
> >> Am 26.10.2010 07:22, Jan Kiszka wrote:
> >>> Will come up with two patches for stable, one for I-pipe and one for
> >>> Xenomai, later today. Then we can discuss which cases I'm missing.
> >>
> >> While meditating over my approach (which turned out to be less trivial
> >> as expected - of course), I also reconsidered your current patches. The
> >> concerns I had (forwarding of spurious IRQ to Linux) turned out to be
> >> harmless (Linux will ignore such few spurious events).
> >>
> > 
> > That is not even an issue if you consider the sequence to be
> > xnarch_disable_irq then ipipe_control (new version, doing a critical
> > entry to flip the irq mode).
> 
> When you want to support shared IRQs, xnarch_disable_irq is tabu. I
> suppose you meant some my_device_disable_irqs().

No, it is perfectly valid provided you made sure that no handler
remained on the shared list. There is absolutely no reason to keep a
line unmasked if no device is supposed to be active on it. Hence the
release sequence described earlier.

> 
> > 
> >> Still, the approach to sync via shutting down the line for the current
> >> domain before xnintr_irq_detach doesn't work for us. It only works if
> >> xnintr_irq_detach actually detaches from the line, but it breaks if
> >> there are users remaining.
> >>
> >> We need intrlock to check if we are the last user while removing
> >> ourselves from the list. And we cannot postpone line detaching after the
> >> critical section as we may otherwise race with the next registration on
> >> that line. IOW, I don't see how to solve the issue without moving the
> >> drain after the detach and making the detach safer instead.
> >>
> >> Do you agree?
> >>
> > 
> > I agree this is not trivial, for sure. To keep things simple, I would
> > introduce a new "teardown" flag to freeze the descriptor, thus avoiding
> > further attachments, while xnintr_detach can probe the shared list for
> > lingering users, and eventually call xnarch_disable_irq
> > +xnarch_ignore_irq+xnarch_release_irq in sequence with all locks
> > dropped, if empty.
> > 
> > The only adverse effect I can see ATM would be some concurrent caller of
> > xnintr_detach() blocked on the teardown flag on another CPU, albeit it
> > _could_ have joined the bandwagon, attaching the irq, in case the shared
> > list proved to remain active (and thus xnarch_release_irq was not
> > called). But this may also look like a simple way to prevent live
> > locking of interrupt descriptors. YMMV.
> 
> This sounds like it's best discussed based on patches.
> 

Likely, yes. I'll have a look when time allows.

> Jan
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-10-28  7:46                                             ` Philippe Gerum
@ 2010-11-07 15:15                                               ` Philippe Gerum
  2010-11-07 16:22                                                 ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-11-07 15:15 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Thu, 2010-10-28 at 09:46 +0200, Philippe Gerum wrote:
> On Thu, 2010-10-28 at 09:31 +0200, Jan Kiszka wrote:
> > Am 28.10.2010 07:17, Philippe Gerum wrote:
> > > On Tue, 2010-10-26 at 21:33 +0200, Jan Kiszka wrote:
> > >> Am 26.10.2010 07:22, Jan Kiszka wrote:
> > >>> Will come up with two patches for stable, one for I-pipe and one for
> > >>> Xenomai, later today. Then we can discuss which cases I'm missing.
> > >>
> > >> While meditating over my approach (which turned out to be less trivial
> > >> as expected - of course), I also reconsidered your current patches. The
> > >> concerns I had (forwarding of spurious IRQ to Linux) turned out to be
> > >> harmless (Linux will ignore such few spurious events).
> > >>
> > > 
> > > That is not even an issue if you consider the sequence to be
> > > xnarch_disable_irq then ipipe_control (new version, doing a critical
> > > entry to flip the irq mode).
> > 
> > When you want to support shared IRQs, xnarch_disable_irq is tabu. I
> > suppose you meant some my_device_disable_irqs().
> 
> No, it is perfectly valid provided you made sure that no handler
> remained on the shared list. There is absolutely no reason to keep a
> line unmasked if no device is supposed to be active on it. Hence the
> release sequence described earlier.
> 
> > 
> > > 
> > >> Still, the approach to sync via shutting down the line for the current
> > >> domain before xnintr_irq_detach doesn't work for us. It only works if
> > >> xnintr_irq_detach actually detaches from the line, but it breaks if
> > >> there are users remaining.
> > >>
> > >> We need intrlock to check if we are the last user while removing
> > >> ourselves from the list. And we cannot postpone line detaching after the
> > >> critical section as we may otherwise race with the next registration on
> > >> that line. IOW, I don't see how to solve the issue without moving the
> > >> drain after the detach and making the detach safer instead.
> > >>
> > >> Do you agree?
> > >>
> > > 
> > > I agree this is not trivial, for sure. To keep things simple, I would
> > > introduce a new "teardown" flag to freeze the descriptor, thus avoiding
> > > further attachments, while xnintr_detach can probe the shared list for
> > > lingering users, and eventually call xnarch_disable_irq
> > > +xnarch_ignore_irq+xnarch_release_irq in sequence with all locks
> > > dropped, if empty.
> > > 
> > > The only adverse effect I can see ATM would be some concurrent caller of
> > > xnintr_detach() blocked on the teardown flag on another CPU, albeit it
> > > _could_ have joined the bandwagon, attaching the irq, in case the shared
> > > list proved to remain active (and thus xnarch_release_irq was not
> > > called). But this may also look like a simple way to prevent live
> > > locking of interrupt descriptors. YMMV.
> > 
> > This sounds like it's best discussed based on patches.
> > 
> 
> Likely, yes. I'll have a look when time allows.

The following patches implements the teardown approach. The basic idea
is:
- neither break nor improve old setups with legacy I-pipe patches not
providing the revised ipipe_control_irq call.
- fix the SMP race when detaching interrupts.

The last patch also fixes two other issues:
- do not alter the irq descriptor (e.g. cookie and stats) if the
attachment fails early
- do not set irq affinity before the validity checks, and set it only
for the first handler introduced in the shared list.

http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=ea771aec61c399dc85e14d3de571cffaa5b9b1e7
http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4fa4643f8e59ca0528b059f82e97c0b37419e62e

> 
> > Jan
> > 
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-07 15:15                                               ` Philippe Gerum
@ 2010-11-07 16:22                                                 ` Jan Kiszka
  2010-11-07 16:55                                                   ` Philippe Gerum
                                                                     ` (3 more replies)
  0 siblings, 4 replies; 51+ messages in thread
From: Jan Kiszka @ 2010-11-07 16:22 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 3614 bytes --]

Am 07.11.2010 16:15, Philippe Gerum wrote:
> On Thu, 2010-10-28 at 09:46 +0200, Philippe Gerum wrote:
>> On Thu, 2010-10-28 at 09:31 +0200, Jan Kiszka wrote:
>>> Am 28.10.2010 07:17, Philippe Gerum wrote:
>>>> On Tue, 2010-10-26 at 21:33 +0200, Jan Kiszka wrote:
>>>>> Am 26.10.2010 07:22, Jan Kiszka wrote:
>>>>>> Will come up with two patches for stable, one for I-pipe and one for
>>>>>> Xenomai, later today. Then we can discuss which cases I'm missing.
>>>>>
>>>>> While meditating over my approach (which turned out to be less trivial
>>>>> as expected - of course), I also reconsidered your current patches. The
>>>>> concerns I had (forwarding of spurious IRQ to Linux) turned out to be
>>>>> harmless (Linux will ignore such few spurious events).
>>>>>
>>>>
>>>> That is not even an issue if you consider the sequence to be
>>>> xnarch_disable_irq then ipipe_control (new version, doing a critical
>>>> entry to flip the irq mode).
>>>
>>> When you want to support shared IRQs, xnarch_disable_irq is tabu. I
>>> suppose you meant some my_device_disable_irqs().
>>
>> No, it is perfectly valid provided you made sure that no handler
>> remained on the shared list. There is absolutely no reason to keep a
>> line unmasked if no device is supposed to be active on it. Hence the
>> release sequence described earlier.
>>
>>>
>>>>
>>>>> Still, the approach to sync via shutting down the line for the current
>>>>> domain before xnintr_irq_detach doesn't work for us. It only works if
>>>>> xnintr_irq_detach actually detaches from the line, but it breaks if
>>>>> there are users remaining.
>>>>>
>>>>> We need intrlock to check if we are the last user while removing
>>>>> ourselves from the list. And we cannot postpone line detaching after the
>>>>> critical section as we may otherwise race with the next registration on
>>>>> that line. IOW, I don't see how to solve the issue without moving the
>>>>> drain after the detach and making the detach safer instead.
>>>>>
>>>>> Do you agree?
>>>>>
>>>>
>>>> I agree this is not trivial, for sure. To keep things simple, I would
>>>> introduce a new "teardown" flag to freeze the descriptor, thus avoiding
>>>> further attachments, while xnintr_detach can probe the shared list for
>>>> lingering users, and eventually call xnarch_disable_irq
>>>> +xnarch_ignore_irq+xnarch_release_irq in sequence with all locks
>>>> dropped, if empty.
>>>>
>>>> The only adverse effect I can see ATM would be some concurrent caller of
>>>> xnintr_detach() blocked on the teardown flag on another CPU, albeit it
>>>> _could_ have joined the bandwagon, attaching the irq, in case the shared
>>>> list proved to remain active (and thus xnarch_release_irq was not
>>>> called). But this may also look like a simple way to prevent live
>>>> locking of interrupt descriptors. YMMV.
>>>
>>> This sounds like it's best discussed based on patches.
>>>
>>
>> Likely, yes. I'll have a look when time allows.
> 
> The following patches implements the teardown approach. The basic idea
> is:
> - neither break nor improve old setups with legacy I-pipe patches not
> providing the revised ipipe_control_irq call.
> - fix the SMP race when detaching interrupts.

Looks good.

> 
> The last patch also fixes two other issues:
> - do not alter the irq descriptor (e.g. cookie and stats) if the
> attachment fails early
> - do not set irq affinity before the validity checks, and set it only
> for the first handler introduced in the shared list.

Separate commits? At least mention it in the change log.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-07 16:22                                                 ` Jan Kiszka
@ 2010-11-07 16:55                                                   ` Philippe Gerum
  2010-11-07 16:59                                                   ` Philippe Gerum
                                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 51+ messages in thread
From: Philippe Gerum @ 2010-11-07 16:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Sun, 2010-11-07 at 17:22 +0100, Jan Kiszka wrote:
> Am 07.11.2010 16:15, Philippe Gerum wrote:
> > On Thu, 2010-10-28 at 09:46 +0200, Philippe Gerum wrote:
> >> On Thu, 2010-10-28 at 09:31 +0200, Jan Kiszka wrote:
> >>> Am 28.10.2010 07:17, Philippe Gerum wrote:
> >>>> On Tue, 2010-10-26 at 21:33 +0200, Jan Kiszka wrote:
> >>>>> Am 26.10.2010 07:22, Jan Kiszka wrote:
> >>>>>> Will come up with two patches for stable, one for I-pipe and one for
> >>>>>> Xenomai, later today. Then we can discuss which cases I'm missing.
> >>>>>
> >>>>> While meditating over my approach (which turned out to be less trivial
> >>>>> as expected - of course), I also reconsidered your current patches. The
> >>>>> concerns I had (forwarding of spurious IRQ to Linux) turned out to be
> >>>>> harmless (Linux will ignore such few spurious events).
> >>>>>
> >>>>
> >>>> That is not even an issue if you consider the sequence to be
> >>>> xnarch_disable_irq then ipipe_control (new version, doing a critical
> >>>> entry to flip the irq mode).
> >>>
> >>> When you want to support shared IRQs, xnarch_disable_irq is tabu. I
> >>> suppose you meant some my_device_disable_irqs().
> >>
> >> No, it is perfectly valid provided you made sure that no handler
> >> remained on the shared list. There is absolutely no reason to keep a
> >> line unmasked if no device is supposed to be active on it. Hence the
> >> release sequence described earlier.
> >>
> >>>
> >>>>
> >>>>> Still, the approach to sync via shutting down the line for the current
> >>>>> domain before xnintr_irq_detach doesn't work for us. It only works if
> >>>>> xnintr_irq_detach actually detaches from the line, but it breaks if
> >>>>> there are users remaining.
> >>>>>
> >>>>> We need intrlock to check if we are the last user while removing
> >>>>> ourselves from the list. And we cannot postpone line detaching after the
> >>>>> critical section as we may otherwise race with the next registration on
> >>>>> that line. IOW, I don't see how to solve the issue without moving the
> >>>>> drain after the detach and making the detach safer instead.
> >>>>>
> >>>>> Do you agree?
> >>>>>
> >>>>
> >>>> I agree this is not trivial, for sure. To keep things simple, I would
> >>>> introduce a new "teardown" flag to freeze the descriptor, thus avoiding
> >>>> further attachments, while xnintr_detach can probe the shared list for
> >>>> lingering users, and eventually call xnarch_disable_irq
> >>>> +xnarch_ignore_irq+xnarch_release_irq in sequence with all locks
> >>>> dropped, if empty.
> >>>>
> >>>> The only adverse effect I can see ATM would be some concurrent caller of
> >>>> xnintr_detach() blocked on the teardown flag on another CPU, albeit it
> >>>> _could_ have joined the bandwagon, attaching the irq, in case the shared
> >>>> list proved to remain active (and thus xnarch_release_irq was not
> >>>> called). But this may also look like a simple way to prevent live
> >>>> locking of interrupt descriptors. YMMV.
> >>>
> >>> This sounds like it's best discussed based on patches.
> >>>
> >>
> >> Likely, yes. I'll have a look when time allows.
> > 
> > The following patches implements the teardown approach. The basic idea
> > is:
> > - neither break nor improve old setups with legacy I-pipe patches not
> > providing the revised ipipe_control_irq call.
> > - fix the SMP race when detaching interrupts.
> 
> Looks good.
> 
> > 
> > The last patch also fixes two other issues:
> > - do not alter the irq descriptor (e.g. cookie and stats) if the
> > attachment fails early
> > - do not set irq affinity before the validity checks, and set it only
> > for the first handler introduced in the shared list.
> 
> Separate commits? At least mention it in the change log.

I'm unsure this really matters. The bugs could only bite when taking the
error path (who makes mistakes these days?), and the affinity fix
depends on the race fix.

> 
> Jan
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-07 16:22                                                 ` Jan Kiszka
  2010-11-07 16:55                                                   ` Philippe Gerum
@ 2010-11-07 16:59                                                   ` Philippe Gerum
  2010-11-07 17:19                                                   ` Philippe Gerum
  2010-11-09  8:01                                                   ` Jan Kiszka
  3 siblings, 0 replies; 51+ messages in thread
From: Philippe Gerum @ 2010-11-07 16:59 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Sun, 2010-11-07 at 17:22 +0100, Jan Kiszka wrote:
> Am 07.11.2010 16:15, Philippe Gerum wrote:
> > On Thu, 2010-10-28 at 09:46 +0200, Philippe Gerum wrote:
> >> On Thu, 2010-10-28 at 09:31 +0200, Jan Kiszka wrote:
> >>> Am 28.10.2010 07:17, Philippe Gerum wrote:
> >>>> On Tue, 2010-10-26 at 21:33 +0200, Jan Kiszka wrote:
> >>>>> Am 26.10.2010 07:22, Jan Kiszka wrote:
> >>>>>> Will come up with two patches for stable, one for I-pipe and one for
> >>>>>> Xenomai, later today. Then we can discuss which cases I'm missing.
> >>>>>
> >>>>> While meditating over my approach (which turned out to be less trivial
> >>>>> as expected - of course), I also reconsidered your current patches. The
> >>>>> concerns I had (forwarding of spurious IRQ to Linux) turned out to be
> >>>>> harmless (Linux will ignore such few spurious events).
> >>>>>
> >>>>
> >>>> That is not even an issue if you consider the sequence to be
> >>>> xnarch_disable_irq then ipipe_control (new version, doing a critical
> >>>> entry to flip the irq mode).
> >>>
> >>> When you want to support shared IRQs, xnarch_disable_irq is tabu. I
> >>> suppose you meant some my_device_disable_irqs().
> >>
> >> No, it is perfectly valid provided you made sure that no handler
> >> remained on the shared list. There is absolutely no reason to keep a
> >> line unmasked if no device is supposed to be active on it. Hence the
> >> release sequence described earlier.
> >>
> >>>
> >>>>
> >>>>> Still, the approach to sync via shutting down the line for the current
> >>>>> domain before xnintr_irq_detach doesn't work for us. It only works if
> >>>>> xnintr_irq_detach actually detaches from the line, but it breaks if
> >>>>> there are users remaining.
> >>>>>
> >>>>> We need intrlock to check if we are the last user while removing
> >>>>> ourselves from the list. And we cannot postpone line detaching after the
> >>>>> critical section as we may otherwise race with the next registration on
> >>>>> that line. IOW, I don't see how to solve the issue without moving the
> >>>>> drain after the detach and making the detach safer instead.
> >>>>>
> >>>>> Do you agree?
> >>>>>
> >>>>
> >>>> I agree this is not trivial, for sure. To keep things simple, I would
> >>>> introduce a new "teardown" flag to freeze the descriptor, thus avoiding
> >>>> further attachments, while xnintr_detach can probe the shared list for
> >>>> lingering users, and eventually call xnarch_disable_irq
> >>>> +xnarch_ignore_irq+xnarch_release_irq in sequence with all locks
> >>>> dropped, if empty.
> >>>>
> >>>> The only adverse effect I can see ATM would be some concurrent caller of
> >>>> xnintr_detach() blocked on the teardown flag on another CPU, albeit it
> >>>> _could_ have joined the bandwagon, attaching the irq, in case the shared
> >>>> list proved to remain active (and thus xnarch_release_irq was not
> >>>> called). But this may also look like a simple way to prevent live
> >>>> locking of interrupt descriptors. YMMV.
> >>>
> >>> This sounds like it's best discussed based on patches.
> >>>
> >>
> >> Likely, yes. I'll have a look when time allows.
> > 
> > The following patches implements the teardown approach. The basic idea
> > is:
> > - neither break nor improve old setups with legacy I-pipe patches not
> > providing the revised ipipe_control_irq call.
> > - fix the SMP race when detaching interrupts.
> 
> Looks good.
> 
> > 
> > The last patch also fixes two other issues:
> > - do not alter the irq descriptor (e.g. cookie and stats) if the
> > attachment fails early
> > - do not set irq affinity before the validity checks, and set it only
> > for the first handler introduced in the shared list.
> 
> Separate commits? At least mention it in the change log.

Yeah, ok. In fact it makes sense to mention this in the last commit log
for those who still show bad taste enough to make programming errors.
Ahem.

> 
> Jan
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-07 16:22                                                 ` Jan Kiszka
  2010-11-07 16:55                                                   ` Philippe Gerum
  2010-11-07 16:59                                                   ` Philippe Gerum
@ 2010-11-07 17:19                                                   ` Philippe Gerum
  2010-11-09  8:01                                                   ` Jan Kiszka
  3 siblings, 0 replies; 51+ messages in thread
From: Philippe Gerum @ 2010-11-07 17:19 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Sun, 2010-11-07 at 17:22 +0100, Jan Kiszka wrote:
> Am 07.11.2010 16:15, Philippe Gerum wrote:
> > On Thu, 2010-10-28 at 09:46 +0200, Philippe Gerum wrote:
> >> On Thu, 2010-10-28 at 09:31 +0200, Jan Kiszka wrote:
> >>> Am 28.10.2010 07:17, Philippe Gerum wrote:
> >>>> On Tue, 2010-10-26 at 21:33 +0200, Jan Kiszka wrote:
> >>>>> Am 26.10.2010 07:22, Jan Kiszka wrote:
> >>>>>> Will come up with two patches for stable, one for I-pipe and one for
> >>>>>> Xenomai, later today. Then we can discuss which cases I'm missing.
> >>>>>
> >>>>> While meditating over my approach (which turned out to be less trivial
> >>>>> as expected - of course), I also reconsidered your current patches. The
> >>>>> concerns I had (forwarding of spurious IRQ to Linux) turned out to be
> >>>>> harmless (Linux will ignore such few spurious events).
> >>>>>
> >>>>
> >>>> That is not even an issue if you consider the sequence to be
> >>>> xnarch_disable_irq then ipipe_control (new version, doing a critical
> >>>> entry to flip the irq mode).
> >>>
> >>> When you want to support shared IRQs, xnarch_disable_irq is tabu. I
> >>> suppose you meant some my_device_disable_irqs().
> >>
> >> No, it is perfectly valid provided you made sure that no handler
> >> remained on the shared list. There is absolutely no reason to keep a
> >> line unmasked if no device is supposed to be active on it. Hence the
> >> release sequence described earlier.
> >>
> >>>
> >>>>
> >>>>> Still, the approach to sync via shutting down the line for the current
> >>>>> domain before xnintr_irq_detach doesn't work for us. It only works if
> >>>>> xnintr_irq_detach actually detaches from the line, but it breaks if
> >>>>> there are users remaining.
> >>>>>
> >>>>> We need intrlock to check if we are the last user while removing
> >>>>> ourselves from the list. And we cannot postpone line detaching after the
> >>>>> critical section as we may otherwise race with the next registration on
> >>>>> that line. IOW, I don't see how to solve the issue without moving the
> >>>>> drain after the detach and making the detach safer instead.
> >>>>>
> >>>>> Do you agree?
> >>>>>
> >>>>
> >>>> I agree this is not trivial, for sure. To keep things simple, I would
> >>>> introduce a new "teardown" flag to freeze the descriptor, thus avoiding
> >>>> further attachments, while xnintr_detach can probe the shared list for
> >>>> lingering users, and eventually call xnarch_disable_irq
> >>>> +xnarch_ignore_irq+xnarch_release_irq in sequence with all locks
> >>>> dropped, if empty.
> >>>>
> >>>> The only adverse effect I can see ATM would be some concurrent caller of
> >>>> xnintr_detach() blocked on the teardown flag on another CPU, albeit it
> >>>> _could_ have joined the bandwagon, attaching the irq, in case the shared
> >>>> list proved to remain active (and thus xnarch_release_irq was not
> >>>> called). But this may also look like a simple way to prevent live
> >>>> locking of interrupt descriptors. YMMV.
> >>>
> >>> This sounds like it's best discussed based on patches.
> >>>
> >>
> >> Likely, yes. I'll have a look when time allows.
> > 
> > The following patches implements the teardown approach. The basic idea
> > is:
> > - neither break nor improve old setups with legacy I-pipe patches not
> > providing the revised ipipe_control_irq call.
> > - fix the SMP race when detaching interrupts.
> 
> Looks good.
> 
> > 
> > The last patch also fixes two other issues:
> > - do not alter the irq descriptor (e.g. cookie and stats) if the
> > attachment fails early
> > - do not set irq affinity before the validity checks, and set it only
> > for the first handler introduced in the shared list.
> 
> Separate commits? At least mention it in the change log.

I have just added an assertion to trigger when xnintr_attach is called
with interrupts off, since doing so would introduce a possibility of
deadlocking in the teardown wait loop.

> 
> Jan
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-07 16:22                                                 ` Jan Kiszka
                                                                     ` (2 preceding siblings ...)
  2010-11-07 17:19                                                   ` Philippe Gerum
@ 2010-11-09  8:01                                                   ` Jan Kiszka
  2010-11-09  8:26                                                     ` Philippe Gerum
  3 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-11-09  8:01 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]

Am 07.11.2010 17:22, Jan Kiszka wrote:
> Am 07.11.2010 16:15, Philippe Gerum wrote:
>> The following patches implements the teardown approach. The basic idea
>> is:
>> - neither break nor improve old setups with legacy I-pipe patches not
>> providing the revised ipipe_control_irq call.
>> - fix the SMP race when detaching interrupts.
> 
> Looks good.

This actually causes one regression: I've just learned that people are
already happily using MSIs with Xenomai in the field. This is perfectly
fine as long as you don't fiddle with rtdm_irq_disable/enable in
non-root contexts or while hard IRQs are disable. The latter requirement
would be violated by this fix now.

I've evaluated hardening MSI disable/enable in further details in the
meantime. But after collecting information about the latency impacts of
accessing PCI devices' config spaces during some KVM pass-through work,
I finally had to give up this path. What remains (besides restricting
the irq_disable/enable usage) is a software-maintained mask, but that
also requires updated I-pipe patches and refactorings on Xenomai's HAL.

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-09  8:01                                                   ` Jan Kiszka
@ 2010-11-09  8:26                                                     ` Philippe Gerum
  2010-11-09  8:39                                                       ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-11-09  8:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
> Am 07.11.2010 17:22, Jan Kiszka wrote:
> > Am 07.11.2010 16:15, Philippe Gerum wrote:
> >> The following patches implements the teardown approach. The basic idea
> >> is:
> >> - neither break nor improve old setups with legacy I-pipe patches not
> >> providing the revised ipipe_control_irq call.
> >> - fix the SMP race when detaching interrupts.
> > 
> > Looks good.
> 
> This actually causes one regression: I've just learned that people are
> already happily using MSIs with Xenomai in the field. This is perfectly
> fine as long as you don't fiddle with rtdm_irq_disable/enable in
> non-root contexts or while hard IRQs are disable. The latter requirement
> would be violated by this fix now.

What we could do is handle this corner-case in the ipipe directly, going
for a nop when IRQs are off on a per-arch basis only to please those
users, because I don't think we can generally tell people that using MSI
is fine right now with respect to the above limitations. Besides, we
can't enable CONFIG_PCI_MSI at all on powerpc 83xx yet (I suspect most
other powerpc platforms are broken the same way), this simply causes a
lockup at boot. So more work is really needed all over the place for
having MSI officially supported in Xenomai.

> 
> I've evaluated hardening MSI disable/enable in further details in the
> meantime. But after collecting information about the latency impacts of
> accessing PCI devices' config spaces during some KVM pass-through work,
> I finally had to give up this path. What remains (besides restricting
> the irq_disable/enable usage) is a software-maintained mask, but that
> also requires updated I-pipe patches and refactorings on Xenomai's HAL.

I agree that trying to fit the PCI config accesses over the primary
domain would be just insane, I see way too many intricacies and room for
deadly issues as well.

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-09  8:26                                                     ` Philippe Gerum
@ 2010-11-09  8:39                                                       ` Jan Kiszka
  2010-11-09  9:36                                                         ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-11-09  8:39 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 2624 bytes --]

Am 09.11.2010 09:26, Philippe Gerum wrote:
> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
>> Am 07.11.2010 17:22, Jan Kiszka wrote:
>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
>>>> The following patches implements the teardown approach. The basic idea
>>>> is:
>>>> - neither break nor improve old setups with legacy I-pipe patches not
>>>> providing the revised ipipe_control_irq call.
>>>> - fix the SMP race when detaching interrupts.
>>>
>>> Looks good.
>>
>> This actually causes one regression: I've just learned that people are
>> already happily using MSIs with Xenomai in the field. This is perfectly
>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
>> non-root contexts or while hard IRQs are disable. The latter requirement
>> would be violated by this fix now.
> 
> What we could do is handle this corner-case in the ipipe directly, going
> for a nop when IRQs are off on a per-arch basis only to please those
> users,

Don't we disable hard IRQs also then the root domain is the only
registered one? I'm worried about pushing regressions around, then to
plain Linux use-cases of MSI (which are not broken in anyway - except
for powerpc).

> because I don't think we can generally tell people that using MSI
> is fine right now with respect to the above limitations. Besides, we
> can't enable CONFIG_PCI_MSI at all on powerpc 83xx yet (I suspect most
> other powerpc platforms are broken the same way), this simply causes a
> lockup at boot.

OK, but this is most probably an arch-specific issue. I saw no issues
inherent to MSI support in the generic PCI driver.

> So more work is really needed all over the place for
> having MSI officially supported in Xenomai.
> 
>>
>> I've evaluated hardening MSI disable/enable in further details in the
>> meantime. But after collecting information about the latency impacts of
>> accessing PCI devices' config spaces during some KVM pass-through work,
>> I finally had to give up this path. What remains (besides restricting
>> the irq_disable/enable usage) is a software-maintained mask, but that
>> also requires updated I-pipe patches and refactorings on Xenomai's HAL.
> 
> I agree that trying to fit the PCI config accesses over the primary
> domain would be just insane, I see way too many intricacies and room for
> deadly issues as well.

Most deadly is the fact that insane hardware, probably firmware, can
impact this path under our feet. And we can't isolate those accesses
even if the devices are well known - the access method is a system-wide
shared resource.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-09  8:39                                                       ` Jan Kiszka
@ 2010-11-09  9:36                                                         ` Philippe Gerum
  2010-11-09 13:12                                                           ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-11-09  9:36 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
> Am 09.11.2010 09:26, Philippe Gerum wrote:
> > On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
> >> Am 07.11.2010 17:22, Jan Kiszka wrote:
> >>> Am 07.11.2010 16:15, Philippe Gerum wrote:
> >>>> The following patches implements the teardown approach. The basic idea
> >>>> is:
> >>>> - neither break nor improve old setups with legacy I-pipe patches not
> >>>> providing the revised ipipe_control_irq call.
> >>>> - fix the SMP race when detaching interrupts.
> >>>
> >>> Looks good.
> >>
> >> This actually causes one regression: I've just learned that people are
> >> already happily using MSIs with Xenomai in the field. This is perfectly
> >> fine as long as you don't fiddle with rtdm_irq_disable/enable in
> >> non-root contexts or while hard IRQs are disable. The latter requirement
> >> would be violated by this fix now.
> > 
> > What we could do is handle this corner-case in the ipipe directly, going
> > for a nop when IRQs are off on a per-arch basis only to please those
> > users,
> 
> Don't we disable hard IRQs also then the root domain is the only
> registered one? I'm worried about pushing regressions around, then to
> plain Linux use-cases of MSI (which are not broken in anyway - except
> for powerpc).

The idea is to provide an ad hoc ipipe service for this, to be used by
the HAL. A service that would check the controller for the target IRQ,
and handle MSI ones conditionally. For sure, we just can't put those
conditionally bluntly into the chip mask handler and expect the kernel
to be happy.

In fact, we already have __ipipe_enable/disable_irq from the internal
Adeos interface avail, but they are mostly wrappers for now. We could
make them a bit more smart, and handle the MSI issue as well. We would
then tell the HAL to switch to using those arch-agnostic helpers
generally, instead of peeking directly into the chip controller structs
like today.

If that ipipe "feature" is not detected by the HAL, then we would
refrain from disabling the IRQ in xnintr_detach. In effect, this would
leave the SMP race window open, but since we need recent ipipes to get
it plugged already anyway (for the revised ipipe_control_irq), we would
still remain in the current situation:
- old patches? no SMP race fix, no regression
- new patches? SMP race fix avail, no regression

> 
> > because I don't think we can generally tell people that using MSI
> > is fine right now with respect to the above limitations. Besides, we
> > can't enable CONFIG_PCI_MSI at all on powerpc 83xx yet (I suspect most
> > other powerpc platforms are broken the same way), this simply causes a
> > lockup at boot.
> 
> OK, but this is most probably an arch-specific issue. I saw no issues
> inherent to MSI support in the generic PCI driver.

Yes, likely. But still, MSI only works in a well-defined context over
x86 for now.

> 
> > So more work is really needed all over the place for
> > having MSI officially supported in Xenomai.
> > 
> >>
> >> I've evaluated hardening MSI disable/enable in further details in the
> >> meantime. But after collecting information about the latency impacts of
> >> accessing PCI devices' config spaces during some KVM pass-through work,
> >> I finally had to give up this path. What remains (besides restricting
> >> the irq_disable/enable usage) is a software-maintained mask, but that
> >> also requires updated I-pipe patches and refactorings on Xenomai's HAL.
> > 
> > I agree that trying to fit the PCI config accesses over the primary
> > domain would be just insane, I see way too many intricacies and room for
> > deadly issues as well.
> 
> Most deadly is the fact that insane hardware, probably firmware, can
> impact this path under our feet. And we can't isolate those accesses
> even if the devices are well known - the access method is a system-wide
> shared resource.
> 
> Jan
> 

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-09  9:36                                                         ` Philippe Gerum
@ 2010-11-09 13:12                                                           ` Jan Kiszka
  2010-11-12  8:48                                                             ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-11-09 13:12 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 2855 bytes --]

Am 09.11.2010 10:36, Philippe Gerum wrote:
> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
>> Am 09.11.2010 09:26, Philippe Gerum wrote:
>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
>>>>>> The following patches implements the teardown approach. The basic idea
>>>>>> is:
>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
>>>>>> providing the revised ipipe_control_irq call.
>>>>>> - fix the SMP race when detaching interrupts.
>>>>>
>>>>> Looks good.
>>>>
>>>> This actually causes one regression: I've just learned that people are
>>>> already happily using MSIs with Xenomai in the field. This is perfectly
>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
>>>> non-root contexts or while hard IRQs are disable. The latter requirement
>>>> would be violated by this fix now.
>>>
>>> What we could do is handle this corner-case in the ipipe directly, going
>>> for a nop when IRQs are off on a per-arch basis only to please those
>>> users,
>>
>> Don't we disable hard IRQs also then the root domain is the only
>> registered one? I'm worried about pushing regressions around, then to
>> plain Linux use-cases of MSI (which are not broken in anyway - except
>> for powerpc).
> 
> The idea is to provide an ad hoc ipipe service for this, to be used by
> the HAL. A service that would check the controller for the target IRQ,
> and handle MSI ones conditionally. For sure, we just can't put those
> conditionally bluntly into the chip mask handler and expect the kernel
> to be happy.
> 
> In fact, we already have __ipipe_enable/disable_irq from the internal
> Adeos interface avail, but they are mostly wrappers for now. We could
> make them a bit more smart, and handle the MSI issue as well. We would
> then tell the HAL to switch to using those arch-agnostic helpers
> generally, instead of peeking directly into the chip controller structs
> like today.

This belongs to I-pipe, like we already have ipipe_end, just properly
wrapped to avoid descriptor access. That's specifically important if we
want to emulate MSI masking in software. I've the generic I-pipe
infrastructure ready, but the backend, so far consisting of x86 MSI
hardening, unfortunately needs to be rewritten.

> 
> If that ipipe "feature" is not detected by the HAL, then we would
> refrain from disabling the IRQ in xnintr_detach. In effect, this would
> leave the SMP race window open, but since we need recent ipipes to get
> it plugged already anyway (for the revised ipipe_control_irq), we would
> still remain in the current situation:
> - old patches? no SMP race fix, no regression
> - new patches? SMP race fix avail, no regression

Sounds good.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-09 13:12                                                           ` Jan Kiszka
@ 2010-11-12  8:48                                                             ` Philippe Gerum
  2010-11-12  9:14                                                               ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-11-12  8:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
> Am 09.11.2010 10:36, Philippe Gerum wrote:
> > On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
> >> Am 09.11.2010 09:26, Philippe Gerum wrote:
> >>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
> >>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
> >>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
> >>>>>> The following patches implements the teardown approach. The basic idea
> >>>>>> is:
> >>>>>> - neither break nor improve old setups with legacy I-pipe patches not
> >>>>>> providing the revised ipipe_control_irq call.
> >>>>>> - fix the SMP race when detaching interrupts.
> >>>>>
> >>>>> Looks good.
> >>>>
> >>>> This actually causes one regression: I've just learned that people are
> >>>> already happily using MSIs with Xenomai in the field. This is perfectly
> >>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
> >>>> non-root contexts or while hard IRQs are disable. The latter requirement
> >>>> would be violated by this fix now.
> >>>
> >>> What we could do is handle this corner-case in the ipipe directly, going
> >>> for a nop when IRQs are off on a per-arch basis only to please those
> >>> users,
> >>
> >> Don't we disable hard IRQs also then the root domain is the only
> >> registered one? I'm worried about pushing regressions around, then to
> >> plain Linux use-cases of MSI (which are not broken in anyway - except
> >> for powerpc).
> > 
> > The idea is to provide an ad hoc ipipe service for this, to be used by
> > the HAL. A service that would check the controller for the target IRQ,
> > and handle MSI ones conditionally. For sure, we just can't put those
> > conditionally bluntly into the chip mask handler and expect the kernel
> > to be happy.
> > 
> > In fact, we already have __ipipe_enable/disable_irq from the internal
> > Adeos interface avail, but they are mostly wrappers for now. We could
> > make them a bit more smart, and handle the MSI issue as well. We would
> > then tell the HAL to switch to using those arch-agnostic helpers
> > generally, instead of peeking directly into the chip controller structs
> > like today.
> 
> This belongs to I-pipe, like we already have ipipe_end, just properly
> wrapped to avoid descriptor access. That's specifically important if we
> want to emulate MSI masking in software. I've the generic I-pipe
> infrastructure ready, but the backend, so far consisting of x86 MSI
> hardening, unfortunately needs to be rewritten.
> 
> > 
> > If that ipipe "feature" is not detected by the HAL, then we would
> > refrain from disabling the IRQ in xnintr_detach. In effect, this would
> > leave the SMP race window open, but since we need recent ipipes to get
> > it plugged already anyway (for the revised ipipe_control_irq), we would
> > still remain in the current situation:
> > - old patches? no SMP race fix, no regression
> > - new patches? SMP race fix avail, no regression
> 
> Sounds good.

Now that I slept on it, I find the approach of working around pipeline
limitations this way, to be incorrect.

Basically, the issue is that we still don't have 100% reliable handling
of MSI interrupts (actually, we only have partial handling, and solely
for x86), but this is no reason to introduce code in the pipeline
interface which would perpetuate this fact. I see this as a "all or
nothing" issue: either MSI is fully handled and there shall be no
restriction on applying common operations such as masking/unmasking on
the related IRQs, or it is not, and we should not export "conditionally
working" APIs.

In the latter case, the responsibility to rely on MSI support belongs to
the user, which then should know about the pending restrictions, and
decides for himself whether to use MSI. So I'm heading to this solution
instead:

- when detaching the last handler for a given IRQ, instead of forcibly
disabling the IRQ line, the nucleus would just make sure that such IRQ
is already in a disabled state, and bail out on error if not (probably
with a kernel warning to make the issue obvious).

- track the IRQ line state from xnintr_enable/xnintr_disable routines,
so that xnintr_detach can determine whether the call is legit. Of
course, this also means that any attempt to take sideways to
enable/disable nucleus managed interrupts at PIC level would break that
logic, but doing so would be the root bug anyway.

The advantage of doing so would be three-fold:

- no pipeline code to acknowledge (or even perpetuate) the fact that MSI
support is half working, half broken. We need to fix it properly, so
that we can use it 100% reliably, from whatever context commonly allowed
for enabling/disabling IRQs (and not "from root domain with IRQs on"
only). Typically, I fail to see how one would cope with such limitation,
if a real-time handler detects that some device is going wild and really
needs to shut it down before the whole system crashes.

- we enforce the API usage requirement to disable an interrupt line with
rtdm_irq_disable(), before eventually detaching the last IRQ handler for
it, which is common sense anyway.

- absolutely no change for people who currently rely on partial MSI
support, provided they duly disable IRQ lines before detaching their
last handler via the appropriate RTDM interface.

Can we deal on this?

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-12  8:48                                                             ` Philippe Gerum
@ 2010-11-12  9:14                                                               ` Jan Kiszka
  2010-11-12 13:57                                                                 ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-11-12  9:14 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 6300 bytes --]

Am 12.11.2010 09:48, Philippe Gerum wrote:
> On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
>> Am 09.11.2010 10:36, Philippe Gerum wrote:
>>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
>>>> Am 09.11.2010 09:26, Philippe Gerum wrote:
>>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
>>>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
>>>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
>>>>>>>> The following patches implements the teardown approach. The basic idea
>>>>>>>> is:
>>>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
>>>>>>>> providing the revised ipipe_control_irq call.
>>>>>>>> - fix the SMP race when detaching interrupts.
>>>>>>>
>>>>>>> Looks good.
>>>>>>
>>>>>> This actually causes one regression: I've just learned that people are
>>>>>> already happily using MSIs with Xenomai in the field. This is perfectly
>>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
>>>>>> non-root contexts or while hard IRQs are disable. The latter requirement
>>>>>> would be violated by this fix now.
>>>>>
>>>>> What we could do is handle this corner-case in the ipipe directly, going
>>>>> for a nop when IRQs are off on a per-arch basis only to please those
>>>>> users,
>>>>
>>>> Don't we disable hard IRQs also then the root domain is the only
>>>> registered one? I'm worried about pushing regressions around, then to
>>>> plain Linux use-cases of MSI (which are not broken in anyway - except
>>>> for powerpc).
>>>
>>> The idea is to provide an ad hoc ipipe service for this, to be used by
>>> the HAL. A service that would check the controller for the target IRQ,
>>> and handle MSI ones conditionally. For sure, we just can't put those
>>> conditionally bluntly into the chip mask handler and expect the kernel
>>> to be happy.
>>>
>>> In fact, we already have __ipipe_enable/disable_irq from the internal
>>> Adeos interface avail, but they are mostly wrappers for now. We could
>>> make them a bit more smart, and handle the MSI issue as well. We would
>>> then tell the HAL to switch to using those arch-agnostic helpers
>>> generally, instead of peeking directly into the chip controller structs
>>> like today.
>>
>> This belongs to I-pipe, like we already have ipipe_end, just properly
>> wrapped to avoid descriptor access. That's specifically important if we
>> want to emulate MSI masking in software. I've the generic I-pipe
>> infrastructure ready, but the backend, so far consisting of x86 MSI
>> hardening, unfortunately needs to be rewritten.
>>
>>>
>>> If that ipipe "feature" is not detected by the HAL, then we would
>>> refrain from disabling the IRQ in xnintr_detach. In effect, this would
>>> leave the SMP race window open, but since we need recent ipipes to get
>>> it plugged already anyway (for the revised ipipe_control_irq), we would
>>> still remain in the current situation:
>>> - old patches? no SMP race fix, no regression
>>> - new patches? SMP race fix avail, no regression
>>
>> Sounds good.
> 
> Now that I slept on it, I find the approach of working around pipeline
> limitations this way, to be incorrect.
> 
> Basically, the issue is that we still don't have 100% reliable handling
> of MSI interrupts (actually, we only have partial handling, and solely
> for x86), but this is no reason to introduce code in the pipeline
> interface which would perpetuate this fact. I see this as a "all or
> nothing" issue: either MSI is fully handled and there shall be no
> restriction on applying common operations such as masking/unmasking on
> the related IRQs, or it is not, and we should not export "conditionally
> working" APIs.
> 
> In the latter case, the responsibility to rely on MSI support belongs to
> the user, which then should know about the pending restrictions, and
> decides for himself whether to use MSI. So I'm heading to this solution
> instead:
> 
> - when detaching the last handler for a given IRQ, instead of forcibly
> disabling the IRQ line, the nucleus would just make sure that such IRQ
> is already in a disabled state, and bail out on error if not (probably
> with a kernel warning to make the issue obvious).

Fiddling with the IRQ "line" state is a workaround for the missing
synchronize_irq service in Xenomai/I-pipe. If we had this, all this
disabling become unneeded.

> 
> - track the IRQ line state from xnintr_enable/xnintr_disable routines,
> so that xnintr_detach can determine whether the call is legit. Of
> course, this also means that any attempt to take sideways to
> enable/disable nucleus managed interrupts at PIC level would break that
> logic, but doing so would be the root bug anyway.
> 
> The advantage of doing so would be three-fold:
> 
> - no pipeline code to acknowledge (or even perpetuate) the fact that MSI
> support is half working, half broken. We need to fix it properly, so
> that we can use it 100% reliably, from whatever context commonly allowed
> for enabling/disabling IRQs (and not "from root domain with IRQs on"
> only). Typically, I fail to see how one would cope with such limitation,
> if a real-time handler detects that some device is going wild and really
> needs to shut it down before the whole system crashes.

MSIs are edge-triggered. Only broken hardware continuously sending bogus
messages can theoretically cause troubles. In practice (ie. in absence
of broken HW), we see a single spurious IRQ at worst.

> 
> - we enforce the API usage requirement to disable an interrupt line with
> rtdm_irq_disable(), before eventually detaching the last IRQ handler for
> it, which is common sense anyway.

That's an easy-to-get-wrong API. It would apply to non-shared IRQs only
(aka MSIs). No-go IMHO.

> 
> - absolutely no change for people who currently rely on partial MSI
> support, provided they duly disable IRQ lines before detaching their
> last handler via the appropriate RTDM interface.
> 
> Can we deal on this?
> 

Nope, don't think so. The only option I see (besides using my original
proposal of a dummy handler for deregistering - still much simpler than
the current patches) is to emulate MSI masking in the same run, thus
providing solutions for both issues.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-12  9:14                                                               ` Jan Kiszka
@ 2010-11-12 13:57                                                                 ` Philippe Gerum
  2010-11-12 14:30                                                                   ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-11-12 13:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Fri, 2010-11-12 at 10:14 +0100, Jan Kiszka wrote:
> Am 12.11.2010 09:48, Philippe Gerum wrote:
> > On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
> >> Am 09.11.2010 10:36, Philippe Gerum wrote:
> >>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
> >>>> Am 09.11.2010 09:26, Philippe Gerum wrote:
> >>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
> >>>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
> >>>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
> >>>>>>>> The following patches implements the teardown approach. The basic idea
> >>>>>>>> is:
> >>>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
> >>>>>>>> providing the revised ipipe_control_irq call.
> >>>>>>>> - fix the SMP race when detaching interrupts.
> >>>>>>>
> >>>>>>> Looks good.
> >>>>>>
> >>>>>> This actually causes one regression: I've just learned that people are
> >>>>>> already happily using MSIs with Xenomai in the field. This is perfectly
> >>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
> >>>>>> non-root contexts or while hard IRQs are disable. The latter requirement
> >>>>>> would be violated by this fix now.
> >>>>>
> >>>>> What we could do is handle this corner-case in the ipipe directly, going
> >>>>> for a nop when IRQs are off on a per-arch basis only to please those
> >>>>> users,
> >>>>
> >>>> Don't we disable hard IRQs also then the root domain is the only
> >>>> registered one? I'm worried about pushing regressions around, then to
> >>>> plain Linux use-cases of MSI (which are not broken in anyway - except
> >>>> for powerpc).
> >>>
> >>> The idea is to provide an ad hoc ipipe service for this, to be used by
> >>> the HAL. A service that would check the controller for the target IRQ,
> >>> and handle MSI ones conditionally. For sure, we just can't put those
> >>> conditionally bluntly into the chip mask handler and expect the kernel
> >>> to be happy.
> >>>
> >>> In fact, we already have __ipipe_enable/disable_irq from the internal
> >>> Adeos interface avail, but they are mostly wrappers for now. We could
> >>> make them a bit more smart, and handle the MSI issue as well. We would
> >>> then tell the HAL to switch to using those arch-agnostic helpers
> >>> generally, instead of peeking directly into the chip controller structs
> >>> like today.
> >>
> >> This belongs to I-pipe, like we already have ipipe_end, just properly
> >> wrapped to avoid descriptor access. That's specifically important if we
> >> want to emulate MSI masking in software. I've the generic I-pipe
> >> infrastructure ready, but the backend, so far consisting of x86 MSI
> >> hardening, unfortunately needs to be rewritten.
> >>
> >>>
> >>> If that ipipe "feature" is not detected by the HAL, then we would
> >>> refrain from disabling the IRQ in xnintr_detach. In effect, this would
> >>> leave the SMP race window open, but since we need recent ipipes to get
> >>> it plugged already anyway (for the revised ipipe_control_irq), we would
> >>> still remain in the current situation:
> >>> - old patches? no SMP race fix, no regression
> >>> - new patches? SMP race fix avail, no regression
> >>
> >> Sounds good.
> > 
> > Now that I slept on it, I find the approach of working around pipeline
> > limitations this way, to be incorrect.
> > 
> > Basically, the issue is that we still don't have 100% reliable handling
> > of MSI interrupts (actually, we only have partial handling, and solely
> > for x86), but this is no reason to introduce code in the pipeline
> > interface which would perpetuate this fact. I see this as a "all or
> > nothing" issue: either MSI is fully handled and there shall be no
> > restriction on applying common operations such as masking/unmasking on
> > the related IRQs, or it is not, and we should not export "conditionally
> > working" APIs.
> > 
> > In the latter case, the responsibility to rely on MSI support belongs to
> > the user, which then should know about the pending restrictions, and
> > decides for himself whether to use MSI. So I'm heading to this solution
> > instead:
> > 
> > - when detaching the last handler for a given IRQ, instead of forcibly
> > disabling the IRQ line, the nucleus would just make sure that such IRQ
> > is already in a disabled state, and bail out on error if not (probably
> > with a kernel warning to make the issue obvious).
> 
> Fiddling with the IRQ "line" state is a workaround for the missing
> synchronize_irq service in Xenomai/I-pipe. If we had this, all this
> disabling become unneeded.

Actually, no. The synchronize irq service in the pipeline could have
been introduced weeks ago in a snap, provided Xenomai did not call
ipipe_virtualize_irq holding a lock with irqs off - this is in essence
what my first patch to Pavel did, but could not work for the reason
mentioned. That is what the patch we are discussing now is all about:
clearing issues in Xenomai first.

Additionally, having an IRQ disabled is a requirement to properly
synchronize that IRQ later, it is not an option. It is part of the sync
protocol actually. If you don't do this, it lacks a basic guarantee, it
can't work.

> 
> > 
> > - track the IRQ line state from xnintr_enable/xnintr_disable routines,
> > so that xnintr_detach can determine whether the call is legit. Of
> > course, this also means that any attempt to take sideways to
> > enable/disable nucleus managed interrupts at PIC level would break that
> > logic, but doing so would be the root bug anyway.
> > 
> > The advantage of doing so would be three-fold:
> > 
> > - no pipeline code to acknowledge (or even perpetuate) the fact that MSI
> > support is half working, half broken. We need to fix it properly, so
> > that we can use it 100% reliably, from whatever context commonly allowed
> > for enabling/disabling IRQs (and not "from root domain with IRQs on"
> > only). Typically, I fail to see how one would cope with such limitation,
> > if a real-time handler detects that some device is going wild and really
> > needs to shut it down before the whole system crashes.
> 
> MSIs are edge-triggered. Only broken hardware continuously sending bogus
> messages can theoretically cause troubles. In practice (ie. in absence
> of broken HW), we see a single spurious IRQ at worst.
> 

The driver should be able to work the same way whether it deals with MSI
support, or is given pinned (level) IRQs.

> > 
> > - we enforce the API usage requirement to disable an interrupt line with
> > rtdm_irq_disable(), before eventually detaching the last IRQ handler for
> > it, which is common sense anyway.
> 
> That's an easy-to-get-wrong API. It would apply to non-shared IRQs only
> (aka MSIs). No-go IMHO.

- the software does not want to share MSIs, or something is quite wrong
- xnintr_detach() is broken by design in its current implementation for
shared handlers, because it leaves the decision to mask the IRQ line to
the user. This could not work for multiple non-cooperating drivers
sharing an IRQ, at least not in a raceless way.

So the issue is not that much about MSI vs pinned-type IRQs, it is
rather about shared vs non-shared interrupts. What we want is a sane API
and behavior for both types, which happens to fit the limited support we
have now for MSI, not the other way around.

Therefore, until the MSI issue is fixed in the pipeline:

- non-shared mode users should call xnintr_disable(irq) then
xnintr_detach(irq) explicitly. This is what everybody should do today
already (via rtdm_irq_disable() or whatever), since xnintr_detach() does
not disable the IRQ in the current implementation.

- shared mode users should only call xnintr_detach(irq), which would
then decide racelessly whether to disable the line. In essence, this
could never affect MSIs.

When the MSI support is generally available with no limitation on
masking/unmasking contexts, then xnintr_detach() could be updated to
forcibly disable the detached IRQ in the non-shared case as well. At
worst, the legacy code would end up calling xnintr_disable() uselessly
when the pipeline fix is merged.

What we can do in the meantime, is 1) fixing xnintr_detach() in the
shared case, 2) introducing an error check to prevent detaching
non-shared IRQs that are still enabled. That won't have any impact on
the partial MSI support we have today.

> 
> > 
> > - absolutely no change for people who currently rely on partial MSI
> > support, provided they duly disable IRQ lines before detaching their
> > last handler via the appropriate RTDM interface.
> > 
> > Can we deal on this?
> > 
> 
> Nope, don't think so. The only option I see (besides using my original
> proposal of a dummy handler for deregistering - still much simpler than
> the current patches) is to emulate MSI masking in the same run, thus
> providing solutions for both issues.

The dummy handler solution is incomplete because it does not address the
IRQ disabling issue in xnintr_detach() for shared IRQs. This is the gist
of the patch I sent, actually.

Besides, I'm still scratching my head: how would introducing a dummy
handler help dealing with that sequence?

CPU0					CPU1

xnintr_disable => mask irq		<irq>
xnintr_detach				real-time handler
	<install dummy handler>			<touch device>
					ipipe_end_irq() => unmask irq

Then, on whichever CPU, we could receive a device IRQ as we touched the
hardware before leaving the handler, passing that IRQ to the root domain
which has no proper handler for it:

- lucky scenario: the device is in a state that won't cause it to kick
yet another interrupt even if the last one was not serviced, and we just
end up with a nifty spurious IRQ message in the kernel log.

- out of luck: we touched the device on CPU1 in a way that claims for a
specific action by the software, which won't be able to perform it
because it is now branching to the dummy handler. Whether we have some
code to silence the device (and not only the IRQ line) before or after
we detach the IRQ on CPU0 does not fix the issue, since we have no
synchronization between the CPUs whatsoever.

-- 
Philippe.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-12 13:57                                                                 ` Philippe Gerum
@ 2010-11-12 14:30                                                                   ` Jan Kiszka
  2010-11-12 17:42                                                                     ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-11-12 14:30 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 11956 bytes --]

Am 12.11.2010 14:57, Philippe Gerum wrote:
> On Fri, 2010-11-12 at 10:14 +0100, Jan Kiszka wrote:
>> Am 12.11.2010 09:48, Philippe Gerum wrote:
>>> On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
>>>> Am 09.11.2010 10:36, Philippe Gerum wrote:
>>>>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
>>>>>> Am 09.11.2010 09:26, Philippe Gerum wrote:
>>>>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
>>>>>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
>>>>>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
>>>>>>>>>> The following patches implements the teardown approach. The basic idea
>>>>>>>>>> is:
>>>>>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
>>>>>>>>>> providing the revised ipipe_control_irq call.
>>>>>>>>>> - fix the SMP race when detaching interrupts.
>>>>>>>>>
>>>>>>>>> Looks good.
>>>>>>>>
>>>>>>>> This actually causes one regression: I've just learned that people are
>>>>>>>> already happily using MSIs with Xenomai in the field. This is perfectly
>>>>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
>>>>>>>> non-root contexts or while hard IRQs are disable. The latter requirement
>>>>>>>> would be violated by this fix now.
>>>>>>>
>>>>>>> What we could do is handle this corner-case in the ipipe directly, going
>>>>>>> for a nop when IRQs are off on a per-arch basis only to please those
>>>>>>> users,
>>>>>>
>>>>>> Don't we disable hard IRQs also then the root domain is the only
>>>>>> registered one? I'm worried about pushing regressions around, then to
>>>>>> plain Linux use-cases of MSI (which are not broken in anyway - except
>>>>>> for powerpc).
>>>>>
>>>>> The idea is to provide an ad hoc ipipe service for this, to be used by
>>>>> the HAL. A service that would check the controller for the target IRQ,
>>>>> and handle MSI ones conditionally. For sure, we just can't put those
>>>>> conditionally bluntly into the chip mask handler and expect the kernel
>>>>> to be happy.
>>>>>
>>>>> In fact, we already have __ipipe_enable/disable_irq from the internal
>>>>> Adeos interface avail, but they are mostly wrappers for now. We could
>>>>> make them a bit more smart, and handle the MSI issue as well. We would
>>>>> then tell the HAL to switch to using those arch-agnostic helpers
>>>>> generally, instead of peeking directly into the chip controller structs
>>>>> like today.
>>>>
>>>> This belongs to I-pipe, like we already have ipipe_end, just properly
>>>> wrapped to avoid descriptor access. That's specifically important if we
>>>> want to emulate MSI masking in software. I've the generic I-pipe
>>>> infrastructure ready, but the backend, so far consisting of x86 MSI
>>>> hardening, unfortunately needs to be rewritten.
>>>>
>>>>>
>>>>> If that ipipe "feature" is not detected by the HAL, then we would
>>>>> refrain from disabling the IRQ in xnintr_detach. In effect, this would
>>>>> leave the SMP race window open, but since we need recent ipipes to get
>>>>> it plugged already anyway (for the revised ipipe_control_irq), we would
>>>>> still remain in the current situation:
>>>>> - old patches? no SMP race fix, no regression
>>>>> - new patches? SMP race fix avail, no regression
>>>>
>>>> Sounds good.
>>>
>>> Now that I slept on it, I find the approach of working around pipeline
>>> limitations this way, to be incorrect.
>>>
>>> Basically, the issue is that we still don't have 100% reliable handling
>>> of MSI interrupts (actually, we only have partial handling, and solely
>>> for x86), but this is no reason to introduce code in the pipeline
>>> interface which would perpetuate this fact. I see this as a "all or
>>> nothing" issue: either MSI is fully handled and there shall be no
>>> restriction on applying common operations such as masking/unmasking on
>>> the related IRQs, or it is not, and we should not export "conditionally
>>> working" APIs.
>>>
>>> In the latter case, the responsibility to rely on MSI support belongs to
>>> the user, which then should know about the pending restrictions, and
>>> decides for himself whether to use MSI. So I'm heading to this solution
>>> instead:
>>>
>>> - when detaching the last handler for a given IRQ, instead of forcibly
>>> disabling the IRQ line, the nucleus would just make sure that such IRQ
>>> is already in a disabled state, and bail out on error if not (probably
>>> with a kernel warning to make the issue obvious).
>>
>> Fiddling with the IRQ "line" state is a workaround for the missing
>> synchronize_irq service in Xenomai/I-pipe. If we had this, all this
>> disabling become unneeded.
> 
> Actually, no. The synchronize irq service in the pipeline could have
> been introduced weeks ago in a snap, provided Xenomai did not call
> ipipe_virtualize_irq holding a lock with irqs off - this is in essence
> what my first patch to Pavel did, but could not work for the reason
> mentioned. That is what the patch we are discussing now is all about:
> clearing issues in Xenomai first.
> 
> Additionally, having an IRQ disabled is a requirement to properly
> synchronize that IRQ later, it is not an option. It is part of the sync
> protocol actually. If you don't do this, it lacks a basic guarantee, it
> can't work.

The required pattern from driver POV is
 - disable IRQ at device level
 - flush pending IRQs
 - deregister handler

Linux does the last two steps in reverse order as it synchronizes
descriptor changes against descriptor usage internally. I-pipe requires
the above ordering (unless my patch is used to harden
ipipe_virtualize_irq against NULL handlers). Still, the driver should
find step 2 and 3 as part of rtdm_irq_free. That's all.

> 
>>
>>>
>>> - track the IRQ line state from xnintr_enable/xnintr_disable routines,
>>> so that xnintr_detach can determine whether the call is legit. Of
>>> course, this also means that any attempt to take sideways to
>>> enable/disable nucleus managed interrupts at PIC level would break that
>>> logic, but doing so would be the root bug anyway.
>>>
>>> The advantage of doing so would be three-fold:
>>>
>>> - no pipeline code to acknowledge (or even perpetuate) the fact that MSI
>>> support is half working, half broken. We need to fix it properly, so
>>> that we can use it 100% reliably, from whatever context commonly allowed
>>> for enabling/disabling IRQs (and not "from root domain with IRQs on"
>>> only). Typically, I fail to see how one would cope with such limitation,
>>> if a real-time handler detects that some device is going wild and really
>>> needs to shut it down before the whole system crashes.
>>
>> MSIs are edge-triggered. Only broken hardware continuously sending bogus
>> messages can theoretically cause troubles. In practice (ie. in absence
>> of broken HW), we see a single spurious IRQ at worst.
>>
> 
> The driver should be able to work the same way whether it deals with MSI
> support, or is given pinned (level) IRQs.

Exactly my point, thus rtdm_irq_disable in driver code is a no-go.

> 
>>>
>>> - we enforce the API usage requirement to disable an interrupt line with
>>> rtdm_irq_disable(), before eventually detaching the last IRQ handler for
>>> it, which is common sense anyway.
>>
>> That's an easy-to-get-wrong API. It would apply to non-shared IRQs only
>> (aka MSIs). No-go IMHO.
> 
> - the software does not want to share MSIs, or something is quite wrong
> - xnintr_detach() is broken by design in its current implementation for
> shared handlers, because it leaves the decision to mask the IRQ line to
> the user. This could not work for multiple non-cooperating drivers
> sharing an IRQ, at least not in a raceless way.

It works (minus the missing synchronize_irq) if you disable the IRQ at
device level as drivers normally do.

> 
> So the issue is not that much about MSI vs pinned-type IRQs, it is
> rather about shared vs non-shared interrupts. What we want is a sane API
> and behavior for both types, which happens to fit the limited support we
> have now for MSI, not the other way around.
> 
> Therefore, until the MSI issue is fixed in the pipeline:
> 
> - non-shared mode users should call xnintr_disable(irq) then
> xnintr_detach(irq) explicitly. This is what everybody should do today
> already (via rtdm_irq_disable() or whatever), since xnintr_detach() does
> not disable the IRQ in the current implementation.

No one should call rtdm_irq_disable today for whatever IRQ unless in
very few exceptional cases. I think I should extend the documentation of
rtdm_irq_enabled/disable ASAP to clarify their exceptional nature.

> 
> - shared mode users should only call xnintr_detach(irq), which would
> then decide racelessly whether to disable the line. In essence, this
> could never affect MSIs.
> 
> When the MSI support is generally available with no limitation on
> masking/unmasking contexts, then xnintr_detach() could be updated to
> forcibly disable the detached IRQ in the non-shared case as well. At
> worst, the legacy code would end up calling xnintr_disable() uselessly
> when the pipeline fix is merged.
> 
> What we can do in the meantime, is 1) fixing xnintr_detach() in the
> shared case, 2) introducing an error check to prevent detaching
> non-shared IRQs that are still enabled. That won't have any impact on
> the partial MSI support we have today.
> 
>>
>>>
>>> - absolutely no change for people who currently rely on partial MSI
>>> support, provided they duly disable IRQ lines before detaching their
>>> last handler via the appropriate RTDM interface.
>>>
>>> Can we deal on this?
>>>
>>
>> Nope, don't think so. The only option I see (besides using my original
>> proposal of a dummy handler for deregistering - still much simpler than
>> the current patches) is to emulate MSI masking in the same run, thus
>> providing solutions for both issues.
> 
> The dummy handler solution is incomplete because it does not address the
> IRQ disabling issue in xnintr_detach() for shared IRQs. This is the gist
> of the patch I sent, actually.

It is complete in this regard as it does not touch the line state. It
actually implements the Linux pattern.

> 
> Besides, I'm still scratching my head: how would introducing a dummy
> handler help dealing with that sequence?
> 
> CPU0					CPU1
> 
> xnintr_disable => mask irq		<irq>

No disable here, mask must be at device level.

> xnintr_detach				real-time handler
> 	<install dummy handler>			<touch device>
> 					ipipe_end_irq() => unmask irq

xnarch_synchronize_irq

> 
> Then, on whichever CPU, we could receive a device IRQ as we touched the
> hardware before leaving the handler, passing that IRQ to the root domain
> which has no proper handler for it:
> 
> - lucky scenario: the device is in a state that won't cause it to kick
> yet another interrupt even if the last one was not serviced, and we just
> end up with a nifty spurious IRQ message in the kernel log.
> 
> - out of luck: we touched the device on CPU1 in a way that claims for a
> specific action by the software, which won't be able to perform it
> because it is now branching to the dummy handler. Whether we have some
> code to silence the device (and not only the IRQ line) before or after
> we detach the IRQ on CPU0 does not fix the issue, since we have no
> synchronization between the CPUs whatsoever.
> 

The driver is responsible for synchronizing its shutdown sequence
between the cleanup code and any of its in-flight IRQs. If that handler
reenables the IRQ that the shutdown code just disabled (all at device
level, of course), something is seriously broken - but not on Xenomai
side. As I explained before: That potential in-flight IRQ becomes
spurious at the point the drivers starts the shutdown, thus can safely
be ignored.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-12 14:30                                                                   ` Jan Kiszka
@ 2010-11-12 17:42                                                                     ` Philippe Gerum
  2010-11-12 18:42                                                                       ` Jan Kiszka
  0 siblings, 1 reply; 51+ messages in thread
From: Philippe Gerum @ 2010-11-12 17:42 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Fri, 2010-11-12 at 15:30 +0100, Jan Kiszka wrote: 
> Am 12.11.2010 14:57, Philippe Gerum wrote:
> > On Fri, 2010-11-12 at 10:14 +0100, Jan Kiszka wrote:
> >> Am 12.11.2010 09:48, Philippe Gerum wrote:
> >>> On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
> >>>> Am 09.11.2010 10:36, Philippe Gerum wrote:
> >>>>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
> >>>>>> Am 09.11.2010 09:26, Philippe Gerum wrote:
> >>>>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
> >>>>>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
> >>>>>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
> >>>>>>>>>> The following patches implements the teardown approach. The basic idea
> >>>>>>>>>> is:
> >>>>>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
> >>>>>>>>>> providing the revised ipipe_control_irq call.
> >>>>>>>>>> - fix the SMP race when detaching interrupts.
> >>>>>>>>>
> >>>>>>>>> Looks good.
> >>>>>>>>
> >>>>>>>> This actually causes one regression: I've just learned that people are
> >>>>>>>> already happily using MSIs with Xenomai in the field. This is perfectly
> >>>>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
> >>>>>>>> non-root contexts or while hard IRQs are disable. The latter requirement
> >>>>>>>> would be violated by this fix now.
> >>>>>>>
> >>>>>>> What we could do is handle this corner-case in the ipipe directly, going
> >>>>>>> for a nop when IRQs are off on a per-arch basis only to please those
> >>>>>>> users,
> >>>>>>
> >>>>>> Don't we disable hard IRQs also then the root domain is the only
> >>>>>> registered one? I'm worried about pushing regressions around, then to
> >>>>>> plain Linux use-cases of MSI (which are not broken in anyway - except
> >>>>>> for powerpc).
> >>>>>
> >>>>> The idea is to provide an ad hoc ipipe service for this, to be used by
> >>>>> the HAL. A service that would check the controller for the target IRQ,
> >>>>> and handle MSI ones conditionally. For sure, we just can't put those
> >>>>> conditionally bluntly into the chip mask handler and expect the kernel
> >>>>> to be happy.
> >>>>>
> >>>>> In fact, we already have __ipipe_enable/disable_irq from the internal
> >>>>> Adeos interface avail, but they are mostly wrappers for now. We could
> >>>>> make them a bit more smart, and handle the MSI issue as well. We would
> >>>>> then tell the HAL to switch to using those arch-agnostic helpers
> >>>>> generally, instead of peeking directly into the chip controller structs
> >>>>> like today.
> >>>>
> >>>> This belongs to I-pipe, like we already have ipipe_end, just properly
> >>>> wrapped to avoid descriptor access. That's specifically important if we
> >>>> want to emulate MSI masking in software. I've the generic I-pipe
> >>>> infrastructure ready, but the backend, so far consisting of x86 MSI
> >>>> hardening, unfortunately needs to be rewritten.
> >>>>
> >>>>>
> >>>>> If that ipipe "feature" is not detected by the HAL, then we would
> >>>>> refrain from disabling the IRQ in xnintr_detach. In effect, this would
> >>>>> leave the SMP race window open, but since we need recent ipipes to get
> >>>>> it plugged already anyway (for the revised ipipe_control_irq), we would
> >>>>> still remain in the current situation:
> >>>>> - old patches? no SMP race fix, no regression
> >>>>> - new patches? SMP race fix avail, no regression
> >>>>
> >>>> Sounds good.
> >>>
> >>> Now that I slept on it, I find the approach of working around pipeline
> >>> limitations this way, to be incorrect.
> >>>
> >>> Basically, the issue is that we still don't have 100% reliable handling
> >>> of MSI interrupts (actually, we only have partial handling, and solely
> >>> for x86), but this is no reason to introduce code in the pipeline
> >>> interface which would perpetuate this fact. I see this as a "all or
> >>> nothing" issue: either MSI is fully handled and there shall be no
> >>> restriction on applying common operations such as masking/unmasking on
> >>> the related IRQs, or it is not, and we should not export "conditionally
> >>> working" APIs.
> >>>
> >>> In the latter case, the responsibility to rely on MSI support belongs to
> >>> the user, which then should know about the pending restrictions, and
> >>> decides for himself whether to use MSI. So I'm heading to this solution
> >>> instead:
> >>>
> >>> - when detaching the last handler for a given IRQ, instead of forcibly
> >>> disabling the IRQ line, the nucleus would just make sure that such IRQ
> >>> is already in a disabled state, and bail out on error if not (probably
> >>> with a kernel warning to make the issue obvious).
> >>
> >> Fiddling with the IRQ "line" state is a workaround for the missing
> >> synchronize_irq service in Xenomai/I-pipe. If we had this, all this
> >> disabling become unneeded.
> > 
> > Actually, no. The synchronize irq service in the pipeline could have
> > been introduced weeks ago in a snap, provided Xenomai did not call
> > ipipe_virtualize_irq holding a lock with irqs off - this is in essence
> > what my first patch to Pavel did, but could not work for the reason
> > mentioned. That is what the patch we are discussing now is all about:
> > clearing issues in Xenomai first.
> > 
> > Additionally, having an IRQ disabled is a requirement to properly
> > synchronize that IRQ later, it is not an option. It is part of the sync
> > protocol actually. If you don't do this, it lacks a basic guarantee, it
> > can't work.
> 
> The required pattern from driver POV is
>  - disable IRQ at device level
>  - flush pending IRQs
>  - deregister handler
> 
> Linux does the last two steps in reverse order as it synchronizes
> descriptor changes against descriptor usage internally. I-pipe requires
> the above ordering (unless my patch is used to harden
> ipipe_virtualize_irq against NULL handlers). Still, the driver should
> find step 2 and 3 as part of rtdm_irq_free. That's all.
> 

It looks like you are omitting the disable IRQ line step which free_irq
does when no more handler is attached to the line. You don't seem to
take into account that pipeline-wise, synchronizing an IRQ upon shutdown
requires to mask it first at PIC level, either.

> Jan 
> > 
> >>
> >>>
> >>> - track the IRQ line state from xnintr_enable/xnintr_disable routines,
> >>> so that xnintr_detach can determine whether the call is legit. Of
> >>> course, this also means that any attempt to take sideways to
> >>> enable/disable nucleus managed interrupts at PIC level would break that
> >>> logic, but doing so would be the root bug anyway.
> >>>
> >>> The advantage of doing so would be three-fold:
> >>>
> >>> - no pipeline code to acknowledge (or even perpetuate) the fact that MSI
> >>> support is half working, half broken. We need to fix it properly, so
> >>> that we can use it 100% reliably, from whatever context commonly allowed
> >>> for enabling/disabling IRQs (and not "from root domain with IRQs on"
> >>> only). Typically, I fail to see how one would cope with such limitation,
> >>> if a real-time handler detects that some device is going wild and really
> >>> needs to shut it down before the whole system crashes.
> >>
> >> MSIs are edge-triggered. Only broken hardware continuously sending bogus
> >> messages can theoretically cause troubles. In practice (ie. in absence
> >> of broken HW), we see a single spurious IRQ at worst.
> >>
> > 
> > The driver should be able to work the same way whether it deals with MSI
> > support, or is given pinned (level) IRQs.
> 
> Exactly my point, thus rtdm_irq_disable in driver code is a no-go.
> 

Since you don't have such things as shared MSIs, the only issue is with
non-shared handling. And disabling the interrupt line for a non-shared
handler which is being deregistered makes sense, in all cases, MSI or
not. The fact that MSI support is broken today for us does not mean that
we should design a broken xnintr_detach API to cope with that -
hopefully temporary - limitation. So, xnarch_disable_irq in
xnintr_detach is obviously correct, even if we currently have to refrain
from doing this for non-shared handlers because of MSI.

> > 
> >>>
> >>> - we enforce the API usage requirement to disable an interrupt line with
> >>> rtdm_irq_disable(), before eventually detaching the last IRQ handler for
> >>> it, which is common sense anyway.
> >>
> >> That's an easy-to-get-wrong API. It would apply to non-shared IRQs only
> >> (aka MSIs). No-go IMHO.
> > 
> > - the software does not want to share MSIs, or something is quite wrong
> > - xnintr_detach() is broken by design in its current implementation for
> > shared handlers, because it leaves the decision to mask the IRQ line to
> > the user. This could not work for multiple non-cooperating drivers
> > sharing an IRQ, at least not in a raceless way.
> 
> It works (minus the missing synchronize_irq) if you disable the IRQ at
> device level as drivers normally do.

synchronize_irq without prior IRQ disabling does not work for the
pipeline. So relying on synchronize_irq to only mask at device level
without disabling the IRQ line just does not work.

> > 
> > So the issue is not that much about MSI vs pinned-type IRQs, it is
> > rather about shared vs non-shared interrupts. What we want is a sane API
> > and behavior for both types, which happens to fit the limited support we
> > have now for MSI, not the other way around.
> > 
> > Therefore, until the MSI issue is fixed in the pipeline:
> > 
> > - non-shared mode users should call xnintr_disable(irq) then
> > xnintr_detach(irq) explicitly. This is what everybody should do today
> > already (via rtdm_irq_disable() or whatever), since xnintr_detach() does
> > not disable the IRQ in the current implementation.
> 
> No one should call rtdm_irq_disable today for whatever IRQ unless in
> very few exceptional cases. I think I should extend the documentation of
> rtdm_irq_enabled/disable ASAP to clarify their exceptional nature.
> 

rtdm_irq_enable() is required to startup the IRQ line, since Xenomai
leaves interrupts in a disabled state at descriptor init time, so this
one is hardly exceptionally used.

rtdm_irq_disable() should be seldomly used indeed, provided the MSI
support is fixed. Until then, xnintr_detach won't be allowed to disable
non-shared interrupts. But if you discourage disabling IRQs in general
before detaching them, then you basically tell people that their driver
should not expect the system to help them in any way for synchronizing
IRQs, and this is something I disagree with.

> > 
> > - shared mode users should only call xnintr_detach(irq), which would
> > then decide racelessly whether to disable the line. In essence, this
> > could never affect MSIs.
> > 
> > When the MSI support is generally available with no limitation on
> > masking/unmasking contexts, then xnintr_detach() could be updated to
> > forcibly disable the detached IRQ in the non-shared case as well. At
> > worst, the legacy code would end up calling xnintr_disable() uselessly
> > when the pipeline fix is merged.
> > 
> > What we can do in the meantime, is 1) fixing xnintr_detach() in the
> > shared case, 2) introducing an error check to prevent detaching
> > non-shared IRQs that are still enabled. That won't have any impact on
> > the partial MSI support we have today.
> > 
> >>
> >>>
> >>> - absolutely no change for people who currently rely on partial MSI
> >>> support, provided they duly disable IRQ lines before detaching their
> >>> last handler via the appropriate RTDM interface.
> >>>
> >>> Can we deal on this?
> >>>
> >>
> >> Nope, don't think so. The only option I see (besides using my original
> >> proposal of a dummy handler for deregistering - still much simpler than
> >> the current patches) is to emulate MSI masking in the same run, thus
> >> providing solutions for both issues.
> > 
> > The dummy handler solution is incomplete because it does not address the
> > IRQ disabling issue in xnintr_detach() for shared IRQs. This is the gist
> > of the patch I sent, actually.
> 
> It is complete in this regard as it does not touch the line state. It
> actually implements the Linux pattern.
> 

The linux pattern in free_irq(), which is semantically equivalent of
xnintr_detach() for us does include IRQ disabling when the last handler
is detached. Introducing the dummy handler looks a bit like papering
over the issue, so that weakly synchronized xnintr_detach() might be
sufficient. I don't agree there.

> 
> > 
> > Besides, I'm still scratching my head: how would introducing a dummy
> > handler help dealing with that sequence?
> > 
> > CPU0					CPU1
> > 
> > xnintr_disable => mask irq		<irq>
> 
> No disable here, mask must be at device level
> .
> 
> > xnintr_detach				real-time handler
> > 	<install dummy handler>			<touch device>
> > 					ipipe_end_irq() => unmask irq
> 
> xnarch_synchronize_irq
> 

No, it is too late. It assumes the driver could somehow prevent CPU1
from touching the device before the dummy handler could be installed by
ipipe_virtualize_irq. But synchronizing and releasing a pipelined IRQ
must be done in that order, and all locks dropped, with interrupts
enabled on CPU0. I can't see how just replacing mask_irq by mask_device
would work any better, and prevent CPU0 from receiving an unexpected IRQ
from the device it is supposed to have disabled a few instructions
before. This is where disabling the IRQ comes into play, /me think.

> > 
> > Then, on whichever CPU, we could receive a device IRQ as we touched the
> > hardware before leaving the handler, passing that IRQ to the root domain
> > which has no proper handler for it:
> > 
> > - lucky scenario: the device is in a state that won't cause it to kick
> > yet another interrupt even if the last one was not serviced, and we just
> > end up with a nifty spurious IRQ message in the kernel log.
> > 
> > - out of luck: we touched the device on CPU1 in a way that claims for a
> > specific action by the software, which won't be able to perform it
> > because it is now branching to the dummy handler. Whether we have some
> > code to silence the device (and not only the IRQ line) before or after
> > we detach the IRQ on CPU0 does not fix the issue, since we have no
> > synchronization between the CPUs whatsoever.
> > 
> 
> The driver is responsible for synchronizing its shutdown sequence
> between the cleanup code and any of its in-flight IRQs. If that handler
> reenables the IRQ that the shutdown code just disabled (all at device
> level, of course), something is seriously broken - but not on Xenomai
> side. As I explained before: That potential in-flight IRQ becomes
> spurious at the point the drivers starts the shutdown, thus can safely
> be ignored.

For mask_device to be useful in SMP, I guess we both agree that we need
some kind of barrier for synchronizing in flight IRQs before they become
not only spurious IRQs, but also deadly ones. Reading the kernel code, I
can understand how this is done, both in disable_irq() and __free_irq():
first disable the device in the driver, then disable the line, then
synchronize. The barrier mechanism is there.

What kind of construct would you then recommend to Xenomai driver
writers, to have the same guarantee with only the driver doing
"disable_device" and Xenomai doing "synchronize"? This is what I'm
likely missing.

-- 
Philippe.





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-12 17:42                                                                     ` Philippe Gerum
@ 2010-11-12 18:42                                                                       ` Jan Kiszka
  2010-11-14 21:28                                                                         ` Philippe Gerum
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Kiszka @ 2010-11-12 18:42 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 17799 bytes --]

Am 12.11.2010 18:42, Philippe Gerum wrote:
> On Fri, 2010-11-12 at 15:30 +0100, Jan Kiszka wrote: 
>> Am 12.11.2010 14:57, Philippe Gerum wrote:
>>> On Fri, 2010-11-12 at 10:14 +0100, Jan Kiszka wrote:
>>>> Am 12.11.2010 09:48, Philippe Gerum wrote:
>>>>> On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
>>>>>> Am 09.11.2010 10:36, Philippe Gerum wrote:
>>>>>>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
>>>>>>>> Am 09.11.2010 09:26, Philippe Gerum wrote:
>>>>>>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
>>>>>>>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
>>>>>>>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
>>>>>>>>>>>> The following patches implements the teardown approach. The basic idea
>>>>>>>>>>>> is:
>>>>>>>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
>>>>>>>>>>>> providing the revised ipipe_control_irq call.
>>>>>>>>>>>> - fix the SMP race when detaching interrupts.
>>>>>>>>>>>
>>>>>>>>>>> Looks good.
>>>>>>>>>>
>>>>>>>>>> This actually causes one regression: I've just learned that people are
>>>>>>>>>> already happily using MSIs with Xenomai in the field. This is perfectly
>>>>>>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
>>>>>>>>>> non-root contexts or while hard IRQs are disable. The latter requirement
>>>>>>>>>> would be violated by this fix now.
>>>>>>>>>
>>>>>>>>> What we could do is handle this corner-case in the ipipe directly, going
>>>>>>>>> for a nop when IRQs are off on a per-arch basis only to please those
>>>>>>>>> users,
>>>>>>>>
>>>>>>>> Don't we disable hard IRQs also then the root domain is the only
>>>>>>>> registered one? I'm worried about pushing regressions around, then to
>>>>>>>> plain Linux use-cases of MSI (which are not broken in anyway - except
>>>>>>>> for powerpc).
>>>>>>>
>>>>>>> The idea is to provide an ad hoc ipipe service for this, to be used by
>>>>>>> the HAL. A service that would check the controller for the target IRQ,
>>>>>>> and handle MSI ones conditionally. For sure, we just can't put those
>>>>>>> conditionally bluntly into the chip mask handler and expect the kernel
>>>>>>> to be happy.
>>>>>>>
>>>>>>> In fact, we already have __ipipe_enable/disable_irq from the internal
>>>>>>> Adeos interface avail, but they are mostly wrappers for now. We could
>>>>>>> make them a bit more smart, and handle the MSI issue as well. We would
>>>>>>> then tell the HAL to switch to using those arch-agnostic helpers
>>>>>>> generally, instead of peeking directly into the chip controller structs
>>>>>>> like today.
>>>>>>
>>>>>> This belongs to I-pipe, like we already have ipipe_end, just properly
>>>>>> wrapped to avoid descriptor access. That's specifically important if we
>>>>>> want to emulate MSI masking in software. I've the generic I-pipe
>>>>>> infrastructure ready, but the backend, so far consisting of x86 MSI
>>>>>> hardening, unfortunately needs to be rewritten.
>>>>>>
>>>>>>>
>>>>>>> If that ipipe "feature" is not detected by the HAL, then we would
>>>>>>> refrain from disabling the IRQ in xnintr_detach. In effect, this would
>>>>>>> leave the SMP race window open, but since we need recent ipipes to get
>>>>>>> it plugged already anyway (for the revised ipipe_control_irq), we would
>>>>>>> still remain in the current situation:
>>>>>>> - old patches? no SMP race fix, no regression
>>>>>>> - new patches? SMP race fix avail, no regression
>>>>>>
>>>>>> Sounds good.
>>>>>
>>>>> Now that I slept on it, I find the approach of working around pipeline
>>>>> limitations this way, to be incorrect.
>>>>>
>>>>> Basically, the issue is that we still don't have 100% reliable handling
>>>>> of MSI interrupts (actually, we only have partial handling, and solely
>>>>> for x86), but this is no reason to introduce code in the pipeline
>>>>> interface which would perpetuate this fact. I see this as a "all or
>>>>> nothing" issue: either MSI is fully handled and there shall be no
>>>>> restriction on applying common operations such as masking/unmasking on
>>>>> the related IRQs, or it is not, and we should not export "conditionally
>>>>> working" APIs.
>>>>>
>>>>> In the latter case, the responsibility to rely on MSI support belongs to
>>>>> the user, which then should know about the pending restrictions, and
>>>>> decides for himself whether to use MSI. So I'm heading to this solution
>>>>> instead:
>>>>>
>>>>> - when detaching the last handler for a given IRQ, instead of forcibly
>>>>> disabling the IRQ line, the nucleus would just make sure that such IRQ
>>>>> is already in a disabled state, and bail out on error if not (probably
>>>>> with a kernel warning to make the issue obvious).
>>>>
>>>> Fiddling with the IRQ "line" state is a workaround for the missing
>>>> synchronize_irq service in Xenomai/I-pipe. If we had this, all this
>>>> disabling become unneeded.
>>>
>>> Actually, no. The synchronize irq service in the pipeline could have
>>> been introduced weeks ago in a snap, provided Xenomai did not call
>>> ipipe_virtualize_irq holding a lock with irqs off - this is in essence
>>> what my first patch to Pavel did, but could not work for the reason
>>> mentioned. That is what the patch we are discussing now is all about:
>>> clearing issues in Xenomai first.
>>>
>>> Additionally, having an IRQ disabled is a requirement to properly
>>> synchronize that IRQ later, it is not an option. It is part of the sync
>>> protocol actually. If you don't do this, it lacks a basic guarantee, it
>>> can't work.
>>
>> The required pattern from driver POV is
>>  - disable IRQ at device level
>>  - flush pending IRQs
>>  - deregister handler
>>
>> Linux does the last two steps in reverse order as it synchronizes
>> descriptor changes against descriptor usage internally. I-pipe requires
>> the above ordering (unless my patch is used to harden
>> ipipe_virtualize_irq against NULL handlers). Still, the driver should
>> find step 2 and 3 as part of rtdm_irq_free. That's all.
>>
> 
> It looks like you are omitting the disable IRQ line step which free_irq
> does when no more handler is attached to the line. You don't seem to
> take into account that pipeline-wise, synchronizing an IRQ upon shutdown
> requires to mask it first at PIC level, either.

Disabling at IRQ controller level is a step which is not conceptually
required to perform a safe shutdown. It is likely applied to suppress
spurious IRQs due to buggy hardware.

> 
>> Jan 
>>>
>>>>
>>>>>
>>>>> - track the IRQ line state from xnintr_enable/xnintr_disable routines,
>>>>> so that xnintr_detach can determine whether the call is legit. Of
>>>>> course, this also means that any attempt to take sideways to
>>>>> enable/disable nucleus managed interrupts at PIC level would break that
>>>>> logic, but doing so would be the root bug anyway.
>>>>>
>>>>> The advantage of doing so would be three-fold:
>>>>>
>>>>> - no pipeline code to acknowledge (or even perpetuate) the fact that MSI
>>>>> support is half working, half broken. We need to fix it properly, so
>>>>> that we can use it 100% reliably, from whatever context commonly allowed
>>>>> for enabling/disabling IRQs (and not "from root domain with IRQs on"
>>>>> only). Typically, I fail to see how one would cope with such limitation,
>>>>> if a real-time handler detects that some device is going wild and really
>>>>> needs to shut it down before the whole system crashes.
>>>>
>>>> MSIs are edge-triggered. Only broken hardware continuously sending bogus
>>>> messages can theoretically cause troubles. In practice (ie. in absence
>>>> of broken HW), we see a single spurious IRQ at worst.
>>>>
>>>
>>> The driver should be able to work the same way whether it deals with MSI
>>> support, or is given pinned (level) IRQs.
>>
>> Exactly my point, thus rtdm_irq_disable in driver code is a no-go.
>>
> 
> Since you don't have such things as shared MSIs, the only issue is with
> non-shared handling. And disabling the interrupt line for a non-shared
> handler which is being deregistered makes sense, in all cases, MSI or
> not. The fact that MSI support is broken today for us does not mean that
> we should design a broken xnintr_detach API to cope with that -
> hopefully temporary - limitation. So, xnarch_disable_irq in
> xnintr_detach is obviously correct, even if we currently have to refrain
> from doing this for non-shared handlers because of MSI.

It is not incorrect, but it must not be mandatory for proper shutdown.

> 
>>>
>>>>>
>>>>> - we enforce the API usage requirement to disable an interrupt line with
>>>>> rtdm_irq_disable(), before eventually detaching the last IRQ handler for
>>>>> it, which is common sense anyway.
>>>>
>>>> That's an easy-to-get-wrong API. It would apply to non-shared IRQs only
>>>> (aka MSIs). No-go IMHO.
>>>
>>> - the software does not want to share MSIs, or something is quite wrong
>>> - xnintr_detach() is broken by design in its current implementation for
>>> shared handlers, because it leaves the decision to mask the IRQ line to
>>> the user. This could not work for multiple non-cooperating drivers
>>> sharing an IRQ, at least not in a raceless way.
>>
>> It works (minus the missing synchronize_irq) if you disable the IRQ at
>> device level as drivers normally do.
> 
> synchronize_irq without prior IRQ disabling does not work for the
> pipeline. So relying on synchronize_irq to only mask at device level
> without disabling the IRQ line just does not work.

It is not implemented, but it is not impossible. It takes
synchronization on hard IRQ context exits on all CPUs and waiting on
deassertion of any pending bit for that IRQ in question. That is surely
implementable if we wanted to.

> 
>>>
>>> So the issue is not that much about MSI vs pinned-type IRQs, it is
>>> rather about shared vs non-shared interrupts. What we want is a sane API
>>> and behavior for both types, which happens to fit the limited support we
>>> have now for MSI, not the other way around.
>>>
>>> Therefore, until the MSI issue is fixed in the pipeline:
>>>
>>> - non-shared mode users should call xnintr_disable(irq) then
>>> xnintr_detach(irq) explicitly. This is what everybody should do today
>>> already (via rtdm_irq_disable() or whatever), since xnintr_detach() does
>>> not disable the IRQ in the current implementation.
>>
>> No one should call rtdm_irq_disable today for whatever IRQ unless in
>> very few exceptional cases. I think I should extend the documentation of
>> rtdm_irq_enabled/disable ASAP to clarify their exceptional nature.
>>
> 
> rtdm_irq_enable() is required to startup the IRQ line, since Xenomai
> leaves interrupts in a disabled state at descriptor init time, so this
> one is hardly exceptionally used.

Nope it isn't. That's rtdm_irq_request business for quite a while now -
for a good reason: to avoid that driver writers introduce bugs by
calling rtdm_irq_disable on shutdown.

> 
> rtdm_irq_disable() should be seldomly used indeed, provided the MSI
> support is fixed. Until then, xnintr_detach won't be allowed to disable
> non-shared interrupts. But if you discourage disabling IRQs in general
> before detaching them, then you basically tell people that their driver
> should not expect the system to help them in any way for synchronizing
> IRQs, and this is something I disagree with.

Driver writers should not bother about internals of Xenomai here.
rtdm_irq_disable before rtdm_irq_free is not required and should not
show up in properly written drivers. If that's not true today or after
upcoming fixes, Xenomai needs to be fixed.

> 
>>>
>>> - shared mode users should only call xnintr_detach(irq), which would
>>> then decide racelessly whether to disable the line. In essence, this
>>> could never affect MSIs.
>>>
>>> When the MSI support is generally available with no limitation on
>>> masking/unmasking contexts, then xnintr_detach() could be updated to
>>> forcibly disable the detached IRQ in the non-shared case as well. At
>>> worst, the legacy code would end up calling xnintr_disable() uselessly
>>> when the pipeline fix is merged.
>>>
>>> What we can do in the meantime, is 1) fixing xnintr_detach() in the
>>> shared case, 2) introducing an error check to prevent detaching
>>> non-shared IRQs that are still enabled. That won't have any impact on
>>> the partial MSI support we have today.
>>>
>>>>
>>>>>
>>>>> - absolutely no change for people who currently rely on partial MSI
>>>>> support, provided they duly disable IRQ lines before detaching their
>>>>> last handler via the appropriate RTDM interface.
>>>>>
>>>>> Can we deal on this?
>>>>>
>>>>
>>>> Nope, don't think so. The only option I see (besides using my original
>>>> proposal of a dummy handler for deregistering - still much simpler than
>>>> the current patches) is to emulate MSI masking in the same run, thus
>>>> providing solutions for both issues.
>>>
>>> The dummy handler solution is incomplete because it does not address the
>>> IRQ disabling issue in xnintr_detach() for shared IRQs. This is the gist
>>> of the patch I sent, actually.
>>
>> It is complete in this regard as it does not touch the line state. It
>> actually implements the Linux pattern.
>>
> 
> The linux pattern in free_irq(), which is semantically equivalent of
> xnintr_detach() for us does include IRQ disabling when the last handler
> is detached. Introducing the dummy handler looks a bit like papering
> over the issue, so that weakly synchronized xnintr_detach() might be
> sufficient. I don't agree there.

Linux disables the line after removing the handler. Disabling has no
effect on ongoing interrupts that are about to pick up the action
handler. The key is: Linux does not crash on NULL action. So the dummy
handler is the same thing, just without a spinlock and a conditional branch.

> 
>>
>>>
>>> Besides, I'm still scratching my head: how would introducing a dummy
>>> handler help dealing with that sequence?
>>>
>>> CPU0					CPU1
>>>
>>> xnintr_disable => mask irq		<irq>
>>
>> No disable here, mask must be at device level
>> .
>>
>>> xnintr_detach				real-time handler
>>> 	<install dummy handler>			<touch device>
>>> 					ipipe_end_irq() => unmask irq
>>
>> xnarch_synchronize_irq
>>
> 
> No, it is too late. It assumes the driver could somehow prevent CPU1
> from touching the device before the dummy handler could be installed by
> ipipe_virtualize_irq. But synchronizing and releasing a pipelined IRQ
> must be done in that order, and all locks dropped, with interrupts
> enabled on CPU0. I can't see how just replacing mask_irq by mask_device
> would work any better, and prevent CPU0 from receiving an unexpected IRQ
> from the device it is supposed to have disabled a few instructions
> before. This is where disabling the IRQ comes into play, /me think.

Receiving that IRQ is not an issue at all. It is for us as the pipeline
crashes. Linux swallows it, and after synchronize_irq, there is no more
risk that the deregistered handler could still be in use.

> 
>>>
>>> Then, on whichever CPU, we could receive a device IRQ as we touched the
>>> hardware before leaving the handler, passing that IRQ to the root domain
>>> which has no proper handler for it:
>>>
>>> - lucky scenario: the device is in a state that won't cause it to kick
>>> yet another interrupt even if the last one was not serviced, and we just
>>> end up with a nifty spurious IRQ message in the kernel log.
>>>
>>> - out of luck: we touched the device on CPU1 in a way that claims for a
>>> specific action by the software, which won't be able to perform it
>>> because it is now branching to the dummy handler. Whether we have some
>>> code to silence the device (and not only the IRQ line) before or after
>>> we detach the IRQ on CPU0 does not fix the issue, since we have no
>>> synchronization between the CPUs whatsoever.
>>>
>>
>> The driver is responsible for synchronizing its shutdown sequence
>> between the cleanup code and any of its in-flight IRQs. If that handler
>> reenables the IRQ that the shutdown code just disabled (all at device
>> level, of course), something is seriously broken - but not on Xenomai
>> side. As I explained before: That potential in-flight IRQ becomes
>> spurious at the point the drivers starts the shutdown, thus can safely
>> be ignored.
> 
> For mask_device to be useful in SMP, I guess we both agree that we need
> some kind of barrier for synchronizing in flight IRQs before they become
> not only spurious IRQs, but also deadly ones. Reading the kernel code, I
> can understand how this is done, both in disable_irq() and __free_irq():
> first disable the device in the driver, then disable the line, then
> synchronize. The barrier mechanism is there.

The barrier is the descriptor lock (to protect against future events on
the line) and then synchronize_irq (to protect against already running
events).

> 
> What kind of construct would you then recommend to Xenomai driver
> writers, to have the same guarantee with only the driver doing
> "disable_device" and Xenomai doing "synchronize"? This is what I'm
> likely missing.
> 

my_device_disable_irqs();
rtdm_irq_free();

That must suffice, and Xenomai must be fixed to guarantee this. If
rtdm_irq_free also automatically disables the line is irrelevant for the
driver writer, it's an implementation detail for Xenomai - nothing I
vote against in principle once we can provide soft-IRQ masking for MSI
or whatever problematic IRQ type.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Xenomai-help] kernel oopses when killing realtime task
  2010-11-12 18:42                                                                       ` Jan Kiszka
@ 2010-11-14 21:28                                                                         ` Philippe Gerum
  0 siblings, 0 replies; 51+ messages in thread
From: Philippe Gerum @ 2010-11-14 21:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Fri, 2010-11-12 at 19:42 +0100, Jan Kiszka wrote: 
> Am 12.11.2010 18:42, Philippe Gerum wrote:
> > On Fri, 2010-11-12 at 15:30 +0100, Jan Kiszka wrote: 
> >> Am 12.11.2010 14:57, Philippe Gerum wrote:
> >>> On Fri, 2010-11-12 at 10:14 +0100, Jan Kiszka wrote:
> >>>> Am 12.11.2010 09:48, Philippe Gerum wrote:
> >>>>> On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
> >>>>>> Am 09.11.2010 10:36, Philippe Gerum wrote:
> >>>>>>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
> >>>>>>>> Am 09.11.2010 09:26, Philippe Gerum wrote:
> >>>>>>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
> >>>>>>>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
> >>>>>>>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
> >>>>>>>>>>>> The following patches implements the teardown approach. The basic idea
> >>>>>>>>>>>> is:
> >>>>>>>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
> >>>>>>>>>>>> providing the revised ipipe_control_irq call.
> >>>>>>>>>>>> - fix the SMP race when detaching interrupts.
> >>>>>>>>>>>
> >>>>>>>>>>> Looks good.
> >>>>>>>>>>
> >>>>>>>>>> This actually causes one regression: I've just learned that people are
> >>>>>>>>>> already happily using MSIs with Xenomai in the field. This is perfectly
> >>>>>>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
> >>>>>>>>>> non-root contexts or while hard IRQs are disable. The latter requirement
> >>>>>>>>>> would be violated by this fix now.
> >>>>>>>>>
> >>>>>>>>> What we could do is handle this corner-case in the ipipe directly, going
> >>>>>>>>> for a nop when IRQs are off on a per-arch basis only to please those
> >>>>>>>>> users,
> >>>>>>>>
> >>>>>>>> Don't we disable hard IRQs also then the root domain is the only
> >>>>>>>> registered one? I'm worried about pushing regressions around, then to
> >>>>>>>> plain Linux use-cases of MSI (which are not broken in anyway - except
> >>>>>>>> for powerpc).
> >>>>>>>
> >>>>>>> The idea is to provide an ad hoc ipipe service for this, to be used by
> >>>>>>> the HAL. A service that would check the controller for the target IRQ,
> >>>>>>> and handle MSI ones conditionally. For sure, we just can't put those
> >>>>>>> conditionally bluntly into the chip mask handler and expect the kernel
> >>>>>>> to be happy.
> >>>>>>>
> >>>>>>> In fact, we already have __ipipe_enable/disable_irq from the internal
> >>>>>>> Adeos interface avail, but they are mostly wrappers for now. We could
> >>>>>>> make them a bit more smart, and handle the MSI issue as well. We would
> >>>>>>> then tell the HAL to switch to using those arch-agnostic helpers
> >>>>>>> generally, instead of peeking directly into the chip controller structs
> >>>>>>> like today.
> >>>>>>
> >>>>>> This belongs to I-pipe, like we already have ipipe_end, just properly
> >>>>>> wrapped to avoid descriptor access. That's specifically important if we
> >>>>>> want to emulate MSI masking in software. I've the generic I-pipe
> >>>>>> infrastructure ready, but the backend, so far consisting of x86 MSI
> >>>>>> hardening, unfortunately needs to be rewritten.
> >>>>>>
> >>>>>>>
> >>>>>>> If that ipipe "feature" is not detected by the HAL, then we would
> >>>>>>> refrain from disabling the IRQ in xnintr_detach. In effect, this would
> >>>>>>> leave the SMP race window open, but since we need recent ipipes to get
> >>>>>>> it plugged already anyway (for the revised ipipe_control_irq), we would
> >>>>>>> still remain in the current situation:
> >>>>>>> - old patches? no SMP race fix, no regression
> >>>>>>> - new patches? SMP race fix avail, no regression
> >>>>>>
> >>>>>> Sounds good.
> >>>>>
> >>>>> Now that I slept on it, I find the approach of working around pipeline
> >>>>> limitations this way, to be incorrect.
> >>>>>
> >>>>> Basically, the issue is that we still don't have 100% reliable handling
> >>>>> of MSI interrupts (actually, we only have partial handling, and solely
> >>>>> for x86), but this is no reason to introduce code in the pipeline
> >>>>> interface which would perpetuate this fact. I see this as a "all or
> >>>>> nothing" issue: either MSI is fully handled and there shall be no
> >>>>> restriction on applying common operations such as masking/unmasking on
> >>>>> the related IRQs, or it is not, and we should not export "conditionally
> >>>>> working" APIs.
> >>>>>
> >>>>> In the latter case, the responsibility to rely on MSI support belongs to
> >>>>> the user, which then should know about the pending restrictions, and
> >>>>> decides for himself whether to use MSI. So I'm heading to this solution
> >>>>> instead:
> >>>>>
> >>>>> - when detaching the last handler for a given IRQ, instead of forcibly
> >>>>> disabling the IRQ line, the nucleus would just make sure that such IRQ
> >>>>> is already in a disabled state, and bail out on error if not (probably
> >>>>> with a kernel warning to make the issue obvious).
> >>>>
> >>>> Fiddling with the IRQ "line" state is a workaround for the missing
> >>>> synchronize_irq service in Xenomai/I-pipe. If we had this, all this
> >>>> disabling become unneeded.
> >>>
> >>> Actually, no. The synchronize irq service in the pipeline could have
> >>> been introduced weeks ago in a snap, provided Xenomai did not call
> >>> ipipe_virtualize_irq holding a lock with irqs off - this is in essence
> >>> what my first patch to Pavel did, but could not work for the reason
> >>> mentioned. That is what the patch we are discussing now is all about:
> >>> clearing issues in Xenomai first.
> >>>
> >>> Additionally, having an IRQ disabled is a requirement to properly
> >>> synchronize that IRQ later, it is not an option. It is part of the sync
> >>> protocol actually. If you don't do this, it lacks a basic guarantee, it
> >>> can't work.
> >>
> >> The required pattern from driver POV is
> >>  - disable IRQ at device level
> >>  - flush pending IRQs
> >>  - deregister handler
> >>
> >> Linux does the last two steps in reverse order as it synchronizes
> >> descriptor changes against descriptor usage internally. I-pipe requires
> >> the above ordering (unless my patch is used to harden
> >> ipipe_virtualize_irq against NULL handlers). Still, the driver should
> >> find step 2 and 3 as part of rtdm_irq_free. That's all.
> >>
> > 
> > It looks like you are omitting the disable IRQ line step which free_irq
> > does when no more handler is attached to the line. You don't seem to
> > take into account that pipeline-wise, synchronizing an IRQ upon shutdown
> > requires to mask it first at PIC level, either.
> 
> Disabling at IRQ controller level is a step which is not conceptually
> required to perform a safe shutdown. It is likely applied to suppress
> spurious IRQs due to buggy hardware.

Your interpretation is broader than what the kernel doc actually says.
Only shared IRQs are mentioned as candidates for device disabling first
in free_irq(), simply because we may need the IRQ line to remain enabled
for other drivers, that is the only documented prerequisite.

My interpretation of disable_irq() in __free_irq(), is that it keeps the
invariant that an IRQ line should remain disabled until a handler is
registered. Dealing with buggy hardware is a useful side-effect, which
also happens to deal with buggy drivers not doing device_disable
properly.

Many linux drivers - particularly in the embedded space - assume to deal
with non-shared interrupts, which they expect to be in a disable state
before they call request_irq for them. I'm not saying this is correct in
theory, but quite common in practice.

> > 
> >> Jan 
> >>>
> >>>>
> >>>>>
> >>>>> - track the IRQ line state from xnintr_enable/xnintr_disable routines,
> >>>>> so that xnintr_detach can determine whether the call is legit. Of
> >>>>> course, this also means that any attempt to take sideways to
> >>>>> enable/disable nucleus managed interrupts at PIC level would break that
> >>>>> logic, but doing so would be the root bug anyway.
> >>>>>
> >>>>> The advantage of doing so would be three-fold:
> >>>>>
> >>>>> - no pipeline code to acknowledge (or even perpetuate) the fact that MSI
> >>>>> support is half working, half broken. We need to fix it properly, so
> >>>>> that we can use it 100% reliably, from whatever context commonly allowed
> >>>>> for enabling/disabling IRQs (and not "from root domain with IRQs on"
> >>>>> only). Typically, I fail to see how one would cope with such limitation,
> >>>>> if a real-time handler detects that some device is going wild and really
> >>>>> needs to shut it down before the whole system crashes.
> >>>>
> >>>> MSIs are edge-triggered. Only broken hardware continuously sending bogus
> >>>> messages can theoretically cause troubles. In practice (ie. in absence
> >>>> of broken HW), we see a single spurious IRQ at worst.
> >>>>
> >>>
> >>> The driver should be able to work the same way whether it deals with MSI
> >>> support, or is given pinned (level) IRQs.
> >>
> >> Exactly my point, thus rtdm_irq_disable in driver code is a no-go.
> >>
> > 
> > Since you don't have such things as shared MSIs, the only issue is with
> > non-shared handling. And disabling the interrupt line for a non-shared
> > handler which is being deregistered makes sense, in all cases, MSI or
> > not. The fact that MSI support is broken today for us does not mean that
> > we should design a broken xnintr_detach API to cope with that -
> > hopefully temporary - limitation. So, xnarch_disable_irq in
> > xnintr_detach is obviously correct, even if we currently have to refrain
> > from doing this for non-shared handlers because of MSI.
> 
> It is not incorrect, but it must not be mandatory for proper shutdown.
> 

Not for shutdown, but for consistency of interrupt states.

> > 
> >>>
> >>>>>
> >>>>> - we enforce the API usage requirement to disable an interrupt line with
> >>>>> rtdm_irq_disable(), before eventually detaching the last IRQ handler for
> >>>>> it, which is common sense anyway.
> >>>>
> >>>> That's an easy-to-get-wrong API. It would apply to non-shared IRQs only
> >>>> (aka MSIs). No-go IMHO.
> >>>
> >>> - the software does not want to share MSIs, or something is quite wrong
> >>> - xnintr_detach() is broken by design in its current implementation for
> >>> shared handlers, because it leaves the decision to mask the IRQ line to
> >>> the user. This could not work for multiple non-cooperating drivers
> >>> sharing an IRQ, at least not in a raceless way.
> >>
> >> It works (minus the missing synchronize_irq) if you disable the IRQ at
> >> device level as drivers normally do.
> > 
> > synchronize_irq without prior IRQ disabling does not work for the
> > pipeline. So relying on synchronize_irq to only mask at device level
> > without disabling the IRQ line just does not work.
> 
> It is not implemented, but it is not impossible. It takes
> synchronization on hard IRQ context exits on all CPUs and waiting on
> deassertion of any pending bit for that IRQ in question. That is surely
> implementable if we wanted to.
> 

Yes, I don't challenge that. But it is not as trivial and cheap as it
may seem; for this to work we would need to serialize interrupts among
all CPUs with a per-descriptor lock at pipeline level, for introducing
some kind of "in progress" bit, because you can't rely on the pending
bit.

In short:

- using a dummy handler + busy wait would work, at the expense of more
overhead for handling every IRQ and perhaps more issues ahead.
- using a critical IPI to switch the irq in pass/discard on each CPU
synchronously works and actually fits the pipeline model, but requires
to drop all Xenomai locks while synchronizing to prevent deadlocks.

> > 
> >>>
> >>> So the issue is not that much about MSI vs pinned-type IRQs, it is
> >>> rather about shared vs non-shared interrupts. What we want is a sane API
> >>> and behavior for both types, which happens to fit the limited support we
> >>> have now for MSI, not the other way around.
> >>>
> >>> Therefore, until the MSI issue is fixed in the pipeline:
> >>>
> >>> - non-shared mode users should call xnintr_disable(irq) then
> >>> xnintr_detach(irq) explicitly. This is what everybody should do today
> >>> already (via rtdm_irq_disable() or whatever), since xnintr_detach() does
> >>> not disable the IRQ in the current implementation.
> >>
> >> No one should call rtdm_irq_disable today for whatever IRQ unless in
> >> very few exceptional cases. I think I should extend the documentation of
> >> rtdm_irq_enabled/disable ASAP to clarify their exceptional nature.
> >>
> > 
> > rtdm_irq_enable() is required to startup the IRQ line, since Xenomai
> > leaves interrupts in a disabled state at descriptor init time, so this
> > one is hardly exceptionally used.
> 
> Nope it isn't. That's rtdm_irq_request business for quite a while now -
> for a good reason: to avoid that driver writers introduce bugs by
> calling rtdm_irq_disable on shutdown.
> > 
> > rtdm_irq_disable() should be seldomly used indeed, provided the MSI
> > support is fixed. Until then, xnintr_detach won't be allowed to disable
> > non-shared interrupts. But if you discourage disabling IRQs in general
> > before detaching them, then you basically tell people that their driver
> > should not expect the system to help them in any way for synchronizing
> > IRQs, and this is something I disagree with.
> 
> Driver writers should not bother about internals of Xenomai here.
> rtdm_irq_disable before rtdm_irq_free is not required and should not
> show up in properly written drivers. If that's not true today or after
> upcoming fixes, Xenomai needs to be fixed.

This is not primarily a Xenomai issue, but two pipeline issues:

- disabling or enabling MSIs via the pipeline interface only from
restricted contexts.

- lack of built in synchronization mechanism in ipipe_virtualize_irq.

What has to be changed in Xenomai is the calling context for
ipipe_virtualize_irq, so that the latter allows proper synchronization
internally.

> 
> > 
> >>>
> >>> - shared mode users should only call xnintr_detach(irq), which would
> >>> then decide racelessly whether to disable the line. In essence, this
> >>> could never affect MSIs.
> >>>
> >>> When the MSI support is generally available with no limitation on
> >>> masking/unmasking contexts, then xnintr_detach() could be updated to
> >>> forcibly disable the detached IRQ in the non-shared case as well. At
> >>> worst, the legacy code would end up calling xnintr_disable() uselessly
> >>> when the pipeline fix is merged.
> >>>
> >>> What we can do in the meantime, is 1) fixing xnintr_detach() in the
> >>> shared case, 2) introducing an error check to prevent detaching
> >>> non-shared IRQs that are still enabled. That won't have any impact on
> >>> the partial MSI support we have today.
> >>>
> >>>>
> >>>>>
> >>>>> - absolutely no change for people who currently rely on partial MSI
> >>>>> support, provided they duly disable IRQ lines before detaching their
> >>>>> last handler via the appropriate RTDM interface.
> >>>>>
> >>>>> Can we deal on this?
> >>>>>
> >>>>
> >>>> Nope, don't think so. The only option I see (besides using my original
> >>>> proposal of a dummy handler for deregistering - still much simpler than
> >>>> the current patches) is to emulate MSI masking in the same run, thus
> >>>> providing solutions for both issues.
> >>>
> >>> The dummy handler solution is incomplete because it does not address the
> >>> IRQ disabling issue in xnintr_detach() for shared IRQs. This is the gist
> >>> of the patch I sent, actually.
> >>
> >> It is complete in this regard as it does not touch the line state. It
> >> actually implements the Linux pattern.
> >>
> > 
> > The linux pattern in free_irq(), which is semantically equivalent of
> > xnintr_detach() for us does include IRQ disabling when the last handler
> > is detached. Introducing the dummy handler looks a bit like papering
> > over the issue, so that weakly synchronized xnintr_detach() might be
> > sufficient. I don't agree there.
> 
> Linux disables the line after removing the handler. Disabling has no
> effect on ongoing interrupts that are about to pick up the action
> handler.
>  The key is: Linux does not crash on NULL action. So the dummy
> handler is the same thing, just without a spinlock and a conditional branch.
> 

On a pipelined kernel, disabling the IRQ means raising the IPIPE_LOCK
bit as well as silencing the line at PIC level, which does prevent
further dispatching immediately for any interrupt which has been taken,
but did not reach the dispatch call yet.

Ongoing handlers on remote CPUs will finish before the critical IPI is
taken and set the IRQ in pass mode. Finally, spurious IRQs still pending
at hardware level will skip the domain because of the latter action.

> > 
> >>
> >>>
> >>> Besides, I'm still scratching my head: how would introducing a dummy
> >>> handler help dealing with that sequence?
> >>>
> >>> CPU0					CPU1
> >>>
> >>> xnintr_disable => mask irq		<irq>
> >>
> >> No disable here, mask must be at device level
> >> .
> >>
> >>> xnintr_detach				real-time handler
> >>> 	<install dummy handler>			<touch device>
> >>> 					ipipe_end_irq() => unmask irq
> >>
> >> xnarch_synchronize_irq
> >>
> > 
> > No, it is too late. It assumes the driver could somehow prevent CPU1
> > from touching the device before the dummy handler could be installed by
> > ipipe_virtualize_irq. But synchronizing and releasing a pipelined IRQ
> > must be done in that order, and all locks dropped, with interrupts
> > enabled on CPU0.

I was wrong: we don't need interrupts enabled to enforce a critical
entry, but we have to drop all locks.

>  I can't see how just replacing mask_irq by mask_device
> > would work any better, and prevent CPU0 from receiving an unexpected IRQ
> > from the device it is supposed to have disabled a few instructions
> > before. This is where disabling the IRQ comes into play, /me think.
> 
> Receiving that IRQ is not an issue at all. It is for us as the pipeline
> crashes. Linux swallows it, and after synchronize_irq, there is no more
> risk that the deregistered handler could still be in use.
> 

The problem is not about receiving one IRQ. This is about having
touch_device re-enabling the hardware inadvertently, in a way which may
cause the board to be hammered by a level-triggered IRQ with no way to
clear it at device level from the now dummy handler. That would be a bit
too much to swallow.

I understand you want to fix this by keeping Xenomai's interrupt lock
held until xnintr_detach has fully synchronized, but then, you need the
busy wait option to be available for synchronizing, but implementing it
would induce overhead in the hot path.

> > 
> >>>
> >>> Then, on whichever CPU, we could receive a device IRQ as we touched the
> >>> hardware before leaving the handler, passing that IRQ to the root domain
> >>> which has no proper handler for it:
> >>>
> >>> - lucky scenario: the device is in a state that won't cause it to kick
> >>> yet another interrupt even if the last one was not serviced, and we just
> >>> end up with a nifty spurious IRQ message in the kernel log.
> >>>
> >>> - out of luck: we touched the device on CPU1 in a way that claims for a
> >>> specific action by the software, which won't be able to perform it
> >>> because it is now branching to the dummy handler. Whether we have some
> >>> code to silence the device (and not only the IRQ line) before or after
> >>> we detach the IRQ on CPU0 does not fix the issue, since we have no
> >>> synchronization between the CPUs whatsoever.
> >>>
> >>
> >> The driver is responsible for synchronizing its shutdown sequence
> >> between the cleanup code and any of its in-flight IRQs. If that handler
> >> reenables the IRQ that the shutdown code just disabled (all at device
> >> level, of course), something is seriously broken - but not on Xenomai
> >> side. As I explained before: That potential in-flight IRQ becomes
> >> spurious at the point the drivers starts the shutdown, thus can safely
> >> be ignored.
> > 
> > For mask_device to be useful in SMP, I guess we both agree that we need
> > some kind of barrier for synchronizing in flight IRQs before they become
> > not only spurious IRQs, but also deadly ones. Reading the kernel code, I
> > can understand how this is done, both in disable_irq() and __free_irq():
> > first disable the device in the driver, then disable the line, then
> > synchronize. The barrier mechanism is there.
> 
> The barrier is the descriptor lock (to protect against future events on
> the line) and then synchronize_irq (to protect against already running
> events).
> 

Not the descriptor lock per se since it is dropped before synchronizing,
only what you do within this locked section is part of the barrier, such
as raising the disabled flag in that descriptor to prevent further
handling and unmasking in the various irq flow handlers. But I guess we
are in fact using different names to refer to the same "locking"
action. 

> 
> > 
> > What kind of construct would you then recommend to Xenomai driver
> > writers, to have the same guarantee with only the driver doing
> > "disable_device" and Xenomai doing "synchronize"? This is what I'm
> > likely missing.
> > 
> 
> my_device_disable_irqs();
> rtdm_irq_free();
> 
> That must suffice, and Xenomai must be fixed to guarantee this.

Xenomai is not the issue, it should be able to call ipipe_virtualize_irq
and expect proper synchronization of the shut down irq, this is the gist
of the matter. There is no reason for the nucleus to implement permanent
workarounds for shortcomings in the pipeline, and there is absolutely no
reason to ask each and every driver to implement yet another ad hoc irq
synchronization mechanism to work around the same shortcomings.

For the time being, the patch synchronizes externally via
ipipe_control_irq from rthal_irq_release, but this is a kludge, the
proper solution is to fold this with the I-pipe irq deregistration code,
just like free_irq does for the regular kernel. This kludge is only a
temporary measure to plug the race, until ipipe_virtualize_irq is
eventually fixed.

For that to happen, we must call ipipe_virtualize_irq with all Xenomai
locks dropped, hence the patch, to prepare for fixing the pipeline. 

> If
> rtdm_irq_free also automatically disables the line is irrelevant for the
> driver writer, it's an implementation detail for Xenomai - nothing I
> vote against in principle once we can provide soft-IRQ masking for MSI
> or whatever problematic IRQ type.
> 

Soft-masking could be obtained playing a variant of the
ipipe_irq_lock/unlock game, with a deferred execution of the
msi_mask/unmask_irq calls when the root domain resumes.

Regarding the pending SMP race, I will push that fix upstream based on
the one I submitted earlier, but amended with a change which should be
functionally equivalent to this:

diff --git a/ksrc/nucleus/intr.c b/ksrc/nucleus/intr.c
index 0cf66f4..3d5c64c 100644
--- a/ksrc/nucleus/intr.c
+++ b/ksrc/nucleus/intr.c
@@ -844,7 +844,7 @@ int xnintr_detach(xnintr_t *intr)
 	 * be running concurrently on any remote CPU before we tell
 	 * the pipeline to release the IRQ.
 	 */
-	xnarch_disable_irq(intr->irq);
+	___ipipe_lock_irq(&rthal_domain, ipipe_processor_id(), intr->irq);
 	/*
 	 * We use the teardown flag to prevent further attachment on
 	 * the same IRQ line, after we dropped the lock to release the
@@ -860,6 +860,10 @@ int xnintr_detach(xnintr_t *intr)
 	ret = xnintr_irq_release(intr);
 	xnlock_get(&intrlock);
 	__clear_bit(intr->irq, teardown);
+	xnlock_put_irqrestore(&intrlock, s);
+	xnarch_disable_irq(intr->irq);
+
+	return 0;
  out:
 	xnlock_put_irqrestore(&intrlock, s);
 
This way, we will have the IRQ line disabled upon last detach with some
guarantee against driver/hardware issues before synchronizing, no
leakage when doing so on the RTDM API, and we won't break MSI usage. I
just need to find a way to bury the __ipipe_lock_irq() call into some
HAL-level interface.

If at some point, anyone comes with a better approach than the teardown
+ critical IPI sequence, then we will reconsider the issue.

-- 
Philippe.







^ permalink raw reply related	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2010-11-14 21:28 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-07 11:57 [Xenomai-help] kernel oopses when killing realtime task Pavel Machek
2010-10-07 12:11 ` Gilles Chanteperdrix
2010-10-07 13:00   ` Gilles Chanteperdrix
2010-10-07 12:32 ` Jan Kiszka
2010-10-08  7:01   ` Pavel Machek
2010-10-08  7:20     ` Gilles Chanteperdrix
2010-10-08  8:17     ` Philippe Gerum
2010-10-08  8:41       ` Jan Kiszka
2010-10-08  8:57         ` Philippe Gerum
2010-10-08  9:00           ` Philippe Gerum
2010-10-08  9:41     ` Philippe Gerum
2010-10-13  9:03       ` Pavel Machek
2010-10-13  9:16         ` Philippe Gerum
2010-10-13  9:26           ` Pavel Machek
2010-10-13 14:52             ` Philippe Gerum
2010-10-25 16:48               ` Philippe Gerum
2010-10-25 18:10                 ` Jan Kiszka
2010-10-25 19:08                   ` Philippe Gerum
2010-10-25 19:11                     ` Philippe Gerum
2010-10-25 19:15                     ` Jan Kiszka
2010-10-25 19:20                       ` Philippe Gerum
2010-10-25 19:22                         ` Jan Kiszka
2010-10-25 21:12                           ` Philippe Gerum
2010-10-25 21:22                             ` Jan Kiszka
2010-10-25 21:40                               ` Philippe Gerum
2010-10-25 21:47                                 ` Jan Kiszka
2010-10-26  4:43                                   ` Philippe Gerum
2010-10-26  5:22                                     ` Jan Kiszka
2010-10-26 19:33                                       ` Jan Kiszka
2010-10-28  5:17                                         ` Philippe Gerum
2010-10-28  7:31                                           ` Jan Kiszka
2010-10-28  7:38                                             ` Jan Kiszka
2010-10-28  7:46                                             ` Philippe Gerum
2010-11-07 15:15                                               ` Philippe Gerum
2010-11-07 16:22                                                 ` Jan Kiszka
2010-11-07 16:55                                                   ` Philippe Gerum
2010-11-07 16:59                                                   ` Philippe Gerum
2010-11-07 17:19                                                   ` Philippe Gerum
2010-11-09  8:01                                                   ` Jan Kiszka
2010-11-09  8:26                                                     ` Philippe Gerum
2010-11-09  8:39                                                       ` Jan Kiszka
2010-11-09  9:36                                                         ` Philippe Gerum
2010-11-09 13:12                                                           ` Jan Kiszka
2010-11-12  8:48                                                             ` Philippe Gerum
2010-11-12  9:14                                                               ` Jan Kiszka
2010-11-12 13:57                                                                 ` Philippe Gerum
2010-11-12 14:30                                                                   ` Jan Kiszka
2010-11-12 17:42                                                                     ` Philippe Gerum
2010-11-12 18:42                                                                       ` Jan Kiszka
2010-11-14 21:28                                                                         ` Philippe Gerum
2010-10-07 14:07 ` Philippe Gerum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.