[patch] IRQ threads

* [patch] IRQ threads
@ 2004-07-27 22:50 Scott Wood
  2004-07-28  6:27 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Scott Wood @ 2004-07-27 22:50 UTC
  To: linux-kernel; +Cc: Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena

I have attached a patch for implementing IRQ handlers in threads, for
latency-reduction purposes.  It requires that softirqs be run in
threads (otherwise they could end up running inside the IRQ threads,
which will at the very least trigger bugs due to in_irq() being set).
I've tested it with Ingo's voluntary-preempt J7 patch, and it should
work with the TimeSys softirq thread patch as well (though you might
get a conflict with the PF_IRQHANDLER definition; just merge them
into one).

Some notes:

1. This may not work properly with interrupt controller code that
doesn't do the obvious thing with mask_and_ack() and end().  This
includes the IO-APIC code, which has an empty end() for
edge-triggered interrupts and an empty mask_and_ack() for
level-triggered interrupts.  The mask_and_ack() really needs to mask
the interrupt, as otherwise the hardware will not deliver interrupts
that it considers lower-priority, even though they may have a
higher-priority thread.

2. This patch does not disable local interrupts when running a
threaded handler.  SMP-safe drivers shouldn't be directly bothered by
this (as the interrupt could as easily have happened on another CPU),
but there may be some interactions with softirqs and per-cpu data, if
a softirq thread preempts an IRQ thread, or an IRQ thread gets
migrated to a different CPU.  I'm particularly worried about the
network code.  If possible, I'd like to find and fix such breakages
rather than use local_irq_disable(), as that would prevent IRQ
prioritization from working, and prevent IRQ threads from being used
to isolate the rest of the system from long-running IRQs (such as
non-DMA IDE).

3. The i8042 driver had to be marked SA_NOTHREAD, as there are
non-preemptible regions where it spins, waiting for an interrupt.
Ideally, this driver (and others like it) should be fixed to either
do a cond_resched() or use a wait queue.

4. This might be a good time to get around to moving the bulk of
arch/whatever/kernel/irq.c into generic code, as the code says was
supposed to happen in 2.5.  This patch is currently only for x86
(though we've run IRQ threads on many different platforms in the
past).

5. Is there any reason why an IRQ controller might want to have its
end() called even if IRQ_DISABLED or IRQ_INPROGRESS is set?  It'd be
nice to merge those checks in with the IRQ_THREADPENDING/IRQ_THREADRUNNING 
checks.

6. This patch causes in_irq() to return true if an IRQ thread is
running, as some drivers use it in common code to determine how to
act.  in_interrupt(), however, will return false in such a case.
The exact meaning of these macros in the presence of IRQ threads
isn't very well defined, and I hope this results in sane behavior.
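
To make note 6 concrete, here is a minimal userspace sketch of how the
two macros behave with this patch applied.  It is an illustration
only, not part of the patch: struct model_ctx and the model_*()
functions are hypothetical stand-ins for the preempt-count fields and
current->flags, and only the PF_IRQHANDLER value is taken from the
sched.h hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Value copied from the include/linux/sched.h hunk of the patch. */
#define PF_IRQHANDLER 0x00400000u

/* Hypothetical userspace stand-in for the state the macros look at. */
struct model_ctx {
	unsigned int hardirq_count;	/* nesting depth of real hard IRQs */
	unsigned int softirq_count;	/* nesting depth of softirqs */
	unsigned int task_flags;	/* models current->flags */
};

/* With CONFIG_IRQ_THREADS: true in a hard IRQ or inside an IRQ thread. */
bool model_in_irq(const struct model_ctx *c)
{
	return c->hardirq_count || (c->task_flags & PF_IRQHANDLER);
}

/* Not touched by the patch: only the hard/soft IRQ counts matter. */
bool model_in_interrupt(const struct model_ctx *c)
{
	return c->hardirq_count || c->softirq_count;
}

/* Walks through the three contexts discussed in note 6. */
void model_demo(void)
{
	struct model_ctx irq_thread = { 0, 0, PF_IRQHANDLER };
	struct model_ctx hard_irq   = { 1, 0, 0 };
	struct model_ctx plain_task = { 0, 0, 0 };

	/* An IRQ thread: in_irq() is true, in_interrupt() is false. */
	assert(model_in_irq(&irq_thread) && !model_in_interrupt(&irq_thread));
	/* A real hard IRQ: both are true, as before the patch. */
	assert(model_in_irq(&hard_irq) && model_in_interrupt(&hard_irq));
	/* An ordinary task: both are false. */
	assert(!model_in_irq(&plain_task) && !model_in_interrupt(&plain_task));
}
```

The asymmetry in the first case is the part that is not well defined:
code that tests in_interrupt() to decide whether it may sleep will
treat an IRQ thread as process context, while code that tests
in_irq() will not.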

Signed-off-by: Scott Wood <scott.wood@timesys.com> under TS0058

diff -urN linux-2.6.8-rc2/arch/i386/kernel/i386_ksyms.c linux-2.6.8-rc2-irq-threads/arch/i386/kernel/i386_ksyms.c
--- linux-2.6.8-rc2/arch/i386/kernel/i386_ksyms.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/kernel/i386_ksyms.c	2004-07-27 17:08:37.000000000 -0400
@@ -146,7 +146,6 @@
 EXPORT_SYMBOL_NOVERS(__read_lock_failed);
 
 /* Global SMP stuff */
-EXPORT_SYMBOL(synchronize_irq);
 EXPORT_SYMBOL(smp_call_function);
 
 /* TLB flushing */
@@ -154,6 +153,10 @@
 EXPORT_SYMBOL_GPL(flush_tlb_all);
 #endif
 
+#if defined(CONFIG_SMP) || defined(CONFIG_IRQ_THREADS) 
+EXPORT_SYMBOL(synchronize_irq);
+#endif
+
 #ifdef CONFIG_X86_IO_APIC
 EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
 #endif
diff -urN linux-2.6.8-rc2/arch/i386/kernel/i8259.c linux-2.6.8-rc2-irq-threads/arch/i386/kernel/i8259.c
--- linux-2.6.8-rc2/arch/i386/kernel/i8259.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/kernel/i8259.c	2004-07-27 17:09:57.000000000 -0400
@@ -332,7 +332,8 @@
  * New motherboards sometimes make IRQ 13 be a PCI interrupt,
  * so allow interrupt sharing.
  */
-static struct irqaction fpu_irq = { math_error_irq, 0, CPU_MASK_NONE, "fpu", NULL, NULL };
+static struct irqaction fpu_irq = 
+	{ math_error_irq, SA_NOTHREAD, CPU_MASK_NONE, "fpu", NULL, NULL };
 
 void __init init_ISA_irqs (void)
 {
diff -urN linux-2.6.8-rc2/arch/i386/kernel/irq.c linux-2.6.8-rc2-irq-threads/arch/i386/kernel/irq.c
--- linux-2.6.8-rc2/arch/i386/kernel/irq.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/kernel/irq.c	2004-07-27 17:08:37.000000000 -0400
@@ -200,8 +200,9 @@
 
 
 
-
-#ifdef CONFIG_SMP
+/* When IRQ threads are enabled, this has to synchronize with the thread.
+   The function to do this is provided in generic code. */
+#if defined(CONFIG_SMP) && !defined(CONFIG_IRQ_THREADS)
 inline void synchronize_irq(unsigned int irq)
 {
 	while (irq_desc[irq].status & IRQ_INPROGRESS)
@@ -226,10 +227,16 @@
 		local_irq_enable();
 
 	do {
-		status |= action->flags;
-		retval |= action->handler(irq, action->dev_id, regs);
+#ifdef CONFIG_IRQ_THREADS
+		if (action->flags & SA_NOTHREAD)
+#endif
+		{
+			status |= action->flags;
+			retval |= action->handler(irq, action->dev_id, regs);
+		}
 		action = action->next;
 	} while (action);
+
 	if (status & SA_SAMPLE_RANDOM)
 		add_interrupt_randomness(irq);
 	local_irq_disable();
@@ -289,13 +296,10 @@
  *
  * Called under desc->lock
  */
-static void note_interrupt(int irq, irq_desc_t *desc, irqreturn_t action_ret)
+static void note_interrupt(int irq, irq_desc_t *desc)
 {
-	if (action_ret != IRQ_HANDLED) {
+	if (desc->status & IRQ_UNHANDLED)
 		desc->irqs_unhandled++;
-		if (action_ret != IRQ_NONE)
-			report_bad_irq(irq, desc, action_ret);
-	}
 
 	desc->irq_count++;
 	if (desc->irq_count < 100000)
@@ -306,7 +310,7 @@
 		/*
 		 * The interrupt is stuck
 		 */
-		__report_bad_irq(irq, desc, action_ret);
+		__report_bad_irq(irq, desc, IRQ_NONE);
 		/*
 		 * Now kill the IRQ
 		 */
@@ -395,7 +399,14 @@
 			desc->status = status | IRQ_REPLAY;
 			hw_resend_irq(desc->handler,irq);
 		}
-		desc->handler->enable(irq);
+		
+		/* Don't unmask the IRQ if it's in progress, or else you
+		   could re-enter the IRQ handler.  As it is now enabled,
+		   the IRQ will be unmasked when the handler is finished. */
+		
+		if (!(desc->status & (IRQ_INPROGRESS | IRQ_THREADRUNNING |
+		                      IRQ_THREADPENDING)))
+			desc->handler->enable(irq);
 		/* fall-through */
 	}
 	default:
@@ -408,12 +419,7 @@
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 
-/*
- * do_IRQ handles all normal device IRQ's (the special
- * SMP cross-CPU interrupts have their own specific
- * handlers).
- */
-asmlinkage unsigned int do_IRQ(struct pt_regs regs)
+static void really_do_IRQ(struct pt_regs *regs)
 {	
 	/* 
 	 * We ack quickly, we don't want the irq controller
@@ -425,27 +431,11 @@
 	 * 0 return value means that this irq is already being
 	 * handled by some other CPU. (or is disabled)
 	 */
-	int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code  */
+	int irq = regs->orig_eax & 0xff; /* high bits used in ret_from_ code  */
 	irq_desc_t *desc = irq_desc + irq;
 	struct irqaction * action;
 	unsigned int status;
 
-	irq_enter();
-
-#ifdef CONFIG_DEBUG_STACKOVERFLOW
-	/* Debugging check for stack overflow: is there less than 1KB free? */
-	{
-		long esp;
-
-		__asm__ __volatile__("andl %%esp,%0" :
-					"=r" (esp) : "0" (THREAD_SIZE - 1));
-		if (unlikely(esp < (sizeof(struct thread_info) + STACK_WARN))) {
-			printk("do_IRQ: stack overflow: %ld\n",
-				esp - sizeof(struct thread_info));
-			dump_stack();
-		}
-	}
-#endif
 	kstat_this_cpu.irqs[irq]++;
 	spin_lock(&desc->lock);
 	desc->handler->ack(irq);
@@ -454,14 +444,17 @@
 	   WAITING is used by probe to mark irqs that are being tested
 	   */
 	status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
-	status |= IRQ_PENDING; /* we _want_ to handle it */
+	status |= IRQ_PENDING |  /* we _want_ to handle it */
+	          IRQ_UNHANDLED; /* This will be cleared after a
+	                            handler that cares. */
 
 	/*
 	 * If the IRQ is disabled for whatever reason, we cannot
 	 * use the action we have.
 	 */
 	action = NULL;
-	if (likely(!(status & (IRQ_DISABLED | IRQ_INPROGRESS)))) {
+	if (likely(!(status & (IRQ_DISABLED | IRQ_INPROGRESS |
+	                       IRQ_THREADPENDING | IRQ_THREADRUNNING)))) {
 		action = desc->action;
 		status &= ~IRQ_PENDING; /* we commit to handling */
 		status |= IRQ_INPROGRESS; /* we are handling it */
@@ -487,89 +480,117 @@
 	 * useful for irq hardware that does not mask cleanly in an
 	 * SMP environment.
 	 */
-#ifdef CONFIG_4KSTACKS
-
 	for (;;) {
 		irqreturn_t action_ret;
-		u32 *isp;
-		union irq_ctx * curctx;
-		union irq_ctx * irqctx;
-
-		curctx = (union irq_ctx *) current_thread_info();
-		irqctx = hardirq_ctx[smp_processor_id()];
-
-		spin_unlock(&desc->lock);
-
-		/*
-		 * this is where we switch to the IRQ stack. However, if we are already using
-		 * the IRQ stack (because we interrupted a hardirq handler) we can't do that
-		 * and just have to keep using the current stack (which is the irq stack already
-		 * after all)
-		 */
-
-		if (curctx == irqctx)
-			action_ret = handle_IRQ_event(irq, &regs, action);
-		else {
-			/* build the stack frame on the IRQ stack */
-			isp = (u32*) ((char*)irqctx + sizeof(*irqctx));
-			irqctx->tinfo.task = curctx->tinfo.task;
-			irqctx->tinfo.previous_esp = current_stack_pointer();
-
-			*--isp = (u32) action;
-			*--isp = (u32) &regs;
-			*--isp = (u32) irq;
-
-			asm volatile(
-				"       xchgl   %%ebx,%%esp     \n"
-				"       call    handle_IRQ_event \n"
-				"       xchgl   %%ebx,%%esp     \n"
-				: "=a"(action_ret)
-				: "b"(isp)
-				: "memory", "cc", "edx", "ecx"
-			);
-
 
+#ifdef CONFIG_IRQ_THREADS
+		if (desc->thread) {
+			desc->status |= IRQ_THREADPENDING;
+			wake_up_process(desc->thread);
 		}
-		spin_lock(&desc->lock);
-		if (!noirqdebug)
-			note_interrupt(irq, desc, action_ret);
-		if (curctx != irqctx)
-			irqctx->tinfo.task = NULL;
+		
+		if (!desc->thread || (desc->status & IRQ_NOTHREAD))
+#endif
+		{
+			spin_unlock(&desc->lock);
+			action_ret = handle_IRQ_event(irq, regs, action);
+			spin_lock(&desc->lock);
+
+			if (!noirqdebug) {
+				if (action_ret == IRQ_HANDLED)
+					desc->status &= ~IRQ_UNHANDLED;
+				else if (action_ret != IRQ_NONE)
+					report_bad_irq(irq, desc, action_ret);
+			}
+		}
+			
 		if (likely(!(desc->status & IRQ_PENDING)))
 			break;
 		desc->status &= ~IRQ_PENDING;
 	}
 
-#else
+	desc->status &= ~IRQ_INPROGRESS;
 
-	for (;;) {
-		irqreturn_t action_ret;
+out:
+	/*
+	 * The ->end() handler has to deal with interrupts which got
+	 * disabled while the handler was running.
+	 */
+	if (!(desc->status & (IRQ_THREADPENDING | IRQ_THREADRUNNING))) {
+		if (!noirqdebug)
+			note_interrupt(irq, desc);
+		
+		desc->handler->end(irq);
+	}
+	spin_unlock(&desc->lock);
+}
+
+/*
+ * do_IRQ handles all normal device IRQ's (the special
+ * SMP cross-CPU interrupts have their own specific
+ * handlers).
+ */
+asmlinkage void do_IRQ(struct pt_regs regs)
+{
+#ifdef CONFIG_4KSTACKS
+	u32 *isp;
+	union irq_ctx *curctx;
+	union irq_ctx *irqctx;
+#endif
+
+	irq_enter();
 
-		spin_unlock(&desc->lock);
+#ifdef CONFIG_DEBUG_STACKOVERFLOW
+	/* Debugging check for stack overflow: is there less than 1KB free? */
+	{
+		long esp;
 
-		action_ret = handle_IRQ_event(irq, &regs, action);
+		asm volatile("andl %%esp,%0" :
+		             "=r" (esp) : "0" (THREAD_SIZE - 1));
 
-		spin_lock(&desc->lock);
-		if (!noirqdebug)
-			note_interrupt(irq, desc, action_ret);
-		if (likely(!(desc->status & IRQ_PENDING)))
-			break;
-		desc->status &= ~IRQ_PENDING;
+		if (unlikely(esp < (sizeof(struct thread_info) + STACK_WARN))) {
+			printk("do_IRQ: stack overflow: %ld\n",
+			       esp - sizeof(struct thread_info));
+			dump_stack();
+		}
 	}
 #endif
-	desc->status &= ~IRQ_INPROGRESS;
 
-out:
+#ifdef CONFIG_4KSTACKS
+	curctx = (union irq_ctx *) current_thread_info();
+	irqctx = hardirq_ctx[smp_processor_id()];
+
 	/*
-	 * The ->end() handler has to deal with interrupts which got
-	 * disabled while the handler was running.
+	 * this is where we switch to the IRQ stack. However, if we are already using
+	 * the IRQ stack (because we interrupted a hardirq handler) we can't do that
+	 * and just have to keep using the current stack (which is the irq stack already
+	 * after all)
 	 */
-	desc->handler->end(irq);
-	spin_unlock(&desc->lock);
 
-	irq_exit();
+	if (curctx == irqctx) {
+		really_do_IRQ(&regs);
+	} else {
+		/* build the stack frame on the IRQ stack */
+		isp = (u32*) ((char*)irqctx + sizeof(*irqctx));
+		irqctx->tinfo.task = curctx->tinfo.task;
+		irqctx->tinfo.previous_esp = current_stack_pointer();
 
-	return 1;
+		*--isp = (u32) &regs;
+
+		asm volatile("xchgl   %%ebx, %%esp;"
+		             "call    really_do_IRQ;"
+ 		             "xchgl   %%ebx, %%esp;"
+		           : /* no outputs */
+		           : "b" (isp)
+		           : "memory", "cc", "eax", "edx", "ecx");
+
+		irqctx->tinfo.task = NULL;
+	}
+#else
+	really_do_IRQ(&regs);
+#endif
+
+	irq_exit();
 }
 
 int can_request_irq(unsigned int irq, unsigned long irqflags)
@@ -704,7 +725,7 @@
 			*pp = action->next;
 			if (!desc->action) {
 				desc->status |= IRQ_DISABLED;
-				desc->handler->shutdown(irq);
+				do_shutdown_irq(irq);
 			}
 			spin_unlock_irqrestore(&desc->lock,flags);
 
@@ -943,6 +964,8 @@
 		rand_initialize_irq(irq);
 	}
 
+	setup_irq_spawn_thread(irq, new);
+
 	/*
 	 * The following block of code has to be executed atomically
 	 */
@@ -968,7 +991,7 @@
 	if (!shared) {
 		desc->depth = 0;
 		desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING | IRQ_INPROGRESS);
-		desc->handler->startup(irq);
+		do_startup_irq(irq);
 	}
 	spin_unlock_irqrestore(&desc->lock,flags);
 
diff -urN linux-2.6.8-rc2/arch/i386/mach-default/setup.c linux-2.6.8-rc2-irq-threads/arch/i386/mach-default/setup.c
--- linux-2.6.8-rc2/arch/i386/mach-default/setup.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/mach-default/setup.c	2004-07-27 17:10:42.000000000 -0400
@@ -27,7 +27,8 @@
 /*
  * IRQ2 is cascade interrupt to second interrupt controller
  */
-static struct irqaction irq2 = { no_action, 0, CPU_MASK_NONE, "cascade", NULL, NULL};
+static struct irqaction irq2 = 
+	{ no_action, SA_NOTHREAD, CPU_MASK_NONE, "cascade", NULL, NULL};
 
 /**
  * intr_init_hook - post gate setup interrupt initialisation
@@ -71,7 +72,9 @@
 {
 }
 
-static struct irqaction irq0  = { timer_interrupt, SA_INTERRUPT, CPU_MASK_NONE, "timer", NULL, NULL};
+static struct irqaction irq0  = 
+	{ timer_interrupt, SA_INTERRUPT | SA_NOTHREAD, CPU_MASK_NONE, 
+	  "timer", NULL, NULL};
 
 /**
  * time_init_hook - do any specific initialisations for the system timer.
diff -urN linux-2.6.8-rc2/arch/i386/mach-visws/setup.c linux-2.6.8-rc2-irq-threads/arch/i386/mach-visws/setup.c
--- linux-2.6.8-rc2/arch/i386/mach-visws/setup.c	2004-06-16 01:18:59.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/mach-visws/setup.c	2004-07-27 17:08:46.000000000 -0400
@@ -112,7 +112,7 @@
 
 static struct irqaction irq0 = {
 	.handler =	timer_interrupt,
-	.flags =	SA_INTERRUPT,
+	.flags =	SA_INTERRUPT | SA_NOTHREAD,
 	.name =		"timer",
 };
 
diff -urN linux-2.6.8-rc2/arch/i386/mach-visws/visws_apic.c linux-2.6.8-rc2-irq-threads/arch/i386/mach-visws/visws_apic.c
--- linux-2.6.8-rc2/arch/i386/mach-visws/visws_apic.c	2004-06-16 01:18:57.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/mach-visws/visws_apic.c	2004-07-27 17:08:46.000000000 -0400
@@ -261,11 +261,13 @@
 static struct irqaction master_action = {
 	.handler =	piix4_master_intr,
 	.name =		"PIIX4-8259",
+	.flags =        SA_NOTHREAD,
 };
 
 static struct irqaction cascade_action = {
 	.handler = 	no_action,
 	.name =		"cascade",
+	.flags =        SA_NOTHREAD,
 };
 
 
diff -urN linux-2.6.8-rc2/arch/i386/mach-voyager/setup.c linux-2.6.8-rc2-irq-threads/arch/i386/mach-voyager/setup.c
--- linux-2.6.8-rc2/arch/i386/mach-voyager/setup.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/mach-voyager/setup.c	2004-07-27 17:11:14.000000000 -0400
@@ -17,7 +17,8 @@
 /*
  * IRQ2 is cascade interrupt to second interrupt controller
  */
-static struct irqaction irq2 = { no_action, 0, CPU_MASK_NONE, "cascade", NULL, NULL};
+static struct irqaction irq2 = 
+	{ no_action, SA_NOTHREAD, CPU_MASK_NONE, "cascade", NULL, NULL};
 
 void __init intr_init_hook(void)
 {
@@ -40,7 +41,9 @@
 {
 }
 
-static struct irqaction irq0  = { timer_interrupt, SA_INTERRUPT, CPU_MASK_NONE, "timer", NULL, NULL};
+static struct irqaction irq0  = 
+	{ timer_interrupt, SA_INTERRUPT | SA_NOTHREAD, 
+	  CPU_MASK_NONE, "timer", NULL, NULL};
 
 void __init time_init_hook(void)
 {
diff -urN linux-2.6.8-rc2/drivers/input/serio/i8042.c linux-2.6.8-rc2-irq-threads/drivers/input/serio/i8042.c
--- linux-2.6.8-rc2/drivers/input/serio/i8042.c	2004-06-16 01:18:57.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/drivers/input/serio/i8042.c	2004-07-27 17:52:11.000000000 -0400
@@ -277,7 +277,7 @@
 			return 0;
 
 	if (request_irq(values->irq, i8042_interrupt,
-			SA_SHIRQ, "i8042", i8042_request_irq_cookie)) {
+			SA_SHIRQ | SA_NOTHREAD, "i8042", i8042_request_irq_cookie)) {
 		printk(KERN_ERR "i8042.c: Can't get irq %d for %s, unregistering the port.\n", values->irq, values->name);
 		goto irq_fail;
 	}
@@ -571,7 +571,7 @@
  * in trying to detect AUX presence.
  */
 
-	if (request_irq(values->irq, i8042_interrupt, SA_SHIRQ,
+	if (request_irq(values->irq, i8042_interrupt, SA_SHIRQ | SA_NOTHREAD,
 				"i8042", &i8042_check_aux_cookie))
                 return -1;
 	free_irq(values->irq, &i8042_check_aux_cookie);
diff -urN linux-2.6.8-rc2/include/asm-i386/hardirq.h linux-2.6.8-rc2-irq-threads/include/asm-i386/hardirq.h
--- linux-2.6.8-rc2/include/asm-i386/hardirq.h	2004-06-16 01:19:43.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/asm-i386/hardirq.h	2004-07-27 17:08:46.000000000 -0400
@@ -64,7 +64,12 @@
  * Are we doing bottom half or hardware interrupt processing?
  * Are we in a softirq context? Interrupt context?
  */
+#ifdef CONFIG_IRQ_THREADS
+#define in_irq()     (hardirq_count() || (current->flags & PF_IRQHANDLER))
+#else
 #define in_irq()		(hardirq_count())
+#endif
+
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
 
@@ -92,7 +97,7 @@
 		preempt_enable_no_resched();				\
 } while (0)
 
-#ifndef CONFIG_SMP
+#if !defined(CONFIG_SMP) && !defined(CONFIG_IRQ_THREADS)
 # define synchronize_irq(irq)	barrier()
 #else
   extern void synchronize_irq(unsigned int irq);
diff -urN linux-2.6.8-rc2/include/asm-i386/irq.h linux-2.6.8-rc2-irq-threads/include/asm-i386/irq.h
--- linux-2.6.8-rc2/include/asm-i386/irq.h	2004-06-16 01:19:37.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/asm-i386/irq.h	2004-07-27 17:08:46.000000000 -0400
@@ -27,6 +27,8 @@
 extern void release_x86_irqs(struct task_struct *);
 extern int can_request_irq(unsigned int, unsigned long flags);
 
+#define get_irq_desc(irq) (&irq_desc[irq])
+
 #ifdef CONFIG_X86_LOCAL_APIC
 #define ARCH_HAS_NMI_WATCHDOG		/* See include/linux/nmi.h */
 #endif
diff -urN linux-2.6.8-rc2/include/asm-i386/signal.h linux-2.6.8-rc2-irq-threads/include/asm-i386/signal.h
--- linux-2.6.8-rc2/include/asm-i386/signal.h	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/asm-i386/signal.h	2004-07-27 17:08:46.000000000 -0400
@@ -122,6 +122,7 @@
 #define SA_PROBE		SA_ONESHOT
 #define SA_SAMPLE_RANDOM	SA_RESTART
 #define SA_SHIRQ		0x04000000
+#define SA_NOTHREAD             0x01000000
 #endif
 
 #define SIG_BLOCK          0	/* for blocking signals */
diff -urN linux-2.6.8-rc2/include/linux/interrupt.h linux-2.6.8-rc2-irq-threads/include/linux/interrupt.h
--- linux-2.6.8-rc2/include/linux/interrupt.h	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/linux/interrupt.h	2004-07-27 17:08:46.000000000 -0400
@@ -51,7 +51,7 @@
 /*
  * Temporary defines for UP kernels, until all code gets fixed.
  */
-#ifndef CONFIG_SMP
+#if !defined(CONFIG_SMP) && !defined(CONFIG_IRQ_THREADS)
 # define cli()			local_irq_disable()
 # define sti()			local_irq_enable()
 # define save_flags(x)		local_save_flags(x)
diff -urN linux-2.6.8-rc2/include/linux/irq.h linux-2.6.8-rc2-irq-threads/include/linux/irq.h
--- linux-2.6.8-rc2/include/linux/irq.h	2004-06-16 01:19:17.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/linux/irq.h	2004-07-27 17:08:46.000000000 -0400
@@ -23,15 +23,31 @@
 /*
  * IRQ line status.
  */
-#define IRQ_INPROGRESS	1	/* IRQ handler active - do not enter! */
-#define IRQ_DISABLED	2	/* IRQ disabled - do not enter! */
-#define IRQ_PENDING	4	/* IRQ pending - replay on enable */
-#define IRQ_REPLAY	8	/* IRQ has been replayed but not acked yet */
-#define IRQ_AUTODETECT	16	/* IRQ is being autodetected */
-#define IRQ_WAITING	32	/* IRQ not yet seen - for autodetection */
-#define IRQ_LEVEL	64	/* IRQ level triggered */
-#define IRQ_MASKED	128	/* IRQ masked - shouldn't be seen again */
-#define IRQ_PER_CPU	256	/* IRQ is per CPU */
+#define IRQ_INPROGRESS     1    /* IRQ handler active - do not enter! */
+#define IRQ_DISABLED       2    /* IRQ disabled - do not enter! */
+#define IRQ_PENDING        4    /* IRQ pending - replay on enable */
+#define IRQ_REPLAY         8    /* IRQ has been replayed but not acked yet */
+#define IRQ_AUTODETECT     16   /* IRQ is being autodetected */
+#define IRQ_WAITING        32   /* IRQ not yet seen - for autodetection */
+#define IRQ_LEVEL          64   /* IRQ level triggered */
+#define IRQ_MASKED         128  /* IRQ masked - shouldn't be seen again */
+#define IRQ_PER_CPU        256  /* IRQ is per CPU */
+#define IRQ_THREAD         512  /* IRQ has at least one threaded handler */
+#define IRQ_NOTHREAD       1024 /* IRQ has at least one nonthreaded handler */
+#define IRQ_THREADPENDING  2048 /* IRQ thread has been woken */
+#define IRQ_THREADRUNNING  4096 /* IRQ thread is currently running */
+
+/* Nobody has yet handled this IRQ.  This is set when ack() is called,
+   and checked when end() is called.  It is done this way to accommodate
+   threaded and non-threaded IRQs sharing the same IRQ. */
+
+#define IRQ_UNHANDLED      8192
+
+/* The interrupt is supposed to be enabled, but the IRQ thread hasn't
+   been spawned yet.  Call desc->handler->startup() once the thread
+   has been spawned. */
+
+#define IRQ_DELAYEDSTARTUP 16384
 
 /*
  * Interrupt controller descriptor. This is all we need
@@ -65,6 +81,10 @@
 	unsigned int irq_count;		/* For detecting broken interrupts */
 	unsigned int irqs_unhandled;
 	spinlock_t lock;
+#ifdef CONFIG_IRQ_THREADS
+	struct task_struct *thread;
+	wait_queue_head_t sync;
+#endif
 } ____cacheline_aligned irq_desc_t;
 
 extern irq_desc_t irq_desc [NR_IRQS];
@@ -75,6 +95,36 @@
 
 extern hw_irq_controller no_irq_type;  /* needed in every arch ? */
 
+#ifdef CONFIG_IRQ_THREADS
+
+void spawn_irq_threads(void);
+void setup_irq_spawn_thread(unsigned int irq, struct irqaction *new);
+unsigned int do_startup_irq(unsigned int irq);
+void do_shutdown_irq(unsigned int irq);
+
+#else
+
+static inline void spawn_irq_threads(void)
+{
+}
+
+static inline void setup_irq_spawn_thread(unsigned int irq,
+                                          struct irqaction *new)
+{
+}
+
+static inline unsigned int do_startup_irq(int irq)
+{
+	return get_irq_desc(irq)->handler->startup(irq);
+}
+
+static inline void do_shutdown_irq(int irq)
+{
+	get_irq_desc(irq)->handler->shutdown(irq);
+}
+
+#endif
+
 #endif
 
 #endif /* __irq_h */
diff -urN linux-2.6.8-rc2/include/linux/sched.h linux-2.6.8-rc2-irq-threads/include/linux/sched.h
--- linux-2.6.8-rc2/include/linux/sched.h	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/linux/sched.h	2004-07-27 17:08:46.000000000 -0400
@@ -555,6 +555,7 @@
 #define PF_SWAPOFF	0x00080000	/* I am in swapoff */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_SYNCWRITE	0x00200000	/* I am doing a sync write */
+#define PF_IRQHANDLER   0x00400000      /* in_irq() should return true */
 
 #ifdef CONFIG_SMP
 #define SCHED_LOAD_SCALE	128UL	/* increase resolution of load */
diff -urN linux-2.6.8-rc2/init/Kconfig linux-2.6.8-rc2-irq-threads/init/Kconfig
--- linux-2.6.8-rc2/init/Kconfig	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/init/Kconfig	2004-07-27 17:08:46.000000000 -0400
@@ -294,6 +294,18 @@
 
 	  If unsure, say N.
 
+config IRQ_THREADS
+  bool "Run all IRQs in threads by default"
+  depends on PREEMPT
+  help
+    This option creates a thread for each IRQ, which runs at high
+    real-time priority, unless the SA_NOTHREAD option is passed to
+    request_irq().  This allows these IRQs to be prioritized, so as
+    to avoid preempting very high priority real-time tasks.  This
+    also allows spinlocks used by threaded IRQs to be converted
+    into sleeping mutexes, for further reduction of latency (however,
+    this is not done automatically).
+
 endmenu		# General setup
 
 
diff -urN linux-2.6.8-rc2/init/main.c linux-2.6.8-rc2-irq-threads/init/main.c
--- linux-2.6.8-rc2/init/main.c	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/init/main.c	2004-07-27 17:08:46.000000000 -0400
@@ -668,6 +668,8 @@
 	smp_init();
 	sched_init_smp();
 
+	spawn_irq_threads();
+
 	/*
 	 * Do this before initcalls, because some drivers want to access
 	 * firmware files.
diff -urN linux-2.6.8-rc2/kernel/Makefile linux-2.6.8-rc2-irq-threads/kernel/Makefile
--- linux-2.6.8-rc2/kernel/Makefile	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/kernel/Makefile	2004-07-27 17:08:46.000000000 -0400
@@ -23,6 +23,7 @@
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
 obj-$(CONFIG_AUDIT) += audit.o
 obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
+obj-$(CONFIG_IRQ_THREADS) += irq.o
 
 ifneq ($(CONFIG_IA64),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff -urN linux-2.6.8-rc2/kernel/irq.c linux-2.6.8-rc2-irq-threads/kernel/irq.c
--- linux-2.6.8-rc2/kernel/irq.c	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.8-rc2-irq-threads/kernel/irq.c	2004-07-27 17:08:46.000000000 -0400
@@ -0,0 +1,232 @@
+/*
+ *	linux/kernel/irq.c -- Generic code for threaded IRQ handling
+ *
+ *	Copyright (C) 2001-2004 TimeSys Corp.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/config.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include <linux/completion.h>
+#include <linux/syscalls.h>
+#include <linux/random.h>
+
+#include <asm/uaccess.h>
+
+static const int irq_prio = MAX_USER_RT_PRIO - 2;
+
+static inline void synchronize_hard_irq(unsigned int irq)
+{
+#ifdef CONFIG_SMP
+	while (get_irq_desc(irq)->status & IRQ_INPROGRESS)
+		cpu_relax();
+#endif
+}
+
+void synchronize_irq(unsigned int irq)
+{
+	irq_desc_t *desc = get_irq_desc(irq);
+	
+	synchronize_hard_irq(irq);
+	
+	if (desc->thread)
+		wait_event(desc->sync, !(desc->status & IRQ_THREADRUNNING));
+}
+
+typedef struct {
+	struct completion comp;
+	int irq;
+} irq_thread_info;
+
+static int run_irq_thread(void *__info)
+{
+	irq_thread_info *info = __info;
+	int irq = info->irq;
+	struct sched_param param = { .sched_priority = irq_prio };
+	irq_desc_t *desc = get_irq_desc(irq);
+	
+	daemonize("IRQ %d", irq);
+	
+	set_fs(KERNEL_DS);
+	sys_sched_setscheduler(0, SCHED_FIFO, &param);
+	
+	current->flags |= PF_IRQHANDLER | PF_NOFREEZE;
+	
+	init_waitqueue_head(&desc->sync);
+	smp_wmb();
+	desc->thread = current;
+	
+	spin_lock_irq(&desc->lock);
+	
+	if (desc->status & IRQ_DELAYEDSTARTUP) {
+		desc->status &= ~IRQ_DELAYEDSTARTUP;
+		do_startup_irq(irq);
+	}
+	
+	spin_unlock_irq(&desc->lock);
+	
+	/* info is no longer valid after this... */
+	complete(&info->comp);
+	
+	for (;;) {
+		struct irqaction *action;
+		int status, retval;
+		
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		while (!(desc->status & IRQ_THREADPENDING))
+			schedule();
+		
+		set_current_state(TASK_RUNNING);
+
+		spin_lock_irq(&desc->lock);
+		
+		desc->status |= IRQ_THREADRUNNING;
+		desc->status &= ~IRQ_THREADPENDING;
+		status = desc->status;
+		
+		spin_unlock_irq(&desc->lock);
+		
+		retval = 0;
+		
+		if (!(status & IRQ_DISABLED)) {
+			action = desc->action;
+
+			while (action) {
+				if (!(action->flags & SA_NOTHREAD)) {
+					status |= action->flags;
+					retval |= action->handler(irq, action->dev_id, NULL);
+				}
+				
+				action = action->next;
+			}
+		}
+
+		if (status & SA_SAMPLE_RANDOM)
+			add_interrupt_randomness(irq);
+
+		spin_lock_irq(&desc->lock);
+		
+		
+		desc->status &= ~IRQ_THREADRUNNING;
+		if (!(desc->status & (IRQ_DISABLED | IRQ_INPROGRESS |
+				      IRQ_THREADPENDING | IRQ_THREADRUNNING))) {
+		  desc->handler->end(irq);
+		}
+		
+		spin_unlock_irq(&desc->lock);
+		
+		if (waitqueue_active(&desc->sync))
+			wake_up(&desc->sync);
+	}
+}
+
+static int ok_to_spawn_threads;
+
+void do_spawn_irq_thread(int irq)
+{
+	irq_thread_info info;
+	
+	info.irq = irq;
+	init_completion(&info.comp);
+
+	if (kernel_thread(run_irq_thread, &info, CLONE_KERNEL) < 0) {
+		printk(KERN_EMERG "Could not spawn thread for IRQ %d\n", irq);
+	} else {
+		wait_for_completion(&info.comp);
+	}
+}
+
+void setup_irq_spawn_thread(unsigned int irq, struct irqaction *new)
+{
+	irq_desc_t *desc = get_irq_desc(irq);
+	int spawn_thread = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+	
+	if (new->flags & SA_NOTHREAD) {
+		desc->status |= IRQ_NOTHREAD;
+	} else {
+		/* Only the first threaded handler should spawn
+		   a thread. */
+
+		if (!(desc->status & IRQ_THREAD)) {
+			spawn_thread = 1;
+			desc->status |= IRQ_THREAD;
+		}
+	}
+
+	spin_unlock_irqrestore(&desc->lock, flags);
+	
+	if (ok_to_spawn_threads && spawn_thread)
+		do_spawn_irq_thread(irq);
+}
+
+
+/* This takes care of interrupts that were requested before the
+   scheduler was ready for threads to be created. */
+
+void spawn_irq_threads(void)
+{
+	int i;
+	
+	for (i = 0; i < NR_IRQS; i++) {
+		irq_desc_t *desc = get_irq_desc(i);
+	
+		if (desc->action && !desc->thread && (desc->status & IRQ_THREAD))
+			do_spawn_irq_thread(i);
+	}
+	
+	ok_to_spawn_threads = 1;
+}
+
+/*
+ * Workarounds for interrupt types without startup()/shutdown() (ppc, ppc64).
+ * Will be removed some day.
+ */
+
+unsigned int do_startup_irq(unsigned int irq)
+{
+	irq_desc_t *desc = get_irq_desc(irq);
+
+	if ((desc->status & IRQ_THREAD) && !desc->thread) {
+		/* The IRQ threads haven't been spawned yet.  Don't
+		   turn on the IRQ until that happens. */
+		
+		desc->status |= IRQ_DELAYEDSTARTUP;
+		return 0;
+	}
+
+	if (desc->handler->startup)
+		return desc->handler->startup(irq);
+	else if (desc->handler->enable)
+		desc->handler->enable(irq);
+	else 
+		BUG();
+
+	return 0;
+}
+
+void do_shutdown_irq(unsigned int irq)
+{
+	irq_desc_t *desc = get_irq_desc(irq);
+
+	if (desc->status & IRQ_DELAYEDSTARTUP) {
+		desc->status &= ~IRQ_DELAYEDSTARTUP;
+		return;
+	}
+
+	if (desc->handler->shutdown)
+		desc->handler->shutdown(irq);
+	else if (desc->handler->disable)
+		desc->handler->disable(irq);
+	else 
+		BUG();
+}
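
The handoff between really_do_IRQ() and run_irq_thread() in the patch
above can be modeled in userspace as follows.  This is a simplified
sketch under stated assumptions, not kernel code: struct model_desc
and the model_*() functions are hypothetical, locking and the
IRQ_PENDING replay loop are left out, and a counter stands in for
desc->handler->end().  Only the flag values are copied from the
include/linux/irq.h hunk.

```c
#include <assert.h>

/* Flag values copied from the include/linux/irq.h hunk. */
#define IRQ_INPROGRESS     1
#define IRQ_DISABLED       2
#define IRQ_THREADPENDING  2048
#define IRQ_THREADRUNNING  4096

/* Hypothetical stand-in for irq_desc_t. */
struct model_desc {
	unsigned int status;
	int has_thread;		/* models desc->thread != NULL */
	int end_calls;		/* counts calls to handler->end() */
};

/* Hard-IRQ side: roughly really_do_IRQ() for a line with no
   SA_NOTHREAD handlers. */
void model_hard_irq(struct model_desc *d)
{
	unsigned int busy = IRQ_DISABLED | IRQ_INPROGRESS |
	                    IRQ_THREADPENDING | IRQ_THREADRUNNING;

	if (!(d->status & busy)) {
		d->status |= IRQ_INPROGRESS;
		if (d->has_thread)
			d->status |= IRQ_THREADPENDING; /* wake_up_process() */
		d->status &= ~IRQ_INPROGRESS;
	}

	/* "out:" path: skip end() while the thread owns the line, so
	   the IRQ stays masked until the threaded handlers have run. */
	if (!(d->status & (IRQ_THREADPENDING | IRQ_THREADRUNNING)))
		d->end_calls++;
}

/* Thread side: one pass through the run_irq_thread() loop. */
void model_thread_pass(struct model_desc *d)
{
	if (!(d->status & IRQ_THREADPENDING))
		return;		/* the real thread would sleep here */

	d->status |= IRQ_THREADRUNNING;
	d->status &= ~IRQ_THREADPENDING;

	/* The threaded handlers run here, with desc->lock dropped. */

	d->status &= ~IRQ_THREADRUNNING;
	if (!(d->status & (IRQ_DISABLED | IRQ_INPROGRESS |
	                   IRQ_THREADPENDING)))
		d->end_calls++;	/* the thread unmasks the line */
}

/* One interrupt on a threaded line: end() is deferred to the thread. */
void model_demo(void)
{
	struct model_desc d = { 0, 1, 0 };

	model_hard_irq(&d);
	assert((d.status & IRQ_THREADPENDING) && d.end_calls == 0);

	model_thread_pass(&d);
	assert(d.status == 0 && d.end_calls == 1);
}
```

The point of the model is that end() is called exactly once per
interrupt, by whichever side finishes last.  That is also why note 1
matters: if mask_and_ack() does not really mask the line, nothing
keeps the interrupt from firing again before the thread reaches
end().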

* Re: [patch] IRQ threads
  2004-07-27 22:50 [patch] IRQ threads Scott Wood
@ 2004-07-28  6:27 ` Ingo Molnar
  2004-07-28 15:38   ` Karim Yaghmour
                     ` (2 more replies)
  2004-07-28  8:10 ` Ingo Molnar
  2004-07-28 15:45 ` Karim Yaghmour
  2 siblings, 3 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-07-28  6:27 UTC
  To: Scott Wood; +Cc: linux-kernel, La Monte H.P. Yarroll, Manas Saksena, Bill Huey

* Scott Wood <scott@timesys.com> wrote:

> 2. This patch does not disable local interrupts when running a
> threaded handler.  SMP-safe drivers shouldn't be directly bothered by
> this (as the interrupt could as easily have happened on another CPU),
> but there may be some interactions with softirqs and per-cpu data, if
> a softirq thread preempts an IRQ thread, or an IRQ thread gets
> migrated to a different CPU. [...]

hardirqs (unlike softirqs) we might attempt to make preemptable. It
really should work because if they run with interrupts enabled then they
must be prepared to see interrupts coming from a similar device using
the same softirq facilities.

migration between CPUs we can prevent easily. While it's not common for
a driver to rely on per-CPU-ness _outside_ of a spinlocked region, it
could in theory happen.

for now i've gone the more conservative route of keeping irqs atomic
still, and applying lock-break methods to them.

> [...] I'm particularly worried about the network code.  If possible,
> I'd like to find and fix such breakages rather than use
> local_irq_disable(), as that would prevent IRQ prioritization from
> working, and prevent IRQ threads from being used to isolate the rest
> of the system from long-running IRQs (such as non-DMA IDE).

this is not enough: DMA-IDE creates big latencies too, and those
latencies happen under a spinlock - so with your patch it would be
non-preemptible still.

> 3. The i8042 driver had to be marked SA_NOTHREAD, as there are
> non-preemptible regions where it spins, waiting for an interrupt.
> Ideally, this driver (and others like it) should be fixed to either do
> a cond_resched() or use a wait queue.

yeah, i fixed it via cond_resched_softirq(), _after_ making sure it's
safe to preempt there - softirqs are not generally preemption safe
because they are fundamentally per-CPU. (meanwhile Dmitry Torokhov has
posted a patch that fixes the atkbd.c problem for real by using
workqueues. The psmouse-base.c problem needs a similar fix.)

> 4. This might be a good time to get around to moving the bulk of the
> arch/whatever/kernel/irq.c into generic code, as the code said was
> supposed to happen in 2.5.  This patch is currently only for x86
> (though we've run IRQ threads on many different platforms in the
> past).

agreed. I punted this one for the time being as it's clearly separate
from the issue of latencies and it's deeply intrusive to 2.6.

> 5. Is there any reason why an IRQ controller might want to have its
> end() called even if IRQ_DISABLED or IRQ_INPROGRESS is set?  It'd be
> nice to merge those checks in with the
> IRQ_THREADPENDING/IRQ_THREADRUNNING checks.

e.g. in the IO-APIC case if we ack the local APIC only in the end()
function then we must do that - an un-acked local APIC prevents other
IRQs from being delivered. We do this for level-triggered IO-APIC irqs.

> 6. This patch causes in_irq() to return true if an IRQ thread is
> running, as some drivers use it in common code to determine how to
> act.  in_interrupt(), however, will return false in such a case. The
> exact meaning of these macros in the presence of IRQ threads isn't
> very well defined, and I hope this results in sane behavior.

this is an incorrect change, just grep for in_interrupt() in
linux/drivers/ ...
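
As a rough illustration of why the two macros are hard to decouple, here
is a userspace model of a 2.6-style preempt_count layout. The bit widths,
struct task_model and patched_in_irq() are assumptions made for this
sketch, not code from the patch or from any particular tree. Only the
general idea is taken from the stock definitions: in_irq() tests the
hardirq bits, and in_interrupt() tests the hardirq and softirq bits.

```c
#include <stdbool.h>

/* Illustrative widths only; the real values vary by architecture. */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   12

#define SOFTIRQ_SHIFT  PREEMPT_BITS
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)

#define SOFTIRQ_MASK   (((1UL << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   (((1UL << HARDIRQ_BITS) - 1) << HARDIRQ_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)

/* Hypothetical stand-in for the per-task state the macros look at. */
struct task_model {
	unsigned long preempt_count;
	bool irq_thread;	/* models a flag such as PF_IRQHANDLER */
};

/* Stock-style in_irq(): any hardirq nesting level set. */
static bool model_in_irq(const struct task_model *t)
{
	return (t->preempt_count & HARDIRQ_MASK) != 0;
}

/* Stock-style in_interrupt(): hardirq or softirq context. */
static bool model_in_interrupt(const struct task_model *t)
{
	return (t->preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

/* Assumed shape of the behaviour described in note 6: also true
 * whenever the current task is an IRQ thread. */
static bool patched_in_irq(const struct task_model *t)
{
	return model_in_irq(t) || t->irq_thread;
}
```

With the stock definitions, in_irq() being true implies in_interrupt() is
true as well. In this model an IRQ thread breaks that implication, so
code that tests in_interrupt() would take a different path than code that
tests in_irq() for the same handler.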

I agree with the concept of using multiple threads for interrupts - i'll
add that to the voluntary-preempt patch too. This is an essential
feature to prioritize interrupts.

what do you think about making the i8259A's interrupt priorities
configurable? (a'la rtirq patch) Does it make any sense, given how early
we mask the source irq and ack the interrupt controller [hence giving
all other interrupts a fair chance to arrive ASAP]?

Bernhard Kuhn's rtirq patch is for IO-APIC/APICs, but i think the
latency issues could be equally well fixed by not keeping the local APIC
un-ACK-ed during level triggered irqs, but doing the mask & ack thing.
This will be slightly slower but should make them both redirectable and
more symmetric and fair.

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-27 22:50 [patch] IRQ threads Scott Wood
  2004-07-28  6:27 ` Ingo Molnar
@ 2004-07-28  8:10 ` Ingo Molnar
  2004-07-28 23:12   ` Scott Wood
  2004-07-28 15:45 ` Karim Yaghmour
  2 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2004-07-28  8:10 UTC (permalink / raw)
  To: Scott Wood; +Cc: linux-kernel, La Monte H.P. Yarroll, Manas Saksena


* Scott Wood <scott@timesys.com> wrote:

> I have attached a patch for implementing IRQ handlers in threads, for
> latency-reduction purposes. [...]

i'm wondering about a couple of details. Why were the changes to
note_interrupt() necessary? Also, why the enable_irq() change? What do
you think about the simpler approach in my patch which keeps the irq
masked until the thread runs?
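
For readers following along, the "keep the irq masked until the thread
runs" flow can be sketched as a toy userspace model. Everything here
(struct irq_line, hardirq_stub(), irq_thread_run()) is hypothetical and
only mirrors the sequence discussed in this thread: mask and ack in the
hard-IRQ stub, run the handler in the thread, then unmask.

```c
#include <stdbool.h>

/* Hypothetical model of a single interrupt line. */
struct irq_line {
	bool masked;		/* line is masked at the controller */
	bool thread_pending;	/* handler thread has work queued */
	int  handled;		/* how many times the handler ran */
};

/* Hard-IRQ stub: mask and ack the line, then defer to the thread.
 * Returns true if the interrupt was accepted. */
static bool hardirq_stub(struct irq_line *l)
{
	if (l->masked)
		return false;		/* a masked line cannot fire again */
	l->masked = true;		/* stands in for mask_and_ack() */
	l->thread_pending = true;	/* stands in for waking the thread */
	return true;
}

/* One pass of the IRQ thread: run the handler, then unmask. */
static void irq_thread_run(struct irq_line *l)
{
	if (!l->thread_pending)
		return;
	l->thread_pending = false;
	l->handled++;			/* the driver's handler would run here */
	l->masked = false;		/* stands in for end() */
}
```

The point of the model is that the line stays quiet between the stub and
the thread, so no extra pending/running state is needed to cope with the
same interrupt re-firing before its handler has run.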

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28  6:27 ` Ingo Molnar
@ 2004-07-28 15:38   ` Karim Yaghmour
  2004-07-28 16:01     ` Karim Yaghmour
  2004-07-28 21:23   ` Bill Huey
  2004-07-28 23:24   ` Scott Wood
  2 siblings, 1 reply; 30+ messages in thread
From: Karim Yaghmour @ 2004-07-28 15:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Scott Wood, linux-kernel, La Monte H.P. Yarroll, Manas Saksena,
	Bill Huey, Philippe Gerum

Ingo Molnar wrote:
> what do you think about making the i8259A's interrupt priorities
> configurable? (a'la rtirq patch) Does it make any sense, given how early
> we mask the source irq and ack the interrupt controller [hence giving
> all other interrupts a fair chance to arrive ASAP]?
> 
> Bernhard Kuhn's rtirq patch is for IO-APIC/APICs, but i think the
> latency issues could be equally well fixed by not keeping the local APIC
> un-ACK-ed during level triggered irqs, but doing the mask & ack thing.
> This will be slightly slower but should make them both redirectable and
> more symmetric and fair.

Not sure why don't want to use something as architecture-specific as the
rtirq patch. We've been developing the Adeos nanokernel for 2 years now,
and it provides a uniform functionality regardless of the architecture
you are running it on. Adeos now runs on x86, PowerPC, ARM (mmu-less and
mmu-full), and IA64. Plus, RTAI now uses Adeos to run side-by-side with
Linux on at least the x86 and the PowerPC. If you're looking at
prioritizing interrupts, then Adeos' interrupt pipeline is certainly the
most compelling method available at this point in time.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-27 22:50 [patch] IRQ threads Scott Wood
  2004-07-28  6:27 ` Ingo Molnar
  2004-07-28  8:10 ` Ingo Molnar
@ 2004-07-28 15:45 ` Karim Yaghmour
  2004-07-28 18:28   ` Lee Revell
  2 siblings, 1 reply; 30+ messages in thread
From: Karim Yaghmour @ 2004-07-28 15:45 UTC (permalink / raw)
  To: Scott Wood
  Cc: linux-kernel, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum

Scott Wood wrote:
> I have attached a patch for implementing IRQ handlers in threads, for
> latency-reduction purposes.  It requires that softirqs must be run in
> threads (or else they could end up running inside the IRQ threads,
> which will at the very least trigger bugs due to in_irq() being set). 
> I've tested it with Ingo's voluntary-preempt J7 patch, and it should
> work with the TimeSys softirq thread patch as well (though you might
> get a conflict with the PF_IRQHANDLER definition; just merge them
> into one).

My experience with clients who have been using TimeSys' stuff has been
abysmal. The fact of the matter is that most people who used
this were practically locked-in to TimeSys' services, unable to download
anything "standard" off the net and use it with their kernel. In one
example, we had to ditch the kernel the client got from TimeSys because
we had spent 10+ hours trying to get LTT to work on it without any
success whatsoever.

As I had said on other lists before, I don't see the point of creating
that much complexity in the kernel in order to try to shave-off a little
bit more off of the kernel's interrupt response time. The fact of the
matter is that neither this patch nor most of the other patches suggested
makes the kernel truly hard-rt. These patches only make the kernel
respond "faster". If you really need hard-rt, then you should be using
the Adeos nanokernel. With Adeos, you can even get a hard-rt driver
without using RTAI or any of the other rt derivatives.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 15:38   ` Karim Yaghmour
@ 2004-07-28 16:01     ` Karim Yaghmour
  0 siblings, 0 replies; 30+ messages in thread
From: Karim Yaghmour @ 2004-07-28 16:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Scott Wood, linux-kernel, La Monte H.P. Yarroll, Manas Saksena,
	Bill Huey, Philippe Gerum


grr... typo:

Karim Yaghmour wrote:
> Not sure why don't want to use something as architecture-specific as the
                ^^^^^ you'd

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 15:45 ` Karim Yaghmour
@ 2004-07-28 18:28   ` Lee Revell
  2004-07-28 19:12     ` Karim Yaghmour
  0 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-07-28 18:28 UTC (permalink / raw)
  To: karim
  Cc: Scott Wood, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum, linux-kernel

On Wed, 2004-07-28 at 11:45, Karim Yaghmour wrote:
> Scott Wood wrote:
> > I have attached a patch for implementing IRQ handlers in threads, for
> > latency-reduction purposes.  It requires that softirqs must be run in
> > threads (or else they could end up running inside the IRQ threads,
> > which will at the very least trigger bugs due to in_irq() being set). 
> > I've tested it with Ingo's voluntary-preempt J7 patch, and it should
> > work with the TimeSys softirq thread patch as well (though you might
> > get a conflict with the PF_IRQHANDLER definition; just merge them
> > into one).
> 
> My experience with clients who have been using TimeSys' stuff has been
> abysmal. The fact of the matter is that most people who used
> this were practically locked-in to TimeSys' services, unable to download
> anything "standard" off the net and use it with their kernel. In one
> example, we had to ditch the kernel the client got from TimeSys because
> we had spent 10+ hours trying to get LTT to work on it without any
> success whatsoever.
> 
> As I had said on other lists before, I don't see the point of creating
> that much complexity in the kernel in order to try to shave-off a little
> bit more off of the kernel's interrupt response time. The fact of the
> matter is that neither this patch nor most of the other patches suggested
> makes the kernel truly hard-rt. These patches only make the kernel
> respond "faster". If you really need hard-rt, then you should be using
> the Adeos nanokernel. With Adeos, you can even get a hard-rt driver
> without using RTAI or any of the other rt derivatives.
> 

This is obvious FUD from someone who is selling something.  Please keep
this crap off LKML.

Lee


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 18:28   ` Lee Revell
@ 2004-07-28 19:12     ` Karim Yaghmour
  2004-07-28 19:33       ` Lee Revell
  0 siblings, 1 reply; 30+ messages in thread
From: Karim Yaghmour @ 2004-07-28 19:12 UTC (permalink / raw)
  To: Lee Revell
  Cc: Scott Wood, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum, linux-kernel


Lee Revell wrote:
> This is obvious FUD from someone who is selling something.  Please keep
> this crap off LKML.

"selling something" ??? I don't know what you're talking about. There
is an entire history behind Adeos and its endorsement by a slew of open
source organizations and individuals. You probably want to go read the
LKML archives on the topic.

Please keep your uneducated self off the LKML.

Thanks,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 19:12     ` Karim Yaghmour
@ 2004-07-28 19:33       ` Lee Revell
  2004-07-28 19:57         ` Karim Yaghmour
  2004-07-28 20:21         ` Bill Huey
  0 siblings, 2 replies; 30+ messages in thread
From: Lee Revell @ 2004-07-28 19:33 UTC (permalink / raw)
  To: karim
  Cc: Scott Wood, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum, linux-kernel

On Wed, 2004-07-28 at 15:12, Karim Yaghmour wrote:
> Lee Revell wrote:
> > This is obvious FUD from someone who is selling something.  Please keep
> > this crap off LKML.
> 
> "selling something" ??? I don't know what you're talking about. There
> is an entire history behind Adeos and its endorsement by a slew of open
> source organizations and individuals. You probably want to go read the
> LKML archives on the topic.
> 
> Please keep your uneducated self off the LKML.
> 

I am familiar with Adeos, as well as other hard-RT solutions for Linux. 
I did my homework before deciding that I do not in fact need hard-RT, so
I really am not interested in your flamewars, keep them on your RT
mailing lists.

The part that was obvious commercially motivated FUD (and which you
omitted) was the part in which you badmouth TimeSys and its services.
Your .sig states that you are a consultant specializing in realtime and
embedded Linux.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 19:33       ` Lee Revell
@ 2004-07-28 19:57         ` Karim Yaghmour
  2004-07-28 20:35           ` Lee Revell
  2004-07-28 20:21         ` Bill Huey
  1 sibling, 1 reply; 30+ messages in thread
From: Karim Yaghmour @ 2004-07-28 19:57 UTC (permalink / raw)
  To: Lee Revell
  Cc: Scott Wood, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum, linux-kernel

Lee Revell wrote:
> I am familiar with Adeos, as well as other hard-RT solutions for Linux. 
> I did my homework before deciding that I do not in fact need hard-RT, so
> I really am not interested in your flamewars, keep them on your RT
> mailing lists.
> 
> The part that was obvious commercially motivated FUD (and which you
> omitted) was the part in which you badmouth TimeSys and its services.
> Your .sig states that you are a consultant specializing in realtime and
> embedded Linux.

And your point is ... ??? Should everyone on this list who's working for
a company that is involved in a certain speciality field stop commenting
on the solutions proposed by people working for other companies?

This style of attack is called sophism, and it is total crap for the
educated crowd. Attacking someone based on who he works for and what he
does has nothing to do with actually presenting a case against his
arguments.

The personal experiences I've stated before have happened. Lest you
suggest that those of us who have had bad experiences with certain
pieces of technology should shut-up and not voice our complaints for
fear of offending someone out there? Sorry, I don't live in Disneyland.

If you're not interested in hard-rt, that's your business. Others
on this list have expressed interest in the past for this, and you
are welcome to ignore any traffic in this regard. For my part, I
will get involved whenever I see someone suggesting something that
relates to real-time because this is a topic I'm interested in.
What a surprise that I'm actually working in a field I'm interested
in ... sheesh ...

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 19:33       ` Lee Revell
  2004-07-28 19:57         ` Karim Yaghmour
@ 2004-07-28 20:21         ` Bill Huey
  2004-07-28 20:42           ` Lee Revell
                             ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Bill Huey @ 2004-07-28 20:21 UTC (permalink / raw)
  To: Lee Revell
  Cc: karim, Scott Wood, Ingo Molnar, La Monte H.P. Yarroll,
	Manas Saksena, Philippe Gerum, linux-kernel

On Wed, Jul 28, 2004 at 03:33:38PM -0400, Lee Revell wrote:
> I am familiar with Adeos, as well as other hard-RT solutions for Linux. 
> I did my homework before deciding that I do not in fact need hard-RT, so
> I really am not interested in your flamewars, keep them on your RT
> mailing lists.
> 
> The part that was obvious commercially motivated FUD (and which you
> omitted) was the part in which you badmouth TimeSys and its services.
> Your .sig states that you are a consultant specializing in realtime and
> embedded Linux.

With that said, there are really two camps emerging in the real
time Linux field, dual and single kernel. The single kernel work that's
currently being done could very well get Linux to being hard RT, assuming
that you solve all of the technical problems with things like RCU,
etc... in 2.6.

The dual-kernel folks would be in less of a position to VAR their own
stuff and sell proprietary products if Linux were to get native hard RT
performance, if you accept that economic criterion. Who knows what the
actual results will be.

It could be that all of this work with Linux could bury proprietary
OS products (such as LynxOS here) or it could open doors to other
unknown things that were never possible previous to Linux getting some
kind of hard RT capability. It's certainly a scary notion to think about
with many variables to consider. Linux getting hard RT is inevitable.
It's just a question of how it'll be handled by proprietary OS vendors,
witness IBM for a positive example. A negative one would be Sun.

Now that Wind River Systems (the idiot folks that never understood Linux
before laying off tons of folks and disbanding the rather famous BSD/OS
group, which I was a part of, etc...) and Red Hat are in the picture, it's
all starting to cook up.

All things to think about.

bill

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 19:57         ` Karim Yaghmour
@ 2004-07-28 20:35           ` Lee Revell
  2004-07-28 21:15             ` Karim Yaghmour
  0 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-07-28 20:35 UTC (permalink / raw)
  To: karim
  Cc: Scott Wood, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum, linux-kernel

On Wed, 2004-07-28 at 15:57, Karim Yaghmour wrote:
> Lee Revell wrote:
> > I am familiar with Adeos, as well as other hard-RT solutions for Linux. 
> > I did my homework before deciding that I do not in fact need hard-RT, so
> > I really am not interested in your flamewars, keep them on your RT
> > mailing lists.
> > 
> > The part that was obvious commercially motivated FUD (and which you
> > omitted) was the part in which you badmouth TimeSys and its services.
> > Your .sig states that you are a consultant specializing in realtime and
> > embedded Linux.
> 
> And your point is ... ??? Should everyone on this list who's working for
> a company that is involved in a certain speciality field stop commenting
> on the solutions proposed by people working for other companies?
> 

No, of course not.  But please be more specific than 'everything I've
seen from TimeSys is crap, I have talked to some of their clients who
had $FOO problem'.  Your complaint was so general as to not be
refutable.

>  Attacking someone based on who he works for and what he
> does has nothing to do with actually presenting a case against his
> arguments.

I could not agree more.  Your original post had a strong ad hominem
flavor, which was my objection.  Of course you are free to attack
anything at any time, on its technical merits, this is what engineers
do.

I am interested in RT as well, I did not mean to imply that I don't find
it a valid topic for discussion on LKML, but you have to admit that your
post bordered on a troll.

Lee

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 20:21         ` Bill Huey
@ 2004-07-28 20:42           ` Lee Revell
  2004-07-28 20:46             ` Bill Huey
  2004-07-28 21:48           ` Karim Yaghmour
  2004-07-28 22:03           ` Philippe Gerum
  2 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-07-28 20:42 UTC (permalink / raw)
  To: Bill Huey
  Cc: karim, Scott Wood, Ingo Molnar, La Monte H.P. Yarroll,
	Manas Saksena, Philippe Gerum, linux-kernel

On Wed, 2004-07-28 at 16:21, Bill Huey wrote:
> On Wed, Jul 28, 2004 at 03:33:38PM -0400, Lee Revell wrote:
> > I am familiar with Adeos, as well as other hard-RT solutions for Linux. 
> > I did my homework before deciding that I do not in fact need hard-RT, so
> > I really am not interested in your flamewars, keep them on your RT
> > mailing lists.
> > 
> > The part that was obvious commercially motivated FUD (and which you
> > omitted) was the part in which you badmouth TimeSys and its services.
> > Your .sig states that you are a consultant specializing in realtime and
> > embedded Linux.
> 
> With that said, there are really two camps emerging in the real
> time Linux field, dual and single kernel. The single kernel work that's
> currently being done could very well get Linux to being hard RT, assuming
> that you solve all of the technical problems with things like RCU,
> etc... in 2.6.
> 
> The dual-kernel folks would be in less of a position to VAR their own
> stuff and sell proprietary products if Linux were to get native hard RT
> performance, if you accept that economic criterion. Who knows what the
> actual results will be.

As I understand it there will still be a place for the current hard-RT
Linux solutions, because even if I can get five nines latency better
than N, this is not good enough for hard RT, as you need to be able to
mathematically demonstrate that you can *never* miss a deadline.

Or are you saying that the latest developments in the stock kernel make
this possible?
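
To make the distinction concrete, here is a small sketch that separates
a statistical "five nines" check from a no-miss check over a set of
measured latencies. The function names and the microsecond units are
assumptions for illustration only; nothing here comes from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the samples that exceed the deadline. */
static size_t count_misses(const uint32_t *lat_us, size_t n,
			   uint32_t deadline_us)
{
	size_t misses = 0;

	for (size_t i = 0; i < n; i++)
		if (lat_us[i] > deadline_us)
			misses++;
	return misses;
}

/* "Five nines": at most one miss per 100000 samples. */
static bool meets_five_nines(const uint32_t *lat_us, size_t n,
			     uint32_t deadline_us)
{
	return (uint64_t)count_misses(lat_us, n, deadline_us) * 100000u
		<= (uint64_t)n;
}

/* No miss in the measured data. This is necessary for hard RT but
 * not sufficient: a measurement cannot prove the worst case. */
static bool no_observed_miss(const uint32_t *lat_us, size_t n,
			     uint32_t deadline_us)
{
	return count_misses(lat_us, n, deadline_us) == 0;
}
```

A run with a single outlier in 100000 samples passes the first check and
fails the second, and even a clean run only shows that no miss was
observed, which is weaker than demonstrating that one can never occur.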

Lee


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 20:42           ` Lee Revell
@ 2004-07-28 20:46             ` Bill Huey
  0 siblings, 0 replies; 30+ messages in thread
From: Bill Huey @ 2004-07-28 20:46 UTC (permalink / raw)
  To: Lee Revell
  Cc: Bill Huey, karim, Scott Wood, Ingo Molnar, La Monte H.P. Yarroll,
	Manas Saksena, Philippe Gerum, linux-kernel

On Wed, Jul 28, 2004 at 04:42:51PM -0400, Lee Revell wrote:
> As I understand it there will still be a place for the current hard-RT
> Linux solutions, because even if I can get five nines latency better
> than N, this is not good enough for hard RT, as you need to be able to
> mathematically demonstrate that you can *never* miss a deadline.
> 
> Or are you saying that the latest developments in the stock kernel make
> this possible?

Not quite with this thread and in this stage of development, but this is
quite possible if certain concurrency problems are solved in Linux. RCU is
a potential pain as well as other things. BSD/OS-FreeBSD-current are not
new to this kind of conversion, so this is certainly very possible with
very finite software engineering problems to solve. 

Scary ain't it ? It makes me wonder some times.

bill

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 20:35           ` Lee Revell
@ 2004-07-28 21:15             ` Karim Yaghmour
  2004-07-28 21:43               ` Lee Revell
  0 siblings, 1 reply; 30+ messages in thread
From: Karim Yaghmour @ 2004-07-28 21:15 UTC (permalink / raw)
  To: Lee Revell
  Cc: Scott Wood, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum, linux-kernel

Lee Revell wrote:
> No, of course not.  But please be more specific than 'everything I've
> seen from TimeSys is crap, I have talked to some of their clients who
> had $FOO problem'.  Your complaint was so general as to not be
> refutable.

I didn't say their stuff was crap. I said that "__My__ experience with
clients ...". IOW, I'm talking about cases where I was called on by
customers of mine who were using "stuff" (i.e. threaded int handlers
and mutexed-locks), and in practice I have found that the only entity
who could service this was the one that provided my clients with said
kernel. And from my point of view, this is indeed "abysmal" because
the whole point of using Linux is to be able to service your car at
any retailer.

You are correct in stating that the lack of details made the argument
hard to defend against, but I had a very nasty choice in writing this:
a) Either give enough detail, in which case my clients' confidentiality
would be in question.
b) Or shut up and let this go unanswered on the LKML.

> I could not agree more.  Your original post had a strong ad hominem
> flavor, which was my objection.  Of course you are free to attack
> anything at any time, on its technical merits, this is what engineers
> do.
> 
> I am interested in RT as well, I did not mean to imply that I don't find
> it a valid topic for discussion on LKML, but you have to admit that your
> post bordered on a troll.

I can see that it could be interpreted as such. But keep in mind that I
was replying to Scott's _public_ posting of TimeSys' patch. IOW, there's
no way that I could claim that there would be lock-in in the future if
the patch did indeed make it in the kernel. The past tense was used on
purpose, and TimeSys have been very up-front about their intent of
getting this into the mainline.

My real argument was best summarized in the second paragraph, and what
I'm saying is that these approaches make the kernel's dynamic behavior
extremely complicated. And while they do contribute to making the
kernel's response time faster, they do not provide hard-rt, which is
what everyone is trying to get in the end anyway (either intentionally
or unintentionally.)

With that, let me respond to Bill's discussion on single vs. N kernels
as that thread is the most likely to be fruitful. I hope you'll agree.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28  6:27 ` Ingo Molnar
  2004-07-28 15:38   ` Karim Yaghmour
@ 2004-07-28 21:23   ` Bill Huey
  2004-07-28 21:35     ` Scott Wood
  2004-07-28 23:24   ` Scott Wood
  2 siblings, 1 reply; 30+ messages in thread
From: Bill Huey @ 2004-07-28 21:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Scott Wood, linux-kernel, La Monte H.P. Yarroll, Manas Saksena,
	Bill Huey

On Wed, Jul 28, 2004 at 08:27:22AM +0200, Ingo Molnar wrote:
> this is an incorrect change, just grep for in_interrupt() in
> linux/drivers/ ...
> 
> I agree with the concept of using multiple threads for interrupts - i'll
> add that to the voluntary-preempt patch too. This is an essential
> feature to prioritize interrupts.

The way I picture the problem, that permits those threads to migrate across
CPUs and therefore kills interrupt performance through cache thrashing. Do
you have a solution for that? I like the way you're doing it now with
irqd() in that it's CPU-local, but as you know it's not priority sensitive.

> what do you think about making the i8259A's interrupt priorities
> configurable? (a'la rtirq patch) Does it make any sense, given how early
> we mask the source irq and ack the interrupt controller [hence giving
> all other interrupts a fair chance to arrive ASAP]?
> 
> Bernhard Kuhn's rtirq patch is for IO-APIC/APICs, but i think the
> latency issues could be equally well fixed by not keeping the local APIC
> un-ACK-ed during level triggered irqs, but doing the mask & ack thing.
> This will be slightly slower but should make them both redirectable and
> more symmetric and fair.

I think it's pretty complementary to what's going on, but it's also driven
by application demand and by how it could possibly hook into the scheduler
or something else like that. There isn't a clear precedent for how to use
something like this. LynxOS doesn't have priorities above interrupt
priorities that mask interrupts somehow, but there are folks that need
that kind of stuff, and it could certainly be added in a relatively
trivial manner.

My answer would be something like yes it's needed, but not now.

bill

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 21:23   ` Bill Huey
@ 2004-07-28 21:35     ` Scott Wood
  2004-07-29 21:08       ` Bill Huey
  2004-07-29 22:44       ` Ingo Molnar
  0 siblings, 2 replies; 30+ messages in thread
From: Scott Wood @ 2004-07-28 21:35 UTC (permalink / raw)
  To: Bill Huey
  Cc: Ingo Molnar, Scott Wood, linux-kernel, La Monte H.P. Yarroll,
	Manas Saksena

On Wed, Jul 28, 2004 at 02:23:14PM -0700, Bill Huey wrote:
> The way I picture the problem, that permits those threads to migrate across
> CPUs and therefore kills interrupt performance through cache thrashing. Do
> you have a solution for that? I like the way you're doing it now with
> irqd() in that it's CPU-local, but as you know it's not priority sensitive.

Wouldn't the IRQ threads be subject to the same heuristics that the
scheduler uses with ordinary threads, in order to avoid unnecessary
CPU migration?  Plus, IRQs ordinarily get distributed across CPUs,
and in most cases shouldn't have a very large cache footprint
(especially data; the code can be in multiple CPU caches at once), so
I don't think this is a substantial degradation from the way things
already are.

If desired by the user, an IRQ thread could be bound to a specific
CPU to avoid such problems (in which case, they'd probably want to
set the smp_affinity of the hard IRQ stub to the same CPU).
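
As a userspace analogue of that binding, the sketch below pins the
calling thread to one CPU with sched_setaffinity() and formats the
matching single-CPU hex mask of the kind written to
/proc/irq/<n>/smp_affinity. The helper names are hypothetical, the
formatter is limited to CPUs 0-31 for simplicity, and how the patch's
kernel threads would actually be bound is not shown here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by the call). */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* Format the hex bitmask that selects only `cpu`, suitable for
 * writing to an smp_affinity file. Returns 0 on success, -1 if the
 * CPU is out of range or the buffer is too small. */
static int format_smp_affinity(int cpu, char *buf, size_t len)
{
	int n;

	if (cpu < 0 || cpu >= 32)
		return -1;
	n = snprintf(buf, len, "%x", 1u << cpu);
	return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

Keeping the thread and the hard IRQ stub on the same CPU is the
combination suggested above; whether it helps in practice depends on the
handler's cache footprint and on how the interrupts are distributed.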

-Scott

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 21:43               ` Lee Revell
@ 2004-07-28 21:38                 ` Karim Yaghmour
  0 siblings, 0 replies; 30+ messages in thread
From: Karim Yaghmour @ 2004-07-28 21:38 UTC (permalink / raw)
  To: Lee Revell
  Cc: Scott Wood, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum, linux-kernel


Lee Revell wrote:
> Yes, agreed.  I am glad this did not escalate, and I hope you can
> understand how I would have overlooked your actual argument due to my
> perceiving the first paragraph as vaguely ad hominem. 

Yes I do understand, and thanks for taking the time to sort this out :)

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] IRQ threads
  2004-07-28 21:15             ` Karim Yaghmour
@ 2004-07-28 21:43               ` Lee Revell
  2004-07-28 21:38                 ` Karim Yaghmour
  0 siblings, 1 reply; 30+ messages in thread
From: Lee Revell @ 2004-07-28 21:43 UTC (permalink / raw)
  To: karim
  Cc: Scott Wood, Ingo Molnar, La Monte H.P. Yarroll, Manas Saksena,
	Philippe Gerum, linux-kernel

On Wed, 2004-07-28 at 17:15, Karim Yaghmour wrote:
> Lee Revell wrote:
> My real argument was best summarized in the second paragraph, and what
> I'm saying is that these approaches make the kernel's dynamic behavior
> extremely complicated. And while they do contribute to making the
> kernel's response time faster, they do not provide hard-rt, which is
> what everyone is trying to get in the end anyway (either intentionally
> or unintentionally.)
> 
> With that, let me respond to Bill's discussion on single vs. N kernels
> as that thread is the most likely to be fruitful. I hope you'll agree.
> 

Yes, agreed.  I am glad this did not escalate, and I hope you can
understand how I would have overlooked your actual argument due to my
perceiving the first paragraph as vaguely ad hominem. 

Lee



* Re: [patch] IRQ threads
  2004-07-28 20:21         ` Bill Huey
  2004-07-28 20:42           ` Lee Revell
@ 2004-07-28 21:48           ` Karim Yaghmour
  2004-07-28 22:30             ` Bill Huey
  2004-07-28 22:03           ` Philippe Gerum
  2 siblings, 1 reply; 30+ messages in thread
From: Karim Yaghmour @ 2004-07-28 21:48 UTC (permalink / raw)
  To: Bill Huey (hui)
  Cc: Lee Revell, Scott Wood, Ingo Molnar, La Monte H.P. Yarroll,
	Manas Saksena, Philippe Gerum, linux-kernel

Bill Huey (hui) wrote:
> With that said, there are really two camps emerging in the real
> time Linux field, dual and single kernel. The single kernel work that's
> currently being done could very well get Linux to being hard RT, assuming
> that you solve all of the technical problems with things like RCU,
> etc... in 2.6.
> 
> The dual kernel folks would be in less of a position to VAR their own
> stuff and sell proprietary products if Linux were to get native hard RT
> performance if you accept that economic criteria. Who knows what the
> actual results will be.

Two things:
- What I'm suggesting is a nanokernel-based N kernel scheme, not the
dual kernel scheme (which is patented BTW).
- There's only one company out there that is known to sell proprietary
products around the dual kernel approach, and it isn't mine.

> It could be that all of this work with Linux could bury proprietary
> OS products (such as LynxOS here) or it could open doors to other
> unknown things that were never possible previous to Linux getting some
> kind of hard RT capability. It's certainly a scary notion to think about
> with many variables to consider. Linux getting hard RT is inevitable.
> It's just a question of how it'll be handled by proprietary OS vendors,
> witness IBM for a positive example. A negative one would be Sun.
> 
> Now that Windriver System (the idiot folks that never understood Linux
> before laying off tons of folks and disbanded the rather famous BSD/OS
> group which I was a part of, etc...) and Red Hat are in the picture, it's
> all starting to cook up.

Indeed. So the question now becomes: is it worth introducing that much
complexity inside the kernel to solve a problem that matters only
marginally to server and workstation users? It's already difficult as it
is to manage the preemption, do we really want to go the full way with
threaded int handlers and mutexes instead of locks?

What I'm suggesting is a very simple model that solves 90% of the time-
sensitive problems I have seen with Linux: Use the Adeos nanokernel to
implement the real-time component of drivers as a priority domain in the
interrupt pipeline. Hence, instead of the current driver model:
1- Int handler
2- BH/SoftIRQ/etc.
We get:
1- Hard-rt handler
2- Normal Linux handler
3- BH/SoftIRQ/etc.

This is even without any RTAI/RTL or anything of that kind.
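
To make the three-stage model concrete, here is a toy user-space
simulation of a prioritized interrupt pipeline.  It is a sketch of the
general idea only and does not use the real Adeos API: every name in it
(struct stage, pipeline_dispatch(), pipeline_unstall(), the example
handlers) is invented for illustration.  The assumption is that a
"stalled" stage stands for Linux running with interrupts disabled, so the
event is logged for it and replayed later, while any higher-priority
stage has already run.

```c
#define MAX_STAGES 4
#define MAX_IRQS   32

typedef void (*stage_handler_t)(unsigned int irq);

struct stage {
	stage_handler_t handler;
	int stalled;		/* stage currently has "interrupts off" */
	unsigned long pending;	/* IRQs logged while the stage was stalled */
};

struct pipeline {
	struct stage stages[MAX_STAGES];	/* index 0 = highest priority */
	int nr_stages;
};

/* Run irq through stages first..end, stopping at the first stalled one. */
static int deliver_from(struct pipeline *p, int first, unsigned int irq)
{
	int i, ran = 0;

	for (i = first; i < p->nr_stages; i++) {
		struct stage *s = &p->stages[i];

		if (s->stalled) {
			s->pending |= 1UL << irq;	/* replay later */
			break;
		}
		s->handler(irq);
		ran++;
	}
	return ran;
}

/* Entry point for a new hardware event; returns how many stages ran. */
int pipeline_dispatch(struct pipeline *p, unsigned int irq)
{
	if (irq >= MAX_IRQS)
		return -1;
	return deliver_from(p, 0, irq);
}

/* Unstall stage idx and replay what it missed, continuing downstream. */
int pipeline_unstall(struct pipeline *p, int idx)
{
	unsigned long pending;
	unsigned int irq;
	int ran = 0;

	if (idx < 0 || idx >= p->nr_stages)
		return -1;

	pending = p->stages[idx].pending;
	p->stages[idx].pending = 0;
	p->stages[idx].stalled = 0;

	for (irq = 0; irq < MAX_IRQS; irq++)
		if (pending & (1UL << irq))
			ran += deliver_from(p, idx, irq);
	return ran;
}

/* Example stages: a hard-rt handler and the normal Linux handler. */
int rt_hits, linux_hits;
void rt_stage(unsigned int irq)    { (void)irq; rt_hits++; }
void linux_stage(unsigned int irq) { (void)irq; linux_hits++; }
```

With rt_stage at index 0 and a stalled linux_stage at index 1, a
dispatch runs only the hard-rt handler; the Linux handler runs once
pipeline_unstall() is called for its stage.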

And for the rest, then you want to have a look at Philippe Gerum's
ongoing RTAI-fusion work. With RTAI-fusion, you get the same normal
Linux ABI for all user-space tasks, but you get hard-rt for free.
Practically, normal Linux calls are caught and run through RTAI
instead of Linux. Philippe has already got the standard nanosleep()
running in hard-rt. So the question is therefore straight-forward:
should we engineer a path for converting the entire kernel to hard-rt
or do we keep the kernel as it was designed and add the necessary
mechanisms for obtaining hard-rt using things that were made to
provide it by design?

I personally believe that the added complexity does not benefit
Linux. I may yet be shown wrong, but with what we can currently do
with plain vanilla Adeos, and where RTAI/fusion is heading, the
problem space of applications which can't be serviced by this
combination is getting increasingly limited.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


* Re: [patch] IRQ threads
  2004-07-28 20:21         ` Bill Huey
  2004-07-28 20:42           ` Lee Revell
  2004-07-28 21:48           ` Karim Yaghmour
@ 2004-07-28 22:03           ` Philippe Gerum
  2 siblings, 0 replies; 30+ messages in thread
From: Philippe Gerum @ 2004-07-28 22:03 UTC (permalink / raw)
  To: Bill Huey
  Cc: Lee Revell, Karim Yaghmour, Scott Wood, Ingo Molnar,
	La Monte H.P. Yarroll, Manas Saksena, linux-kernel

On Wed, 2004-07-28 at 22:21, Bill Huey wrote:

> With that said, there are really two camps emerging in the real
> time Linux field, dual and single kernel. The single kernel work that's
> currently being done could very well get Linux to being hard RT, assuming
> that you solve all of the technical problems with things like RCU,
> etc... in 2.6.
> 
> The dual kernel folks would be in less of a position to VAR their own
> stuff and sell proprietary products if Linux were to get native hard RT
> performance if you accept that economic criteria. Who knows what the
> actual results will be.

<snip>

> Now that Windriver System (the idiot folks that never understood Linux
> before laying off tons of folks and disbanded the rather famous BSD/OS
> group which I was a part of, etc...) and Red Hat are in the picture, it's
> all starting to cook up.
> 

With the ego power switch and name-calling amplifier lowered to the
minimum, maybe there could be a third approach, like cooperation between
both through a better integration. At least, the signal / noise ratio
would improve...

The hard RT people I know of and work with want to be able 1) to get
microsecond level bounded interrupt latency with no exception to this
rule, and 2) to be able to choose the right level of dispatch latency
on a thread-by-thread basis, from a few microseconds to a few hundreds
of, but in any case _bounded_ and predictable in the worst case. For
this to happen, they are willing to accept stringent limitations
functionality-wise if need be to obtain the first, but still get access
to the regular Linux programming model and APIs if the second fits
their apps.  They already know how they could mix both properly in
what would look like a single system from the application pov.

For these people, the ongoing work aimed at improving the
current determinism of the vanilla kernel is anything but a danger:
it's fundamental and very good news, because it could make 2) a
reality sooner or later. However, point 1) remains an issue, and
unless you find a solution for mixing fire and water, i.e. determinism
which requires unfairness by design and throughput seeking fairness on
the average case, you would likely end up considering that the Linux
RT people's radical approach of using a dual-kernel does not make them
uneducated bozos (Ok, except me perhaps, but this is not my point).
To get microsecond level guaranteed interrupt latencies, the problem
is far beyond solving random latency spots here and there: it's an
architectural issue.

To achieve this, we (i.e. the educated ones like Karim helping the
uneducated bozo like myself; yep, this is a teamwork) have come to the
conclusion that we needed a portable infrastructure that allows a
complete prioritization of interrupts, and hw events of interest in
general (e.g. traps/exceptions). Some infrastructure that exposes the
same interface regardless of the platform it runs on. It's called
Adeos, the source code indentation is terrible (after 20 years
practicing it, I still find means to worsen my coding style, funky
eh?!) but it's a working
example of such kind of infrastructure. The advantage of such kind of
thin layer is that you can plug any hard real-time core over it. This
layer can remain silent when unused, it can be configured out, it is
just an enabler.  You don't have to put the hell of a havoc in a
stable GPOS core to modify some key architectural characteristics of
the Linux kernel in order to buy hard RT capabilities to everyone,
which could be construed as smashing a squadron of flies with nukes.

> All things to think about.

There is evidence that the GPL side of the hard RT universe does too.
Indeed.

-- 

Philippe.


* Re: [patch] IRQ threads
  2004-07-28 21:48           ` Karim Yaghmour
@ 2004-07-28 22:30             ` Bill Huey
  0 siblings, 0 replies; 30+ messages in thread
From: Bill Huey @ 2004-07-28 22:30 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: Bill Huey (hui), Lee Revell, Scott Wood, Ingo Molnar,
	La Monte H.P. Yarroll, Manas Saksena, Philippe Gerum,
	linux-kernel

On Wed, Jul 28, 2004 at 05:48:36PM -0400, Karim Yaghmour wrote:
> Two things:
> - What I'm suggesting is a nanokernel-based N kernel scheme, not the
> dual kernel scheme (which is patented BTW).
> - There's only one company out there that is known to sell proprietary
> products around the dual kernel approach, and it isn't mine.

I have to say that I didn't really carefully read the papers for
your stuff and I still don't exactly understand what's going on
with your nanokernel. It sounds interesting, but it doesn't solve
what I think are inescapable preemption problems with the Linux
kernel.

> Indeed. So the question now becomes: is it worth introducing that much
> complexity inside the kernel to solve a problem that matters only
> marginally to server and workstation users? It's already difficult as it
> is to manage the preemption, do we really want to go the full way with
> threaded int handlers and mutexes instead of locks?

Complexity is kind of a relative term in this conversation. It's
more of a function of what one uses here versus what other folks
have been doing outside of Linux. I personally don't think it's THAT
hard. It's just that folks inside Linux have really gotten comfortable
using spinlocks for everything instead of another style of thinking.
It's all learnable after you get over the shock and not as hard as
folks think it would be.

> What I'm suggesting is a very simple model that solves 90% of the time-
> sensitive problems I have seen with Linux: Use the Adeos nanokernel to
> implement the real-time component of drivers as a priority domain in the
> interrupt pipeline. Hence, instead of the current driver model:
> 1- Int handler
> 2- BH/SoftIRQ/etc.
> We get:
> 1- Hard-rt handler
> 2- Normal Linux handler
> 3- BH/SoftIRQ/etc.

I don't immediately see why this is going to give us hard RT for Linux
applications. I'd like a short explanation if you have time to reply
to this.

> This is even without any RTAI/RTL or anything of that kind.
> 
> And for the rest, then you want to have a look at Philippe Gerum's
> ongoing RTAI-fusion work. With RTAI-fusion, you get the same normal
> Linux ABI for all user-space tasks, but you get hard-rt for free.
> Practically, normal Linux calls are caught and run through RTAI
> instead of Linux. Philippe has already got the standard nanosleep()
> running in hard-rt. So the question is therefore straight-forward:
> should we engineer a path for converting the entire kernel to hard-rt
> or do we keep the kernel as it was designed and add the necessary
> mechanisms for obtaining hard-rt using things that were made to
> provide it by design?

I don't really understand what RTAI fusion is and I would like
some pointers to this. I've read the web page for this, but I wasn't
able to extract a good chunk of information that would give me a
clear picture of what it does.

> I personally believe that the added complexity does not benefit
> Linux. I may yet be shown wrong, but with what we can currently do
> with plain vanilla Adeos, and where RTAI/fusion is heading, the
> problem space of applications which can't be serviced by this
> combination is getting increasingly limited.

Well, you have to think of it in a wider scope for a bit. If you
want rate-guaranteed streaming partition support from SGI's XFS,
touching dcache, VM, etc... then you have to deal with
non-preemptable locking throughout the Linux kernel, no choice.
Having a pervasive model is a superior solution to dual kernels
since you have the full support of Linux versus the constrained
semantics and APIs of a controlled sub-domain. That typically needs
to be temporarily buffered (and API) both in and out of a
Linux sub-system. I fail to see how a nano-kernel, RTAI and other
supporting kernel sub-systems can get around this problem without
solving preemption issues within Linux.

When you think of things like this you have to think of what's
happening now with applications that include the traditional RT
stuff as well as media driven apps, which are becoming more and more
prevalent as little things like cell phone devices get more and
more bandwidth. For media driven applications, none of the
secondary/dual kernel or things like that can properly deal with
things like IO thread/channel bonding, which depends on various
IO layers that touch many SMP locked critical sections/systems
in the kernel. Running into Linux's SMP locking logic in these
cases is inescapable, so I don't see a choice but doing some
kind of full lock conversion.

Any application that uses a full suite of Linux
facilities will have this problem. It just hasn't been done for
a 2.6 kernel just yet.

bill


* Re: [patch] IRQ threads
  2004-07-28  8:10 ` Ingo Molnar
@ 2004-07-28 23:12   ` Scott Wood
  2004-07-29 19:33     ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Scott Wood @ 2004-07-28 23:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Scott Wood, linux-kernel, La Monte H.P. Yarroll, Manas Saksena

On Wed, Jul 28, 2004 at 10:10:05AM +0200, Ingo Molnar wrote:
> i'm wondering about a couple of details. Why were the changes to
> note_interrupt() necessary?

The intent was to defer the calling of note_interrupt() until after
all handlers (threaded and non-threaded) had run.  However, it's
probably not strictly necessary, as note_interrupt() only acts when the
vast majority of IRQs go unhandled (whereas threads will only give you
50%).
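
The arithmetic behind that can be sketched in user space.  The model
below mirrors the general shape of the stuck-IRQ detector (count
unhandled interrupts over a fixed window, disable the line if nearly all
were unhandled), but the window size and threshold are written from
memory of 2.6-era source and should be treated as assumptions, as should
the names irq_stats, note_interrupt_sim() and run_window().

```c
#define WINDOW          100000u	/* interrupts per evaluation window (assumed) */
#define STUCK_THRESHOLD  99900u	/* unhandled count that trips it (assumed) */

struct irq_stats {
	unsigned int irq_count;
	unsigned int irqs_unhandled;
	int disabled;
};

/* Account for one interrupt; handled is nonzero if a handler claimed it. */
void note_interrupt_sim(struct irq_stats *s, int handled)
{
	if (!handled)
		s->irqs_unhandled++;

	if (++s->irq_count < WINDOW)
		return;

	/* End of window: decide, then start counting afresh. */
	s->irq_count = 0;
	if (s->irqs_unhandled > STUCK_THRESHOLD)
		s->disabled = 1;
	s->irqs_unhandled = 0;
}

/*
 * Feed one full window in which every period-th interrupt is reported
 * as unhandled (period 1 = all unhandled, period 2 = every other one).
 * Returns the resulting disabled flag, or -1 for an invalid period.
 */
int run_window(struct irq_stats *s, unsigned int period)
{
	unsigned int i;

	if (period == 0)
		return -1;
	for (i = 0; i < WINDOW; i++)
		note_interrupt_sim(s, (i % period) != 0);
	return s->disabled;
}
```

Under these assumed numbers, a line where every interrupt is unhandled
gets disabled at the end of the window, while one where only every other
call looks unhandled (one reading of the 50% case) stays far below the
threshold.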

It might be nice to add a note_interrupt() to the no-action case as
well, in case an interrupt with no handlers at all fails to be masked
properly.

Oh, and the calling of report_bad_irq() seems to have disappeared
from run_irq_thread in that patch; there should be a:

if (!noirqdebug) {
	if (retval == IRQ_HANDLED)
		desc->status &= ~IRQ_UNHANDLED;
	else
		report_bad_irq(irq, desc, retval);
}

before the "desc->status &= ~IRQ_THREADRUNNING;" line in
kernel/irq.c.

> Also, why the enable_irq() change? 

If you mean the do_startup_irq() change, it was mainly to manage the
IRQ_DELAYEDSTARTUP flag, which prevents an IRQ from being unmasked
before the thread has been created (or more accurately, reminds
run_irq_thread to call do_startup_irq()).

> What do you think about the simpler approach in my patch which
> keeps the irq masked until the thread runs?

That way works as well (the desc would just have IRQ_THREADPENDING
marked until the thread runs for the first time, if an IRQ does
happen before the thread starts).
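
As a rough illustration of that flow, here is a user-space model of the
handoff between the hard IRQ stub and the thread.  It is a simplification
for discussion, not the patch code: the flag values, struct irq_line and
both function names are hypothetical, and the wakeups counter merely
stands in for wake_up_process().

```c
#define ST_THREADPENDING 0x1u	/* hypothetical flag values */
#define ST_THREADRUNNING 0x2u

struct irq_line {
	unsigned int status;
	int masked;		/* 1 while the line is masked at the controller */
	int has_thread;		/* 0 until the IRQ thread has been created */
	int wakeups;		/* stand-in for wake_up_process() calls */
	int handled;		/* times the threaded handler has run */
};

/* Hard IRQ path: mask the line, mark it pending, wake the thread if any. */
void hard_irq_stub(struct irq_line *l)
{
	l->masked = 1;				/* mask_and_ack() */
	l->status |= ST_THREADPENDING;
	if (l->has_thread)
		l->wakeups++;
	/* end() is skipped while pending/running, so the line stays masked. */
}

/* One pass of the IRQ thread; returns 1 if it handled an interrupt. */
int irq_thread_run(struct irq_line *l)
{
	if (!(l->status & ST_THREADPENDING))
		return 0;

	l->status &= ~ST_THREADPENDING;
	l->status |= ST_THREADRUNNING;
	l->handled++;				/* the driver's handler runs here */
	l->status &= ~ST_THREADRUNNING;

	if (!(l->status & ST_THREADPENDING))
		l->masked = 0;			/* end(): unmask the line */
	return 1;
}
```

If hard_irq_stub() fires before has_thread is set, nothing is woken but
the pending flag remains, so the first irq_thread_run() after the thread
exists picks the interrupt up and unmasks the line.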

I've attached a new version of the patch that eliminates the
note_interrupt() and startup/shutdown changes, adds the missing
note_interrupt to run_irq_thread(), and makes in_interrupt respond
positively to IRQ threads (it didn't do so in 2.4 because a lot of
places used in_interrupt as in_atomic, but that shouldn't be a
problem now, at least in core code).

Signed-off-by: Scott Wood <scott.wood@timesys.com> under TS0058.

diff -urN linux-2.6.8-rc2/arch/i386/kernel/i386_ksyms.c linux-2.6.8-rc2-irq-threads/arch/i386/kernel/i386_ksyms.c
--- linux-2.6.8-rc2/arch/i386/kernel/i386_ksyms.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/kernel/i386_ksyms.c	2004-07-27 17:08:37.000000000 -0400
@@ -146,7 +146,6 @@
 EXPORT_SYMBOL_NOVERS(__read_lock_failed);
 
 /* Global SMP stuff */
-EXPORT_SYMBOL(synchronize_irq);
 EXPORT_SYMBOL(smp_call_function);
 
 /* TLB flushing */
@@ -154,6 +153,10 @@
 EXPORT_SYMBOL_GPL(flush_tlb_all);
 #endif
 
+#if defined(CONFIG_SMP) || defined(CONFIG_IRQ_THREADS) 
+EXPORT_SYMBOL(synchronize_irq);
+#endif
+
 #ifdef CONFIG_X86_IO_APIC
 EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
 #endif
diff -urN linux-2.6.8-rc2/arch/i386/kernel/i8259.c linux-2.6.8-rc2-irq-threads/arch/i386/kernel/i8259.c
--- linux-2.6.8-rc2/arch/i386/kernel/i8259.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/kernel/i8259.c	2004-07-27 17:09:57.000000000 -0400
@@ -332,7 +332,8 @@
  * New motherboards sometimes make IRQ 13 be a PCI interrupt,
  * so allow interrupt sharing.
  */
-static struct irqaction fpu_irq = { math_error_irq, 0, CPU_MASK_NONE, "fpu", NULL, NULL };
+static struct irqaction fpu_irq = 
+	{ math_error_irq, SA_NOTHREAD, CPU_MASK_NONE, "fpu", NULL, NULL };
 
 void __init init_ISA_irqs (void)
 {
diff -urN linux-2.6.8-rc2/arch/i386/kernel/irq.c linux-2.6.8-rc2-irq-threads/arch/i386/kernel/irq.c
--- linux-2.6.8-rc2/arch/i386/kernel/irq.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/kernel/irq.c	2004-07-28 18:35:26.000000000 -0400
@@ -200,8 +200,9 @@
 
 
 
-
-#ifdef CONFIG_SMP
+/* When IRQ threads are enabled, this has to synchronize with the thread.
+   The function to do this is provided in generic code. */
+#if defined(CONFIG_SMP) && !defined(CONFIG_IRQ_THREADS)
 inline void synchronize_irq(unsigned int irq)
 {
 	while (irq_desc[irq].status & IRQ_INPROGRESS)
@@ -226,10 +227,16 @@
 		local_irq_enable();
 
 	do {
-		status |= action->flags;
-		retval |= action->handler(irq, action->dev_id, regs);
+#ifdef CONFIG_IRQ_THREADS
+		if (action->flags & SA_NOTHREAD)
+#endif
+		{
+			status |= action->flags;
+			retval |= action->handler(irq, action->dev_id, regs);
+		}
 		action = action->next;
 	} while (action);
+
 	if (status & SA_SAMPLE_RANDOM)
 		add_interrupt_randomness(irq);
 	local_irq_disable();
@@ -268,7 +275,7 @@
 	}
 }
 
-static int noirqdebug;
+int noirqdebug;
 
 static int __init noirqdebug_setup(char *str)
 {
@@ -289,7 +296,7 @@
  *
  * Called under desc->lock
  */
-static void note_interrupt(int irq, irq_desc_t *desc, irqreturn_t action_ret)
+void note_interrupt(int irq, irq_desc_t *desc, irqreturn_t action_ret)
 {
 	if (action_ret != IRQ_HANDLED) {
 		desc->irqs_unhandled++;
@@ -395,7 +402,14 @@
 			desc->status = status | IRQ_REPLAY;
 			hw_resend_irq(desc->handler,irq);
 		}
-		desc->handler->enable(irq);
+		
+		/* Don't unmask the IRQ if it's in progress, or else you
+		   could re-enter the IRQ handler.  As it is now enabled,
+		   the IRQ will be unmasked when the handler is finished. */
+		
+		if (!(desc->status & (IRQ_INPROGRESS | IRQ_THREADRUNNING |
+		                      IRQ_THREADPENDING)))
+			desc->handler->enable(irq);
 		/* fall-through */
 	}
 	default:
@@ -408,12 +422,7 @@
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 
-/*
- * do_IRQ handles all normal device IRQ's (the special
- * SMP cross-CPU interrupts have their own specific
- * handlers).
- */
-asmlinkage unsigned int do_IRQ(struct pt_regs regs)
+static void really_do_IRQ(struct pt_regs *regs)
 {	
 	/* 
 	 * We ack quickly, we don't want the irq controller
@@ -425,27 +434,11 @@
 	 * 0 return value means that this irq is already being
 	 * handled by some other CPU. (or is disabled)
 	 */
-	int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code  */
+	int irq = regs->orig_eax & 0xff; /* high bits used in ret_from_ code  */
 	irq_desc_t *desc = irq_desc + irq;
 	struct irqaction * action;
 	unsigned int status;
 
-	irq_enter();
-
-#ifdef CONFIG_DEBUG_STACKOVERFLOW
-	/* Debugging check for stack overflow: is there less than 1KB free? */
-	{
-		long esp;
-
-		__asm__ __volatile__("andl %%esp,%0" :
-					"=r" (esp) : "0" (THREAD_SIZE - 1));
-		if (unlikely(esp < (sizeof(struct thread_info) + STACK_WARN))) {
-			printk("do_IRQ: stack overflow: %ld\n",
-				esp - sizeof(struct thread_info));
-			dump_stack();
-		}
-	}
-#endif
 	kstat_this_cpu.irqs[irq]++;
 	spin_lock(&desc->lock);
 	desc->handler->ack(irq);
@@ -461,7 +454,8 @@
 	 * use the action we have.
 	 */
 	action = NULL;
-	if (likely(!(status & (IRQ_DISABLED | IRQ_INPROGRESS)))) {
+	if (likely(!(status & (IRQ_DISABLED | IRQ_INPROGRESS |
+	                       IRQ_THREADPENDING | IRQ_THREADRUNNING)))) {
 		action = desc->action;
 		status &= ~IRQ_PENDING; /* we commit to handling */
 		status |= IRQ_INPROGRESS; /* we are handling it */
@@ -487,89 +481,111 @@
 	 * useful for irq hardware that does not mask cleanly in an
 	 * SMP environment.
 	 */
-#ifdef CONFIG_4KSTACKS
-
 	for (;;) {
 		irqreturn_t action_ret;
-		u32 *isp;
-		union irq_ctx * curctx;
-		union irq_ctx * irqctx;
-
-		curctx = (union irq_ctx *) current_thread_info();
-		irqctx = hardirq_ctx[smp_processor_id()];
-
-		spin_unlock(&desc->lock);
-
-		/*
-		 * this is where we switch to the IRQ stack. However, if we are already using
-		 * the IRQ stack (because we interrupted a hardirq handler) we can't do that
-		 * and just have to keep using the current stack (which is the irq stack already
-		 * after all)
-		 */
-
-		if (curctx == irqctx)
-			action_ret = handle_IRQ_event(irq, &regs, action);
-		else {
-			/* build the stack frame on the IRQ stack */
-			isp = (u32*) ((char*)irqctx + sizeof(*irqctx));
-			irqctx->tinfo.task = curctx->tinfo.task;
-			irqctx->tinfo.previous_esp = current_stack_pointer();
-
-			*--isp = (u32) action;
-			*--isp = (u32) &regs;
-			*--isp = (u32) irq;
-
-			asm volatile(
-				"       xchgl   %%ebx,%%esp     \n"
-				"       call    handle_IRQ_event \n"
-				"       xchgl   %%ebx,%%esp     \n"
-				: "=a"(action_ret)
-				: "b"(isp)
-				: "memory", "cc", "edx", "ecx"
-			);
 
+#ifdef CONFIG_IRQ_THREADS
+		if (desc->status & IRQ_THREAD) {
+			desc->status |= IRQ_THREADPENDING;
+			
+			if (desc->thread)
+				wake_up_process(desc->thread);
+		}
+		
+		if (desc->status & IRQ_NOTHREAD)
+#endif
+		{
+			spin_unlock(&desc->lock);
+			action_ret = handle_IRQ_event(irq, regs, action);
+			spin_lock(&desc->lock);
 
+			if (!noirqdebug)
+				note_interrupt(irq, desc, action_ret);
 		}
-		spin_lock(&desc->lock);
-		if (!noirqdebug)
-			note_interrupt(irq, desc, action_ret);
-		if (curctx != irqctx)
-			irqctx->tinfo.task = NULL;
+			
 		if (likely(!(desc->status & IRQ_PENDING)))
 			break;
 		desc->status &= ~IRQ_PENDING;
 	}
 
-#else
+	desc->status &= ~IRQ_INPROGRESS;
 
-	for (;;) {
-		irqreturn_t action_ret;
+out:
+	/*
+	 * The ->end() handler has to deal with interrupts which got
+	 * disabled while the handler was running.
+	 */
+	if (!(desc->status & (IRQ_THREADPENDING | IRQ_THREADRUNNING)))
+		desc->handler->end(irq);
+	spin_unlock(&desc->lock);
+}
 
-		spin_unlock(&desc->lock);
+/*
+ * do_IRQ handles all normal device IRQ's (the special
+ * SMP cross-CPU interrupts have their own specific
+ * handlers).
+ */
+asmlinkage void do_IRQ(struct pt_regs regs)
+{
+#ifdef CONFIG_4KSTACKS
+	u32 *isp;
+	union irq_ctx *curctx;
+	union irq_ctx *irqctx;
+#endif
+
+	irq_enter();
 
-		action_ret = handle_IRQ_event(irq, &regs, action);
+#ifdef CONFIG_DEBUG_STACKOVERFLOW
+	/* Debugging check for stack overflow: is there less than 1KB free? */
+	{
+		long esp;
 
-		spin_lock(&desc->lock);
-		if (!noirqdebug)
-			note_interrupt(irq, desc, action_ret);
-		if (likely(!(desc->status & IRQ_PENDING)))
-			break;
-		desc->status &= ~IRQ_PENDING;
+		asm volatile("andl %%esp,%0" :
+		             "=r" (esp) : "0" (THREAD_SIZE - 1));
+
+		if (unlikely(esp < (sizeof(struct thread_info) + STACK_WARN))) {
+			printk("do_IRQ: stack overflow: %ld\n",
+			       esp - sizeof(struct thread_info));
+			dump_stack();
+		}
 	}
 #endif
-	desc->status &= ~IRQ_INPROGRESS;
 
-out:
+#ifdef CONFIG_4KSTACKS
+	curctx = (union irq_ctx *) current_thread_info();
+	irqctx = hardirq_ctx[smp_processor_id()];
+
 	/*
-	 * The ->end() handler has to deal with interrupts which got
-	 * disabled while the handler was running.
+	 * this is where we switch to the IRQ stack. However, if we are already using
+	 * the IRQ stack (because we interrupted a hardirq handler) we can't do that
+	 * and just have to keep using the current stack (which is the irq stack already
+	 * after all)
 	 */
-	desc->handler->end(irq);
-	spin_unlock(&desc->lock);
 
-	irq_exit();
+	if (curctx == irqctx) {
+		really_do_IRQ(&regs);
+	} else {
+		/* build the stack frame on the IRQ stack */
+		isp = (u32*) ((char*)irqctx + sizeof(*irqctx));
+		irqctx->tinfo.task = curctx->tinfo.task;
+		irqctx->tinfo.previous_esp = current_stack_pointer();
 
-	return 1;
+		*--isp = (u32) &regs;
+
+		asm volatile("xchgl   %%ebx, %%esp;"
+		             "call    really_do_IRQ;"
+ 		             "xchgl   %%ebx, %%esp;"
+		           : /* no outputs */
+		           : "b" (isp)
+		           : "memory", "cc", "eax", "edx", "ecx");
+
+		irqctx->tinfo.task = NULL;
+	}
+#else
+	really_do_IRQ(&regs);
+#endif
+
+	irq_exit();
 }
 
 int can_request_irq(unsigned int irq, unsigned long irqflags)
@@ -943,6 +959,8 @@
 		rand_initialize_irq(irq);
 	}
 
+	setup_irq_spawn_thread(irq, new);
+
 	/*
 	 * The following block of code has to be executed atomically
 	 */
diff -urN linux-2.6.8-rc2/arch/i386/mach-default/setup.c linux-2.6.8-rc2-irq-threads/arch/i386/mach-default/setup.c
--- linux-2.6.8-rc2/arch/i386/mach-default/setup.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/mach-default/setup.c	2004-07-27 17:10:42.000000000 -0400
@@ -27,7 +27,8 @@
 /*
  * IRQ2 is cascade interrupt to second interrupt controller
  */
-static struct irqaction irq2 = { no_action, 0, CPU_MASK_NONE, "cascade", NULL, NULL};
+static struct irqaction irq2 = 
+	{ no_action, SA_NOTHREAD, CPU_MASK_NONE, "cascade", NULL, NULL};
 
 /**
  * intr_init_hook - post gate setup interrupt initialisation
@@ -71,7 +72,9 @@
 {
 }
 
-static struct irqaction irq0  = { timer_interrupt, SA_INTERRUPT, CPU_MASK_NONE, "timer", NULL, NULL};
+static struct irqaction irq0  = 
+	{ timer_interrupt, SA_INTERRUPT | SA_NOTHREAD, CPU_MASK_NONE, 
+	  "timer", NULL, NULL};
 
 /**
  * time_init_hook - do any specific initialisations for the system timer.
diff -urN linux-2.6.8-rc2/arch/i386/mach-visws/setup.c linux-2.6.8-rc2-irq-threads/arch/i386/mach-visws/setup.c
--- linux-2.6.8-rc2/arch/i386/mach-visws/setup.c	2004-06-16 01:18:59.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/mach-visws/setup.c	2004-07-27 17:08:46.000000000 -0400
@@ -112,7 +112,7 @@
 
 static struct irqaction irq0 = {
 	.handler =	timer_interrupt,
-	.flags =	SA_INTERRUPT,
+	.flags =	SA_INTERRUPT | SA_NOTHREAD,
 	.name =		"timer",
 };
 
diff -urN linux-2.6.8-rc2/arch/i386/mach-visws/visws_apic.c linux-2.6.8-rc2-irq-threads/arch/i386/mach-visws/visws_apic.c
--- linux-2.6.8-rc2/arch/i386/mach-visws/visws_apic.c	2004-06-16 01:18:57.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/mach-visws/visws_apic.c	2004-07-27 17:08:46.000000000 -0400
@@ -261,11 +261,13 @@
 static struct irqaction master_action = {
 	.handler =	piix4_master_intr,
 	.name =		"PIIX4-8259",
+	.flags =        SA_NOTHREAD,
 };
 
 static struct irqaction cascade_action = {
 	.handler = 	no_action,
 	.name =		"cascade",
+	.flags =        SA_NOTHREAD,
 };
 
 
diff -urN linux-2.6.8-rc2/arch/i386/mach-voyager/setup.c linux-2.6.8-rc2-irq-threads/arch/i386/mach-voyager/setup.c
--- linux-2.6.8-rc2/arch/i386/mach-voyager/setup.c	2004-07-27 17:06:24.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/arch/i386/mach-voyager/setup.c	2004-07-27 17:11:14.000000000 -0400
@@ -17,7 +17,8 @@
 /*
  * IRQ2 is cascade interrupt to second interrupt controller
  */
-static struct irqaction irq2 = { no_action, 0, CPU_MASK_NONE, "cascade", NULL, NULL};
+static struct irqaction irq2 = 
+	{ no_action, SA_NOTHREAD, CPU_MASK_NONE, "cascade", NULL, NULL};
 
 void __init intr_init_hook(void)
 {
@@ -40,7 +41,9 @@
 {
 }
 
-static struct irqaction irq0  = { timer_interrupt, SA_INTERRUPT, CPU_MASK_NONE, "timer", NULL, NULL};
+static struct irqaction irq0  = 
+	{ timer_interrupt, SA_INTERRUPT | SA_NOTHREAD, 
+	  CPU_MASK_NONE, "timer", NULL, NULL};
 
 void __init time_init_hook(void)
 {
diff -urN linux-2.6.8-rc2/drivers/input/serio/i8042.c linux-2.6.8-rc2-irq-threads/drivers/input/serio/i8042.c
--- linux-2.6.8-rc2/drivers/input/serio/i8042.c	2004-06-16 01:18:57.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/drivers/input/serio/i8042.c	2004-07-27 17:52:11.000000000 -0400
@@ -277,7 +277,7 @@
 			return 0;
 
 	if (request_irq(values->irq, i8042_interrupt,
-			SA_SHIRQ, "i8042", i8042_request_irq_cookie)) {
+			SA_SHIRQ | SA_NOTHREAD, "i8042", i8042_request_irq_cookie)) {
 		printk(KERN_ERR "i8042.c: Can't get irq %d for %s, unregistering the port.\n", values->irq, values->name);
 		goto irq_fail;
 	}
@@ -571,7 +571,7 @@
  * in trying to detect AUX presence.
  */
 
-	if (request_irq(values->irq, i8042_interrupt, SA_SHIRQ,
+	if (request_irq(values->irq, i8042_interrupt, SA_SHIRQ | SA_NOTHREAD,
 				"i8042", &i8042_check_aux_cookie))
                 return -1;
 	free_irq(values->irq, &i8042_check_aux_cookie);
diff -urN linux-2.6.8-rc2/include/asm-i386/hardirq.h linux-2.6.8-rc2-irq-threads/include/asm-i386/hardirq.h
--- linux-2.6.8-rc2/include/asm-i386/hardirq.h	2004-06-16 01:19:43.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/asm-i386/hardirq.h	2004-07-28 18:30:48.000000000 -0400
@@ -64,10 +64,19 @@
  * Are we doing bottom half or hardware interrupt processing?
  * Are we in a softirq context? Interrupt context?
  */
+#ifdef CONFIG_IRQ_THREADS
+#define in_irq() (hardirq_count() || (current->flags & PF_IRQHANDLER))
+#else
 #define in_irq()		(hardirq_count())
+#endif
+
 #define in_softirq()		(softirq_count())
-#define in_interrupt()		(irq_count())
 
+#ifdef CONFIG_IRQ_THREADS
+#define in_interrupt() (irq_count() || (current->flags & PF_IRQHANDLER))
+#else
+#define in_interrupt()		(irq_count())
+#endif
 
 #define hardirq_trylock()	(!in_interrupt())
 #define hardirq_endlock()	do { } while (0)
@@ -92,7 +101,7 @@
 		preempt_enable_no_resched();				\
 } while (0)
 
-#ifndef CONFIG_SMP
+#if !defined(CONFIG_SMP) && !defined(CONFIG_IRQ_THREADS)
 # define synchronize_irq(irq)	barrier()
 #else
   extern void synchronize_irq(unsigned int irq);
diff -urN linux-2.6.8-rc2/include/asm-i386/irq.h linux-2.6.8-rc2-irq-threads/include/asm-i386/irq.h
--- linux-2.6.8-rc2/include/asm-i386/irq.h	2004-06-16 01:19:37.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/asm-i386/irq.h	2004-07-27 17:08:46.000000000 -0400
@@ -27,6 +27,8 @@
 extern void release_x86_irqs(struct task_struct *);
 extern int can_request_irq(unsigned int, unsigned long flags);
 
+#define get_irq_desc(irq) (&irq_desc[irq])
+
 #ifdef CONFIG_X86_LOCAL_APIC
 #define ARCH_HAS_NMI_WATCHDOG		/* See include/linux/nmi.h */
 #endif
diff -urN linux-2.6.8-rc2/include/asm-i386/signal.h linux-2.6.8-rc2-irq-threads/include/asm-i386/signal.h
--- linux-2.6.8-rc2/include/asm-i386/signal.h	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/asm-i386/signal.h	2004-07-27 17:08:46.000000000 -0400
@@ -122,6 +122,7 @@
 #define SA_PROBE		SA_ONESHOT
 #define SA_SAMPLE_RANDOM	SA_RESTART
 #define SA_SHIRQ		0x04000000
+#define SA_NOTHREAD             0x01000000
 #endif
 
 #define SIG_BLOCK          0	/* for blocking signals */
diff -urN linux-2.6.8-rc2/include/linux/interrupt.h linux-2.6.8-rc2-irq-threads/include/linux/interrupt.h
--- linux-2.6.8-rc2/include/linux/interrupt.h	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/linux/interrupt.h	2004-07-28 18:40:19.000000000 -0400
@@ -51,7 +51,7 @@
 /*
  * Temporary defines for UP kernels, until all code gets fixed.
  */
-#ifndef CONFIG_SMP
+#if !defined(CONFIG_SMP) && !defined(CONFIG_IRQ_THREADS)
 # define cli()			local_irq_disable()
 # define sti()			local_irq_enable()
 # define save_flags(x)		local_save_flags(x)
@@ -247,4 +247,15 @@
 extern int probe_irq_off(unsigned long);	/* returns 0 or negative on failure */
 extern unsigned int probe_irq_mask(unsigned long);	/* returns mask of ISA interrupts */
 
+#ifdef CONFIG_IRQ_THREADS
+
+/* This is under CONFIG_IRQ_THREADS for now, so it doesn't break other
+   architectures where it's still static.  It has to be here rather
+   than in irq.h, because it depends on irqreturn_t, and including
+   this file from irq.h apparently causes a loop. */
+
+void note_interrupt(int irq, irq_desc_t *desc, irqreturn_t action_ret);
+extern int noirqdebug;
+
+#endif
 #endif
diff -urN linux-2.6.8-rc2/include/linux/irq.h linux-2.6.8-rc2-irq-threads/include/linux/irq.h
--- linux-2.6.8-rc2/include/linux/irq.h	2004-06-16 01:19:17.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/linux/irq.h	2004-07-28 18:40:29.000000000 -0400
@@ -23,15 +23,19 @@
 /*
  * IRQ line status.
  */
-#define IRQ_INPROGRESS	1	/* IRQ handler active - do not enter! */
-#define IRQ_DISABLED	2	/* IRQ disabled - do not enter! */
-#define IRQ_PENDING	4	/* IRQ pending - replay on enable */
-#define IRQ_REPLAY	8	/* IRQ has been replayed but not acked yet */
-#define IRQ_AUTODETECT	16	/* IRQ is being autodetected */
-#define IRQ_WAITING	32	/* IRQ not yet seen - for autodetection */
-#define IRQ_LEVEL	64	/* IRQ level triggered */
-#define IRQ_MASKED	128	/* IRQ masked - shouldn't be seen again */
-#define IRQ_PER_CPU	256	/* IRQ is per CPU */
+#define IRQ_INPROGRESS     1    /* IRQ handler active - do not enter! */
+#define IRQ_DISABLED       2    /* IRQ disabled - do not enter! */
+#define IRQ_PENDING        4    /* IRQ pending - replay on enable */
+#define IRQ_REPLAY         8    /* IRQ has been replayed but not acked yet */
+#define IRQ_AUTODETECT     16   /* IRQ is being autodetected */
+#define IRQ_WAITING        32   /* IRQ not yet seen - for autodetection */
+#define IRQ_LEVEL          64   /* IRQ level triggered */
+#define IRQ_MASKED         128  /* IRQ masked - shouldn't be seen again */
+#define IRQ_PER_CPU        256  /* IRQ is per CPU */
+#define IRQ_THREAD         512  /* IRQ has at least one threaded handler */
+#define IRQ_NOTHREAD       1024 /* IRQ has at least one nonthreaded handler */
+#define IRQ_THREADPENDING  2048 /* IRQ thread has been woken */
+#define IRQ_THREADRUNNING  4096 /* IRQ thread is currently running */
 
 /*
  * Interrupt controller descriptor. This is all we need
@@ -65,6 +69,10 @@
 	unsigned int irq_count;		/* For detecting broken interrupts */
 	unsigned int irqs_unhandled;
 	spinlock_t lock;
+#ifdef CONFIG_IRQ_THREADS
+	struct task_struct *thread;
+	wait_queue_head_t sync;
+#endif
 } ____cacheline_aligned irq_desc_t;
 
 extern irq_desc_t irq_desc [NR_IRQS];
@@ -75,6 +83,24 @@
 
 extern hw_irq_controller no_irq_type;  /* needed in every arch ? */
 
+#ifdef CONFIG_IRQ_THREADS
+
+void spawn_irq_threads(void);
+void setup_irq_spawn_thread(unsigned int irq, struct irqaction *new);
+
+#else
+
+static inline void spawn_irq_threads(void)
+{
+}
+
+static inline void setup_irq_spawn_thread(unsigned int irq,
+                                          struct irqaction *new)
+{
+}
+
+#endif
+
 #endif
 
 #endif /* __irq_h */
diff -urN linux-2.6.8-rc2/include/linux/sched.h linux-2.6.8-rc2-irq-threads/include/linux/sched.h
--- linux-2.6.8-rc2/include/linux/sched.h	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/include/linux/sched.h	2004-07-27 17:08:46.000000000 -0400
@@ -555,6 +555,7 @@
 #define PF_SWAPOFF	0x00080000	/* I am in swapoff */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_SYNCWRITE	0x00200000	/* I am doing a sync write */
+#define PF_IRQHANDLER   0x00400000      /* in_irq() should return true */
 
 #ifdef CONFIG_SMP
 #define SCHED_LOAD_SCALE	128UL	/* increase resolution of load */
diff -urN linux-2.6.8-rc2/init/Kconfig linux-2.6.8-rc2-irq-threads/init/Kconfig
--- linux-2.6.8-rc2/init/Kconfig	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/init/Kconfig	2004-07-27 17:08:46.000000000 -0400
@@ -294,6 +294,18 @@
 
 	  If unsure, say N.
 
+config IRQ_THREADS
+  bool "Run all IRQs in threads by default"
+  depends on PREEMPT
+  help
+    This option creates a thread for each IRQ, which runs at high
+    real-time priority, unless the SA_NOTHREAD option is passed to
+    request_irq().  This allows these IRQs to be prioritized, so as
+    to avoid preempting very high priority real-time tasks.  This
+    also allows spinlocks used by threaded IRQs to be converted
+    into sleeping mutexes, for further reduction of latency (however,
+    this is not done automatically).
+
 endmenu		# General setup
 
 
diff -urN linux-2.6.8-rc2/init/main.c linux-2.6.8-rc2-irq-threads/init/main.c
--- linux-2.6.8-rc2/init/main.c	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/init/main.c	2004-07-27 17:08:46.000000000 -0400
@@ -668,6 +668,8 @@
 	smp_init();
 	sched_init_smp();
 
+	spawn_irq_threads();
+
 	/*
 	 * Do this before initcalls, because some drivers want to access
 	 * firmware files.
diff -urN linux-2.6.8-rc2/kernel/Makefile linux-2.6.8-rc2-irq-threads/kernel/Makefile
--- linux-2.6.8-rc2/kernel/Makefile	2004-07-27 17:06:26.000000000 -0400
+++ linux-2.6.8-rc2-irq-threads/kernel/Makefile	2004-07-27 17:08:46.000000000 -0400
@@ -23,6 +23,7 @@
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
 obj-$(CONFIG_AUDIT) += audit.o
 obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
+obj-$(CONFIG_IRQ_THREADS) += irq.o
 
 ifneq ($(CONFIG_IA64),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff -urN linux-2.6.8-rc2/kernel/irq.c linux-2.6.8-rc2-irq-threads/kernel/irq.c
--- linux-2.6.8-rc2/kernel/irq.c	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.8-rc2-irq-threads/kernel/irq.c	2004-07-28 18:41:03.000000000 -0400
@@ -0,0 +1,179 @@
+/*
+ *	linux/kernel/irq.c -- Generic code for threaded IRQ handling
+ *
+ *	Copyright (C) 2001-2004 TimeSys Corp.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/config.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include <linux/completion.h>
+#include <linux/syscalls.h>
+#include <linux/random.h>
+
+#include <asm/uaccess.h>
+
+static const int irq_prio = MAX_USER_RT_PRIO - 2;
+
+static inline void synchronize_hard_irq(unsigned int irq)
+{
+#ifdef CONFIG_SMP
+	while (get_irq_desc(irq)->status & IRQ_INPROGRESS)
+		cpu_relax();
+#endif
+}
+
+void synchronize_irq(unsigned int irq)
+{
+	irq_desc_t *desc = get_irq_desc(irq);
+	
+	synchronize_hard_irq(irq);
+	
+	if (desc->thread)
+		wait_event(desc->sync, !(desc->status & IRQ_THREADRUNNING));
+}
+
+typedef struct {
+	struct completion comp;
+	int irq;
+} irq_thread_info;
+
+static int run_irq_thread(void *__info)
+{
+	irq_thread_info *info = __info;
+	int irq = info->irq;
+	struct sched_param param = { .sched_priority = irq_prio };
+	irq_desc_t *desc = get_irq_desc(irq);
+	
+	daemonize("IRQ %d", irq);
+	
+	set_fs(KERNEL_DS);
+	sys_sched_setscheduler(0, SCHED_FIFO, &param);
+	
+	current->flags |= PF_IRQHANDLER | PF_NOFREEZE;
+	
+	init_waitqueue_head(&desc->sync);
+	smp_wmb();
+	desc->thread = current;
+	
+	/* info is no longer valid after this... */
+	complete(&info->comp);
+	
+	for (;;) {
+		struct irqaction *action;
+		int status, retval;
+		
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		while (!(desc->status & IRQ_THREADPENDING))
+			schedule();
+		
+		set_current_state(TASK_RUNNING);
+
+		spin_lock_irq(&desc->lock);
+		
+		desc->status |= IRQ_THREADRUNNING;
+		desc->status &= ~IRQ_THREADPENDING;
+		status = desc->status;
+		
+		spin_unlock_irq(&desc->lock);
+		
+		retval = 0;
+		
+		if (!(status & IRQ_DISABLED))	{
+			action = desc->action;
+
+			while (action) {
+				if (!(action->flags & SA_NOTHREAD)) {
+					status |= action->flags;
+					retval |= action->handler(irq, action->dev_id, NULL);
+				}
+				
+				action = action->next;
+			}
+		}
+
+		if (status & SA_SAMPLE_RANDOM)
+			add_interrupt_randomness(irq);
+
+		spin_lock_irq(&desc->lock);
+		
+		if (!noirqdebug)
+			note_interrupt(irq, desc, retval);
+		
+		desc->status &= ~IRQ_THREADRUNNING;
+		if (!(desc->status & (IRQ_THREADPENDING | IRQ_THREADRUNNING)))
+			desc->handler->end(irq);
+		
+		spin_unlock_irq(&desc->lock);
+		
+		if (waitqueue_active(&desc->sync))
+			wake_up(&desc->sync);
+	}
+}
+
+static int ok_to_spawn_threads;
+
+void do_spawn_irq_thread(int irq)
+{
+	irq_thread_info info;
+	
+	info.irq = irq;
+	init_completion(&info.comp);
+
+	if (kernel_thread(run_irq_thread, &info, CLONE_KERNEL) < 0) {
+		printk(KERN_EMERG "Could not spawn thread for IRQ %d\n", irq);
+	} else {
+		wait_for_completion(&info.comp);
+	}
+}
+
+void setup_irq_spawn_thread(unsigned int irq, struct irqaction *new)
+{
+	irq_desc_t *desc = get_irq_desc(irq);
+	int spawn_thread = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+	
+	if (new->flags & SA_NOTHREAD) {
+		desc->status |= IRQ_NOTHREAD;
+	} else {
+		/* Only the first threaded handler should spawn
+		   a thread. */
+
+		if (!(desc->status & IRQ_THREAD)) {
+			spawn_thread = 1;
+			desc->status |= IRQ_THREAD;
+		}
+	}
+
+	spin_unlock_irqrestore(&desc->lock, flags);
+	
+	if (ok_to_spawn_threads && spawn_thread)
+		do_spawn_irq_thread(irq);
+}
+
+
+/* This takes care of interrupts that were requested before the
+   scheduler was ready for threads to be created. */
+
+void spawn_irq_threads(void)
+{
+	int i;
+	
+	for (i = 0; i < NR_IRQS; i++) {
+		irq_desc_t *desc = get_irq_desc(i);
+	
+		if (desc->action && !desc->thread && (desc->status & IRQ_THREAD))
+			do_spawn_irq_thread(i);
+	}
+	
+	ok_to_spawn_threads = 1;
+}
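
The flag handling in run_irq_thread() above can be modelled in user space. The following is only a sketch under stated assumptions: the two status values are copied from the patch, but struct model_desc and the model_* functions are hypothetical names invented for illustration, and model_raise() merely assumes that the hard-IRQ path sets IRQ_THREADPENDING when it wakes the thread. It shows why end() is skipped when another interrupt was flagged while the handlers were running.

```c
#include <assert.h>

/* Status bits copied from the patch's include/linux/irq.h. */
#define IRQ_THREADPENDING  2048 /* IRQ thread has been woken */
#define IRQ_THREADRUNNING  4096 /* IRQ thread is currently running */

/* Hypothetical stand-in for irq_desc_t: the status word plus a count
   of how many times the controller's end() would have been called. */
struct model_desc {
	unsigned int status;
	int end_calls;
};

/* Hard-IRQ side (assumed behaviour): flag that the thread was woken. */
static void model_raise(struct model_desc *d)
{
	d->status |= IRQ_THREADPENDING;
}

/* Thread side, first locked section: claim the pending work. */
static void model_thread_begin(struct model_desc *d)
{
	assert(d->status & IRQ_THREADPENDING);
	d->status |= IRQ_THREADRUNNING;
	d->status &= ~IRQ_THREADPENDING;
}

/* Thread side, second locked section: count an end() call only if
   nothing new was flagged while the handlers ran. */
static void model_thread_finish(struct model_desc *d)
{
	d->status &= ~IRQ_THREADRUNNING;
	if (!(d->status & (IRQ_THREADPENDING | IRQ_THREADRUNNING)))
		d->end_calls++;
}
```

In this model, a second model_raise() between begin and finish leaves IRQ_THREADPENDING set, so the line stays masked until the next pass through the loop has finished.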


* Re: [patch] IRQ threads
  2004-07-28  6:27 ` Ingo Molnar
  2004-07-28 15:38   ` Karim Yaghmour
  2004-07-28 21:23   ` Bill Huey
@ 2004-07-28 23:24   ` Scott Wood
  2 siblings, 0 replies; 30+ messages in thread
From: Scott Wood @ 2004-07-28 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Scott Wood, linux-kernel, La Monte H.P. Yarroll, Manas Saksena,
	Bill Huey

On Wed, Jul 28, 2004 at 08:27:22AM +0200, Ingo Molnar wrote:
> > 4. This might be a good time to get around to moving the bulk of the
> > arch/whatever/kernel/irq.c into generic code, as the code said was
> > supposed to happen in 2.5.  This patch is currently only for x86
> > (though we've run IRQ threads on many different platforms in the
> > past).
> 
> agreed. I punted this one for the time being as it's clearly separate
> from the issue of latencies and it's deeply intrusive to 2.6.

The intrusiveness is somewhat mitigated by not having to convert all
architectures at once, though; the generic code would only be
included for those architectures that set the relevant CONFIG_.  At
that point, it basically is just moving code from one file to another
(for x86), and doing minor tweaks for those architectures that are
close to what x86 does.

> > 5. Is there any reason why an IRQ controller might want to have its
> > end() called even if IRQ_DISABLED or IRQ_INPROGRESS is set?  It'd be
> > nice to merge those checks in with the
> > IRQ_THREADPENDING/IRQ_THREADRUNNING checks.
> 
> e.g. in the IO-APIC case if we ack the local APIC only in the end()
> function then we must do that - an un-acked local APIC prevents other
> IRQs from being delivered. We do this for level-triggered IO-APIC irqs.

Yes, but the IO-APIC code needs to be changed anyway to work properly
with IRQ threads.

> what do you think about making the i8259A's interrupt priorities
> configurable? (a'la rtirq patch) Does it make any sense, given how early
> we mask the source irq and ack the interrupt controller [hence giving
> all other interrupts a fair chance to arrive ASAP]?

It could be useful for SA_NOTHREAD interrupts, but I don't think it
buys much for threaded interrupts (as you'd have to wait until normal
IRQs are enabled to schedule the handler thread anyway).

> Bernhard Kuhn's rtirq patch is for IO-APIC/APICs, but i think the
> latency issues could be equally well fixed by not keeping the local APIC
> un-ACK-ed during level triggered irqs, but doing the mask & ack thing.
> This will be slightly slower but should make them both redirectable and
> more symmetric and fair.

I agree.  How much slower would it be (is it worth an #ifdef for irq
threads)?  Hopefully PIC operations aren't as slow as they are on the
8259...

-Scott


* Re: [patch] IRQ threads
  2004-07-28 23:12   ` Scott Wood
@ 2004-07-29 19:33     ` Ingo Molnar
  2004-07-29 20:21       ` Scott Wood
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2004-07-29 19:33 UTC (permalink / raw)
  To: Scott Wood; +Cc: linux-kernel, La Monte H.P. Yarroll, Manas Saksena


* Scott Wood <scott@timesys.com> wrote:

> > Also, why the enable_irq() change? 
> 
> If you mean the do_startup_irq() change, [...]

i mean the change below - why do irqthreads necessitate it?

	Ingo

@@ -395,7 +402,14 @@ void enable_irq(unsigned int irq)
 			desc->status = status | IRQ_REPLAY;
 			hw_resend_irq(desc->handler,irq);
 		}
-		desc->handler->enable(irq);
+		
+		/* Don't unmask the IRQ if it's in progress, or else you
+		   could re-enter the IRQ handler.  As it is now enabled,
+		   the IRQ will be unmasked when the handler is finished. */
+		
+		if (!(desc->status & (IRQ_INPROGRESS | IRQ_THREADRUNNING |
+		                      IRQ_THREADPENDING)))
+			desc->handler->enable(irq);
 		/* fall-through */
 	}
 	default:


* Re: [patch] IRQ threads
  2004-07-29 19:33     ` Ingo Molnar
@ 2004-07-29 20:21       ` Scott Wood
  2004-07-29 21:12         ` Alan Cox
  0 siblings, 1 reply; 30+ messages in thread
From: Scott Wood @ 2004-07-29 20:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Scott Wood, linux-kernel, La Monte H.P. Yarroll, Manas Saksena

On Thu, Jul 29, 2004 at 09:33:41PM +0200, Ingo Molnar wrote:
> 
> * Scott Wood <scott@timesys.com> wrote:
> 
> > > Also, why the enable_irq() change? 
> > 
> > If you mean the do_startup_irq() change, [...]
> 
> i mean the change below - why do irqthreads necessitate it?

The intent is to make enable_irq() robust against calls while the
thread is still running/pending (such as if the thread has lower
priority than the task that calls enable_irq()).  This implies that
the preceding disable was of the _nosync() variety.

I believe we saw drivers/net/8390.c doing this, and it was causing an
interrupt storm because, at the time (this was over a year ago),
actionless irqs (which this IRQ was, because the IRQ was still in
progress) had end() called, unmasking it again.  The IRQ was level
triggered, and thus the handler never got a chance to run.

That wouldn't happen with the current code, because it checks for
THREADPENDING || THREADRUNNING before calling end() in
really_do_IRQ(), so now all the check does is save the wasted time of
one bad interrupt each time it happens.

-Scott
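
The check described in this message can be modelled outside the kernel. This is a simplified sketch, not the patched enable_irq(): the status values are the ones defined in the patch, but struct model_line and model_enable_irq() are hypothetical, and the model ignores the depth counter and the IRQ_REPLAY path of the real function. It only shows that the line is unmasked at enable time when no handler, hard or threaded, is still outstanding.

```c
#include <assert.h>

/* Status bits as defined in the patch's include/linux/irq.h. */
#define IRQ_INPROGRESS     1
#define IRQ_DISABLED       2
#define IRQ_THREADPENDING  2048
#define IRQ_THREADRUNNING  4096

/* Hypothetical stand-in for one IRQ line: the status word plus a count
   of how many times the controller's enable() would have been called. */
struct model_line {
	unsigned int status;
	int unmask_calls;
};

/* Models only the final disabled -> enabled transition. */
static void model_enable_irq(struct model_line *l)
{
	assert(l->status & IRQ_DISABLED);
	l->status &= ~IRQ_DISABLED;

	/* Don't unmask while a handler is in progress or queued; the
	   thread will unmask via end() when it has finished. */
	if (!(l->status & (IRQ_INPROGRESS | IRQ_THREADRUNNING |
	                   IRQ_THREADPENDING)))
		l->unmask_calls++;
}
```

With IRQ_THREADRUNNING set, the model clears IRQ_DISABLED but leaves the line masked, which corresponds to the case of a low-priority thread that has not yet finished when a higher-priority task calls enable_irq().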


* Re: [patch] IRQ threads
@ 2004-07-29 20:33 Albert Cahalan
  0 siblings, 0 replies; 30+ messages in thread
From: Albert Cahalan @ 2004-07-29 20:33 UTC (permalink / raw)
  To: linux-kernel mailing list; +Cc: rlrevell

Lee Revell writes:

> As I understand it there will still be a place for the
> current hard-RT Linux solutions, because even if I can
> get five nines latency better than N, this is not good
> enough for hard RT, as you need to be able to mathematically
> demonstrate that you can *never* miss a deadline.

Nah, that's academic theory. There is no such thing
as hard-RT in the real world.

In reality, there's no point in making the software far
more reliable than the hardware, power supply, and so on.
Somebody may pour a can of Mountain Dew into the vent holes.

Your software is OK as long as other causes of failure
are much more likely. One might even say you spent too
much of your budget perfecting the software! In the end it
all comes down to $$$ (or Euros, or Yen...), doesn't it?

People don't mathematically demonstrate anything about
modern systems, at least not while being honest. Modern
systems have cache memory, interrupts, compiled code...
Use an Intel 4004 if you want mathematical proofs, and
even then, remember the can of Mountain Dew. (and bugs!)
Heh, your proof could be buggy. Then what?

Math problem:

The cost of the system is inversely proportional to the
likelihood of failure. Set the likelihood of failure
to zero and solve for the cost. :-)

That won't make the customer happy.


* Re: [patch] IRQ threads
  2004-07-28 21:35     ` Scott Wood
@ 2004-07-29 21:08       ` Bill Huey
  2004-07-29 22:44       ` Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Bill Huey @ 2004-07-29 21:08 UTC (permalink / raw)
  To: Scott Wood
  Cc: Bill Huey, Ingo Molnar, linux-kernel, La Monte H.P. Yarroll,
	Manas Saksena

On Wed, Jul 28, 2004 at 05:35:57PM -0400, Scott Wood wrote:
> On Wed, Jul 28, 2004 at 02:23:14PM -0700, Bill Huey wrote:
> > That way I picture the problem permits those threads to migration across
> > CPUs and therefore kill interrupt performance from cache thrashing. Do
> > you have a solution for that ? I like the way you're doing it now with
> > irqd() in that it's CPU-local, but as you know it's not priority sensitive.
> 
> Wouldn't the IRQ threads be subject to the same heuristics that the
> scheduler uses with ordinary threads, in order to avoid unnecessary
> CPU migration?  Plus, IRQs ordinarily get distributed across CPUs,
> and in most cases shouldn't have a very large cache footprint
> (especially data; the code can be in multiple CPU caches at once), so
> I don't think this is a substantial degradation from the way things
> already are.

I get a number of gripes from SMP aware folks that the context switching
overhead is significant as well as cache issues. That's what the concern
is about.

> If desired by the user, an IRQ thread could be bound to a specific
> CPU to avoid such problems (in which case, they'd probably want to
> set the smp_affinity of the hard IRQ stub to the same CPU).

Yeah, this is an obvious next step in order to get better performance.
Do you have numbers showing how this logic affects overall SMP and interrupt
performance?

I wish I could help with this, but I'm doing other things at the moment.

bill


* Re: [patch] IRQ threads
  2004-07-29 20:21       ` Scott Wood
@ 2004-07-29 21:12         ` Alan Cox
  0 siblings, 0 replies; 30+ messages in thread
From: Alan Cox @ 2004-07-29 21:12 UTC (permalink / raw)
  To: Scott Wood
  Cc: Ingo Molnar, Linux Kernel Mailing List, La Monte H.P. Yarroll,
	Manas Saksena

On Thu, 2004-07-29 at 21:21, Scott Wood wrote:
> The intent is to make enable_irq() robust against calls while the
> thread is still running/pending (such as if the thread has lower
> priority than the task that calls enable_irq()).  This implies that
> the preceding disable was of the _nosync() variety.
> 
> I believe we saw drivers/net/8390.c doing this, and it was causing an

8390 does a disable_irq_nosync having previously cleared the IRQ on the
controller. This is necessary because IRQ arrival on PC hardware is
asynchronous to all other busses and can take incredibly long times on
SMP hardware prior to PIV.  Thus it happens now and then that the
controller emits an IRQ, we clear the source, the clear is done and
later the IRQ arrives that has already been cleared down on the original
IRQ source. Most drivers just use spinlocks but the 8390 is
so slow that it has to pull other stunts or even things like serial
ports and the 1kHz clock slide.

Alan


* Re: [patch] IRQ threads
  2004-07-28 21:35     ` Scott Wood
  2004-07-29 21:08       ` Bill Huey
@ 2004-07-29 22:44       ` Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-07-29 22:44 UTC (permalink / raw)
  To: Scott Wood; +Cc: Bill Huey, linux-kernel, La Monte H.P. Yarroll, Manas Saksena


* Scott Wood <scott@timesys.com> wrote:

> On Wed, Jul 28, 2004 at 02:23:14PM -0700, Bill Huey wrote:
> > That way I picture the problem permits those threads to migration across
> > CPUs and therefore kill interrupt performance from cache thrashing. Do
> > you have a solution for that ? I like the way you're doing it now with
> > irqd() in that it's CPU-local, but as you know it's not priority sensitive.
> 
> Wouldn't the IRQ threads be subject to the same heuristics that the
> scheduler uses with ordinary threads, in order to avoid unnecessary
> CPU migration?  Plus, IRQs ordinarily get distributed across CPUs, and
> in most cases shouldn't have a very large cache footprint (especially
> data; the code can be in multiple CPU caches at once), so I don't
> think this is a substantial degradation from the way things already
> are.
> 
> If desired by the user, an IRQ thread could be bound to a specific CPU
> to avoid such problems (in which case, they'd probably want to set the
> smp_affinity of the hard IRQ stub to the same CPU).

i fixed this problem in -M5 the other way around: the IRQ threads follow
the affinity settings. They will bind themselves to the first CPU in the
affinity mask and they migrate only at 'safe' points (between hardirqs).

this way e.g. user-space irqbalance will automatically move the IRQ
threads around too.

	Ingo
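
The "first CPU in the affinity mask" rule described above can be sketched as follows. This is an illustration only, not the -M5 code: the mask is modelled as a plain unsigned long, both function names are hypothetical, and staying on the current CPU when the mask is empty is an assumption made here so that the model has a defined result.

```c
#include <assert.h>

/* Index of the lowest set bit in the mask, or -1 if the mask is empty. */
static int first_cpu_in_mask(unsigned long mask)
{
	int cpu;

	for (cpu = 0; mask != 0; cpu++, mask >>= 1)
		if (mask & 1UL)
			return cpu;
	return -1;
}

/* CPU the IRQ thread should bind to at its next safe point (between
   hardirqs). Assumption: an empty mask leaves the thread where it is. */
static int model_thread_target_cpu(unsigned long affinity_mask,
                                   int current_cpu)
{
	int cpu = first_cpu_in_mask(affinity_mask);

	assert(current_cpu >= 0);
	return cpu < 0 ? current_cpu : cpu;
}
```

Under this model, rewriting the mask from user space (as irqbalance does through smp_affinity) changes the value returned at the next safe point, so the thread follows the hard IRQ.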


end of thread, other threads:[~2004-07-29 23:09 UTC | newest]

Thread overview: 30+ messages
2004-07-27 22:50 [patch] IRQ threads Scott Wood
2004-07-28  6:27 ` Ingo Molnar
2004-07-28 15:38   ` Karim Yaghmour
2004-07-28 16:01     ` Karim Yaghmour
2004-07-28 21:23   ` Bill Huey
2004-07-28 21:35     ` Scott Wood
2004-07-29 21:08       ` Bill Huey
2004-07-29 22:44       ` Ingo Molnar
2004-07-28 23:24   ` Scott Wood
2004-07-28  8:10 ` Ingo Molnar
2004-07-28 23:12   ` Scott Wood
2004-07-29 19:33     ` Ingo Molnar
2004-07-29 20:21       ` Scott Wood
2004-07-29 21:12         ` Alan Cox
2004-07-28 15:45 ` Karim Yaghmour
2004-07-28 18:28   ` Lee Revell
2004-07-28 19:12     ` Karim Yaghmour
2004-07-28 19:33       ` Lee Revell
2004-07-28 19:57         ` Karim Yaghmour
2004-07-28 20:35           ` Lee Revell
2004-07-28 21:15             ` Karim Yaghmour
2004-07-28 21:43               ` Lee Revell
2004-07-28 21:38                 ` Karim Yaghmour
2004-07-28 20:21         ` Bill Huey
2004-07-28 20:42           ` Lee Revell
2004-07-28 20:46             ` Bill Huey
2004-07-28 21:48           ` Karim Yaghmour
2004-07-28 22:30             ` Bill Huey
2004-07-28 22:03           ` Philippe Gerum
  -- strict thread matches above, loose matches on Subject: below --
2004-07-29 20:33 Albert Cahalan
