* [PATCH] local_irq_disable removal
@ 2005-06-08 7:08 Daniel Walker
2005-06-08 11:21 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 86+ messages in thread
From: Daniel Walker @ 2005-06-08 7:08 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, sdietrich
Introduction:
The current Real-Time patch from Ingo Molnar significantly
modifies several kernel subsystems. One such subsystem is the
interrupt handling system. With the current Real-Time Linux kernel,
interrupts are deferred and scheduled inside threads. This
means most interrupts are delayed and run along with all other
threads and user space processes. The result is that a minimal
amount of code runs in actual interrupt context. The code executing in
interrupt context only unblocks (wakes up) interrupt threads, then
immediately returns to process context.
Usually a driver that implements an interrupt handler, along
with system calls or IO controls, needs to disable interrupts to
stop the interrupt handler from running. In this type of driver you must
disable interrupts to control when the interrupt runs, which
is essential for serializing data access in the driver. However, this
paradigm changes in the face of threaded interrupt handlers.
Theory:
The interrupt-in-thread model changes the driver paradigm that I
describe above in one fundamental way: the interrupt handler
no longer runs in interrupt context. Even with this change it is still
possible to stop the interrupt handler from running by disabling
interrupts, which is what the current real-time patch does. But since
interrupt handlers are now threads, stopping an interrupt handler from
running only requires disabling preemption.
Implementation:
I've written code that removes 70% of the interrupt disable
sections in the current real-time kernel. These interrupt disable
sections are replaced with a special preempt disable section. Since an
interrupt disable section implicitly disables preemption, there
should be no increase in preemption latency due to this change.
There is still a need for interrupt disable sections. I've
renamed local_irq_disable() to hard_local_irq_disable(). One
now calls this "hard" macro to actually disable interrupts, while the
older non-hard variants are re-factored to disable only preemption.
Since only a small stub of code runs in interrupt context, we
only need to protect the data structures accessed by that small
piece of code. This includes all of the interrupt handling subsystem,
as well as the parts of the scheduler that are used to change
the process state.
An intended side effect of this implementation is that if a user
designates an interrupt handler with the SA_NODELAY flag, then that
interrupt should have very low latency. Currently the
timer is the only SA_NODELAY interrupt.
Results:
Config option  | Number of cli's
---------------+-------------------
PREEMPT        | 1138 (0% removal)
PREEMPT_RT     |  224 (80% removal)
RT_IRQ_DISABLE |   69 (94% removal)
PREEMPT_RT shows a significant reduction over plain PREEMPT
because mutex-converted spinlocks no longer disable
interrupts. However, PREEMPT_RT doesn't give a fixed number of
interrupt disable sections in the kernel. With RT_IRQ_DISABLE there is a
fixed number of interrupt disable sections, further reducing the
total to 30% of PREEMPT_RT (or 6% of PREEMPT).
With a fixed number of interrupt disable sections we can give a
fixed worst-case interrupt latency. This holds no matter which drivers
or system config are used. The one current exception relates to the
xtime_lock, because that lock is used in the timer
interrupt, which does not run in a thread.
This is a work in progress and is still volatile. It has not been
fully tested on SMP. Raw spinlocks no longer disable interrupts, and it
is unclear what the SMP impact is, so please test on SMP. The irq_trace
feature can cause hangs if used in the wrong places, so be
careful. IRQ latency and latency tracing have been modified, but still
require some testing.
This patch applies on top of the RT patch provided by Ingo
Molnar. There is too much instability after 0.7.47-19, so my patch is
recommended on that version.
You may download the -19 RT patch from the following location:
http://people.redhat.com/~mingo/realtime-preempt/older/realtime-preempt-2.6.12-rc6-V0.7.47-19
Future work:
As described above, only a few areas need a true
interrupt disable. With this feature turned on, it would now be possible
to measure all interrupt disable sections in any kernel.
That would allow the largest interrupt disable section
to be determined exactly. Once these sections are identified we would
then be able to optimize each one, producing ever-decreasing interrupt latency.
Another optimization to this system would be a method by which
local_irq_disable() only prevents interrupt threads from running,
instead of the current method of preventing all threads and processes
from running. The biggest problem in doing this is balancing the
average length of an interrupt disable section against the time it
takes to soft disable/enable.
Thanks to Sven Dietrich
Signed-Off-By: Daniel Walker <dwalker@mvista.com>
Index: linux-2.6.11/arch/i386/kernel/entry.S
===================================================================
--- linux-2.6.11.orig/arch/i386/kernel/entry.S 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/arch/i386/kernel/entry.S 2005-06-08 00:35:30.000000000 +0000
@@ -331,6 +331,9 @@ work_pending:
work_resched:
cli
call __schedule
+#ifdef CONFIG_RT_IRQ_DISABLE
+ call local_irq_enable_noresched
+#endif
# make sure we don't miss an interrupt
# setting need_resched or sigpending
# between sampling and the iret
Index: linux-2.6.11/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.11.orig/arch/i386/kernel/process.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/arch/i386/kernel/process.c 2005-06-08 06:29:52.000000000 +0000
@@ -96,13 +96,13 @@ EXPORT_SYMBOL(enable_hlt);
void default_idle(void)
{
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
- local_irq_disable();
+ hard_local_irq_disable();
if (!need_resched())
- safe_halt();
+ hard_safe_halt();
else
- local_irq_enable();
+ hard_local_irq_enable();
} else {
- local_irq_enable();
+ hard_local_irq_enable();
cpu_relax();
}
}
@@ -149,6 +149,9 @@ void cpu_idle (void)
{
/* endless idle loop with no priority at all */
while (1) {
+#ifdef CONFIG_RT_IRQ_DISABLE
+ BUG_ON(hard_irqs_disabled());
+#endif
while (!need_resched()) {
void (*idle)(void);
@@ -165,7 +168,9 @@ void cpu_idle (void)
stop_critical_timing();
idle();
}
+ hard_local_irq_disable();
__schedule();
+ hard_local_irq_enable();
}
}
Index: linux-2.6.11/arch/i386/kernel/signal.c
===================================================================
--- linux-2.6.11.orig/arch/i386/kernel/signal.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/arch/i386/kernel/signal.c 2005-06-08 00:44:00.000000000 +0000
@@ -597,7 +597,7 @@ int fastcall do_signal(struct pt_regs *r
/*
* Fully-preemptible kernel does not need interrupts disabled:
*/
- local_irq_enable();
+ hard_local_irq_enable();
preempt_check_resched();
#endif
/*
Index: linux-2.6.11/arch/i386/kernel/traps.c
===================================================================
--- linux-2.6.11.orig/arch/i386/kernel/traps.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/arch/i386/kernel/traps.c 2005-06-08 00:45:11.000000000 +0000
@@ -376,7 +376,7 @@ static void do_trap(int trapnr, int sign
goto kernel_trap;
#ifdef CONFIG_PREEMPT_RT
- local_irq_enable();
+ hard_local_irq_enable();
preempt_check_resched();
#endif
@@ -508,7 +508,7 @@ fastcall void do_general_protection(stru
return;
gp_in_vm86:
- local_irq_enable();
+ hard_local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
return;
@@ -705,7 +705,7 @@ fastcall void do_debug(struct pt_regs *
return;
/* It's safe to allow irq's after DR6 has been saved */
if (regs->eflags & X86_EFLAGS_IF)
- local_irq_enable();
+ hard_local_irq_enable();
/* Mask out spurious debug traps due to lazy DR7 setting */
if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
Index: linux-2.6.11/arch/i386/mm/fault.c
===================================================================
--- linux-2.6.11.orig/arch/i386/mm/fault.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/arch/i386/mm/fault.c 2005-06-08 00:35:30.000000000 +0000
@@ -232,7 +232,7 @@ fastcall notrace void do_page_fault(stru
return;
/* It's safe to allow irq's after cr2 has been saved */
if (regs->eflags & (X86_EFLAGS_IF|VM_MASK))
- local_irq_enable();
+ hard_local_irq_enable();
tsk = current;
Index: linux-2.6.11/include/asm-i386/system.h
===================================================================
--- linux-2.6.11.orig/include/asm-i386/system.h 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/include/asm-i386/system.h 2005-06-08 00:35:30.000000000 +0000
@@ -450,28 +450,70 @@ struct alt_instr {
# define trace_irqs_on() do { } while (0)
#endif
+#ifdef CONFIG_RT_IRQ_DISABLE
+extern void local_irq_enable(void);
+extern void local_irq_enable_noresched(void);
+extern void local_irq_disable(void);
+extern void local_irq_restore(unsigned long);
+extern unsigned long irqs_disabled(void);
+extern unsigned long irqs_disabled_flags(unsigned long);
+extern unsigned int ___local_save_flags(void);
+extern void irq_trace_enable(void);
+extern void irq_trace_disable(void);
+
+#define local_save_flags(x) ({ x = ___local_save_flags(); x;})
+#define local_irq_save(x) ({ local_save_flags(x); local_irq_disable(); x;})
+#define safe_halt() do { local_irq_enable(); __asm__ __volatile__("hlt": : :"memory"); } while (0)
+
+/* Force the softstate to follow the hard state */
+#define hard_local_save_flags(x) _hard_local_save_flags(x)
+#define hard_local_irq_enable() do { local_irq_enable_noresched(); _hard_local_irq_enable(); } while(0)
+#define hard_local_irq_disable() do { _hard_local_irq_disable(); local_irq_disable(); } while(0)
+#define hard_local_irq_save(x) do { _hard_local_irq_save(x); } while(0)
+#define hard_local_irq_restore(x) do { _hard_local_irq_restore(x); } while (0)
+#define hard_safe_halt() do { local_irq_enable(); _hard_safe_halt(); } while (0)
+#else
+
+#define hard_local_save_flags _hard_local_save_flags
+#define hard_local_irq_enable _hard_local_irq_enable
+#define hard_local_irq_disable _hard_local_irq_disable
+#define hard_local_irq_save _hard_local_irq_save
+#define hard_local_irq_restore _hard_local_irq_restore
+#define hard_safe_halt _hard_safe_halt
+
+#define local_irq_enable_noresched _hard_local_irq_enable
+#define local_save_flags _hard_local_save_flags
+#define local_irq_enable _hard_local_irq_enable
+#define local_irq_disable _hard_local_irq_disable
+#define local_irq_save _hard_local_irq_save
+#define local_irq_restore _hard_local_irq_restore
+#define irqs_disabled hard_irqs_disabled
+#define irqs_disabled_flags hard_irqs_disabled_flags
+#define safe_halt hard_safe_halt
+#endif
+
/* interrupt control.. */
-#define local_save_flags(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
-#define local_irq_restore(x) do { typecheck(unsigned long,x); if (irqs_disabled_flags(x)) trace_irqs_on(); else trace_irqs_on(); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
-#define local_irq_disable() do { __asm__ __volatile__("cli": : :"memory"); trace_irqs_off(); } while (0)
-#define local_irq_enable() do { trace_irqs_on(); __asm__ __volatile__("sti": : :"memory"); } while (0)
+#define _hard_local_save_flags(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
+#define _hard_local_irq_restore(x) do { typecheck(unsigned long,x); if (hard_irqs_disabled_flags(x)) trace_irqs_on(); else trace_irqs_on(); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
+#define _hard_local_irq_disable() do { __asm__ __volatile__("cli": : :"memory"); trace_irqs_off(); } while (0)
+#define _hard_local_irq_enable() do { trace_irqs_on(); __asm__ __volatile__("sti": : :"memory"); } while (0)
/* used in the idle loop; sti takes one instruction cycle to complete */
-#define safe_halt() do { trace_irqs_on(); __asm__ __volatile__("sti; hlt": : :"memory"); } while (0)
+#define _hard_safe_halt() do { trace_irqs_on(); __asm__ __volatile__("sti; hlt": : :"memory"); } while (0)
-#define irqs_disabled_flags(flags) \
+#define hard_irqs_disabled_flags(flags) \
({ \
!(flags & (1<<9)); \
})
-#define irqs_disabled() \
+#define hard_irqs_disabled() \
({ \
unsigned long flags; \
- local_save_flags(flags); \
- irqs_disabled_flags(flags); \
+ hard_local_save_flags(flags); \
+ hard_irqs_disabled_flags(flags); \
})
/* For spinlocks etc */
-#define local_irq_save(x) do { __asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory"); trace_irqs_off(); } while (0)
+#define _hard_local_irq_save(x) do { __asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory"); trace_irqs_off(); } while (0)
/*
* disable hlt during certain critical i/o operations
Index: linux-2.6.11/include/linux/hardirq.h
===================================================================
--- linux-2.6.11.orig/include/linux/hardirq.h 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/include/linux/hardirq.h 2005-06-08 00:48:03.000000000 +0000
@@ -25,6 +25,9 @@
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
+#define IRQSOFF_BITS 1
+#define PREEMPTACTIVE_BITS 1
+
#ifndef HARDIRQ_BITS
#define HARDIRQ_BITS 12
/*
@@ -40,16 +43,22 @@
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define PREEMPTACTIVE_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
+#define IRQSOFF_SHIFT (PREEMPTACTIVE_SHIFT + PREEMPTACTIVE_BITS)
+
#define __IRQ_MASK(x) ((1UL << (x))-1)
#define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
+#define IRQSOFF_MASK (__IRQ_MASK(IRQSOFF_BITS) << IRQSOFF_SHIFT)
+
#define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
+#define IRQSOFF_OFFSET (1UL << IRQSOFF_SHIFT)
#if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS))
#error PREEMPT_ACTIVE is too low!
#endif
@@ -58,6 +67,8 @@
#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
+#define irqs_off() (preempt_count() & IRQSOFF_MASK)
+
/*
* Are we doing bottom half or hardware interrupt processing?
* Are we in a softirq context? Interrupt context?
@@ -69,9 +80,9 @@
#if defined(CONFIG_PREEMPT) && \
!defined(CONFIG_PREEMPT_BKL) && \
!defined(CONFIG_PREEMPT_RT)
-# define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != kernel_locked())
+# define in_atomic() ((preempt_count() & ~(PREEMPT_ACTIVE|IRQSOFF_OFFSET)) != kernel_locked())
#else
-# define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != 0)
+# define in_atomic() ((preempt_count() & ~(PREEMPT_ACTIVE|IRQSOFF_OFFSET)) != 0)
#endif
#ifdef CONFIG_PREEMPT
Index: linux-2.6.11/include/linux/sched.h
===================================================================
--- linux-2.6.11.orig/include/linux/sched.h 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/include/linux/sched.h 2005-06-08 00:35:30.000000000 +0000
@@ -839,6 +839,9 @@ struct task_struct {
unsigned long preempt_trace_eip[MAX_PREEMPT_TRACE];
unsigned long preempt_trace_parent_eip[MAX_PREEMPT_TRACE];
#endif
+ unsigned long last_irq_disable[2];
+ unsigned long last_irq_enable[2];
+
/* realtime bits */
struct list_head delayed_put;
Index: linux-2.6.11/include/linux/seqlock.h
===================================================================
--- linux-2.6.11.orig/include/linux/seqlock.h 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/include/linux/seqlock.h 2005-06-08 00:35:30.000000000 +0000
@@ -305,26 +305,26 @@ do { \
* Possible sw/hw IRQ protected versions of the interfaces.
*/
#define write_seqlock_irqsave(lock, flags) \
- do { PICK_IRQOP2(local_irq_save, flags, lock); write_seqlock(lock); } while (0)
+ do { PICK_IRQOP2(hard_local_irq_save, flags, lock); write_seqlock(lock); } while (0)
#define write_seqlock_irq(lock) \
- do { PICK_IRQOP(local_irq_disable, lock); write_seqlock(lock); } while (0)
+ do { PICK_IRQOP(hard_local_irq_disable, lock); write_seqlock(lock); } while (0)
#define write_seqlock_bh(lock) \
do { PICK_IRQOP(local_bh_disable, lock); write_seqlock(lock); } while (0)
#define write_sequnlock_irqrestore(lock, flags) \
- do { write_sequnlock(lock); PICK_IRQOP2(local_irq_restore, flags, lock); preempt_check_resched(); } while(0)
+ do { write_sequnlock(lock); PICK_IRQOP2(hard_local_irq_restore, flags, lock); preempt_check_resched(); } while(0)
#define write_sequnlock_irq(lock) \
- do { write_sequnlock(lock); PICK_IRQOP(local_irq_enable, lock); preempt_check_resched(); } while(0)
+ do { write_sequnlock(lock); PICK_IRQOP(hard_local_irq_enable, lock); preempt_check_resched(); } while(0)
#define write_sequnlock_bh(lock) \
do { write_sequnlock(lock); PICK_IRQOP(local_bh_enable, lock); } while(0)
#define read_seqbegin_irqsave(lock, flags) \
- ({ PICK_IRQOP2(local_irq_save, flags, lock); read_seqbegin(lock); })
+ ({ PICK_IRQOP2(hard_local_irq_save, flags, lock); read_seqbegin(lock); })
#define read_seqretry_irqrestore(lock, iv, flags) \
({ \
int ret = read_seqretry(lock, iv); \
- PICK_IRQOP2(local_irq_restore, flags, lock); \
+ PICK_IRQOP2(hard_local_irq_restore, flags, lock); \
preempt_check_resched(); \
ret; \
})
Index: linux-2.6.11/include/linux/spinlock.h
===================================================================
--- linux-2.6.11.orig/include/linux/spinlock.h 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/include/linux/spinlock.h 2005-06-08 00:35:30.000000000 +0000
@@ -244,7 +244,7 @@ typedef struct {
({ \
local_irq_disable(); preempt_disable(); \
__raw_spin_trylock(lock) ? \
- 1 : ({ __preempt_enable_no_resched(); local_irq_enable(); preempt_check_resched(); 0; }); \
+ 1 : ({ __preempt_enable_no_resched(); local_irq_enable_noresched(); preempt_check_resched(); 0; }); \
})
#define _raw_spin_trylock_irqsave(lock, flags) \
@@ -384,7 +384,7 @@ do { \
do { \
__raw_spin_unlock(lock); \
__preempt_enable_no_resched(); \
- local_irq_enable(); \
+ local_irq_enable_noresched(); \
preempt_check_resched(); \
__release(lock); \
} while (0)
@@ -427,7 +427,7 @@ do { \
do { \
__raw_read_unlock(lock);\
__preempt_enable_no_resched(); \
- local_irq_enable(); \
+ local_irq_enable_noresched(); \
preempt_check_resched(); \
__release(lock); \
} while (0)
@@ -444,7 +444,7 @@ do { \
do { \
__raw_write_unlock(lock);\
__preempt_enable_no_resched(); \
- local_irq_enable(); \
+ local_irq_enable_noresched(); \
preempt_check_resched(); \
__release(lock); \
} while (0)
Index: linux-2.6.11/init/main.c
===================================================================
--- linux-2.6.11.orig/init/main.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/init/main.c 2005-06-08 00:51:57.000000000 +0000
@@ -428,6 +428,14 @@ asmlinkage void __init start_kernel(void
{
char * command_line;
extern struct kernel_param __start___param[], __stop___param[];
+#ifdef CONFIG_RT_IRQ_DISABLE
+ /*
+ * Force the soft IRQ state to mimic the hard state until
+ * we finish boot-up.
+ */
+ local_irq_disable();
+#endif
+
/*
* Interrupts are still disabled. Do necessary setups, then
* enable them
@@ -456,6 +464,13 @@ asmlinkage void __init start_kernel(void
* fragile until we cpu_idle() for the first time.
*/
preempt_disable();
+#ifdef CONFIG_RT_IRQ_DISABLE
+ /*
+ * Reset the irqs off flag after sched_init resets the preempt_count.
+ */
+ local_irq_disable();
+#endif
+
build_all_zonelists();
page_alloc_init();
early_init_hardirqs();
@@ -482,7 +497,12 @@ asmlinkage void __init start_kernel(void
if (panic_later)
panic(panic_later, panic_param);
profile_init();
- local_irq_enable();
+
+ /*
+ * Soft IRQ state will be enabled with the hard state.
+ */
+ hard_local_irq_enable();
+
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
initrd_start < min_low_pfn << PAGE_SHIFT) {
@@ -526,6 +546,9 @@ asmlinkage void __init start_kernel(void
acpi_early_init(); /* before LAPIC and SMP init */
+#ifdef CONFIG_RT_IRQ_DISABLE
+ WARN_ON(hard_irqs_disabled());
+#endif
/* Do the rest non-__init'ed, we're now alive */
rest_init();
}
@@ -568,6 +591,12 @@ static void __init do_initcalls(void)
msg = "disabled interrupts";
local_irq_enable();
}
+#ifdef CONFIG_RT_IRQ_DISABLE
+ if (hard_irqs_disabled()) {
+ msg = "disabled hard interrupts";
+ hard_local_irq_enable();
+ }
+#endif
if (msg) {
printk(KERN_WARNING "error in initcall at 0x%p: "
"returned with %s\n", *call, msg);
@@ -708,6 +737,9 @@ static int init(void * unused)
* The Bourne shell can be used instead of init if we are
* trying to recover a really broken machine.
*/
+#ifdef CONFIG_RT_IRQ_DISABLE
+ WARN_ON(hard_irqs_disabled());
+#endif
if (execute_command)
run_init_process(execute_command);
Index: linux-2.6.11/kernel/Makefile
===================================================================
--- linux-2.6.11.orig/kernel/Makefile 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/Makefile 2005-06-08 00:35:30.000000000 +0000
@@ -10,6 +10,7 @@ obj-y = sched.o fork.o exec_domain.o
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
obj-$(CONFIG_PREEMPT_RT) += rt.o
+obj-$(CONFIG_RT_IRQ_DISABLE) += irqs-off.o
obj-$(CONFIG_DEBUG_PREEMPT) += latency.o
obj-$(CONFIG_LATENCY_TIMING) += latency.o
Index: linux-2.6.11/kernel/exit.c
===================================================================
--- linux-2.6.11.orig/kernel/exit.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/exit.c 2005-06-08 00:35:30.000000000 +0000
@@ -844,7 +844,7 @@ fastcall NORET_TYPE void do_exit(long co
check_no_held_locks(tsk);
/* PF_DEAD causes final put_task_struct after we schedule. */
again:
- local_irq_disable();
+ hard_local_irq_disable();
tsk->flags |= PF_DEAD;
__schedule();
printk(KERN_ERR "BUG: dead task %s:%d back from the grave!\n",
Index: linux-2.6.11/kernel/irq/autoprobe.c
===================================================================
--- linux-2.6.11.orig/kernel/irq/autoprobe.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/irq/autoprobe.c 2005-06-08 00:35:30.000000000 +0000
@@ -39,10 +39,12 @@ unsigned long probe_irq_on(void)
for (i = NR_IRQS-1; i > 0; i--) {
desc = irq_desc + i;
- spin_lock_irq(&desc->lock);
+ hard_local_irq_disable();
+ spin_lock(&desc->lock);
if (!irq_desc[i].action)
irq_desc[i].handler->startup(i);
- spin_unlock_irq(&desc->lock);
+ spin_unlock(&desc->lock);
+ hard_local_irq_enable();
}
/*
@@ -58,13 +60,15 @@ unsigned long probe_irq_on(void)
for (i = NR_IRQS-1; i > 0; i--) {
desc = irq_desc + i;
- spin_lock_irq(&desc->lock);
+ hard_local_irq_disable();
+ spin_lock(&desc->lock);
if (!desc->action) {
desc->status |= IRQ_AUTODETECT | IRQ_WAITING;
if (desc->handler->startup(i))
desc->status |= IRQ_PENDING;
}
- spin_unlock_irq(&desc->lock);
+ spin_unlock(&desc->lock);
+ hard_local_irq_enable();
}
/*
@@ -80,7 +84,8 @@ unsigned long probe_irq_on(void)
irq_desc_t *desc = irq_desc + i;
unsigned int status;
- spin_lock_irq(&desc->lock);
+ hard_local_irq_disable();
+ spin_lock(&desc->lock);
status = desc->status;
if (status & IRQ_AUTODETECT) {
@@ -92,7 +97,8 @@ unsigned long probe_irq_on(void)
if (i < 32)
val |= 1 << i;
}
- spin_unlock_irq(&desc->lock);
+ spin_unlock(&desc->lock);
+ hard_local_irq_enable();
}
return val;
@@ -122,7 +128,8 @@ unsigned int probe_irq_mask(unsigned lon
irq_desc_t *desc = irq_desc + i;
unsigned int status;
- spin_lock_irq(&desc->lock);
+ hard_local_irq_disable();
+ spin_lock(&desc->lock);
status = desc->status;
if (status & IRQ_AUTODETECT) {
@@ -132,7 +139,8 @@ unsigned int probe_irq_mask(unsigned lon
desc->status = status & ~IRQ_AUTODETECT;
desc->handler->shutdown(i);
}
- spin_unlock_irq(&desc->lock);
+ spin_unlock(&desc->lock);
+ hard_local_irq_enable();
}
up(&probe_sem);
@@ -165,7 +173,8 @@ int probe_irq_off(unsigned long val)
irq_desc_t *desc = irq_desc + i;
unsigned int status;
- spin_lock_irq(&desc->lock);
+ hard_local_irq_disable();
+ spin_lock(&desc->lock);
status = desc->status;
if (status & IRQ_AUTODETECT) {
@@ -177,7 +186,8 @@ int probe_irq_off(unsigned long val)
desc->status = status & ~IRQ_AUTODETECT;
desc->handler->shutdown(i);
}
- spin_unlock_irq(&desc->lock);
+ spin_unlock(&desc->lock);
+ hard_local_irq_enable();
}
up(&probe_sem);
Index: linux-2.6.11/kernel/irq/handle.c
===================================================================
--- linux-2.6.11.orig/kernel/irq/handle.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/irq/handle.c 2005-06-08 00:35:30.000000000 +0000
@@ -113,7 +113,7 @@ fastcall int handle_IRQ_event(unsigned i
* IRQ handlers:
*/
if (!hardirq_count() || !(action->flags & SA_INTERRUPT))
- local_irq_enable();
+ hard_local_irq_enable();
do {
unsigned int preempt_count = preempt_count();
@@ -133,10 +133,10 @@ fastcall int handle_IRQ_event(unsigned i
} while (action);
if (status & SA_SAMPLE_RANDOM) {
- local_irq_enable();
+ hard_local_irq_enable();
add_interrupt_randomness(irq);
}
- local_irq_disable();
+ hard_local_irq_disable();
return retval;
}
@@ -157,6 +157,10 @@ fastcall notrace unsigned int __do_IRQ(u
struct irqaction * action;
unsigned int status;
+#ifdef CONFIG_RT_IRQ_DISABLE
+ unsigned long flags;
+ local_irq_save(flags);
+#endif
kstat_this_cpu.irqs[irq]++;
if (desc->status & IRQ_PER_CPU) {
irqreturn_t action_ret;
@@ -241,6 +245,9 @@ out:
out_no_end:
spin_unlock(&desc->lock);
+#ifdef CONFIG_RT_IRQ_DISABLE
+ local_irq_restore(flags);
+#endif
return 1;
}
Index: linux-2.6.11/kernel/irq/manage.c
===================================================================
--- linux-2.6.11.orig/kernel/irq/manage.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/irq/manage.c 2005-06-08 00:55:05.000000000 +0000
@@ -59,13 +59,15 @@ void disable_irq_nosync(unsigned int irq
{
irq_desc_t *desc = irq_desc + irq;
unsigned long flags;
-
- spin_lock_irqsave(&desc->lock, flags);
+
+ _hard_local_irq_save(flags);
+ spin_lock(&desc->lock);
if (!desc->depth++) {
desc->status |= IRQ_DISABLED;
desc->handler->disable(irq);
}
- spin_unlock_irqrestore(&desc->lock, flags);
+ spin_unlock(&desc->lock);
+ _hard_local_irq_restore(flags);
}
EXPORT_SYMBOL(disable_irq_nosync);
@@ -108,7 +110,8 @@ void enable_irq(unsigned int irq)
irq_desc_t *desc = irq_desc + irq;
unsigned long flags;
- spin_lock_irqsave(&desc->lock, flags);
+ _hard_local_irq_save(flags);
+ spin_lock(&desc->lock);
switch (desc->depth) {
case 0:
WARN_ON(1);
@@ -127,7 +130,8 @@ void enable_irq(unsigned int irq)
default:
desc->depth--;
}
- spin_unlock_irqrestore(&desc->lock, flags);
+ spin_unlock(&desc->lock);
+ _hard_local_irq_restore(flags);
}
EXPORT_SYMBOL(enable_irq);
@@ -203,7 +207,8 @@ int setup_irq(unsigned int irq, struct i
/*
* The following block of code has to be executed atomically
*/
- spin_lock_irqsave(&desc->lock,flags);
+ _hard_local_irq_save(flags);
+ spin_lock(&desc->lock);
p = &desc->action;
if ((old = *p) != NULL) {
/* Can't share interrupts unless both agree to */
@@ -236,7 +241,8 @@ int setup_irq(unsigned int irq, struct i
else
desc->handler->enable(irq);
}
- spin_unlock_irqrestore(&desc->lock,flags);
+ spin_unlock(&desc->lock);
+ _hard_local_irq_restore(flags);
new->irq = irq;
register_irq_proc(irq);
@@ -270,7 +276,8 @@ void free_irq(unsigned int irq, void *de
return;
desc = irq_desc + irq;
- spin_lock_irqsave(&desc->lock,flags);
+ _hard_local_irq_save(flags);
+ spin_lock(&desc->lock);
p = &desc->action;
for (;;) {
struct irqaction * action = *p;
@@ -292,7 +299,8 @@ void free_irq(unsigned int irq, void *de
desc->handler->disable(irq);
}
recalculate_desc_flags(desc);
- spin_unlock_irqrestore(&desc->lock,flags);
+ spin_unlock(&desc->lock);
+ _hard_local_irq_restore(flags);
unregister_handler_proc(irq, action);
/* Make sure it's not being used on another CPU */
@@ -301,7 +309,8 @@ void free_irq(unsigned int irq, void *de
return;
}
printk(KERN_ERR "Trying to free free IRQ%d\n",irq);
- spin_unlock_irqrestore(&desc->lock,flags);
+ spin_unlock(&desc->lock);
+ _hard_local_irq_restore(flags);
return;
}
}
@@ -409,7 +418,7 @@ static void do_hardirq(struct irq_desc *
struct irqaction * action;
unsigned int irq = desc - irq_desc;
- local_irq_disable();
+ hard_local_irq_disable();
if (desc->status & IRQ_INPROGRESS) {
action = desc->action;
@@ -420,9 +429,10 @@ static void do_hardirq(struct irq_desc *
if (action) {
spin_unlock(&desc->lock);
action_ret = handle_IRQ_event(irq, NULL,action);
- local_irq_enable();
+ hard_local_irq_enable();
cond_resched_all();
- spin_lock_irq(&desc->lock);
+ hard_local_irq_disable();
+ spin_lock(&desc->lock);
}
if (!noirqdebug)
note_interrupt(irq, desc, action_ret);
@@ -438,7 +448,7 @@ static void do_hardirq(struct irq_desc *
desc->handler->end(irq);
spin_unlock(&desc->lock);
}
- local_irq_enable();
+ hard_local_irq_enable();
if (waitqueue_active(&desc->wait_for_handler))
wake_up(&desc->wait_for_handler);
}
@@ -474,7 +484,7 @@ static int do_irqd(void * __desc)
do_hardirq(desc);
cond_resched_all();
__do_softirq();
- local_irq_enable();
+ hard_local_irq_enable();
#ifdef CONFIG_SMP
/*
* Did IRQ affinities change?
Index: linux-2.6.11/kernel/irqs-off.c
===================================================================
--- linux-2.6.11.orig/kernel/irqs-off.c 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.11/kernel/irqs-off.c 2005-06-08 01:05:41.000000000 +0000
@@ -0,0 +1,99 @@
+/*
+ * kernel/irqs-off.c
+ *
+ * IRQ soft state managment
+ *
+ * Author: Daniel Walker <dwalker@mvista.com>
+ *
+ * 2005 (c) MontaVista Software, Inc. This file is licensed under
+ * the terms of the GNU General Public License version 2. This program
+ * is licensed "as is" without any warranty of any kind, whether express
+ * or implied.
+ */
+
+#include <linux/hardirq.h>
+#include <linux/preempt.h>
+#include <linux/kallsyms.h>
+
+#include <linux/module.h>
+#include <asm/system.h>
+
+static int irq_trace;
+
+void irq_trace_enable(void) { irq_trace = 1; }
+void irq_trace_disable(void) { irq_trace = 0; }
+
+unsigned int ___local_save_flags()
+{
+ return irqs_off();
+}
+EXPORT_SYMBOL(___local_save_flags);
+
+void local_irq_enable_noresched(void)
+{
+ if (irq_trace) {
+ current->last_irq_enable[0] = (unsigned long)__builtin_return_address(0);
+ //current->last_irq_enable[1] = (unsigned long)__builtin_return_address(1);
+ }
+
+ if (irqs_off()) sub_preempt_count(IRQSOFF_OFFSET);
+}
+EXPORT_SYMBOL(local_irq_enable_noresched);
+
+void local_irq_enable(void)
+{
+ if (irq_trace) {
+ current->last_irq_enable[0] = (unsigned long)__builtin_return_address(0);
+ //current->last_irq_enable[1] = (unsigned long)__builtin_return_address(1);
+ }
+ if (irqs_off()) sub_preempt_count(IRQSOFF_OFFSET);
+
+ //local_irq_enable_noresched();
+ preempt_check_resched();
+}
+EXPORT_SYMBOL(local_irq_enable);
+
+void local_irq_disable(void)
+{
+ if (irq_trace) {
+ current->last_irq_disable[0] = (unsigned long)__builtin_return_address(0);
+ //current->last_irq_disable[1] = (unsigned long)__builtin_return_address(1);
+ }
+ if (!irqs_off()) add_preempt_count(IRQSOFF_OFFSET);
+}
+EXPORT_SYMBOL(local_irq_disable);
+
+unsigned long irqs_disabled_flags(unsigned long flags)
+{
+ return (flags & IRQSOFF_MASK);
+}
+EXPORT_SYMBOL(irqs_disabled_flags);
+
+void local_irq_restore(unsigned long flags)
+{
+ if (!irqs_disabled_flags(flags)) local_irq_enable();
+}
+EXPORT_SYMBOL(local_irq_restore);
+
+unsigned long irqs_disabled(void)
+{
+ return irqs_off();
+}
+EXPORT_SYMBOL(irqs_disabled);
+
+void print_irq_traces(struct task_struct *task)
+{
+ printk("Soft state access: (%s)\n", (hard_irqs_disabled()) ? "Hard disabled" : "Not disabled");
+ printk(".. [<%08lx>] .... ", task->last_irq_disable[0]);
+ print_symbol("%s\n", task->last_irq_disable[0]);
+ printk(".....[<%08lx>] .. ( <= ",
+ task->last_irq_disable[1]);
+ print_symbol("%s)\n", task->last_irq_disable[1]);
+
+ printk(".. [<%08lx>] .... ", task->last_irq_enable[0]);
+ print_symbol("%s\n", task->last_irq_enable[0]);
+ printk(".....[<%08lx>] .. ( <= ",
+ task->last_irq_enable[1]);
+ print_symbol("%s)\n", task->last_irq_enable[1]);
+ printk("\n");
+}
Index: linux-2.6.11/kernel/latency.c
===================================================================
--- linux-2.6.11.orig/kernel/latency.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/latency.c 2005-06-08 01:05:34.000000000 +0000
@@ -108,12 +108,15 @@ enum trace_flag_type
TRACE_FLAG_NEED_RESCHED = 0x02,
TRACE_FLAG_HARDIRQ = 0x04,
TRACE_FLAG_SOFTIRQ = 0x08,
+#ifdef CONFIG_RT_IRQ_DISABLE
+ TRACE_FLAG_IRQS_HARD_OFF = 0x10,
+#endif
};
#ifdef CONFIG_LATENCY_TRACE
-#define MAX_TRACE (unsigned long)(4096-1)
+#define MAX_TRACE (unsigned long)(8192-1)
#define CMDLINE_BYTES 16
@@ -263,6 +266,9 @@ ____trace(int cpu, enum trace_type type,
entry->cpu = cpu;
#endif
entry->flags = (irqs_disabled() ? TRACE_FLAG_IRQS_OFF : 0) |
+#ifdef CONFIG_RT_IRQ_DISABLE
+ (hard_irqs_disabled() ? TRACE_FLAG_IRQS_HARD_OFF : 0)|
+#endif
((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
(_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0);
@@ -724,7 +730,12 @@ print_generic(struct seq_file *m, struct
seq_printf(m, "%8.8s-%-5d ", pid_to_cmdline(entry->pid), entry->pid);
seq_printf(m, "%d", entry->cpu);
seq_printf(m, "%c%c",
- (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' : '.',
+ (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
+#ifdef CONFIG_RT_IRQ_DISABLE
+ (entry->flags & TRACE_FLAG_IRQS_HARD_OFF) ? 'D' : '.',
+#else
+ '.',
+#endif
(entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'n' : '.');
hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
@@ -1212,9 +1223,9 @@ void notrace trace_irqs_off_lowlevel(voi
{
unsigned long flags;
- local_save_flags(flags);
+ hard_local_save_flags(flags);
- if (!irqs_off_preempt_count() && irqs_disabled_flags(flags))
+ if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
__start_critical_timing(CALLER_ADDR0, 0);
}
@@ -1222,9 +1233,9 @@ void notrace trace_irqs_off(void)
{
unsigned long flags;
- local_save_flags(flags);
+ hard_local_save_flags(flags);
- if (!irqs_off_preempt_count() && irqs_disabled_flags(flags))
+ if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
__start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
}
@@ -1234,9 +1245,9 @@ void notrace trace_irqs_on(void)
{
unsigned long flags;
- local_save_flags(flags);
+ hard_local_save_flags(flags);
- if (!irqs_off_preempt_count() && irqs_disabled_flags(flags))
+ if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
__stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
}
@@ -1633,8 +1644,13 @@ static void print_entry(struct trace_ent
printk("%-5d ", entry->pid);
printk("%c%c",
- (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' : '.',
- (entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'n' : '.');
+ (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
+#ifdef CONFIG_RT_IRQ_DISABLE
+ (entry->flags & TRACE_FLAG_IRQS_HARD_OFF) ? 'D' : '.',
+#else
+ '.',
+#endif
+ (entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'n' : '.');
hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
softirq = entry->flags & TRACE_FLAG_SOFTIRQ;
Index: linux-2.6.11/kernel/printk.c
===================================================================
--- linux-2.6.11.orig/kernel/printk.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/printk.c 2005-06-08 00:35:30.000000000 +0000
@@ -529,7 +529,8 @@ asmlinkage int vprintk(const char *fmt,
zap_locks();
/* This stops the holder of console_sem just where we want him */
- spin_lock_irqsave(&logbuf_lock, flags);
+ local_irq_save(flags);
+ spin_lock(&logbuf_lock);
/* Emit the output into the temporary buffer */
printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
@@ -599,16 +600,18 @@ asmlinkage int vprintk(const char *fmt,
* CPU until it is officially up. We shouldn't be calling into
* random console drivers on a CPU which doesn't exist yet..
*/
- spin_unlock_irqrestore(&logbuf_lock, flags);
+ spin_unlock(&logbuf_lock);
+ local_irq_restore(flags);
goto out;
}
- if (!down_trylock(&console_sem)) {
+ if (!in_interrupt() && !down_trylock(&console_sem)) {
console_locked = 1;
/*
* We own the drivers. We can drop the spinlock and let
* release_console_sem() print the text
*/
- spin_unlock_irqrestore(&logbuf_lock, flags);
+ spin_unlock(&logbuf_lock);
+ local_irq_restore(flags);
console_may_schedule = 0;
release_console_sem();
} else {
@@ -617,7 +620,8 @@ asmlinkage int vprintk(const char *fmt,
* allows the semaphore holder to proceed and to call the
* console drivers with the output which we just produced.
*/
- spin_unlock_irqrestore(&logbuf_lock, flags);
+ spin_unlock(&logbuf_lock);
+ local_irq_restore(flags);
}
out:
return printed_len;
@@ -750,7 +754,7 @@ void release_console_sem(void)
* case only.
*/
#ifdef CONFIG_PREEMPT_RT
- if (!in_atomic() && !irqs_disabled())
+ if (!in_atomic() && !irqs_disabled() && !hard_irqs_disabled())
#endif
if (wake_klogd && !oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);
Index: linux-2.6.11/kernel/sched.c
===================================================================
--- linux-2.6.11.orig/kernel/sched.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/sched.c 2005-06-08 06:06:37.000000000 +0000
@@ -307,11 +307,12 @@ static inline runqueue_t *task_rq_lock(t
struct runqueue *rq;
repeat_lock_task:
- local_irq_save(*flags);
+ hard_local_irq_save(*flags);
rq = task_rq(p);
spin_lock(&rq->lock);
if (unlikely(rq != task_rq(p))) {
- spin_unlock_irqrestore(&rq->lock, *flags);
+ spin_unlock(&rq->lock);
+ hard_local_irq_restore(*flags);
goto repeat_lock_task;
}
return rq;
@@ -320,7 +321,8 @@ repeat_lock_task:
static inline void task_rq_unlock(runqueue_t *rq, unsigned long *flags)
__releases(rq->lock)
{
- spin_unlock_irqrestore(&rq->lock, *flags);
+ spin_unlock(&rq->lock);
+ hard_local_irq_restore(*flags);
}
#ifdef CONFIG_SCHEDSTATS
@@ -426,7 +428,7 @@ static inline runqueue_t *this_rq_lock(v
{
runqueue_t *rq;
- local_irq_disable();
+ hard_local_irq_disable();
rq = this_rq();
spin_lock(&rq->lock);
@@ -1213,9 +1215,10 @@ out:
*/
if (_need_resched() && !irqs_disabled_flags(flags) && !preempt_count())
preempt_schedule_irq();
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
#else
- spin_unlock_irqrestore(&rq->lock, flags);
+ spin_unlock(&rq->lock);
+ hard_local_irq_restore(flags);
#endif
/* no need to check for preempt here - we just handled it */
@@ -1289,7 +1292,7 @@ void fastcall sched_fork(task_t *p)
* total amount of pending timeslices in the system doesn't change,
* resulting in more scheduling fairness.
*/
- local_irq_disable();
+ hard_local_irq_disable();
p->time_slice = (current->time_slice + 1) >> 1;
/*
* The remainder of the first timeslice might be recovered by
@@ -1307,10 +1310,10 @@ void fastcall sched_fork(task_t *p)
current->time_slice = 1;
preempt_disable();
scheduler_tick();
- local_irq_enable();
+ hard_local_irq_enable();
preempt_enable();
} else
- local_irq_enable();
+ hard_local_irq_enable();
}
/*
@@ -1496,7 +1499,7 @@ asmlinkage void schedule_tail(task_t *pr
preempt_disable(); // TODO: move this to fork setup
finish_task_switch(prev);
__preempt_enable_no_resched();
- local_irq_enable();
+ hard_local_irq_enable();
preempt_check_resched();
if (current->set_child_tid)
@@ -2623,7 +2626,7 @@ void scheduler_tick(void)
task_t *p = current;
unsigned long long now = sched_clock();
- BUG_ON(!irqs_disabled());
+ BUG_ON(!hard_irqs_disabled());
update_cpu_clock(p, rq, now);
@@ -2938,7 +2941,8 @@ void __sched __schedule(void)
run_time /= (CURRENT_BONUS(prev) ? : 1);
cpu = smp_processor_id();
- spin_lock_irq(&rq->lock);
+ hard_local_irq_disable();
+ spin_lock(&rq->lock);
switch_count = &prev->nvcsw; // TODO: temporary - to see it in vmstat
if ((prev->state & ~TASK_RUNNING_MUTEX) &&
@@ -3078,7 +3082,7 @@ asmlinkage void __sched schedule(void)
/*
* Test if we have interrupts disabled.
*/
- if (unlikely(irqs_disabled())) {
+ if (unlikely(irqs_disabled() || hard_irqs_disabled())) {
stop_trace();
printk(KERN_ERR "BUG: scheduling with irqs disabled: "
"%s/0x%08x/%d\n",
@@ -3096,7 +3100,7 @@ asmlinkage void __sched schedule(void)
do {
__schedule();
} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
- local_irq_enable(); // TODO: do sti; ret
+ hard_local_irq_enable(); // TODO: do sti; ret
}
EXPORT_SYMBOL(schedule);
@@ -3166,11 +3170,11 @@ asmlinkage void __sched preempt_schedule
* If there is a non-zero preempt_count or interrupts are disabled,
* we do not want to preempt the current task. Just return..
*/
- if (unlikely(ti->preempt_count || irqs_disabled()))
+ if (unlikely(ti->preempt_count || irqs_disabled() || hard_irqs_disabled()))
return;
need_resched:
- local_irq_disable();
+ hard_local_irq_disable();
add_preempt_count(PREEMPT_ACTIVE);
/*
* We keep the big kernel semaphore locked, but we
@@ -3189,7 +3193,7 @@ need_resched:
barrier();
if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
goto need_resched;
- local_irq_enable();
+ hard_local_irq_enable();
}
EXPORT_SYMBOL(preempt_schedule);
@@ -3217,6 +3221,9 @@ asmlinkage void __sched preempt_schedule
return;
need_resched:
+#ifdef CONFIG_RT_IRQ_DISABLE
+ hard_local_irq_disable();
+#endif
add_preempt_count(PREEMPT_ACTIVE);
/*
* We keep the big kernel semaphore locked, but we
@@ -3228,7 +3235,12 @@ need_resched:
task->lock_depth = -1;
#endif
__schedule();
- local_irq_disable();
+
+ _hard_local_irq_disable();
+#ifdef CONFIG_RT_IRQ_DISABLE
+ local_irq_enable_noresched();
+#endif
+
#ifdef CONFIG_PREEMPT_BKL
task->lock_depth = saved_lock_depth;
#endif
@@ -4160,7 +4172,7 @@ asmlinkage long sys_sched_yield(void)
__preempt_enable_no_resched();
__schedule();
- local_irq_enable();
+ hard_local_irq_enable();
preempt_check_resched();
return 0;
@@ -4173,11 +4185,11 @@ static void __cond_resched(void)
if (preempt_count() & PREEMPT_ACTIVE)
return;
do {
- local_irq_disable();
+ hard_local_irq_disable();
add_preempt_count(PREEMPT_ACTIVE);
__schedule();
} while (need_resched());
- local_irq_enable();
+ hard_local_irq_enable();
}
int __sched cond_resched(void)
Index: linux-2.6.11/kernel/timer.c
===================================================================
--- linux-2.6.11.orig/kernel/timer.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/kernel/timer.c 2005-06-08 01:09:15.000000000 +0000
@@ -437,9 +437,10 @@ static int cascade(tvec_base_t *base, tv
static inline void __run_timers(tvec_base_t *base)
{
struct timer_list *timer;
+ unsigned long jiffies_sample = jiffies;
spin_lock_irq(&base->lock);
- while (time_after_eq(jiffies, base->timer_jiffies)) {
+ while (time_after_eq(jiffies_sample, base->timer_jiffies)) {
struct list_head work_list = LIST_HEAD_INIT(work_list);
struct list_head *head = &work_list;
int index = base->timer_jiffies & TVR_MASK;
Index: linux-2.6.11/lib/Kconfig.RT
===================================================================
--- linux-2.6.11.orig/lib/Kconfig.RT 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/lib/Kconfig.RT 2005-06-08 06:56:35.000000000 +0000
@@ -81,6 +81,25 @@ config PREEMPT_RT
endchoice
+config RT_IRQ_DISABLE
+ bool "Real-Time IRQ Disable"
+ default y
+ depends on PREEMPT_RT
+ help
+ This option will remove all local_irq_enable() and
+ local_irq_disable() calls and replace them with soft
+ versions. This will decrease the frequency with which
+ interrupts are disabled.
+
+ All interrupts that are flagged with SA_NODELAY are
+ considered hard interrupts. This option will force
+ SA_NODELAY interrupts to run even when they normally
+ wouldn't be enabled.
+
+ Select this if you plan to use Linux in an
+ embedded environment that needs low interrupt
+ latency.
+
config PREEMPT
bool
default y
Index: linux-2.6.11/lib/kernel_lock.c
===================================================================
--- linux-2.6.11.orig/lib/kernel_lock.c 2005-06-08 00:33:21.000000000 +0000
+++ linux-2.6.11/lib/kernel_lock.c 2005-06-08 00:35:30.000000000 +0000
@@ -98,7 +98,8 @@ int __lockfunc __reacquire_kernel_lock(v
struct task_struct *task = current;
int saved_lock_depth = task->lock_depth;
- local_irq_enable();
+ hard_local_irq_enable();
+
BUG_ON(saved_lock_depth < 0);
task->lock_depth = -1;
@@ -107,8 +108,8 @@ int __lockfunc __reacquire_kernel_lock(v
task->lock_depth = saved_lock_depth;
- local_irq_disable();
-
+ hard_local_irq_disable();
+
return 0;
}
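The soft interrupt-disable model the patch implements can be sketched in isolation. The following is a minimal, illustrative user-space model, not the kernel's actual code: the `IRQSOFF_OFFSET` value and function names mirror the patch, and a plain global stands in for the per-thread preempt count.

```c
#include <assert.h>

/* Illustrative model of the patch's soft irq-disable: "disabling
 * interrupts" only sets a reserved bit in a (per-task) preempt count,
 * so hardware interrupts stay enabled and only preemption is held off.
 * IRQSOFF_OFFSET and the names mirror the patch; the global counter
 * stands in for the real per-thread preempt count. */
#define IRQSOFF_OFFSET 0x10000000u

static unsigned int preempt_count;

static unsigned int irqs_off(void)
{
	return preempt_count & IRQSOFF_OFFSET;
}

static void soft_local_irq_disable(void)
{
	if (!irqs_off())
		preempt_count += IRQSOFF_OFFSET; /* a flag, not a nesting depth */
}

static void soft_local_irq_enable(void)
{
	if (irqs_off())
		preempt_count -= IRQSOFF_OFFSET;
	/* the real code would run preempt_check_resched() here */
}
```

Because the flag lives in the preempt count, a section "protected" this way is exactly a preempt-disabled section, which is why the cover letter expects no increase in preemption latency from the conversion.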
* Re: [PATCH] local_irq_disable removal
2005-06-08 7:08 [PATCH] local_irq_disable removal Daniel Walker
@ 2005-06-08 11:21 ` Ingo Molnar
2005-06-08 20:33 ` Daniel Walker
2005-06-10 23:37 ` Esben Nielsen
2005-06-11 16:51 ` Christoph Hellwig
2 siblings, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2005-06-08 11:21 UTC (permalink / raw)
To: Daniel Walker; +Cc: linux-kernel, sdietrich
* Daniel Walker <dwalker@mvista.com> wrote:
> Implementation:
>
> I've written code that removes 70% of all interrupt disable
> sections in the current real time kernel. These interrupt disable
> sections are replaced with a special preempt disable section. Since an
> interrupt disable section implicitly disables preemption, there should
> be no increase in preemption latency due to this change.
great. I've applied a cleaned up / fixed version of this and have
released it as part of the -48-00 patch, which is available at the usual
place:
http://redhat.com/~mingo/realtime-preempt/
I've attached below the delta relative to your patch. The changes are:
- fixed a soft-local_irq_restore() bug: it didn't re-disable the IRQ
flag if the flags passed in had it set.
- fixed SMP support - both the scheduler and the low-level SMP code were
not fully converted to the soft-flag assumptions. The PREEMPT_RT
kernel now boots fine on a 2-way/4-way x86 box.
- fixed the APIC code
- fixed irq-latency tracing and other tracing assumptions
- fixed DEBUG_RT_DEADLOCK_DETECT - we checked for the wrong irq flags
- added debug code to find IRQ flag mismatches: mixing the CPU and soft
flags is lethal, but detectable.
- simplified the code, which should thus also be faster: introduced the
mask_preempt_count/unmask_preempt_count primitives and made the
soft-flag code use them.
- cleaned up the interdependencies of the soft-flag functions - they
now don't call each other anymore; they all use inlined code for
maximum performance.
- made the soft IRQ flag an unconditional feature of PREEMPT_RT: once
it works properly there's no reason to ever disable it under
PREEMPT_RT.
- renamed hard_ to raw_, to bring it in line with other constructs in
PREEMPT_RT.
- cleaned up the system.h impact by creating linux/rt_irq.h. Made the
naming consistent all across.
- cleaned up the preempt.h impact and updated the comments.
- fixed smp_processor_id() debugging: we have to check for the CPU irq
flag too.
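The restore bug in the first item above is easy to see in a standalone sketch. This is illustrative only: the names follow the mask_preempt_count/unmask_preempt_count primitives the delta introduces, and the global counter is a stand-in for the per-thread preempt count.

```c
#include <assert.h>

/* Sketch of the fixed soft local_irq_restore(): restoring a flags
 * value that has the soft bit set must re-disable interrupts, not
 * merely skip the enable path. The broken version only ever enabled. */
#define IRQSOFF_MASK 0x10000000ul

static unsigned long soft_count;

static void mask_preempt_count(unsigned long mask)   { soft_count |= mask;  }
static void unmask_preempt_count(unsigned long mask) { soft_count &= ~mask; }

static unsigned long soft_local_irq_save(void)
{
	unsigned long flags = soft_count & IRQSOFF_MASK;

	mask_preempt_count(IRQSOFF_MASK);
	return flags;
}

static void soft_local_irq_restore(unsigned long flags)
{
	if (flags & IRQSOFF_MASK)
		mask_preempt_count(IRQSOFF_MASK);   /* the fix: re-disable */
	else
		unmask_preempt_count(IRQSOFF_MASK); /* plus a resched check */
}
```

With the fix, a save/restore pair nested inside an already soft-disabled section leaves the section disabled, matching the semantics of the hardware flag it replaces.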
Ingo
--- kernel/rt.c.orig
+++ kernel/rt.c
@@ -80,7 +80,7 @@ void deadlock_trace_off(void)
#define trace_lock_irq(lock) \
do { \
- local_irq_disable(); \
+ raw_local_irq_disable(); \
if (trace_on) \
spin_lock(lock); \
} while (0)
@@ -95,13 +95,13 @@ void deadlock_trace_off(void)
do { \
if (trace_on) \
spin_unlock(lock); \
- local_irq_enable(); \
+ raw_local_irq_enable(); \
preempt_check_resched(); \
} while (0)
#define trace_lock_irqsave(lock, flags) \
do { \
- local_irq_save(flags); \
+ raw_local_irq_save(flags); \
if (trace_on) \
spin_lock(lock); \
} while (0)
@@ -110,7 +110,7 @@ void deadlock_trace_off(void)
do { \
if (trace_on) \
spin_unlock(lock); \
- local_irq_restore(flags); \
+ raw_local_irq_restore(flags); \
preempt_check_resched(); \
} while (0)
@@ -137,9 +137,9 @@ do { \
} \
} while (0)
-# define trace_local_irq_disable() local_irq_disable()
-# define trace_local_irq_enable() local_irq_enable()
-# define trace_local_irq_restore(flags) local_irq_restore(flags)
+# define trace_local_irq_disable() raw_local_irq_disable()
+# define trace_local_irq_enable() raw_local_irq_enable()
+# define trace_local_irq_restore(flags) raw_local_irq_restore(flags)
# define TRACE_BUG_ON(c) do { if (c) TRACE_BUG(); } while (0)
#else
# define trace_lock_irq(lock) preempt_disable()
@@ -928,7 +928,7 @@ static void __sched __down(struct rt_mut
struct rt_mutex_waiter waiter;
trace_lock_irqsave(&trace_lock, flags);
- TRACE_BUG_ON(!irqs_disabled());
+ TRACE_BUG_ON(!raw_irqs_disabled());
spin_lock(&lock->wait_lock);
init_lists(lock);
@@ -951,7 +951,7 @@ static void __sched __down(struct rt_mut
plist_init(&waiter.list, task->prio);
task_blocks_on_lock(&waiter, task, lock, eip);
- TRACE_BUG_ON(!irqs_disabled());
+ TRACE_BUG_ON(!raw_irqs_disabled());
/* we don't need to touch the lock struct anymore */
spin_unlock(&lock->wait_lock);
trace_unlock_irqrestore(&trace_lock, flags);
@@ -1025,7 +1025,7 @@ static void __sched __down_mutex(struct
int got_wakeup = 0;
trace_lock_irqsave(&trace_lock, flags);
- TRACE_BUG_ON(!irqs_disabled());
+ TRACE_BUG_ON(!raw_irqs_disabled());
__raw_spin_lock(&lock->wait_lock);
init_lists(lock);
@@ -1046,7 +1046,7 @@ static void __sched __down_mutex(struct
plist_init(&waiter.list, task->prio);
task_blocks_on_lock(&waiter, task, lock, eip);
- TRACE_BUG_ON(!irqs_disabled());
+ TRACE_BUG_ON(!raw_irqs_disabled());
/*
* Here we save whatever state the task was in originally,
* we'll restore it at the end of the function and we'll
@@ -1165,7 +1165,7 @@ static int __sched __down_interruptible(
int ret;
trace_lock_irqsave(&trace_lock, flags);
- TRACE_BUG_ON(!irqs_disabled());
+ TRACE_BUG_ON(!raw_irqs_disabled());
spin_lock(&lock->wait_lock);
init_lists(lock);
@@ -1188,7 +1188,7 @@ static int __sched __down_interruptible(
plist_init(&waiter.list, task->prio);
task_blocks_on_lock(&waiter, task, lock, eip);
- TRACE_BUG_ON(!irqs_disabled());
+ TRACE_BUG_ON(!raw_irqs_disabled());
/* we don't need to touch the lock struct anymore */
spin_unlock(&lock->wait_lock);
trace_unlock_irqrestore(&trace_lock, flags);
@@ -1258,7 +1258,7 @@ static int __down_trylock(struct rt_mute
int ret = 0;
trace_lock_irqsave(&trace_lock, flags);
- TRACE_BUG_ON(!irqs_disabled());
+ TRACE_BUG_ON(!raw_irqs_disabled());
spin_lock(&lock->wait_lock);
init_lists(lock);
@@ -1331,7 +1331,7 @@ static void __up_mutex(struct rt_mutex *
TRACE_WARN_ON(save_state != lock->save_state);
trace_lock_irqsave(&trace_lock, flags);
- TRACE_BUG_ON(!irqs_disabled());
+ TRACE_BUG_ON(!raw_irqs_disabled());
__raw_spin_lock(&lock->wait_lock);
TRACE_BUG_ON(!lock->wait_list.dp_node.prev && !lock->wait_list.dp_node.next);
@@ -1392,7 +1392,7 @@ static void __up_mutex(struct rt_mutex *
* reschedule then do it here without enabling interrupts
* again (and lengthening latency):
*/
- if (need_resched() && !irqs_disabled_flags(flags) && !preempt_count())
+ if (need_resched() && !raw_irqs_disabled_flags(flags) && !preempt_count())
preempt_schedule_irq();
trace_local_irq_restore(flags);
#else
@@ -1997,3 +1997,83 @@ int _write_can_lock(rwlock_t *rwlock)
}
EXPORT_SYMBOL(_write_can_lock);
+/*
+ * Soft irq-flag support:
+ */
+
+#ifdef CONFIG_DEBUG_PREEMPT
+static void check_soft_flags(unsigned long flags)
+{
+ if (flags & ~IRQSOFF_MASK) {
+ static int print_once = 1;
+ if (print_once) {
+ print_once = 0;
+ printk("BUG: bad soft irq-flag value %08lx, %s/%d!\n",
+ flags, current->comm, current->pid);
+ dump_stack();
+ }
+ }
+}
+#else
+static inline void check_soft_flags(unsigned long flags)
+{
+}
+#endif
+
+void local_irq_enable_noresched(void)
+{
+ unmask_preempt_count(IRQSOFF_MASK);
+}
+EXPORT_SYMBOL(local_irq_enable_noresched);
+
+void local_irq_enable(void)
+{
+ unmask_preempt_count(IRQSOFF_MASK);
+ preempt_check_resched();
+}
+EXPORT_SYMBOL(local_irq_enable);
+
+void local_irq_disable(void)
+{
+ mask_preempt_count(IRQSOFF_MASK);
+}
+EXPORT_SYMBOL(local_irq_disable);
+
+int irqs_disabled_flags(unsigned long flags)
+{
+ check_soft_flags(flags);
+
+ return (flags & IRQSOFF_MASK);
+}
+EXPORT_SYMBOL(irqs_disabled_flags);
+
+void __local_save_flags(unsigned long *flags)
+{
+ *flags = irqs_off();
+}
+EXPORT_SYMBOL(__local_save_flags);
+
+void __local_irq_save(unsigned long *flags)
+{
+ *flags = irqs_off();
+ mask_preempt_count(IRQSOFF_MASK);
+}
+EXPORT_SYMBOL(__local_irq_save);
+
+void local_irq_restore(unsigned long flags)
+{
+ check_soft_flags(flags);
+ if (flags)
+ mask_preempt_count(IRQSOFF_MASK);
+ else {
+ unmask_preempt_count(IRQSOFF_MASK);
+ preempt_check_resched();
+ }
+}
+EXPORT_SYMBOL(local_irq_restore);
+
+int irqs_disabled(void)
+{
+ return irqs_off();
+}
+EXPORT_SYMBOL(irqs_disabled);
--- kernel/exit.c.orig
+++ kernel/exit.c
@@ -844,7 +844,7 @@ fastcall NORET_TYPE void do_exit(long co
check_no_held_locks(tsk);
/* PF_DEAD causes final put_task_struct after we schedule. */
again:
- hard_local_irq_disable();
+ raw_local_irq_disable();
tsk->flags |= PF_DEAD;
__schedule();
printk(KERN_ERR "BUG: dead task %s:%d back from the grave!\n",
--- kernel/printk.c.orig
+++ kernel/printk.c
@@ -529,8 +529,7 @@ asmlinkage int vprintk(const char *fmt,
zap_locks();
/* This stops the holder of console_sem just where we want him */
- local_irq_save(flags);
- spin_lock(&logbuf_lock);
+ spin_lock_irqsave(&logbuf_lock, flags);
/* Emit the output into the temporary buffer */
printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
@@ -600,18 +599,16 @@ asmlinkage int vprintk(const char *fmt,
* CPU until it is officially up. We shouldn't be calling into
* random console drivers on a CPU which doesn't exist yet..
*/
- spin_unlock(&logbuf_lock);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&logbuf_lock, flags);
goto out;
}
- if (!in_interrupt() && !down_trylock(&console_sem)) {
+ if (!down_trylock(&console_sem)) {
console_locked = 1;
/*
* We own the drivers. We can drop the spinlock and let
* release_console_sem() print the text
*/
- spin_unlock(&logbuf_lock);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&logbuf_lock, flags);
console_may_schedule = 0;
release_console_sem();
} else {
@@ -620,8 +617,7 @@ asmlinkage int vprintk(const char *fmt,
* allows the semaphore holder to proceed and to call the
* console drivers with the output which we just produced.
*/
- spin_unlock(&logbuf_lock);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&logbuf_lock, flags);
}
out:
return printed_len;
@@ -754,7 +750,7 @@ void release_console_sem(void)
* case only.
*/
#ifdef CONFIG_PREEMPT_RT
- if (!in_atomic() && !irqs_disabled() && !hard_irqs_disabled())
+ if (!in_atomic() && !irqs_disabled() && !raw_irqs_disabled())
#endif
if (wake_klogd && !oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);
--- kernel/sched.c.orig
+++ kernel/sched.c
@@ -307,12 +307,12 @@ static inline runqueue_t *task_rq_lock(t
struct runqueue *rq;
repeat_lock_task:
- hard_local_irq_save(*flags);
+ raw_local_irq_save(*flags);
rq = task_rq(p);
spin_lock(&rq->lock);
if (unlikely(rq != task_rq(p))) {
spin_unlock(&rq->lock);
- hard_local_irq_restore(*flags);
+ raw_local_irq_restore(*flags);
goto repeat_lock_task;
}
return rq;
@@ -322,7 +322,7 @@ static inline void task_rq_unlock(runque
__releases(rq->lock)
{
spin_unlock(&rq->lock);
- hard_local_irq_restore(*flags);
+ raw_local_irq_restore(*flags);
}
#ifdef CONFIG_SCHEDSTATS
@@ -428,7 +428,7 @@ static inline runqueue_t *this_rq_lock(v
{
runqueue_t *rq;
- hard_local_irq_disable();
+ raw_local_irq_disable();
rq = this_rq();
spin_lock(&rq->lock);
@@ -1223,10 +1223,10 @@ out:
*/
if (_need_resched() && !irqs_disabled_flags(flags) && !preempt_count())
preempt_schedule_irq();
- hard_local_irq_restore(flags);
+ raw_local_irq_restore(flags);
#else
spin_unlock(&rq->lock);
- hard_local_irq_restore(flags);
+ raw_local_irq_restore(flags);
#endif
/* no need to check for preempt here - we just handled it */
@@ -1300,7 +1300,7 @@ void fastcall sched_fork(task_t *p)
* total amount of pending timeslices in the system doesn't change,
* resulting in more scheduling fairness.
*/
- hard_local_irq_disable();
+ raw_local_irq_disable();
p->time_slice = (current->time_slice + 1) >> 1;
/*
* The remainder of the first timeslice might be recovered by
@@ -1318,10 +1318,10 @@ void fastcall sched_fork(task_t *p)
current->time_slice = 1;
preempt_disable();
scheduler_tick();
- hard_local_irq_enable();
+ raw_local_irq_enable();
preempt_enable();
} else
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
/*
@@ -1507,7 +1507,7 @@ asmlinkage void schedule_tail(task_t *pr
preempt_disable(); // TODO: move this to fork setup
finish_task_switch(prev);
__preempt_enable_no_resched();
- hard_local_irq_enable();
+ raw_local_irq_enable();
preempt_check_resched();
if (current->set_child_tid)
@@ -2522,10 +2522,11 @@ unsigned long long current_sched_time(co
{
unsigned long long ns;
unsigned long flags;
- local_irq_save(flags);
+
+ raw_local_irq_save(flags);
ns = max(tsk->timestamp, task_rq(tsk)->timestamp_last_tick);
ns = tsk->sched_time + (sched_clock() - ns);
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
return ns;
}
@@ -2634,7 +2635,7 @@ void scheduler_tick(void)
task_t *p = current;
unsigned long long now = sched_clock();
- BUG_ON(!hard_irqs_disabled());
+ BUG_ON(!raw_irqs_disabled());
update_cpu_clock(p, rq, now);
@@ -2949,7 +2950,7 @@ void __sched __schedule(void)
run_time /= (CURRENT_BONUS(prev) ? : 1);
cpu = smp_processor_id();
- hard_local_irq_disable();
+ raw_local_irq_disable();
spin_lock(&rq->lock);
switch_count = &prev->nvcsw; // TODO: temporary - to see it in vmstat
@@ -3091,7 +3092,7 @@ asmlinkage void __sched schedule(void)
/*
* Test if we have interrupts disabled.
*/
- if (unlikely(irqs_disabled() || hard_irqs_disabled())) {
+ if (unlikely(irqs_disabled() || raw_irqs_disabled())) {
stop_trace();
printk(KERN_ERR "BUG: scheduling with irqs disabled: "
"%s/0x%08x/%d\n",
@@ -3109,7 +3110,7 @@ asmlinkage void __sched schedule(void)
do {
__schedule();
} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
- hard_local_irq_enable(); // TODO: do sti; ret
+ raw_local_irq_enable(); // TODO: do sti; ret
}
EXPORT_SYMBOL(schedule);
@@ -3179,11 +3180,11 @@ asmlinkage void __sched preempt_schedule
* If there is a non-zero preempt_count or interrupts are disabled,
* we do not want to preempt the current task. Just return..
*/
- if (unlikely(ti->preempt_count || irqs_disabled() || hard_irqs_disabled()))
+ if (unlikely(ti->preempt_count || irqs_disabled() || raw_irqs_disabled()))
return;
need_resched:
- hard_local_irq_disable();
+ raw_local_irq_disable();
add_preempt_count(PREEMPT_ACTIVE);
/*
* We keep the big kernel semaphore locked, but we
@@ -3202,7 +3203,7 @@ need_resched:
barrier();
if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
goto need_resched;
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
EXPORT_SYMBOL(preempt_schedule);
@@ -3230,9 +3231,7 @@ asmlinkage void __sched preempt_schedule
return;
need_resched:
-#ifdef CONFIG_RT_IRQ_DISABLE
- hard_local_irq_disable();
-#endif
+ raw_local_irq_disable();
add_preempt_count(PREEMPT_ACTIVE);
/*
* We keep the big kernel semaphore locked, but we
@@ -3245,10 +3244,8 @@ need_resched:
#endif
__schedule();
- _hard_local_irq_disable();
-#ifdef CONFIG_RT_IRQ_DISABLE
+ raw_local_irq_disable();
local_irq_enable_noresched();
-#endif
#ifdef CONFIG_PREEMPT_BKL
task->lock_depth = saved_lock_depth;
@@ -4182,7 +4179,7 @@ asmlinkage long sys_sched_yield(void)
__preempt_enable_no_resched();
__schedule();
- hard_local_irq_enable();
+ raw_local_irq_enable();
preempt_check_resched();
return 0;
@@ -4195,11 +4192,11 @@ static void __cond_resched(void)
if (preempt_count() & PREEMPT_ACTIVE)
return;
do {
- hard_local_irq_disable();
+ raw_local_irq_disable();
add_preempt_count(PREEMPT_ACTIVE);
__schedule();
} while (need_resched());
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
int __sched cond_resched(void)
@@ -4609,10 +4606,12 @@ void __devinit init_idle(task_t *idle, i
idle->cpus_allowed = cpumask_of_cpu(cpu);
set_task_cpu(idle, cpu);
- spin_lock_irqsave(&rq->lock, flags);
+ raw_local_irq_save(flags);
+ spin_lock(&rq->lock);
rq->curr = rq->idle = idle;
set_tsk_need_resched(idle);
- spin_unlock_irqrestore(&rq->lock, flags);
+ spin_unlock(&rq->lock);
+ raw_local_irq_restore(flags);
/* Set the preempt count _outside_ the spinlocks! */
#if defined(CONFIG_PREEMPT) && \
@@ -4766,10 +4765,12 @@ static int migration_thread(void * data)
if (current->flags & PF_FREEZE)
refrigerator(PF_FREEZE);
- spin_lock_irq(&rq->lock);
+ raw_local_irq_disable();
+ spin_lock(&rq->lock);
if (cpu_is_offline(cpu)) {
- spin_unlock_irq(&rq->lock);
+ spin_unlock(&rq->lock);
+ raw_local_irq_enable();
goto wait_to_die;
}
@@ -4781,7 +4782,8 @@ static int migration_thread(void * data)
head = &rq->migration_queue;
if (list_empty(head)) {
- spin_unlock_irq(&rq->lock);
+ spin_unlock(&rq->lock);
+ raw_local_irq_enable();
schedule();
set_current_state(TASK_INTERRUPTIBLE);
continue;
@@ -4792,12 +4794,14 @@ static int migration_thread(void * data)
if (req->type == REQ_MOVE_TASK) {
spin_unlock(&rq->lock);
__migrate_task(req->task, cpu, req->dest_cpu);
- local_irq_enable();
+ raw_local_irq_enable();
} else if (req->type == REQ_SET_DOMAIN) {
rq->sd = req->sd;
- spin_unlock_irq(&rq->lock);
+ spin_unlock(&rq->lock);
+ raw_local_irq_enable();
} else {
- spin_unlock_irq(&rq->lock);
+ spin_unlock(&rq->lock);
+ raw_local_irq_enable();
WARN_ON(1);
}
@@ -4863,12 +4867,12 @@ static void migrate_nr_uninterruptible(r
runqueue_t *rq_dest = cpu_rq(any_online_cpu(CPU_MASK_ALL));
unsigned long flags;
- local_irq_save(flags);
+ raw_local_irq_save(flags);
double_rq_lock(rq_src, rq_dest);
rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
rq_src->nr_uninterruptible = 0;
double_rq_unlock(rq_src, rq_dest);
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
}
/* Run through task list and migrate tasks from the dead cpu. */
@@ -4906,13 +4910,15 @@ void sched_idle_next(void)
/* Strictly not necessary since rest of the CPUs are stopped by now
* and interrupts disabled on current cpu.
*/
- spin_lock_irqsave(&rq->lock, flags);
+ raw_local_irq_save(flags);
+ spin_lock(&rq->lock);
__setscheduler(p, SCHED_FIFO, MAX_RT_PRIO-1);
/* Add idle task to _front_ of it's priority queue */
__activate_idle_task(p, rq);
- spin_unlock_irqrestore(&rq->lock, flags);
+ spin_unlock(&rq->lock);
+ raw_local_irq_restore(flags);
}
/* Ensures that the idle task is using init_mm right before its cpu goes
@@ -4946,9 +4952,11 @@ static void migrate_dead(unsigned int de
* that's OK. No task can be added to this CPU, so iteration is
* fine.
*/
- spin_unlock_irq(&rq->lock);
+ raw_local_irq_disable();
+ spin_lock(&rq->lock);
move_task_off_dead_cpu(dead_cpu, tsk);
- spin_lock_irq(&rq->lock);
+ spin_unlock(&rq->lock);
+ raw_local_irq_enable();
put_task_struct(tsk);
}
@@ -5025,7 +5033,8 @@ static int migration_call(struct notifie
/* No need to migrate the tasks: it was best-effort if
* they didn't do lock_cpu_hotplug(). Just wake up
* the requestors. */
- spin_lock_irq(&rq->lock);
+ raw_local_irq_disable();
+ spin_lock(&rq->lock);
while (!list_empty(&rq->migration_queue)) {
migration_req_t *req;
req = list_entry(rq->migration_queue.next,
@@ -5034,7 +5043,8 @@ static int migration_call(struct notifie
list_del_init(&req->list);
complete(&req->done);
}
- spin_unlock_irq(&rq->lock);
+ spin_unlock(&rq->lock);
+ raw_local_irq_enable();
break;
#endif
}
@@ -5162,7 +5172,8 @@ void __devinit cpu_attach_domain(struct
sched_domain_debug(sd, cpu);
- spin_lock_irqsave(&rq->lock, flags);
+ raw_local_irq_save(flags);
+ spin_lock(&rq->lock);
if (cpu == smp_processor_id() || !cpu_online(cpu)) {
rq->sd = sd;
@@ -5174,7 +5185,8 @@ void __devinit cpu_attach_domain(struct
local = 0;
}
- spin_unlock_irqrestore(&rq->lock, flags);
+ spin_unlock(&rq->lock);
+ raw_local_irq_restore(flags);
if (!local) {
wake_up_process(rq->migration_thread);
--- kernel/irq/manage.c.orig
+++ kernel/irq/manage.c
@@ -60,14 +60,14 @@ void disable_irq_nosync(unsigned int irq
irq_desc_t *desc = irq_desc + irq;
unsigned long flags;
- _hard_local_irq_save(flags);
+ __raw_local_irq_save(flags);
spin_lock(&desc->lock);
if (!desc->depth++) {
desc->status |= IRQ_DISABLED;
desc->handler->disable(irq);
}
spin_unlock(&desc->lock);
- _hard_local_irq_restore(flags);
+ __raw_local_irq_restore(flags);
}
EXPORT_SYMBOL(disable_irq_nosync);
@@ -110,7 +110,7 @@ void enable_irq(unsigned int irq)
irq_desc_t *desc = irq_desc + irq;
unsigned long flags;
- _hard_local_irq_save(flags);
+ __raw_local_irq_save(flags);
spin_lock(&desc->lock);
switch (desc->depth) {
case 0:
@@ -131,7 +131,7 @@ void enable_irq(unsigned int irq)
desc->depth--;
}
spin_unlock(&desc->lock);
- _hard_local_irq_restore(flags);
+ __raw_local_irq_restore(flags);
}
EXPORT_SYMBOL(enable_irq);
@@ -207,7 +207,7 @@ int setup_irq(unsigned int irq, struct i
/*
* The following block of code has to be executed atomically
*/
- _hard_local_irq_save(flags);
+ __raw_local_irq_save(flags);
spin_lock(&desc->lock);
p = &desc->action;
if ((old = *p) != NULL) {
@@ -242,7 +242,7 @@ int setup_irq(unsigned int irq, struct i
desc->handler->enable(irq);
}
spin_unlock(&desc->lock);
- _hard_local_irq_restore(flags);
+ __raw_local_irq_restore(flags);
new->irq = irq;
register_irq_proc(irq);
@@ -276,7 +276,7 @@ void free_irq(unsigned int irq, void *de
return;
desc = irq_desc + irq;
- _hard_local_irq_save(flags);
+ __raw_local_irq_save(flags);
spin_lock(&desc->lock);
p = &desc->action;
for (;;) {
@@ -300,7 +300,7 @@ void free_irq(unsigned int irq, void *de
}
recalculate_desc_flags(desc);
spin_unlock(&desc->lock);
- _hard_local_irq_restore(flags);
+ __raw_local_irq_restore(flags);
unregister_handler_proc(irq, action);
/* Make sure it's not being used on another CPU */
@@ -310,7 +310,7 @@ void free_irq(unsigned int irq, void *de
}
printk(KERN_ERR "Trying to free free IRQ%d\n",irq);
spin_unlock(&desc->lock);
- _hard_local_irq_restore(flags);
+ __raw_local_irq_restore(flags);
return;
}
}
@@ -418,7 +418,7 @@ static void do_hardirq(struct irq_desc *
struct irqaction * action;
unsigned int irq = desc - irq_desc;
- hard_local_irq_disable();
+ raw_local_irq_disable();
if (desc->status & IRQ_INPROGRESS) {
action = desc->action;
@@ -429,9 +429,9 @@ static void do_hardirq(struct irq_desc *
if (action) {
spin_unlock(&desc->lock);
action_ret = handle_IRQ_event(irq, NULL,action);
- hard_local_irq_enable();
+ raw_local_irq_enable();
cond_resched_all();
- hard_local_irq_disable();
+ raw_local_irq_disable();
spin_lock(&desc->lock);
}
if (!noirqdebug)
@@ -448,7 +448,7 @@ static void do_hardirq(struct irq_desc *
desc->handler->end(irq);
spin_unlock(&desc->lock);
}
- hard_local_irq_enable();
+ raw_local_irq_enable();
if (waitqueue_active(&desc->wait_for_handler))
wake_up(&desc->wait_for_handler);
}
@@ -484,7 +484,7 @@ static int do_irqd(void * __desc)
do_hardirq(desc);
cond_resched_all();
__do_softirq();
- hard_local_irq_enable();
+ raw_local_irq_enable();
#ifdef CONFIG_SMP
/*
* Did IRQ affinities change?
--- kernel/irq/handle.c.orig
+++ kernel/irq/handle.c
@@ -113,7 +113,7 @@ fastcall int handle_IRQ_event(unsigned i
* IRQ handlers:
*/
if (!hardirq_count() || !(action->flags & SA_INTERRUPT))
- hard_local_irq_enable();
+ raw_local_irq_enable();
do {
unsigned int preempt_count = preempt_count();
@@ -133,10 +133,10 @@ fastcall int handle_IRQ_event(unsigned i
} while (action);
if (status & SA_SAMPLE_RANDOM) {
- hard_local_irq_enable();
+ raw_local_irq_enable();
add_interrupt_randomness(irq);
}
- hard_local_irq_disable();
+ raw_local_irq_disable();
return retval;
}
@@ -156,9 +156,12 @@ fastcall notrace unsigned int __do_IRQ(u
irq_desc_t *desc = irq_desc + irq;
struct irqaction * action;
unsigned int status;
-
-#ifdef CONFIG_RT_IRQ_DISABLE
+#ifdef CONFIG_PREEMPT_RT
unsigned long flags;
+
+ /*
+ * Disable the soft-irq-flag:
+ */
local_irq_save(flags);
#endif
kstat_this_cpu.irqs[irq]++;
@@ -245,7 +248,7 @@ out:
out_no_end:
spin_unlock(&desc->lock);
-#ifdef CONFIG_RT_IRQ_DISABLE
+#ifdef CONFIG_PREEMPT_RT
local_irq_restore(flags);
#endif
return 1;
--- kernel/irq/autoprobe.c.orig
+++ kernel/irq/autoprobe.c
@@ -39,12 +39,12 @@ unsigned long probe_irq_on(void)
for (i = NR_IRQS-1; i > 0; i--) {
desc = irq_desc + i;
- hard_local_irq_disable();
+ raw_local_irq_disable();
spin_lock(&desc->lock);
if (!irq_desc[i].action)
irq_desc[i].handler->startup(i);
spin_unlock(&desc->lock);
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
/*
@@ -60,7 +60,7 @@ unsigned long probe_irq_on(void)
for (i = NR_IRQS-1; i > 0; i--) {
desc = irq_desc + i;
- hard_local_irq_disable();
+ raw_local_irq_disable();
spin_lock(&desc->lock);
if (!desc->action) {
desc->status |= IRQ_AUTODETECT | IRQ_WAITING;
@@ -68,7 +68,7 @@ unsigned long probe_irq_on(void)
desc->status |= IRQ_PENDING;
}
spin_unlock(&desc->lock);
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
/*
@@ -84,7 +84,7 @@ unsigned long probe_irq_on(void)
irq_desc_t *desc = irq_desc + i;
unsigned int status;
- hard_local_irq_disable();
+ raw_local_irq_disable();
spin_lock(&desc->lock);
status = desc->status;
@@ -98,7 +98,7 @@ unsigned long probe_irq_on(void)
val |= 1 << i;
}
spin_unlock(&desc->lock);
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
return val;
@@ -128,7 +128,7 @@ unsigned int probe_irq_mask(unsigned lon
irq_desc_t *desc = irq_desc + i;
unsigned int status;
- hard_local_irq_disable();
+ raw_local_irq_disable();
spin_lock(&desc->lock);
status = desc->status;
@@ -140,7 +140,7 @@ unsigned int probe_irq_mask(unsigned lon
desc->handler->shutdown(i);
}
spin_unlock(&desc->lock);
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
up(&probe_sem);
@@ -173,7 +173,7 @@ int probe_irq_off(unsigned long val)
irq_desc_t *desc = irq_desc + i;
unsigned int status;
- hard_local_irq_disable();
+ raw_local_irq_disable();
spin_lock(&desc->lock);
status = desc->status;
@@ -187,7 +187,7 @@ int probe_irq_off(unsigned long val)
desc->handler->shutdown(i);
}
spin_unlock(&desc->lock);
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
up(&probe_sem);
--- kernel/Makefile.orig
+++ kernel/Makefile
@@ -10,7 +10,6 @@ obj-y = sched.o fork.o exec_domain.o
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
obj-$(CONFIG_PREEMPT_RT) += rt.o
-obj-$(CONFIG_RT_IRQ_DISABLE) += irqs-off.o
obj-$(CONFIG_DEBUG_PREEMPT) += latency.o
obj-$(CONFIG_LATENCY_TIMING) += latency.o
--- kernel/irqs-off.c.orig
+++ kernel/irqs-off.c
@@ -1,99 +0,0 @@
-/*
- * kernel/irqs-off.c
- *
- * IRQ soft state managment
- *
- * Author: Daniel Walker <dwalker@mvista.com>
- *
- * 2005 (c) MontaVista Software, Inc. This file is licensed under
- * the terms of the GNU General Public License version 2. This program
- * is licensed "as is" without any warranty of any kind, whether express
- * or implied.
- */
-
-#include <linux/hardirq.h>
-#include <linux/preempt.h>
-#include <linux/kallsyms.h>
-
-#include <linux/module.h>
-#include <asm/system.h>
-
-static int irq_trace;
-
-void irq_trace_enable(void) { irq_trace = 1; }
-void irq_trace_disable(void) { irq_trace = 0; }
-
-unsigned int ___local_save_flags()
-{
- return irqs_off();
-}
-EXPORT_SYMBOL(___local_save_flags);
-
-void local_irq_enable_noresched(void)
-{
- if (irq_trace) {
- current->last_irq_enable[0] = (unsigned long)__builtin_return_address(0);
- //current->last_irq_enable[1] = (unsigned long)__builtin_return_address(1);
- }
-
- if (irqs_off()) sub_preempt_count(IRQSOFF_OFFSET);
-}
-EXPORT_SYMBOL(local_irq_enable_noresched);
-
-void local_irq_enable(void)
-{
- if (irq_trace) {
- current->last_irq_enable[0] = (unsigned long)__builtin_return_address(0);
- //current->last_irq_enable[1] = (unsigned long)__builtin_return_address(1);
- }
- if (irqs_off()) sub_preempt_count(IRQSOFF_OFFSET);
-
- //local_irq_enable_noresched();
- preempt_check_resched();
-}
-EXPORT_SYMBOL(local_irq_enable);
-
-void local_irq_disable(void)
-{
- if (irq_trace) {
- current->last_irq_disable[0] = (unsigned long)__builtin_return_address(0);
- //current->last_irq_disable[1] = (unsigned long)__builtin_return_address(1);
- }
- if (!irqs_off()) add_preempt_count(IRQSOFF_OFFSET);
-}
-EXPORT_SYMBOL(local_irq_disable);
-
-unsigned long irqs_disabled_flags(unsigned long flags)
-{
- return (flags & IRQSOFF_MASK);
-}
-EXPORT_SYMBOL(irqs_disabled_flags);
-
-void local_irq_restore(unsigned long flags)
-{
- if (!irqs_disabled_flags(flags)) local_irq_enable();
-}
-EXPORT_SYMBOL(local_irq_restore);
-
-unsigned long irqs_disabled(void)
-{
- return irqs_off();
-}
-EXPORT_SYMBOL(irqs_disabled);
-
-void print_irq_traces(struct task_struct *task)
-{
- printk("Soft state access: (%s)\n", (hard_irqs_disabled()) ? "Hard disabled" : "Not disabled");
- printk(".. [<%08lx>] .... ", task->last_irq_disable[0]);
- print_symbol("%s\n", task->last_irq_disable[0]);
- printk(".....[<%08lx>] .. ( <= ",
- task->last_irq_disable[1]);
- print_symbol("%s)\n", task->last_irq_disable[1]);
-
- printk(".. [<%08lx>] .... ", task->last_irq_enable[0]);
- print_symbol("%s\n", task->last_irq_enable[0]);
- printk(".....[<%08lx>] .. ( <= ",
- task->last_irq_enable[1]);
- print_symbol("%s)\n", task->last_irq_enable[1]);
- printk("\n");
-}
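The deleted kernel/irqs-off.c above kept a "soft" interrupt-disable state as a dedicated bit in the preempt counter: `local_irq_disable()` added `IRQSOFF_OFFSET` only if the bit was clear, and `local_irq_enable()` subtracted it only if set, so the operations never nested the count. A minimal stand-alone model of just that bookkeeping (the shift value mirrors the IRQSOFF bit from hardirq.h; everything else is simplified):

```c
#include <assert.h>

/* Mirrors the IRQSOFF bit layout; a simplification of the real counter. */
#define IRQSOFF_SHIFT	29
#define IRQSOFF_OFFSET	(1UL << IRQSOFF_SHIFT)
#define IRQSOFF_MASK	IRQSOFF_OFFSET

static unsigned long preempt_count;

static unsigned long irqs_off(void)
{
	return preempt_count & IRQSOFF_MASK;
}

/* Soft disable: set the flag exactly once, however often it is called. */
static void soft_local_irq_disable(void)
{
	if (!irqs_off())
		preempt_count += IRQSOFF_OFFSET;
}

/* Soft enable: clear the flag exactly once. */
static void soft_local_irq_enable(void)
{
	if (irqs_off())
		preempt_count -= IRQSOFF_OFFSET;
}
```

Because the flag is a single bit rather than a nesting count, repeated disables collapse into one, which is why the removed code guarded each add/sub with an `irqs_off()` check.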
--- kernel/latency.c.orig
+++ kernel/latency.c
@@ -108,15 +108,13 @@ enum trace_flag_type
TRACE_FLAG_NEED_RESCHED = 0x02,
TRACE_FLAG_HARDIRQ = 0x04,
TRACE_FLAG_SOFTIRQ = 0x08,
-#ifdef CONFIG_RT_IRQ_DISABLE
TRACE_FLAG_IRQS_HARD_OFF = 0x16,
-#endif
};
#ifdef CONFIG_LATENCY_TRACE
-#define MAX_TRACE (unsigned long)(8192-1)
+#define MAX_TRACE (unsigned long)(4096-1)
#define CMDLINE_BYTES 16
@@ -266,9 +264,9 @@ ____trace(int cpu, enum trace_type type,
entry->cpu = cpu;
#endif
entry->flags = (irqs_disabled() ? TRACE_FLAG_IRQS_OFF : 0) |
-#ifdef CONFIG_RT_IRQ_DISABLE
- (hard_irqs_disabled() ? TRACE_FLAG_IRQS_HARD_OFF : 0)|
-#endif
+
+ (raw_irqs_disabled() ? TRACE_FLAG_IRQS_HARD_OFF : 0)|
+
((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
(_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0);
@@ -731,11 +729,7 @@ print_generic(struct seq_file *m, struct
seq_printf(m, "%d", entry->cpu);
seq_printf(m, "%c%c",
(entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
-#ifdef CONFIG_RT_IRQ_DISABLE
(entry->flags & TRACE_FLAG_IRQS_HARD_OFF) ? 'D' : '.',
-#else
- '.',
-#endif
(entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'n' : '.');
hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
@@ -1223,9 +1217,9 @@ void notrace trace_irqs_off_lowlevel(voi
{
unsigned long flags;
- hard_local_save_flags(flags);
+ raw_local_save_flags(flags);
- if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
+ if (!irqs_off_preempt_count() && raw_irqs_disabled_flags(flags))
__start_critical_timing(CALLER_ADDR0, 0);
}
@@ -1233,9 +1227,9 @@ void notrace trace_irqs_off(void)
{
unsigned long flags;
- hard_local_save_flags(flags);
+ raw_local_save_flags(flags);
- if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
+ if (!irqs_off_preempt_count() && raw_irqs_disabled_flags(flags))
__start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
}
@@ -1245,9 +1239,9 @@ void notrace trace_irqs_on(void)
{
unsigned long flags;
- hard_local_save_flags(flags);
+ raw_local_save_flags(flags);
- if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
+ if (!irqs_off_preempt_count() && raw_irqs_disabled_flags(flags))
__stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
}
@@ -1259,7 +1253,7 @@ EXPORT_SYMBOL(trace_irqs_on);
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_CRITICAL_TIMING)
-void notrace add_preempt_count(int val)
+void notrace add_preempt_count(unsigned int val)
{
unsigned long eip = CALLER_ADDR0;
unsigned long parent_eip = CALLER_ADDR1;
@@ -1290,9 +1284,9 @@ void notrace add_preempt_count(int val)
#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
unsigned long flags;
- local_save_flags(flags);
+ raw_local_save_flags(flags);
- if (!irqs_disabled_flags(flags))
+ if (!raw_irqs_disabled_flags(flags))
#endif
if (preempt_count() == val)
__start_critical_timing(eip, parent_eip);
@@ -1302,7 +1296,7 @@ void notrace add_preempt_count(int val)
}
EXPORT_SYMBOL(add_preempt_count);
-void notrace sub_preempt_count(int val)
+void notrace sub_preempt_count(unsigned int val)
{
#ifdef CONFIG_DEBUG_PREEMPT
/*
@@ -1321,9 +1315,9 @@ void notrace sub_preempt_count(int val)
#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
unsigned long flags;
- local_save_flags(flags);
+ raw_local_save_flags(flags);
- if (!irqs_disabled_flags(flags))
+ if (!raw_irqs_disabled_flags(flags))
#endif
if (preempt_count() == val)
__stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
@@ -1334,6 +1328,50 @@ void notrace sub_preempt_count(int val)
EXPORT_SYMBOL(sub_preempt_count);
+void notrace mask_preempt_count(unsigned int mask)
+{
+ unsigned long eip = CALLER_ADDR0;
+ unsigned long parent_eip = CALLER_ADDR1;
+
+ preempt_count() |= mask;
+
+#ifdef CONFIG_CRITICAL_PREEMPT_TIMING
+ {
+#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
+ unsigned long flags;
+
+ raw_local_save_flags(flags);
+
+ if (!raw_irqs_disabled_flags(flags))
+#endif
+ if (preempt_count() == mask)
+ __start_critical_timing(eip, parent_eip);
+ }
+#endif
+ (void) eip, (void) parent_eip;
+}
+EXPORT_SYMBOL(mask_preempt_count);
+
+void notrace unmask_preempt_count(unsigned int mask)
+{
+#ifdef CONFIG_CRITICAL_PREEMPT_TIMING
+ {
+#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
+ unsigned long flags;
+
+ raw_local_save_flags(flags);
+
+ if (!raw_irqs_disabled_flags(flags))
+#endif
+ if (preempt_count() == mask)
+ __stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+ }
+#endif
+ preempt_count() &= ~mask;
+}
+EXPORT_SYMBOL(unmask_preempt_count);
+
+
#endif
/*
@@ -1457,7 +1495,8 @@ void trace_stop_sched_switched(struct ta
trace_special_pid(p->pid, p->prio, 0);
- spin_lock_irqsave(&sch.trace_lock, flags);
+ raw_local_irq_save(flags);
+ spin_lock(&sch.trace_lock);
if (p == sch.task) {
sch.task = NULL;
tr = sch.tr;
@@ -1472,14 +1511,14 @@ void trace_stop_sched_switched(struct ta
spin_unlock(&sch.trace_lock);
check_wakeup_timing(tr, CALLER_ADDR0);
// atomic_dec(&tr->disabled);
- local_irq_restore(flags);
} else {
if (sch.task)
trace_special_pid(sch.task->pid, sch.task->prio, p->prio);
if (sch.task && (sch.task->prio >= p->prio))
sch.task = NULL;
- spin_unlock_irqrestore(&sch.trace_lock, flags);
+ spin_unlock(&sch.trace_lock);
}
+ raw_local_irq_restore(flags);
}
void trace_change_sched_cpu(struct task_struct *p, int new_cpu)
@@ -1490,12 +1529,14 @@ void trace_change_sched_cpu(struct task_
return;
trace_special(task_cpu(p), task_cpu(p), new_cpu);
- spin_lock_irqsave(&sch.trace_lock, flags);
+ raw_local_irq_save(flags);
+ spin_lock(&sch.trace_lock);
if (p == sch.task && task_cpu(p) != new_cpu) {
sch.cpu = new_cpu;
trace_special(task_cpu(p), new_cpu, 0);
}
- spin_unlock_irqrestore(&sch.trace_lock, flags);
+ spin_unlock(&sch.trace_lock);
+ raw_local_irq_restore(flags);
}
#endif
@@ -1520,11 +1561,13 @@ long user_trace_start(void)
if (wakeup_timing) {
unsigned long flags;
- spin_lock_irqsave(&sch.trace_lock, flags);
+ raw_local_irq_save(flags);
+ spin_lock(&sch.trace_lock);
sch.task = current;
sch.cpu = smp_processor_id();
sch.tr = tr;
- spin_unlock_irqrestore(&sch.trace_lock, flags);
+ spin_unlock(&sch.trace_lock);
+ raw_local_irq_restore(flags);
}
#endif
if (trace_all_cpus)
@@ -1560,16 +1603,19 @@ long user_trace_stop(void)
if (wakeup_timing) {
unsigned long flags;
- spin_lock_irqsave(&sch.trace_lock, flags);
+ raw_local_irq_save(flags);
+ spin_lock(&sch.trace_lock);
if (current != sch.task) {
- spin_unlock_irqrestore(&sch.trace_lock, flags);
+ spin_unlock(&sch.trace_lock);
+ raw_local_irq_restore(flags);
preempt_enable();
return -EINVAL;
}
sch.task = NULL;
tr = sch.tr;
sch.tr = NULL;
- spin_unlock_irqrestore(&sch.trace_lock, flags);
+ spin_unlock(&sch.trace_lock);
+ raw_local_irq_restore(flags);
} else
#endif
tr = cpu_traces + smp_processor_id();
@@ -1645,11 +1691,7 @@ static void print_entry(struct trace_ent
printk("%c%c",
(entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
-#ifdef CONFIG_RT_IRQ_DISABLE
(entry->flags & TRACE_FLAG_IRQS_HARD_OFF) ? 'D' : '.',
-#else
- '.',
-#endif
(entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'n' : '.');
hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
--- init/main.c.orig
+++ init/main.c
@@ -428,7 +428,7 @@ asmlinkage void __init start_kernel(void
{
char * command_line;
extern struct kernel_param __start___param[], __stop___param[];
-#ifdef CONFIG_RT_IRQ_DISABLE
+#ifdef CONFIG_PREEMPT_RT
/*
* Force the soft IRQ state to mimic the hard state until
* we finish boot-up.
@@ -464,7 +464,7 @@ asmlinkage void __init start_kernel(void
* fragile until we cpu_idle() for the first time.
*/
preempt_disable();
-#ifdef CONFIG_RT_IRQ_DISABLE
+#ifdef CONFIG_PREEMPT_RT
/*
* Reset the irqs off flag after sched_init resets the preempt_count.
*/
@@ -501,7 +501,7 @@ asmlinkage void __init start_kernel(void
/*
* Soft IRQ state will be enabled with the hard state.
*/
- hard_local_irq_enable();
+ raw_local_irq_enable();
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
@@ -546,8 +546,8 @@ asmlinkage void __init start_kernel(void
acpi_early_init(); /* before LAPIC and SMP init */
-#ifdef CONFIG_RT_IRQ_DISABLE
- WARN_ON(hard_irqs_disabled());
+#ifdef CONFIG_PREEMPT_RT
+ WARN_ON(raw_irqs_disabled());
#endif
/* Do the rest non-__init'ed, we're now alive */
rest_init();
@@ -591,10 +591,10 @@ static void __init do_initcalls(void)
msg = "disabled interrupts";
local_irq_enable();
}
-#ifdef CONFIG_RT_IRQ_DISABLE
- if (hard_irqs_disabled()) {
+#ifdef CONFIG_PREEMPT_RT
+ if (raw_irqs_disabled()) {
msg = "disabled hard interrupts";
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
#endif
if (msg) {
@@ -737,8 +737,8 @@ static int init(void * unused)
* The Bourne shell can be used instead of init if we are
* trying to recover a really broken machine.
*/
-#ifdef CONFIG_RT_IRQ_DISABLE
- WARN_ON(hard_irqs_disabled());
+#ifdef CONFIG_PREEMPT_RT
+ WARN_ON(raw_irqs_disabled());
#endif
if (execute_command)
--- arch/i386/mm/fault.c.orig
+++ arch/i386/mm/fault.c
@@ -232,7 +232,7 @@ fastcall notrace void do_page_fault(stru
return;
/* It's safe to allow irq's after cr2 has been saved */
if (regs->eflags & (X86_EFLAGS_IF|VM_MASK))
- hard_local_irq_enable();
+ raw_local_irq_enable();
tsk = current;
--- arch/i386/kernel/apic.c.orig
+++ arch/i386/kernel/apic.c
@@ -524,9 +524,9 @@ void lapic_shutdown(void)
if (!cpu_has_apic || !enabled_via_apicbase)
return;
- local_irq_disable();
+ raw_local_irq_disable();
disable_local_APIC();
- local_irq_enable();
+ raw_local_irq_enable();
}
#ifdef CONFIG_PM
@@ -570,9 +570,9 @@ static int lapic_suspend(struct sys_devi
apic_pm_state.apic_tdcr = apic_read(APIC_TDCR);
apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
- local_irq_save(flags);
+ raw_local_irq_save(flags);
disable_local_APIC();
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
return 0;
}
@@ -584,7 +584,7 @@ static int lapic_resume(struct sys_devic
if (!apic_pm_state.active)
return 0;
- local_irq_save(flags);
+ raw_local_irq_save(flags);
/*
* Make sure the APICBASE points to the right address
@@ -615,7 +615,7 @@ static int lapic_resume(struct sys_devic
apic_write(APIC_LVTERR, apic_pm_state.apic_lvterr);
apic_write(APIC_ESR, 0);
apic_read(APIC_ESR);
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
return 0;
}
@@ -934,7 +934,7 @@ static void __init setup_APIC_timer(unsi
{
unsigned long flags;
- local_irq_save(flags);
+ raw_local_irq_save(flags);
/*
* Wait for IRQ0's slice:
@@ -943,7 +943,7 @@ static void __init setup_APIC_timer(unsi
__setup_APIC_LVTT(clocks);
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
}
/*
@@ -1032,7 +1032,7 @@ void __init setup_boot_APIC_clock(void)
apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n");
using_apic_timer = 1;
- local_irq_disable();
+ raw_local_irq_disable();
calibration_result = calibrate_APIC_clock();
/*
@@ -1040,7 +1040,7 @@ void __init setup_boot_APIC_clock(void)
*/
setup_APIC_timer(calibration_result);
- local_irq_enable();
+ raw_local_irq_enable();
}
void __init setup_secondary_APIC_clock(void)
--- arch/i386/kernel/smpboot.c.orig
+++ arch/i386/kernel/smpboot.c
@@ -440,7 +440,7 @@ static void __init start_secondary(void
cpu_set(smp_processor_id(), cpu_online_map);
/* We can take interrupts now: we're officially "up". */
- local_irq_enable();
+ raw_local_irq_enable();
wmb();
cpu_idle();
@@ -1120,17 +1120,17 @@ int __devinit __cpu_up(unsigned int cpu)
{
/* This only works at boot for x86. See "rewrite" above. */
if (cpu_isset(cpu, smp_commenced_mask)) {
- local_irq_enable();
+ raw_local_irq_enable();
return -ENOSYS;
}
/* In case one didn't come up */
if (!cpu_isset(cpu, cpu_callin_map)) {
- local_irq_enable();
+ raw_local_irq_enable();
return -EIO;
}
- local_irq_enable();
+ raw_local_irq_enable();
/* Unleash the CPU! */
cpu_set(cpu, smp_commenced_mask);
while (!cpu_isset(cpu, cpu_online_map))
--- arch/i386/kernel/signal.c.orig
+++ arch/i386/kernel/signal.c
@@ -597,7 +597,7 @@ int fastcall do_signal(struct pt_regs *r
/*
* Fully-preemptible kernel does not need interrupts disabled:
*/
- hard_local_irq_enable();
+ raw_local_irq_enable();
preempt_check_resched();
#endif
/*
--- arch/i386/kernel/nmi.c.orig
+++ arch/i386/kernel/nmi.c
@@ -513,6 +513,7 @@ void notrace nmi_watchdog_tick (struct p
if (nmi_show_regs[cpu]) {
nmi_show_regs[cpu] = 0;
spin_lock(&nmi_print_lock);
+ printk("NMI show regs on CPU#%d:\n", cpu);
show_regs(regs);
spin_unlock(&nmi_print_lock);
}
@@ -523,15 +524,24 @@ void notrace nmi_watchdog_tick (struct p
* wait a few IRQs (5 seconds) before doing the oops ...
*/
alert_counter[cpu]++;
- if (alert_counter[cpu] == 5*nmi_hz) {
+ if (alert_counter[cpu] && !(alert_counter[cpu] % (5*nmi_hz))) {
int i;
bust_spinlocks(1);
- for (i = 0; i < NR_CPUS; i++)
- nmi_show_regs[i] = 1;
- }
- if (alert_counter[cpu] == 5*nmi_hz)
+ spin_lock(&nmi_print_lock);
+ printk("NMI watchdog detected lockup on CPU#%d (%d/%d)\n", cpu, alert_counter[cpu], 5*nmi_hz);
+ show_regs(regs);
+ spin_unlock(&nmi_print_lock);
+
+ for_each_online_cpu(i)
+ if (i != cpu)
+ nmi_show_regs[i] = 1;
+ for_each_online_cpu(i)
+ while (nmi_show_regs[i] == 1)
+ barrier();
+
die_nmi(regs, "NMI Watchdog detected LOCKUP");
+ }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
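The nmi.c hunk above also changes the lockup test from a one-shot equality check (`alert_counter[cpu] == 5*nmi_hz`, which fires a single time) to a modulo form that re-fires every five seconds of continued lockup. The predicate in isolation, extracted here as a helper for clarity:

```c
#include <assert.h>

/*
 * The new watchdog condition from the hunk above: true on every
 * multiple of 5*nmi_hz ticks, but never on zero, so a locked-up CPU
 * keeps reporting every ~5 seconds instead of reporting once.
 */
static int watchdog_fires(unsigned int alert_counter, unsigned int nmi_hz)
{
	return alert_counter && !(alert_counter % (5 * nmi_hz));
}
```

With the old `==` test, the second report at `10 * nmi_hz` ticks would never happen; the `alert_counter &&` guard keeps tick zero from triggering.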
--- arch/i386/kernel/entry.S.orig
+++ arch/i386/kernel/entry.S
@@ -76,10 +76,10 @@ NT_MASK = 0x00004000
VM_MASK = 0x00020000
#ifdef CONFIG_PREEMPT
-#define preempt_stop cli
+# define preempt_stop cli
#else
-#define preempt_stop
-#define resume_kernel restore_nocheck
+# define preempt_stop
+# define resume_kernel restore_nocheck
#endif
#define SAVE_ALL \
@@ -331,7 +331,7 @@ work_pending:
work_resched:
cli
call __schedule
-#ifdef CONFIG_RT_IRQ_DISABLE
+#ifdef CONFIG_PREEMPT_RT
call local_irq_enable_noresched
#endif
# make sure we don't miss an interrupt
--- arch/i386/kernel/process.c.orig
+++ arch/i386/kernel/process.c
@@ -96,13 +96,13 @@ EXPORT_SYMBOL(enable_hlt);
void default_idle(void)
{
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
- hard_local_irq_disable();
+ raw_local_irq_disable();
if (!need_resched())
- hard_safe_halt();
+ raw_safe_halt();
else
- hard_local_irq_enable();
+ raw_local_irq_enable();
} else {
- hard_local_irq_enable();
+ raw_local_irq_enable();
cpu_relax();
}
}
@@ -149,9 +149,8 @@ void cpu_idle (void)
{
/* endless idle loop with no priority at all */
while (1) {
-#ifdef CONFIG_RT_IRQ_DISABLE
- BUG_ON(hard_irqs_disabled());
-#endif
+ BUG_ON(raw_irqs_disabled());
+
while (!need_resched()) {
void (*idle)(void);
@@ -169,9 +168,9 @@ void cpu_idle (void)
propagate_preempt_locks_value();
idle();
}
- hard_local_irq_disable();
+ raw_local_irq_disable();
__schedule();
- hard_local_irq_enable();
+ raw_local_irq_enable();
}
}
--- arch/i386/kernel/traps.c.orig
+++ arch/i386/kernel/traps.c
@@ -376,7 +376,7 @@ static void do_trap(int trapnr, int sign
goto kernel_trap;
#ifdef CONFIG_PREEMPT_RT
- hard_local_irq_enable();
+ raw_local_irq_enable();
preempt_check_resched();
#endif
@@ -508,7 +508,7 @@ fastcall void do_general_protection(stru
return;
gp_in_vm86:
- hard_local_irq_enable();
+ raw_local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
return;
@@ -705,7 +705,7 @@ fastcall void do_debug(struct pt_regs *
return;
/* It's safe to allow irq's after DR6 has been saved */
if (regs->eflags & X86_EFLAGS_IF)
- hard_local_irq_enable();
+ raw_local_irq_enable();
/* Mask out spurious debug traps due to lazy DR7 setting */
if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
--- arch/i386/kernel/smp.c.orig
+++ arch/i386/kernel/smp.c
@@ -162,7 +162,7 @@ void send_IPI_mask_bitmask(cpumask_t cpu
unsigned long cfg;
unsigned long flags;
- local_irq_save(flags);
+ raw_local_irq_save(flags);
/*
* Wait for idle.
@@ -185,7 +185,7 @@ void send_IPI_mask_bitmask(cpumask_t cpu
*/
apic_write_around(APIC_ICR, cfg);
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
}
void send_IPI_mask_sequence(cpumask_t mask, int vector)
@@ -199,7 +199,7 @@ void send_IPI_mask_sequence(cpumask_t ma
* should be modified to do 1 message per cluster ID - mbligh
*/
- local_irq_save(flags);
+ raw_local_irq_save(flags);
for (query_cpu = 0; query_cpu < NR_CPUS; ++query_cpu) {
if (cpu_isset(query_cpu, mask)) {
@@ -226,7 +226,7 @@ void send_IPI_mask_sequence(cpumask_t ma
apic_write_around(APIC_ICR, cfg);
}
}
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
}
#include <mach_ipi.h> /* must come after the send_IPI functions above for inlining */
@@ -530,7 +530,7 @@ int smp_call_function (void (*func) (voi
return 0;
/* Can deadlock when called with interrupts disabled */
- WARN_ON(irqs_disabled());
+ WARN_ON(raw_irqs_disabled());
data.func = func;
data.info = info;
@@ -564,7 +564,7 @@ static void stop_this_cpu (void * dummy)
* Remove this CPU:
*/
cpu_clear(smp_processor_id(), cpu_online_map);
- local_irq_disable();
+ raw_local_irq_disable();
disable_local_APIC();
if (cpu_data[smp_processor_id()].hlt_works_ok)
for(;;) __asm__("hlt");
@@ -579,9 +579,9 @@ void smp_send_stop(void)
{
smp_call_function(stop_this_cpu, NULL, 1, 0);
- local_irq_disable();
+ raw_local_irq_disable();
disable_local_APIC();
- local_irq_enable();
+ raw_local_irq_enable();
}
/*
--- arch/i386/kernel/irq.c.orig
+++ arch/i386/kernel/irq.c
@@ -218,7 +218,8 @@ int show_interrupts(struct seq_file *p,
}
if (i < NR_IRQS) {
- spin_lock_irqsave(&irq_desc[i].lock, flags);
+ raw_local_irq_save(flags);
+ spin_lock(&irq_desc[i].lock);
action = irq_desc[i].action;
if (!action)
goto skip;
@@ -239,7 +240,8 @@ int show_interrupts(struct seq_file *p,
seq_putc(p, '\n');
skip:
- spin_unlock_irqrestore(&irq_desc[i].lock, flags);
+ spin_unlock(&irq_desc[i].lock);
+ raw_local_irq_restore(flags);
} else if (i == NR_IRQS) {
seq_printf(p, "NMI: ");
for (j = 0; j < NR_CPUS; j++)
--- drivers/usb/net/usbnet.c.orig
+++ drivers/usb/net/usbnet.c
@@ -3490,6 +3490,8 @@ static void tx_complete (struct urb *urb
urb->dev = NULL;
entry->state = tx_done;
+ spin_lock_rt(&dev->txq.lock);
+ spin_unlock_rt(&dev->txq.lock);
defer_bh (dev, skb);
}
--- include/linux/seqlock.h.orig
+++ include/linux/seqlock.h
@@ -305,26 +305,26 @@ do { \
* Possible sw/hw IRQ protected versions of the interfaces.
*/
#define write_seqlock_irqsave(lock, flags) \
- do { PICK_IRQOP2(hard_local_irq_save, flags, lock); write_seqlock(lock); } while (0)
+ do { PICK_IRQOP2(raw_local_irq_save, flags, lock); write_seqlock(lock); } while (0)
#define write_seqlock_irq(lock) \
- do { PICK_IRQOP(hard_local_irq_disable, lock); write_seqlock(lock); } while (0)
+ do { PICK_IRQOP(raw_local_irq_disable, lock); write_seqlock(lock); } while (0)
#define write_seqlock_bh(lock) \
do { PICK_IRQOP(local_bh_disable, lock); write_seqlock(lock); } while (0)
#define write_sequnlock_irqrestore(lock, flags) \
- do { write_sequnlock(lock); PICK_IRQOP2(hard_local_irq_restore, flags, lock); preempt_check_resched(); } while(0)
+ do { write_sequnlock(lock); PICK_IRQOP2(raw_local_irq_restore, flags, lock); preempt_check_resched(); } while(0)
#define write_sequnlock_irq(lock) \
- do { write_sequnlock(lock); PICK_IRQOP(hard_local_irq_enable, lock); preempt_check_resched(); } while(0)
+ do { write_sequnlock(lock); PICK_IRQOP(raw_local_irq_enable, lock); preempt_check_resched(); } while(0)
#define write_sequnlock_bh(lock) \
do { write_sequnlock(lock); PICK_IRQOP(local_bh_enable, lock); } while(0)
#define read_seqbegin_irqsave(lock, flags) \
- ({ PICK_IRQOP2(hard_local_irq_save, flags, lock); read_seqbegin(lock); })
+ ({ PICK_IRQOP2(raw_local_irq_save, flags, lock); read_seqbegin(lock); })
#define read_seqretry_irqrestore(lock, iv, flags) \
({ \
int ret = read_seqretry(lock, iv); \
- PICK_IRQOP2(hard_local_irq_restore, flags, lock); \
+ PICK_IRQOP2(raw_local_irq_restore, flags, lock); \
preempt_check_resched(); \
ret; \
})
--- include/linux/rt_irq.h.orig
+++ include/linux/rt_irq.h
@@ -0,0 +1,69 @@
+#ifndef __LINUX_RT_IRQ_H
+#define __LINUX_RT_IRQ_H
+
+/*
+ * Soft IRQ flag support on PREEMPT_RT kernels:
+ */
+#ifdef CONFIG_PREEMPT_RT
+
+extern void local_irq_enable(void);
+extern void local_irq_enable_noresched(void);
+extern void local_irq_disable(void);
+extern void local_irq_restore(unsigned long flags);
+extern void __local_save_flags(unsigned long *flags);
+extern void __local_irq_save(unsigned long *flags);
+extern int irqs_disabled(void);
+extern int irqs_disabled_flags(unsigned long flags);
+
+#define local_save_flags(flags) __local_save_flags(&(flags))
+#define local_irq_save(flags) __local_irq_save(&(flags))
+
+/* Force the soft state to follow the hard state */
+#define raw_local_save_flags(flags) __raw_local_save_flags(flags)
+
+#define raw_local_irq_enable() \
+ do { \
+ local_irq_enable_noresched(); \
+ __raw_local_irq_enable(); \
+ } while (0)
+
+#define raw_local_irq_disable() \
+ do { __raw_local_irq_disable(); local_irq_disable(); } while (0)
+#define raw_local_irq_save(x) \
+ do { __raw_local_irq_save(x); } while (0)
+#define raw_local_irq_restore(x) \
+ do { __raw_local_irq_restore(x); } while (0)
+#define raw_safe_halt() \
+ do { local_irq_enable(); __raw_safe_halt(); } while (0)
+#else
+# define raw_local_save_flags __raw_local_save_flags
+# define raw_local_irq_enable __raw_local_irq_enable
+# define raw_local_irq_disable __raw_local_irq_disable
+# define raw_local_irq_save __raw_local_irq_save
+# define raw_local_irq_restore __raw_local_irq_restore
+# define raw_safe_halt __raw_safe_halt
+# define local_irq_enable_noresched __raw_local_irq_enable
+# define local_save_flags __raw_local_save_flags
+# define local_irq_enable __raw_local_irq_enable
+# define local_irq_disable __raw_local_irq_disable
+# define local_irq_save __raw_local_irq_save
+# define local_irq_restore __raw_local_irq_restore
+# define irqs_disabled __raw_irqs_disabled
+# define irqs_disabled_flags __raw_irqs_disabled_flags
+# define safe_halt raw_safe_halt
+#endif
+
+#define raw_irqs_disabled __raw_irqs_disabled
+#define raw_irqs_disabled_flags __raw_irqs_disabled_flags
+
+#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
+ extern void notrace trace_irqs_off_lowlevel(void);
+ extern void notrace trace_irqs_off(void);
+ extern void notrace trace_irqs_on(void);
+#else
+# define trace_irqs_off_lowlevel() do { } while (0)
+# define trace_irqs_off() do { } while (0)
+# define trace_irqs_on() do { } while (0)
+#endif
+
+#endif /* __LINUX_RT_IRQ_H */
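On PREEMPT_RT the new rt_irq.h wrappers above keep two pieces of state in step: the real CPU interrupt flag (the `__raw_*` per-arch primitives) and the soft flag tracked in the preempt counter. Note the ordering: `raw_local_irq_disable()` turns hard interrupts off before marking the soft state, while `raw_local_irq_enable()` clears the soft state before re-enabling hard interrupts. A simplified model with both flags as plain variables (`__raw_*` are stand-ins, not the arch implementations):

```c
#include <assert.h>

static int hard_irqs_off;	/* models the CPU interrupt flag */
static int soft_irqs_off;	/* models the IRQSOFF bit in preempt_count */

/* Stand-ins for the per-arch primitives. */
static void __raw_local_irq_disable(void) { hard_irqs_off = 1; }
static void __raw_local_irq_enable(void)  { hard_irqs_off = 0; }

static void raw_local_irq_disable(void)
{
	__raw_local_irq_disable();	/* hard state off first ...        */
	soft_irqs_off = 1;		/* ... then mirror the soft flag   */
}

static void raw_local_irq_enable(void)
{
	soft_irqs_off = 0;		/* soft flag cleared first ...     */
	__raw_local_irq_enable();	/* ... then hard interrupts back on */
}
```

This ordering means the soft flag never claims interrupts are enabled while the hardware still has them disabled in the disable path, and vice versa on enable, matching the macro bodies in the header above.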
--- include/linux/hardirq.h.orig
+++ include/linux/hardirq.h
@@ -16,20 +16,22 @@
* The hardirq count can be overridden per architecture, the default is:
*
* - bits 16-27 are the hardirq count (max # of hardirqs: 4096)
- * - ( bit 28 is the PREEMPT_ACTIVE flag. )
+ * - bit 28 is the PREEMPT_ACTIVE flag
+ * - bit 29 is the soft irq-disable flag, IRQSOFF
*
- * PREEMPT_MASK: 0x000000ff
- * SOFTIRQ_MASK: 0x0000ff00
- * HARDIRQ_MASK: 0x0fff0000
+ * PREEMPT_MASK: 0x000000ff
+ * SOFTIRQ_MASK: 0x0000ff00
+ * HARDIRQ_MASK: 0x0fff0000
+ * PREEMPT_ACTIVE_MASK: 0x10000000
+ * IRQSOFF_MASK: 0x20000000
*/
-#define PREEMPT_BITS 8
-#define SOFTIRQ_BITS 8
-
-#define IRQSOFF_BITS 1
-#define PREEMPTACTIVE_BITS 1
-
+#define PREEMPT_BITS 8
+#define SOFTIRQ_BITS 8
#ifndef HARDIRQ_BITS
-#define HARDIRQ_BITS 12
+#define HARDIRQ_BITS 12
+#define PREEMPT_ACTIVE_BITS 1
+#define IRQSOFF_BITS 1
+
/*
* The hardirq mask has to be large enough to have space for potentially
* all IRQ sources in the system nesting on a single CPU.
@@ -39,34 +41,31 @@
#endif
#endif
-#define PREEMPT_SHIFT 0
-#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
-#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
-
-#define PREEMPTACTIVE_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
-#define IRQSOFF_SHIFT (PREEMPTACTIVE_SHIFT + PREEMPTACTIVE_BITS)
-
-#define __IRQ_MASK(x) ((1UL << (x))-1)
+#define PREEMPT_SHIFT 0
+#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
+#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define PREEMPT_ACTIVE_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
+#define IRQSOFF_SHIFT (PREEMPT_ACTIVE_SHIFT + PREEMPT_ACTIVE_BITS)
+
+#define __IRQ_MASK(x) ((1UL << (x))-1)
+
+#define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
+#define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
+#define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
+#define IRQSOFF_MASK (__IRQ_MASK(IRQSOFF_BITS) << IRQSOFF_SHIFT)
+
+#define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
+#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
+#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
+#define IRQSOFF_OFFSET (1UL << IRQSOFF_SHIFT)
-#define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
-#define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
-#define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
-
-#define IRQSOFF_MASK (__IRQ_MASK(IRQSOFF_BITS) << IRQSOFF_SHIFT)
-
-#define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
-#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
-#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
-
-#define IRQSOFF_OFFSET (1UL << IRQSOFF_SHIFT)
#if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS))
-#error PREEMPT_ACTIVE is too low!
+# error PREEMPT_ACTIVE is too low!
#endif
#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
-
#define irqs_off() (preempt_count() & IRQSOFF_MASK)
/*
@@ -80,9 +79,9 @@
#if defined(CONFIG_PREEMPT) && \
!defined(CONFIG_PREEMPT_BKL) && \
!defined(CONFIG_PREEMPT_RT)
-# define in_atomic() ((preempt_count() & ~(PREEMPT_ACTIVE|IRQSOFF_OFFSET)) != kernel_locked())
+# define in_atomic() ((preempt_count() & ~(PREEMPT_ACTIVE|IRQSOFF_MASK)) != kernel_locked())
#else
-# define in_atomic() ((preempt_count() & ~(PREEMPT_ACTIVE|IRQSOFF_OFFSET)) != 0)
+# define in_atomic() ((preempt_count() & ~(PREEMPT_ACTIVE|IRQSOFF_MASK)) != 0)
#endif
#ifdef CONFIG_PREEMPT
--- include/linux/preempt.h.orig
+++ include/linux/preempt.h
@@ -10,11 +10,17 @@
#include <linux/linkage.h>
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_CRITICAL_TIMING)
- extern void notrace add_preempt_count(int val);
- extern void notrace sub_preempt_count(int val);
+ extern void notrace add_preempt_count(unsigned int val);
+ extern void notrace sub_preempt_count(unsigned int val);
+ extern void notrace mask_preempt_count(unsigned int mask);
+ extern void notrace unmask_preempt_count(unsigned int mask);
#else
# define add_preempt_count(val) do { preempt_count() += (val); } while (0)
# define sub_preempt_count(val) do { preempt_count() -= (val); } while (0)
+# define mask_preempt_count(mask) \
+ do { preempt_count() |= (mask); } while (0)
+# define unmask_preempt_count(mask) \
+ do { preempt_count() &= ~(mask); } while (0)
#endif
#ifdef CONFIG_CRITICAL_TIMING
--- include/asm-i386/system.h.orig
+++ include/asm-i386/system.h
@@ -440,80 +440,32 @@ struct alt_instr {
#define set_wmb(var, value) do { var = value; wmb(); } while (0)
-#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
- extern void notrace trace_irqs_off_lowlevel(void);
- extern void notrace trace_irqs_off(void);
- extern void notrace trace_irqs_on(void);
-#else
-# define trace_irqs_off_lowlevel() do { } while (0)
-# define trace_irqs_off() do { } while (0)
-# define trace_irqs_on() do { } while (0)
-#endif
-
-#ifdef CONFIG_RT_IRQ_DISABLE
-extern void local_irq_enable(void);
-extern void local_irq_enable_noresched(void);
-extern void local_irq_disable(void);
-extern void local_irq_restore(unsigned long);
-extern unsigned long irqs_disabled(void);
-extern unsigned long irqs_disabled_flags(unsigned long);
-extern unsigned int ___local_save_flags(void);
-extern void irq_trace_enable(void);
-extern void irq_trace_disable(void);
-
-#define local_save_flags(x) ({ x = ___local_save_flags(); x;})
-#define local_irq_save(x) ({ local_save_flags(x); local_irq_disable(); x;})
-#define safe_halt() do { local_irq_enable(); __asm__ __volatile__("hlt": : :"memory"); } while (0)
-
-/* Force the softstate to follow the hard state */
-#define hard_local_save_flags(x) _hard_local_save_flags(x)
-#define hard_local_irq_enable() do { local_irq_enable_noresched(); _hard_local_irq_enable(); } while(0)
-#define hard_local_irq_disable() do { _hard_local_irq_disable(); local_irq_disable(); } while(0)
-#define hard_local_irq_save(x) do { _hard_local_irq_save(x); } while(0)
-#define hard_local_irq_restore(x) do { _hard_local_irq_restore(x); } while (0)
-#define hard_safe_halt() do { local_irq_enable(); _hard_safe_halt(); } while (0)
-#else
-
-#define hard_local_save_flags _hard_local_save_flags
-#define hard_local_irq_enable _hard_local_irq_enable
-#define hard_local_irq_disable _hard_local_irq_disable
-#define hard_local_irq_save _hard_local_irq_save
-#define hard_local_irq_restore _hard_local_irq_restore
-#define hard_safe_halt _hard_safe_halt
-
-#define local_irq_enable_noresched _hard_local_irq_enable
-#define local_save_flags _hard_local_save_flags
-#define local_irq_enable _hard_local_irq_enable
-#define local_irq_disable _hard_local_irq_disable
-#define local_irq_save _hard_local_irq_save
-#define local_irq_restore _hard_local_irq_restore
-#define irqs_disabled hard_irqs_disabled
-#define irqs_disabled_flags hard_irqs_disabled_flags
-#define safe_halt hard_safe_halt
-#endif
-
/* interrupt control.. */
-#define _hard_local_save_flags(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
-#define _hard_local_irq_restore(x) do { typecheck(unsigned long,x); if (hard_irqs_disabled_flags(x)) trace_irqs_on(); else trace_irqs_on(); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
-#define _hard_local_irq_disable() do { __asm__ __volatile__("cli": : :"memory"); trace_irqs_off(); } while (0)
-#define _hard_local_irq_enable() do { trace_irqs_on(); __asm__ __volatile__("sti": : :"memory"); } while (0)
+#define __raw_local_save_flags(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
+#define __raw_local_irq_restore(x) do { typecheck(unsigned long,x); if (raw_irqs_disabled_flags(x)) trace_irqs_off(); else trace_irqs_on(); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
+#define __raw_local_irq_disable() do { __asm__ __volatile__("cli": : :"memory"); trace_irqs_off(); } while (0)
+#define __raw_local_irq_enable() do { trace_irqs_on(); __asm__ __volatile__("sti": : :"memory"); } while (0)
/* used in the idle loop; sti takes one instruction cycle to complete */
-#define _hard_safe_halt() do { trace_irqs_on(); __asm__ __volatile__("sti; hlt": : :"memory"); } while (0)
+#define __raw_safe_halt() do { trace_irqs_on(); __asm__ __volatile__("sti; hlt": : :"memory"); } while (0)
-#define hard_irqs_disabled_flags(flags) \
-({ \
- !(flags & (1<<9)); \
+#define __raw_irqs_disabled_flags(flags) \
+({ \
+ !(flags & (1<<9)); \
})
-#define hard_irqs_disabled() \
-({ \
- unsigned long flags; \
- hard_local_save_flags(flags); \
- hard_irqs_disabled_flags(flags); \
+#define __raw_irqs_disabled() \
+({ \
+ unsigned long flags; \
+ __raw_local_save_flags(flags); \
+ __raw_irqs_disabled_flags(flags); \
})
/* For spinlocks etc */
-#define _hard_local_irq_save(x) do { __asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory"); trace_irqs_off(); } while (0)
+#define __raw_local_irq_save(x) do { __asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory"); trace_irqs_off(); } while (0)
+
+#include <linux/rt_irq.h>
+
+#define safe_halt() do { local_irq_enable(); __asm__ __volatile__("hlt": : :"memory"); } while (0)
/*
* disable hlt during certain critical i/o operations
--- Makefile.orig
+++ Makefile
@@ -1,7 +1,7 @@
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 12
-EXTRAVERSION =-rc6-RT-V0.7.47-30
+EXTRAVERSION =-rc6-RT-V0.7.48-00
NAME=Woozy Numbat
# *DOCUMENTATION*
--- lib/kernel_lock.c.orig
+++ lib/kernel_lock.c
@@ -24,7 +24,7 @@ unsigned int notrace smp_processor_id(vo
if (likely(preempt_count))
goto out;
- if (irqs_disabled())
+ if (irqs_disabled() || raw_irqs_disabled())
goto out;
/*
@@ -50,7 +50,7 @@ unsigned int notrace smp_processor_id(vo
if (!printk_ratelimit())
goto out_enable;
- printk(KERN_ERR "BUG: using smp_processor_id() in preemptible [%08x] code: %s/%d\n", preempt_count(), current->comm, current->pid);
+ printk(KERN_ERR "BUG: using smp_processor_id() in preemptible [%08x] code: %s/%d\n", preempt_count()-1, current->comm, current->pid);
print_symbol("caller is %s\n", (long)__builtin_return_address(0));
dump_stack();
@@ -98,8 +98,7 @@ int __lockfunc __reacquire_kernel_lock(v
struct task_struct *task = current;
int saved_lock_depth = task->lock_depth;
- hard_local_irq_enable();
-
+ raw_local_irq_enable();
BUG_ON(saved_lock_depth < 0);
task->lock_depth = -1;
@@ -108,8 +107,8 @@ int __lockfunc __reacquire_kernel_lock(v
task->lock_depth = saved_lock_depth;
- hard_local_irq_disable();
-
+ raw_local_irq_disable();
+
return 0;
}
--- lib/Kconfig.RT.orig
+++ lib/Kconfig.RT
@@ -81,25 +81,6 @@ config PREEMPT_RT
endchoice
-config RT_IRQ_DISABLE
- bool "Real-Time IRQ Disable"
- default y
- depends on PREEMPT_RT
- help
- This option will remove all local_irq_enable() and
- local_irq_disable() calls and replace them with soft
- versions. This will decrease the frequency that
- interrupt are disabled.
-
- All interrupts that are flagged with SA_NODELAY are
- considered hard interrupts. This option will force
- SA_NODELAY interrupts to run even when they normally
- wouldn't be enabled.
-
- Select this if you plan to use Linux in an
- embedded enviorment that needs low interrupt
- latency.
-
config PREEMPT
bool
default y
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-08 11:21 ` Ingo Molnar
@ 2005-06-08 20:33 ` Daniel Walker
2005-06-09 11:56 ` Ingo Molnar
0 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-08 20:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, sdietrich
Great.
On Wed, 2005-06-08 at 13:21 +0200, Ingo Molnar wrote:
>
> i've attached below the delta relative to your patch. The changes are:
>
> - fixed a soft-local_irq_restore() bug: it didnt re-disable the IRQ
> flag if the flags passed in had it set.
>
> - fixed SMP support - both the scheduler and the lowlevel SMP code was
> not fully converted to the soft flag assumptions. The PREEMPT_RT
> kernel now boots fine on a 2-way/4-way x86 box.
>
> - fixed the APIC code
>
> - fixed irq-latency tracing and other tracing assumptions
>
> - fixed DEBUG_RT_DEADLOCK_DETECT - we checked for the wrong irq flags
>
> - added debug code to find IRQ flag mismatches: mixing the CPU and soft
> flags is lethal, but detectable.
>
> - simplified the code which should thus also be faster: introduced the
> mask_preempt_count/unmask_preempt_count primitives and made the
> soft-flag code use it.
>
> - cleaned up the interdependencies of the soft-flag functions - they
> now dont call each other anymore, they all use inlined code for
> maximum performance.
Should be macros one day ...
> - made the soft IRQ flag an unconditional feature of PREEMPT_RT: once
> it works properly there's no reason to ever disable it under
> PREEMPT_RT.
>
> - renamed hard_ to raw_, to bring it in line with other constructs in
> PREEMPT_RT.
>
> - cleaned up the system.h impact by creating linux/rt_irq.h. Made the
> naming consistent all across.
>
> - cleaned up the preempt.h impact and updated the comments.
>
> - fixed smp_processor_id() debugging: we have to check for the CPU irq
> flag too.
>
Excellent .. I have one fix related to preempt_schedule_irq() below.
There needs to be an ifdef, because when PREEMPT_RT is off you would end
up with interrupts enabled when exiting preempt_schedule_irq() ..
Index: linux-2.6.11/kernel/sched.c
===================================================================
--- linux-2.6.11.orig/kernel/sched.c 2005-06-08 20:25:00.000000000 +0000
+++ linux-2.6.11/kernel/sched.c 2005-06-08 20:24:37.000000000 +0000
@@ -3245,7 +3245,9 @@ need_resched:
__schedule();
raw_local_irq_disable();
+#ifdef CONFIG_PREEMPT_RT
local_irq_enable_noresched();
+#endif
#ifdef CONFIG_PREEMPT_BKL
task->lock_depth = saved_lock_depth;
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-08 20:33 ` Daniel Walker
@ 2005-06-09 11:56 ` Ingo Molnar
0 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-09 11:56 UTC (permalink / raw)
To: Daniel Walker; +Cc: linux-kernel, sdietrich
* Daniel Walker <dwalker@mvista.com> wrote:
> Excellent .. I have one fix related to preempt_schedule_irq() below.
> There needs to be an ifdef, because when PREEMPT_RT is off you would
> end up with interrupts enabled when exiting preempt_schedule_irq() ..
thanks, i've added this to the -48-03 patch.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-08 7:08 [PATCH] local_irq_disable removal Daniel Walker
2005-06-08 11:21 ` Ingo Molnar
@ 2005-06-10 23:37 ` Esben Nielsen
2005-06-11 0:20 ` Daniel Walker
2005-06-11 16:51 ` Christoph Hellwig
2 siblings, 1 reply; 86+ messages in thread
From: Esben Nielsen @ 2005-06-10 23:37 UTC (permalink / raw)
To: Daniel Walker; +Cc: linux-kernel, mingo, sdietrich
I am sorry, Daniel, but this patch doesn't make much sense to me.
As far as I can see you effectively turned local_irq_disable() into a
preempt_disable(). I.e. it gives no improvement to task latency. As all
interrupts are threaded it will not improve irq latency either...
I hope it is just me who has misunderstood the patch, but as far as I see
it
task->preempt_count
is non-zero in a local_irq_disable() region. That means preempt_schedule()
won't schedule.
What is the problem you wanted to fix in the first place?
Drivers and subsystems using local_irq_disable()/enable() as a lock -
which is valid for !PREEMPT_RT to protect per-cpu variables, but not a good
idea when we want deterministic realtime. Thus these regions need to be
made preemptible, so that any RT events can come through even though you
have enabled a non-RT subsystem where local_irq_disable()/enable() haven't
been removed.
As far as I can see the only solution is to replace them with a per-cpu
mutex. Such a mutex can be the rt_mutex for now, but someone may want to
make a more optimized per-cpu version where a raw_spin_lock isn't used.
That would make it nearly as cheap as cli()/sti() when there is no
contention. One doesn't need PI for this region either, as the RT
subsystems will not hit it anyway.
Esben
On Wed, 8 Jun 2005, Daniel Walker wrote:
> Introduction:
>
> The current Real-Time patch from Ingo Molnar significantly
> modifies several kernel subsystems. One such system is the
> interrupt handling system. With the current Real Time Linux kernel,
> interrupts are deferred and scheduled inside of threads. This
> means most interrupts are delayed, and are run along with all other
> threads and user space processes. The result of this is that a minimal
> amount of code runs in actual interrupt context. The code executing in
> interrupt context only unblocks, (wakes-up) interrupt threads, and then
> immediately returns to process context.
>
>
> Usually a driver that implements an interrupt handler, along
> with system calls or IO controls, would need to disable interrupts to
> stop the interrupt handler from running. In this type of driver you must
> disable interrupts to have control over when the interrupt runs, which
> is essential to serialize data access in the driver. However, this type
> of paradigm changes in the face of threaded interrupt handlers.
>
> Theory:
>
> The interrupt in thread model changes the driver paradigm that I
> describe above in one fundamental way: the interrupt handler
> no longer runs in interrupt context. With this one change it is still
> possible to stop the interrupt handler from running by disabling
> interrupts, which is what the current real time patch does. Since
> interrupt handlers are now threads, to stop an interrupt handler from
> running one must only disable preemption.
>
> Implementation:
>
> I've written code that removes 70% of all interrupt disable
> sections in the current real time kernel. These interrupt disable
> sections are replaced with a special preempt disable section. Since an
> interrupt disable section implicitly disables preemption, there
> should be no increase in preemption latency due to this change.
>
> There is still a need for interrupt disable sections. I've
> reassigned local_irq_disable() as hard_local_irq_disable(). One
> would now run this "hard" macro to disable interrupts, with the older
> non-hard types re-factored to disable preemption.
>
> Since only a small stub of code runs in interrupt context, we
> only need to protect the data structures accessed by that small
> piece of code. This included all of the interrupt handling subsystem. It
> also included parts of the scheduler that are used to change
> the process state.
>
> An intended side effect of this implementation is that if a user
> designates an interrupt handler with the flag SA_NODELAY , then that
> interrupt should have a very low latency characteristic. Currently the
> timer is the only SA_NODELAY interrupt.
>
> Results:
>
> Config option | Number of cli's
> PREEMPT 1138 (0% removal)
> PREEMPT_RT 224 (80% removal)
> RT_IRQ_DISABLE 69 (94% removal)
>
> PREEMPT_RT displays a significant reduction over plain PREEMPT
> due to the fact that mutex-converted spinlocks no longer disable
> interrupts. However, PREEMPT_RT doesn't give a fixed number of
> interrupt disable sections in the kernel. With RT_IRQ_DISABLE there is
> a fixed number of interrupt disable sections, and it further reduces
> the total to 30% of PREEMPT_RT (or 6% of PREEMPT).
>
> With a fixed number of interrupt disable sections we can give a
> set worst case interrupt latency. This holds no matter what drivers, or
> system config is used. The one current exception relates to the
> xtime_lock. This exception is because this lock is used in the timer
> interrupt which is not in a thread.
>
> This is a work in progress, and is still volatile. It is not
> tested fully on SMP. Raw spinlocks no longer disable interrupts, and it
> is unclear what the SMP impact is. So please test on SMP. The irq_trace
> feature could cause hangs if it's used in the wrong places, so be
> careful. IRQ latency and latency tracing have been modified, but still
> require some testing.
>
> This patch applies on top of the RT patch provided by Ingo
> Molnar. There is too much instability after 0.7.47-19, so my patch is
> recommended on this version.
>
> You may download the -19 RT patch from the following location,
>
> http://people.redhat.com/~mingo/realtime-preempt/older/realtime-preempt-2.6.12-rc6-V0.7.47-19
>
> Future work:
>
> As described above, there are only a few areas that need a true
> interrupt disable. It would now be possible to measure all interrupt
> disable sections in every kernel when this feature is turned on.
> A definition like this would allow the biggest interrupt disable section
> to be defined exactly. Once these sections are defined we would then be
> able to optimize each one, producing ever-decreasing interrupt latency.
>
> Another optimization to this system would be to produce a method
> so that local_irq_disable only prevents interrupt threads from running,
> instead of the current method of preventing all threads and processes
> from running. The biggest problem in doing this is the balance of the
> average size of an interrupt disable section vs. the length of time it
> takes to soft disable/enable.
>
> Thanks to Sven Dietrich
>
> Signed-Off-By: Daniel Walker <dwalker@mvista.com>
>
> Index: linux-2.6.11/arch/i386/kernel/entry.S
> ===================================================================
> --- linux-2.6.11.orig/arch/i386/kernel/entry.S 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/arch/i386/kernel/entry.S 2005-06-08 00:35:30.000000000 +0000
> @@ -331,6 +331,9 @@ work_pending:
> work_resched:
> cli
> call __schedule
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + call local_irq_enable_noresched
> +#endif
> # make sure we don't miss an interrupt
> # setting need_resched or sigpending
> # between sampling and the iret
> Index: linux-2.6.11/arch/i386/kernel/process.c
> ===================================================================
> --- linux-2.6.11.orig/arch/i386/kernel/process.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/arch/i386/kernel/process.c 2005-06-08 06:29:52.000000000 +0000
> @@ -96,13 +96,13 @@ EXPORT_SYMBOL(enable_hlt);
> void default_idle(void)
> {
> if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
> - local_irq_disable();
> + hard_local_irq_disable();
> if (!need_resched())
> - safe_halt();
> + hard_safe_halt();
> else
> - local_irq_enable();
> + hard_local_irq_enable();
> } else {
> - local_irq_enable();
> + hard_local_irq_enable();
> cpu_relax();
> }
> }
> @@ -149,6 +149,9 @@ void cpu_idle (void)
> {
> /* endless idle loop with no priority at all */
> while (1) {
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + BUG_ON(hard_irqs_disabled());
> +#endif
> while (!need_resched()) {
> void (*idle)(void);
>
> @@ -165,7 +168,9 @@ void cpu_idle (void)
> stop_critical_timing();
> idle();
> }
> + hard_local_irq_disable();
> __schedule();
> + hard_local_irq_enable();
> }
> }
>
> Index: linux-2.6.11/arch/i386/kernel/signal.c
> ===================================================================
> --- linux-2.6.11.orig/arch/i386/kernel/signal.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/arch/i386/kernel/signal.c 2005-06-08 00:44:00.000000000 +0000
> @@ -597,7 +597,7 @@ int fastcall do_signal(struct pt_regs *r
> /*
> * Fully-preemptible kernel does not need interrupts disabled:
> */
> - local_irq_enable();
> + hard_local_irq_enable();
> preempt_check_resched();
> #endif
> /*
> Index: linux-2.6.11/arch/i386/kernel/traps.c
> ===================================================================
> --- linux-2.6.11.orig/arch/i386/kernel/traps.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/arch/i386/kernel/traps.c 2005-06-08 00:45:11.000000000 +0000
> @@ -376,7 +376,7 @@ static void do_trap(int trapnr, int sign
> goto kernel_trap;
>
> #ifdef CONFIG_PREEMPT_RT
> - local_irq_enable();
> + hard_local_irq_enable();
> preempt_check_resched();
> #endif
>
> @@ -508,7 +508,7 @@ fastcall void do_general_protection(stru
> return;
>
> gp_in_vm86:
> - local_irq_enable();
> + hard_local_irq_enable();
> handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
> return;
>
> @@ -705,7 +705,7 @@ fastcall void do_debug(struct pt_regs *
> return;
> /* It's safe to allow irq's after DR6 has been saved */
> if (regs->eflags & X86_EFLAGS_IF)
> - local_irq_enable();
> + hard_local_irq_enable();
>
> /* Mask out spurious debug traps due to lazy DR7 setting */
> if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
> Index: linux-2.6.11/arch/i386/mm/fault.c
> ===================================================================
> --- linux-2.6.11.orig/arch/i386/mm/fault.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/arch/i386/mm/fault.c 2005-06-08 00:35:30.000000000 +0000
> @@ -232,7 +232,7 @@ fastcall notrace void do_page_fault(stru
> return;
> /* It's safe to allow irq's after cr2 has been saved */
> if (regs->eflags & (X86_EFLAGS_IF|VM_MASK))
> - local_irq_enable();
> + hard_local_irq_enable();
>
> tsk = current;
>
> Index: linux-2.6.11/include/asm-i386/system.h
> ===================================================================
> --- linux-2.6.11.orig/include/asm-i386/system.h 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/include/asm-i386/system.h 2005-06-08 00:35:30.000000000 +0000
> @@ -450,28 +450,70 @@ struct alt_instr {
> # define trace_irqs_on() do { } while (0)
> #endif
>
> +#ifdef CONFIG_RT_IRQ_DISABLE
> +extern void local_irq_enable(void);
> +extern void local_irq_enable_noresched(void);
> +extern void local_irq_disable(void);
> +extern void local_irq_restore(unsigned long);
> +extern unsigned long irqs_disabled(void);
> +extern unsigned long irqs_disabled_flags(unsigned long);
> +extern unsigned int ___local_save_flags(void);
> +extern void irq_trace_enable(void);
> +extern void irq_trace_disable(void);
> +
> +#define local_save_flags(x) ({ x = ___local_save_flags(); x;})
> +#define local_irq_save(x) ({ local_save_flags(x); local_irq_disable(); x;})
> +#define safe_halt() do { local_irq_enable(); __asm__ __volatile__("hlt": : :"memory"); } while (0)
> +
> +/* Force the softstate to follow the hard state */
> +#define hard_local_save_flags(x) _hard_local_save_flags(x)
> +#define hard_local_irq_enable() do { local_irq_enable_noresched(); _hard_local_irq_enable(); } while(0)
> +#define hard_local_irq_disable() do { _hard_local_irq_disable(); local_irq_disable(); } while(0)
> +#define hard_local_irq_save(x) do { _hard_local_irq_save(x); } while(0)
> +#define hard_local_irq_restore(x) do { _hard_local_irq_restore(x); } while (0)
> +#define hard_safe_halt() do { local_irq_enable(); _hard_safe_halt(); } while (0)
> +#else
> +
> +#define hard_local_save_flags _hard_local_save_flags
> +#define hard_local_irq_enable _hard_local_irq_enable
> +#define hard_local_irq_disable _hard_local_irq_disable
> +#define hard_local_irq_save _hard_local_irq_save
> +#define hard_local_irq_restore _hard_local_irq_restore
> +#define hard_safe_halt _hard_safe_halt
> +
> +#define local_irq_enable_noresched _hard_local_irq_enable
> +#define local_save_flags _hard_local_save_flags
> +#define local_irq_enable _hard_local_irq_enable
> +#define local_irq_disable _hard_local_irq_disable
> +#define local_irq_save _hard_local_irq_save
> +#define local_irq_restore _hard_local_irq_restore
> +#define irqs_disabled hard_irqs_disabled
> +#define irqs_disabled_flags hard_irqs_disabled_flags
> +#define safe_halt hard_safe_halt
> +#endif
> +
> /* interrupt control.. */
> -#define local_save_flags(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
> -#define local_irq_restore(x) do { typecheck(unsigned long,x); if (irqs_disabled_flags(x)) trace_irqs_on(); else trace_irqs_on(); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
> -#define local_irq_disable() do { __asm__ __volatile__("cli": : :"memory"); trace_irqs_off(); } while (0)
> -#define local_irq_enable() do { trace_irqs_on(); __asm__ __volatile__("sti": : :"memory"); } while (0)
> +#define _hard_local_save_flags(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
> +#define _hard_local_irq_restore(x) do { typecheck(unsigned long,x); if (hard_irqs_disabled_flags(x)) trace_irqs_on(); else trace_irqs_on(); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
> +#define _hard_local_irq_disable() do { __asm__ __volatile__("cli": : :"memory"); trace_irqs_off(); } while (0)
> +#define _hard_local_irq_enable() do { trace_irqs_on(); __asm__ __volatile__("sti": : :"memory"); } while (0)
> /* used in the idle loop; sti takes one instruction cycle to complete */
> -#define safe_halt() do { trace_irqs_on(); __asm__ __volatile__("sti; hlt": : :"memory"); } while (0)
> +#define _hard_safe_halt() do { trace_irqs_on(); __asm__ __volatile__("sti; hlt": : :"memory"); } while (0)
>
> -#define irqs_disabled_flags(flags) \
> +#define hard_irqs_disabled_flags(flags) \
> ({ \
> !(flags & (1<<9)); \
> })
>
> -#define irqs_disabled() \
> +#define hard_irqs_disabled() \
> ({ \
> unsigned long flags; \
> - local_save_flags(flags); \
> - irqs_disabled_flags(flags); \
> + hard_local_save_flags(flags); \
> + hard_irqs_disabled_flags(flags); \
> })
>
> /* For spinlocks etc */
> -#define local_irq_save(x) do { __asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory"); trace_irqs_off(); } while (0)
> +#define _hard_local_irq_save(x) do { __asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory"); trace_irqs_off(); } while (0)
>
> /*
> * disable hlt during certain critical i/o operations
> Index: linux-2.6.11/include/linux/hardirq.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/hardirq.h 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/include/linux/hardirq.h 2005-06-08 00:48:03.000000000 +0000
> @@ -25,6 +25,9 @@
> #define PREEMPT_BITS 8
> #define SOFTIRQ_BITS 8
>
> +#define IRQSOFF_BITS 1
> +#define PREEMPTACTIVE_BITS 1
> +
> #ifndef HARDIRQ_BITS
> #define HARDIRQ_BITS 12
> /*
> @@ -40,16 +43,22 @@
> #define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
> #define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
>
> +#define PREEMPTACTIVE_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
> +#define IRQSOFF_SHIFT (PREEMPTACTIVE_SHIFT + PREEMPTACTIVE_BITS)
> +
> #define __IRQ_MASK(x) ((1UL << (x))-1)
>
> #define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
> #define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
> #define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
>
> +#define IRQSOFF_MASK (__IRQ_MASK(IRQSOFF_BITS) << IRQSOFF_SHIFT)
> +
> #define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
> #define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
> #define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
>
> +#define IRQSOFF_OFFSET (1UL << IRQSOFF_SHIFT)
> #if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS))
> #error PREEMPT_ACTIVE is too low!
> #endif
> @@ -58,6 +67,8 @@
> #define softirq_count() (preempt_count() & SOFTIRQ_MASK)
> #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
>
> +#define irqs_off() (preempt_count() & IRQSOFF_MASK)
> +
> /*
> * Are we doing bottom half or hardware interrupt processing?
> * Are we in a softirq context? Interrupt context?
> @@ -69,9 +80,9 @@
> #if defined(CONFIG_PREEMPT) && \
> !defined(CONFIG_PREEMPT_BKL) && \
> !defined(CONFIG_PREEMPT_RT)
> -# define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != kernel_locked())
> +# define in_atomic() ((preempt_count() & ~(PREEMPT_ACTIVE|IRQSOFF_OFFSET)) != kernel_locked())
> #else
> -# define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != 0)
> +# define in_atomic() ((preempt_count() & ~(PREEMPT_ACTIVE|IRQSOFF_OFFSET)) != 0)
> #endif
>
> #ifdef CONFIG_PREEMPT
> Index: linux-2.6.11/include/linux/sched.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/sched.h 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/include/linux/sched.h 2005-06-08 00:35:30.000000000 +0000
> @@ -839,6 +839,9 @@ struct task_struct {
> unsigned long preempt_trace_eip[MAX_PREEMPT_TRACE];
> unsigned long preempt_trace_parent_eip[MAX_PREEMPT_TRACE];
> #endif
> + unsigned long last_irq_disable[2];
> + unsigned long last_irq_enable[2];
> +
>
> /* realtime bits */
> struct list_head delayed_put;
> Index: linux-2.6.11/include/linux/seqlock.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/seqlock.h 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/include/linux/seqlock.h 2005-06-08 00:35:30.000000000 +0000
> @@ -305,26 +305,26 @@ do { \
> * Possible sw/hw IRQ protected versions of the interfaces.
> */
> #define write_seqlock_irqsave(lock, flags) \
> - do { PICK_IRQOP2(local_irq_save, flags, lock); write_seqlock(lock); } while (0)
> + do { PICK_IRQOP2(hard_local_irq_save, flags, lock); write_seqlock(lock); } while (0)
> #define write_seqlock_irq(lock) \
> - do { PICK_IRQOP(local_irq_disable, lock); write_seqlock(lock); } while (0)
> + do { PICK_IRQOP(hard_local_irq_disable, lock); write_seqlock(lock); } while (0)
> #define write_seqlock_bh(lock) \
> do { PICK_IRQOP(local_bh_disable, lock); write_seqlock(lock); } while (0)
>
> #define write_sequnlock_irqrestore(lock, flags) \
> - do { write_sequnlock(lock); PICK_IRQOP2(local_irq_restore, flags, lock); preempt_check_resched(); } while(0)
> + do { write_sequnlock(lock); PICK_IRQOP2(hard_local_irq_restore, flags, lock); preempt_check_resched(); } while(0)
> #define write_sequnlock_irq(lock) \
> - do { write_sequnlock(lock); PICK_IRQOP(local_irq_enable, lock); preempt_check_resched(); } while(0)
> + do { write_sequnlock(lock); PICK_IRQOP(hard_local_irq_enable, lock); preempt_check_resched(); } while(0)
> #define write_sequnlock_bh(lock) \
> do { write_sequnlock(lock); PICK_IRQOP(local_bh_enable, lock); } while(0)
>
> #define read_seqbegin_irqsave(lock, flags) \
> - ({ PICK_IRQOP2(local_irq_save, flags, lock); read_seqbegin(lock); })
> + ({ PICK_IRQOP2(hard_local_irq_save, flags, lock); read_seqbegin(lock); })
>
> #define read_seqretry_irqrestore(lock, iv, flags) \
> ({ \
> int ret = read_seqretry(lock, iv); \
> - PICK_IRQOP2(local_irq_restore, flags, lock); \
> + PICK_IRQOP2(hard_local_irq_restore, flags, lock); \
> preempt_check_resched(); \
> ret; \
> })
> Index: linux-2.6.11/include/linux/spinlock.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/spinlock.h 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/include/linux/spinlock.h 2005-06-08 00:35:30.000000000 +0000
> @@ -244,7 +244,7 @@ typedef struct {
> ({ \
> local_irq_disable(); preempt_disable(); \
> __raw_spin_trylock(lock) ? \
> - 1 : ({ __preempt_enable_no_resched(); local_irq_enable(); preempt_check_resched(); 0; }); \
> + 1 : ({ __preempt_enable_no_resched(); local_irq_enable_noresched(); preempt_check_resched(); 0; }); \
> })
>
> #define _raw_spin_trylock_irqsave(lock, flags) \
> @@ -384,7 +384,7 @@ do { \
> do { \
> __raw_spin_unlock(lock); \
> __preempt_enable_no_resched(); \
> - local_irq_enable(); \
> + local_irq_enable_noresched(); \
> preempt_check_resched(); \
> __release(lock); \
> } while (0)
> @@ -427,7 +427,7 @@ do { \
> do { \
> __raw_read_unlock(lock);\
> __preempt_enable_no_resched(); \
> - local_irq_enable(); \
> + local_irq_enable_noresched(); \
> preempt_check_resched(); \
> __release(lock); \
> } while (0)
> @@ -444,7 +444,7 @@ do { \
> do { \
> __raw_write_unlock(lock);\
> __preempt_enable_no_resched(); \
> - local_irq_enable(); \
> + local_irq_enable_noresched(); \
> preempt_check_resched(); \
> __release(lock); \
> } while (0)
> Index: linux-2.6.11/init/main.c
> ===================================================================
> --- linux-2.6.11.orig/init/main.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/init/main.c 2005-06-08 00:51:57.000000000 +0000
> @@ -428,6 +428,14 @@ asmlinkage void __init start_kernel(void
> {
> char * command_line;
> extern struct kernel_param __start___param[], __stop___param[];
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + /*
> + * Force the soft IRQ state to mimic the hard state until
> + * we finish boot-up.
> + */
> + local_irq_disable();
> +#endif
> +
> /*
> * Interrupts are still disabled. Do necessary setups, then
> * enable them
> @@ -456,6 +464,13 @@ asmlinkage void __init start_kernel(void
> * fragile until we cpu_idle() for the first time.
> */
> preempt_disable();
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + /*
> + * Reset the irqs off flag after sched_init resets the preempt_count.
> + */
> + local_irq_disable();
> +#endif
> +
> build_all_zonelists();
> page_alloc_init();
> early_init_hardirqs();
> @@ -482,7 +497,12 @@ asmlinkage void __init start_kernel(void
> if (panic_later)
> panic(panic_later, panic_param);
> profile_init();
> - local_irq_enable();
> +
> + /*
> + * Soft IRQ state will be enabled with the hard state.
> + */
> + hard_local_irq_enable();
> +
> #ifdef CONFIG_BLK_DEV_INITRD
> if (initrd_start && !initrd_below_start_ok &&
> initrd_start < min_low_pfn << PAGE_SHIFT) {
> @@ -526,6 +546,9 @@ asmlinkage void __init start_kernel(void
>
> acpi_early_init(); /* before LAPIC and SMP init */
>
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + WARN_ON(hard_irqs_disabled());
> +#endif
> /* Do the rest non-__init'ed, we're now alive */
> rest_init();
> }
> @@ -568,6 +591,12 @@ static void __init do_initcalls(void)
> msg = "disabled interrupts";
> local_irq_enable();
> }
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + if (hard_irqs_disabled()) {
> + msg = "disabled hard interrupts";
> + hard_local_irq_enable();
> + }
> +#endif
> if (msg) {
> printk(KERN_WARNING "error in initcall at 0x%p: "
> "returned with %s\n", *call, msg);
> @@ -708,6 +737,9 @@ static int init(void * unused)
> * The Bourne shell can be used instead of init if we are
> * trying to recover a really broken machine.
> */
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + WARN_ON(hard_irqs_disabled());
> +#endif
>
> if (execute_command)
> run_init_process(execute_command);
> Index: linux-2.6.11/kernel/Makefile
> ===================================================================
> --- linux-2.6.11.orig/kernel/Makefile 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/Makefile 2005-06-08 00:35:30.000000000 +0000
> @@ -10,6 +10,7 @@ obj-y = sched.o fork.o exec_domain.o
> kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
>
> obj-$(CONFIG_PREEMPT_RT) += rt.o
> +obj-$(CONFIG_RT_IRQ_DISABLE) += irqs-off.o
>
> obj-$(CONFIG_DEBUG_PREEMPT) += latency.o
> obj-$(CONFIG_LATENCY_TIMING) += latency.o
> Index: linux-2.6.11/kernel/exit.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/exit.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/exit.c 2005-06-08 00:35:30.000000000 +0000
> @@ -844,7 +844,7 @@ fastcall NORET_TYPE void do_exit(long co
> check_no_held_locks(tsk);
> /* PF_DEAD causes final put_task_struct after we schedule. */
> again:
> - local_irq_disable();
> + hard_local_irq_disable();
> tsk->flags |= PF_DEAD;
> __schedule();
> printk(KERN_ERR "BUG: dead task %s:%d back from the grave!\n",
> Index: linux-2.6.11/kernel/irq/autoprobe.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/irq/autoprobe.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/irq/autoprobe.c 2005-06-08 00:35:30.000000000 +0000
> @@ -39,10 +39,12 @@ unsigned long probe_irq_on(void)
> for (i = NR_IRQS-1; i > 0; i--) {
> desc = irq_desc + i;
>
> - spin_lock_irq(&desc->lock);
> + hard_local_irq_disable();
> + spin_lock(&desc->lock);
> if (!irq_desc[i].action)
> irq_desc[i].handler->startup(i);
> - spin_unlock_irq(&desc->lock);
> + spin_unlock(&desc->lock);
> + hard_local_irq_enable();
> }
>
> /*
> @@ -58,13 +60,15 @@ unsigned long probe_irq_on(void)
> for (i = NR_IRQS-1; i > 0; i--) {
> desc = irq_desc + i;
>
> - spin_lock_irq(&desc->lock);
> + hard_local_irq_disable();
> + spin_lock(&desc->lock);
> if (!desc->action) {
> desc->status |= IRQ_AUTODETECT | IRQ_WAITING;
> if (desc->handler->startup(i))
> desc->status |= IRQ_PENDING;
> }
> - spin_unlock_irq(&desc->lock);
> + spin_unlock(&desc->lock);
> + hard_local_irq_enable();
> }
>
> /*
> @@ -80,7 +84,8 @@ unsigned long probe_irq_on(void)
> irq_desc_t *desc = irq_desc + i;
> unsigned int status;
>
> - spin_lock_irq(&desc->lock);
> + hard_local_irq_disable();
> + spin_lock(&desc->lock);
> status = desc->status;
>
> if (status & IRQ_AUTODETECT) {
> @@ -92,7 +97,8 @@ unsigned long probe_irq_on(void)
> if (i < 32)
> val |= 1 << i;
> }
> - spin_unlock_irq(&desc->lock);
> + spin_unlock(&desc->lock);
> + hard_local_irq_enable();
> }
>
> return val;
> @@ -122,7 +128,8 @@ unsigned int probe_irq_mask(unsigned lon
> irq_desc_t *desc = irq_desc + i;
> unsigned int status;
>
> - spin_lock_irq(&desc->lock);
> + hard_local_irq_disable();
> + spin_lock(&desc->lock);
> status = desc->status;
>
> if (status & IRQ_AUTODETECT) {
> @@ -132,7 +139,8 @@ unsigned int probe_irq_mask(unsigned lon
> desc->status = status & ~IRQ_AUTODETECT;
> desc->handler->shutdown(i);
> }
> - spin_unlock_irq(&desc->lock);
> + spin_unlock(&desc->lock);
> + hard_local_irq_enable();
> }
> up(&probe_sem);
>
> @@ -165,7 +173,8 @@ int probe_irq_off(unsigned long val)
> irq_desc_t *desc = irq_desc + i;
> unsigned int status;
>
> - spin_lock_irq(&desc->lock);
> + hard_local_irq_disable();
> + spin_lock(&desc->lock);
> status = desc->status;
>
> if (status & IRQ_AUTODETECT) {
> @@ -177,7 +186,8 @@ int probe_irq_off(unsigned long val)
> desc->status = status & ~IRQ_AUTODETECT;
> desc->handler->shutdown(i);
> }
> - spin_unlock_irq(&desc->lock);
> + spin_unlock(&desc->lock);
> + hard_local_irq_enable();
> }
> up(&probe_sem);
>
> Index: linux-2.6.11/kernel/irq/handle.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/irq/handle.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/irq/handle.c 2005-06-08 00:35:30.000000000 +0000
> @@ -113,7 +113,7 @@ fastcall int handle_IRQ_event(unsigned i
> * IRQ handlers:
> */
> if (!hardirq_count() || !(action->flags & SA_INTERRUPT))
> - local_irq_enable();
> + hard_local_irq_enable();
>
> do {
> unsigned int preempt_count = preempt_count();
> @@ -133,10 +133,10 @@ fastcall int handle_IRQ_event(unsigned i
> } while (action);
>
> if (status & SA_SAMPLE_RANDOM) {
> - local_irq_enable();
> + hard_local_irq_enable();
> add_interrupt_randomness(irq);
> }
> - local_irq_disable();
> + hard_local_irq_disable();
>
> return retval;
> }
> @@ -157,6 +157,10 @@ fastcall notrace unsigned int __do_IRQ(u
> struct irqaction * action;
> unsigned int status;
>
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + unsigned long flags;
> + local_irq_save(flags);
> +#endif
> kstat_this_cpu.irqs[irq]++;
> if (desc->status & IRQ_PER_CPU) {
> irqreturn_t action_ret;
> @@ -241,6 +245,9 @@ out:
> out_no_end:
> spin_unlock(&desc->lock);
>
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + local_irq_restore(flags);
> +#endif
> return 1;
> }
>
> Index: linux-2.6.11/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/irq/manage.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/irq/manage.c 2005-06-08 00:55:05.000000000 +0000
> @@ -59,13 +59,15 @@ void disable_irq_nosync(unsigned int irq
> {
> irq_desc_t *desc = irq_desc + irq;
> unsigned long flags;
> -
> - spin_lock_irqsave(&desc->lock, flags);
> +
> + _hard_local_irq_save(flags);
> + spin_lock(&desc->lock);
> if (!desc->depth++) {
> desc->status |= IRQ_DISABLED;
> desc->handler->disable(irq);
> }
> - spin_unlock_irqrestore(&desc->lock, flags);
> + spin_unlock(&desc->lock);
> + _hard_local_irq_restore(flags);
> }
>
> EXPORT_SYMBOL(disable_irq_nosync);
> @@ -108,7 +110,8 @@ void enable_irq(unsigned int irq)
> irq_desc_t *desc = irq_desc + irq;
> unsigned long flags;
>
> - spin_lock_irqsave(&desc->lock, flags);
> + _hard_local_irq_save(flags);
> + spin_lock(&desc->lock);
> switch (desc->depth) {
> case 0:
> WARN_ON(1);
> @@ -127,7 +130,8 @@ void enable_irq(unsigned int irq)
> default:
> desc->depth--;
> }
> - spin_unlock_irqrestore(&desc->lock, flags);
> + spin_unlock(&desc->lock);
> + _hard_local_irq_restore(flags);
> }
>
> EXPORT_SYMBOL(enable_irq);
> @@ -203,7 +207,8 @@ int setup_irq(unsigned int irq, struct i
> /*
> * The following block of code has to be executed atomically
> */
> - spin_lock_irqsave(&desc->lock,flags);
> + _hard_local_irq_save(flags);
> + spin_lock(&desc->lock);
> p = &desc->action;
> if ((old = *p) != NULL) {
> /* Can't share interrupts unless both agree to */
> @@ -236,7 +241,8 @@ int setup_irq(unsigned int irq, struct i
> else
> desc->handler->enable(irq);
> }
> - spin_unlock_irqrestore(&desc->lock,flags);
> + spin_unlock(&desc->lock);
> + _hard_local_irq_restore(flags);
>
> new->irq = irq;
> register_irq_proc(irq);
> @@ -270,7 +276,8 @@ void free_irq(unsigned int irq, void *de
> return;
>
> desc = irq_desc + irq;
> - spin_lock_irqsave(&desc->lock,flags);
> + _hard_local_irq_save(flags);
> + spin_lock(&desc->lock);
> p = &desc->action;
> for (;;) {
> struct irqaction * action = *p;
> @@ -292,7 +299,8 @@ void free_irq(unsigned int irq, void *de
> desc->handler->disable(irq);
> }
> recalculate_desc_flags(desc);
> - spin_unlock_irqrestore(&desc->lock,flags);
> + spin_unlock(&desc->lock);
> + _hard_local_irq_restore(flags);
> unregister_handler_proc(irq, action);
>
> /* Make sure it's not being used on another CPU */
> @@ -301,7 +309,8 @@ void free_irq(unsigned int irq, void *de
> return;
> }
> printk(KERN_ERR "Trying to free free IRQ%d\n",irq);
> - spin_unlock_irqrestore(&desc->lock,flags);
> + spin_unlock(&desc->lock);
> + _hard_local_irq_restore(flags);
> return;
> }
> }
> @@ -409,7 +418,7 @@ static void do_hardirq(struct irq_desc *
> struct irqaction * action;
> unsigned int irq = desc - irq_desc;
>
> - local_irq_disable();
> + hard_local_irq_disable();
>
> if (desc->status & IRQ_INPROGRESS) {
> action = desc->action;
> @@ -420,9 +429,10 @@ static void do_hardirq(struct irq_desc *
> if (action) {
> spin_unlock(&desc->lock);
> action_ret = handle_IRQ_event(irq, NULL,action);
> - local_irq_enable();
> + hard_local_irq_enable();
> cond_resched_all();
> - spin_lock_irq(&desc->lock);
> + hard_local_irq_disable();
> + spin_lock(&desc->lock);
> }
> if (!noirqdebug)
> note_interrupt(irq, desc, action_ret);
> @@ -438,7 +448,7 @@ static void do_hardirq(struct irq_desc *
> desc->handler->end(irq);
> spin_unlock(&desc->lock);
> }
> - local_irq_enable();
> + hard_local_irq_enable();
> if (waitqueue_active(&desc->wait_for_handler))
> wake_up(&desc->wait_for_handler);
> }
> @@ -474,7 +484,7 @@ static int do_irqd(void * __desc)
> do_hardirq(desc);
> cond_resched_all();
> __do_softirq();
> - local_irq_enable();
> + hard_local_irq_enable();
> #ifdef CONFIG_SMP
> /*
> * Did IRQ affinities change?
> Index: linux-2.6.11/kernel/irqs-off.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/irqs-off.c 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.11/kernel/irqs-off.c 2005-06-08 01:05:41.000000000 +0000
> @@ -0,0 +1,99 @@
> +/*
> + * kernel/irqs-off.c
> + *
> + * IRQ soft state management
> + *
> + * Author: Daniel Walker <dwalker@mvista.com>
> + *
> + * 2005 (c) MontaVista Software, Inc. This file is licensed under
> + * the terms of the GNU General Public License version 2. This program
> + * is licensed "as is" without any warranty of any kind, whether express
> + * or implied.
> + */
> +
> +#include <linux/hardirq.h>
> +#include <linux/preempt.h>
> +#include <linux/kallsyms.h>
> +
> +#include <linux/module.h>
> +#include <asm/system.h>
> +
> +static int irq_trace;
> +
> +void irq_trace_enable(void) { irq_trace = 1; }
> +void irq_trace_disable(void) { irq_trace = 0; }
> +
> +unsigned int ___local_save_flags(void)
> +{
> + return irqs_off();
> +}
> +EXPORT_SYMBOL(___local_save_flags);
> +
> +void local_irq_enable_noresched(void)
> +{
> + if (irq_trace) {
> + current->last_irq_enable[0] = (unsigned long)__builtin_return_address(0);
> + //current->last_irq_enable[1] = (unsigned long)__builtin_return_address(1);
> + }
> +
> + if (irqs_off()) sub_preempt_count(IRQSOFF_OFFSET);
> +}
> +EXPORT_SYMBOL(local_irq_enable_noresched);
> +
> +void local_irq_enable(void)
> +{
> + if (irq_trace) {
> + current->last_irq_enable[0] = (unsigned long)__builtin_return_address(0);
> + //current->last_irq_enable[1] = (unsigned long)__builtin_return_address(1);
> + }
> + if (irqs_off()) sub_preempt_count(IRQSOFF_OFFSET);
> +
> + //local_irq_enable_noresched();
> + preempt_check_resched();
> +}
> +EXPORT_SYMBOL(local_irq_enable);
> +
> +void local_irq_disable(void)
> +{
> + if (irq_trace) {
> + current->last_irq_disable[0] = (unsigned long)__builtin_return_address(0);
> + //current->last_irq_disable[1] = (unsigned long)__builtin_return_address(1);
> + }
> + if (!irqs_off()) add_preempt_count(IRQSOFF_OFFSET);
> +}
> +EXPORT_SYMBOL(local_irq_disable);
> +
> +unsigned long irqs_disabled_flags(unsigned long flags)
> +{
> + return (flags & IRQSOFF_MASK);
> +}
> +EXPORT_SYMBOL(irqs_disabled_flags);
> +
> +void local_irq_restore(unsigned long flags)
> +{
> + if (!irqs_disabled_flags(flags)) local_irq_enable();
> +}
> +EXPORT_SYMBOL(local_irq_restore);
> +
> +unsigned long irqs_disabled(void)
> +{
> + return irqs_off();
> +}
> +EXPORT_SYMBOL(irqs_disabled);
> +
> +void print_irq_traces(struct task_struct *task)
> +{
> + printk("Soft state access: (%s)\n", (hard_irqs_disabled()) ? "Hard disabled" : "Not disabled");
> + printk(".. [<%08lx>] .... ", task->last_irq_disable[0]);
> + print_symbol("%s\n", task->last_irq_disable[0]);
> + printk(".....[<%08lx>] .. ( <= ",
> + task->last_irq_disable[1]);
> + print_symbol("%s)\n", task->last_irq_disable[1]);
> +
> + printk(".. [<%08lx>] .... ", task->last_irq_enable[0]);
> + print_symbol("%s\n", task->last_irq_enable[0]);
> + printk(".....[<%08lx>] .. ( <= ",
> + task->last_irq_enable[1]);
> + print_symbol("%s)\n", task->last_irq_enable[1]);
> + printk("\n");
> +}
> Index: linux-2.6.11/kernel/latency.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/latency.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/latency.c 2005-06-08 01:05:34.000000000 +0000
> @@ -108,12 +108,15 @@ enum trace_flag_type
> TRACE_FLAG_NEED_RESCHED = 0x02,
> TRACE_FLAG_HARDIRQ = 0x04,
> TRACE_FLAG_SOFTIRQ = 0x08,
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + TRACE_FLAG_IRQS_HARD_OFF = 0x16,
> +#endif
> };
>
>
> #ifdef CONFIG_LATENCY_TRACE
>
> -#define MAX_TRACE (unsigned long)(4096-1)
> +#define MAX_TRACE (unsigned long)(8192-1)
>
> #define CMDLINE_BYTES 16
>
> @@ -263,6 +266,9 @@ ____trace(int cpu, enum trace_type type,
> entry->cpu = cpu;
> #endif
> entry->flags = (irqs_disabled() ? TRACE_FLAG_IRQS_OFF : 0) |
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + (hard_irqs_disabled() ? TRACE_FLAG_IRQS_HARD_OFF : 0)|
> +#endif
> ((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
> ((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
> (_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0);
> @@ -724,7 +730,12 @@ print_generic(struct seq_file *m, struct
> seq_printf(m, "%8.8s-%-5d ", pid_to_cmdline(entry->pid), entry->pid);
> seq_printf(m, "%d", entry->cpu);
> seq_printf(m, "%c%c",
> - (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' : '.',
> + (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + (entry->flags & TRACE_FLAG_IRQS_HARD_OFF) ? 'D' : '.',
> +#else
> + '.',
> +#endif
> (entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'n' : '.');
>
> hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
> @@ -1212,9 +1223,9 @@ void notrace trace_irqs_off_lowlevel(voi
> {
> unsigned long flags;
>
> - local_save_flags(flags);
> + hard_local_save_flags(flags);
>
> - if (!irqs_off_preempt_count() && irqs_disabled_flags(flags))
> + if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
> __start_critical_timing(CALLER_ADDR0, 0);
> }
>
> @@ -1222,9 +1233,9 @@ void notrace trace_irqs_off(void)
> {
> unsigned long flags;
>
> - local_save_flags(flags);
> + hard_local_save_flags(flags);
>
> - if (!irqs_off_preempt_count() && irqs_disabled_flags(flags))
> + if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
> __start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> }
>
> @@ -1234,9 +1245,9 @@ void notrace trace_irqs_on(void)
> {
> unsigned long flags;
>
> - local_save_flags(flags);
> + hard_local_save_flags(flags);
>
> - if (!irqs_off_preempt_count() && irqs_disabled_flags(flags))
> + if (!irqs_off_preempt_count() && hard_irqs_disabled_flags(flags))
> __stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> }
>
> @@ -1633,8 +1644,13 @@ static void print_entry(struct trace_ent
> printk("%-5d ", entry->pid);
>
> printk("%c%c",
> - (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' : '.',
> - (entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'n' : '.');
> + (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + (entry->flags & TRACE_FLAG_IRQS_HARD_OFF) ? 'D' : '.',
> +#else
> + '.',
> +#endif
> + (entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'n' : '.');
>
> hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
> softirq = entry->flags & TRACE_FLAG_SOFTIRQ;
> Index: linux-2.6.11/kernel/printk.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/printk.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/printk.c 2005-06-08 00:35:30.000000000 +0000
> @@ -529,7 +529,8 @@ asmlinkage int vprintk(const char *fmt,
> zap_locks();
>
> /* This stops the holder of console_sem just where we want him */
> - spin_lock_irqsave(&logbuf_lock, flags);
> + local_irq_save(flags);
> + spin_lock(&logbuf_lock);
>
> /* Emit the output into the temporary buffer */
> printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
> @@ -599,16 +600,18 @@ asmlinkage int vprintk(const char *fmt,
> * CPU until it is officially up. We shouldn't be calling into
> * random console drivers on a CPU which doesn't exist yet..
> */
> - spin_unlock_irqrestore(&logbuf_lock, flags);
> + spin_unlock(&logbuf_lock);
> + local_irq_restore(flags);
> goto out;
> }
> - if (!down_trylock(&console_sem)) {
> + if (!in_interrupt() && !down_trylock(&console_sem)) {
> console_locked = 1;
> /*
> * We own the drivers. We can drop the spinlock and let
> * release_console_sem() print the text
> */
> - spin_unlock_irqrestore(&logbuf_lock, flags);
> + spin_unlock(&logbuf_lock);
> + local_irq_restore(flags);
> console_may_schedule = 0;
> release_console_sem();
> } else {
> @@ -617,7 +620,8 @@ asmlinkage int vprintk(const char *fmt,
> * allows the semaphore holder to proceed and to call the
> * console drivers with the output which we just produced.
> */
> - spin_unlock_irqrestore(&logbuf_lock, flags);
> + spin_unlock(&logbuf_lock);
> + local_irq_restore(flags);
> }
> out:
> return printed_len;
> @@ -750,7 +754,7 @@ void release_console_sem(void)
> * case only.
> */
> #ifdef CONFIG_PREEMPT_RT
> - if (!in_atomic() && !irqs_disabled())
> + if (!in_atomic() && !irqs_disabled() && !hard_irqs_disabled())
> #endif
> if (wake_klogd && !oops_in_progress && waitqueue_active(&log_wait))
> wake_up_interruptible(&log_wait);
> Index: linux-2.6.11/kernel/sched.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/sched.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/sched.c 2005-06-08 06:06:37.000000000 +0000
> @@ -307,11 +307,12 @@ static inline runqueue_t *task_rq_lock(t
> struct runqueue *rq;
>
> repeat_lock_task:
> - local_irq_save(*flags);
> + hard_local_irq_save(*flags);
> rq = task_rq(p);
> spin_lock(&rq->lock);
> if (unlikely(rq != task_rq(p))) {
> - spin_unlock_irqrestore(&rq->lock, *flags);
> + spin_unlock(&rq->lock);
> + hard_local_irq_restore(*flags);
> goto repeat_lock_task;
> }
> return rq;
> @@ -320,7 +321,8 @@ repeat_lock_task:
> static inline void task_rq_unlock(runqueue_t *rq, unsigned long *flags)
> __releases(rq->lock)
> {
> - spin_unlock_irqrestore(&rq->lock, *flags);
> + spin_unlock(&rq->lock);
> + hard_local_irq_restore(*flags);
> }
>
> #ifdef CONFIG_SCHEDSTATS
> @@ -426,7 +428,7 @@ static inline runqueue_t *this_rq_lock(v
> {
> runqueue_t *rq;
>
> - local_irq_disable();
> + hard_local_irq_disable();
> rq = this_rq();
> spin_lock(&rq->lock);
>
> @@ -1213,9 +1215,10 @@ out:
> */
> if (_need_resched() && !irqs_disabled_flags(flags) && !preempt_count())
> preempt_schedule_irq();
> - local_irq_restore(flags);
> + hard_local_irq_restore(flags);
> #else
> - spin_unlock_irqrestore(&rq->lock, flags);
> + spin_unlock(&rq->lock);
> + hard_local_irq_restore(flags);
> #endif
> /* no need to check for preempt here - we just handled it */
>
> @@ -1289,7 +1292,7 @@ void fastcall sched_fork(task_t *p)
> * total amount of pending timeslices in the system doesn't change,
> * resulting in more scheduling fairness.
> */
> - local_irq_disable();
> + hard_local_irq_disable();
> p->time_slice = (current->time_slice + 1) >> 1;
> /*
> * The remainder of the first timeslice might be recovered by
> @@ -1307,10 +1310,10 @@ void fastcall sched_fork(task_t *p)
> current->time_slice = 1;
> preempt_disable();
> scheduler_tick();
> - local_irq_enable();
> + hard_local_irq_enable();
> preempt_enable();
> } else
> - local_irq_enable();
> + hard_local_irq_enable();
> }
>
> /*
> @@ -1496,7 +1499,7 @@ asmlinkage void schedule_tail(task_t *pr
> preempt_disable(); // TODO: move this to fork setup
> finish_task_switch(prev);
> __preempt_enable_no_resched();
> - local_irq_enable();
> + hard_local_irq_enable();
> preempt_check_resched();
>
> if (current->set_child_tid)
> @@ -2623,7 +2626,7 @@ void scheduler_tick(void)
> task_t *p = current;
> unsigned long long now = sched_clock();
>
> - BUG_ON(!irqs_disabled());
> + BUG_ON(!hard_irqs_disabled());
>
> update_cpu_clock(p, rq, now);
>
> @@ -2938,7 +2941,8 @@ void __sched __schedule(void)
> run_time /= (CURRENT_BONUS(prev) ? : 1);
>
> cpu = smp_processor_id();
> - spin_lock_irq(&rq->lock);
> + hard_local_irq_disable();
> + spin_lock(&rq->lock);
>
> switch_count = &prev->nvcsw; // TODO: temporary - to see it in vmstat
> if ((prev->state & ~TASK_RUNNING_MUTEX) &&
> @@ -3078,7 +3082,7 @@ asmlinkage void __sched schedule(void)
> /*
> * Test if we have interrupts disabled.
> */
> - if (unlikely(irqs_disabled())) {
> + if (unlikely(irqs_disabled() || hard_irqs_disabled())) {
> stop_trace();
> printk(KERN_ERR "BUG: scheduling with irqs disabled: "
> "%s/0x%08x/%d\n",
> @@ -3096,7 +3100,7 @@ asmlinkage void __sched schedule(void)
> do {
> __schedule();
> } while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
> - local_irq_enable(); // TODO: do sti; ret
> + hard_local_irq_enable(); // TODO: do sti; ret
> }
>
> EXPORT_SYMBOL(schedule);
> @@ -3166,11 +3170,11 @@ asmlinkage void __sched preempt_schedule
> * If there is a non-zero preempt_count or interrupts are disabled,
> * we do not want to preempt the current task. Just return..
> */
> - if (unlikely(ti->preempt_count || irqs_disabled()))
> + if (unlikely(ti->preempt_count || irqs_disabled() || hard_irqs_disabled()))
> return;
>
> need_resched:
> - local_irq_disable();
> + hard_local_irq_disable();
> add_preempt_count(PREEMPT_ACTIVE);
> /*
> * We keep the big kernel semaphore locked, but we
> @@ -3189,7 +3193,7 @@ need_resched:
> barrier();
> if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
> goto need_resched;
> - local_irq_enable();
> + hard_local_irq_enable();
> }
>
> EXPORT_SYMBOL(preempt_schedule);
> @@ -3217,6 +3221,9 @@ asmlinkage void __sched preempt_schedule
> return;
>
> need_resched:
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + hard_local_irq_disable();
> +#endif
> add_preempt_count(PREEMPT_ACTIVE);
> /*
> * We keep the big kernel semaphore locked, but we
> @@ -3228,7 +3235,12 @@ need_resched:
> task->lock_depth = -1;
> #endif
> __schedule();
> - local_irq_disable();
> +
> + _hard_local_irq_disable();
> +#ifdef CONFIG_RT_IRQ_DISABLE
> + local_irq_enable_noresched();
> +#endif
> +
> #ifdef CONFIG_PREEMPT_BKL
> task->lock_depth = saved_lock_depth;
> #endif
> @@ -4160,7 +4172,7 @@ asmlinkage long sys_sched_yield(void)
> __preempt_enable_no_resched();
>
> __schedule();
> - local_irq_enable();
> + hard_local_irq_enable();
> preempt_check_resched();
>
> return 0;
> @@ -4173,11 +4185,11 @@ static void __cond_resched(void)
> if (preempt_count() & PREEMPT_ACTIVE)
> return;
> do {
> - local_irq_disable();
> + hard_local_irq_disable();
> add_preempt_count(PREEMPT_ACTIVE);
> __schedule();
> } while (need_resched());
> - local_irq_enable();
> + hard_local_irq_enable();
> }
>
> int __sched cond_resched(void)
> Index: linux-2.6.11/kernel/timer.c
> ===================================================================
> --- linux-2.6.11.orig/kernel/timer.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/kernel/timer.c 2005-06-08 01:09:15.000000000 +0000
> @@ -437,9 +437,10 @@ static int cascade(tvec_base_t *base, tv
> static inline void __run_timers(tvec_base_t *base)
> {
> struct timer_list *timer;
> + unsigned long jiffies_sample = jiffies;
>
> spin_lock_irq(&base->lock);
> - while (time_after_eq(jiffies, base->timer_jiffies)) {
> + while (time_after_eq(jiffies_sample, base->timer_jiffies)) {
> struct list_head work_list = LIST_HEAD_INIT(work_list);
> struct list_head *head = &work_list;
> int index = base->timer_jiffies & TVR_MASK;
> Index: linux-2.6.11/lib/Kconfig.RT
> ===================================================================
> --- linux-2.6.11.orig/lib/Kconfig.RT 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/lib/Kconfig.RT 2005-06-08 06:56:35.000000000 +0000
> @@ -81,6 +81,25 @@ config PREEMPT_RT
>
> endchoice
>
> +config RT_IRQ_DISABLE
> + bool "Real-Time IRQ Disable"
> + default y
> + depends on PREEMPT_RT
> + help
> + This option will remove all local_irq_enable() and
> + local_irq_disable() calls and replace them with soft
> + versions. This will decrease the frequency with which
> + interrupts are disabled.
> +
> + All interrupts that are flagged with SA_NODELAY are
> + considered hard interrupts. This option will force
> + SA_NODELAY interrupts to run even when they normally
> + wouldn't be enabled.
> +
> + Select this if you plan to use Linux in an
> + embedded environment that needs low interrupt
> + latency.
> +
> config PREEMPT
> bool
> default y
> Index: linux-2.6.11/lib/kernel_lock.c
> ===================================================================
> --- linux-2.6.11.orig/lib/kernel_lock.c 2005-06-08 00:33:21.000000000 +0000
> +++ linux-2.6.11/lib/kernel_lock.c 2005-06-08 00:35:30.000000000 +0000
> @@ -98,7 +98,8 @@ int __lockfunc __reacquire_kernel_lock(v
> struct task_struct *task = current;
> int saved_lock_depth = task->lock_depth;
>
> - local_irq_enable();
> + hard_local_irq_enable();
> +
> BUG_ON(saved_lock_depth < 0);
>
> task->lock_depth = -1;
> @@ -107,8 +108,8 @@ int __lockfunc __reacquire_kernel_lock(v
>
> task->lock_depth = saved_lock_depth;
>
> - local_irq_disable();
> -
> + hard_local_irq_disable();
> +
> return 0;
> }
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-10 23:37 ` Esben Nielsen
@ 2005-06-11 0:20 ` Daniel Walker
2005-06-11 13:13 ` Esben Nielsen
0 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 0:20 UTC (permalink / raw)
To: Esben Nielsen; +Cc: linux-kernel, mingo, sdietrich
On Sat, 2005-06-11 at 01:37 +0200, Esben Nielsen wrote:
> I am sorry, Daniel, but this patch doesn't make much sense to me.
> As far as I can see you effectively turned local_irq_disable() into a
> preempt_disable(). I.e. it gives no improvement to task latency. As all
> interrupts are threaded it will not improve irq-latency either....
It does turn local_irq_disable() into a preempt_disable(). My
definition of IRQ latency is the longest period that interrupts are
disabled. So my patch does reduce interrupt latency.
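The mapping Daniel describes can be modeled in plain C outside the kernel. This is a sketch only: the bit position is illustrative (the real patch defines IRQSOFF_SHIFT next to the HARDIRQ bits in hardirq.h), the per-task preempt count is collapsed to one global, and the `soft_` prefix is mine to avoid suggesting these are the patch's exact function bodies:

```c
#include <assert.h>

/* Userspace model of the patch's soft interrupt-disable state: "irqs
 * off" becomes a reserved range of the preemption counter, so
 * local_irq_disable() turns into a preempt-count operation instead
 * of a cli/sti instruction. */
#define IRQSOFF_SHIFT  24
#define IRQSOFF_OFFSET (1UL << IRQSOFF_SHIFT)
#define IRQSOFF_MASK   IRQSOFF_OFFSET

static unsigned long preempt_count; /* per-task in the kernel */

static unsigned long irqs_off(void)
{
    return preempt_count & IRQSOFF_MASK;
}

static void soft_local_irq_disable(void)
{
    if (!irqs_off())                     /* guard: set the bit only once */
        preempt_count += IRQSOFF_OFFSET; /* implicitly disables preemption */
}

static void soft_local_irq_enable(void)
{
    if (irqs_off())
        preempt_count -= IRQSOFF_OFFSET;
    /* the kernel version follows up with preempt_check_resched() */
}
```

Because of the guard, nested soft disables don't accumulate, which matches the `if (!irqs_off())` test in the patch's kernel/irqs-off.c.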
> I hope it is just me who has misunderstood the patch, but as far as I see
> it
> task->preempt_count
> is non-zero in a local_irq_disable() region. That means preempt_schedule()
> won't schedule.
True.
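With the soft-disable state folded into preempt_count, the existing early-return in preempt_schedule() covers it automatically, which is exactly why a soft-disabled region behaves like a preempt-disabled one. A minimal model of that gate (not the kernel code; the bit value is the same illustrative one as above):

```c
#include <assert.h>
#include <stdbool.h>

#define IRQSOFF_OFFSET (1UL << 24) /* illustrative soft irqs-off bit */

static unsigned long preempt_count;
static bool scheduled;

/* preempt_schedule() bails out on any nonzero preempt_count, and the
 * soft irqs-off state now contributes to that count. */
static void model_preempt_schedule(void)
{
    if (preempt_count)
        return;          /* inside a soft irq-disable region: no preemption */
    scheduled = true;    /* stand-in for calling __schedule() */
}
```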
> What is the problem you wanted to fix in the first place?
> Drivers and subsystems using local_irq_disable()/enable() as a lock -
> which is valid for !PREEMPT_RT to protect per-cpu variables but not a good
> idea when we want deterministic realtime. Thus these regions need to be
> made preemptible, so that any RT events can come through even though you have
> entered a non-RT subsystem where local_irq_disable()/enable() haven't
> been removed.
My patch gives a fixed upper bound on interrupt latency that doesn't
change depending on config. You can also measure all these disable
sections. You're missing the relevance of the SA_NODELAY flag: if you
have a specific interrupt that you want to be low latency, you flag it
SA_NODELAY.
> As far as I can see the only solution is to replace them with a per-cpu
> mutex. Such a mutex can be the rt_mutex for now, but someone may want to
> make a more optimized per-cpu version where a raw_spin_lock isn't used.
> That would make it nearly as cheap as cli()/sti() when there is no
> congestion. One doesn't need PI for this region either as the RT
> subsystems will not hit it anyway.
I don't like this solution mainly because it's so expensive. cli/sti may
take a few cycles at most; what you're suggesting may take 50 times that,
which would be similar in speed to putting Linux under Adeos. Plus, take
into account that the average interrupt-disable section is very small. I
also think it's possible to extend my version to allow those sections to
be preemptible while keeping the cost equally low.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 0:20 ` Daniel Walker
@ 2005-06-11 13:13 ` Esben Nielsen
2005-06-11 13:46 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 86+ messages in thread
From: Esben Nielsen @ 2005-06-11 13:13 UTC (permalink / raw)
To: Daniel Walker; +Cc: linux-kernel, mingo, sdietrich
On Fri, 10 Jun 2005, Daniel Walker wrote:
> On Sat, 2005-06-11 at 01:37 +0200, Esben Nielsen wrote:
> [...]
> > As far as I can see the only solution is to replace them with a per-cpu
> > mutex. Such a mutex can be the rt_mutex for now, but someone may want to
> > make a more optimized per-cpu version where a raw_spin_lock isn't used.
> > That would make it nearly as cheap as cli()/sti() when there is no
> > congestion. One doesn't need PI for this region either as the RT
> > subsystems will not hit it anyway.
>
> I don't like this solution mainly because it's so expensive. cli/sti may
> take a few cycles at most; what you're suggesting may take 50 times that,
> which would be similar in speed to putting Linux under Adeos..
We are only talking about the local_irq_disable()/enable() in drivers, not
the core system, right? Therefore making it into a mutex will not be that
expensive overall.
> Plus take into
> account that the average interrupt-disable section is very small .. I
> also think it's possible to extend my version to allow those sections to
> be preemptible while keeping the cost equally low.
>
The more I think about it the more dangerous I think it is. What does
local_irq_disable() protect against? All local threads as well as
irq-handlers. If these sections are kept mutually exclusive but preemptible
we will not have protected against an irq-handler.
I will start to play around with the following:
1) Make local_irq_disable() stop compiling to see how many we are really
talking about.
2) Make local_cpu_lock, which on PREEMPT_RT is an rt_mutex and on
!PREEMPT_RT turns into local_irq_disable()/enable() pairs. Introducing
this will demand some code analysis for each case, but I am afraid there
is no general one-size-fits-all solution for all the places.
> Daniel
>
>
Esben
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 13:13 ` Esben Nielsen
@ 2005-06-11 13:46 ` Ingo Molnar
2005-06-11 14:32 ` Esben Nielsen
2005-06-11 16:19 ` Daniel Walker
2005-06-11 13:51 ` Ingo Molnar
2005-06-11 16:09 ` Daniel Walker
2 siblings, 2 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-11 13:46 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Daniel Walker, linux-kernel, sdietrich
* Esben Nielsen <simlo@phys.au.dk> wrote:
> > Plus take into
> > account that the average interrupt-disable section is very small .. I
> > also think it's possible to extend my version to allow those sections to
> > be preemptible while keeping the cost equally low.
> >
>
> The more I think about it the more dangerous I think it is. What does
> local_irq_disable() protect against? All local threads as well as
> irq-handlers. If these sections are kept mutually exclusive but preemptible
> we will not have protected against an irq-handler.
one way to make it safe/reviewable is to runtime-warn if
local_irq_disable() is called from a !preempt_count() section. But this
will uncover quite some code. There's code in the VM, in the
buffer-cache, in the RCU code, etc. that uses per-CPU data structures
and assumes the non-preemptability of local_irq_disable().
> I will start to play around with the following:
> 1) Make local_irq_disable() stop compiling to see how many we are really
> talking about.
there are roughly 100 places:
$ objdump -d vmlinux | grep -w call |
grep -wE 'local_irq_disable|local_irq_save' | wc -l
116
the advantage of having such primitives as out-of-line function calls :)
> 2) Make local_cpu_lock, which on PREEMPT_RT is a rt_mutex and on
> !PREEMPT_RT turns into local_irq_disable()/enable() pairs. To introduce
> this will demand some code-analyzing for each case but I am afraid there
> is no general one-size solution to all the places.
I'm not sure we'd gain much from this. Let's assume we have a high-prio RT
task that is waiting for an external event. Will it be able to preempt
the IRQ mutex? Yes. Will it be able to make any progress? No, because
it needs an IRQ thread to run to get the wakeup in the first place, and
the IRQ thread needs to take the IRQ mutex => serialization.
what seems better is to rewrite the per-CPU-local-irq-disable users to
make use of the DEFINE_PER_CPU_LOCKED/per_cpu_locked/get_cpu_lock
primitives, i.e. preemptible per-CPU data structures. In this case
these sections would be truly preemptible. I've done this for a couple
of cases already, where it was unavoidable for lock-dependency reasons.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 13:13 ` Esben Nielsen
2005-06-11 13:46 ` Ingo Molnar
@ 2005-06-11 13:51 ` Ingo Molnar
2005-06-11 15:00 ` Mika Penttilä
2005-06-11 16:28 ` Daniel Walker
2005-06-11 16:09 ` Daniel Walker
2 siblings, 2 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-11 13:51 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Daniel Walker, linux-kernel, sdietrich
i've done two more things in the latest patches:
- decoupled the 'soft IRQ flag' from the hard IRQ flag. There's
basically no need for the hard IRQ state to follow the soft IRQ state.
This makes the hard IRQ disable primitives a bit faster.
- for raw spinlocks i've reintroduced raw_local_irq primitives again.
This helped get rid of some grossness in sched.c, and the raw
spinlocks disable preemption anyway. It's also safer to just assume
that if a raw spinlock is used together with the IRQ flag that the
real IRQ flag has to be disabled.
these changes don't really impact scheduling/preemption behavior; they
are cleanup/robustization changes.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 13:46 ` Ingo Molnar
@ 2005-06-11 14:32 ` Esben Nielsen
2005-06-11 16:36 ` Daniel Walker
2005-06-11 16:41 ` Sven-Thorsten Dietrich
2005-06-11 16:19 ` Daniel Walker
1 sibling, 2 replies; 86+ messages in thread
From: Esben Nielsen @ 2005-06-11 14:32 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Daniel Walker, linux-kernel, sdietrich
On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> * Esben Nielsen <simlo@phys.au.dk> wrote:
>
> > > Plus take into
> > > account that the average interrupt-disable section is very small .. I
> > > also think it's possible to extend my version to allow those sections to
> > > be preemptible while keeping the cost equally low.
> > >
> >
> > The more I think about it the more dangerous I think it is. What does
> > local_irq_disable() protect against? All local threads as well as
> > irq-handlers. If these sections are kept mutually exclusive but preemptible
> > we will not have protected against an irq-handler.
>
> one way to make it safe/reviewable is to runtime warn if
> local_irq_disable() is called from a !preempt_count() section. But this
> will uncover quite some code.
> There's some code in the VM, in the
> buffer-cache, in the RCU code - etc. that uses per-CPU data structures
> and assumes non-preemptability of local_irq_disable().
>
For me it is perfectly ok if RCU code, buffer caches, etc. use
raw_local_irq_disable(). I consider that code to be "core" code.
> > I will start to play around with the following:
> > 1) Make local_irq_disable() stop compiling to see how many we are really
> > talking about.
>
> there are roughly 100 places:
>
> $ objdump -d vmlinux | grep -w call |
> grep -wE 'local_irq_disable|local_irq_save' | wc -l
> 116
>
> the advantage of having such primitives as out-of-line function calls :)
But many of those might be called from inline functions :-)
>
> > 2) Make local_cpu_lock, which on PREEMPT_RT is a rt_mutex and on
> > !PREEMPT_RT turns into local_irq_disable()/enable() pairs. To introduce
> > this will demand some code-analyzing for each case but I am afraid there
> > is no general one-size solution to all the places.
>
> I'm not sure we'd gain much from this. Lets assume we have a highprio RT
> task that is waiting for an external event. Will it be able to preempt
> the IRQ mutex? Yes. Will it be able to make any progress: no, because
> it needs an IRQ thread to run to get the wakeup in the first place, and
> the IRQ thread needs to take the IRQ mutex => serialization.
>
That is exactly my point: We can't make a per-cpu mutex to replace
local_irq_disable(). We have to make a real lock for each subsystem now
relying on local_irq_disable(). A global lock will not work. We could have
a temporary lock that all non-RT code can share, but that would be a hack
similar to the BKL.
The current soft-irq state only gives us better hard-irq latency but
nothing else. I think the runtime overhead and the complication of the
code are way too big for gaining only that.
What is the aim of PREEMPT_RT? Low irq-latency, low task latency or
deterministic task latency? The soft irq-state helps on the first but
harms the other two indirectly by introducing extra overhead. To be
honest I think that approach should be abandoned.
> what seems better is to rewrite the per-CPU-local-irq-disable users to
> make use of the DEFINE_PER_CPU_LOCKED/per_cpu_locked/get_cpu_lock
> primitives, i.e. preemptible per-CPU data structures. In this case
> these sections would be truly preemptible. I've done this for a couple
> of cases already, where it was unavoidable for lock-dependency reasons.
>
I'll continue that work then, but in a way where !PREEMPT_RT will turn it
back into local-irq-disable so that it won't hurt performance there.
I.e. I will try to make a macro system and turn references to
local_irq_disable() into these - or raw_local_irq_disable().
> Ingo
>
Esben
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 13:51 ` Ingo Molnar
@ 2005-06-11 15:00 ` Mika Penttilä
2005-06-11 16:45 ` Sven-Thorsten Dietrich
2005-06-11 16:28 ` Daniel Walker
1 sibling, 1 reply; 86+ messages in thread
From: Mika Penttilä @ 2005-06-11 15:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Esben Nielsen, Daniel Walker, linux-kernel, sdietrich
Ingo Molnar wrote:
>i've done two more things in the latest patches:
>
>- decoupled the 'soft IRQ flag' from the hard IRQ flag. There's
> basically no need for the hard IRQ state to follow the soft IRQ state.
> This makes the hard IRQ disable primitives a bit faster.
>
>- for raw spinlocks i've reintroduced raw_local_irq primitives again.
> This helped get rid of some grossness in sched.c, and the raw
> spinlocks disable preemption anyway. It's also safer to just assume
> that if a raw spinlock is used together with the IRQ flag that the
> real IRQ flag has to be disabled.
>
>these changes dont really impact scheduling/preemption behavior, they
>are cleanup/robustization changes.
>
> Ingo
With the soft IRQ flag, local_irq_disable() doesn't seem to protect
against soft interrupts (via SA_NODELAY interrupt -> invoke_softirq()).
Could this be a problem?
--Mika
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 13:13 ` Esben Nielsen
2005-06-11 13:46 ` Ingo Molnar
2005-06-11 13:51 ` Ingo Molnar
@ 2005-06-11 16:09 ` Daniel Walker
2005-06-11 16:31 ` Esben Nielsen
2 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 16:09 UTC (permalink / raw)
To: Esben Nielsen; +Cc: linux-kernel, mingo, sdietrich
On Sat, 11 Jun 2005, Esben Nielsen wrote:
> On Fri, 10 Jun 2005, Daniel Walker wrote:
>
> > On Sat, 2005-06-11 at 01:37 +0200, Esben Nielsen wrote:
> > [...]
> > > As far as I can see the only solution is to replace them with a per-cpu
> > > mutex. Such a mutex can be the rt_mutex for now, but someone may want to
> > > make a more optimized per-cpu version where a raw_spin_lock isn't used.
> > > That would make it nearly as cheap as cli()/sti() when there is no
> > > congestion. One doesn't need PI for this region either as the RT
> > > subsystems will not hit it anyway.
> >
> > I don't like this solution mainly because it's so expensive. cli/sti may
> > take a few cycles at most, what your suggesting may take 50 times that,
> > which would similar in speed to put linux under adeos..
>
> We are only talking about the local_irq_disable()/enable() in drivers, not
> the core system, right? Therefore making it into a mutex will not be that
> expensive overall.
No, the core system. We're talking about everything, including
raw_spinlock_t.
> The more I think about it the more dangerous I think it is. What does
> local_irq_disable() protect against? All local threads as well as
> irq-handlers. If these sections are kept mutually exclusive but preemptible
> we will not have protected against an irq-handler.
Which IRQ handlers are those? There is only one interrupt-context handler
that must be protected, if you enable PREEMPT_HARDIRQS.
> I will start to play around with the following:
> 1) Make local_irq_disable() stop compiling to see how many we are really
> talking about.
I did that in my release notes; my patch reduced the number of cli's in
my kernel to 30% of what was in PREEMPT_RT.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 13:46 ` Ingo Molnar
2005-06-11 14:32 ` Esben Nielsen
@ 2005-06-11 16:19 ` Daniel Walker
1 sibling, 0 replies; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 16:19 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Esben Nielsen, linux-kernel, sdietrich
On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> one way to make it safe/reviewable is to runtime warn if
> local_irq_disable() is called from a !preempt_count() section. But this
> will uncover quite some code. There's some code in the VM, in the
> buffer-cache, in the RCU code - etc. that uses per-CPU data structures
> and assumes non-preemptability of local_irq_disable().
Are you talking about making those sections preemptible? Or about a
safety problem in the current method?
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 13:51 ` Ingo Molnar
2005-06-11 15:00 ` Mika Penttilä
@ 2005-06-11 16:28 ` Daniel Walker
2005-06-11 16:46 ` Esben Nielsen
1 sibling, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 16:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Esben Nielsen, linux-kernel, sdietrich
On Sat, 11 Jun 2005, Ingo Molnar wrote:
> - for raw spinlocks i've reintroduced raw_local_irq primitives again.
> This helped get rid of some grossness in sched.c, and the raw
> spinlocks disable preemption anyway. It's also safer to just assume
> that if a raw spinlock is used together with the IRQ flag that the
> real IRQ flag has to be disabled.
I don't know about this one .. That grossness was there so people aren't
able to easily add new disable sections.
Could we add a new raw_raw_spinlock_t that really disables interrupts,
then investigate each one? There are really only two that need it: the
runqueue lock and the irq descriptor lock. If you add it back for all
raw types, you just add back more unneeded disable sections. A raw lock
only needs to disable interrupts if it's possible to enter that region
from interrupt context.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:09 ` Daniel Walker
@ 2005-06-11 16:31 ` Esben Nielsen
0 siblings, 0 replies; 86+ messages in thread
From: Esben Nielsen @ 2005-06-11 16:31 UTC (permalink / raw)
To: Daniel Walker; +Cc: linux-kernel, mingo, sdietrich
On Sat, 11 Jun 2005, Daniel Walker wrote:
>
>
> On Sat, 11 Jun 2005, Esben Nielsen wrote:
>
> > On Fri, 10 Jun 2005, Daniel Walker wrote:
> >
> > > On Sat, 2005-06-11 at 01:37 +0200, Esben Nielsen wrote:
> > > [...]
> > > > As far as I can see the only solution is to replace them with a per-cpu
> > > > mutex. Such a mutex can be the rt_mutex for now, but someone may want to
> > > > make a more optimized per-cpu version where a raw_spin_lock isn't used.
> > > > That would make it nearly as cheap as cli()/sti() when there is no
> > > > congestion. One doesn't need PI for this region either as the RT
> > > > subsystems will not hit it anyway.
> > >
> > > I don't like this solution mainly because it's so expensive. cli/sti may
> > > take a few cycles at most, what your suggesting may take 50 times that,
> > > which would similar in speed to put linux under adeos..
> >
> > We are only talking about the local_irq_disable()/enable() in drivers, not
> > the core system, right? Therefore making it into a mutex will not be that
> > expensive overall.
>
> No, core system . We're talking about everything, including
> raw_spinlock_t.
>
I think that is a really bad idea then. It only helps irq-latency; the
rest of the system will see lower performance. It should certainly not
be on by default with PREEMPT_RT. Such low latencies are rarely needed.
I think extremely low irq-latencies are better obtained with other
solutions closer to the sub-kernel approach, i.e. taking it completely
away from the scheduler so that the whole kernel - including the
scheduler, raw_spinlock, etc. - runs with irqs enabled.
Actually, I think there should be 3 "levels" of irq-macros:
local_irq_disable() should not compile under PREEMPT_RT (!!)
raw_local_irq_disable() should be used within the core kernel code.
hard_local_irq_disable() for the very-low-latency interrupt systems.
Under normal circumstances raw_local and hard_local should both refer to
the hardware, but raw_local can be made into a soft interrupt state
close to the sub-kernel approach, such that one or two very special
interrupts can come through.
Esben
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 14:32 ` Esben Nielsen
@ 2005-06-11 16:36 ` Daniel Walker
2005-06-11 17:26 ` Thomas Gleixner
2005-06-11 19:16 ` Ingo Molnar
2005-06-11 16:41 ` Sven-Thorsten Dietrich
1 sibling, 2 replies; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 16:36 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel, sdietrich
On Sat, 11 Jun 2005, Esben Nielsen wrote:
> For me it is perfectly ok if RCU code, buffer caches etc use
> raw_local_irq_disable(). I consider that code to be "core" code.
This distinction seems completely baseless to me. "Core" code doesn't
carry any weight. The question is: can the code be called from real
interrupt context? If not, then don't protect it.
>
> The current soft-irq state only gives us better hard-irq latency but
> nothing else. I think the runtime overhead and the complication of the
> code are way too big for gaining only that.
The interrupt-response gain is massive; check the Adeos vs. RT numbers.
They did one test that was just interrupt latency.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 14:32 ` Esben Nielsen
2005-06-11 16:36 ` Daniel Walker
@ 2005-06-11 16:41 ` Sven-Thorsten Dietrich
2005-06-11 17:16 ` Esben Nielsen
1 sibling, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-11 16:41 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
On Sat, 2005-06-11 at 16:32 +0200, Esben Nielsen wrote:
> > > The more I think about it the more dangerous I think it is. What does
> > > local_irq_disable() protect against? All local threads as well as
> > > irq-handlers. If these sections are kept mutually exclusive but preemptible
> > > we will not have protected against an irq-handler.
> >
The more dangerous you perceive it to be. Can you point to real damage?
After all, this is experimental code.
> That is exactly my point: We can't make a per-cpu mutex to replace
> local_irq_disable(). We have to make a real lock for each subsystem now
> relying on local_irq_disable(). A global lock will not work. We could have
> a temporary lock that all non-RT code can share, but that would be a hack
> similar to the BKL.
>
Why do we need any of this?
> The current soft-irq states only gives us better hard-irq latency but
> nothing else. I think the overhead runtime and the complication of the
> code is way too big for gaining only that.
Real numbers please, not speculation! Science, not religion.
> What is the aim of PREEMPT_RT? Low irq-latency, low task latency or
> deterministic task latency? The soft irq-state helps on the first but
> harms the other two indirectly by introducing extra overhead. To be
> honest I think that approach should be abandoned.
The purpose of the irq fix is deterministic interrupt response at a low
cost.
Quite honestly, if it does get abandoned in the community, Daniel and I
will continue to maintain it.
>
> > what seems a better is to rewrite per-CPU-local-irq-disable users to
> > make use of the DEFINE_PER_CPU_LOCKED/per_cpu_locked/get_cpu_lock
> > primitives to use preemptible per-CPU data structures. In this case
> > these sections would be truly preemptible. I've done this for a couple
> > of cases already, where it was unavoidable for lock-dependency reasons.
> >
>
> I'll continue that work then but in a way where !PREEMPT_RT will make it
> back into local-irq-disable such it wont hurt performance there.
>
> I.e. I will try to make a macro system, and try to turn references to
> local_irq_disable() into these - or raw_local_irq_disable().
Esben, are there any bugs you are finding, or do you just dislike the
overhead?
Your argument is identical to the one that people made (and still make)
about preemption in the first place.
A lot of issues were raised about the simple preempt-disable that was
necessary to make the kernel preemptible outside critical sections, even
though the concept was easy to understand from an SMP POV. A lot of FUD
is STILL floating around from that, 5 YEARS AGO.
A lot of discussion has taken place about the constant overhead
necessary to improve preemption down to the level where we are now, in
PREEMPT_RT (with or without the IRQ relief you are concerned about), and
the overhead of the mutex IS somewhat significant.
The people who need the preemption response time absolutely don't care
about the overhead, because the cost of the CPUs required to make
vanilla Linux (with its high preemption latency) perform at latencies as
low as RT is much higher than the cost of a middle-of-the-road upgrade
to mask the constant mutex overhead.
In other words, the people who need RT preemption are willing to accept
the overhead absolutely, because it still saves them money, time, power,
whatever.
Daniel has now gone another step, very similar to the first step in
preemption, but taking advantage of a new environment, IRQs in threads.
The price here isn't that high, and the gain is astronomical, in terms
of the confidence in the interrupt subsystem that can be established
with this.
In a similar context as the argument I made above, people who need this
level of performance, will be real happy that they can leverage it, and
they WILL accept the overhead.
Daniel made this configurable, which is conventional in add-on patches.
Ingo removed that, and I do not disagree with either approach. I am just
glad that some folks understand the value of this concept, which Daniel
and I have been digesting since we opensourced the Mutex concept in the
first place.
I would prefer REAL performance data on the overhead you are describing,
rather than pages and pages of speculation that make me question whether
you have fully digested what is being done here.
I will be producing those ASAP, but I am REALLY busy with people who are
excited about the new era in Linux development that we are
(collectively) introducing. We are not expecting everyone to understand
it or agree with it.
But this is scientific, and we do need scientific approaches to
analysis.
Thanks,
Sven
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 15:00 ` Mika Penttilä
@ 2005-06-11 16:45 ` Sven-Thorsten Dietrich
2005-06-11 16:53 ` Mika Penttilä
0 siblings, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-11 16:45 UTC (permalink / raw)
To: Mika Penttilä
Cc: Ingo Molnar, Esben Nielsen, Daniel Walker, linux-kernel
On Sat, 2005-06-11 at 18:00 +0300, Mika Penttilä wrote:
> Ingo Molnar wrote:
>
> >i've done two more things in the latest patches:
> >
> >- decoupled the 'soft IRQ flag' from the hard IRQ flag. There's
> > basically no need for the hard IRQ state to follow the soft IRQ state.
> > This makes the hard IRQ disable primitives a bit faster.
> >
> >- for raw spinlocks i've reintroduced raw_local_irq primitives again.
> > This helped get rid of some grossness in sched.c, and the raw
> > spinlocks disable preemption anyway. It's also safer to just assume
> > that if a raw spinlock is used together with the IRQ flag that the
> > real IRQ flag has to be disabled.
> >
> >these changes dont really impact scheduling/preemption behavior, they
> >are cleanup/robustization changes.
> >
> > Ingo
> With the soft IRQ flag local_irq_disable() doesn't seem to protect
> against soft interrupts (via SA_NODELAY interrupt-> invoke_softirq()).
> Could this be a problem?
Only if you run soft IRQs as SA_NODELAY, which is going to KILL all your
preemption gains with the first arriving network packet.
And that is if you don't get buried in "scheduling while atomic" printk
messages first.
SA_NODELAY is not generally allowed in PREEMPT_RT, except for code
designed to take advantage of the IRQ void that has been created.
This code must follow a new set of rules, which people who design RT
apps are really happy to accept; the alternatives (a subkernel or
ANOTHER OS (ugh)) would force worse compromises on them.
Sven
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:28 ` Daniel Walker
@ 2005-06-11 16:46 ` Esben Nielsen
0 siblings, 0 replies; 86+ messages in thread
From: Esben Nielsen @ 2005-06-11 16:46 UTC (permalink / raw)
To: Daniel Walker; +Cc: Ingo Molnar, linux-kernel, sdietrich
On Sat, 11 Jun 2005, Daniel Walker wrote:
>
>
> On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> > - for raw spinlocks i've reintroduced raw_local_irq primitives again.
> > This helped get rid of some grossness in sched.c, and the raw
> > spinlocks disable preemption anyway. It's also safer to just assume
> > that if a raw spinlock is used together with the IRQ flag that the
> > real IRQ flag has to be disabled.
>
> I don't know about this one .. That grossness was there so people aren't
> able to easily add new disable sections.
>
> Could we add a new raw_raw_spinlock_t that really disable interrupt , then
> investigate each one . There are really only two that need it, runqueue
> lock , and the irq descriptor lock . If you add it back for all raw types
> you just add back more un-needed disable sections. The only way a raw lock
> needs to disable interrupts is if it's possible to enter that region from
> interrupt context .
>
>
> Daniel
>
>
We must assume the !PREEMPT_RT writer never uses raw. If he starts to do
that, RT will be broken no matter what.
What is it you want to obtain anyway?
As far as I understand, it comes from the discussion about how
local_irq_disable() in random driver X, made for !PREEMPT_RT, can
destroy RT because the author used local_irq_disable() around a large,
non-deterministic section of the code.
That only has one solution:
Disallow local_irq_disable() when PREEMPT_RT is on and provide some easy
alternatives. Many of them could be turned into raw_local_irq_disable(),
others into regular locks.
If you want extremely low interrupt latencies, I say it is better to use
a sub-kernel - which might be a very, very simple interrupt dispatcher.
I think PREEMPT_RT should go for deterministic task latencies. Very low,
special-purpose interrupt latencies are a whole other issue which I
think should be postponed - and at least should be made optional.
Esben
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-08 7:08 [PATCH] local_irq_disable removal Daniel Walker
2005-06-08 11:21 ` Ingo Molnar
2005-06-10 23:37 ` Esben Nielsen
@ 2005-06-11 16:51 ` Christoph Hellwig
2005-06-11 22:44 ` Ed Tomlinson
2005-06-12 6:23 ` Ingo Molnar
2 siblings, 2 replies; 86+ messages in thread
From: Christoph Hellwig @ 2005-06-11 16:51 UTC (permalink / raw)
To: Daniel Walker; +Cc: linux-kernel, mingo, sdietrich
folks, can you please take this RT stuff off lkml? And with that I don't
mean the high-level discussions about what makes sense, but the specific
patches that aren't related to anything near mainline. Just create your
own list, like any other far-from-mainline project does.
Thanks.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:45 ` Sven-Thorsten Dietrich
@ 2005-06-11 16:53 ` Mika Penttilä
2005-06-11 17:13 ` Daniel Walker
0 siblings, 1 reply; 86+ messages in thread
From: Mika Penttilä @ 2005-06-11 16:53 UTC (permalink / raw)
To: Sven-Thorsten Dietrich
Cc: Ingo Molnar, Esben Nielsen, Daniel Walker, linux-kernel
Sven-Thorsten Dietrich wrote:
>On Sat, 2005-06-11 at 18:00 +0300, Mika Penttilä wrote:
>
>
>>Ingo Molnar wrote:
>>
>>
>>
>>>i've done two more things in the latest patches:
>>>
>>>- decoupled the 'soft IRQ flag' from the hard IRQ flag. There's
>>> basically no need for the hard IRQ state to follow the soft IRQ state.
>>> This makes the hard IRQ disable primitives a bit faster.
>>>
>>>- for raw spinlocks i've reintroduced raw_local_irq primitives again.
>>> This helped get rid of some grossness in sched.c, and the raw
>>> spinlocks disable preemption anyway. It's also safer to just assume
>>> that if a raw spinlock is used together with the IRQ flag that the
>>> real IRQ flag has to be disabled.
>>>
>>>these changes dont really impact scheduling/preemption behavior, they
>>>are cleanup/robustization changes.
>>>
>>> Ingo
>>With the soft IRQ flag local_irq_disable() doesn't seem to protect
>>against soft interrupts (via SA_NODELAY interrupt-> invoke_softirq()).
>>Could this be a problem?
>>
>>
>
>Only if you run SOFT IRQs as SA_NODELAY, which is going to KILL all your
>preemption gains with the first arriving network packet.
>
>And that is, if you don't get buried in "scheduling while atomic" printk
>messages first.
>
>SA_NODELAY is not generally allowed in PREEMPT_RT, except for code
>designed to take advantage of the IRQ void that has been created.
>
>This code must follow a new set of rules, which people who design RT
>apps are really happy to accept; they would have to accept worse compromises
>with the alternatives (a subkernel or ANOTHER OS (ugh)).
>
>Sven
>
>
>
>
>
The timer irq is run as NODELAY, so soft irqs are run against
local_irq_disable sections all the time.
--Mika
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:53 ` Mika Penttilä
@ 2005-06-11 17:13 ` Daniel Walker
2005-06-11 17:22 ` Mika Penttilä
0 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 17:13 UTC (permalink / raw)
To: Mika Penttilä
Cc: Sven-Thorsten Dietrich, Ingo Molnar, Esben Nielsen, linux-kernel
On Sat, 11 Jun 2005, Mika Penttilä wrote:
> The timer irq is run as NODELAY, so soft irqs are run against
> local_irq_disable sections all the time.
Softirqs are run in threads. The wake_up_process() path is protected,
and is used by the timer interrupt.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:41 ` Sven-Thorsten Dietrich
@ 2005-06-11 17:16 ` Esben Nielsen
2005-06-11 19:29 ` Sven-Thorsten Dietrich
2005-06-11 20:02 ` Sven-Thorsten Dietrich
0 siblings, 2 replies; 86+ messages in thread
From: Esben Nielsen @ 2005-06-11 17:16 UTC (permalink / raw)
To: Sven-Thorsten Dietrich; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
On Sat, 11 Jun 2005, Sven-Thorsten Dietrich wrote:
> On Sat, 2005-06-11 at 16:32 +0200, Esben Nielsen wrote:
>
>
> > > > The more I think about it the more dangerous I think it is. What does
> > > > local_irq_disable() protect against? All local threads as well as
> > > > irq-handlers. If these sections are kept mutually exclusive but preemptible
> > > > we will not have protected against an irq-handler.
> > >
>
> The more dangerous you perceive it to be. Can you point to real damage?
> After all, this is experimental code.
>
> > That is exactly my point: We can't make a per-cpu mutex to replace
> > local_irq_disable(). We have to make a real lock for each subsystem now
> > relying on local_irq_disable(). A global lock will not work. We could have
> > a temporary lock all non-RT code can share, but that would be a hack similar
> > to the BKL.
> >
>
> Why do we need any of this?
If we want deterministic task-latencies without worrying about
proof-reading the code of every subsystem which might be using
local_irq_disable() in some non-deterministic way, we need to make another
locking mechanism.
>
> > The current soft-irq state only gives us better hard-irq latency and
> > nothing else. I think the runtime overhead and the complication of the
> > code are way too big for gaining only that.
>
>
> Real numbers please, not speculation! Science, not religion.
>
Well, isn't it enough to see that the code contains more instructions and
looks somewhat more complicated?
> > What is the aim of PREEMPT_RT? Low irq-latency, low task latency or
> > deterministic task latency? The soft irq-state helps with the first but
> > harms the other two indirectly by introducing extra overhead. To be
> > honest I think that approach should be abandoned.
>
> The purpose of the irq fix is deterministic interrupt response at a low
> cost.
>
> Quite honestly, if it does get abandoned in the community, Daniel and I
> will continue to maintain it.
I think it should be in as an option. And it should not be marketed as
fixing the problem of widespread use of local_irq_disable().
>
> >
> > > what seems a better is to rewrite per-CPU-local-irq-disable users to
> > > make use of the DEFINE_PER_CPU_LOCKED/per_cpu_locked/get_cpu_lock
> > > primitives to use preemptible per-CPU data structures. In this case
> > > these sections would be truly preemptible. I've done this for a couple
> > > of cases already, where it was unavoidable for lock-dependency reasons.
> > >
> >
> > I'll continue that work then, but in a way where !PREEMPT_RT will make it
> > back into local-irq-disable such that it won't hurt performance there.
> >
> > I.e. I will try to make a macro system, and try to turn references to
> > local_irq_disable() into these - or raw_local_irq_disable().
>
> Esben, are there any bugs you are finding, or do you just dislike the
> overhead?
>
Both the overhead and the complication. You introduce yet another kind of
context into the system. I was seeing PREEMPT_RT as a cleanup: there is
basically _only_ the thread state.
> Your argument is identical to the one that people made (and still make)
> about preemption in the first place.
>
Yes it is. Preemption is expensive, and if you don't need it, avoid it.
This is basically the same thing - just on another level in the system.
Preemption turns 100 ms latencies into 10 ms latencies. PREEMPT_RT turns
them into 100 us. You can probably take interrupt latencies down to a few
us. That is good. But there is a price to pay in performance at each step.
I do not see any reason to go below 100 us in latencies. Others might want
to pay to get the last bit.
> A lot of issues were raised about simple preempt disable that was
> necessary to make the kernel preemptable outside critical sections with
> preemption, even though the concept was easy to understand from an SMP
> POV. A lot of FUD is STILL floating around from that, 5 YEARS AGO.
>
> A lot of discussion has been taking place about the constant overhead
> necessary to improve preemption down to the level where we are now, in
> PREEMPT_RT (with or without the IRQ relief you are concerned about)
>
> The overhead of the mutex, which IS somewhat significant.
>
> The people who need the preemption response-time absolutely don't care
> about the overhead, because the cost of the CPUs required to make
> vanilla Linux (with its high preemption latency) perform at latencies as
> low as RT, is much higher, than the cost of a middle-of-the-road-
> upgrade, to mask the constant mutex overhead.
>
> In other words, the people who need RT preemption are willing to accept
> the overhead absolutely, because it still saves them money, time, power,
> whatever.
>
> Daniel has now gone another step, very similar to the first step in
> preemption, but taking advantage of a new environment, IRQs in threads.
>
> The price here isn't that high, and the gain is astronomical, in terms
> of the confidence in the interrupt subsystem that can be established
> with this.
>
> In a similar context as the argument I made above, people who need this
> level of performance, will be real happy that they can leverage it, and
> they WILL accept the overhead.
>
> Daniel made this configurable, which is conventional in add-on patches.
> Ingo removed that, and I do not disagree with either approach. I am just
> glad that some folks understand the value of this concept, which Daniel
> and I have been digesting since we opensourced the Mutex concept in the
> first place.
>
I think Ingo made a wrong move there. It should remain optional, just as
the whole PREEMPT_RT and even PREEMPT are optional.
> I would prefer REAL performance data on the overhead you are suggesting,
> rather than pages and pages of speculation, that make me question if you
> have fully digested what is being done here.
>
> I will be producing those ASAP, but I am REALLY busy with people who are
> excited about the new era in Linux development that we are
> (collectively) introducing. We are not expecting everyone to understand
> it or agree with it.
>
> But this is scientific, and we do need scientific approaches to
> analysis.
>
Please, you are taking this too personally. I started by asking questions
because what I saw wasn't living up to my expectations (getting around
"random" local_irq_disable()) but "only" gave good irq-latencies, which,
to be frank, I have no interest in.
No, I don't understand _all_ the details of the patch, but I think I
understand the principles. From my chair it is just an unwanted
complication :-(
> Thanks,
>
> Sven
>
>
Esben
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 17:13 ` Daniel Walker
@ 2005-06-11 17:22 ` Mika Penttilä
2005-06-11 17:25 ` Daniel Walker
0 siblings, 1 reply; 86+ messages in thread
From: Mika Penttilä @ 2005-06-11 17:22 UTC (permalink / raw)
To: Daniel Walker
Cc: Sven-Thorsten Dietrich, Ingo Molnar, Esben Nielsen, linux-kernel
Daniel Walker wrote:
>On Sat, 11 Jun 2005, Mika Penttilä wrote:
>
>
>>The timer irq is run as NODELAY, so soft irqs are run against
>>local_irq_disable sections all the time.
>>
>>
>
>Softirqs are run in threads. The wake_up_process() path is protected,
>and is used by the timer interrupt.
>
>
>Daniel
>
>
>
>
Not with !softirq_preemption, then we take the immediate path and
process soft irqs on irq exit.
--Mika
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 17:22 ` Mika Penttilä
@ 2005-06-11 17:25 ` Daniel Walker
2005-06-11 17:29 ` Mika Penttilä
0 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 17:25 UTC (permalink / raw)
To: Mika Penttilä
Cc: Sven-Thorsten Dietrich, Ingo Molnar, Esben Nielsen, linux-kernel
On Sat, 11 Jun 2005, Mika Penttilä wrote:
> Not with !softirq_preemption, then we take the immediate path and
> process soft irqs on irq exit.
My changes aren't used in that case.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:36 ` Daniel Walker
@ 2005-06-11 17:26 ` Thomas Gleixner
2005-06-11 18:40 ` Sven-Thorsten Dietrich
2005-06-11 19:16 ` Ingo Molnar
1 sibling, 1 reply; 86+ messages in thread
From: Thomas Gleixner @ 2005-06-11 17:26 UTC (permalink / raw)
To: Daniel Walker; +Cc: Esben Nielsen, Ingo Molnar, linux-kernel, sdietrich
On Sat, 2005-06-11 at 09:36 -0700, Daniel Walker wrote:
>
> On Sat, 11 Jun 2005, Esben Nielsen wrote:
>
> > For me it is perfectly ok if RCU code, buffer caches etc use
> > raw_local_irq_disable(). I consider that code to be "core" code.
>
> This distinction seems completely baseless to me. Core code doesn't
> carry any weight. The question is: can the code be called from real
> interrupt context? If not, then don't protect it.
>
> >
> > The current soft-irq states only gives us better hard-irq latency but
> > nothing else. I think the overhead runtime and the complication of the
> > code is way too big for gaining only that.
>
> Interrupt response is massive; check the adeos vs. RT numbers. They did
> one test which was just interrupt latency.
Performance on RT systems is more than IRQ latencies.
The widespread misbelief that
"Realtime == As fast as possible"
seems to be stuck in people's minds still.
"Realtime == As fast as specified"
is the correct equation.
There is always a tradeoff between interrupt latencies and other
performance values, as you have to invent new mechanisms to protect
critical sections. In the end, they can be less effective than the gain
on irq latencies.
While working on high resolution timers on top of RT, I can prove that
changing a couple of short-held spinlocks into raw locks (with hardirq
disable) results in a 50-80% latency improvement for the scheduled tasks
but increases the interrupt latency by only 5-10%. The different
numbers are related to different CPUs (x86/PPC/ARM).
tglx
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 17:25 ` Daniel Walker
@ 2005-06-11 17:29 ` Mika Penttilä
2005-06-11 17:30 ` Daniel Walker
0 siblings, 1 reply; 86+ messages in thread
From: Mika Penttilä @ 2005-06-11 17:29 UTC (permalink / raw)
To: Daniel Walker
Cc: Sven-Thorsten Dietrich, Ingo Molnar, Esben Nielsen, linux-kernel
Daniel Walker wrote:
>On Sat, 11 Jun 2005, Mika Penttilä wrote:
>
>
>>Not with !softirq_preemption, then we take the immediate path and
>>process soft irqs on irq exit.
>>
>>
>
>My changes aren't used in that case.
>
>Daniel
>
>
>
>
ok, so maybe the safe way is to enforce threaded softirq with the soft
irq flag.
--Mika
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 17:29 ` Mika Penttilä
@ 2005-06-11 17:30 ` Daniel Walker
2005-06-11 17:55 ` Mika Penttilä
0 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 17:30 UTC (permalink / raw)
To: Mika Penttilä
Cc: Sven-Thorsten Dietrich, Ingo Molnar, Esben Nielsen, linux-kernel
On Sat, 11 Jun 2005, Mika Penttilä wrote:
> ok, so maybe the safe way is to enforce threaded softirq with the soft
> irq flag.
>
That's already handled; my changes are used only when you turn on
PREEMPT_RT, and PREEMPT_RT forces softirq threading.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 17:30 ` Daniel Walker
@ 2005-06-11 17:55 ` Mika Penttilä
0 siblings, 0 replies; 86+ messages in thread
From: Mika Penttilä @ 2005-06-11 17:55 UTC (permalink / raw)
To: Daniel Walker
Cc: Sven-Thorsten Dietrich, Ingo Molnar, Esben Nielsen, linux-kernel
Daniel Walker wrote:
>On Sat, 11 Jun 2005, Mika Penttilä wrote:
>
>
>
>>ok, so maybe the safe way is to enforce threaded softirq with the soft
>>irq flag.
>>
>>
>>
>
>That's already handled, my changes are used only when you turn on
>PREEMPT_RT , and PREEMPT_RT forces softirq threading.
>
>Daniel
>
>
>
>
Ok, so everything's fine :)
Thanks,
--Mika
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 17:26 ` Thomas Gleixner
@ 2005-06-11 18:40 ` Sven-Thorsten Dietrich
2005-06-12 0:07 ` Thomas Gleixner
0 siblings, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-11 18:40 UTC (permalink / raw)
To: tglx; +Cc: Daniel Walker, Esben Nielsen, Ingo Molnar, linux-kernel
On Sat, 2005-06-11 at 19:26 +0200, Thomas Gleixner wrote:
> On Sat, 2005-06-11 at 09:36 -0700, Daniel Walker wrote:
> >
> > On Sat, 11 Jun 2005, Esben Nielsen wrote:
> >
> > > For me it is perfectly ok if RCU code, buffer caches etc use
> > > raw_local_irq_disable(). I consider that code to be "core" code.
> >
> > This distinction seems completely baseless to me. Core code doesn't
> > carry any weight. The question is: can the code be called from real
> > interrupt context? If not, then don't protect it.
> >
> > >
> > > The current soft-irq states only gives us better hard-irq latency but
> > > nothing else. I think the overhead runtime and the complication of the
> > > code is way too big for gaining only that.
> >
> > Interrupt response is massive, check the adeos vs. RT numbers . They did
> > one test which was just interrupt latency.
>
> Performance on RT systems is more than IRQ latencies.
>
> The wide spread misbelief that
> "Realtime == As fast as possible"
>
> seems to be still stuck in peoples mind.
>
> "Realtime == As fast as specified"
> is the correct equation.
>
I think Daniel was referring to the deviations, but it is always good to
point that out.
> There is always a tradeoff between interrupt latencies and other
> performance values, as you have to invent new mechanisms to protect
> critical sections. In the end, they can be less effective than the gain
> on irq latencies.
>
Basically you are investing effort to maintain predictability.
In order to do that, you sometimes have to put a stitch in before the
deadline ("in time..."); the effort increases the work = overhead.
But if you look at the overall time the tasks are waiting, you can
implement optimal scheduling and maximize throughput.
This is too complex to argue about here.
For every example I can find you a corner case.
It depends on the application, and you need to decide how to configure
your kernel if it really matters to what you are doing with Linux.
We are merely working to provide alternatives that improve performance.
Sven
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:36 ` Daniel Walker
2005-06-11 17:26 ` Thomas Gleixner
@ 2005-06-11 19:16 ` Ingo Molnar
2005-06-11 19:34 ` Esben Nielsen
1 sibling, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2005-06-11 19:16 UTC (permalink / raw)
To: Daniel Walker; +Cc: Esben Nielsen, linux-kernel, sdietrich
* Daniel Walker <dwalker@mvista.com> wrote:
> > The current soft-irq states only gives us better hard-irq latency but
> > nothing else. I think the overhead runtime and the complication of the
> > code is way too big for gaining only that.
>
> Interrupt response is massive, check the adeos vs. RT numbers . They
> did one test which was just interrupt latency.
the jury is still out on the accuracy of those numbers. The test had
RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
mostly work with interrupts disabled. The other question is how were
interrupt response times measured.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 17:16 ` Esben Nielsen
@ 2005-06-11 19:29 ` Sven-Thorsten Dietrich
2005-06-11 20:02 ` Sven-Thorsten Dietrich
1 sibling, 0 replies; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-11 19:29 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
On Sat, 2005-06-11 at 19:16 +0200, Esben Nielsen wrote:
> On Sat, 11 Jun 2005, Sven-Thorsten Dietrich wrote:
>
> > On Sat, 2005-06-11 at 16:32 +0200, Esben Nielsen wrote:
> >
> > > > > The more I think about it the more dangerous I think it is. What does
> > > > > local_irq_disable() protect against? All local threads as well as
> > > > > irq-handlers. If these sections are kept mutually exclusive but preemptible
> > > > > we will not have protected against an irq-handler.
> > > >
> > The more dangerous you perceive it to be. Can you point to real damage?
> > After all, this is experimental code.
> >
> > > That is exactly my point: We can't make a per-cpu mutex to replace
> > > local_irq_disable(). We have to make real lock for each subsystem now
> > > relying on local_irq_disable(). A global lock will not work. We could have
> > > a temporary lock all non-RT can share but that would be a hack similar to
> > > BKL.
> > >
> >
> > Why do we need any of this?
>
> If we want deterministic task-latencies without worrying about
> proof-reading the code of every subsystem which might be using
> local_irq_disable() in some non-deterministic way, we need to make another
> locking mechanism.
Why would you want to encourage oddball approaches to driver
development?
If a driver is causing problems this way, it should be looked at from a
design perspective, because that sort of coding style usually causes
problems with SMP as well.
> >
> > > The current soft-irq states only gives us better hard-irq latency but
> > > nothing else. I think the overhead runtime and the complication of the
> > > code is way too big for gaining only that.
> >
> >
> > Real numbers please, not speculation! Science, not religion.
> >
> Well, isn't it enough to see that the code contains more instructions and
> looks somewhat more complicated?
>
It clarifies design aspects of the kernel, and identifies code that has
different behavior assumptions associated with it than other code that
disables IRQs. It's a good thing to separate that out, because then you
know what you are looking at, rather than having to assume it.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 19:16 ` Ingo Molnar
@ 2005-06-11 19:34 ` Esben Nielsen
2005-06-11 19:44 ` Sven-Thorsten Dietrich
2005-06-11 20:03 ` Ingo Molnar
0 siblings, 2 replies; 86+ messages in thread
From: Esben Nielsen @ 2005-06-11 19:34 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Daniel Walker, linux-kernel, sdietrich
On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> * Daniel Walker <dwalker@mvista.com> wrote:
>
> > > The current soft-irq states only gives us better hard-irq latency but
> > > nothing else. I think the overhead runtime and the complication of the
> > > code is way too big for gaining only that.
> >
> > Interrupt response is massive, check the adeos vs. RT numbers . They
> > did one test which was just interrupt latency.
>
> the jury is still out on the accuracy of those numbers. The test had
> RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> mostly work with interrupts disabled. The other question is how were
> interrupt response times measured.
>
You would accept a patch where I made this stuff optional?
I have another problem:
I can't hide that my aim is to make task-latencies deterministic.
The worry is local_irq_disable() (and preempt_disable()). I can undefine
it and thereby find where it is used. I can then look at the code and make
it into raw_local_irq_disable(), or try to make a lock.
In many cases the raw irq-disable is the best and simplest when I am only
worried about task-latencies. But now Daniel and Sven want to use the
distinction between raw_local_irq_disable() and local_irq_disable() to
make irqs fast.
We do have a clash of notations. Any idea what to do? I mentioned
local_
raw_local_
hard_local_
Would that work?
> Ingo
Esben
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 19:34 ` Esben Nielsen
@ 2005-06-11 19:44 ` Sven-Thorsten Dietrich
2005-06-11 19:53 ` Daniel Walker
2005-06-11 20:23 ` Esben Nielsen
2005-06-11 20:03 ` Ingo Molnar
1 sibling, 2 replies; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-11 19:44 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
On Sat, 2005-06-11 at 21:34 +0200, Esben Nielsen wrote:
> On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> >
> > * Daniel Walker <dwalker@mvista.com> wrote:
> >
> > > > The current soft-irq states only gives us better hard-irq latency but
> > > > nothing else. I think the overhead runtime and the complication of the
> > > > code is way too big for gaining only that.
> > >
> > > Interrupt response is massive, check the adeos vs. RT numbers . They
> > > did one test which was just interrupt latency.
> >
> > the jury is still out on the accuracy of those numbers. The test had
> > RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> > mostly work with interrupts disabled. The other question is how were
> > interrupt response times measured.
> >
> You would accept a patch where I made this stuff optional?
>
Daniel's original patch MADE it optional. It's just that the code is
apparently more complex looking that way, so it's cleaner to do what Ingo
did.
> I have another problem:
> I can't hide that my aim is to make task-latencies deterministic.
Then this will help you.
> The worry is local_irq_disable() (and preempt_disable()). I can undefine
> it and therefore find where it is used. I can then look at the code, make
> it into raw_local_irq_disable() or try to make a lock.
> In many cases the raw-irq disable is the best and simplest when I am only
> worried about task-latencies. But now Daniel and Sven wants to use the
> distingtion between raw_local_irq_disable() and local_irq_disable() to
> make irqs fast.
We aim to make IRQ latencies deterministic. This affects preemption
latency in a positive way.
Anywhere in the kernel that IRQs are disabled, preemption is impossible.
(you can't interrupt the CPU when irqs are disabled)
But you said you are worried about overhead. You have to incur overhead
to make task response deterministic.
Are you sure you are not just trying to make it FAST?
But then - even if you do, this is still in your interest.
Let's wait for some numbers.
> We do have a clash of notations. Any idea what to do? I mentioned
> local_
The following two are the same. The former is an earlier implementation,
and has been superseded by the latter. The former is NLA.
> raw_local_
> hard_local_
Sven
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 19:44 ` Sven-Thorsten Dietrich
@ 2005-06-11 19:53 ` Daniel Walker
2005-06-11 20:23 ` Esben Nielsen
1 sibling, 0 replies; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 19:53 UTC (permalink / raw)
To: Sven-Thorsten Dietrich; +Cc: Esben Nielsen, Ingo Molnar, linux-kernel
On Sat, 11 Jun 2005, Sven-Thorsten Dietrich wrote:
>
> Daniel's original patch MADE it optional. Its just that the code is
> apparently more complex looking that way, so its cleaner to do what Ingo
> did.
>
We need the test coverage too; if there is a problem, we're more likely to
find it if everyone runs with it on.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 17:16 ` Esben Nielsen
2005-06-11 19:29 ` Sven-Thorsten Dietrich
@ 2005-06-11 20:02 ` Sven-Thorsten Dietrich
1 sibling, 0 replies; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-11 20:02 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
> >
> Well, isn't enough to see that the code contains more instructions and
> looks somewhat more complicated?
>
And - another thing:
Not all instructions take the same amount of time to execute.
Think about the cli operation. This can take real time on highly
pipelined processors, when compared to a simple increment by one
operation.
So the more complex-looking code, might make things faster.
I'm going to say it one last time.
Do the analysis scientifically, and use numbers to get your answers.
This stuff is profound.
Think it through, and when you find a real problem, then we can take a
look together, and see if it requires a bug fix.
And consider this -
There are robots walking around today, that use this concept in Linux
2.4, and they can help you carry a 20 foot long plank across rugged
terrain. In addition, they will stand it up against a wall for you...
So be careful when you say it can't be done in 2.6.
Sven
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 19:34 ` Esben Nielsen
2005-06-11 19:44 ` Sven-Thorsten Dietrich
@ 2005-06-11 20:03 ` Ingo Molnar
2005-06-11 20:51 ` Daniel Walker
2005-06-13 7:08 ` Sven-Thorsten Dietrich
1 sibling, 2 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-11 20:03 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Daniel Walker, linux-kernel, sdietrich
* Esben Nielsen <simlo@phys.au.dk> wrote:
> > the jury is still out on the accuracy of those numbers. The test had
> > RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> > mostly work with interrupts disabled. The other question is how were
> > interrupt response times measured.
> >
> You would accept a patch where I made this stuff optional?
I'm not sure why. The soft-flag based local_irq_disable() should in fact
be a tiny bit faster than the cli based approach, on a fair number of
CPUs. But it should definitely not be slower in any measurable way.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 19:44 ` Sven-Thorsten Dietrich
2005-06-11 19:53 ` Daniel Walker
@ 2005-06-11 20:23 ` Esben Nielsen
2005-06-11 22:59 ` Sven-Thorsten Dietrich
1 sibling, 1 reply; 86+ messages in thread
From: Esben Nielsen @ 2005-06-11 20:23 UTC (permalink / raw)
To: Sven-Thorsten Dietrich; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
On Sat, 11 Jun 2005, Sven-Thorsten Dietrich wrote:
> On Sat, 2005-06-11 at 21:34 +0200, Esben Nielsen wrote:
> > On Sat, 11 Jun 2005, Ingo Molnar wrote:
> >
> > >
> > > * Daniel Walker <dwalker@mvista.com> wrote:
> > >
> > > > > The current soft-irq states only gives us better hard-irq latency but
> > > > > nothing else. I think the overhead runtime and the complication of the
> > > > > code is way too big for gaining only that.
> > > >
> > > > Interrupt response is massive, check the adeos vs. RT numbers . They
> > > > did one test which was just interrupt latency.
> > >
> > > the jury is still out on the accuracy of those numbers. The test had
> > > RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> > > mostly work with interrupts disabled. The other question is how were
> > > interrupt response times measured.
> > >
> > You would accept a patch where I made this stuff optional?
> >
>
> Daniel's original patch MADE it optional. Its just that the code is
> apparently more complex looking that way, so its cleaner to do what Ingo
> did.
>
> > I have another problem:
> > I can't hide that my aim is to make task-latencies deterministic.
>
> Then this will help you.
>
No, because it leaves irqs on but not preemption on.
> > The worry is local_irq_disable() (and preempt_disable()). I can undefine
> > it and therefore find where it is used. I can then look at the code, make
> > it into raw_local_irq_disable() or try to make a lock.
> > In many cases the raw-irq disable is the best and simplest when I am only
> > worried about task-latencies. But now Daniel and Sven wants to use the
> > distingtion between raw_local_irq_disable() and local_irq_disable() to
> > make irqs fast.
>
> We aim to make IRQ latencies deterministic. This affects preemption
> latency in a positive way.
>
No. If you leave preemption off but irqs on, which is what is done here,
you get good, deterministic IRQ latencies but nothing for task-latencies -
actually slightly worse (unmeasurably so, I agree) due to the extra step
you have to take from the physical interrupt until the task-switch is
completed.
> Anywhere in the kernel that IRQs are disabled, preemption is impossible.
> (you can't interrupt the CPU when irqs are disabled)
>
For me it is the _same_ thing. Equally bad. If preemption is off I don't
care if irqs are off.
> But you said you are worried about overhead. You have to incur overhead
> to make task response deterministic.
>
> Are you sure you are not just trying to make it FAST?
>
Certainly not. I was pressing for priority inheritance, for instance. That
certainly does not make it fast, but it makes the use of locks
deterministic.
> But then - even if you do, this is still in your interest.
>
> Let's wait for some numbers.
>
> > We do have a clash of notations. Any idea what to do? I mentioned
> > local_
>
> The following two are the same. The former is an earlier implementation,
> and has been superseded by the latter. The former is NLA.
> > raw_local_
> > hard_local_
>
My idea was to split them:
local_ is disallowed in PREEMPT_RT, to catch code written for !PREEMPT_RT
suddenly destroying RT.
raw_local_ is used all over in the PREEMPT_RT core code.
hard_local_ is the same as raw_local_ unless you configure for low IRQ
latencies (the original patch); then it marks the soft-irq state.
Do you follow my idea?
I can pursue low task latencies, you can pursue low irq latencies, and we
don't clash due to conflicting naming conventions.
Esben
> Sven
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 20:03 ` Ingo Molnar
@ 2005-06-11 20:51 ` Daniel Walker
2005-06-11 23:44 ` Thomas Gleixner
` (3 more replies)
2005-06-13 7:08 ` Sven-Thorsten Dietrich
1 sibling, 4 replies; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 20:51 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Esben Nielsen, linux-kernel, sdietrich
On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> * Esben Nielsen <simlo@phys.au.dk> wrote:
>
> > > the jury is still out on the accuracy of those numbers. The test had
> > > RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> > > mostly work with interrupts disabled. The other question is how were
> > > interrupt response times measured.
> > >
> > You would accept a patch where I made this stuff optional?
>
> I'm not sure why. The soft-flag based local_irq_disable() should in fact
> be a tiny bit faster than the cli based approach, on a fair number of
> CPUs. But it should definitely not be slower in any measurable way.
Interesting .. So "cli" takes 7 cycles, "sti" takes 7 cycles. The current
method does "lea", which takes 1 cycle, and "or", which takes 1 cycle. I'm
not sure if there is any function call overhead .. So the soft replacement
of cli/sti is 70% faster on a per-instruction level .. So it's at least
not any slower .. Does everyone agree on that?
Daniel
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:51 ` Christoph Hellwig
@ 2005-06-11 22:44 ` Ed Tomlinson
2005-06-12 6:23 ` Ingo Molnar
1 sibling, 0 replies; 86+ messages in thread
From: Ed Tomlinson @ 2005-06-11 22:44 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Daniel Walker, linux-kernel, mingo, sdietrich
Sorry Christoph, but please no. Let's keep this here. I find this thread
much more interesting and pertinent than many others.
Ed Tomlinson
On Saturday 11 June 2005 12:51, Christoph Hellwig wrote:
> folks, can you please take this RT stuff of lkml? And with that I don't
> mean the highlevel discussions what makes sense, but specific patches that
> aren't related to anything near mainline. Just create your own list, like
> any other far from mainline project does.
>
> Thanks.
* Re: [PATCH] local_irq_disable removal
2005-06-11 20:23 ` Esben Nielsen
@ 2005-06-11 22:59 ` Sven-Thorsten Dietrich
2005-06-13 5:22 ` Steven Rostedt
0 siblings, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-11 22:59 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
On Sat, 2005-06-11 at 22:23 +0200, Esben Nielsen wrote:
> >
> No because it correctly leaves irqs on but not preemption on.
>
I see your worries now. See below.
> > > The worry is local_irq_disable() (and preempt_disable()). I can undefine
> > > it and therefore find where it is used. I can then look at the code, make
> > > it into raw_local_irq_disable() or try to make a lock.
> > > In many cases the raw-irq disable is the best and simplest when I am only
> > > worried about task-latencies. But now Daniel and Sven want to use the
> > > distinction between raw_local_irq_disable() and local_irq_disable() to
> > > make irqs fast.
> >
> > We aim to make IRQ latencies deterministic. This affects preemption
> > latency in a positive way.
> >
> No. If you leave preemption off but irqs on, which is what is done here,
> you get good, deterministic IRQ latencies but nothing for task-latencies -
> actually slightly (unmeasurable, I agree) worse, due to the extra step
> you have to take from the physical interrupt until the task-switch is
> completed.
>
If it's unmeasurable, what is the BIG deal? Don't make me say again that
we need real data here.
> > Anywhere in the kernel that IRQs are disabled, preemption is impossible.
> > (you can't interrupt the CPU when irqs are disabled)
> >
> For me it is the _same_ thing. Equally bad. If preemption is off I don't
> care if irqs are off.
>
So if it's the same thing, why are you concerned with the improvement as
is?
> > But you said you are worried about overhead. You have to incur overhead
> > to make task response deterministic.
> >
> > Are you sure you are not just trying to make it FAST?
> >
> Certainly not. I was pressing for priority inheritance, for instance. That
> certainly does not make it fast, but it makes the use of locks
> deterministic.
>
PI is already in there. I think you are missing some basic concepts here,
for example that IRQs can happen ANYTIME, not just when we happen to enable
interrupts where they have previously been disabled.
I am going to stop responding to this thread until you back up your concerns
with real data, or throw some code out there that you can back up with real data.
Regards,
Sven
* Re: [PATCH] local_irq_disable removal
2005-06-11 20:51 ` Daniel Walker
@ 2005-06-11 23:44 ` Thomas Gleixner
2005-06-11 23:50 ` Daniel Walker
` (3 more replies)
2005-06-12 4:31 ` Karim Yaghmour
` (2 subsequent siblings)
3 siblings, 4 replies; 86+ messages in thread
From: Thomas Gleixner @ 2005-06-11 23:44 UTC (permalink / raw)
To: Daniel Walker; +Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich
On Sat, 2005-06-11 at 13:51 -0700, Daniel Walker wrote:
> On Sat, 11 Jun 2005, Ingo Molnar wrote:
> Interesting .. So "cli" takes 7 cycles , "sti" takes 7 cycles. The current
> method does "lea" which takes 1 cycle, and "or" which takes 1 cycle. I'm
> not sure if there is any function call overhead .. So the soft replacement
> of cli/sti is 70% faster on a per instruction level .. So it's at least
> not any slower .. Does everyone agree on that?
No, because x86 is not the whole universe
tglx
* Re: [PATCH] local_irq_disable removal
2005-06-11 23:44 ` Thomas Gleixner
@ 2005-06-11 23:50 ` Daniel Walker
2005-06-12 0:01 ` Thomas Gleixner
2005-06-12 0:09 ` Sven-Thorsten Dietrich
` (2 subsequent siblings)
3 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-11 23:50 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich
On Sun, 12 Jun 2005, Thomas Gleixner wrote:
> On Sat, 2005-06-11 at 13:51 -0700, Daniel Walker wrote:
> > On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> > Interesting .. So "cli" takes 7 cycles , "sti" takes 7 cycles. The current
> > method does "lea" which takes 1 cycle, and "or" which takes 1 cycle. I'm
> > not sure if there is any function call overhead .. So the soft replacement
> > of cli/sti is 70% faster on a per instruction level .. So it's at least
> > not any slower .. Does everyone agree on that?
>
> No, because x86 is not the whole universe
It's only implemented on x86.
Daniel
* Re: [PATCH] local_irq_disable removal
2005-06-11 23:50 ` Daniel Walker
@ 2005-06-12 0:01 ` Thomas Gleixner
0 siblings, 0 replies; 86+ messages in thread
From: Thomas Gleixner @ 2005-06-12 0:01 UTC (permalink / raw)
To: Daniel Walker; +Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich
On Sat, 2005-06-11 at 16:50 -0700, Daniel Walker wrote:
> > > Interesting .. So "cli" takes 7 cycles , "sti" takes 7 cycles. The current
> > > method does "lea" which takes 1 cycle, and "or" which takes 1 cycle. I'm
> > > not sure if there is any function call overhead .. So the soft replacement
> > > of cli/sti is 70% faster on a per instruction level .. So it's at least
> > > not any slower .. Does everyone agree on that?
> >
> > No, because x86 is not the whole universe
>
> It's only implemented on x86 .
Interesting POV.
What's not implemented by you does not exist.
tglx
* Re: [PATCH] local_irq_disable removal
2005-06-11 18:40 ` Sven-Thorsten Dietrich
@ 2005-06-12 0:07 ` Thomas Gleixner
2005-06-12 0:15 ` Sven-Thorsten Dietrich
0 siblings, 1 reply; 86+ messages in thread
From: Thomas Gleixner @ 2005-06-12 0:07 UTC (permalink / raw)
To: Sven-Thorsten Dietrich
Cc: Daniel Walker, Esben Nielsen, Ingo Molnar, linux-kernel
On Sat, 2005-06-11 at 11:40 -0700, Sven-Thorsten Dietrich wrote:
> > Performance on RT systems is more than IRQ latencies.
> >
> > The widespread misbelief that
> > "Realtime == As fast as possible"
> >
> > seems to be still stuck in peoples mind.
> >
> > "Realtime == As fast as specified"
> > is the correct equation.
> >
>
> I think Daniel was referring to the deviations, but it is always good to
> point that out.
I hope we can agree on this premise for further discussions.
> > There is always a tradeoff between interrupt latencies and other
> > performance values, as you have to invent new mechanisms to protect
> > critical sections. In the end, they can be less effective than the gain
> > on irq latencies.
> >
>
> Basically you are investing effort to maintain predictability.
>
> In order to do that, you sometimes have to put a stitch in before the
> deadline ("in time..."), the effort increases the work = overhead.
>
> But if you look at overall time the tasks are waiting you can implement
> optimal scheduling, and maximize throughput.
>
> This is too complex to argue about here.
What's too complex? Are you asserting that other people, e.g. me, are too
dumb to understand that?
> For every example I can find you a corner case.
There is a corner case everywhere. That's nothing new.
> It depends on the application, and you need to decide how to configure
> our kernel if it really matters to what you are doing with Linux.
I completely agree, but your statement just confirms Esben's request for
keeping those tweaks as configurable options rather than built-in
defaults.
For any RT application the application dictates the constraints and
therefore requires a maximum of configurability.
> We are merely working to provide alternatives that improve performance.
We all do, just with a different set of constraints, right?
tglx
* Re: [PATCH] local_irq_disable removal
2005-06-11 23:44 ` Thomas Gleixner
2005-06-11 23:50 ` Daniel Walker
@ 2005-06-12 0:09 ` Sven-Thorsten Dietrich
2005-06-12 0:28 ` Thomas Gleixner
2005-06-12 1:05 ` Gene Heskett
2005-06-12 4:50 ` cutaway
2005-06-12 6:57 ` Ingo Molnar
3 siblings, 2 replies; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-12 0:09 UTC (permalink / raw)
To: tglx; +Cc: Daniel Walker, Ingo Molnar, Esben Nielsen, linux-kernel
On Sun, 2005-06-12 at 01:44 +0200, Thomas Gleixner wrote:
> On Sat, 2005-06-11 at 13:51 -0700, Daniel Walker wrote:
> > On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> > Interesting .. So "cli" takes 7 cycles , "sti" takes 7 cycles. The current
> > method does "lea" which takes 1 cycle, and "or" which takes 1 cycle. I'm
> > not sure if there is any function call overhead .. So the soft replacement
> > of cli/sti is 70% faster on a per instruction level .. So it's at least
> > not any slower .. Does everyone agree on that?
>
> No, because x86 is not the whole universe
>
It's expanding.
Even if there is a case of minimal expansion in the overhead on some
architecture, it may justify the effort for a certain class of
applications which require known interrupt response latencies.
The concept model here is that you will have all interrupts running in
threads, EXCEPT one or more SA_NODELAY real-time IRQs. Those RT-IRQs may
be required to track satellites, manage I/O for a QOS or RF protocol
stack, shut down a SAW, or provide a variety of RT-related services.
The IRQ-disable removal ensures that the RT-IRQ encounters minimal
delay.
Often, that IRQ will also wake up a process, and that process may have
some response-time constraints on it, as well.
So that's one model that is helped by this design.
* Re: [PATCH] local_irq_disable removal
2005-06-12 0:07 ` Thomas Gleixner
@ 2005-06-12 0:15 ` Sven-Thorsten Dietrich
2005-06-12 0:22 ` Thomas Gleixner
0 siblings, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-12 0:15 UTC (permalink / raw)
To: tglx; +Cc: Daniel Walker, Esben Nielsen, Ingo Molnar, linux-kernel
On Sun, 2005-06-12 at 02:07 +0200, Thomas Gleixner wrote:
> >
> > This is too complex to argue about here.
>
> Whats too complex? Are you asserting that other people e.g. me, are too
> dumb to understand that ?
>
No, I said HERE, not FOR YOU.
* Re: [PATCH] local_irq_disable removal
2005-06-12 0:15 ` Sven-Thorsten Dietrich
@ 2005-06-12 0:22 ` Thomas Gleixner
2005-06-12 0:24 ` Sven-Thorsten Dietrich
0 siblings, 1 reply; 86+ messages in thread
From: Thomas Gleixner @ 2005-06-12 0:22 UTC (permalink / raw)
To: Sven-Thorsten Dietrich
Cc: Daniel Walker, Esben Nielsen, Ingo Molnar, linux-kernel
On Sat, 2005-06-11 at 17:15 -0700, Sven-Thorsten Dietrich wrote:
> On Sun, 2005-06-12 at 02:07 +0200, Thomas Gleixner wrote:
> > >
> > > This is too complex to argue about here.
> >
> > Whats too complex? Are you asserting that other people e.g. me, are too
> > dumb to understand that ?
> >
>
> No, I said HERE, not FOR YOU.
So where do you suggest to discuss this ?
tglx
* Re: [PATCH] local_irq_disable removal
2005-06-12 0:22 ` Thomas Gleixner
@ 2005-06-12 0:24 ` Sven-Thorsten Dietrich
0 siblings, 0 replies; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-12 0:24 UTC (permalink / raw)
To: tglx; +Cc: Daniel Walker, Esben Nielsen, Ingo Molnar, linux-kernel
On Sun, 2005-06-12 at 02:22 +0200, Thomas Gleixner wrote:
> On Sat, 2005-06-11 at 17:15 -0700, Sven-Thorsten Dietrich wrote:
> > On Sun, 2005-06-12 at 02:07 +0200, Thomas Gleixner wrote:
> > > >
> > > > This is too complex to argue about here.
> > >
> > > Whats too complex? Are you asserting that other people e.g. me, are too
> > > dumb to understand that ?
> > >
> >
> > No, I said HERE, not FOR YOU.
>
> So where do you suggest to discuss this ?
Thomas,
I think this is intense high-performance scheduling theory that is
beyond the scope of what people need to have in their mailbox.
We can use ext-rt-dev@mvista.com if everyone agrees.
It's a public subscription list, let me know, and I'll dig up the
subscribe info.
Sven
* Re: [PATCH] local_irq_disable removal
2005-06-12 0:09 ` Sven-Thorsten Dietrich
@ 2005-06-12 0:28 ` Thomas Gleixner
2005-06-12 1:05 ` Gene Heskett
1 sibling, 0 replies; 86+ messages in thread
From: Thomas Gleixner @ 2005-06-12 0:28 UTC (permalink / raw)
To: Sven-Thorsten Dietrich
Cc: Daniel Walker, Ingo Molnar, Esben Nielsen, linux-kernel
On Sat, 2005-06-11 at 17:09 -0700, Sven-Thorsten Dietrich wrote:
> Even if there is a case of minimal expansion in the overhead on some
> architecture, it may justify the effort for a certain class of
> applications which require known interrupt response latencies.
Nobody denied that. I'm just cautious about arguments which count
instruction cycles on a given CPU.
> The concept model here is, that you will have all interrupts running in
> threads, EXCEPT one or more SA_NODELAY real-time IRQs. Those RT-IRQs may
> be required to track satellites, manage I/O for a QOS or RF protocol
> stack, shut down a SAW, or a variety of RT-related services.
>
> The IRQ-disable-removal allows that the RT-IRQ encounters minimal
> delay.
>
> Often, that IRQ will also wake up a process, and that process may have
> some response-time constraints on it, as well.
>
> SO that's one model that is helped by this design.
No problem with that. I have done this already and know about the pro
and cons.
As I pointed out before, speed is not always the measure.
The point is configurability of features, but OTOH you _cannot_ implement
a CONFIG option for each particular spinlock. You have to come down to a
certain set of config options by experimentation or by analysing code
paths. Lots of work to be done, though.
tglx
* Re: [PATCH] local_irq_disable removal
2005-06-12 0:09 ` Sven-Thorsten Dietrich
2005-06-12 0:28 ` Thomas Gleixner
@ 2005-06-12 1:05 ` Gene Heskett
2005-06-13 12:03 ` Paulo Marques
1 sibling, 1 reply; 86+ messages in thread
From: Gene Heskett @ 2005-06-12 1:05 UTC (permalink / raw)
To: linux-kernel
On Saturday 11 June 2005 20:09, Sven-Thorsten Dietrich wrote:
>On Sun, 2005-06-12 at 01:44 +0200, Thomas Gleixner wrote:
>> On Sat, 2005-06-11 at 13:51 -0700, Daniel Walker wrote:
>> > On Sat, 11 Jun 2005, Ingo Molnar wrote:
>> >
>> > Interesting .. So "cli" takes 7 cycles , "sti" takes 7 cycles.
>> > The current method does "lea" which takes 1 cycle, and "or"
>> > which takes 1 cycle. I'm not sure if there is any function call
>> > overhead .. So the soft replacement of cli/sti is 70% faster on a
>> > per instruction level .. So it's at least not any slower .. Does
>> > everyone agree on that?
>>
>> No, because x86 is not the whole universe
>
>Das Dehnt sich aus.
>
>Even if there is a case of minimal expansion in the overhead on some
>architecture, it may justify the effort for a certain class of
>applications which require known interrupt response latencies.
>
>The concept model here is, that you will have all interrupts running
> in threads, EXCEPT one or more SA_NODELAY real-time IRQs. Those
> RT-IRQs may be required to track satellites, manage I/O for a QOS
> or RF protocol stack, shut down a SAW, or a variety of RT-related
> services.
Let's add the operation of 4 or more stepper motors in real time for
smaller milling machines. There, the constraints are more related to
maintaining a steady flow of step/direction data at high enough
speeds to make a stepper, with 8 microsteps per step and 240 steps
per revolution, run smoothly at speeds up to, say, 20 kilohertz, or 50
microseconds per step, maintaining that 50 microseconds plus or minus
not more than 5 microseconds, else the motors will start sounding
ragged and stuttering.
That would correspond to (if my calculator's button pusher didn't
screw up) 625 rpm at the motor shaft. That's faster than it would
ever move while actually cutting, but would certainly be valuable in
reaching the next point to start a fresh cut pass in a reasonable
length of time. Adeos is apparently able to do this now, given a rip
snorter CPU of 2 GHz or more real clock speed. Cutting the rep rate
to 200 u-secs cuts the motor speeds down into the 150 rpm range, but
is then quite usable, and the cpu has some time to do other things,
like pay attention to the mouse and its clicks, or refresh the screen
on a 500+ MHz box. 100 u-sec on a 1400 MHz Athlon, and the rest of
the machine, while slow, is 100% usable.
These are the time constraints, generally speaking, that it takes to
run a small milling machine under the control of emc. And this is the
range we would expect for much of the machine controller world.
One of the complaints we (I at least) have in the adeos environment
is that the cpu is equally hogged, and the machine lags, full time,
even if the emc state is stopped. This I think is an emc problem
more so than a kernel problem, in that emc should be capable of
dynamically adjusting the RT loop's timing, so that if the machine is
in stop, the machine gives up its cpu to other tasks, making the
machine much friendlier. This is something needed by emc IMO, but
the experts in emc may have other limits I don't know about.
Anyway, my $0.02 on the guaranteed 'deterministic' aspect of this.
Currently running V0.7.48-10 in mode 3 with hardirq threading turned
off, as turning it on still kills tvtime's video dma. It's running
quite decently, in fact.
>The IRQ-disable-removal allows that the RT-IRQ encounters minimal
>delay.
>
>Often, that IRQ will also wake up a process, and that process may
> have some response-time constraints on it, as well.
>
>SO that's one model that is helped by this design.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
* Re: [PATCH] local_irq_disable removal
2005-06-11 20:51 ` Daniel Walker
2005-06-11 23:44 ` Thomas Gleixner
@ 2005-06-12 4:31 ` Karim Yaghmour
2005-06-12 4:32 ` Daniel Walker
2005-06-12 15:27 ` Zwane Mwaikambo
2005-06-12 17:02 ` Andi Kleen
3 siblings, 1 reply; 86+ messages in thread
From: Karim Yaghmour @ 2005-06-12 4:31 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich,
Philippe Gerum
Daniel Walker wrote:
> Interesting .. So "cli" takes 7 cycles , "sti" takes 7 cycles. The current
> method does "lea" which takes 1 cycle, and "or" which takes 1 cycle. I'm
> not sure if there is any function call overhead .. So the soft replacement
> of cli/sti is 70% faster on a per instruction level .. So it's at least
> not any slower .. Does everyone agree on that?
The proof is in the pudding: it's not for nothing that the results
we published earlier show that the mere enabling of Adeos actually
increases Linux's performance under heavy load.
This could easily be called the Stodolsky effect. Here, have a look
at this article, it was presented at the USENIX Symposium on
Microkernels and Other Kernel Architectures ... in 1993:
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/user/danner/www/OptSynch.ps
We've been referring back to this paper as early as the first public
release of Adeos ... in June 2002.
That being said, I'm not sure exactly why you guys are reinventing the
wheel. Adeos already does this soft-cli/sti stuff for you, it's been
available for a few years already, tested, and ported to a number of
architectures, and is generalized, why not just adopt it? After all,
like I've been saying for some time, it isn't mutually exclusive with
PREEMPT_RT.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
* Re: [PATCH] local_irq_disable removal
2005-06-12 4:31 ` Karim Yaghmour
@ 2005-06-12 4:32 ` Daniel Walker
2005-06-12 4:56 ` Karim Yaghmour
0 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-12 4:32 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich,
Philippe Gerum
On Sun, 12 Jun 2005, Karim Yaghmour wrote:
>
> Daniel Walker wrote:
> > Interesting .. So "cli" takes 7 cycles , "sti" takes 7 cycles. The current
> > method does "lea" which takes 1 cycle, and "or" which takes 1 cycle. I'm
> > not sure if there is any function call overhead .. So the soft replacement
> > of cli/sti is 70% faster on a per instruction level .. So it's at least
> > not any slower .. Does everyone agree on that?
>
> The proof is in the pudding: it's not for nothing that the results
> we published earlier show that the mere enabling of Adeos actually
> increases Linux's performance under heavy load.
Why do you think that is? Is ADEOS optimized for specific machine
configurations?
> That being said, I'm not sure exactly why you guys are reinventing the
> wheel. Adeos already does this soft-cli/sti stuff for you, it's been
> available for a few years already, tested, and ported to a number of
> architectures, and is generalized, why not just adopt it? After all,
> like I've been saying for some time, it isn't mutually exclusive with
> PREEMPT_RT.
It doesn't seem like one could really merge the two. From what I've read,
it seems like ADEOS is something completely independent. It would be Linux
and ADEOS, but never just Linux.
Daniel
* Re: [PATCH] local_irq_disable removal
2005-06-11 23:44 ` Thomas Gleixner
2005-06-11 23:50 ` Daniel Walker
2005-06-12 0:09 ` Sven-Thorsten Dietrich
@ 2005-06-12 4:50 ` cutaway
2005-06-12 6:57 ` Ingo Molnar
3 siblings, 0 replies; 86+ messages in thread
From: cutaway @ 2005-06-12 4:50 UTC (permalink / raw)
To: tglx, Daniel Walker; +Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich
FWIW - STI takes 3, 5, and 7 cycles on the 386, 486, and 386SX
respectively (these are all still popular embedded processors).
Do not forget the impact of instruction size on fetch cost - the "book"
cycle counts are only for instructions ALREADY fetched and in the pipe/queue
and IGNORE the cost of fetching said instruction. If you happen to get an
L1 hit, you win. If that LEA or OR straddles a dword or cache line boundary,
the CLI/STI may well wind up running faster under combat conditions if a
line load or second memory fetch is needed to get the second half of the
instruction.
On an x86 CPU that which appears to be fast per the book clocks is often
slower, and that which appears to clock faster is often slower in reality
due to increased fetch cost of plump instructions. I have many benchmarks
that prove this.
----- Original Message -----
From: "Thomas Gleixner" <tglx@linutronix.de>
To: "Daniel Walker" <dwalker@mvista.com>
Cc: "Ingo Molnar" <mingo@elte.hu>; "Esben Nielsen" <simlo@phys.au.dk>;
<linux-kernel@vger.kernel.org>; <sdietrich@mvista.com>
Sent: Saturday, June 11, 2005 19:44
Subject: Re: [PATCH] local_irq_disable removal
> On Sat, 2005-06-11 at 13:51 -0700, Daniel Walker wrote:
> > On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> > Interesting .. So "cli" takes 7 cycles , "sti" takes 7 cycles. The
current
> > method does "lea" which takes 1 cycle, and "or" which takes 1 cycle.
* Re: [PATCH] local_irq_disable removal
2005-06-12 4:56 ` Karim Yaghmour
@ 2005-06-12 4:55 ` Daniel Walker
2005-06-12 5:16 ` Karim Yaghmour
0 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-12 4:55 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich,
Philippe Gerum
On Sun, 12 Jun 2005, Karim Yaghmour wrote:
> This is why I have a hard time understanding the statement that
> "It would be Linux and Adeos, but never just Linux." In this case,
> it would be Linux with an ipipe. Said ipipe can then be left
> unpopulated, and then we get back to what you guys have just
> implemented. Or a driver can use it to obtain hard-rt. Or
> additional Adeos components can hook onto the ipipe to provide
> services enabling RTAI to run side-by-side with Linux.
My reasoning is that Linux doesn't run in ring 0. That to me makes Linux
and ADEOS two different entities. That's my way of looking at it.
> May I suggest getting a copy of a recent Adeos patch and looking
> through it? I'm sure it would make things much simpler to
> understand.
I haven't looked at the code, but I would like to. I have read about
the ADEOS implementation.
Daniel
* Re: [PATCH] local_irq_disable removal
2005-06-12 4:32 ` Daniel Walker
@ 2005-06-12 4:56 ` Karim Yaghmour
2005-06-12 4:55 ` Daniel Walker
0 siblings, 1 reply; 86+ messages in thread
From: Karim Yaghmour @ 2005-06-12 4:56 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich,
Philippe Gerum
Daniel Walker wrote:
>>The proof is in the pudding: it's not for nothing that the results
>>we published earlier show that the mere enabling of Adeos actually
>>increases Linux's performance under heavy load.
>
>
> Why do you think that is? Is ADEOS optimized for specific machine
> configurations?
I was referring back to what you were talking about just before I
replied: no disabling of interrupts.
> It doesn't seem like one could really merge the two. From what I've read
> , it seem like ADEOS is something completly indepedent . It would be linux
> and ADEOS , but never just linux .
I'm not sure I follow. Forget about all the fancy hypervisor/
nanokernel talk. The bottom line is that while the initial design
called for an entire nanokernel, the actual code in adeos can be
summarized by the interrupt pipeline. Said ipipe is a feature
that stands on its own and could easily be integrated into mainline.
Using just the ipipe, for example, it would be
possible to load a module that would register ahead of Linux
in the pipeline and therefore obtain its interrupts regardless
of whether or not Linux has stalled its pipeline stage (i.e.
cli'ed.) That's hard-rt at a very low cost in terms of general
kernel source code intrusion.
This is why I have a hard time understanding the statement that
"It would be Linux and Adeos, but never just Linux." In this case,
it would be Linux with an ipipe. Said ipipe can then be left
unpopulated, and then we get back to what you guys have just
implemented. Or a driver can use it to obtain hard-rt. Or
additional Adeos components can hook onto the ipipe to provide
services enabling RTAI to run side-by-side with Linux.
May I suggest getting a copy of a recent Adeos patch and looking
through it? I'm sure it would make things much simpler to
understand.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
* Re: [PATCH] local_irq_disable removal
2005-06-12 5:16 ` Karim Yaghmour
@ 2005-06-12 5:14 ` Daniel Walker
2005-06-12 5:27 ` Karim Yaghmour
0 siblings, 1 reply; 86+ messages in thread
From: Daniel Walker @ 2005-06-12 5:14 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich,
Philippe Gerum
On Sun, 12 Jun 2005, Karim Yaghmour wrote:
> To understand what's in the current *PATCH*, please read this
> instead:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=102309348817485&w=2
> Here's the relevant passage:
Might want to consider changing the links on the ADEOS website, because
that's where I went to get information on it.
Daniel
* Re: [PATCH] local_irq_disable removal
2005-06-12 4:55 ` Daniel Walker
@ 2005-06-12 5:16 ` Karim Yaghmour
2005-06-12 5:14 ` Daniel Walker
0 siblings, 1 reply; 86+ messages in thread
From: Karim Yaghmour @ 2005-06-12 5:16 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich,
Philippe Gerum
Daniel Walker wrote:
> My reasoning is that Linux doesn't run in ring 0 . That to me makes linux
> and ADEOS two different entities. That's my way of looking at it.
[snip]
> I haven't looked at the code, I would like to . I have read about
> the ADEOS implementation .
*WARNING*WARNING*WARNING*WARNING*WARNING*WARNING*WARNING*WARNING*
ok, now that I've got the attention of anyone reading this post,
here goes:
The paper entitled "Adaptive Domain Environment for Operating
Systems", which I published in February 2001, presents a design
*PLAN*. The actual implementation is based on this paper, but
differs *SIGNIFICANTLY* from it. Hence, read the paper to
understand how the ipipe works *ONLY*. All this stuff about
ring 0 / ring 1 was *NEVER* implemented.
To understand what's in the current *PATCH*, please read this
instead:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102309348817485&w=2
Here's the relevant passage:
> The complete Adeos approach has been thoroughly documented in a whitepaper
> published more than a year ago entitled "Adaptive Domain Environment
> for Operating Systems" and available here: http://www.opersys.com/adeos
> The current implementation is slightly different. Mainly, we do not
> implement the functionality to move Linux out of ring 0. Although of
> interest, this approach is not very portable.
>
> Instead, our patch taps right into Linux's main source of control
> over the hardware, the interrupt dispatching code, and inserts an
> interrupt pipeline which can then serve all the nanokernel's clients,
> including Linux.
Again, Adeos does not play *ANY* ring tricks. It just hooks
onto the kernel's *EXISTING* cli/sti/int-delivery mechanisms
using a *PATCH*.
Hope this helps dissipate some misconceptions about what Adeos
is *TODAY*.
P.S.: Sorry Daniel, none of the above emphasis is directed
personally towards you. I just know from talking to quite a
few people that my original paper *CONTINUES* to *MISLEAD*
people about what Adeos currently is. So I just thought I'd
make it clear at this point for those interested.
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-12 5:14 ` Daniel Walker
@ 2005-06-12 5:27 ` Karim Yaghmour
0 siblings, 0 replies; 86+ messages in thread
From: Karim Yaghmour @ 2005-06-12 5:27 UTC (permalink / raw)
To: Daniel Walker
Cc: Ingo Molnar, Esben Nielsen, linux-kernel, sdietrich,
Philippe Gerum
Daniel Walker wrote:
> Might want to consider changing the links on the ADEOS website, because
> that's where I went to get information on it.
That's actually a good point. I've only got myself to blame for
the continued misconceptions about Adeos. I'll make sure the papers
are put somewhere where they are clearly labeled as *PRELIMINARY*
design documents.
Sorry for the confusion,
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 16:51 ` Christoph Hellwig
2005-06-11 22:44 ` Ed Tomlinson
@ 2005-06-12 6:23 ` Ingo Molnar
2005-06-12 9:28 ` Christoph Hellwig
1 sibling, 1 reply; 86+ messages in thread
From: Ingo Molnar @ 2005-06-12 6:23 UTC (permalink / raw)
To: Christoph Hellwig, Daniel Walker, linux-kernel, sdietrich
* Christoph Hellwig <hch@infradead.org> wrote:
> folks, can you please take this RT stuff off lkml? And with that I
> don't mean the highlevel discussions what makes sense, but specific
> patches that aren't related to anything near mainline. [...]
this is a misconception - there have been a few dozen patches steadily
trickling into mainline that all started in the PREEMPT_RT
patchset, so this "RT stuff", both the generic arguments and the details,
is very much relevant. I wouldn't be doing it if it wasn't relevant to
the mainline kernel. The discussions are well concentrated into 2-3
subjects, so you can plonk those threads if you are not interested.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 23:44 ` Thomas Gleixner
` (2 preceding siblings ...)
2005-06-12 4:50 ` cutaway
@ 2005-06-12 6:57 ` Ingo Molnar
2005-06-12 11:15 ` Esben Nielsen
2005-06-12 15:28 ` Daniel Walker
3 siblings, 2 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-12 6:57 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Daniel Walker, Esben Nielsen, linux-kernel, sdietrich
* Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 2005-06-11 at 13:51 -0700, Daniel Walker wrote:
> > On Sat, 11 Jun 2005, Ingo Molnar wrote:
>
> > Interesting .. So "cli" takes 7 cycles, "sti" takes 7 cycles. The current
> > method does "lea", which takes 1 cycle, and "or", which takes 1 cycle. I'm
> > not sure if there is any function call overhead .. So the soft replacement
> > of cli/sti is 70% faster at the per-instruction level .. So it's at least
> > not any slower .. Does everyone agree on that?
>
> No, because x86 is not the whole universe.
x86 is actually a 'worst-case', because it has one of the cheapest CPU
level cli/sti implementations. Usually it's the hard-local_irq_disable()
overhead on non-x86 platforms that is a problem. (ARM iirc) So in this
sense the soft-flag should be a win on most sane architectures.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-12 6:23 ` Ingo Molnar
@ 2005-06-12 9:28 ` Christoph Hellwig
2005-06-13 4:39 ` [RT] " Steven Rostedt
2005-06-16 5:35 ` Lee Revell
0 siblings, 2 replies; 86+ messages in thread
From: Christoph Hellwig @ 2005-06-12 9:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Christoph Hellwig, Daniel Walker, linux-kernel, sdietrich
On Sun, Jun 12, 2005 at 08:23:50AM +0200, Ingo Molnar wrote:
>
> * Christoph Hellwig <hch@infradead.org> wrote:
>
> > folks, can you please take this RT stuff off lkml? And with that I
> > don't mean the highlevel discussions what makes sense, but specific
> > patches that aren't related to anything near mainline. [...]
>
> this is a misconception - there have been a few dozen patches steadily
> trickling into mainline that all started in the PREEMPT_RT
> patchset, so this "RT stuff", both the generic arguments and the details,
> is very much relevant. I wouldn't be doing it if it wasn't relevant to
> the mainline kernel. The discussions are well concentrated into 2-3
> subjects, so you can plonk those threads if you are not interested.
Then send patches when you think they're ready. Everything directly
related to PREEMPT_RT except the high-level discussion is definitely off-topic.
Just create your preempt-rt mailing list and get the interested parties there;
lkml is for _general_ kernel discussion - even most subsystems that are
in mainline have their own lists.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-12 6:57 ` Ingo Molnar
@ 2005-06-12 11:15 ` Esben Nielsen
2005-06-12 11:52 ` Ingo Molnar
2005-06-13 7:01 ` Sven-Thorsten Dietrich
2005-06-12 15:28 ` Daniel Walker
1 sibling, 2 replies; 86+ messages in thread
From: Esben Nielsen @ 2005-06-12 11:15 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Thomas Gleixner, Daniel Walker, linux-kernel, sdietrich
On Sun, 12 Jun 2005, Ingo Molnar wrote:
>
> * Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > On Sat, 2005-06-11 at 13:51 -0700, Daniel Walker wrote:
> > > On Sat, 11 Jun 2005, Ingo Molnar wrote:
> >
> > > Interesting .. So "cli" takes 7 cycles, "sti" takes 7 cycles. The current
> > > method does "lea", which takes 1 cycle, and "or", which takes 1 cycle. I'm
> > > not sure if there is any function call overhead .. So the soft replacement
> > > of cli/sti is 70% faster at the per-instruction level .. So it's at least
> > > not any slower .. Does everyone agree on that?
> >
> > No, because x86 is not the whole universe.
>
> x86 is actually a 'worst-case', because it has one of the cheapest CPU
> level cli/sti implementations. Usually it's the hard-local_irq_disable()
> overhead on non-x86 platforms that is a problem. (ARM iirc) So in this
> sense the soft-flag should be a win on most sane architectures.
>
> Ingo
I am surprised that it should actually be faster, but I give in to the
experts. I will see if I can find time to perform a test, or whether I
should spend it on something else.
That said, this long discussion has not been a complete waste of time: I
think this thread has taught us that we do have different goals, and it
clarifies things.
I am not happy about the soft-irq thing, mostly due to naming.
local_irq_disable() is really just preempt_disable() with some extra stuff
to make it backward compatible.
I still believe local_irq_disable() (also in the soft version) should be
completely forbidden when PREEMPT_RT is set. All places using it should be
replaced with a mutex or a ???_local_irq_disable() to mark that the code
has been reviewed for PREEMPT_RT. With your argument above,
???_local_irq_disable() should really be preempt_disable(), as that is
faster.
Esben
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-12 11:15 ` Esben Nielsen
@ 2005-06-12 11:52 ` Ingo Molnar
2005-06-13 7:01 ` Sven-Thorsten Dietrich
1 sibling, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-12 11:52 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Thomas Gleixner, Daniel Walker, linux-kernel, sdietrich
* Esben Nielsen <simlo@phys.au.dk> wrote:
> [...] With your argument above ???_local_irq_disable() should really
> be preempt_disable() as that is faster.
local_irq_disable() _is_ almost the same thing as preempt_disable():
void local_irq_disable(void)
{
mask_preempt_count(IRQSOFF_MASK);
}
EXPORT_SYMBOL(local_irq_disable);
which compiles to just 4 instructions:
c012f355 <local_irq_disable>:
c012f355: b8 00 e0 ff ff mov $0xffffe000,%eax
c012f35a: 21 e0 and %esp,%eax
c012f35c: 81 48 14 00 00 00 20 orl $0x20000000,0x14(%eax)
c012f363: c3 ret
we could inline it too, that would make it exactly the same cost as
preempt_disable().
It cannot be fully equivalent to preempt_disable()/enable() due to the
semantics of the IRQ flag. But fortunately, masking/unmasking a bit is just
as fast as inc/dec.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
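The soft-flag scheme Ingo describes above can be modeled in user space as follows. This is only an illustrative sketch under assumed names (the mask value, `soft_local_irq_disable()`, and the replay logic are hypothetical stand-ins, not the actual -RT kernel symbols): disabling "interrupts" becomes a single or-to-memory on a per-thread count, and an interrupt that arrives while the flag is set is logged and replayed on enable.

```c
#include <assert.h>

/* Hypothetical user-space model of the PREEMPT_RT soft-flag idea:
 * local_irq_disable() just ORs a bit into the per-thread preempt
 * count instead of executing cli.  Mask values are illustrative. */

#define IRQSOFF_MASK 0x20000000u

static unsigned int preempt_count;  /* stands in for thread_info->preempt_count */
static int pending_irq;             /* an interrupt that arrived while "off" */
static int handled_irqs;

static void soft_local_irq_disable(void)
{
    preempt_count |= IRQSOFF_MASK;  /* one or-to-memory, no cli */
}

static void deliver_irq(void)
{
    if (preempt_count & IRQSOFF_MASK)
        pending_irq = 1;            /* log it, replay later */
    else
        handled_irqs++;             /* handled immediately */
}

static void soft_local_irq_enable(void)
{
    preempt_count &= ~IRQSOFF_MASK;
    if (pending_irq) {              /* replay what arrived meanwhile */
        pending_irq = 0;
        handled_irqs++;
    }
}
```

The point of the model is only that both directions are plain bit operations on an ordinary memory word, matching the 4-instruction disassembly quoted in the message above.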
* Re: [PATCH] local_irq_disable removal
2005-06-11 20:51 ` Daniel Walker
2005-06-11 23:44 ` Thomas Gleixner
2005-06-12 4:31 ` Karim Yaghmour
@ 2005-06-12 15:27 ` Zwane Mwaikambo
2005-06-12 15:46 ` Daniel Walker
2005-06-12 19:02 ` Ingo Molnar
2005-06-12 17:02 ` Andi Kleen
3 siblings, 2 replies; 86+ messages in thread
From: Zwane Mwaikambo @ 2005-06-12 15:27 UTC (permalink / raw)
To: Daniel Walker; +Cc: Ingo Molnar, Esben Nielsen, Linux Kernel, sdietrich
On Sat, 11 Jun 2005, Daniel Walker wrote:
> Interesting .. So "cli" takes 7 cycles, "sti" takes 7 cycles. The current
> method does "lea", which takes 1 cycle, and "or", which takes 1 cycle. I'm
> not sure if there is any function call overhead .. So the soft replacement
> of cli/sti is 70% faster at the per-instruction level .. So it's at least
> not any slower .. Does everyone agree on that?
Well, you also have to take into account the memory access, so it's not
always that straightforward.
Zwane
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-12 6:57 ` Ingo Molnar
2005-06-12 11:15 ` Esben Nielsen
@ 2005-06-12 15:28 ` Daniel Walker
1 sibling, 0 replies; 86+ messages in thread
From: Daniel Walker @ 2005-06-12 15:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Thomas Gleixner, Esben Nielsen, linux-kernel, sdietrich
On Sun, 12 Jun 2005, Ingo Molnar wrote:
> x86 is actually a 'worst-case', because it has one of the cheapest CPU
> level cli/sti implementations. Usually it's the hard-local_irq_disable()
> overhead on non-x86 platforms that is a problem. (ARM iirc) So in this
> sense the soft-flag should be a win on most sane architectures.
My original port of this was on ARM, and I didn't notice a massive
slowdown or anything. I imagine it can't be unbearable.
Daniel
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-12 15:27 ` Zwane Mwaikambo
@ 2005-06-12 15:46 ` Daniel Walker
2005-06-12 19:02 ` Ingo Molnar
1 sibling, 0 replies; 86+ messages in thread
From: Daniel Walker @ 2005-06-12 15:46 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Ingo Molnar, Esben Nielsen, Linux Kernel, sdietrich
I was just trying to get an idea of the possible slowdown, if any, and for
x86 I'm convinced that it's not a slowdown.
Daniel
On Sun, 12 Jun 2005, Zwane Mwaikambo wrote:
> On Sat, 11 Jun 2005, Daniel Walker wrote:
>
> > Interesting .. So "cli" takes 7 cycles, "sti" takes 7 cycles. The current
> > method does "lea", which takes 1 cycle, and "or", which takes 1 cycle. I'm
> > not sure if there is any function call overhead .. So the soft replacement
> > of cli/sti is 70% faster at the per-instruction level .. So it's at least
> > not any slower .. Does everyone agree on that?
>
> Well, you also have to take into account the memory access, so it's not
> always that straightforward.
>
> Zwane
>
>
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 20:51 ` Daniel Walker
` (2 preceding siblings ...)
2005-06-12 15:27 ` Zwane Mwaikambo
@ 2005-06-12 17:02 ` Andi Kleen
3 siblings, 0 replies; 86+ messages in thread
From: Andi Kleen @ 2005-06-12 17:02 UTC (permalink / raw)
To: Daniel Walker; +Cc: Esben Nielsen, linux-kernel, sdietrich, mingo
Daniel Walker <dwalker@mvista.com> writes:
>
> Interesting .. So "cli" takes 7 cycles, "sti" takes 7 cycles. The current
> method does "lea", which takes 1 cycle, and "or", which takes 1 cycle. I'm
> not sure if there is any function call overhead .. So the soft replacement
> of cli/sti is 70% faster at the per-instruction level .. So it's at least
> not any slower .. Does everyone agree on that?
It depends on what CPU you are benchmarking on. On the K7 and K8, cli and
sti are only two or three cycles.
-Andi
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-12 15:27 ` Zwane Mwaikambo
2005-06-12 15:46 ` Daniel Walker
@ 2005-06-12 19:02 ` Ingo Molnar
1 sibling, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-12 19:02 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Daniel Walker, Esben Nielsen, Linux Kernel, sdietrich
* Zwane Mwaikambo <zwane@fsmlabs.com> wrote:
> > Interesting .. So "cli" takes 7 cycles, "sti" takes 7 cycles. The current
> > method does "lea", which takes 1 cycle, and "or", which takes 1 cycle. I'm
> > not sure if there is any function call overhead .. So the soft replacement
> > of cli/sti is 70% faster at the per-instruction level .. So it's at least
> > not any slower .. Does everyone agree on that?
>
> Well, you also have to take into account the memory access, so it's not
> always that straightforward.
preempt_count resides in the first cacheline of the 'current thread info'
and is almost always cached. It's the same cacheline that 'current' and
'smp_processor_id()' use.
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
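Ingo's point about the cacheline ties back to the disassembly quoted earlier in the thread: the `mov $0xffffe000,%eax; and %esp,%eax` pair masks the stack pointer down to the base of the 8 KiB thread_info/stack area, so no per-CPU lookup is needed to reach preempt_count. A small sketch of that address arithmetic (the function name is hypothetical; the constant 0xffffe000 is just the 32-bit form of ~(8192 - 1)):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the thread_info trick: the kernel stack is THREAD_SIZE
 * bytes and THREAD_SIZE-aligned, so masking any address inside the
 * stack yields the thread_info base, where preempt_count lives. */

#define THREAD_SIZE 8192u   /* 8 KiB stacks, as on i386 at the time */

static uintptr_t thread_info_base(uintptr_t sp)
{
    /* equivalent to 'and $0xffffe000' on a 32-bit address */
    return sp & ~(uintptr_t)(THREAD_SIZE - 1);
}
```

Any two stack addresses inside the same 8 KiB region mask to the same base, which is why the soft-flag update above is two ALU ops plus one (usually cached) memory access.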
* [RT] Re: [PATCH] local_irq_disable removal
2005-06-12 9:28 ` Christoph Hellwig
@ 2005-06-13 4:39 ` Steven Rostedt
2005-06-16 5:35 ` Lee Revell
1 sibling, 0 replies; 86+ messages in thread
From: Steven Rostedt @ 2005-06-13 4:39 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: sdietrich, linux-kernel, Daniel Walker, Ingo Molnar
On Sun, 2005-06-12 at 10:28 +0100, Christoph Hellwig wrote:
> On Sun, Jun 12, 2005 at 08:23:50AM +0200, Ingo Molnar wrote:
>
> Then send patches when you think they're ready. Everything directly
> related to PREEMPT_RT except the high-level discussion is definitely off-topic.
> Just create your preempt-rt mailing list and get the interested parties there;
> lkml is for _general_ kernel discussion - even most subsystems that are
> in mainline have their own lists.
Actually Ingo, I think it might be a good time to create an RT list. I'm
much more interested in this topic than the average stuff that is on
LKML. The reason I'm more for setting up a mailing list is that I keep
missing stuff that is related to your patch because there's no common
subject line. For instance, I missed the first 4 messages in this thread
since they didn't have anything about RT in the subject. If there hadn't
been a fifth message, I would never have seen the previous ones.
If anything, please have RT in the Subject.
Thank you,
-- Steve
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 22:59 ` Sven-Thorsten Dietrich
@ 2005-06-13 5:22 ` Steven Rostedt
2005-06-13 6:20 ` Sven-Thorsten Dietrich
0 siblings, 1 reply; 86+ messages in thread
From: Steven Rostedt @ 2005-06-13 5:22 UTC (permalink / raw)
To: Sven-Thorsten Dietrich
Cc: linux-kernel, Daniel Walker, Ingo Molnar, Esben Nielsen
On Sat, 2005-06-11 at 15:59 -0700, Sven-Thorsten Dietrich wrote:
> On Sat, 2005-06-11 at 22:23 +0200, Esben Nielsen wrote:
> > >
> > No because it correctly leaves irqs on but not preemption on.
> >
>
> I see your worries now. See below.
>
I'm still slightly confused :-)
> > No. If you leave preemption off but irqs on, which is what is done here,
> > you get good, deterministic IRQ latencies but nothing for task latencies -
> > actually slightly (unmeasurable, I agree) worse, due to the extra step
> > you have to take from the physical interrupt until the task switch is
> > completed.
So is this just to allow for waking up the IRQ threads? I can see an
improvement on SMP, since it would allow the IRQ thread to run on
another CPU while the current CPU has local_irq_disable in effect. Or is
this just to improve SA_NODELAY?
> PI is already in there. I think you are missing some basic concepts here,
> for example that IRQs can happen ANYTIME, not just when we happen to enable
> interrupts where they have previously been disabled.
I don't understand this paragraph at all :-? Where is the PI with
local_irq_disable? So what if IRQs can happen anytime? Maybe it's
just because it's late and I've spent the last three hours catching up
on this and other threads (I still need to read the "Attempted
summary ..." thread. Wow, that's big!), but this paragraph just totally
lost me.
>
> I am going to stop responding to this thread until you back up your concerns
> with real data, or throw some code out there, that you can back up with real data.
I hope you at least respond to me ;-) but I might try to implement that
per-CPU BKL for local_irq_... just to see how it looks. But then again,
you have to check all the code that uses it to see that it's not just
protecting some local per-CPU data. Since there aren't too many reasons
to use local_irq_... in an SMP environment, when it is used it would
probably be for a reason that a global mutex wouldn't work for. Oh
well, I guess I don't need to implement that after all.
Cheers,
-- Steve
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-13 5:22 ` Steven Rostedt
@ 2005-06-13 6:20 ` Sven-Thorsten Dietrich
2005-06-13 12:28 ` Steven Rostedt
0 siblings, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-13 6:20 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Daniel Walker, Ingo Molnar, Esben Nielsen
On Mon, 2005-06-13 at 01:22 -0400, Steven Rostedt wrote:
> On Sat, 2005-06-11 at 15:59 -0700, Sven-Thorsten Dietrich wrote:
> > On Sat, 2005-06-11 at 22:23 +0200, Esben Nielsen wrote:
> > > >
> > > No because it correctly leaves irqs on but not preemption on.
> >
> > I see your worries now. See below.
> >
> I'm still slightly confused :-)
I think there is a LOT of confusion.
> > > No. If you leave preemption off but irqs on, which is what is done here,
> > > you get good, deterministic IRQ latencies but nothing for task latencies -
> > > actually slightly (unmeasurable, I agree) worse, due to the extra step
> > > you have to take from the physical interrupt until the task switch is
> > > completed.
>
> So is this just to allow for waking up the IRQ threads? I can see an
> improvement on SMP, since it would allow the IRQ thread to run on
> another CPU while the current CPU has local_irq_disable in effect. Or is
> this just to improve SA_NODELAY?
>
There are several benefits here:
1. local_irq no longer physically disables processor IRQs. This
eliminates exhaustive testing and driver examination, since any driver
you load will only disable preemption, and cannot lock the system up
with IRQs disabled for an unknown time.
2. To complicate matters further, it's not actually required to disable
preemption in ALL cases. ALL you need to do, in many cases, is to KEEP
the SPECIFIC IRQ pertaining to the driver that is disabling IRQs from
being scheduled.
3. Consequently, by exposing more kernel code surface that runs with
IRQs ENabled, you allow SA_NODELAY handlers to respond faster overall.
4. In addition, but not trivially, the remaining code that is ACTUALLY
allowed to disable IRQs is SMALL. With this model, there are around 100
sections of code left that DO disable processor interrupts. Those can
be analyzed, and every branch path measured. This allows the interrupt
response time for SA_NODELAY to be known and bounded. I'm not going to
say the h-word. Too much of a fire hazard ;)
5. By breaking up the opaque generic local_irq_disable into:
A. irq disable to keep IRQs handler from running
B. irq disable because we can't have interrupts for
another reason
All in the spirit of experimentation and research, we provided a tool
for analyzing IRQ behavior and IRQ response by experiment, which
lets us learn the subtle workings of the kernel a little better,
empirically (i.e., by crashing it or not).
There are apparently some drawbacks as well, and I am a little confused
about that myself. I think I am being dense.
Experiments
-----------
It allows us to improve the performance of SA_NODELAY interrupts (on x86)
to a point where you might try to measure the remaining code, make
a prediction, and then beat the living daylights out of the kernel
with a million benchmarks and a ping flood.
I recommend putting the machine on the unprotected Internet, with no
firewall, and letting the hackers have their way with it while running
your tests.
When it's a smoldering pile of ashes, go back to the logs and see if it
ever took longer to process an interrupt than what you predicted with
measurements. If it did not, you might think long and hard about using
the h-word to describe the SA_NODELAY IRQ response.
Daniel proposed this entire concept not to start a flame war, but to
get some feedback. We are not hell-bent on pushing this on anyone.
But it is a tool, like RT in general, that is helping us look at the
kernel top-down, test, find problem areas, and learn more about
the subtleties and the inner workings.
I realize that there are issues with this technology, and that it's not
going to work for everyone. I realize that this may not be optimal for
all architectures.
It was posted for review and discussion.
We GOT some review, alright, so it wasn't ignored at least.
> > PI is already in there. I think you are missing some basic concepts here,
> > for example that IRQs can happen ANYTIME, not just when we happen to enable
> > interrupts where they have previously been disabled.
>
> I don't understand this paragraph at all :-? Where is the PI with the
> local_irq_disable? So what if the IRQs can happen anytime? Maybe it's
> just because it's late and I've spent the last three hours catching up
> on this and other threads (I still need to read the "Attempted
> summary ..." thread. Wow that's big!), but this paragraph just totally
> lost me.
>
I might have misunderstood it in the first place. I was referring to PI
on the mutex. I think Esben may have meant PI on interrupts? Now if
Esben really does mean PI on IRQs, that's another interesting concept to
bend your mind around.
> >
> > I am going to stop responding to this thread until you back up your concerns
> > with real data, or throw some code out there, that you can back up with real data.
>
> I hope you at least respond to me ;-) but I might try to implement that
> per CPU BKL for local_irq_... just to see how it looks. But then again
> you have to check all the code that uses it to see if it's not just
> protection some local CPU data. Since there's not too many reasons to
> use local_irq_.. in an SMP environment. So when they are used, it
> probably would be for a reason that a global mutex wouldn't work for. Oh
> well, I guess I don't need to implement that after all.
>
Ok. There are a couple of bunkers around, but they may be quite full right now.
Down here at the bottom of the Marianas trench, it's dark and clammy,
the sushi is fatty, and the uplink is slow.
Does Ottawa staff up their riot squads in mid-July?
Cheers,
Sven
^ permalink raw reply [flat|nested] 86+ messages in thread
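Point 5 in Sven's list above, splitting the opaque local_irq_disable() by intent, can be sketched as two separate entry points. This is only a toy model of the A/B distinction (all names are hypothetical, not actual -RT symbols): blocking one driver's threaded handler needs nothing more than a preemption bump, while the genuinely IRQ-unsafe sections remain the sole users of a real cli.

```c
#include <assert.h>

/* Intent A: keep a specific driver's threaded IRQ handler from running.
 * Under the threaded-IRQ model this only requires stopping that handler
 * thread from being scheduled on this CPU - a counter bump, no cli. */
static int preempt_disabled;
static void driver_irq_block(void)   { preempt_disabled++; }
static void driver_irq_unblock(void) { preempt_disabled--; }

/* Intent B: code that genuinely cannot tolerate hardware interrupts
 * (the ~100 remaining sections).  Only these would execute a real cli. */
static int hard_irqs_off;
static void hard_irq_disable(void) { hard_irqs_off = 1; /* real cli here */ }
static void hard_irq_enable(void)  { hard_irqs_off = 0; /* real sti here */ }
```

The benefit claimed in the list follows directly: only the intent-B sections need exhaustive path-by-path measurement, because intent-A sections never touch the hardware interrupt flag at all.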
* Re: [PATCH] local_irq_disable removal
2005-06-12 11:15 ` Esben Nielsen
2005-06-12 11:52 ` Ingo Molnar
@ 2005-06-13 7:01 ` Sven-Thorsten Dietrich
2005-06-13 7:53 ` Esben Nielsen
1 sibling, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-13 7:01 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, Thomas Gleixner, Daniel Walker, linux-kernel
On Sun, 2005-06-12 at 13:15 +0200, Esben Nielsen wrote:
> On Sun, 12 Jun 2005, Ingo Molnar wrote:
> I am surprised that it should actually be faster, but I give in to the
> experts. I will see if I can find time to perform a test, or whether I
> should spend it on something else.
>
> That said, this long discussion has not been a complete waste of time: I
> think this thread has taught us that we do have different goals, and it
> clarifies things.
>
> I am not happy about the soft-irq thing, mostly due to naming.
> local_irq_disable() is really just preempt_disable() with some extra stuff
> to make it backward compatible.
> I still believe local_irq_disable() (also in the soft version) should be
> completely forbidden when PREEMPT_RT is set. All places using it should be
> replaced with a mutex or a ???_local_irq_disable() to mark that the code
> has been reviewed for PREEMPT_RT. With your argument above,
> ???_local_irq_disable() should really be preempt_disable(), as that is
> faster.
>
Hi Esben,
I just wondered if you are talking about the scenario where an interrupt
is executing on one processor and gets preempted. Then some code runs
on the same CPU which does local_irq_disable (now preempt_disable) to
keep that IRQ from running, but the IRQ thread has already started?
In the community kernel this could never happen, because IRQs can't be
preempted. But in RT it's possible an IRQ could be preempted, and under
some circumstances this sequence could occur.
Is that what you are talking about? If not, it might be over my head,
and I am sorry. If so, I think that scenario is covered under SMP.
Sven
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-11 20:03 ` Ingo Molnar
2005-06-11 20:51 ` Daniel Walker
@ 2005-06-13 7:08 ` Sven-Thorsten Dietrich
2005-06-13 7:44 ` Esben Nielsen
2005-06-13 7:47 ` Ingo Molnar
1 sibling, 2 replies; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-13 7:08 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Esben Nielsen, Daniel Walker, linux-kernel
On Sat, 2005-06-11 at 22:03 +0200, Ingo Molnar wrote:
> * Esben Nielsen <simlo@phys.au.dk> wrote:
>
> > > the jury is still out on the accuracy of those numbers. The test had
> > > RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> > > mostly work with interrupts disabled. The other question is how were
> > > interrupt response times measured.
> > >
> > You would accept a patch where I made this stuff optional?
>
> I'm not sure why. The soft-flag based local_irq_disable() should in fact
> be a tiny bit faster than the cli based approach, on a fair number of
> CPUs. But it should definitely not be slower in any measurable way.
>
Is there any such SMP concept as a local_preempt_disable()?
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-13 7:08 ` Sven-Thorsten Dietrich
@ 2005-06-13 7:44 ` Esben Nielsen
2005-06-13 7:53 ` Sven-Thorsten Dietrich
2005-06-13 7:47 ` Ingo Molnar
1 sibling, 1 reply; 86+ messages in thread
From: Esben Nielsen @ 2005-06-13 7:44 UTC (permalink / raw)
To: Sven-Thorsten Dietrich; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
On Mon, 13 Jun 2005, Sven-Thorsten Dietrich wrote:
> On Sat, 2005-06-11 at 22:03 +0200, Ingo Molnar wrote:
> > * Esben Nielsen <simlo@phys.au.dk> wrote:
> >
> > > > the jury is still out on the accuracy of those numbers. The test had
> > > > RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> > > > mostly work with interrupts disabled. The other question is how were
> > > > interrupt response times measured.
> > > >
> > > You would accept a patch where I made this stuff optional?
> >
> > I'm not sure why. The soft-flag based local_irq_disable() should in fact
> > be a tiny bit faster than the cli based approach, on a fair number of
> > CPUs. But it should definitely not be slower in any measurable way.
> >
>
> Is there any such SMP concept as a local_preempt_disable()?
>
You must be thinking of preempt_disable()? Except that the interface is a
little different (local_irq_save() uses flags), preempt_disable() works
exactly the same way, blocking everything but interrupts - on the
_local_ CPU. (Under PREEMPT_RT it of course also blocks threaded IRQ
handlers.)
Esben
^ permalink raw reply [flat|nested] 86+ messages in thread
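The locality question Esben answers here (and Ingo confirms later in the thread) can be modeled with per-CPU nesting counters. This is a toy sketch under hypothetical names: the real preempt_disable() takes no CPU argument because it always acts on the CPU it runs on, but making the CPU explicit shows why the operation is purely local.

```c
#include <assert.h>

/* Toy model of preempt_disable()/preempt_enable() semantics: a nesting
 * count per CPU.  Disabling preemption on one CPU has no effect on any
 * other CPU, which is what makes it a _local_ operation. */

#define NCPUS 2

static int preempt_count_cpu[NCPUS];

static void toy_preempt_disable(int cpu) { preempt_count_cpu[cpu]++; }
static void toy_preempt_enable(int cpu)  { preempt_count_cpu[cpu]--; }
static int  toy_preemptible(int cpu)     { return preempt_count_cpu[cpu] == 0; }
```

The nesting count also shows why a "local_preempt_disable()" would be redundant: the existing primitive already affects only the caller's CPU.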
* Re: [PATCH] local_irq_disable removal
2005-06-13 7:08 ` Sven-Thorsten Dietrich
2005-06-13 7:44 ` Esben Nielsen
@ 2005-06-13 7:47 ` Ingo Molnar
1 sibling, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-13 7:47 UTC (permalink / raw)
To: Sven-Thorsten Dietrich; +Cc: Esben Nielsen, Daniel Walker, linux-kernel
* Sven-Thorsten Dietrich <sdietrich@mvista.com> wrote:
> On Sat, 2005-06-11 at 22:03 +0200, Ingo Molnar wrote:
> > * Esben Nielsen <simlo@phys.au.dk> wrote:
> >
> > > > the jury is still out on the accuracy of those numbers. The test had
> > > > RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> > > > mostly work with interrupts disabled. The other question is how were
> > > > interrupt response times measured.
> > > >
> > > You would accept a patch where I made this stuff optional?
> >
> > I'm not sure why. The soft-flag based local_irq_disable() should in fact
> > be a tiny bit faster than the cli based approach, on a fair number of
> > CPUs. But it should definitely not be slower in any measurable way.
> >
>
> Is there any such SMP concept as a local_preempt_disable()?
preempt_disable() is always 'local'. (has effect only on the current
CPU)
Ingo
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-13 7:44 ` Esben Nielsen
@ 2005-06-13 7:53 ` Sven-Thorsten Dietrich
2005-06-13 7:56 ` Ingo Molnar
0 siblings, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-13 7:53 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, Daniel Walker, linux-kernel
On Mon, 2005-06-13 at 09:44 +0200, Esben Nielsen wrote:
> On Mon, 13 Jun 2005, Sven-Thorsten Dietrich wrote:
>
> > On Sat, 2005-06-11 at 22:03 +0200, Ingo Molnar wrote:
> > > * Esben Nielsen <simlo@phys.au.dk> wrote:
> > >
> > > > > the jury is still out on the accuracy of those numbers. The test had
> > > > > RT_DEADLOCK_DETECT (and other -RT debugging features) turned on, which
> > > > > mostly work with interrupts disabled. The other question is how were
> > > > > interrupt response times measured.
> > > > >
> > > > You would accept a patch where I made this stuff optional?
> > >
> > > I'm not sure why. The soft-flag based local_irq_disable() should in fact
> > > be a tiny bit faster than the cli based approach, on a fair number of
> > > CPUs. But it should definitely not be slower in any measurable way.
> > >
> >
> > Is there any such SMP concept as a local_preempt_disable()?
> >
> You must be thinking of preempt_disable()? Except that the interface is a
> little different (local_irq_save() uses flags), preempt_disable() works
> exactly the same way, blocking everything but interrupts - on the
> _local_ CPU. (Under PREEMPT_RT it of course also blocks threaded IRQ
> handlers.)
Doesn't preempt_disable() also block rescheduling on other CPUs?
We only need to prevent rescheduling on THIS CPU.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH] local_irq_disable removal
2005-06-13 7:01 ` Sven-Thorsten Dietrich
@ 2005-06-13 7:53 ` Esben Nielsen
2005-06-13 8:05 ` Sven-Thorsten Dietrich
0 siblings, 1 reply; 86+ messages in thread
From: Esben Nielsen @ 2005-06-13 7:53 UTC (permalink / raw)
To: Sven-Thorsten Dietrich
Cc: Ingo Molnar, Thomas Gleixner, Daniel Walker, linux-kernel
On Mon, 13 Jun 2005, Sven-Thorsten Dietrich wrote:
> On Sun, 2005-06-12 at 13:15 +0200, Esben Nielsen wrote:
> > On Sun, 12 Jun 2005, Ingo Molnar wrote:
> > I am surprised that it should actually be faster, but I give in to the
> > experts. I will see if I can find time to perform a test, or whether I
> > should spend it on something else.
> >
> > That said, this long discussion has not been a complete waste of time: I
> > think this thread has taught us that we do have different goals, and it
> > clarifies stuff.
> >
> > I am not happy about the soft-irq thing. Mostly due to naming.
> > local_irq_disable() is really just preempt_disable() with some extra stuff
> > to make it backward compatible.
> > I still believe local_irq_disable() (also in the soft version) should be
> > completely forbidden when PREEMPT_RT is set. All places using it should be
> > replaced with a mutex or a ???_local_irq_disable() to mark that the code
> > has been reviewed for PREEMPT_RT. With your argument above,
> > ???_local_irq_disable() should really be preempt_disable() as that is
> > faster.
> >
>
> Hi Esben,
>
> I just wondered if you are talking about the scenario where an interrupt
> is executing on one processor, and gets preempted. Then some code runs
> on the same CPU, which does local_irq_disable (now preempt_disable), to
> keep that IRQ from running, but the IRQ thread is already started?
>
> In the community kernel, this could never happen, because IRQs can't be
> preempted. But in RT, it's possible an IRQ could be preempted, and under
> some circumstance, this sequence could occur.
>
> Is that what you are talking about? If not, it might be over my head,
> and I am sorry. If so, I think that scenario is covered under SMP.
>
> Sven
>
No, Sven, it is not. I am not so worried about that scenario.
I am worried about some coder somewhere still using local_irq_disable() -
there is a lot of code out there doing that. We have not confirmed that
all of it really locks small enough regions to preserve RT preemption.
I for one am doubtful about the cmos_lock thingy. (Sorry, can't connect
to my machine at home to check where it is, right now.) A very weird setup
with a kind of homebrewed spinlock.
All these cases need to be reviewed to see whether it is valid to use a
global lock type like local_irq_disable(), or whether a local mutex must be
used. The former is only "allowed" if the time spent within the locked
region is deterministically on the order of the time for scheduling.
I wanted to add an extra name to the namespace stating "this usage of
local_irq_disable() has been reviewed wrt. PREEMPT_RT".
Esben
* Re: [PATCH] local_irq_disable removal
2005-06-13 7:53 ` Sven-Thorsten Dietrich
@ 2005-06-13 7:56 ` Ingo Molnar
0 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-13 7:56 UTC (permalink / raw)
To: Sven-Thorsten Dietrich; +Cc: Esben Nielsen, Daniel Walker, linux-kernel
* Sven-Thorsten Dietrich <sdietrich@mvista.com> wrote:
> > > Is there any such SMP concept as a local_preempt_disable() ?
> > >
> > You must be thinking of preempt_disable()? Except that the interface is a
> > little different - local_irq_save() uses flags - preempt_disable() works
> > exactly the same way, blocking everything but interrupts - on the
> > _local_ CPU. (Under PREEMPT_RT it of course also blocks threaded IRQ
> > handlers.)
>
> Doesn't preempt_disable() also block rescheduling on other CPUs?
>
> We only need to prevent rescheduling on THIS CPU.
it doesn't. It's 2-4 instructions, similar to the assembly I posted
before, changing current_thread_info->preempt_count, nothing else.
Ingo
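In user-space terms, the counter Ingo describes can be modeled like this (a simplified sketch with hypothetical names; in the kernel the count lives in current_thread_info()->preempt_count and the real code also needs compiler barriers):

```c
#include <assert.h>

/*
 * Toy model of per-CPU preemption counting: preempt_disable() only
 * increments the counter of the CPU it runs on, so other CPUs keep
 * rescheduling freely. All names here are hypothetical.
 */
#define MODEL_NR_CPUS 2

int model_preempt_count[MODEL_NR_CPUS];

void model_preempt_disable(int cpu)
{
    model_preempt_count[cpu]++;   /* nests: every disable needs an enable */
}

void model_preempt_enable(int cpu)
{
    assert(model_preempt_count[cpu] > 0);
    model_preempt_count[cpu]--;
}

/* A CPU may be rescheduled only while its own count is zero. */
int model_preemptible(int cpu)
{
    return model_preempt_count[cpu] == 0;
}
```

Disabling on CPU 0 leaves CPU 1 fully preemptible, which is the point being made: nothing in this path acts as a cross-CPU barrier.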
* Re: [PATCH] local_irq_disable removal
2005-06-13 7:53 ` Esben Nielsen
@ 2005-06-13 8:05 ` Sven-Thorsten Dietrich
2005-06-13 8:54 ` Esben Nielsen
0 siblings, 1 reply; 86+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-06-13 8:05 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, Thomas Gleixner, Daniel Walker, linux-kernel
On Mon, 2005-06-13 at 09:53 +0200, Esben Nielsen wrote:
> On Mon, 13 Jun 2005, Sven-Thorsten Dietrich wrote:
> > >
> >
> > Hi Esben,
> >
> > I just wondered if you are talking about the scenario where an interrupt
> > is executing on one processor, and gets preempted. Then some code runs
> > on the same CPU, which does local_irq_disable (now preempt_disable), to
> > keep that IRQ from running, but the IRQ thread is already started?
> >
> > In the community kernel, this could never happen, because IRQs can't be
> > preempted. But in RT, it's possible an IRQ could be preempted, and under
> > some circumstance, this sequence could occur.
> >
> > Is that what you are talking about? If not, it might be over my head,
> > and I am sorry. If so, I think that scenario is covered under SMP.
> >
> > Sven
> >
> No, Sven, it is not. I am not so worried about that scenario.
> I am worried about some coder somewhere still using local_irq_disable() -
> there is a lot of code out there doing that. We have not confirmed that
> all of it really locks small enough regions to preserve RT preemption.
> I for one am doubtful about the cmos_lock thingy. (Sorry, can't connect
> to my machine at home to check where it is, right now.) A very weird setup
> with a kind of homebrewed spinlock.
> All these cases need to be reviewed to see whether it is valid to use a
> global lock type like local_irq_disable(), or whether a local mutex must be
> used. The former is only "allowed" if the time spent within the locked
> region is deterministically on the order of the time for scheduling.
> I wanted to add an extra name to the namespace stating "this usage of
> local_irq_disable() has been reviewed wrt. PREEMPT_RT".
I am sure there are some issues like you describe out there.
I suppose we could examine each local_irq_disable, but I wonder if there
is not a better way.
I wrote earlier that we could just keep specific IRQ threads from
running, rather than disabling preemption overall.
This is somewhat orthogonal, but would serve as a filter to break things
down into local_irq_disable to keep a specific IRQ from running, and
local_irq_disable to suppress all IRQ activity.
If we're going to walk through all these local_irq_disables, we might as
well check off that item as well.
What do you think?
Sven
* Re: [PATCH] local_irq_disable removal
2005-06-13 8:05 ` Sven-Thorsten Dietrich
@ 2005-06-13 8:54 ` Esben Nielsen
2005-06-13 9:13 ` Ingo Molnar
0 siblings, 1 reply; 86+ messages in thread
From: Esben Nielsen @ 2005-06-13 8:54 UTC (permalink / raw)
To: Sven-Thorsten Dietrich
Cc: Ingo Molnar, Thomas Gleixner, Daniel Walker, linux-kernel
On Mon, 13 Jun 2005, Sven-Thorsten Dietrich wrote:
> On Mon, 2005-06-13 at 09:53 +0200, Esben Nielsen wrote:
> > On Mon, 13 Jun 2005, Sven-Thorsten Dietrich wrote:
> > > >
> > >
> > > Hi Esben,
> > >
> > > I just wondered if you are talking about the scenario where an interrupt
> > > is executing on one processor, and gets preempted. Then some code runs
> > > on the same CPU, which does local_irq_disable (now preempt_disable), to
> > > keep that IRQ from running, but the IRQ thread is already started?
> > >
> > > In the community kernel, this could never happen, because IRQs can't be
> > > preempted. But in RT, it's possible an IRQ could be preempted, and under
> > > some circumstance, this sequence could occur.
> > >
> > > Is that what you are talking about? If not, it might be over my head,
> > > and I am sorry. If so, I think that scenario is covered under SMP.
> > >
> > > Sven
> > >
> > No, Sven, it is not. I am not so worried about that scenario.
> > I am worried about some coder somewhere still using local_irq_disable() -
> > there is a lot of code out there doing that. We have not confirmed that
> > all of it really locks small enough regions to preserve RT preemption.
> > I for one am doubtful about the cmos_lock thingy. (Sorry, can't connect
> > to my machine at home to check where it is, right now.) A very weird setup
> > with a kind of homebrewed spinlock.
> > All these cases need to be reviewed to see whether it is valid to use a
> > global lock type like local_irq_disable(), or whether a local mutex must
> > be used. The former is only "allowed" if the time spent within the locked
> > region is deterministically on the order of the time for scheduling.
> > I wanted to add an extra name to the namespace stating "this usage of
> > local_irq_disable() has been reviewed wrt. PREEMPT_RT".
>
> I am sure there are some issues like you describe out there.
>
> I suppose we could examine each local_irq_disable, but I wonder if there
> is not a better way.
>
> I wrote earlier that we could just keep specific IRQ threads from
> running, rather than disabling preemption overall.
>
> This is somewhat orthogonal, but would serve as a filter to break things
> down into local_irq_disable to keep a specific IRQ from running, and
> local_irq_disable to suppress all IRQ activity.
>
> If we're going to walk through all these local_irq_disables, we might as
> well check off that item as well.
>
> What do you think?
>
The problem is that local_irq_disable() doesn't carry around a notion of
locality: it doesn't say which code and which IRQ handler is the problem.
That is, each instance has to be considered :-( :-(
We could make a "global", per-CPU replacement lock (it would be a bit like
the BKL hack), but as soon as an RT thread hits one of those, RT is totally
gone. Each IRQ handler needs to be reviewed as to whether or not it must be
executed under this lock. By instead doing the hard work of
breaking it into smaller locks (with PI), or verifying that the regions are
so small that disabling preemption is acceptable, the RT property can be
preserved.
I'll try to make an option in the config: "Disallow
local_irq_disable()/preempt_disable()". Then one can switch this on to
prevent local_irq_disable() calls not reviewed for PREEMPT_RT from sneaking
in from !PREEMPT_RT code.
I will also try to make a lock which introduces the "notion
of locality" (i.e. has the same semantics as a normal
spin_lock/mutex) into the code, but in !PREEMPT_RT will turn out to be
just local_irq_disable() or preempt_disable(). A lot of the
local_irq_disable() calls should be replaced with that.
Esben
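A user-space sketch of what such a dual-personality lock could look like (the names and the MODEL_PREEMPT_RT switch are invented for illustration; this is not code from the patch):

```c
#include <assert.h>

/*
 * Model of the "locality lock" idea: a lock with ordinary mutex-like
 * semantics that, when PREEMPT_RT is off, degenerates into nothing
 * more than preempt_disable()/preempt_enable(). An integer stands in
 * for the preempt count; the PREEMPT_RT variant would be a real
 * sleeping lock with priority inheritance.
 */
struct locality_lock {
    int held;             /* would be a PI mutex under PREEMPT_RT      */
};

int ll_preempt_count;     /* stand-in for the local preempt counter    */

#ifdef MODEL_PREEMPT_RT
void locality_lock_acquire(struct locality_lock *l) { l->held = 1; }
void locality_lock_release(struct locality_lock *l) { l->held = 0; }
#else
/* !PREEMPT_RT: the lock itself vanishes; only preemption is held off. */
void locality_lock_acquire(struct locality_lock *l)
{
    (void)l;
    ll_preempt_count++;
}
void locality_lock_release(struct locality_lock *l)
{
    (void)l;
    ll_preempt_count--;
}
#endif
```

Call sites then carry the locality annotation Esben wants, and a PREEMPT_RT build can later swap in a real PI mutex without touching them.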
> Sven
>
* Re: [PATCH] local_irq_disable removal
2005-06-13 8:54 ` Esben Nielsen
@ 2005-06-13 9:13 ` Ingo Molnar
0 siblings, 0 replies; 86+ messages in thread
From: Ingo Molnar @ 2005-06-13 9:13 UTC (permalink / raw)
To: Esben Nielsen
Cc: Sven-Thorsten Dietrich, Thomas Gleixner, Daniel Walker,
linux-kernel
* Esben Nielsen <simlo@phys.au.dk> wrote:
> I will also try to make a lock which introduces the "notion of
> locality" (i.e. has the same semantics as a normal spin_lock/mutex)
> into the code, but in !PREEMPT_RT will turn out to be just
> local_irq_disable() or preempt_disable(). A lot of the
> local_irq_disable() calls should be replaced with that.
yes, this would be the right approach. Note that we already do something
like that in the per_cpu_locked API, we hide a spinlock there, which
gets turned off for !PREEMPT_RT. For local_irq_disable() replacements
we'd need a separate API.
(one thing to watch out for are smp_call_function() handlers. These
still execute in hardirq context even on PREEMPT_RT. E.g. in buffer.c
you'll see such code.)
Ingo
* Re: [PATCH] local_irq_disable removal
2005-06-12 1:05 ` Gene Heskett
@ 2005-06-13 12:03 ` Paulo Marques
2005-06-13 12:19 ` Esben Nielsen
0 siblings, 1 reply; 86+ messages in thread
From: Paulo Marques @ 2005-06-13 12:03 UTC (permalink / raw)
To: Gene Heskett; +Cc: linux-kernel
Gene Heskett wrote:
> [...]
> Lets add the operation of 4 or more stepper motors in real time for
> smaller milling machines. There, the constraints are more related to
> maintaining a steady flow of step/direction data at high enough
> speeds to make a stepper, with 8 microsteps per step, and 240 steps
> per revolution, run smoothly at speeds up to say 20 kilohertz, or 50
> microseconds per step, maintaining that 50 microseconds plus or minus
> not more than 5 microseconds else the motors will start sounding
> ragged and stuttering.
This is the kind of problem that is screaming "give me dedicated
hardware!". Why would one spend $500+ on a PC to do the work of a $2
microcontroller (and possibly throw an FPGA into the mix)? Not to
mention that the microcontroller/FPGA would maintain 50us +/- 0us
instead of the 50 +/- 5 you've mentioned.
The same goes for the "hand under the saw". A simple
transistor/triac/whatever and a logic circuit would stop the saw; we
don't need any real time OS for that (and I would certainly trust the
logic circuit more than any real time OS :).
IMHO, the kind of problems where real time OS's are useful are the kind
that require computational power while having real-time constraints.
Like the sound effects processor for the guitar that Lee Revell already
mentioned or controlling an airplane in flight that has to measure a lot
of sensors, do some state space calculations with some not-so-small
matrices (together with Kalman filtering, etc.) and move the actuators.
--
Paulo Marques - www.grupopie.com
An expert is a person who has made all the mistakes that can be
made in a very narrow field.
Niels Bohr (1885 - 1962)
* Re: [PATCH] local_irq_disable removal
2005-06-13 12:03 ` Paulo Marques
@ 2005-06-13 12:19 ` Esben Nielsen
0 siblings, 0 replies; 86+ messages in thread
From: Esben Nielsen @ 2005-06-13 12:19 UTC (permalink / raw)
To: Paulo Marques; +Cc: Gene Heskett, linux-kernel
On Mon, 13 Jun 2005, Paulo Marques wrote:
> Gene Heskett wrote:
> > [...]
> > Lets add the operation of 4 or more stepper motors in real time for
> > smaller milling machines. There, the constraints are more related to
> > maintaining a steady flow of step/direction data at high enough
> > speeds to make a stepper, with 8 microsteps per step, and 240 steps
> > per revolution, run smoothly at speeds up to say 20 kilohertz, or 50
> > microseconds per step, maintaining that 50 microseconds plus or minus
> > not more than 5 microseconds else the motors will start sounding
> > ragged and stuttering.
>
> This is the kind of problem that is screaming "give me dedicated
> hardware!". Why would one spend $500+ on a PC to do the work of a $2
> microcontroller (and possibly throw an FPGA into the mix)? Not to
> mention that the microcontroller/FPGA would maintain 50us +/- 0us
> instead of the 50 +/- 5 you've mentioned.
>
Linux does not always run on a $500+ PC. People also use Linux on
_embedded_ devices costing only a small fraction of a $500+ PC. But I do
agree it is kind of silly to use the OS for this example.
Assuming you do not put any "highlevel" calculations in an IRQ handler,
the only useful place to have fast IRQ handlers, as far as I can see,
without having fast preemption, is with cheap hardware without a lot of
buffering.
I can come up with CAN controllers, which have only very little
buffer space (because they are cheap, as they are sold in large volumes).
Worst case you will have to empty it every 100-200 us or so to avoid
having an overrun and losing packets. The interpretation of the packets
should of course be deferred to a task. The tolerated latency of that task
of course depends on what you want to use the data for....
Esben
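The arithmetic behind that 100-200 us window is easy to sketch. The frame size and buffer depth below are assumptions for illustration (a classic CAN 2.0A data frame is roughly 110-130 bits on the wire including stuffing, and a cheap controller might buffer only about two frames), not figures from the mail:

```c
/* Worst-case time one frame occupies the bus. */
double can_frame_time_us(int bits_on_wire, double bitrate_bps)
{
    return (double)bits_on_wire / bitrate_bps * 1e6;
}

/* The driver must drain the RX buffer before this many back-to-back
 * frames can arrive, or the controller overruns and drops packets. */
double can_service_deadline_us(int frames_buffered, int bits_on_wire,
                               double bitrate_bps)
{
    return frames_buffered * can_frame_time_us(bits_on_wire, bitrate_bps);
}
```

At 1 Mbit/s with 111-bit frames and a two-frame buffer this comes out to roughly 220 us, the same ballpark as the 100-200 us Esben quotes for tighter configurations.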
* Re: [PATCH] local_irq_disable removal
2005-06-13 6:20 ` Sven-Thorsten Dietrich
@ 2005-06-13 12:28 ` Steven Rostedt
0 siblings, 0 replies; 86+ messages in thread
From: Steven Rostedt @ 2005-06-13 12:28 UTC (permalink / raw)
To: Sven-Thorsten Dietrich
Cc: linux-kernel, Daniel Walker, Ingo Molnar, Esben Nielsen
On Sun, 2005-06-12 at 23:20 -0700, Sven-Thorsten Dietrich wrote:
> I think there is a LOT of confusion.
Thanks, I'm not so confused anymore thanks to you :-)
> There are several benefits here:
>
> 1. local_irq no longer physically disables processor IRQs. This
> eliminates exhaustive testing and driver examination, since any driver
> you load, will only disable preemption, and cannot lock the system up
> with irqs disabled for an unknown time.
With all IRQs (except timer) as a thread, on a UP this will lock up the
system for a long time. I'm not saying this is good or bad, I'm just
saying that this only helps the timer interrupt. If the purpose is to
help your own SA_NODELAY then this is all good.
One thing I'm worried about is that if some bad code goes into an infinite
loop, you won't know why the system just locked up for a long time, since
the NMI won't go off. Hey, here's an idea: have the NMI also check for
preempt off. Since local_irq_disable and preempt_disable are both
macros or functions, we can easily add an NMI counter to them. I'll
write up a patch to do this. (If I can keep up with Ingo - last night it
was at -17, this morning it's at -24!)
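Steve's NMI-counter idea can be sketched in miniature like this (hypothetical names; in the kernel the periodic check would live in the NMI watchdog handler and the counters in the preempt_disable()/local_irq_disable macros):

```c
#include <assert.h>

/*
 * Toy model: the preempt_disable() variant bumps a flag, and a
 * periodic watchdog - the NMI in the kernel - counts how many ticks
 * the section has stayed disabled, flagging it past a threshold.
 */
int wd_preempt_off;        /* nonzero while "preemption" is disabled */
int wd_off_ticks;          /* ticks seen with preemption disabled    */

void wd_preempt_disable(void)
{
    wd_preempt_off++;
}

void wd_preempt_enable(void)
{
    if (--wd_preempt_off == 0)
        wd_off_ticks = 0;  /* section ended in time: reset the count */
}

/* Called from each watchdog tick; returns 1 if the current
 * preempt-off section has lasted longer than 'threshold' ticks. */
int wd_nmi_tick(int threshold)
{
    if (wd_preempt_off == 0)
        return 0;
    return ++wd_off_ticks > threshold;
}
```

When wd_nmi_tick() fires, a kernel version would presumably dump a stack trace of the CPU stuck with preemption off, which is exactly the diagnostic the hard-disable NMI path used to give.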
>
> 2. To complicate matters further, its not actually required to disable
> preemption in ALL cases. ALL you need to do, in many cases is to KEEP
> the SPECIFIC IRQ pertaining to the driver that is disabling IRQs, from
> being scheduled.
I consider it a bug when a driver disables all irqs when it should only
disable its own. Actually, with an SMP system, the only safe thing to
do is turn off interrupts at the device (or use spin_locks), and this would
be the proper fix for both RT and vanilla kernels. If the device's
interrupt handler has per-CPU data that it modifies, then this would
need to be addressed even in the current implementation of RT, but I
don't know of any driver that really does this.
>
> 3. consequentially, by exposing more kernel code surface to run with
> IRQs ENabled, you allow the SA_NODELAY to respond faster overall.
No qualms here.
>
> 4. In addition, but not trivially, the remaining code that is ACTUALLY
> allowed to disable IRQs is SMALL. With this model, there are around 100
> sections of code left, that DO disable processor interrupts. Those can
> be analyzed, and every branch path measured. This allows the interrupt
> response time for SA_NODELAY to be known and bounded. I'm not going to
> say the h-word. Too much of a fire hazard ;)
>
> 5. By breaking up the opaque generic local_irq_disable into:
>
> A. irq disable to keep IRQs handler from running
> B. irq disable because we can't have interrupts for
> another reason
OK, so I'm not as confused as I thought. This all makes sense.
>
> All in the spirit of experimentation and research, we provided a tool
> for analysis of the IRQ behavior, and IRQ response, by experiment, which
> lets us know the subtle workings of the kernel a little better,
> empirically. (i.e. by crashing it or not)
>
> There are apparently some drawbacks as well, and I am a little confused
> about that myself. I think I am being dense.
What drawbacks?
>
> Experiments
> -----------
> It allows us to improve performance of SA_NODELAY interrupts (on X86)
> towards a point, where you might try and measure the remaining code, and
> make a prediction, and then beat the living daylights out of the kernel
> with a million benchmarks and ping flood.
>
> I recommend putting the machine on the unprotected Internet, with no
> firewall, and letting the hackers have their way with it while running your
> tests.
No thank you, I don't need the FBI breaking down my door because my
computer is being used to hack into the pentagon ;-)
>
> When it's a smoldering pile of ashes, go back to the logs, and see if it
> ever took longer to process an interrupt than what you predicted with
> measurements. If it did not, you might think hard and long about using
> the h-word to describe the SA_NODELAY IRQ response.
>
>
> Daniel proposed this entire concept, not to start a flame war, but to
> get some feedback. We are not hell bent on pushing this on anyone.
I'm open to new ideas, and I wasn't flaming, and this thread really
didn't get that hot (unlike the flaming hell of the RT acceptance
thread). I think Esben may have gotten a little excited, but nothing
that was enough to boil an egg with.
>
> But it is a tool, like RT in general, that is helping us look at the
> kernel top-down, and test, and find problem areas and learn more about
> the subtleties and the inner workings.
Hey, lets try it out, if it works then keep it, if it's too much of a
burden then dump it!
>
> I realize that there are issues with this technology, and that its not
> going to work for everyone. I realize that this may not be optimal for
> all architectures.
>
> It was posted for review and discussion..
>
> We GOT some review, alright, so it wasn't ignored at least.
And lots of discussion :-)
>
> > > PI is already in there. I think you are missing some basic concepts here,
> > > for example that IRQs can happen ANYTIME, not just when we happen to enable
> > > interrupts where they have previously been disabled.
> >
> > I don't understand this paragraph at all :-? Where is the PI with the
> > local_irq_disable? So what if the IRQs can happen anytime? Maybe it's
> > just because it's late and I've spent the last three hours catching up
> > on this and other threads (I still need to read the "Attempted
> > summary ..." thread. Wow that's big!), but this paragraph just totally
> > lost me.
> >
>
> I might have mis-understood it in the first place. I was referring to PI
> on the Mutex. I think Esben may have meant PI on Interrupts? Now if
> Esben really does mean PI on IRQs, that's another interesting concept to
> bend your mind around.
This is where PI on wait queues comes in.
>
> > >
> > > I am going to stop responding to this thread until you back up your concerns
> > > with real data, or throw some code out there, that you can back up with real data.
> >
> > I hope you at least respond to me ;-) but I might try to implement that
> > per CPU BKL for local_irq_... just to see how it looks. But then again
> > you have to check all the code that uses it to see if it's not just
> > protection some local CPU data. Since there's not too many reasons to
> > use local_irq_.. in an SMP environment. So when they are used, it
> > probably would be for a reason that a global mutex wouldn't work for. Oh
> > well, I guess I don't need to implement that after all.
> >
>
> Ok. There's a couple of bunkers around, but they may be quite full right now.
> Down here at the bottom of the Marianas trench, it's dark and clammy,
> the sushi is fatty, and the uplink is slow.
>
> Does Ottawa staff up their riot squads in mid-July?
;}
-- Steve
* Re: [PATCH] local_irq_disable removal
2005-06-12 9:28 ` Christoph Hellwig
2005-06-13 4:39 ` [RT] " Steven Rostedt
@ 2005-06-16 5:35 ` Lee Revell
1 sibling, 0 replies; 86+ messages in thread
From: Lee Revell @ 2005-06-16 5:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Ingo Molnar, Daniel Walker, linux-kernel, sdietrich
On Sun, 2005-06-12 at 10:28 +0100, Christoph Hellwig wrote:
> On Sun, Jun 12, 2005 at 08:23:50AM +0200, Ingo Molnar wrote:
> >
> > * Christoph Hellwig <hch@infradead.org> wrote:
> >
> > > folks, can you please take this RT stuff of lkml? And with that I
> > > don't mean the highlevel discussions what makes sense, but specific
> > > patches that aren't related to anything near mainline. [...]
> >
> > this is a misconception - there's been a few dozen patches steadily
> > trickling into mainline that were all started in the PREEMPT_RT
> > patchset, so this "RT stuff", both the generic arguments and the details
> > are very much relevant. I wouldn't be doing it if it wasn't relevant to
> > the mainline kernel. The discussions are well concentrated into 2-3
> > subjects so you can plonk those threads if you are not interested.
>
> Then send patches when you think they're ready. Everything directly
> related to PREEMPT_RT except the high-level discussion is definitely
> offtopic. Just create your preempt-rt mailing list and get interested
> parties there; lkml is for _general_ kernel discussion - even most
> subsystems that are in mainline have their own lists.
I agree, this has to be annoying for people who have no interest in
PREEMPT_RT, and future PREEMPT_RT development is going to have zero
effect on people who don't enable it.
Any volunteers to set a list up?
Lee
Thread overview: 86+ messages
2005-06-08 7:08 [PATCH] local_irq_disable removal Daniel Walker
2005-06-08 11:21 ` Ingo Molnar
2005-06-08 20:33 ` Daniel Walker
2005-06-09 11:56 ` Ingo Molnar
2005-06-10 23:37 ` Esben Nielsen
2005-06-11 0:20 ` Daniel Walker
2005-06-11 13:13 ` Esben Nielsen
2005-06-11 13:46 ` Ingo Molnar
2005-06-11 14:32 ` Esben Nielsen
2005-06-11 16:36 ` Daniel Walker
2005-06-11 17:26 ` Thomas Gleixner
2005-06-11 18:40 ` Sven-Thorsten Dietrich
2005-06-12 0:07 ` Thomas Gleixner
2005-06-12 0:15 ` Sven-Thorsten Dietrich
2005-06-12 0:22 ` Thomas Gleixner
2005-06-12 0:24 ` Sven-Thorsten Dietrich
2005-06-11 19:16 ` Ingo Molnar
2005-06-11 19:34 ` Esben Nielsen
2005-06-11 19:44 ` Sven-Thorsten Dietrich
2005-06-11 19:53 ` Daniel Walker
2005-06-11 20:23 ` Esben Nielsen
2005-06-11 22:59 ` Sven-Thorsten Dietrich
2005-06-13 5:22 ` Steven Rostedt
2005-06-13 6:20 ` Sven-Thorsten Dietrich
2005-06-13 12:28 ` Steven Rostedt
2005-06-11 20:03 ` Ingo Molnar
2005-06-11 20:51 ` Daniel Walker
2005-06-11 23:44 ` Thomas Gleixner
2005-06-11 23:50 ` Daniel Walker
2005-06-12 0:01 ` Thomas Gleixner
2005-06-12 0:09 ` Sven-Thorsten Dietrich
2005-06-12 0:28 ` Thomas Gleixner
2005-06-12 1:05 ` Gene Heskett
2005-06-13 12:03 ` Paulo Marques
2005-06-13 12:19 ` Esben Nielsen
2005-06-12 4:50 ` cutaway
2005-06-12 6:57 ` Ingo Molnar
2005-06-12 11:15 ` Esben Nielsen
2005-06-12 11:52 ` Ingo Molnar
2005-06-13 7:01 ` Sven-Thorsten Dietrich
2005-06-13 7:53 ` Esben Nielsen
2005-06-13 8:05 ` Sven-Thorsten Dietrich
2005-06-13 8:54 ` Esben Nielsen
2005-06-13 9:13 ` Ingo Molnar
2005-06-12 15:28 ` Daniel Walker
2005-06-12 4:31 ` Karim Yaghmour
2005-06-12 4:32 ` Daniel Walker
2005-06-12 4:56 ` Karim Yaghmour
2005-06-12 4:55 ` Daniel Walker
2005-06-12 5:16 ` Karim Yaghmour
2005-06-12 5:14 ` Daniel Walker
2005-06-12 5:27 ` Karim Yaghmour
2005-06-12 15:27 ` Zwane Mwaikambo
2005-06-12 15:46 ` Daniel Walker
2005-06-12 19:02 ` Ingo Molnar
2005-06-12 17:02 ` Andi Kleen
2005-06-13 7:08 ` Sven-Thorsten Dietrich
2005-06-13 7:44 ` Esben Nielsen
2005-06-13 7:53 ` Sven-Thorsten Dietrich
2005-06-13 7:56 ` Ingo Molnar
2005-06-13 7:47 ` Ingo Molnar
2005-06-11 16:41 ` Sven-Thorsten Dietrich
2005-06-11 17:16 ` Esben Nielsen
2005-06-11 19:29 ` Sven-Thorsten Dietrich
2005-06-11 20:02 ` Sven-Thorsten Dietrich
2005-06-11 16:19 ` Daniel Walker
2005-06-11 13:51 ` Ingo Molnar
2005-06-11 15:00 ` Mika Penttilä
2005-06-11 16:45 ` Sven-Thorsten Dietrich
2005-06-11 16:53 ` Mika Penttilä
2005-06-11 17:13 ` Daniel Walker
2005-06-11 17:22 ` Mika Penttilä
2005-06-11 17:25 ` Daniel Walker
2005-06-11 17:29 ` Mika Penttilä
2005-06-11 17:30 ` Daniel Walker
2005-06-11 17:55 ` Mika Penttilä
2005-06-11 16:28 ` Daniel Walker
2005-06-11 16:46 ` Esben Nielsen
2005-06-11 16:09 ` Daniel Walker
2005-06-11 16:31 ` Esben Nielsen
2005-06-11 16:51 ` Christoph Hellwig
2005-06-11 22:44 ` Ed Tomlinson
2005-06-12 6:23 ` Ingo Molnar
2005-06-12 9:28 ` Christoph Hellwig
2005-06-13 4:39 ` [RT] " Steven Rostedt
2005-06-16 5:35 ` Lee Revell