public inbox for linux-kernel@vger.kernel.org
* [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
@ 2005-03-19 19:16 Ingo Molnar
  2005-03-20  0:24 ` Lee Revell
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Ingo Molnar @ 2005-03-19 19:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Paul E. McKenney


i have released the -V0.7.41-00 Real-Time Preemption patch (merged to
2.6.12-rc1), which can be downloaded from the usual place:

  http://redhat.com/~mingo/realtime-preempt/

the biggest change in this patch is the merge of Paul E. McKenney's
preemptable RCU code. The new RCU code is active on PREEMPT_RT. While it
is still quite experimental at this stage, it allowed the removal of
locking cruft (mainly in the networking code), so it could solve some of
the longstanding netfilter/networking deadlocks/crashes reported by a
number of people. Be careful nevertheless.

there are a couple of minor changes relative to Paul's latest
preemptable-RCU code drop:

 - made the two variants two #ifdef blocks - this is sufficient for now
   and we'll see what the best way is in the longer run.

 - moved rcu_check_callbacks() from the timer IRQ to ksoftirqd. (the
   timer IRQ still runs in hardirq context on PREEMPT_RT.)

 - changed the irq-flags method to a preempt_disable()-based method, and
   moved the lock taking outside of the critical sections. (due to locks
   potentially sleeping on PREEMPT_RT).

to create a -V0.7.41-00 tree from scratch, the patching order is:

  http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.tar.bz2
  http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.12-rc1.bz2
  http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.12-rc1-V0.7.41-00

	Ingo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-19 19:16 Ingo Molnar
@ 2005-03-20  0:24 ` Lee Revell
  2005-03-21 15:42   ` K.R. Foley
  2005-03-20  1:33 ` Lee Revell
  2005-03-20 17:45 ` Paul E. McKenney
  2 siblings, 1 reply; 11+ messages in thread
From: Lee Revell @ 2005-03-20  0:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Paul E. McKenney

On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
> i have released the -V0.7.41-00 Real-Time Preemption patch (merged to
> 2.6.12-rc1), which can be downloaded from the usual place:
> 
>   http://redhat.com/~mingo/realtime-preempt/
> 

3ms latency in the NFS client code.  Workload was a kernel compile over
NFS.

preemption latency trace v1.1.4 on 2.6.12-rc1-RT-V0.7.41-00
--------------------------------------------------------------------
 latency: 3178 µs, #4095/14224, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
    -----------------
    | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
(T1/#0)            <...> 32105 0 3 00000004 00000000 [0011939614227867] 0.000ms (+4137027.445ms): <6500646c> (<61000000>)
(T1/#2)            <...> 32105 0 3 00000004 00000002 [0011939614228097] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x9a/0xd0 <c013150a> (try_to_wake_up+0x94/0x140 <c0110474>)
(T1/#3)            <...> 32105 0 3 00000003 00000003 [0011939614228436] 0.000ms (+0.000ms): preempt_schedule+0x11/0x80 <c02b57c1> (try_to_wake_up+0x94/0x140 <c0110474>)
(T3/#4)    <...>-32105 0dn.3    0µs : try_to_wake_up+0x11e/0x140 <c01104fe> <<...>-2> (69 76): 
(T1/#5)            <...> 32105 0 3 00000002 00000005 [0011939614228942] 0.000ms (+0.000ms): preempt_schedule+0x11/0x80 <c02b57c1> (try_to_wake_up+0xf8/0x140 <c01104d8>)
(T1/#6)            <...> 32105 0 3 00000002 00000006 [0011939614229130] 0.000ms (+0.000ms): wake_up_process+0x35/0x40 <c0110555> (do_softirq+0x3f/0x50 <c011b05f>)
(T6/#7)    <...>-32105 0dn.1    1µs < (1)
(T1/#8)            <...> 32105 0 2 00000001 00000008 [0011939614229782] 0.001ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
(T1/#9)            <...> 32105 0 2 00000001 00000009 [0011939614229985] 0.001ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
(T1/#10)            <...> 32105 0 2 00000001 0000000a [0011939614230480] 0.001ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
(T1/#11)            <...> 32105 0 2 00000001 0000000b [0011939614230634] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
(T1/#12)            <...> 32105 0 2 00000001 0000000c [0011939614230889] 0.002ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
(T1/#13)            <...> 32105 0 2 00000001 0000000d [0011939614231034] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
(T1/#14)            <...> 32105 0 2 00000001 0000000e [0011939614231302] 0.002ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
(T1/#15)            <...> 32105 0 2 00000001 0000000f [0011939614231419] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)

(last two lines just repeat)

This is probably not a regression; I had never tested this with NFS
before.

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-19 19:16 Ingo Molnar
  2005-03-20  0:24 ` Lee Revell
@ 2005-03-20  1:33 ` Lee Revell
  2005-03-20  1:50   ` K.R. Foley
  2005-03-20 17:45 ` Paul E. McKenney
  2 siblings, 1 reply; 11+ messages in thread
From: Lee Revell @ 2005-03-20  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Paul E. McKenney

On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
> the biggest change in this patch is the merge of Paul E. McKenney's
> preemptable RCU code. The new RCU code is active on PREEMPT_RT. While it
> is still quite experimental at this stage, it allowed the removal of
> locking cruft (mainly in the networking code), so it could solve some of
> the longstanding netfilter/networking deadlocks/crashes reported by a
> number of people. Be careful nevertheless.

With PREEMPT_RT my machine deadlocked within 20 minutes of boot.
"apt-get dist-upgrade" seemed to trigger the crash.  I did not see any
Oops unfortunately.

Lee





* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-20  1:33 ` Lee Revell
@ 2005-03-20  1:50   ` K.R. Foley
  2005-03-20  4:32     ` Lee Revell
  0 siblings, 1 reply; 11+ messages in thread
From: K.R. Foley @ 2005-03-20  1:50 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, linux-kernel, Paul E. McKenney

Lee Revell wrote:
> On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
> 
>>the biggest change in this patch is the merge of Paul E. McKenney's
>>preemptable RCU code. The new RCU code is active on PREEMPT_RT. While it
>>is still quite experimental at this stage, it allowed the removal of
>>locking cruft (mainly in the networking code), so it could solve some of
>>the longstanding netfilter/networking deadlocks/crashes reported by a
>>number of people. Be careful nevertheless.
> 
> 
> With PREEMPT_RT my machine deadlocked within 20 minutes of boot.
> "apt-get dist-upgrade" seemed to trigger the crash.  I did not see any
> Oops unfortunately.
> 
> Lee
> 

Lee,

Just curious. Is this with UP or SMP? I currently have my UP box running
PREEMPT_RT, with no problems thus far. However, my SMP box dies while
booting (with an oops). I am working on trying to get set up to capture
the oops, although it might be tomorrow before I get that done.

-- 
    kr


* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-20  1:50   ` K.R. Foley
@ 2005-03-20  4:32     ` Lee Revell
  2005-03-20 22:40       ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Lee Revell @ 2005-03-20  4:32 UTC (permalink / raw)
  To: K.R. Foley; +Cc: Ingo Molnar, linux-kernel, Paul E. McKenney

On Sat, 2005-03-19 at 19:50 -0600, K.R. Foley wrote:
> Lee Revell wrote:
> > On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
> > 
> >>the biggest change in this patch is the merge of Paul E. McKenney's
> >>preemptable RCU code. The new RCU code is active on PREEMPT_RT. While it
> >>is still quite experimental at this stage, it allowed the removal of
> >>locking cruft (mainly in the networking code), so it could solve some of
> >>the longstanding netfilter/networking deadlocks/crashes reported by a
> >>number of people. Be careful nevertheless.
> > 
> > 
> > With PREEMPT_RT my machine deadlocked within 20 minutes of boot.
> > "apt-get dist-upgrade" seemed to trigger the crash.  I did not see any
> > Oops unfortunately.
> > 
> > Lee
> > 
> 
> Lee,
> 
> Just curious. Is this with UP or SMP? I currently have my UP box running 
>   PREEMPT_RT, with no problems thus far. However, my SMP box dies while 
> booting (with an oops). I am working on trying to get set up to capture 
> the oops, although it might be tomorrow before I get that done.
> 

UP.  It's 100% reproducible, this machine locks up over and over.  Seems
to be associated with network activity by multiple processes.

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-19 19:16 Ingo Molnar
  2005-03-20  0:24 ` Lee Revell
  2005-03-20  1:33 ` Lee Revell
@ 2005-03-20 17:45 ` Paul E. McKenney
  2005-03-21  8:53   ` Ingo Molnar
  2 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2005-03-20 17:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Sat, Mar 19, 2005 at 08:16:58PM +0100, Ingo Molnar wrote:
> 
> i have released the -V0.7.41-00 Real-Time Preemption patch (merged to
> 2.6.12-rc1), which can be downloaded from the usual place:
> 
>   http://redhat.com/~mingo/realtime-preempt/
> 
> the biggest change in this patch is the merge of Paul E. McKenney's
> preemptable RCU code. The new RCU code is active on PREEMPT_RT. While it
> is still quite experimental at this stage, it allowed the removal of
> locking cruft (mainly in the networking code), so it could solve some of
> the longstanding netfilter/networking deadlocks/crashes reported by a
> number of people. Be careful nevertheless.
> 
> there are a couple of minor changes relative to Paul's latest
> preemptable-RCU code drop:
> 
>  - made the two variants two #ifdef blocks - this is sufficient for now
>    and we'll see what the best way is in the longer run.
> 
>  - moved rcu_check_callbacks() from the timer IRQ to ksoftirqd. (the
>    timer IRQ still runs in hardirq context on PREEMPT_RT.)
> 
>  - changed the irq-flags method to a preempt_disable()-based method, and
>    moved the lock taking outside of the critical sections. (due to locks
>    potentially sleeping on PREEMPT_RT).
> 
> to create a -V0.7.41-00 tree from scratch, the patching order is:
> 
>   http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.tar.bz2
>   http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.12-rc1.bz2
>   http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.12-rc1-V0.7.41-00

Some proposed fixes from a quick scan (untested, probably do not even
compile).  These proposed fixes fall into the following categories:

o	Some functions that should be static.

o	Introduced a synchronize_kernel_barrier() for a number of
	uses of synchronize_kernel() that are broken by the new
	implementation.  Note that synchronize_kernel_barrier() is
	the same as synchronize_kernel() in non-CONFIG_PREEMPT_RT
	kernels.  It is not clear that synchronize_kernel_barrier()
	is strong enough for some uses; we may need another API
	(synchronize_kernel_barrier_voluntary()???) that waits for all
	tasks to -voluntarily- context switch or be executing in user
	space (these are marked with FIXME in the attached patch).
	Dipankar and/or Rusty put out a patch that did this some time
	back -- this was when we were trying to make an RCU that worked
	in CONFIG_PREEMPT kernels, but did not want preempt_disable()
	on the read side.

	That said, some of the synchronize_kernel_barrier()s marked
	with FIXME may be fixable more simply by inserting
	rcu_read_lock()/rcu_read_unlock() pairs in appropriate
	places.

o	Merged the two identical implementations each of
	rcu_dereference() and rcu_assign_pointer().

o	Added an rcu_read_lock() or two.  Clearly need to be searching
	for patches containing "synchronize_kernel" in addition to
	patches containing "rcu"...

Thoughts?

						Thanx, Paul

Signed-off-by: <paulmck@us.ibm.com>

diff -urpN -X dontdiff linux-2.6.11/arch/i386/oprofile/nmi_timer_int.c linux-2.6.11.fixes/arch/i386/oprofile/nmi_timer_int.c
--- linux-2.6.11/arch/i386/oprofile/nmi_timer_int.c	Tue Mar  1 23:37:52 2005
+++ linux-2.6.11.fixes/arch/i386/oprofile/nmi_timer_int.c	Sun Mar 20 08:40:31 2005
@@ -36,7 +36,7 @@ static void timer_stop(void)
 {
 	enable_timer_nmi_watchdog();
 	unset_nmi_callback();
-	synchronize_kernel();
+	synchronize_kernel_barrier();
 }
 
 
diff -urpN -X dontdiff linux-2.6.11/arch/ppc64/kernel/ItLpQueue.c linux-2.6.11.fixes/arch/ppc64/kernel/ItLpQueue.c
--- linux-2.6.11/arch/ppc64/kernel/ItLpQueue.c	Tue Mar  1 23:37:48 2005
+++ linux-2.6.11.fixes/arch/ppc64/kernel/ItLpQueue.c	Sun Mar 20 08:48:29 2005
@@ -142,7 +142,9 @@ unsigned ItLpQueue_process( struct ItLpQ
 				lpQueue->xLpIntCountByType[nextLpEvent->xType]++;
 			if ( nextLpEvent->xType < HvLpEvent_Type_NumTypes &&
-			     lpEventHandler[nextLpEvent->xType] ) 
+			     lpEventHandler[nextLpEvent->xType] ) {
+				rcu_read_lock();
 				lpEventHandler[nextLpEvent->xType](nextLpEvent, regs);
-			else
+				rcu_read_unlock();
+			} else
 				printk(KERN_INFO "Unexpected Lp Event type=%d\n", nextLpEvent->xType );
 			
diff -urpN -X dontdiff linux-2.6.11/arch/x86_64/kernel/mce.c linux-2.6.11.fixes/arch/x86_64/kernel/mce.c
--- linux-2.6.11/arch/x86_64/kernel/mce.c	Tue Mar  1 23:37:52 2005
+++ linux-2.6.11.fixes/arch/x86_64/kernel/mce.c	Sun Mar 20 08:49:45 2005
@@ -392,7 +392,7 @@ static ssize_t mce_read(struct file *fil
 	memset(mcelog.entry, 0, next * sizeof(struct mce));
 	mcelog.next = 0;
 
-	synchronize_kernel();	
+	synchronize_kernel_barrier();	
 
 	/* Collect entries that were still getting written before the synchronize. */
 
diff -urpN -X dontdiff linux-2.6.11/drivers/acpi/processor_idle.c linux-2.6.11.fixes/drivers/acpi/processor_idle.c
--- linux-2.6.11/drivers/acpi/processor_idle.c	Tue Mar  1 23:38:25 2005
+++ linux-2.6.11.fixes/drivers/acpi/processor_idle.c	Sun Mar 20 09:01:44 2005
@@ -838,7 +838,7 @@ int acpi_processor_cst_has_changed (stru
 
 	/* Fall back to the default idle loop */
 	pm_idle = pm_idle_save;
-	synchronize_kernel();
+	synchronize_kernel_barrier(); /* FIXME: strong enough? */
 
 	pr->flags.power = 0;
 	result = acpi_processor_get_power_info(pr);
diff -urpN -X dontdiff linux-2.6.11/drivers/char/ipmi/ipmi_si_intf.c linux-2.6.11.fixes/drivers/char/ipmi/ipmi_si_intf.c
--- linux-2.6.11/drivers/char/ipmi/ipmi_si_intf.c	Sat Mar 19 14:04:13 2005
+++ linux-2.6.11.fixes/drivers/char/ipmi/ipmi_si_intf.c	Sun Mar 20 08:39:49 2005
@@ -2194,7 +2194,7 @@ static int init_one_smi(int intf_num, st
 	/* Wait until we know that we are out of any interrupt
 	   handlers might have been running before we freed the
 	   interrupt. */
-	synchronize_kernel();
+	synchronize_kernel_barrier();
 
 	if (new_smi->si_sm) {
 		if (new_smi->handlers)
@@ -2307,7 +2307,7 @@ static void __exit cleanup_one_si(struct
 	/* Wait until we know that we are out of any interrupt
 	   handlers might have been running before we freed the
 	   interrupt. */
-	synchronize_kernel();
+	synchronize_kernel_barrier();
 
 	/* Wait for the timer to stop.  This avoids problems with race
 	   conditions removing the timer here. */
diff -urpN -X dontdiff linux-2.6.11/drivers/input/keyboard/atkbd.c linux-2.6.11.fixes/drivers/input/keyboard/atkbd.c
--- linux-2.6.11/drivers/input/keyboard/atkbd.c	Sat Mar 19 14:04:16 2005
+++ linux-2.6.11.fixes/drivers/input/keyboard/atkbd.c	Sun Mar 20 09:02:33 2005
@@ -678,7 +678,7 @@ static void atkbd_disconnect(struct seri
 	atkbd_disable(atkbd);
 
 	/* make sure we don't have a command in flight */
-	synchronize_kernel();
+	synchronize_kernel_barrier(); /* FIXME: Strong enough? */
 	flush_scheduled_work();
 
 	device_remove_file(&serio->dev, &atkbd_attr_extra);
diff -urpN -X dontdiff linux-2.6.11/drivers/input/serio/i8042.c linux-2.6.11.fixes/drivers/input/serio/i8042.c
--- linux-2.6.11/drivers/input/serio/i8042.c	Sat Mar 19 14:04:16 2005
+++ linux-2.6.11.fixes/drivers/input/serio/i8042.c	Sun Mar 20 09:27:35 2005
@@ -396,7 +396,7 @@ static void i8042_stop(struct serio *ser
 	struct i8042_port *port = serio->port_data;
 
 	port->exists = 0;
-	synchronize_kernel();
+	synchronize_kernel_barrier(); /* FIXME: Strong enough? */
 	port->serio = NULL;
 }
 
diff -urpN -X dontdiff linux-2.6.11/drivers/net/r8169.c linux-2.6.11.fixes/drivers/net/r8169.c
--- linux-2.6.11/drivers/net/r8169.c	Sat Mar 19 14:04:19 2005
+++ linux-2.6.11.fixes/drivers/net/r8169.c	Sun Mar 20 09:09:06 2005
@@ -2385,7 +2385,7 @@ core_down:
 	}
 
 	/* Give a racing hard_start_xmit a few cycles to complete. */
-	synchronize_kernel();
+	synchronize_kernel_barrier(); /* FIXME: Strong enough? */
 
 	/*
 	 * And now for the 50k$ question: are IRQ disabled or not ?
diff -urpN -X dontdiff linux-2.6.11/drivers/s390/cio/airq.c linux-2.6.11.fixes/drivers/s390/cio/airq.c
--- linux-2.6.11/drivers/s390/cio/airq.c	Tue Mar  1 23:38:17 2005
+++ linux-2.6.11.fixes/drivers/s390/cio/airq.c	Sun Mar 20 09:11:57 2005
@@ -45,7 +45,7 @@ s390_register_adapter_interrupt (adapter
 	else
 		ret = (cmpxchg(&adapter_handler, NULL, handler) ? -EBUSY : 0);
 	if (!ret)
-		synchronize_kernel();
+		synchronize_kernel_barrier(); /* FIXME: Strong enough? */
 
 	sprintf (dbf_txt, "ret:%d", ret);
 	CIO_TRACE_EVENT (4, dbf_txt);
@@ -65,7 +65,7 @@ s390_unregister_adapter_interrupt (adapt
 		ret = -EINVAL;
 	else {
 		adapter_handler = NULL;
-		synchronize_kernel();
+		synchronize_kernel_barrier(); /* FIXME: Strong enough? */
 		ret = 0;
 	}
 	sprintf (dbf_txt, "ret:%d", ret);
diff -urpN -X dontdiff linux-2.6.11/include/linux/rcupdate.h linux-2.6.11.fixes/include/linux/rcupdate.h
--- linux-2.6.11/include/linux/rcupdate.h	Sat Mar 19 14:09:52 2005
+++ linux-2.6.11.fixes/include/linux/rcupdate.h	Sun Mar 20 09:24:20 2005
@@ -222,6 +222,8 @@ static inline int rcu_pending(int cpu)
  */
 #define rcu_read_unlock_bh()	local_bh_enable()
 
+#endif /* CONFIG_PREEMPT_RT */
+
 /**
  * rcu_dereference - fetch an RCU-protected pointer in an
  * RCU read-side critical section.  This pointer may later
@@ -256,6 +258,22 @@ static inline int rcu_pending(int cpu)
 						(p) = (v); \
 					})
 
+#ifndef CONFIG_PREEMPT_RT
+
+/**
+ * synchronize_kernel_barrier - block until each CPU executes a
+ * context switch, appears in the idle loop, or otherwise exits
+ * kernel execution.  This is synonymous with synchronize_kernel()
+ * in the classic RCU implementation, but not in some RCU
+ * implementations optimized for realtime use.  In these realtime
+ * uses, synchronize_kernel() can potentially return immediately,
+ * even on SMP systems.
+ *
+ * NMI-related uses of RCU need to use synchronize_kernel_barrier().
+ */
+
+#define synchronize_kernel_barrier() synchronize_kernel()
+
 extern void rcu_init(void);
 extern void rcu_check_callbacks(int cpu, int user);
 extern void rcu_restart_cpu(int cpu);
@@ -275,40 +293,6 @@ extern void synchronize_kernel(void);
 #define rcu_bh_qsctr_inc(cpu)
 #define rcu_qsctr_inc(cpu)
 
-/**
- * rcu_dereference - fetch an RCU-protected pointer in an
- * RCU read-side critical section.  This pointer may later
- * be safely dereferenced.
- *
- * Inserts memory barriers on architectures that require them
- * (currently only the Alpha), and, more importantly, documents
- * exactly which pointers are protected by RCU.
- */
-
-#define rcu_dereference(p)     ({ \
-				typeof(p) _________p1 = p; \
-				smp_read_barrier_depends(); \
-				(_________p1); \
-				})
-
-/**
- * rcu_assign_pointer - assign (publicize) a pointer to a newly
- * initialized structure that will be dereferenced by RCU read-side
- * critical sections.  Returns the value assigned.
- *
- * Inserts memory barriers on architectures that require them
- * (pretty much all of them other than x86), and also prevents
- * the compiler from reordering the code that initializes the
- * structure after the pointer assignment.  More importantly, this
- * call documents which pointers will be dereferenced by RCU read-side
- * code.
- */
-
-#define rcu_assign_pointer(p, v)	({ \
-						smp_wmb(); \
-						(p) = (v); \
-					})
-
 extern void rcu_init(void);
 
 /* Exported interfaces */
@@ -317,6 +301,7 @@ extern void FASTCALL(call_rcu(struct rcu
 extern void rcu_read_lock(void);
 extern void rcu_read_unlock(void);
 extern void synchronize_kernel(void);
+extern void synchronize_kernel_barrier(void);
 extern int rcu_pending(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
 
diff -urpN -X dontdiff linux-2.6.11/kernel/module.c linux-2.6.11.fixes/kernel/module.c
--- linux-2.6.11/kernel/module.c	Sat Mar 19 14:09:51 2005
+++ linux-2.6.11.fixes/kernel/module.c	Sun Mar 20 09:13:23 2005
@@ -1812,7 +1812,7 @@ sys_init_module(void __user *umod,
 		/* Init routine failed: abort.  Try to protect us from
                    buggy refcounters. */
 		mod->state = MODULE_STATE_GOING;
-		synchronize_kernel();
+		synchronize_kernel_barrier(); /* FIXME: Strong enough? */
 		if (mod->unsafe)
 			printk(KERN_ERR "%s: module is now stuck!\n",
 			       mod->name);
diff -urpN -X dontdiff linux-2.6.11/kernel/profile.c linux-2.6.11.fixes/kernel/profile.c
--- linux-2.6.11/kernel/profile.c	Sat Mar 19 14:09:51 2005
+++ linux-2.6.11.fixes/kernel/profile.c	Sun Mar 20 09:18:05 2005
@@ -194,7 +194,7 @@ void unregister_timer_hook(int (*hook)(s
 	WARN_ON(hook != timer_hook);
 	timer_hook = NULL;
 	/* make sure all CPUs see the NULL hook */
-	synchronize_kernel();
+	synchronize_kernel_barrier(); /* FIXME: Strong enough? */
 }
 
 EXPORT_SYMBOL_GPL(register_timer_hook);
diff -urpN -X dontdiff linux-2.6.11/kernel/rcupdate.c linux-2.6.11.fixes/kernel/rcupdate.c
--- linux-2.6.11/kernel/rcupdate.c	Sat Mar 19 14:09:51 2005
+++ linux-2.6.11.fixes/kernel/rcupdate.c	Sun Mar 20 09:32:13 2005
@@ -548,7 +548,37 @@ void synchronize_kernel(void)
 	}
 }
 
-void rcu_advance_callbacks(void)
+/*
+ * FIXME: Note that this implementation might not be strong enough
+ * for a number of driver uses of synchronize_kernel.  Some of these
+ * uses seem to assume a non-CONFIG_PREEMPT kernel, so may need
+ * to come up with a different approach.  Note that these uses
+ * are -not- waiting to free memory, but rather to ensure that
+ * a change is seen by all future driver invocations.
+ *
+ * The correct implementation is likely to be a tasklist scan,
+ * which blocks until all tasks encounter a voluntary context switch.
+ * If so, this implementation is required in CONFIG_PREEMPT
+ * kernels as well as CONFIG_PREEMPT_RT kernels.
+ */
+
+void synchronize_kernel_barrier(void)
+{
+	cpumask_t oldmask;
+	cpumask_t curmask;
+	int cpu;
+
+	if (sched_getaffinity(0, &oldmask) < 0) {
+		oldmask = cpu_possible_map;
+	}
+	for_each_cpu(cpu) {
+		sched_setaffinity(0, cpumask_of_cpu(cpu));
+		schedule();
+	}
+	sched_setaffinity(0, oldmask);
+}
+
+static void rcu_advance_callbacks(void)
 {
 	struct rcu_data *rdp;
 
@@ -578,7 +608,7 @@ void fastcall call_rcu(struct rcu_head *
 	put_cpu_var(rcu_data);
 }
 
-void rcu_process_callbacks(void)
+static void rcu_process_callbacks(void)
 {
 	struct rcu_head *next, *list;
 	struct rcu_data *rdp;


* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-20  4:32     ` Lee Revell
@ 2005-03-20 22:40       ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2005-03-20 22:40 UTC (permalink / raw)
  To: Lee Revell; +Cc: K.R. Foley, Ingo Molnar, linux-kernel

On Sat, Mar 19, 2005 at 11:32:59PM -0500, Lee Revell wrote:
> On Sat, 2005-03-19 at 19:50 -0600, K.R. Foley wrote:
> > Lee Revell wrote:
> > > On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
> > > 
> > >>the biggest change in this patch is the merge of Paul E. McKenney's
> > >>preemptable RCU code. The new RCU code is active on PREEMPT_RT. While it
> > >>is still quite experimental at this stage, it allowed the removal of
> > >>locking cruft (mainly in the networking code), so it could solve some of
> > >>the longstanding netfilter/networking deadlocks/crashes reported by a
> > >>number of people. Be careful nevertheless.
> > > 
> > > 
> > > With PREEMPT_RT my machine deadlocked within 20 minutes of boot.
> > > "apt-get dist-upgrade" seemed to trigger the crash.  I did not see any
> > > Oops unfortunately.
> > > 
> > > Lee
> > > 
> > 
> > Lee,
> > 
> > Just curious. Is this with UP or SMP? I currently have my UP box running 
> >   PREEMPT_RT, with no problems thus far. However, my SMP box dies while 
> > booting (with an oops). I am working on trying to get set up to capture 
> > the oops, although it might be tomorrow before I get that done.
> > 
> 
> UP.  It's 100% reproducible, this machine locks up over and over.  Seems
> to be associated with network activity by multiple processes.

OK, guess I need to go inspect the uses of synchronize_net() in addition
to synchronize_kernel()...

If you do manage to get any additional info, please let me know...

						Thanx, Paul


* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-20 17:45 ` Paul E. McKenney
@ 2005-03-21  8:53   ` Ingo Molnar
  2005-03-21  9:01     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2005-03-21  8:53 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel


got this early-bootup crash on an SMP box:

BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0131aec
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP 
Modules linked in:
CPU:    1
EIP:    0060:[<c0131aec>]    Not tainted VLI
EFLAGS: 00010293   (2.6.12-rc1-RT-V0.7.41-00) 
EIP is at rcu_advance_callbacks+0x3c/0x80
eax: 00000000   ebx: c050f280   ecx: c12191e0   edx: 00000000
esi: cfd2e560   edi: cfd2e4e0   ebp: cfd31dd0   esp: cfd31dc8
ds: 007b   es: 007b   ss: 0068   preempt: 00000003
Process khelper (pid: 60, threadinfo=cfd30000 task=cfd106a0)
Stack: 00000001 c12191e0 cfd31de4 c0131b67 00000001 cfd2e4d8 c13004d8 cfd31e00 
       c017e449 cfd2e4d8 c04d6e80 cfd32006 fffffffe cfd31e54 cfd31e70 c01749cc 
       cfd2e4d8 cfd31e50 cfd31e4c 00000001 cfd32001 cfd2e4d8 c03dd41f c04cf920 
Call Trace:
 [<c010412f>] show_stack+0x7f/0xa0 (28)
 [<c01042da>] show_registers+0x16a/0x1e0 (56)
 [<c0104511>] die+0x101/0x190 (64)
 [<c0115862>] do_page_fault+0x442/0x680 (216)
 [<c0103d9b>] error_code+0x2b/0x30 (68)
 [<c0131b67>] call_rcu+0x37/0x70 (20)
 [<c017e449>] dput+0x139/0x210 (28)
 [<c01749cc>] __link_path_walk+0x9fc/0xf80 (112)
 [<c0174f9a>] link_path_walk+0x4a/0x130 (100)
 [<c017538e>] path_lookup+0x9e/0x1c0 (32)
 [<c01707e8>] open_exec+0x28/0x100 (100)
 [<c0171a04>] do_execve+0x44/0x220 (36)
 [<c0101da2>] sys_execve+0x42/0xa0 (36)
 [<c0103315>] syscall_call+0x7/0xb (-8096)
---------------------------
| preempt count: 00000004 ]
| 4-level deep critical section nesting:
----------------------------------------
.. [<c0131b4f>] .... call_rcu+0x1f/0x70
.....[<c017e449>] ..   ( <= dput+0x139/0x210)
.. [<c0131ac3>] .... rcu_advance_callbacks+0x13/0x80
.....[<c0131b67>] ..   ( <= call_rcu+0x37/0x70)
.. [<c03dddca>] .... _raw_spin_lock_irqsave+0x1a/0xa0
.....[<c010444f>] ..   ( <= die+0x3f/0x190)
.. [<c013b9e6>] .... print_traces+0x16/0x50
.....[<c010412f>] ..   ( <= show_stack+0x7f/0xa0)

Code: 00 00 e8 78 2d 0a 00 8b 0c 85 20 20 51 c0 bb 80 f2 50 c0 01 d9 f0 83 44 24 00 00 a1 88 19 52 c0 39 41 40 74 23 8b 41 44 8b 51 50 <89> 02 8b 41 48 c7 41 44 00 00 00 00 89 41 50 8d 41 44 89 41 48 
 <6>note: khelper[60] exited with preempt_count 2

(gdb) list *0xc0131aec
0xc0131aec is in rcu_advance_callbacks (kernel/rcupdate.c:558).

553             struct rcu_data *rdp;
554
555             rdp = &get_cpu_var(rcu_data);
556             smp_mb();       /* prevent sampling batch # before list removal. */
557             if (rdp->batch != rcu_ctrlblk.batch) {
558                     *rdp->donetail = rdp->waitlist;
559                     rdp->donetail = rdp->waittail;
560                     rdp->waitlist = NULL;
561                     rdp->waittail = &rdp->waitlist;
562                     rdp->batch = rcu_ctrlblk.batch;
(gdb)


* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-21  8:53   ` Ingo Molnar
@ 2005-03-21  9:01     ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2005-03-21  9:01 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> got this early-bootup crash on an SMP box:

the same kernel image boots fine on a UP box, so it's an SMP bug.

note that the same occurs with your latest (synchronization barrier)
fixes applied as well.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
  2005-03-20  0:24 ` Lee Revell
@ 2005-03-21 15:42   ` K.R. Foley
  0 siblings, 0 replies; 11+ messages in thread
From: K.R. Foley @ 2005-03-21 15:42 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, linux-kernel, Paul E. McKenney

Lee Revell wrote:
> On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
> 
>>i have released the -V0.7.41-00 Real-Time Preemption patch (merged to
>>2.6.12-rc1), which can be downloaded from the usual place:
>>
>>  http://redhat.com/~mingo/realtime-preempt/
>>
> 
> 
> 3ms latency in the NFS client code.  Workload was a kernel compile over
> NFS.
> 
> preemption latency trace v1.1.4 on 2.6.12-rc1-RT-V0.7.41-00
> --------------------------------------------------------------------
>  latency: 3178 µs, #4095/14224, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
>     -----------------
>     | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
>     -----------------
> 
>                  _------=> CPU#            
>                 / _-----=> irqs-off        
>                | / _----=> need-resched    
>                || / _---=> hardirq/softirq 
>                ||| / _--=> preempt-depth   
>                |||| /                      
>                |||||     delay             
>    cmd     pid ||||| time  |   caller      
>       \   /    |||||   \   |   /           
> (T1/#0)            <...> 32105 0 3 00000004 00000000 [0011939614227867] 0.000ms (+4137027.445ms): <6500646c> (<61000000>)
> (T1/#2)            <...> 32105 0 3 00000004 00000002 [0011939614228097] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x9a/0xd0 <c013150a> (try_to_wake_up+0x94/0x140 <c0110474>)
> (T1/#3)            <...> 32105 0 3 00000003 00000003 [0011939614228436] 0.000ms (+0.000ms): preempt_schedule+0x11/0x80 <c02b57c1> (try_to_wake_up+0x94/0x140 <c0110474>)
> (T3/#4)    <...>-32105 0dn.3    0µs : try_to_wake_up+0x11e/0x140 <c01104fe> <<...>-2> (69 76): 
> (T1/#5)            <...> 32105 0 3 00000002 00000005 [0011939614228942] 0.000ms (+0.000ms): preempt_schedule+0x11/0x80 <c02b57c1> (try_to_wake_up+0xf8/0x140 <c01104d8>)
> (T1/#6)            <...> 32105 0 3 00000002 00000006 [0011939614229130] 0.000ms (+0.000ms): wake_up_process+0x35/0x40 <c0110555> (do_softirq+0x3f/0x50 <c011b05f>)
> (T6/#7)    <...>-32105 0dn.1    1µs < (1)
> (T1/#8)            <...> 32105 0 2 00000001 00000008 [0011939614229782] 0.001ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
> (T1/#9)            <...> 32105 0 2 00000001 00000009 [0011939614229985] 0.001ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
> (T1/#10)            <...> 32105 0 2 00000001 0000000a [0011939614230480] 0.001ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
> (T1/#11)            <...> 32105 0 2 00000001 0000000b [0011939614230634] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
> (T1/#12)            <...> 32105 0 2 00000001 0000000c [0011939614230889] 0.002ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
> (T1/#13)            <...> 32105 0 2 00000001 0000000d [0011939614231034] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
> (T1/#14)            <...> 32105 0 2 00000001 0000000e [0011939614231302] 0.002ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
> (T1/#15)            <...> 32105 0 2 00000001 0000000f [0011939614231419] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
> 
> (last two lines just repeat)
> 
> This is probably not a regression; I had never tested this with NFS
> before.
> 
> Lee
> 

Lee,

I did some testing with NFS quite a while ago. Actually it was the NFS 
compile within the stress-kernel pkg. I had similar crappy performance 
problems. Ingo pointed out that there were serious locking issues with 
NFS and suggested that I report the problems to the NFS folks, which I 
did. The reports seemed to fall mostly on deaf ears, at least that was 
my perspective. I decided to move on and took the NFS compile out of my 
stress testing to be revisited at a later time.

-- 
    kr


* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
@ 2005-03-21 16:45 Paul Mckenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul Mckenney @ 2005-03-21 16:45 UTC (permalink / raw)
  To: mingo; +Cc: rlrevell, linux-kernel

> got this early-bootup crash on an SMP box:
> 
> BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000
>  printing eip:
> c0131aec
> *pde = 00000000
> Oops: 0002 [#1]
> PREEMPT SMP 
> Modules linked in:
> CPU:    1
> EIP:    0060:[<c0131aec>]    Not tainted VLI
> EFLAGS: 00010293   (2.6.12-rc1-RT-V0.7.41-00) 
> EIP is at rcu_advance_callbacks+0x3c/0x80
> eax: 00000000   ebx: c050f280   ecx: c12191e0   edx: 00000000
> esi: cfd2e560   edi: cfd2e4e0   ebp: cfd31dd0   esp: cfd31dc8
> ds: 007b   es: 007b   ss: 0068   preempt: 00000003
> Process khelper (pid: 60, threadinfo=cfd30000 task=cfd106a0)
> Stack: 00000001 c12191e0 cfd31de4 c0131b67 00000001 cfd2e4d8 c13004d8 cfd31e00 
>        c017e449 cfd2e4d8 c04d6e80 cfd32006 fffffffe cfd31e54 cfd31e70 c01749cc 
>        cfd2e4d8 cfd31e50 cfd31e4c 00000001 cfd32001 cfd2e4d8 c03dd41f c04cf920 
> Call Trace:
>  [<c010412f>] show_stack+0x7f/0xa0 (28)
>  [<c01042da>] show_registers+0x16a/0x1e0 (56)
>  [<c0104511>] die+0x101/0x190 (64)
>  [<c0115862>] do_page_fault+0x442/0x680 (216)
>  [<c0103d9b>] error_code+0x2b/0x30 (68)
>  [<c0131b67>] call_rcu+0x37/0x70 (20)
>  [<c017e449>] dput+0x139/0x210 (28)
>  [<c01749cc>] __link_path_walk+0x9fc/0xf80 (112)
>  [<c0174f9a>] link_path_walk+0x4a/0x130 (100)
>  [<c017538e>] path_lookup+0x9e/0x1c0 (32)
>  [<c01707e8>] open_exec+0x28/0x100 (100)
>  [<c0171a04>] do_execve+0x44/0x220 (36)
>  [<c0101da2>] sys_execve+0x42/0xa0 (36)
>  [<c0103315>] syscall_call+0x7/0xb (-8096)
> ---------------------------
> | preempt count: 00000004 ]
> | 4-level deep critical section nesting:
> ----------------------------------------
> .. [<c0131b4f>] .... call_rcu+0x1f/0x70
> .....[<c017e449>] ..   ( <= dput+0x139/0x210)
> .. [<c0131ac3>] .... rcu_advance_callbacks+0x13/0x80
> .....[<c0131b67>] ..   ( <= call_rcu+0x37/0x70)
> .. [<c03dddca>] .... _raw_spin_lock_irqsave+0x1a/0xa0
> .....[<c010444f>] ..   ( <= die+0x3f/0x190)
> .. [<c013b9e6>] .... print_traces+0x16/0x50
> .....[<c010412f>] ..   ( <= show_stack+0x7f/0xa0)
> Code: 00 00 e8 78 2d 0a 00 8b 0c 85 20 20 51 c0 bb 80 f2 50 c0 01 d9 f0 83 44 24 00 00 a1 88 19 52 c0 39 41 40 74 23 8b 41 44 8b 51 50 <89> 02 8b 41 48 c7 41 44 00 00 00 00 89 41 50 8d 41 44 89 41 48 
>  <6>note: khelper[60] exited with preempt_count 2
> 
> (gdb) list *0xc0131aec
> 0xc0131aec is in rcu_advance_callbacks (kernel/rcupdate.c:558).
> 
> 553             struct rcu_data *rdp;
> 554
> 555             rdp = &get_cpu_var(rcu_data);
> 556             smp_mb();       /* prevent sampling batch # before list removal. */
> 557             if (rdp->batch != rcu_ctrlblk.batch) {
> 558                     *rdp->donetail = rdp->waitlist;
> 559                     rdp->donetail = rdp->waittail;
> 560                     rdp->waitlist = NULL;
> 561                     rdp->waittail = &rdp->waitlist;
> 562                     rdp->batch = rcu_ctrlblk.batch;
> (gdb)

Does the following help?

						Thanx, Paul

diff -urpN -X dontdiff linux-2.6.11.fixes/kernel/rcupdate.c linux-2.6.11.fixes2/kernel/rcupdate.c
--- linux-2.6.11.fixes/kernel/rcupdate.c	Mon Mar 21 08:14:47 2005
+++ linux-2.6.11.fixes2/kernel/rcupdate.c	Mon Mar 21 08:17:00 2005
@@ -620,7 +620,7 @@ static void rcu_process_callbacks(void)
 		return;
 	}
 	rdp->donelist = NULL;
-	rdp->donetail = &rdp->waitlist;
+	rdp->donetail = &rdp->donelist;
 	put_cpu_var(rcu_data);
 	while (list) {
 		next = list->next;
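
For readers following the one-line change above, here is a minimal user-space sketch of the tail-pointer list idiom that the quoted rcu_advance_callbacks() lines and the patch both rely on. It is an illustration only, not the kernel code: struct cb, struct cb_queues, and the functions queues_init(), enqueue_wait(), advance(), take_done() and count() are hypothetical names made up for this example, and the empty-list guard in advance() is an assumption of the sketch. It does not try to explain the oops itself; it only shows what can go wrong when donetail is reset to the wait list instead of the done list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head: just the list linkage. */
struct cb {
	struct cb *next;
};

/* Hypothetical stand-in for the list fields of struct rcu_data. */
struct cb_queues {
	struct cb *waitlist;	/* callbacks waiting for a grace period */
	struct cb **waittail;	/* where the next waiting callback is linked */
	struct cb *donelist;	/* callbacks ready to be invoked */
	struct cb **donetail;	/* where the next done callback is linked */
};

/* An empty list has its tail pointer aimed at its own head. */
static void queues_init(struct cb_queues *q)
{
	q->waitlist = NULL;
	q->waittail = &q->waitlist;
	q->donelist = NULL;
	q->donetail = &q->donelist;
}

/* Append one callback to the wait list in O(1) via the tail pointer. */
static void enqueue_wait(struct cb_queues *q, struct cb *c)
{
	c->next = NULL;
	*q->waittail = c;
	q->waittail = &c->next;
}

/* Splice the whole wait list onto the end of the done list. */
static void advance(struct cb_queues *q)
{
	assert(q->donetail != NULL);
	if (q->waitlist == NULL)
		return;		/* sketch-only guard: nothing to splice */
	*q->donetail = q->waitlist;
	q->donetail = q->waittail;
	q->waitlist = NULL;
	q->waittail = &q->waitlist;
}

/*
 * Detach the done list so its callbacks can be invoked.
 * buggy != 0 mimics the pre-patch reset (donetail = &waitlist);
 * buggy == 0 mimics the patched reset (donetail = &donelist).
 */
static struct cb *take_done(struct cb_queues *q, int buggy)
{
	struct cb *list = q->donelist;

	q->donelist = NULL;
	q->donetail = buggy ? &q->waitlist : &q->donelist;
	return list;
}

/* Count the callbacks on a list. */
static size_t count(const struct cb *c)
{
	size_t n = 0;

	for (; c != NULL; c = c->next)
		n++;
	return n;
}
```

In this sketch, with the patched reset a callback queued after take_done() reaches donelist on the next advance(). With the pre-patch reset, the next advance() writes the wait list head back into waitlist itself and then clears it, so donelist stays empty and the queued callback is never seen again.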


Thread overview: 11+ messages
2005-03-21 16:45 [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00 Paul Mckenney
  -- strict thread matches above, loose matches on Subject: below --
2005-03-19 19:16 Ingo Molnar
2005-03-20  0:24 ` Lee Revell
2005-03-21 15:42   ` K.R. Foley
2005-03-20  1:33 ` Lee Revell
2005-03-20  1:50   ` K.R. Foley
2005-03-20  4:32     ` Lee Revell
2005-03-20 22:40       ` Paul E. McKenney
2005-03-20 17:45 ` Paul E. McKenney
2005-03-21  8:53   ` Ingo Molnar
2005-03-21  9:01     ` Ingo Molnar
