* [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
@ 2005-03-19 19:16 Ingo Molnar
From: Ingo Molnar @ 2005-03-19 19:16 UTC (permalink / raw)
To: linux-kernel; +Cc: Paul E. McKenney
i have released the -V0.7.41-00 Real-Time Preemption patch (merged to
2.6.12-rc1), which can be downloaded from the usual place:
http://redhat.com/~mingo/realtime-preempt/
the biggest change in this patch is the merge of Paul E. McKenney's
preemptable RCU code. The new RCU code is active on PREEMPT_RT. While it
is still quite experimental at this stage, it allowed the removal of
locking cruft (mainly in the networking code), so it could solve some of
the longstanding netfilter/networking deadlocks/crashes reported by a
number of people. Be careful nevertheless.
there are a couple of minor changes relative to Paul's latest
preemptable-RCU code drop:
- made the two variants two #ifdef blocks - this is sufficient for now
and we'll see what the best way is in the longer run.
- moved rcu_check_callbacks() from the timer IRQ to ksoftirqd. (the
timer IRQ still runs in hardirq context on PREEMPT_RT.)
- changed the irq-flags method to a preempt_disable()-based method, and
moved the lock taking outside of the critical sections. (due to locks
potentially sleeping on PREEMPT_RT).
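[Editorial note: the third change above amounts to the following before/after shape, sketched as kernel-style pseudocode with made-up lock and function names.]

```
/* before: lock taken inside an irqs-off region */
local_irq_save(flags);
spin_lock(&some_lock);
... critical section ...
spin_unlock(&some_lock);
local_irq_restore(flags);

/* after: the lock (which may sleep on PREEMPT_RT) is taken first,
   then only preemption is disabled around the critical section */
spin_lock(&some_lock);
preempt_disable();
... critical section ...
preempt_enable();
spin_unlock(&some_lock);
```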
to create a -V0.7.41-00 tree from scratch, the patching order is:
http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.tar.bz2
http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.12-rc1.bz2
http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.12-rc1-V0.7.41-00
Ingo
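[Editorial note: the patching order above corresponds roughly to the following shell session. This is a sketch only: the 2005-era URLs may no longer resolve, and wget is shown purely for illustration.]

```shell
# Fetch the base tree, the -rc patch, and the -RT patch (URLs as announced).
wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.tar.bz2
wget http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.12-rc1.bz2
wget http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.12-rc1-V0.7.41-00

# Unpack and apply in the order listed: base tree, then -rc1, then -RT.
tar xjf linux-2.6.11.tar.bz2
cd linux-2.6.11
bzcat ../patch-2.6.12-rc1.bz2 | patch -p1
patch -p1 < ../realtime-preempt-2.6.12-rc1-V0.7.41-00
```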
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
From: Lee Revell @ 2005-03-20 0:24 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Paul E. McKenney

On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
> i have released the -V0.7.41-00 Real-Time Preemption patch (merged to
> 2.6.12-rc1), which can be downloaded from the usual place:
>
>    http://redhat.com/~mingo/realtime-preempt/

3ms latency in the NFS client code. The workload was a kernel compile
over NFS.

preemption latency trace v1.1.4 on 2.6.12-rc1-RT-V0.7.41-00
--------------------------------------------------------------------
 latency: 3178 µs, #4095/14224, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
    -----------------
    | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time   |   caller
      \   /    |||||   \    |   /
(T1/#0)  <...> 32105 0 3 00000004 00000000 [0011939614227867] 0.000ms (+4137027.445ms): <6500646c> (<61000000>)
(T1/#2)  <...> 32105 0 3 00000004 00000002 [0011939614228097] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x9a/0xd0 <c013150a> (try_to_wake_up+0x94/0x140 <c0110474>)
(T1/#3)  <...> 32105 0 3 00000003 00000003 [0011939614228436] 0.000ms (+0.000ms): preempt_schedule+0x11/0x80 <c02b57c1> (try_to_wake_up+0x94/0x140 <c0110474>)
(T3/#4)  <...>-32105 0dn.3 0µs : try_to_wake_up+0x11e/0x140 <c01104fe> <<...>-2> (69 76):
(T1/#5)  <...> 32105 0 3 00000002 00000005 [0011939614228942] 0.000ms (+0.000ms): preempt_schedule+0x11/0x80 <c02b57c1> (try_to_wake_up+0xf8/0x140 <c01104d8>)
(T1/#6)  <...> 32105 0 3 00000002 00000006 [0011939614229130] 0.000ms (+0.000ms): wake_up_process+0x35/0x40 <c0110555> (do_softirq+0x3f/0x50 <c011b05f>)
(T6/#7)  <...>-32105 0dn.1 1µs < (1)
(T1/#8)  <...> 32105 0 2 00000001 00000008 [0011939614229782] 0.001ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
(T1/#9)  <...> 32105 0 2 00000001 00000009 [0011939614229985] 0.001ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
(T1/#10) <...> 32105 0 2 00000001 0000000a [0011939614230480] 0.001ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
(T1/#11) <...> 32105 0 2 00000001 0000000b [0011939614230634] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
(T1/#12) <...> 32105 0 2 00000001 0000000c [0011939614230889] 0.002ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
(T1/#13) <...> 32105 0 2 00000001 0000000d [0011939614231034] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)
(T1/#14) <...> 32105 0 2 00000001 0000000e [0011939614231302] 0.002ms (+0.000ms): radix_tree_gang_lookup+0xe/0x70 <c01e05ee> (nfs_wait_on_requests+0x6d/0x110 <c01c744d>)
(T1/#15) <...> 32105 0 2 00000001 0000000f [0011939614231419] 0.002ms (+0.000ms): __lookup+0xe/0xd0 <c01e051e> (radix_tree_gang_lookup+0x52/0x70 <c01e0632>)

(the last two lines just repeat)

This is probably not a regression; I had never tested this with NFS
before.

Lee
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
From: K.R. Foley @ 2005-03-21 15:42 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, linux-kernel, Paul E. McKenney

Lee Revell wrote:
> On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
>
>> i have released the -V0.7.41-00 Real-Time Preemption patch (merged to
>> 2.6.12-rc1), which can be downloaded from the usual place:
>>
>>    http://redhat.com/~mingo/realtime-preempt/
>
> 3ms latency in the NFS client code. The workload was a kernel compile
> over NFS.
>
> preemption latency trace v1.1.4 on 2.6.12-rc1-RT-V0.7.41-00
> --------------------------------------------------------------------
>  latency: 3178 µs, #4095/14224, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
>
> [ trace snipped ]
>
> This is probably not a regression; I had never tested this with NFS
> before.
>
> Lee

Lee,

I did some testing with NFS quite a while ago; it was actually the NFS
compile within the stress-kernel package. I hit similarly poor
performance. Ingo pointed out that there were serious locking issues
with NFS and suggested that I report the problems to the NFS folks,
which I did. The reports seemed to fall mostly on deaf ears, at least
from my perspective. I decided to move on and took the NFS compile out
of my stress testing, to be revisited at a later time.

--
kr
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
From: Lee Revell @ 2005-03-20 1:33 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Paul E. McKenney

On Sat, 2005-03-19 at 20:16 +0100, Ingo Molnar wrote:
> the biggest change in this patch is the merge of Paul E. McKenney's
> preemptable RCU code. The new RCU code is active on PREEMPT_RT. While it
> is still quite experimental at this stage, it allowed the removal of
> locking cruft (mainly in the networking code), so it could solve some of
> the longstanding netfilter/networking deadlocks/crashes reported by a
> number of people. Be careful nevertheless.

With PREEMPT_RT my machine deadlocked within 20 minutes of boot.
"apt-get dist-upgrade" seemed to trigger the crash. I did not see any
Oops unfortunately.

Lee
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
From: K.R. Foley @ 2005-03-20 1:50 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, linux-kernel, Paul E. McKenney

Lee Revell wrote:
> With PREEMPT_RT my machine deadlocked within 20 minutes of boot.
> "apt-get dist-upgrade" seemed to trigger the crash. I did not see any
> Oops unfortunately.

Lee,

Just curious: is this with UP or SMP? I currently have my UP box running
PREEMPT_RT, with no problems thus far. However, my SMP box dies while
booting (with an oops). I am working on getting set up to capture the
oops, although it might be tomorrow before I get that done.

--
kr
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
From: Lee Revell @ 2005-03-20 4:32 UTC (permalink / raw)
To: K.R. Foley; +Cc: Ingo Molnar, linux-kernel, Paul E. McKenney

On Sat, 2005-03-19 at 19:50 -0600, K.R. Foley wrote:
> Just curious: is this with UP or SMP? I currently have my UP box running
> PREEMPT_RT, with no problems thus far. However, my SMP box dies while
> booting (with an oops). I am working on trying to get setup to capture
> the oops, although it might be tomorrow before I get that done.

UP. It's 100% reproducible; this machine locks up over and over. It
seems to be associated with network activity by multiple processes.

Lee
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
From: Paul E. McKenney @ 2005-03-20 22:40 UTC (permalink / raw)
To: Lee Revell; +Cc: K.R. Foley, Ingo Molnar, linux-kernel

On Sat, Mar 19, 2005 at 11:32:59PM -0500, Lee Revell wrote:
> UP. It's 100% reproducible, this machine locks up over and over. Seems
> to be associated with network activity by multiple processes.

OK, guess I need to go inspect the uses of synchronize_net() in addition
to synchronize_kernel... If you do manage to get any additional info,
please let me know...

						Thanx, Paul
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
From: Paul E. McKenney @ 2005-03-20 17:45 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

On Sat, Mar 19, 2005 at 08:16:58PM +0100, Ingo Molnar wrote:
> i have released the -V0.7.41-00 Real-Time Preemption patch (merged to
> 2.6.12-rc1), which can be downloaded from the usual place:
>
>    http://redhat.com/~mingo/realtime-preempt/
>
> [ announcement snipped ]

Some proposed fixes from a quick scan (untested, probably does not even
compile). These proposed fixes fall into the following categories:

o	Made static some functions that should be static.

o	Introduced a synchronize_kernel_barrier() for a number of uses
	of synchronize_kernel() that are broken by the new
	implementation. Note that synchronize_kernel_barrier() is the
	same as synchronize_kernel() in non-CONFIG_PREEMPT_RT kernels.

	It is not clear that synchronize_kernel_barrier() is strong
	enough for some uses; these may need another API
	(synchronize_kernel_barrier_voluntary()???) that waits for all
	tasks to -voluntarily- context switch or be executing in user
	space (these are marked with FIXME in the attached patch).
	Dipankar and/or Rusty put out a patch that did this some time
	back -- this was when we were trying to make an RCU that worked
	in CONFIG_PREEMPT kernels, but did not want preempt_disable()
	on the read side.

	That said, some of the synchronize_kernel_barrier()s marked
	with FIXME may be fixable more simply by inserting
	rcu_read_lock()/rcu_read_unlock() pairs in appropriate places.

o	Merged the two identical implementations each of
	rcu_dereference() and rcu_assign_pointer().

o	Added an rcu_read_lock() or two.

Clearly we need to be searching for patches containing
"synchronize_kernel" in addition to patches containing "rcu"...

Thoughts?
						Thanx, Paul

Signed-off-by: <paulmck@us.ibm.com>

diff -urpN -X dontdiff linux-2.6.11/arch/i386/oprofile/nmi_timer_int.c linux-2.6.11.fixes/arch/i386/oprofile/nmi_timer_int.c
--- linux-2.6.11/arch/i386/oprofile/nmi_timer_int.c	Tue Mar  1 23:37:52 2005
+++ linux-2.6.11.fixes/arch/i386/oprofile/nmi_timer_int.c	Sun Mar 20 08:40:31 2005
@@ -36,7 +36,7 @@ static void timer_stop(void)
 {
 	enable_timer_nmi_watchdog();
 	unset_nmi_callback();
-	synchronize_kernel();
+	synchronize_kernel_barrier();
 }
 
diff -urpN -X dontdiff linux-2.6.11/arch/ppc64/kernel/ItLpQueue.c linux-2.6.11.fixes/arch/ppc64/kernel/ItLpQueue.c
--- linux-2.6.11/arch/ppc64/kernel/ItLpQueue.c	Tue Mar  1 23:37:48 2005
+++ linux-2.6.11.fixes/arch/ppc64/kernel/ItLpQueue.c	Sun Mar 20 08:48:29 2005
@@ -142,7 +142,9 @@ unsigned ItLpQueue_process( struct ItLpQ
 		lpQueue->xLpIntCountByType[nextLpEvent->xType]++;
 		if ( nextLpEvent->xType < HvLpEvent_Type_NumTypes &&
 		     lpEventHandler[nextLpEvent->xType] )
+			rcu_read_lock();
 			lpEventHandler[nextLpEvent->xType](nextLpEvent, regs);
+			rcu_read_unlock();
 		else
 			printk(KERN_INFO "Unexpected Lp Event type=%d\n", nextLpEvent->xType );
diff -urpN -X dontdiff linux-2.6.11/arch/x86_64/kernel/mce.c linux-2.6.11.fixes/arch/x86_64/kernel/mce.c
--- linux-2.6.11/arch/x86_64/kernel/mce.c	Tue Mar  1 23:37:52 2005
+++ linux-2.6.11.fixes/arch/x86_64/kernel/mce.c	Sun Mar 20 08:49:45 2005
@@ -392,7 +392,7 @@ static ssize_t mce_read(struct file *fil
 	memset(mcelog.entry, 0, next * sizeof(struct mce));
 	mcelog.next = 0;
-	synchronize_kernel();
+	synchronize_kernel_barrier();
 
 	/* Collect entries that were still getting written before the
	   synchronize. */
diff -urpN -X dontdiff linux-2.6.11/drivers/acpi/processor_idle.c linux-2.6.11.fixes/drivers/acpi/processor_idle.c
--- linux-2.6.11/drivers/acpi/processor_idle.c	Tue Mar  1 23:38:25 2005
+++ linux-2.6.11.fixes/drivers/acpi/processor_idle.c	Sun Mar 20 09:01:44 2005
@@ -838,7 +838,7 @@ int acpi_processor_cst_has_changed (stru
 	/* Fall back to the default idle loop */
 	pm_idle = pm_idle_save;
-	synchronize_kernel();
+	synchronize_kernel_barrier();	/* FIXME: strong enough? */
 
 	pr->flags.power = 0;
 	result = acpi_processor_get_power_info(pr);
diff -urpN -X dontdiff linux-2.6.11/drivers/char/ipmi/ipmi_si_intf.c linux-2.6.11.fixes/drivers/char/ipmi/ipmi_si_intf.c
--- linux-2.6.11/drivers/char/ipmi/ipmi_si_intf.c	Sat Mar 19 14:04:13 2005
+++ linux-2.6.11.fixes/drivers/char/ipmi/ipmi_si_intf.c	Sun Mar 20 08:39:49 2005
@@ -2194,7 +2194,7 @@ static int init_one_smi(int intf_num, st
 	/* Wait until we know that we are out of any interrupt
	   handlers might have been running before we freed the
	   interrupt. */
-	synchronize_kernel();
+	synchronize_kernel_barrier();
 
 	if (new_smi->si_sm) {
 		if (new_smi->handlers)
@@ -2307,7 +2307,7 @@ static void __exit cleanup_one_si(struct
 	/* Wait until we know that we are out of any interrupt
	   handlers might have been running before we freed the
	   interrupt. */
-	synchronize_kernel();
+	synchronize_kernel_barrier();
 
 	/* Wait for the timer to stop.  This avoids problems with race
	   conditions removing the timer here. */
diff -urpN -X dontdiff linux-2.6.11/drivers/input/keyboard/atkbd.c linux-2.6.11.fixes/drivers/input/keyboard/atkbd.c
--- linux-2.6.11/drivers/input/keyboard/atkbd.c	Sat Mar 19 14:04:16 2005
+++ linux-2.6.11.fixes/drivers/input/keyboard/atkbd.c	Sun Mar 20 09:02:33 2005
@@ -678,7 +678,7 @@ static void atkbd_disconnect(struct seri
 	atkbd_disable(atkbd);
 
 	/* make sure we don't have a command in flight */
-	synchronize_kernel();
+	synchronize_kernel_barrier();	/* FIXME: Strong enough? */
 	flush_scheduled_work();
 
 	device_remove_file(&serio->dev, &atkbd_attr_extra);
diff -urpN -X dontdiff linux-2.6.11/drivers/input/serio/i8042.c linux-2.6.11.fixes/drivers/input/serio/i8042.c
--- linux-2.6.11/drivers/input/serio/i8042.c	Sat Mar 19 14:04:16 2005
+++ linux-2.6.11.fixes/drivers/input/serio/i8042.c	Sun Mar 20 09:27:35 2005
@@ -396,7 +396,7 @@ static void i8042_stop(struct serio *ser
 	struct i8042_port *port = serio->port_data;
 
 	port->exists = 0;
-	synchronize_kernel();
+	synchronize_kernel_barrier();	/* FIXME: Strong enough? */
 	port->serio = NULL;
 }
diff -urpN -X dontdiff linux-2.6.11/drivers/net/r8169.c linux-2.6.11.fixes/drivers/net/r8169.c
--- linux-2.6.11/drivers/net/r8169.c	Sat Mar 19 14:04:19 2005
+++ linux-2.6.11.fixes/drivers/net/r8169.c	Sun Mar 20 09:09:06 2005
@@ -2385,7 +2385,7 @@ core_down:
 	}
 
 	/* Give a racing hard_start_xmit a few cycles to complete. */
-	synchronize_kernel();
+	synchronize_kernel_barrier();	/* FIXME: Strong enough? */
 
 	/*
 	 * And now for the 50k$ question: are IRQ disabled or not ?
diff -urpN -X dontdiff linux-2.6.11/drivers/s390/cio/airq.c linux-2.6.11.fixes/drivers/s390/cio/airq.c
--- linux-2.6.11/drivers/s390/cio/airq.c	Tue Mar  1 23:38:17 2005
+++ linux-2.6.11.fixes/drivers/s390/cio/airq.c	Sun Mar 20 09:11:57 2005
@@ -45,7 +45,7 @@ s390_register_adapter_interrupt (adapter
 	else
 		ret = (cmpxchg(&adapter_handler, NULL, handler) ? -EBUSY : 0);
 	if (!ret)
-		synchronize_kernel();
+		synchronize_kernel_barrier();	/* FIXME: Strong enough? */
 
 	sprintf (dbf_txt, "ret:%d", ret);
 	CIO_TRACE_EVENT (4, dbf_txt);
@@ -65,7 +65,7 @@ s390_unregister_adapter_interrupt (adapt
 		ret = -EINVAL;
 	else {
 		adapter_handler = NULL;
-		synchronize_kernel();
+		synchronize_kernel_barrier();	/* FIXME: Strong enough? */
 		ret = 0;
 	}
 	sprintf (dbf_txt, "ret:%d", ret);
diff -urpN -X dontdiff linux-2.6.11/include/linux/rcupdate.h linux-2.6.11.fixes/include/linux/rcupdate.h
--- linux-2.6.11/include/linux/rcupdate.h	Sat Mar 19 14:09:52 2005
+++ linux-2.6.11.fixes/include/linux/rcupdate.h	Sun Mar 20 09:24:20 2005
@@ -222,6 +222,8 @@ static inline int rcu_pending(int cpu)
  */
 #define rcu_read_unlock_bh()	local_bh_enable()
 
+#endif /* CONFIG_PREEMPT_RT */
+
 /**
  * rcu_dereference - fetch an RCU-protected pointer in an
  * RCU read-side critical section. This pointer may later
@@ -256,6 +258,22 @@ static inline int rcu_pending(int cpu)
 		(p) = (v); \
 	})
 
+#ifndef CONFIG_PREEMPT_RT
+
+/**
+ * synchronize_kernel_barrier - block until each CPU executes a
+ * context switch, appears in the idle loop, or otherwise exits
+ * kernel execution. This is synonymous with synchronize_kernel()
+ * in the classic RCU implementation, but not in some RCU
+ * implementations optimized for realtime use. In these realtime
+ * uses, synchronize_kernel() can potentially return immediately,
+ * even on SMP systems.
+ *
+ * NMI-related uses of RCU need to use synchronize_kernel_barrier().
+ */
+
+#define synchronize_kernel_barrier() synchronize_kernel()
+
 extern void rcu_init(void);
 extern void rcu_check_callbacks(int cpu, int user);
 extern void rcu_restart_cpu(int cpu);
@@ -275,40 +293,6 @@ extern void synchronize_kernel(void);
 #define rcu_bh_qsctr_inc(cpu)
 #define rcu_qsctr_inc(cpu)
 
-/**
- * rcu_dereference - fetch an RCU-protected pointer in an
- * RCU read-side critical section. This pointer may later
- * be safely dereferenced.
- *
- * Inserts memory barriers on architectures that require them
- * (currently only the Alpha), and, more importantly, documents
- * exactly which pointers are protected by RCU.
- */
-
-#define rcu_dereference(p)	({ \
-				typeof(p) _________p1 = p; \
-				smp_read_barrier_depends(); \
-				(_________p1); \
-				})
-
-/**
- * rcu_assign_pointer - assign (publicize) a pointer to a newly
- * initialized structure that will be dereferenced by RCU read-side
- * critical sections. Returns the value assigned.
- *
- * Inserts memory barriers on architectures that require them
- * (pretty much all of them other than x86), and also prevents
- * the compiler from reordering the code that initializes the
- * structure after the pointer assignment. More importantly, this
- * call documents which pointers will be dereferenced by RCU read-side
- * code.
- */
-
-#define rcu_assign_pointer(p, v)	({ \
-					smp_wmb(); \
-					(p) = (v); \
-					})
-
 extern void rcu_init(void);
 
 /* Exported interfaces */
@@ -317,6 +301,7 @@ extern void FASTCALL(call_rcu(struct rcu
 extern void rcu_read_lock(void);
 extern void rcu_read_unlock(void);
 extern void synchronize_kernel(void);
+extern void synchronize_kernel_barrier(void);
 extern int rcu_pending(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
diff -urpN -X dontdiff linux-2.6.11/kernel/module.c linux-2.6.11.fixes/kernel/module.c
--- linux-2.6.11/kernel/module.c	Sat Mar 19 14:09:51 2005
+++ linux-2.6.11.fixes/kernel/module.c	Sun Mar 20 09:13:23 2005
@@ -1812,7 +1812,7 @@ sys_init_module(void __user *umod,
 		/* Init routine failed: abort.  Try to protect us from
		   buggy refcounters. */
 		mod->state = MODULE_STATE_GOING;
-		synchronize_kernel();
+		synchronize_kernel_barrier();	/* FIXME: Strong enough? */
 		if (mod->unsafe)
 			printk(KERN_ERR "%s: module is now stuck!\n",
			       mod->name);
diff -urpN -X dontdiff linux-2.6.11/kernel/profile.c linux-2.6.11.fixes/kernel/profile.c
--- linux-2.6.11/kernel/profile.c	Sat Mar 19 14:09:51 2005
+++ linux-2.6.11.fixes/kernel/profile.c	Sun Mar 20 09:18:05 2005
@@ -194,7 +194,7 @@ void unregister_timer_hook(int (*hook)(s
 	WARN_ON(hook != timer_hook);
 	timer_hook = NULL;
 	/* make sure all CPUs see the NULL hook */
-	synchronize_kernel();
+	synchronize_kernel_barrier();	/* FIXME: Strong enough? */
 }
 EXPORT_SYMBOL_GPL(register_timer_hook);
diff -urpN -X dontdiff linux-2.6.11/kernel/rcupdate.c linux-2.6.11.fixes/kernel/rcupdate.c
--- linux-2.6.11/kernel/rcupdate.c	Sat Mar 19 14:09:51 2005
+++ linux-2.6.11.fixes/kernel/rcupdate.c	Sun Mar 20 09:32:13 2005
@@ -548,7 +548,37 @@ void synchronize_kernel(void)
 	}
 }
 
-void rcu_advance_callbacks(void)
+/*
+ * FIXME: Note that this implementation might not be strong enough
+ * for a number of driver uses of synchronize_kernel.  Some of these
+ * uses seem to assume a non-CONFIG_PREEMPT kernel, so may need
+ * to come up with a different approach.  Note that these uses
+ * are -not- waiting to free memory, but rather to ensure that
+ * a change is seen by all future driver invocations.
+ *
+ * The correct implementation is likely to be a tasklist scan,
+ * which blocks until all tasks encounter a voluntary context switch.
+ * If so, this implementation is required in CONFIG_PREEMPT
+ * kernels as well as CONFIG_PREEMPT_RT kernels.
+ */
+
+void synchronize_kernel_barrier(void)
+{
+	cpumask_t oldmask;
+	cpumask_t curmask;
+	int cpu;
+
+	if (sched_getaffinity(0, &oldmask) < 0) {
+		oldmask = cpu_possible_mask;
+	}
+	for_each_cpu(cpu) {
+		sched_setaffinity(0, cpumask_of_cpu(cpu));
+		schedule();
+	}
+	sched_setaffinity(0, oldmask);
+}
+
+static void rcu_advance_callbacks(void)
 {
 	struct rcu_data *rdp;
 
@@ -578,7 +608,7 @@ void fastcall call_rcu(struct rcu_head *
 	put_cpu_var(rcu_data);
 }
 
-void rcu_process_callbacks(void)
+static void rcu_process_callbacks(void)
 {
 	struct rcu_head *next, *list;
 	struct rcu_data *rdp;
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00
From: Ingo Molnar @ 2005-03-21 8:53 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: linux-kernel

got this early-bootup crash on an SMP box:

 BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000
  printing eip:
 c0131aec
 *pde = 00000000
 Oops: 0002 [#1]
 PREEMPT SMP
 Modules linked in:
 CPU:    1
 EIP:    0060:[<c0131aec>]    Not tainted VLI
 EFLAGS: 00010293   (2.6.12-rc1-RT-V0.7.41-00)
 EIP is at rcu_advance_callbacks+0x3c/0x80
 eax: 00000000   ebx: c050f280   ecx: c12191e0   edx: 00000000
 esi: cfd2e560   edi: cfd2e4e0   ebp: cfd31dd0   esp: cfd31dc8
 ds: 007b   es: 007b   ss: 0068   preempt: 00000003
 Process khelper (pid: 60, threadinfo=cfd30000 task=cfd106a0)
 Stack: 00000001 c12191e0 cfd31de4 c0131b67 00000001 cfd2e4d8 c13004d8 cfd31e00
        c017e449 cfd2e4d8 c04d6e80 cfd32006 fffffffe cfd31e54 cfd31e70 c01749cc
        cfd2e4d8 cfd31e50 cfd31e4c 00000001 cfd32001 cfd2e4d8 c03dd41f c04cf920
 Call Trace:
  [<c010412f>] show_stack+0x7f/0xa0 (28)
  [<c01042da>] show_registers+0x16a/0x1e0 (56)
  [<c0104511>] die+0x101/0x190 (64)
  [<c0115862>] do_page_fault+0x442/0x680 (216)
  [<c0103d9b>] error_code+0x2b/0x30 (68)
  [<c0131b67>] call_rcu+0x37/0x70 (20)
  [<c017e449>] dput+0x139/0x210 (28)
  [<c01749cc>] __link_path_walk+0x9fc/0xf80 (112)
  [<c0174f9a>] link_path_walk+0x4a/0x130 (100)
  [<c017538e>] path_lookup+0x9e/0x1c0 (32)
  [<c01707e8>] open_exec+0x28/0x100 (100)
  [<c0171a04>] do_execve+0x44/0x220 (36)
  [<c0101da2>] sys_execve+0x42/0xa0 (36)
  [<c0103315>] syscall_call+0x7/0xb (-8096)
 ---------------------------
 | preempt count: 00000004 ]
 | 4-level deep critical section nesting:
 ----------------------------------------
 .. [<c0131b4f>] .... call_rcu+0x1f/0x70
 .....[<c017e449>] ..   ( <= dput+0x139/0x210)
 .. [<c0131ac3>] .... rcu_advance_callbacks+0x13/0x80
 .....[<c0131b67>] ..   ( <= call_rcu+0x37/0x70)
 .. [<c03dddca>] .... _raw_spin_lock_irqsave+0x1a/0xa0
 .....[<c010444f>] ..   ( <= die+0x3f/0x190)
 .. [<c013b9e6>] .... print_traces+0x16/0x50
 .....[<c010412f>] ..   ( <= show_stack+0x7f/0xa0)
 Code: 00 00 e8 78 2d 0a 00 8b 0c 85 20 20 51 c0 bb 80 f2 50 c0 01 d9 f0 83 44 24 00 00 a1 88 19 52 c0 39 41 40 74 23 8b 41 44 8b 51 50 <89> 02 8b 41 48 c7 41 44 00 00 00 00 89 41 50 8d 41 44 89 41 48
 <6>note: khelper[60] exited with preempt_count 2

(gdb) list *0xc0131aec
0xc0131aec is in rcu_advance_callbacks (kernel/rcupdate.c:558).
553             struct rcu_data *rdp;
554
555             rdp = &get_cpu_var(rcu_data);
556             smp_mb();       /* prevent sampling batch # before list removal. */
557             if (rdp->batch != rcu_ctrlblk.batch) {
558                     *rdp->donetail = rdp->waitlist;
559                     rdp->donetail = rdp->waittail;
560                     rdp->waitlist = NULL;
561                     rdp->waittail = &rdp->waitlist;
562                     rdp->batch = rcu_ctrlblk.batch;
(gdb)
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-00 2005-03-21 8:53 ` Ingo Molnar @ 2005-03-21 9:01 ` Ingo Molnar 2005-03-21 9:06 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 Ingo Molnar 0 siblings, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-03-21 9:01 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel * Ingo Molnar <mingo@elte.hu> wrote: > got this early-bootup crash on an SMP box: the same kernel image boots fine on an UP box, so it's an SMP bug. note that the same occurs with your latest (synchronization barrier) fixes applied as well. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-21 9:01 ` Ingo Molnar @ 2005-03-21 9:06 ` Ingo Molnar 2005-03-21 23:10 ` Magnus Naeslund(t) 2005-03-22 5:43 ` Paul E. McKenney 0 siblings, 2 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-21 9:06 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel * Ingo Molnar <mingo@elte.hu> wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > got this early-bootup crash on an SMP box: > > the same kernel image boots fine on an UP box, so it's an SMP bug. > > note that the same occurs with your latest (synchronization barrier) > fixes applied as well. i've uploaded my current tree (-V0.7.41-01) to: http://redhat.com/~mingo/realtime-preempt/ it includes the latest round of RCU fixes - but doesnt solve the SMP bootup crash. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-21 9:06 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 Ingo Molnar @ 2005-03-21 23:10 ` Magnus Naeslund(t) 2005-03-22 5:40 ` Paul E. McKenney 2005-03-22 5:43 ` Paul E. McKenney 1 sibling, 1 reply; 92+ messages in thread From: Magnus Naeslund(t) @ 2005-03-21 23:10 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel Ingo Molnar wrote: > > i've uploaded my current tree (-V0.7.41-01) to: > > http://redhat.com/~mingo/realtime-preempt/ > > it includes the latest round of RCU fixes - but doesnt solve the SMP > bootup crash. > > Ingo With this kernel I can run for some 20 minutes, then the ip_dst_cache overflows. I gather it has something to do with RCU... If I just let it run and grep ip_dst_cache /proc/slab it goes up to 4096 (max) and then the printk's are starting, and the networks stops. After i up the limit to the double (echo "8192" > /proc/sys/net/ipv4/route/max_size) network starts to work again. But of course, after a while it overflows again: # grep ip_dst_cache /proc/slabinfo ip_dst_cache 8192 8205 256 15 1 : tunables 16 8 0 : slabdata 547 547 0 I haven't tried the vanilla 2.6.12-rc2 kernel, and I don't have the time to do that right now, but i figured it would have something to do with your patch. Older 2.6.8 works just fine. Regards, Magnus ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-21 23:10 ` Magnus Naeslund(t) @ 2005-03-22 5:40 ` Paul E. McKenney 2005-03-22 8:50 ` Ingo Molnar 2005-03-22 13:56 ` Magnus Naeslund(t) 0 siblings, 2 replies; 92+ messages in thread From: Paul E. McKenney @ 2005-03-22 5:40 UTC (permalink / raw) To: Magnus Naeslund(t); +Cc: Ingo Molnar, linux-kernel On Tue, Mar 22, 2005 at 12:10:14AM +0100, Magnus Naeslund(t) wrote: > Ingo Molnar wrote: > > > >i've uploaded my current tree (-V0.7.41-01) to: > > > > http://redhat.com/~mingo/realtime-preempt/ > > > >it includes the latest round of RCU fixes - but doesnt solve the SMP > >bootup crash. > > > > Ingo > > With this kernel I can run for some 20 minutes, then the ip_dst_cache > overflows. > I gather it has something to do with RCU... > > If I just let it run and grep ip_dst_cache /proc/slab it goes up to 4096 > (max) and then the printk's are starting, and the networks stops. > After i up the limit to the double (echo "8192" > > /proc/sys/net/ipv4/route/max_size) network starts to work again. > But of course, after a while it overflows again: > > # grep ip_dst_cache /proc/slabinfo > ip_dst_cache 8192 8205 256 15 1 : tunables 16 8 > 0 : slabdata 547 547 0 > > I haven't tried the vanilla 2.6.12-rc2 kernel, and I don't have the time > to do that right now, but i figured it would have something to do with > your patch. Older 2.6.8 works just fine. Hello, Magnus, I believe that my earlier patch might take care of this (included again for convenience). 
Thanx, Paul Signed-off-by: <paulmck@us.ibm.com> diff -urpN -X dontdiff linux-2.6.11.fixes/kernel/rcupdate.c linux-2.6.11.fixes2/kernel/rcupdate.c --- linux-2.6.11.fixes/kernel/rcupdate.c Mon Mar 21 08:14:47 2005 +++ linux-2.6.11.fixes2/kernel/rcupdate.c Mon Mar 21 08:17:00 2005 @@ -620,7 +620,7 @@ static void rcu_process_callbacks(void) return; } rdp->donelist = NULL; - rdp->donetail = &rdp->waitlist; + rdp->donetail = &rdp->donelist; put_cpu_var(rcu_data); while (list) { next = list->next; ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-22 5:40 ` Paul E. McKenney @ 2005-03-22 8:50 ` Ingo Molnar 2005-03-22 13:56 ` Magnus Naeslund(t) 1 sibling, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-22 8:50 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Magnus Naeslund(t), linux-kernel * Paul E. McKenney <paulmck@us.ibm.com> wrote: > return; > } > rdp->donelist = NULL; > - rdp->donetail = &rdp->waitlist; > + rdp->donetail = &rdp->donelist; > put_cpu_var(rcu_data); > while (list) { > next = list->next; seems like the RCU code should use list.h primitives, to avoid bugs like this and to make the code more readable. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-22 5:40 ` Paul E. McKenney 2005-03-22 8:50 ` Ingo Molnar @ 2005-03-22 13:56 ` Magnus Naeslund(t) 2005-03-23 5:46 ` Paul E. McKenney 1 sibling, 1 reply; 92+ messages in thread From: Magnus Naeslund(t) @ 2005-03-22 13:56 UTC (permalink / raw) To: paulmck; +Cc: Ingo Molnar, linux-kernel Paul E. McKenney wrote: > > Hello, Magnus, > > I believe that my earlier patch might take care of this (included again > for convenience). > > Thanx, Paul > I just tested this patch, and it did not solve my problem. The dst cache still grows to the maximum and starts spitting errors via printk. I'll do a make mrproper build too to make sure I didn't make any mistakes. Regards, Magnus ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-22 13:56 ` Magnus Naeslund(t) @ 2005-03-23 5:46 ` Paul E. McKenney 0 siblings, 0 replies; 92+ messages in thread From: Paul E. McKenney @ 2005-03-23 5:46 UTC (permalink / raw) To: Magnus Naeslund(t); +Cc: Ingo Molnar, linux-kernel On Tue, Mar 22, 2005 at 02:56:56PM +0100, Magnus Naeslund(t) wrote: > Paul E. McKenney wrote: > > > >Hello, Magnus, > > > >I believe that my earlier patch might take care of this (included again > >for convenience). > > > > Thanx, Paul > > > > I just tested this patch, and it did not solve my problem. > The dst cache still grows to the maximum and starts spitting errors via > printk. > > I'll do a make mrproper build too to make sure I didn't make any mistakes. Hello, Magnus, I have at least two other bugs that would be fatal. Having learned a bit more about PREEMPT_RT in this thread, I need to go off and dig through my code for similar problems. Thanx, Paul ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-21 9:06 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 Ingo Molnar 2005-03-21 23:10 ` Magnus Naeslund(t) @ 2005-03-22 5:43 ` Paul E. McKenney 2005-03-22 7:24 ` Ingo Molnar 2005-03-22 8:59 ` Ingo Molnar 1 sibling, 2 replies; 92+ messages in thread From: Paul E. McKenney @ 2005-03-22 5:43 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel On Mon, Mar 21, 2005 at 10:06:22AM +0100, Ingo Molnar wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > > > * Ingo Molnar <mingo@elte.hu> wrote: > > > > > got this early-bootup crash on an SMP box: > > > > the same kernel image boots fine on an UP box, so it's an SMP bug. > > > > note that the same occurs with your latest (synchronization barrier) > > fixes applied as well. > > i've uploaded my current tree (-V0.7.41-01) to: > > http://redhat.com/~mingo/realtime-preempt/ > > it includes the latest round of RCU fixes - but doesnt solve the SMP > bootup crash. Hello, Ingo, Does the following help with the SMP problem? This fix and the earlier one make my old patch survive a few rounds of kernbench on a 4-CPU x86 box. (Yes, I am still being cowardly! But happy that the test system is alive again!) Without these fixes, it too dies during boot. Thanx, Paul diff -urpN -X dontdiff linux-2.6.11.fixes2/kernel/rcupdate.c linux-2.6.11.fixes3/kernel/rcupdate.c --- linux-2.6.11.fixes2/kernel/rcupdate.c Mon Mar 21 08:17:00 2005 +++ linux-2.6.11.fixes3/kernel/rcupdate.c Mon Mar 21 20:00:00 2005 @@ -633,7 +633,7 @@ void rcu_check_callbacks(int cpu, int us { if ((unsigned long)(jiffies - rcu_ctrlblk.last_sk) > HZ/GRACE_PERIODS_PER_SEC) { - synchronize_kernel(); + _synchronize_kernel(); rcu_advance_callbacks(); rcu_process_callbacks(); } ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-22 5:43 ` Paul E. McKenney @ 2005-03-22 7:24 ` Ingo Molnar 2005-03-22 9:23 ` Ingo Molnar 2005-03-22 8:59 ` Ingo Molnar 1 sibling, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-03-22 7:24 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel * Paul E. McKenney <paulmck@us.ibm.com> wrote: > > it includes the latest round of RCU fixes - but doesnt solve the SMP > > bootup crash. > > Hello, Ingo, > > Does the following help with the SMP problem? This fix and the > earlier one make my old patch survive a few rounds of kernbench on a > 4-CPU x86 box. [...] does not seem to fix my testbox (see the crash log below). I've uploaded the 40-02 patch with both of your fixes (maybe i mismerged something somewhere). Does it boot on your box with PREEMPT_RT enabled? The patch order is: http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.tar.bz2 http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.12-rc1.bz2 http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.12-rc1-V0.7.41-02 Ingo NET: Registered protocol family 16 BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000 printing eip: c0131bcc *pde = 00000000 Oops: 0002 [#1] PREEMPT SMP Modules linked in: CPU: 1 EIP: 0060:[<c0131bcc>] Not tainted VLI EFLAGS: 00010297 (2.6.12-rc1-RT-V0.7.41-01) EIP is at rcu_advance_callbacks+0x3c/0x80 eax: 00000000 ebx: c050f280 ecx: c12191e0 edx: 00000000 esi: c1341c64 edi: c1341be4 ebp: c1355dd0 esp: c1355dc8 ds: 007b es: 007b ss: 0068 preempt: 00000003 Process khelper (pid: 17, threadinfo=c1354000 task=c13538d0) Stack: 00000001 c12191e0 c1355de4 c0131c47 00000001 c1341bdc c13004d8 c1355e00 c017e529 c1341bdc c04d6e80 c1358006 fffffffe c1355e54 c1355e70 c0174aac c1341bdc c1355e50 c1355e4c c1355e54 c1358001 c1341bdc c04cf920 00000286 Call Trace: [<c010412f>] show_stack+0x7f/0xa0 (28) [<c01042da>] show_registers+0x16a/0x1e0 (56) [<c0104511>] die+0x101/0x190 (64) 
[<c0115862>] do_page_fault+0x442/0x680 (216) [<c0103d9b>] error_code+0x2b/0x30 (68) [<c0131c47>] call_rcu+0x37/0x70 (20) [<c017e529>] dput+0x139/0x210 (28) [<c0174aac>] __link_path_walk+0x9fc/0xf80 (112) [<c017507a>] link_path_walk+0x4a/0x130 (100) [<c017546e>] path_lookup+0x9e/0x1c0 (32) [<c01708c8>] open_exec+0x28/0x100 (100) [<c0171ae4>] do_execve+0x44/0x220 (36) [<c0101da2>] sys_execve+0x42/0xa0 (36) [<c0103315>] syscall_call+0x7/0xb (-8096) --------------------------- | preempt count: 00000004 ] | 4-level deep critical section nesting: ---------------------------------------- .. [<c0131c2f>] .... call_rcu+0x1f/0x70 .....[<c017e529>] .. ( <= dput+0x139/0x210) .. [<c0131ba3>] .... rcu_advance_callbacks+0x13/0x80 .....[<c0131c47>] .. ( <= call_rcu+0x37/0x70) .. [<c03ddeaa>] .... _raw_spin_lock_irqsave+0x1a/0xa0 .....[<c010444f>] .. ( <= die+0x3f/0x190) .. [<c013bac6>] .... print_traces+0x16/0x50 .....[<c010412f>] .. ( <= show_stack+0x7f/0xa0) Code: 00 00 e8 78 2d 0a 00 8b 0c 85 20 20 51 c0 bb 80 f2 50 c0 01 d9 f0 83 44 24 00 00 a1 88 19 52 c0 39 41 40 74 23 8b 41 44 8b 51 50 <89> 02 8b 41 48 c7 41 44 00 00 00 00 89 41 50 8d 41 44 89 41 48 (gdb) list *0xc0131bcc 0xc0131bcc is in rcu_advance_callbacks (kernel/rcupdate.c:586). 581 struct rcu_data *rdp; 582 583 rdp = &get_cpu_var(rcu_data); 584 smp_mb(); /* prevent sampling batch # before list removal. */ 585 if (rdp->batch != rcu_ctrlblk.batch) { 586 *rdp->donetail = rdp->waitlist; 587 rdp->donetail = rdp->waittail; 588 rdp->waitlist = NULL; 589 rdp->waittail = &rdp->waitlist; 590 rdp->batch = rcu_ctrlblk.batch; (gdb) ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-22 7:24 ` Ingo Molnar @ 2005-03-22 9:23 ` Ingo Molnar 2005-03-22 9:32 ` Ingo Molnar 0 siblings, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-03-22 9:23 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel * Ingo Molnar <mingo@elte.hu> wrote: > > Does the following help with the SMP problem? This fix and the > > earlier one make my old patch survive a few rounds of kernbench on a > > 4-CPU x86 box. [...] > > does not seem to fix my testbox (see the crash log below). [...] seems to be a true SMP race: when i boot with 1 CPU it doesnt trigger, the same kernel image and 2 CPUs triggers it on CPU#1. (CPU#0 is the boot CPU) Note that the timing of the crash is not deterministic (sometimes i get it during net startup, sometimes during ACPI startup), but it always crashes within rcu_advance_callbacks(). one difference between your tests and mine is that your kernel is doing _synchronize_kernel() from preempt-off sections (correct?), while my kernel with PREEMPT_RT does it on preemptable sections. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-22 9:23 ` Ingo Molnar @ 2005-03-22 9:32 ` Ingo Molnar 2005-03-22 10:01 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-05 Ingo Molnar 2005-03-23 4:48 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 Paul E. McKenney 0 siblings, 2 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-22 9:32 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel * Ingo Molnar <mingo@elte.hu> wrote: > seems to be a true SMP race: when i boot with 1 CPU it doesnt trigger, > the same kernel image and 2 CPUs triggers it on CPU#1. (CPU#0 is the > boot CPU) Note that the timing of the crash is not deterministic > (sometimes i get it during net startup, sometimes during ACPI > startup), but it always crashes within rcu_advance_callbacks(). > > one difference between your tests and mine is that your kernel is > doing _synchronize_kernel() from preempt-off sections (correct?), > while my kernel with PREEMPT_RT does it on preemptable sections. hm, another thing: i think call_rcu() needs to take the read-lock. Right now it assumes that it has the data structure private, but that's only statistically true on PREEMPT_RT: another CPU may have this CPU's RCU control structure in use. So IRQs-off (or preempt-off) is not a guarantee to have the data structure, the read lock has to be taken. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-05 2005-03-22 9:32 ` Ingo Molnar @ 2005-03-22 10:01 ` Ingo Molnar 2005-03-22 11:28 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 Ingo Molnar 2005-03-23 5:23 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-05 Paul E. McKenney 2005-03-23 4:48 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 Paul E. McKenney 1 sibling, 2 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-22 10:01 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel * Ingo Molnar <mingo@elte.hu> wrote: > hm, another thing: i think call_rcu() needs to take the read-lock. > Right now it assumes that it has the data structure private, but > that's only statistically true on PREEMPT_RT: another CPU may have > this CPU's RCU control structure in use. So IRQs-off (or preempt-off) > is not a guarantee to have the data structure, the read lock has to be > taken. i've reworked the code to use the read-lock to access the per-CPU data RCU structures, and it boots with 2 CPUs and PREEMPT_RT now. The -40-05 patch can be downloaded from the usual place: http://redhat.com/~mingo/realtime-preempt/ had to add two hacks though: static void rcu_advance_callbacks(struct rcu_data *rdp) { if (rdp->batch != rcu_ctrlblk.batch) { if (rdp->donetail) // HACK *rdp->donetail = rdp->waitlist; ... void fastcall call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu)) [...] rcu_advance_callbacks(rdp); if (rdp->waittail) // HACK *rdp->waittail = head; ... without them it crashes during bootup. maybe we are better off with the completely unlocked read path and the long grace periods. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-22 10:01 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-05 Ingo Molnar @ 2005-03-22 11:28 ` Ingo Molnar 2005-03-22 15:06 ` K.R. Foley ` (2 more replies) 2005-03-23 5:23 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-05 Paul E. McKenney 1 sibling, 3 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-22 11:28 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel, Esben Nielsen * Ingo Molnar <mingo@elte.hu> wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > hm, another thing: i think call_rcu() needs to take the read-lock. > > Right now it assumes that it has the data structure private, but > > that's only statistically true on PREEMPT_RT: another CPU may have > > this CPU's RCU control structure in use. So IRQs-off (or preempt-off) > > is not a guarantee to have the data structure, the read lock has to be > > taken. > > i've reworked the code to use the read-lock to access the per-CPU data > RCU structures, and it boots with 2 CPUs and PREEMPT_RT now. The > -40-05 patch can be downloaded from the usual place: bah, it's leaking dentries at a massive scale. I'm giving up on this variant for the time being and have gone towards a much simpler variant, implemented in the -40-07 patch at: http://redhat.com/~mingo/realtime-preempt/ it's along the lines of Esben's patch, but with the conceptual bug fixed via the rcu_read_lock_nesting code from Paul's patch. there's a new CONFIG_PREEMPT_RCU option. (always-enabled on PREEMPT_RT) It builds & boots fine on my 2-way box, doesnt leak dentries and networking is up and running. first question, (ignoring the grace priod problem) is this a correct RCU implementation? The critical scenario is when a task gets migrated to another CPU, so that current->rcu_data is that of another CPU's. That is why ->active_readers is an atomic variable now. 
[ Note that while ->active_readers may be decreased from another CPU, it's always increased on the current CPU, so when a preemption-off section determines that a quiescent state has passed that determination stays true up until it enables preemption again. This is needed for correct callback processing. ] this implementation has the 'long grace periods' problem. Starvation should only be possible if a system has zero idle time for a long period of time, and even then it needs the permanent starvation of involuntarily preempted rcu-read-locked tasks. Is there any way to force such a situation? (which would turn this into a DoS) [ in OOM situations we could force quiescent state by walking all tasks and checking for nonzero ->rcu_read_lock_nesting values and priority boosting affected tasks (to RT prio 99 or RT prio 1), which they'd automatically drop when they decrease their rcu_read_lock_nesting counter to zero. ] Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-22 11:28 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 Ingo Molnar @ 2005-03-22 15:06 ` K.R. Foley 2005-03-22 18:05 ` Magnus Naeslund(t) 2005-03-23 6:16 ` Paul E. McKenney 2 siblings, 0 replies; 92+ messages in thread From: K.R. Foley @ 2005-03-22 15:06 UTC (permalink / raw) To: Ingo Molnar; +Cc: Paul E. McKenney, linux-kernel, Esben Nielsen Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > >>* Ingo Molnar <mingo@elte.hu> wrote: >> >> >>>hm, another thing: i think call_rcu() needs to take the read-lock. >>>Right now it assumes that it has the data structure private, but >>>that's only statistically true on PREEMPT_RT: another CPU may have >>>this CPU's RCU control structure in use. So IRQs-off (or preempt-off) >>>is not a guarantee to have the data structure, the read lock has to be >>>taken. >> >>i've reworked the code to use the read-lock to access the per-CPU data >>RCU structures, and it boots with 2 CPUs and PREEMPT_RT now. The >>-40-05 patch can be downloaded from the usual place: > > > bah, it's leaking dentries at a massive scale. I'm giving up on this > variant for the time being and have gone towards a much simpler variant, > implemented in the -40-07 patch at: > > http://redhat.com/~mingo/realtime-preempt/ > Actually this is more correctly related to -RT-2.6.12-rc1-V0.7.41-08. This one boots on one of my SMP systems (dual 2.6 GHz Zeon) doesn't on the other (dual PIII 933 MHz). Unfortunately not close to the one that doesn't boot right now so can't debug it at all. Was still building on the UP box, but now that one seems to have gone down in flames. This system was running -RT-2.6.12-rc1-V0.7.41-00 while trying to build -RT-2.6.12-rc1-V0.7.41-07. :( -- kr ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-22 11:28 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 Ingo Molnar 2005-03-22 15:06 ` K.R. Foley @ 2005-03-22 18:05 ` Magnus Naeslund(t) 2005-03-23 6:16 ` Paul E. McKenney 2 siblings, 0 replies; 92+ messages in thread From: Magnus Naeslund(t) @ 2005-03-22 18:05 UTC (permalink / raw) To: Ingo Molnar, linux-kernel Ingo Molnar wrote: > > bah, it's leaking dentries at a massive scale. I'm giving up on this > variant for the time being and have gone towards a much simpler variant, > implemented in the -40-07 patch at: > > http://redhat.com/~mingo/realtime-preempt/ > I downloaded your V0.7.41-08 patch and it seems to have solved my problem, atleast at a first glance. The routing cache is now shrinkable with: echo "0" > /proc/sys/net/ipv4/route/flush That didn't work before. I'll leave the server for a while and see if it overflows again... Regards, Magnus ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-22 11:28 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 Ingo Molnar 2005-03-22 15:06 ` K.R. Foley 2005-03-22 18:05 ` Magnus Naeslund(t) @ 2005-03-23 6:16 ` Paul E. McKenney 2005-03-23 6:33 ` Ingo Molnar 2005-03-23 9:40 ` Herbert Xu 2 siblings, 2 replies; 92+ messages in thread From: Paul E. McKenney @ 2005-03-23 6:16 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Esben Nielsen On Tue, Mar 22, 2005 at 12:28:56PM +0100, Ingo Molnar wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > > > * Ingo Molnar <mingo@elte.hu> wrote: > > > > > hm, another thing: i think call_rcu() needs to take the read-lock. > > > Right now it assumes that it has the data structure private, but > > > that's only statistically true on PREEMPT_RT: another CPU may have > > > this CPU's RCU control structure in use. So IRQs-off (or preempt-off) > > > is not a guarantee to have the data structure, the read lock has to be > > > taken. > > > > i've reworked the code to use the read-lock to access the per-CPU data > > RCU structures, and it boots with 2 CPUs and PREEMPT_RT now. The > > -40-05 patch can be downloaded from the usual place: > > bah, it's leaking dentries at a massive scale. I'm giving up on this > variant for the time being and have gone towards a much simpler variant, > implemented in the -40-07 patch at: > > http://redhat.com/~mingo/realtime-preempt/ > > it's along the lines of Esben's patch, but with the conceptual bug fixed > via the rcu_read_lock_nesting code from Paul's patch. Fair enough! > there's a new CONFIG_PREEMPT_RCU option. (always-enabled on PREEMPT_RT) > It builds & boots fine on my 2-way box, doesnt leak dentries and > networking is up and running. > > first question, (ignoring the grace priod problem) is this a correct RCU > implementation? The critical scenario is when a task gets migrated to > another CPU, so that current->rcu_data is that of another CPU's. 
That is > why ->active_readers is an atomic variable now. [ Note that while > ->active_readers may be decreased from another CPU, it's always > increased on the current CPU, so when a preemption-off section > determines that a quiescent state has passed that determination stays > true up until it enables preemption again. This is needed for correct > callback processing. ] +#ifdef CONFIG_PREEMPT_RCU + +void rcu_read_lock(void) +{ + if (current->rcu_read_lock_nesting++ == 0) { + current->rcu_data = &get_cpu_var(rcu_data); + atomic_inc(&current->rcu_data->active_readers); + put_cpu_var(rcu_data); Need an smp_mb() here for non-x86 CPUs. Otherwise, the CPU can re-order parts of the critical section to precede the rcu_read_lock(). Could precede the put_cpu_var(), but why increase latency? + } +} +EXPORT_SYMBOL(rcu_read_lock); + +void rcu_read_unlock(void) +{ + int cpu; + + if (--current->rcu_read_lock_nesting == 0) { + atomic_dec(&current->rcu_data->active_readers); + /* + * Check whether we have reached quiescent state. + * Note! This is only for the local CPU, not for + * current->rcu_data's CPU [which typically is the + * current CPU, but may also be another CPU]. + */ + cpu = get_cpu(); And need an smp_mb() here, again for non-x86 CPUs. + rcu_qsctr_inc(cpu); + put_cpu(); + } +} +EXPORT_SYMBOL(rcu_read_unlock); + +#endif Assuming that the memory barriers are added, I can see a bunch of ways for races to extend grace periods, but none so far that result in the fatal too-short grace period. Since rcu_qsctr_inc() refuses to increment the quiescent-state counter on any CPU that started an RCU read-side critical section that has not yet completed, any long critical section will have a corresponding CPU that will refuse to go through a quiescent state. And that will prevent the grace period from completing. > this implementation has the 'long grace periods' problem. 
Starvation > should only be possible if a system has zero idle time for a long period > of time, and even then it needs the permanent starvation of > involuntarily preempted rcu-read-locked tasks. Is there any way to force > such a situation? (which would turn this into a DoS) If the sum of all the RCU read-side critical sections consumed much more than 1/N of the CPU time, where N is the number of CPUs, then you would see infinite grace periods just by statistics. But my guess is that you would be in the tens of CPUs before hitting this. That said, I have never measured this, since there has not been a reason to care. > [ in OOM situations we could force quiescent state by walking all tasks > and checking for nonzero ->rcu_read_lock_nesting values and priority > boosting affected tasks (to RT prio 99 or RT prio 1), which they'd > automatically drop when they decrease their rcu_read_lock_nesting > counter to zero. ] Makes sense to me. If there are too-long RCU read-side critical sections, they could cause other tasks to miss deadlines once boosted. Of course, if RCU read-side critical sections are all short enough, you would already just be disabling preemption across all of them. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-23 6:16 ` Paul E. McKenney @ 2005-03-23 6:33 ` Ingo Molnar 2005-03-23 6:37 ` Ingo Molnar ` (3 more replies) 2005-03-23 9:40 ` Herbert Xu 1 sibling, 4 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-23 6:33 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel, Esben Nielsen * Paul E. McKenney <paulmck@us.ibm.com> wrote: > +#ifdef CONFIG_PREEMPT_RCU > + > +void rcu_read_lock(void) > +{ > + if (current->rcu_read_lock_nesting++ == 0) { > + current->rcu_data = &get_cpu_var(rcu_data); > + atomic_inc(¤t->rcu_data->active_readers); > + put_cpu_var(rcu_data); > > Need an smp_mb() here for non-x86 CPUs. Otherwise, the CPU can > re-order parts of the critical section to precede the rcu_read_lock(). > Could precede the put_cpu_var(), but why increase latency? ok. It's enough to put a barrier into the else branch here, because the atomic op in the main brain is a barrier by itself. > +void rcu_read_unlock(void) > +{ [...] > And need an smp_mb() here, again for non-x86 CPUs. ok. > Assuming that the memory barriers are added, I can see a bunch of ways > for races to extend grace periods, but none so far that result in the > fatal too-short grace period. Since rcu_qsctr_inc() refuses to > increment the quiescent-state counter on any CPU that started an RCU > read-side critical section that has not yet completed, any long > critical section will have a corresponding CPU that will refuse to go > through a quiescent state. And that will prevent the grace period > from completing. i'm worried about the following scenario: what happens when a task is migrated from CPU#1 to CPU#2, while in an RCU read section that it acquired on CPU#1, and queues a callback. E.g. d_free() does a call_rcu(), to queue the freeing of the dentry. That callback will be queued on CPU#2 - while the task still keeps current->rcu_data of CPU#1. 
It also means that CPU#2's read counter did _not_ get increased - and a too short grace period may occur. it seems to me that that only safe method is to pick an 'RCU CPU' when first entering the read section, and then sticking to it, no matter where the task gets migrated to. Or to 'migrate' the +1 read count from one CPU to the other, within the scheduler. the 'migrate read count' solution seems more promising, as it would keep other parts of the RCU code unchanged. [ But it seems to break the nice 'flip pointers' method you found to force a grace period. If a 'read section' can migrate from one CPU to another then it can migrate back as well, at which point it cannot have the 'old' pointer. Maybe it would still work better than no flip pointers. ] Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-23 6:33 ` Ingo Molnar @ 2005-03-23 6:37 ` Ingo Molnar 2005-03-24 6:06 ` Paul E. McKenney 2005-03-23 7:16 ` Ingo Molnar ` (2 subsequent siblings) 3 siblings, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-03-23 6:37 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel, Esben Nielsen * Ingo Molnar <mingo@elte.hu> wrote: > the 'migrate read count' solution seems more promising, as it would > keep other parts of the RCU code unchanged. [ But it seems to break > the nice 'flip pointers' method you found to force a grace period. If > a 'read section' can migrate from one CPU to another then it can > migrate back as well, at which point it cannot have the 'old' pointer. > Maybe it would still work better than no flip pointers. ] the flip pointer method could be made to work if we had a NR_CPUS array of 'current RCU pointer' values attached to the task - and that array would be cleared if the task exits the read section. But this has memory usage worries with large NR_CPUS. (full clearing of the array can be avoided by using some sort of 'read section generation' counter attached to each pointer) Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  6:37 ` Ingo Molnar
@ 2005-03-24  6:06 ` Paul E. McKenney
  0 siblings, 0 replies; 92+ messages in thread
From: Paul E. McKenney @ 2005-03-24 6:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Esben Nielsen

On Wed, Mar 23, 2005 at 07:37:27AM +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > the 'migrate read count' solution seems more promising, as it would
> > keep other parts of the RCU code unchanged. [ But it seems to break
> > the nice 'flip pointers' method you found to force a grace period. If
> > a 'read section' can migrate from one CPU to another then it can
> > migrate back as well, at which point it cannot have the 'old' pointer.
> > Maybe it would still work better than no flip pointers. ]
> 
> the flip pointer method could be made to work if we had a NR_CPUS array
> of 'current RCU pointer' values attached to the task - and that array
> would be cleared if the task exits the read section. But this has memory
> usage worries with large NR_CPUS. (full clearing of the array can be
> avoided by using some sort of 'read section generation' counter attached
> to each pointer)

The only per-task data you need to maintain is a single pointer to the
per-CPU counter that was incremented by the outermost rcu_read_lock().
You also need a global index, call it "rcu_current_ctr_set" (or come up
with a better name!) along with a per-CPU pair of counters:

	struct rcu_ctr_set {	 /* or maybe there is a way to drop array */
		atomic_t ctr[2]; /* into DECLARE_PER_CPU() without a struct */
	};			 /* wrapper... */

	int rcu_curset = 0;
	DEFINE_PER_CPU(struct rcu_ctr_set, rcu_ctr_set) = { 0, 0 };

You need two fields in the task structure:

	atomic_t *rcu_preempt_ctr;
	int rcu_nesting;

Then you have something like:

	void rcu_read_lock(void)
	{
		if (current->rcu_nesting++ == 0) {
			preempt_disable();
			current->rcu_preempt_ctr =
				&__get_cpu_var(rcu_ctr_set).ctr[rcu_curset];
			atomic_inc(current->rcu_preempt_ctr);
			preempt_enable();
			smp_mb();
		}
	}

	void rcu_read_unlock(void)
	{
		if (--current->rcu_nesting == 0) {
			smp_mb();	/* might only need smp_wmb()... */
			atomic_dec(current->rcu_preempt_ctr);
			current->rcu_preempt_ctr = NULL; /* for debug */
		}
	}

One can then force a grace period via something like the following, but
only if you know that all of the rcu_ctr_set.ctr[!rcu_curset] are zero:

	void _synchronize_kernel(void)
	{
		int cpu;

		spin_lock(&rcu_mutex);
		rcu_curset = !rcu_curset;
		for_each_cpu(cpu) {
			while (atomic_read(&per_cpu(rcu_ctr_set, cpu).ctr[!rcu_curset]) != 0)
				schedule();	/* yield CPU for a bit */
		}
		spin_unlock(&rcu_mutex);
	}

In real life, you need a way to multiplex multiple calls to
_synchronize_kernel() into a single counter-flip event, by setting up
callbacks. And so on...

The above stolen liberally from your patch and from my memories of
Dipankar's RCU patch for CONFIG_PREEMPT kernels. Guaranteed to have
horrible bugs. ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 92+ messages in thread
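[Editor's note: the two-set counter scheme above can be modelled in plain
userspace C. The sketch below is hypothetical - names are invented, plain
ints stand in for atomic_t, and there is no real per-CPU data or memory
ordering, so it only illustrates the bookkeeping: a reader pins itself to
the counter set it saw at lock time, and a grace period may end only once
every CPU's counter for the previous set has drained to zero.]

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 2

/* Per-CPU pair of reader counters, indexed by the current "set". */
static int ctr[NCPUS][2];
static int cur_set;			/* plays the role of rcu_curset */

struct task {
	int *reader_ctr;		/* plays rcu_preempt_ctr */
	int nesting;			/* plays rcu_nesting */
};

static void read_lock(struct task *t, int cpu)
{
	if (t->nesting++ == 0) {
		/* Pin to the set that was current on entry. */
		t->reader_ctr = &ctr[cpu][cur_set];
		(*t->reader_ctr)++;
	}
}

static void read_unlock(struct task *t)
{
	if (--t->nesting == 0) {
		(*t->reader_ctr)--;
		t->reader_ctr = NULL;	/* for debug */
	}
}

/* A grace period may end only when every CPU's counter for the
 * previous (flipped-away-from) set has drained to zero. */
static int old_set_drained(void)
{
	int cpu;

	for (cpu = 0; cpu < NCPUS; cpu++)
		if (ctr[cpu][!cur_set] != 0)
			return 0;
	return 1;
}
```

A reader that started before the flip keeps the old set non-zero until it
finishes, so the grace period is held open for exactly the readers that
predate it; readers that start after the flip increment the new set and do
not delay the current grace period.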
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  6:33 ` Ingo Molnar
  2005-03-23  6:37 ` Ingo Molnar
@ 2005-03-23  7:16 ` Ingo Molnar
  2005-03-23  7:54 ` Steven Rostedt
                    ` (3 more replies)
  2005-03-23  9:38 ` Herbert Xu
  2005-03-24  5:28 ` Paul E. McKenney
  3 siblings, 4 replies; 92+ messages in thread
From: Ingo Molnar @ 2005-03-23 7:16 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, Esben Nielsen

* Ingo Molnar <mingo@elte.hu> wrote:

> That callback will be queued on CPU#2 - while the task still keeps
> current->rcu_data of CPU#1. It also means that CPU#2's read counter
> did _not_ get increased - and a too short grace period may occur.
> 
> it seems to me that the only safe method is to pick an 'RCU CPU' when
> first entering the read section, and then sticking to it, no matter
> where the task gets migrated to. Or to 'migrate' the +1 read count
> from one CPU to the other, within the scheduler.

i think the 'migrate read-count' method is not adequate either, because
all callbacks queued within an RCU read section must be called after the
lock has been dropped - while with the migration method CPU#1 would be
free to process callbacks queued in the RCU read section still active on
CPU#2.

i'm wondering how much of a problem this is though. Can there be stale
pointers at that point? Yes in theory, because code like:

	rcu_read_lock();
	call_rcu(&dentry->d_rcu, d_callback);
	func(dentry->whatever);
	rcu_read_unlock();

would be unsafe because the pointer is still accessed within the RCU
read section, and if we get migrated from CPU#1 to CPU#2 after call_rcu
but before the dentry->whatever dereference, the callback may be
processed early by CPU#1, making the dentry->whatever read operation
unsafe.

the question is, does this occur in practice? Does existing RCU-using
code use pointers it has queued for freeing, relying on the fact that
the callback won't be processed until we drop the RCU read lock?
Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  7:16 ` Ingo Molnar
@ 2005-03-23  7:54 ` Steven Rostedt
  2005-03-23  7:58 ` Ingo Molnar
  2005-03-23  8:29 ` Peter Zijlstra
                    ` (2 subsequent siblings)
  3 siblings, 1 reply; 92+ messages in thread
From: Steven Rostedt @ 2005-03-23 7:54 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Paul E. McKenney, linux-kernel, Esben Nielsen

On Wed, 23 Mar 2005, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > That callback will be queued on CPU#2 - while the task still keeps
> > current->rcu_data of CPU#1. It also means that CPU#2's read counter
> > did _not_ get increased - and a too short grace period may occur.
> > 
> > it seems to me that the only safe method is to pick an 'RCU CPU' when
> > first entering the read section, and then sticking to it, no matter
> > where the task gets migrated to. Or to 'migrate' the +1 read count
> > from one CPU to the other, within the scheduler.
> 
> i think the 'migrate read-count' method is not adequate either, because
> all callbacks queued within an RCU read section must be called after the
> lock has been dropped - while with the migration method CPU#1 would be
> free to process callbacks queued in the RCU read section still active on
> CPU#2.

Hi Ingo,

Although you can't disable preemption for the duration of the
rcu_read_lock, what about pinning the process to a CPU while it has the
lock. Would this help solve the migration issue?

-- Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  7:54 ` Steven Rostedt
@ 2005-03-23  7:58 ` Ingo Molnar
  0 siblings, 0 replies; 92+ messages in thread
From: Ingo Molnar @ 2005-03-23 7:58 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Paul E. McKenney, linux-kernel, Esben Nielsen

* Steven Rostedt <rostedt@goodmis.org> wrote:

> > i think the 'migrate read-count' method is not adequate either, because
> > all callbacks queued within an RCU read section must be called after the
> > lock has been dropped - while with the migration method CPU#1 would be
> > free to process callbacks queued in the RCU read section still active on
> > CPU#2.
> 
> Although you can't disable preemption for the duration of the
> rcu_read_lock, what about pinning the process to a CPU while it has the
> lock. Would this help solve the migration issue?

yes, but that's rather gross. PREEMPT_BKL was opposed by Linus precisely
because it used such a method - once that was fixed, PREEMPT_BKL got
accepted. It also limits the execution flow quite a bit. I'd rather not
do it if there's any other method.

	Ingo

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  7:16 ` Ingo Molnar
  2005-03-23  7:54 ` Steven Rostedt
@ 2005-03-23  8:29 ` Peter Zijlstra
  2005-03-23  9:03 ` Ingo Molnar
  2005-03-23 21:46 ` Ingo Molnar
  2005-03-24  6:38 ` Paul E. McKenney
  3 siblings, 1 reply; 92+ messages in thread
From: Peter Zijlstra @ 2005-03-23 8:29 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Paul E. McKenney, Linux-kernel, Esben Nielsen

On Wed, 2005-03-23 at 08:16 +0100, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > That callback will be queued on CPU#2 - while the task still keeps
> > current->rcu_data of CPU#1. It also means that CPU#2's read counter
> > did _not_ get increased - and a too short grace period may occur.
> > 
> > it seems to me that the only safe method is to pick an 'RCU CPU' when
> > first entering the read section, and then sticking to it, no matter
> > where the task gets migrated to. Or to 'migrate' the +1 read count
> > from one CPU to the other, within the scheduler.
> 
> i think the 'migrate read-count' method is not adequate either, because
> all callbacks queued within an RCU read section must be called after the
> lock has been dropped - while with the migration method CPU#1 would be
> free to process callbacks queued in the RCU read section still active on
> CPU#2.

how about keeping the rcu callback list in process context and only
splicing it to a global (per cpu) list on rcu_read_unlock?

Kind regards,

Peter Zijlstra

-- 
Peter Zijlstra <peter@programming.kicks-ass.net>

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-23 8:29 ` Peter Zijlstra @ 2005-03-23 9:03 ` Ingo Molnar 2005-03-24 6:45 ` Paul E. McKenney 0 siblings, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-03-23 9:03 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Paul E. McKenney, Linux-kernel, Esben Nielsen * Peter Zijlstra <peter@programming.kicks-ass.net> wrote: > > i think the 'migrate read-count' method is not adequate either, because > > all callbacks queued within an RCU read section must be called after the > > lock has been dropped - while with the migration method CPU#1 would be > > free to process callbacks queued in the RCU read section still active on > > CPU#2. > > how about keeping the rcu callback list in process context and only > splice it to a global (per cpu) list on rcu_read_unlock? hm, that would indeed solve this problem. It would also solve the grace period problem: all callbacks in the global (per-CPU) list are immediately processable. Paul? Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  9:03 ` Ingo Molnar
@ 2005-03-24  6:45 ` Paul E. McKenney
  0 siblings, 0 replies; 92+ messages in thread
From: Paul E. McKenney @ 2005-03-24 6:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Linux-kernel, Esben Nielsen

On Wed, Mar 23, 2005 at 10:03:41AM +0100, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peter@programming.kicks-ass.net> wrote:
> 
> > > i think the 'migrate read-count' method is not adequate either, because
> > > all callbacks queued within an RCU read section must be called after the
> > > lock has been dropped - while with the migration method CPU#1 would be
> > > free to process callbacks queued in the RCU read section still active on
> > > CPU#2.
> > 
> > how about keeping the rcu callback list in process context and only
> > splicing it to a global (per cpu) list on rcu_read_unlock?
> 
> hm, that would indeed solve this problem. It would also solve the grace
> period problem: all callbacks in the global (per-CPU) list are
> immediately processable. Paul?

If I understand the proposal, it would break in the following situation
(lifted from earlier email):

	rcu_read_lock();
	list_for_each_entry_rcu(p, head, list) {
		if (unlikely(p->status == DELETE_ME)) {
			spin_lock(&p->mutex);
			if (likely(p->status == DELETE_ME)) {
				p->status = DELETED;
				list_del_rcu(&p->list);
				call_rcu(&p->rcu, sublist_finalize_rcu);
			}
			spin_unlock(&p->mutex);
		}
	}
	rcu_read_unlock();

Here, sublist_finalize_rcu() just finds the front of the block and
kfree()s it. Here is the scenario:

	CPU 1					CPU 2

	task 1 does rcu_read_lock		task 2 does rcu_read_lock
	task 1 sees DELETE_ME			task 2 sees DELETE_ME
	task 1 acquires the lock		task 2 blocks/spins on lock
	task 1 does call_rcu
	task 1 releases lock
	task 1 does rcu_read_unlock()		task 2 acquires lock
	RCU puts the callback on global list
	RCU invokes callback, kfree()!!!
						task 2 now sees garbage!!!
Callbacks cannot be invoked until all RCU read-side critical sections
that were in progress at the time of the call_rcu() have completed.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 92+ messages in thread
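[Editor's note: this invariant - a callback queued while readers are active
must not run until every such reader has finished - can be modelled with a
single reader count and a deferred-callback list. The sketch below is
hypothetical userspace C (invented names, no locking, no per-CPU lists, and
it conservatively also holds callbacks for readers that start after the
call_rcu() analogue), but it reproduces the two-task scenario above: the
kfree() stand-in must wait for task 2.]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CB 16

typedef void (*cb_fn)(void *);

static int active_readers;
static struct { cb_fn fn; void *arg; } pending[MAX_CB];
static int npending;

static int freed_flag;			/* stands in for the kfree()d object */

static void mark_freed(void *arg)	/* stands in for sublist_finalize_rcu */
{
	*(int *)arg = 1;
}

static void read_lock(void)
{
	active_readers++;
}

static void call_rcu_sketch(cb_fn fn, void *arg)
{
	/* Only queue - never invoke here: a reader (possibly the caller
	 * itself) may still hold a reference to the object. */
	pending[npending].fn = fn;
	pending[npending].arg = arg;
	npending++;
}

static void read_unlock(void)
{
	int i;

	if (--active_readers == 0) {
		/* All read-side critical sections have completed, so the
		 * deferred frees are now safe to run. */
		for (i = 0; i < npending; i++)
			pending[i].fn(pending[i].arg);
		npending = 0;
	}
}
```

Flushing the list only when the reader count reaches zero is exactly what a
global splice at rcu_read_unlock() time would fail to do for the task that
is still inside its read section.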
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  7:16 ` Ingo Molnar
  2005-03-23  7:54 ` Steven Rostedt
  2005-03-23  8:29 ` Peter Zijlstra
@ 2005-03-23 21:46 ` Ingo Molnar
  2005-03-24  6:59 ` Paul E. McKenney
  2005-03-24  6:38 ` Paul E. McKenney
  3 siblings, 1 reply; 92+ messages in thread
From: Ingo Molnar @ 2005-03-23 21:46 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, Esben Nielsen

* Ingo Molnar <mingo@elte.hu> wrote:

> i think the 'migrate read-count' method is not adequate either,
> because all callbacks queued within an RCU read section must be called
> after the lock has been dropped - while with the migration method
> CPU#1 would be free to process callbacks queued in the RCU read
> section still active on CPU#2.
> 
> i'm wondering how much of a problem this is though. Can there be stale
> pointers at that point? Yes in theory, because code like:
> 
> 	rcu_read_lock();
> 	call_rcu(&dentry->d_rcu, d_callback);
> 	func(dentry->whatever);
> 	rcu_read_unlock();

but, this cannot happen, because call_rcu() is used by RCU-write code.

so the important property seems to be that any active RCU-read section
should keep at least one CPU's active_readers count elevated
permanently, for the duration of the RCU-read section. It doesn't matter
that the reader migrates between CPUs - because the RCU code itself
guarantees that no callbacks will be executed until _all_ CPUs have been
in quiescent state. I.e. all CPUs have to go through a zero
active_readers value before the callback is done.

	Ingo

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23 21:46 ` Ingo Molnar
@ 2005-03-24  6:59 ` Paul E. McKenney
  0 siblings, 0 replies; 92+ messages in thread
From: Paul E. McKenney @ 2005-03-24 6:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Esben Nielsen

On Wed, Mar 23, 2005 at 10:46:45PM +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > i think the 'migrate read-count' method is not adequate either,
> > because all callbacks queued within an RCU read section must be called
> > after the lock has been dropped - while with the migration method
> > CPU#1 would be free to process callbacks queued in the RCU read
> > section still active on CPU#2.
> > 
> > i'm wondering how much of a problem this is though. Can there be stale
> > pointers at that point? Yes in theory, because code like:
> > 
> > 	rcu_read_lock();
> > 	call_rcu(&dentry->d_rcu, d_callback);
> > 	func(dentry->whatever);
> > 	rcu_read_unlock();
> 
> but, this cannot happen, because call_rcu() is used by RCU-write code.

The code would not look exactly like this, but acquiring the update-side
lock inside an RCU read-side critical section can be thought of as a
reader-to-writer lock upgrade. RCU can do this unconditionally, which
was one of the walls I was banging my head against when trying to think
up a realtime-safe RCU implementation. So something like the following
would be legitimate RCU code:

	rcu_read_lock();
	spin_lock(&dcache_lock);
	call_rcu(&dentry->d_rcu, d_callback);
	spin_unlock(&dcache_lock);
	rcu_read_unlock();

The spin_lock() call upgrades from a read-side to a write-side critical
section.

> so the important property seems to be that any active RCU-read section
> should keep at least one CPU's active_readers count elevated
> permanently, for the duration of the RCU-read section.

Yep!
> It doesn't matter
> that the reader migrates between CPUs - because the RCU code itself
> guarantees that no callbacks will be executed until _all_ CPUs have been
> in quiescent state. I.e. all CPUs have to go through a zero
> active_readers value before the callback is done.

Yep again!

						Thanx, Paul

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  7:16 ` Ingo Molnar
                    ` (2 preceding siblings ...)
  2005-03-23 21:46 ` Ingo Molnar
@ 2005-03-24  6:38 ` Paul E. McKenney
  3 siblings, 0 replies; 92+ messages in thread
From: Paul E. McKenney @ 2005-03-24 6:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Esben Nielsen

On Wed, Mar 23, 2005 at 08:16:04AM +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > That callback will be queued on CPU#2 - while the task still keeps
> > current->rcu_data of CPU#1. It also means that CPU#2's read counter
> > did _not_ get increased - and a too short grace period may occur.
> > 
> > it seems to me that the only safe method is to pick an 'RCU CPU' when
> > first entering the read section, and then sticking to it, no matter
> > where the task gets migrated to. Or to 'migrate' the +1 read count
> > from one CPU to the other, within the scheduler.
> 
> i think the 'migrate read-count' method is not adequate either, because
> all callbacks queued within an RCU read section must be called after the
> lock has been dropped - while with the migration method CPU#1 would be
> free to process callbacks queued in the RCU read section still active on
> CPU#2.

Right -- the limitation is that you cannot declare a grace period over
until -all- RCU read-side critical sections that started before the
call_rcu() have completed.

> i'm wondering how much of a problem this is though. Can there be stale
> pointers at that point? Yes in theory, because code like:
> 
> 	rcu_read_lock();
> 	call_rcu(&dentry->d_rcu, d_callback);
> 	func(dentry->whatever);
> 	rcu_read_unlock();

In this code segment, a correct RCU will not invoke the callback until
after the rcu_read_unlock(). Ugly code, though. But see below...
> would be unsafe because the pointer is still accessed within the RCU
> read section, and if we get migrated from CPU#1 to CPU#2 after call_rcu
> but before the dentry->whatever dereference, the callback may be
> processed early by CPU#1, making the dentry->whatever read operation
> unsafe.
> 
> the question is, does this occur in practice? Does existing RCU-using
> code use pointers it has queued for freeing, relying on the fact that
> the callback won't be processed until we drop the RCU read lock?

Using a pointer that had been passed to call_rcu() would be in theory
safe, but I would usually object to it. In most cases, it would cause
confusion at the very least. However, there are exceptions:

	rcu_read_lock();
	list_for_each_entry_rcu(p, head, list) {
		if (unlikely(p->status == DELETE_ME)) {
			spin_lock(&p->mutex);
			if (likely(p->status == DELETE_ME)) {
				p->status = DELETED;
				list_del_rcu(&p->list);
				call_rcu(&p->rcu, sublist_finalize_rcu);
			}
			spin_unlock(&p->mutex);
		}
	}
	rcu_read_unlock();

You can drop the spinlock before the call_rcu() if you want, but doing
so makes the code uglier. Besides, you have to allow for the fact that
another instance of this same code might be running on the same list on
some other CPU. You cannot invoke the callback until -both- CPUs/tasks
have exited their RCU read-side critical sections.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  6:33 ` Ingo Molnar
  2005-03-23  6:37 ` Ingo Molnar
@ 2005-03-23  9:38 ` Herbert Xu
  2005-03-23  9:49 ` Herbert Xu
  2005-03-24  5:28 ` Paul E. McKenney
  3 siblings, 1 reply; 92+ messages in thread
From: Herbert Xu @ 2005-03-23 9:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: paulmck, linux-kernel, simlo

Ingo Molnar <mingo@elte.hu> wrote:
> 
> * Paul E. McKenney <paulmck@us.ibm.com> wrote:
> 
>> +#ifdef CONFIG_PREEMPT_RCU
>> +
>> +void rcu_read_lock(void)
>> +{
>> +	if (current->rcu_read_lock_nesting++ == 0) {
>> +		current->rcu_data = &get_cpu_var(rcu_data);
>> +		atomic_inc(&current->rcu_data->active_readers);
>> +		put_cpu_var(rcu_data);
>> 
>> Need an smp_mb() here for non-x86 CPUs. Otherwise, the CPU can
>> re-order parts of the critical section to precede the rcu_read_lock().
>> Could precede the put_cpu_var(), but why increase latency?
> 
> ok. It's enough to put a barrier into the else branch here, because the
> atomic op in the main branch is a barrier by itself.

Since the else branch is only taken when rcu_read_lock_nesting > 0, do
we need the barrier at all?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  9:38 ` Herbert Xu
@ 2005-03-23  9:49 ` Herbert Xu
  2005-03-24  6:52 ` Paul E. McKenney
  0 siblings, 1 reply; 92+ messages in thread
From: Herbert Xu @ 2005-03-23 9:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: paulmck, linux-kernel, simlo

On Wed, Mar 23, 2005 at 08:38:11PM +1100, Herbert Xu wrote:
> 
> > ok. It's enough to put a barrier into the else branch here, because the
> > atomic op in the main branch is a barrier by itself.
> 
> Since the else branch is only taken when rcu_read_lock_nesting > 0, do
> we need the barrier at all?

Actually, since atomic_inc isn't a barrier, we do need that mb.
However, it should only be necessary in the main branch and we can use
smp_mb__after_atomic_inc which is optimised away on a number of
architectures (i386 in particular).

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  9:49 ` Herbert Xu
@ 2005-03-24  6:52 ` Paul E. McKenney
  0 siblings, 0 replies; 92+ messages in thread
From: Paul E. McKenney @ 2005-03-24 6:52 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Ingo Molnar, linux-kernel, simlo

On Wed, Mar 23, 2005 at 08:49:54PM +1100, Herbert Xu wrote:
> On Wed, Mar 23, 2005 at 08:38:11PM +1100, Herbert Xu wrote:
> > 
> > > ok. It's enough to put a barrier into the else branch here, because the
> > > atomic op in the main branch is a barrier by itself.
> > 
> > Since the else branch is only taken when rcu_read_lock_nesting > 0, do
> > we need the barrier at all?
> 
> Actually, since atomic_inc isn't a barrier, we do need that mb.
> However, it should only be necessary in the main branch and we
> can use smp_mb__after_atomic_inc which is optimised away on a
> number of architectures (i386 in particular).

You are right, I should have said smp_mb__after_atomic_inc() instead of
smp_mb() in the rcu_read_lock() case and smp_mb__after_atomic_dec() in
the rcu_read_unlock() case.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 92+ messages in thread
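[Editor's note: the placement agreed on above can be sketched with C11
atomics standing in for the kernel primitives. This is a hypothetical
userspace model - names are invented, and the seq_cst ordering used here
over-approximates what smp_mb__after_atomic_inc()/_dec() provide. The point
being illustrated is that only the outermost lock/unlock touches the
counter and needs ordering; nested entries are already covered by the
enclosing critical section.]

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int active_readers;	/* per-CPU in the real patch */
static _Thread_local int nesting;	/* per-task rcu_read_lock_nesting */

static void sketch_rcu_read_lock(void)
{
	if (nesting++ == 0) {
		/* Outermost entry: the increment must be ordered before
		 * any reads inside the critical section (the kernel would
		 * use atomic_inc() + smp_mb__after_atomic_inc()). */
		atomic_fetch_add_explicit(&active_readers, 1,
					  memory_order_seq_cst);
	}
	/* Nested entry: no counter update and no barrier needed - we are
	 * already inside an enclosing read-side critical section. */
}

static void sketch_rcu_read_unlock(void)
{
	if (--nesting == 0) {
		/* Outermost exit: critical-section reads must complete
		 * before the decrement becomes visible. */
		atomic_fetch_sub_explicit(&active_readers, 1,
					  memory_order_seq_cst);
	}
}
```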
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-23  6:33 ` Ingo Molnar
                    ` (2 preceding siblings ...)
  2005-03-23  9:38 ` Herbert Xu
@ 2005-03-24  5:28 ` Paul E. McKenney
  2005-03-24  5:34 ` Ingo Molnar
  3 siblings, 1 reply; 92+ messages in thread
From: Paul E. McKenney @ 2005-03-24 5:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Esben Nielsen

On Wed, Mar 23, 2005 at 07:33:17AM +0100, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@us.ibm.com> wrote:
> 
> > +#ifdef CONFIG_PREEMPT_RCU
> > +
> > +void rcu_read_lock(void)
> > +{
> > +	if (current->rcu_read_lock_nesting++ == 0) {
> > +		current->rcu_data = &get_cpu_var(rcu_data);
> > +		atomic_inc(&current->rcu_data->active_readers);
> > +		put_cpu_var(rcu_data);
> > 
> > Need an smp_mb() here for non-x86 CPUs. Otherwise, the CPU can
> > re-order parts of the critical section to precede the rcu_read_lock().
> > Could precede the put_cpu_var(), but why increase latency?
> 
> ok. It's enough to put a barrier into the else branch here, because the
> atomic op in the main branch is a barrier by itself.

For x86, the atomic op is a barrier, but not for other architectures.

You don't need a barrier in the else branch, because in that case you
are already in an enclosing RCU read-side critical section, so any
bleeding of code will be into this enclosing section, thus still safe.

> > +void rcu_read_unlock(void)
> > +{
> [...]
> > 
> > And need an smp_mb() here, again for non-x86 CPUs.
> 
> ok.

> > Assuming that the memory barriers are added, I can see a bunch of ways
> > for races to extend grace periods, but none so far that result in the
> > fatal too-short grace period. Since rcu_qsctr_inc() refuses to
> > increment the quiescent-state counter on any CPU that started an RCU
> > read-side critical section that has not yet completed, any long
> > critical section will have a corresponding CPU that will refuse to go
> > through a quiescent state. And that will prevent the grace period
> > from completing.
> 
> i'm worried about the following scenario: what happens when a task is
> migrated from CPU#1 to CPU#2, while in an RCU read section that it
> acquired on CPU#1, and queues a callback. E.g. d_free() does a
> call_rcu(), to queue the freeing of the dentry.
> 
> That callback will be queued on CPU#2 - while the task still keeps
> current->rcu_data of CPU#1. It also means that CPU#2's read counter did
> _not_ get increased - and a too short grace period may occur.

Let me make sure I understand the sequence of events:

	CPU#1				CPU#2

	task1: rcu_read_lock()
	task1: migrate to CPU#2
					task1: call_rcu()
					task1: rcu_read_unlock()

This sequence will be safe because CPU#1's active_readers field will be
non-zero throughout, so that CPU#1 will refuse to record any quiescent
state from the time that task1 does the rcu_read_lock() on CPU#1 until
the time that it does the rcu_read_unlock() on CPU#2.

Now, it is true that CPU#2 might record a quiescent state during this
time, but this will have no effect because -all- CPUs must pass through
a quiescent state before any callbacks will be invoked. Since CPU#1 is
refusing to record a quiescent state, grace periods will be blocked for
the full extent of task 1's RCU read-side critical section.

Or am I misunderstanding your scenario? Or, for that matter, your code?

> it seems to me that the only safe method is to pick an 'RCU CPU' when
> first entering the read section, and then sticking to it, no matter
> where the task gets migrated to. Or to 'migrate' the +1 read count from
> one CPU to the other, within the scheduler.

This would work too, but I don't believe it is necessary given what you
are already doing.

> the 'migrate read count' solution seems more promising, as it would keep
> other parts of the RCU code unchanged. [ But it seems to break the nice
> 'flip pointers' method you found to force a grace period. If a 'read
> section' can migrate from one CPU to another then it can migrate back as
> well, at which point it cannot have the 'old' pointer. Maybe it would
> still work better than no flip pointers. ]

Well, I do believe that suppressing migration of tasks during RCU
read-side critical sections would simplify the flip-pointers/counters
code. But that is easy to say in advance of actually producing this
code. ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 5:28 ` Paul E. McKenney @ 2005-03-24 5:34 ` Ingo Molnar 2005-03-24 7:46 ` Paul E. McKenney 2005-03-24 8:31 ` Steven Rostedt 0 siblings, 2 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-24 5:34 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel, Esben Nielsen * Paul E. McKenney <paulmck@us.ibm.com> wrote: > Now, it is true that CPU#2 might record a quiescent state during this > time, but this will have no effect because -all- CPUs must pass > through a quiescent state before any callbacks will be invoked. Since > CPU#1 is refusing to record a quiescent state, grace periods will be > blocked for the full extent of task 1's RCU read-side critical > section. ok, great. So besides the barriers issue (and the long grace period time issue), the current design is quite ok. And i think your original flip pointers suggestion can be used to force synchronization. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 5:34 ` Ingo Molnar @ 2005-03-24 7:46 ` Paul E. McKenney 2005-03-24 8:31 ` Steven Rostedt 1 sibling, 0 replies; 92+ messages in thread From: Paul E. McKenney @ 2005-03-24 7:46 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Esben Nielsen On Thu, Mar 24, 2005 at 06:34:56AM +0100, Ingo Molnar wrote: > > * Paul E. McKenney <paulmck@us.ibm.com> wrote: > > > Now, it is true that CPU#2 might record a quiescent state during this > > time, but this will have no effect because -all- CPUs must pass > > through a quiescent state before any callbacks will be invoked. Since > > CPU#1 is refusing to record a quiescent state, grace periods will be > > blocked for the full extent of task 1's RCU read-side critical > > section. > > ok, great. So besides the barriers issue (and the long grace period time > issue), the current design is quite ok. And i think your original flip > pointers suggestion can be used to force synchronization. The thing I am currently struggling with on the flip-pointers approach is handling races between rcu_read_lock() and the flipping. In the earlier implementations that used this trick, you were guaranteed that if you were executing concurrently with one flip, you would do a voluntary context switch before the next flip happened, so that the race was harmless. This guarantee does not work in the PREEMPT_RT case, so more thought will be required. :-/ Thanx, Paul ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-24  5:34 ` Ingo Molnar
  2005-03-24  7:46 ` Paul E. McKenney
@ 2005-03-24  8:31 ` Steven Rostedt
  2005-03-24  8:47 ` Steven Rostedt
  2005-03-24 10:42 ` Ingo Molnar
  1 sibling, 2 replies; 92+ messages in thread
From: Steven Rostedt @ 2005-03-24 8:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo,

I've noticed the following situation which is caused by __up_mutex
assigning an owner right away. Given 3 processes with priorities 1, 2,
3, where 3 is highest and 1 is lowest, call them process A, B, C
respectively.

1. Process A runs and grabs Lock L.
2. Process B preempts A, tries to grab Lock L.
3. Process A inherits process B's priority and starts to run.
4. Process C preempts A, tries to grab Lock L.
5. Process A inherits process C's priority and starts to run.
6. Process A releases Lock L and loses its priority.
7. Process C gets Lock L.
8. Process C runs and releases Lock L.
9. With __up_mutex, Process B automatically gets Lock L.
10. Process C tries to grab Lock L again, but is now blocked by B.

So we have a needless latency for Process C, since it should be able to
get lock L again. The problem arises because process B grabbed the lock
without actually running.

Since I agree with the rule that a lock can't have waiters while not
being owned, I believe that this problem can be solved by adding a flag
that states that the lock is "partially owned". That is, the ownership
of the lock should be transferred at step 9, but a flag is set that it
has not been grabbed. This flag would be cleared when Process B wakes
up and leaves the __down call.

This way when process C tries to get the lock again, it sees that it's
owned, but B hasn't executed yet. So it would be safe for C to take the
lock back from B - that is, if C is of greater priority than B;
otherwise it would act the same.

If you agree with this approach, I would be willing to write a patch
for you.
-- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
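The "partially owned" idea above can be modelled in a few lines of user-space C. This is a toy sketch with invented names (toy_lock, toy_unlock_to); the real -RT mutex in kernel/rt.c involves wait lists, priority inheritance, and locking that are omitted here. Higher number means higher priority, as in the scenario above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy mutex: when released to a sleeping waiter it is only
 * "partially owned" until that waiter actually runs. */
struct toy_mutex {
    int owner_prio;      /* priority of (pending) owner, -1 if free */
    int owner_pending;   /* owner designated but has not run yet */
};

/* A task of priority 'prio' tries to take the lock.
 * Returns 1 on success (including stealing a pending lock). */
int toy_lock(struct toy_mutex *m, int prio)
{
    if (m->owner_prio < 0) {                  /* not owned at all */
        m->owner_prio = prio;
        m->owner_pending = 0;
        return 1;
    }
    /* steal only if the owner is still pending and we have
     * higher (or equal) priority */
    if (m->owner_pending && prio >= m->owner_prio) {
        m->owner_prio = prio;
        m->owner_pending = 0;
        return 1;
    }
    return 0;                                 /* must block */
}

/* Unlock hands the lock to the chosen waiter, but only as a
 * pending owner: that waiter has not run yet. */
void toy_unlock_to(struct toy_mutex *m, int waiter_prio)
{
    m->owner_prio = waiter_prio;
    m->owner_pending = 1;
}
```

Replaying steps 8-10 of the scenario with this model: after C releases the lock, B becomes pending owner; when C immediately tries again, it steals the lock back instead of blocking behind the not-yet-running B.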
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 8:31 ` Steven Rostedt @ 2005-03-24 8:47 ` Steven Rostedt 2005-03-24 10:45 ` Ingo Molnar 2005-03-24 11:39 ` Ingo Molnar 2005-03-24 10:42 ` Ingo Molnar 1 sibling, 2 replies; 92+ messages in thread From: Steven Rostedt @ 2005-03-24 8:47 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel On Thu, 24 Mar 2005, Steven Rostedt wrote: > > I've noticed the following situation which is caused by __up_mutex > assigning an owner right away. > I also see this with non-RT tasks causing a burst of schedules.

1. Process A runs and grabs lock L, then finishes its time slice.
2. Process B runs and tries to grab lock L.
3. Process A runs and releases lock L.
4. __up_mutex gives process B lock L.
5. Process A tries to grab lock L.
6. Process B runs and releases lock L.
7. __up_mutex gives lock L to process A.
8. Process B tries to grab lock L again.
9. Process A runs...

Here we have more unnecessary schedules. So the condition to grab a lock should be:

1. not owned.
2. partially owned, and the owner is not RT.
3. partially owned, the owner is RT and so is the grabber, and the grabber's priority is >= the owner's priority.

-- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
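The three grab conditions can be written as a single predicate. This is a sketch with made-up field names, not the actual rt.c code; here, as in the messages, a larger prio number means higher priority. (In the follow-up, Ingo suggests dropping the RT/non-RT distinction and comparing ->prio alone.)

```c
#include <stdbool.h>
#include <stddef.h>

struct task { int prio; bool is_rt; };
struct lock { struct task *owner; bool partially_owned; };

/* Decide whether 'grabber' may take 'l' immediately. */
static bool can_grab(const struct lock *l, const struct task *grabber)
{
    if (l->owner == NULL)
        return true;                 /* 1. not owned */
    if (!l->partially_owned)
        return false;                /* fully owned: must block */
    if (!l->owner->is_rt)
        return true;                 /* 2. pending owner is not RT */
    /* 3. both RT: steal only with >= priority */
    return grabber->is_rt && grabber->prio >= l->owner->prio;
}
```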
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 8:47 ` Steven Rostedt @ 2005-03-24 10:45 ` Ingo Molnar 2005-03-24 11:39 ` Ingo Molnar 1 sibling, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-24 10:45 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel * Steven Rostedt <rostedt@goodmis.org> wrote: > I also see this with non rt tasks causing a burst of schedules. > > 1. Process A runs and grabs lock L. then finishes its time slice. > 2. Process B runs and tries to grab Lock L. > 3. Process A runs and releases lock L. > 4. __up_mutex gives process B lock L. > 5. Process A tries to grab lock L. > 6. Process B runs and releases lock L. > 7. __up_mutex gives lock L to process A. > 8. Process B tries to grab lock L again. > 9. Process A runs... > > Here we have more unnecessary schedules. So the condition to grab a lock > should be: > > 1. not owned. > 2. partially owned, and the owner is not RT. > 3. partially owned but the owner is RT and so is the grabber, and the > grabber's priority is >= the owner's priority. yeah, sounds good - but i'd not make any distinction between RT and non-RT tasks - just make the rule #3 distinction based on ->prio. In particular on UP a task should only run when its higher prio, so if a lock is 'partially owned' then the priority rule should always be true. (On SMP it's a bit more complex, there the priority rule could make a real difference.) Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 8:47 ` Steven Rostedt 2005-03-24 10:45 ` Ingo Molnar @ 2005-03-24 11:39 ` Ingo Molnar 2005-03-24 14:33 ` Steven Rostedt 2005-03-24 23:05 ` Esben Nielsen 1 sibling, 2 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-24 11:39 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel, Esben Nielsen * Steven Rostedt <rostedt@goodmis.org> wrote: > Here we have more unnecessary schedules. So the condition to grab a > lock should be: > > 1. not owned. > 2. partially owned, and the owner is not RT. > 3. partially owned but the owner is RT and so is the grabber, and the > grabber's priority is >= the owner's priority. there's another approach that could solve this problem: let the scheduler sort it all out. Esben Nielsen had this suggestion a couple of months ago - i didnt follow it because i thought that technique would create too many runnable tasks, but maybe that was a mistake. If we do the owning of the lock once the wakee hits the CPU we avoid the 'partial owner' problem, and we have the scheduler sort out priorities and policies. but i think i like the 'partial owner' (or rather 'owner pending') technique a bit better, because it controls concurrency explicitly, and it would thus e.g. allow another trick: when a new owner 'steals' a lock from another in-flight task, then we could 'unwakeup' that in-flight thread which could thus avoid two more context-switches on e.g. SMP systems: hitting the CPU and immediately blocking on the lock. (But this is a second-phase optimization which needs some core scheduler magic as well, i guess i'll be the one to code it up.) Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 11:39 ` Ingo Molnar @ 2005-03-24 14:33 ` Steven Rostedt 2005-03-24 17:51 ` Ingo Molnar 2005-03-24 18:17 ` Ingo Molnar 2005-03-24 23:05 ` Esben Nielsen 1 sibling, 2 replies; 92+ messages in thread From: Steven Rostedt @ 2005-03-24 14:33 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Esben Nielsen On Thu, 24 Mar 2005, Ingo Molnar wrote: > > there's another approach that could solve this problem: let the > scheduler sort it all out. Esben Nielsen had this suggestion a couple of > months ago - i didnt follow it because i thought that technique would > create too many runnable tasks, but maybe that was a mistake. If we do > the owning of the lock once the wakee hits the CPU we avoid the 'partial > owner' problem, and we have the scheduler sort out priorities and > policies. I've thought about this too, and came up with the conclusion that this was too messy. You have to give up the information of the processes that are waiting on the lock when you release it. Or keep the information of the waiters (waking only one "the wakee") but then you have a lock with waiters and no owner, which is the messy part. On an SMP machine, there may even be a chance of a lower priority process that gets it. That would be possible if the low priority process on the other CPU tries to grab the lock just after it was released but before the just woken up high priorty processes get scheduled. So there's a window where the lock is open, and the lower priority process snagged it just before the others got in. > > but i think i like the 'partial owner' (or rather 'owner pending') > technique a bit better, because it controls concurrency explicitly, and > it would thus e.g. allow another trick: when a new owner 'steals' a lock > from another in-flight task, then we could 'unwakeup' that in-flight > thread which could thus avoid two more context-switches on e.g. 
SMP > systems: hitting the CPU and immediately blocking on the lock. (But this > is a second-phase optimization which needs some core scheduler magic as > well, i guess i'll be the one to code it up.) > Darn! It seemed like fun to implement. I may do it myself anyway on my kernel just to understand your implementation even better. Later, -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 14:33 ` Steven Rostedt @ 2005-03-24 17:51 ` Ingo Molnar 2005-03-24 18:17 ` Ingo Molnar 1 sibling, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-24 17:51 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel, Esben Nielsen * Steven Rostedt <rostedt@goodmis.org> wrote: > > but i think i like the 'partial owner' (or rather 'owner pending') > > technique a bit better, because it controls concurrency explicitly, and > > it would thus e.g. allow another trick: when a new owner 'steals' a lock > > from another in-flight task, then we could 'unwakeup' that in-flight > > thread which could thus avoid two more context-switches on e.g. SMP > > systems: hitting the CPU and immediately blocking on the lock. (But this > > is a second-phase optimization which needs some core scheduler magic as > > well, i guess i'll be the one to code it up.) > > Darn! It seemed like fun to implement. I may do it myself anyway on my > kernel just to understand your implementation even better. feel free to implement the whole thing. Unscheduling a task should be done carefully, for obvious reasons. (I've implemented it once 1-2 years ago for a different purpose, to unschedule ksoftirqd - it ought to be somewhere in the lkml archives.) Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 14:33 ` Steven Rostedt 2005-03-24 17:51 ` Ingo Molnar @ 2005-03-24 18:17 ` Ingo Molnar 1 sibling, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-24 18:17 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel, Esben Nielsen * Steven Rostedt <rostedt@goodmis.org> wrote: > On an SMP machine, there may even be a chance of a lower priority > process that gets it. That would be possible if the low priority > process on the other CPU tries to grab the lock just after it was > released but before the just woken up high priorty processes get > scheduled. So there's a window where the lock is open, and the lower > priority process snagged it just before the others got in. that's always a possibility, on UP too: if a lower priority task manages to acquire a lock 'just before' a highprio thread got interested in it there's no way to undo that. but for the other reasons the explicit approach looks better. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 11:39 ` Ingo Molnar 2005-03-24 14:33 ` Steven Rostedt @ 2005-03-24 23:05 ` Esben Nielsen 2005-03-25 6:19 ` Ingo Molnar 2005-03-26 16:04 ` Steven Rostedt 1 sibling, 2 replies; 92+ messages in thread From: Esben Nielsen @ 2005-03-24 23:05 UTC (permalink / raw) To: Ingo Molnar; +Cc: Steven Rostedt, linux-kernel On Thu, 24 Mar 2005, Ingo Molnar wrote: > > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > Here we have more unnecessary schedules. So the condition to grab a > > lock should be: > > > > 1. not owned. > > 2. partially owned, and the owner is not RT. > > 3. partially owned but the owner is RT and so is the grabber, and the > > grabber's priority is >= the owner's priority. > > there's another approach that could solve this problem: let the > scheduler sort it all out. Esben Nielsen had this suggestion a couple of > months ago - i didnt follow it because i thought that technique would > create too many runnable tasks, but maybe that was a mistake. If we do > the owning of the lock once the wakee hits the CPU we avoid the 'partial > owner' problem, and we have the scheduler sort out priorities and > policies. > > but i think i like the 'partial owner' (or rather 'owner pending') > technique a bit better, because it controls concurrency explicitly, and > it would thus e.g. allow another trick: when a new owner 'steals' a lock > from another in-flight task, then we could 'unwakeup' that in-flight > thread which could thus avoid two more context-switches on e.g. SMP > systems: hitting the CPU and immediately blocking on the lock. (But this > is a second-phase optimization which needs some core scheduler magic as > well, i guess i'll be the one to code it up.) > I checked the implementation of a mutex I send in last fall. The unlock operation does give ownership explicitly to the highest priority waiter, as Ingo's implementation does. 
Originally I planned for just having unlock wake up the highest priority waiter and set lock->owner = NULL. The lock operation would be something like

while (lock->owner != NULL) {
    schedule();
}
/* grab the lock */

Then the first task, i.e. the one with highest priority on UP, will get it first. On SMP a low priority task on another CPU might get in and take it. I like the idea of having the scheduler take care of it - it is a very optimally coded queue system, after all. That will work on UP but not on SMP. Having the unlock operation set the mutex to a "partially owned" state will work better. The only problem I see, relative to Ingo's implementation, is that the awoken task then has to go in and change the state of the mutex, i.e. it has to take the wait_lock again. Will the extra schedulings that are the problem happen often enough in practice to justify the extra overhead? > Ingo Esben ^ permalink raw reply [flat|nested] 92+ messages in thread
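Esben's original plan, written out as a toy user-space sketch (invented names; schedule() is modelled as a no-op). The comment marks the SMP hole that the messages go on to discuss: the test of ->owner and the grab are not one atomic step, so another CPU can slip in between them.

```c
#include <stdbool.h>
#include <stddef.h>

struct task { int prio; };
struct naive_lock { struct task *owner; };

/* Stand-in for the kernel's schedule(): yield until woken. */
static void schedule(void) { }

/* Esben's scheme: the unlocker just clears ->owner and wakes the
 * waiters; each waiter loops until it sees the lock free. */
static void naive_lock_acquire(struct naive_lock *l, struct task *me)
{
    while (l->owner != NULL)
        schedule();
    /* On SMP another CPU can take the lock right here, between
     * the test above and the assignment below. */
    l->owner = me;
}
```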
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 23:05 ` Esben Nielsen @ 2005-03-25 6:19 ` Ingo Molnar 2005-03-26 16:31 ` Steven Rostedt 2005-03-26 16:04 ` Steven Rostedt 1 sibling, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-03-25 6:19 UTC (permalink / raw) To: Esben Nielsen; +Cc: Steven Rostedt, linux-kernel * Esben Nielsen <simlo@phys.au.dk> wrote: > I like the idea of having the scheduler take care of it - it is a very > optimal coded queue-system after all. That will work on UP but not on > SMP. Having the unlock operation to set the mutex in a "partially > owned" state will work better. The only problem I see, relative to > Ingo's implementation, is that then the awoken task have to go in and > change the state of the mutex, i.e. it has to lock the wait_lock > again. Will the extra schedulings being the problem happen offen > enough in practise to have the extra overhead? i think this should be covered by the 'unschedule/unwakeup' feature, mentioned in the latest mails. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-25 6:19 ` Ingo Molnar @ 2005-03-26 16:31 ` Steven Rostedt 2005-03-26 19:11 ` Ingo Molnar 0 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-26 16:31 UTC (permalink / raw) To: Ingo Molnar; +Cc: Esben Nielsen, linux-kernel On Fri, 25 Mar 2005, Ingo Molnar wrote: > * Esben Nielsen <simlo@phys.au.dk> wrote: > > > I like the idea of having the scheduler take care of it - it is a very > > optimal coded queue-system after all. That will work on UP but not on > > SMP. Having the unlock operation to set the mutex in a "partially > > owned" state will work better. The only problem I see, relative to > > Ingo's implementation, is that then the awoken task have to go in and > > change the state of the mutex, i.e. it has to lock the wait_lock > > again. Will the extra schedulings being the problem happen offen > > enough in practise to have the extra overhead? > > i think this should be covered by the 'unschedule/unwakeup' feature, > mentioned in the latest mails. > The first implementation would probably just be the setting of a "pending owner" bit. But the better one may be to unschedule. But, what would the overhead be for unscheduling. Since you need to grab the run queue locks for that. This might make for an interesting case study. The waking up of a process who had the lock stolen may not happen that much. The lock stealing, would (as I see in my runs) happen quite a bit though. But on UP, the waking of the robbed owner, would never happen, unless it also owned a lock that a higher priority process wanted. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-26 16:31 ` Steven Rostedt @ 2005-03-26 19:11 ` Ingo Molnar 0 siblings, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-26 19:11 UTC (permalink / raw) To: Steven Rostedt; +Cc: Esben Nielsen, linux-kernel * Steven Rostedt <rostedt@goodmis.org> wrote: > > i think this should be covered by the 'unschedule/unwakeup' feature, > > mentioned in the latest mails. > > The first implementation would probably just be the setting of a > "pending owner" bit. But the better one may be to unschedule. But, > what would the overhead be for unscheduling. Since you need to grab > the run queue locks for that. This might make for an interesting case > study. The waking up of a process who had the lock stolen may not > happen that much. The lock stealing, would (as I see in my runs) > happen quite a bit though. But on UP, the waking of the robbed owner, > would never happen, unless it also owned a lock that a higher priority > process wanted. yeah, lets skip the unscheduling for now, the 'pending owner' bit is the important one. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 23:05 ` Esben Nielsen 2005-03-25 6:19 ` Ingo Molnar @ 2005-03-26 16:04 ` Steven Rostedt 2005-03-30 6:31 ` Steven Rostedt 1 sibling, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-26 16:04 UTC (permalink / raw) To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel On Fri, 25 Mar 2005, Esben Nielsen wrote: > > > > I checked the implementation of a mutex I sent in last fall. The unlock > operation does give ownership explicitly to the highest priority waiter, > as Ingo's implementation does. > > Originally I planned for just having unlock wake up the highest > priority waiter and set lock->owner = NULL. The lock operation would be > something like > while(lock->owner!=NULL) > { > schedule(); > } > grab the lock. > > Then the first task, i.e. the one with highest priority on UP, will get it > first. On SMP a low priority task on another CPU might get in and take it. This could be dangerous on SMP: if a higher priority process were to preempt the process that grabbed the lock on its CPU, then the other CPU could spin on this, since the spinning task would be the highest priority process for that CPU. Not to mention that you have to make sure priority inheritance is still applied for the higher priority process that is woken up but has the lock stolen by the lower priority process. > > I like the idea of having the scheduler take care of it - it is a very > optimally coded queue system, after all. That will work on UP but not on SMP. > Having the unlock operation set the mutex to a "partially owned" state > will work better. The only problem I see, relative to Ingo's > implementation, is that the awoken task then has to go in and > change the state of the mutex, i.e. it has to take the wait_lock again. > Will the extra schedulings that are the problem happen often enough in > practice to justify the extra overhead?
Another answer is to have the "pending owner" bit be part of the task structure. A flag maybe. If a higher priority process comes in and decides to grab the lock from this owner, it does a test_and_clear on this flag on the pending owner task. When the pending owner task wakes up, it does the test_and_clear on its own bit. Whoever had the bit set on the test wins. If the higher prio task were to clear it first, then it takes the ownership away from the pending owner. If the pending owner were to clear the bit first, it won and would continue as though it got the lock. The higher priority tasks would do this test within the wait_lock to keep more than one of them from trying to grab the lock from the pending owner, but the pending owner won't need to do anything since it will know whether it was the new owner just by testing its own bit. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
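The test_and_clear race above can be sketched with C11 atomics as a user-space stand-in for the kernel's test_and_clear_bit() (names are invented; the real patch keeps the flag in task_struct->rt_flags):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Each task carries a pending-owner flag. Whoever atomically
 * clears the flag first "wins": either the pending owner keeps
 * the lock, or a higher-priority stealer takes it away. */
struct task {
    atomic_bool pending_owner;
};

/* Returns true iff the caller observed the flag set and cleared it.
 * atomic_exchange is the C11 analogue of test_and_clear_bit(). */
static bool claim_pending(struct task *t)
{
    return atomic_exchange(&t->pending_owner, false);
}
```

A stealer would call claim_pending(owner) while the wakee calls claim_pending(current); exactly one of the two can see true, which is the whole point of the scheme.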
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-26 16:04 ` Steven Rostedt @ 2005-03-30 6:31 ` Steven Rostedt 2005-03-30 6:50 ` Ingo Molnar 0 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-30 6:31 UTC (permalink / raw) To: Esben Nielsen; +Cc: Ingo Molnar, LKML On Sat, 2005-03-26 at 11:04 -0500, Steven Rostedt wrote: > > On Fri, 25 Mar 2005, Esben Nielsen wrote: > > > > I like the idea of having the scheduler take care of it - it is a very > > optimal coded queue-system after all. That will work on UP but not on SMP. > > Having the unlock operation to set the mutex in a "partially owned" state > > will work better. The only problem I see, relative to Ingo's > > implementation, is that then the awoken task have to go in and > > change the state of the mutex, i.e. it has to lock the wait_lock again. > > Will the extra schedulings being the problem happen offen enough in > > practise to have the extra overhead? > > Another answer is to have the "pending owner" bit be part of the task > structure. A flag maybe. If a higher priority process comes in and > decides to grab the lock from this owner, it does a test_and_clear on the > this flag on the pending owner task. When the pending owner task wakes > up, it does the test_and_clear on its own bit. Who ever had the bit set > on the test wins. If the higher prio task were to clear it first, then it > takes the ownership away from the pending owner. If the pending owner > were to clear the bit first, it won and would contiue as though it got the > lock. The higher priority tasks would do this test within the wait_lock > to keep from having more than one trying to grab the lock from the pending > owner, but the pending owner won't need to do anything since it will know > if it was the new owner just by testing its own bit. OK, I'm declaring defeat here. I've been fighting race conditions all day, and it's now 1 in the morning where I live. 
It looks like this implementation has no other choice but to have the waking "pending owner" take the wait_list lock once again. How heavy of an overhead is that really? The problem I've painfully discovered is that a task trying to take the lock must test two things about the pending owner. One is whether the pending owner is still pending on the same lock that the task is trying to get. Since the wakeup of the pending owner does no synchronous locking, the pending owner can take the lock and then become the pending owner of another lock, after the other task last saw it as the pending owner of the lock it's trying to get but before testing it. So the task can mistake it for the pending owner of this lock, when in reality it owns the lock and is pending on another. The other test is the atomic test_and_clear_bit on the pending owner bit. So you need to make sure the owner not only stays the owner of the lock the task is trying to get, but also that you can do the atomic test_and_clear on the owner's pending owner bit. I can't get these two in sync without grabbing a lock (in this case, the wait_list lock). Ingo, unless you can think of a way to do this, tomorrow (actually today) I'll change this to have the end of __down (and friends) grab the wait_list lock to test and clear its pending owner bit. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
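The resolution described above - the wakee re-takes the wait_lock and does both checks inside one critical section - can be sketched with a pthread mutex standing in for the raw wait_lock. Names and layout are illustrative only, not the real rt.c API:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct task { int id; };

/* Toy lock: wait_lock protects the pending-owner state. */
struct rt_lock {
    pthread_mutex_t wait_lock;
    struct task *pending_owner;  /* task handed the lock, not yet run */
};

/* Called by the wakee at the end of __down (and friends): under
 * wait_lock it checks "am I still the pending owner of *this*
 * lock?" and clears the pending state in the same critical
 * section, so a stealer cannot slip in between the two steps. */
static bool confirm_ownership(struct rt_lock *l, struct task *me)
{
    bool mine;

    pthread_mutex_lock(&l->wait_lock);
    mine = (l->pending_owner == me);
    if (mine)
        l->pending_owner = NULL;     /* now fully owned */
    pthread_mutex_unlock(&l->wait_lock);
    return mine;
}
```

Because both the identity check and the clear happen under the same lock the stealer must take, the two tests Steven could not keep in sync lock-free become one atomic unit.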
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-30 6:31 ` Steven Rostedt @ 2005-03-30 6:50 ` Ingo Molnar 2005-03-30 16:46 ` Steven Rostedt 0 siblings, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-03-30 6:50 UTC (permalink / raw) To: Steven Rostedt; +Cc: Esben Nielsen, LKML * Steven Rostedt <rostedt@goodmis.org> wrote: > OK, I'm declaring defeat here. I've been fighting race conditions all > day, and it's now 1 in the morning where I live. It looks like this > implementation has no other choice but to have the waking up "pending > owner" take the wait_list lock once again. How heavy of a overhead is > that really? as i mentioned it before, taking a lock is not a big issue at all. Since you have to touch the lock data structure anyway (and all of it fits into a single cacheline), it doesnt really matter whether it's atomic flag setting/clearing, or raw spinlock based. later on, once things are stable and well-understood, we can still attempt to micro-optimize it. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-30 6:50 ` Ingo Molnar @ 2005-03-30 16:46 ` Steven Rostedt 2005-03-30 19:44 ` Esben Nielsen 0 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-30 16:46 UTC (permalink / raw) To: Ingo Molnar; +Cc: Esben Nielsen, LKML On Wed, 2005-03-30 at 08:50 +0200, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > OK, I'm declaring defeat here. I've been fighting race conditions all > > day, and it's now 1 in the morning where I live. It looks like this > > implementation has no other choice but to have the waking up "pending > > owner" take the wait_list lock once again. How heavy of a overhead is > > that really? > > as i mentioned it before, taking a lock is not a big issue at all. Since > you have to touch the lock data structure anyway (and all of it fits > into a single cacheline), it doesnt really matter whether it's atomic > flag setting/clearing, or raw spinlock based. > OK, I've implemented adding the write_lock, but I still have a nasty race condition, and I finally figured out why! The damn BKL! Here's the situation I'm having. Process A grabs the BKL, then tries to grab another semaphore (say sem X). But process B has sem X, so process A sleeps. Since semaphores, unlike spinlocks, may sleep while the BKL is held, the BKL gets released by process A when it goes to sleep. Process C comes along and grabs the BKL. Finally B releases sem X and process A becomes the new "pending owner" and wakes up. When A schedules in, it tries to grab the BKL but blocks. Now it is at the point where A doesn't actually own either the BKL or sem X. When C releases the BKL and gives it to A, A is now the pending owner of both the BKL and sem X. When this occurs, all hell breaks loose. I believe I can solve this by making the BKL a special case and not implementing the pending owner logic for it at all. > later on, once things are stable and well-understood, we can still > attempt to micro-optimize it.
> Heck, I'll make it bloat city till I get it working, and then tone it down a little :-) And maybe later we can have a better solution for the BKL. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-30 16:46 ` Steven Rostedt @ 2005-03-30 19:44 ` Esben Nielsen 2005-03-30 19:56 ` Steven Rostedt 0 siblings, 1 reply; 92+ messages in thread From: Esben Nielsen @ 2005-03-30 19:44 UTC (permalink / raw) To: Steven Rostedt; +Cc: Ingo Molnar, LKML On Wed, 30 Mar 2005, Steven Rostedt wrote: > [...] > > Heck, I'll make it bloat city till I get it working, and then tone it > down a little :-) And maybe later we can have a better solution for the > BKL. > What about removing it altogether? Seriously, how much work would it be to simply remove it and go in and add specific locks in all the places where the code then can't compile? Esben > -- Steve > > ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-30 19:44 ` Esben Nielsen @ 2005-03-30 19:56 ` Steven Rostedt 2005-03-30 21:39 ` Steven Rostedt 0 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-30 19:56 UTC (permalink / raw) To: Esben Nielsen; +Cc: Ingo Molnar, LKML On Wed, 2005-03-30 at 20:44 +0100, Esben Nielsen wrote: > On Wed, 30 Mar 2005, Steven Rostedt wrote: > > > [...] > > > > Heck, I'll make it bloat city till I get it working, and then tone it > > down a little :-) And maybe later we can have a better solution for the > > BKL. > > > What about removing it altogether? > Seriously, how much work would it be to simply remove it and go in and > add specific locks in all the places where the code can't compile? I would really love to do that! But I don't have the time or the knowledge of the effects that would have. Because of the stupid BKL, I'm going with a combination of your idea and my idea for the solution of pending owners. I originally wanted the stealer of the lock to put the task that was robbed back on the list. But because of the BKL, you end up with a state where a task can be blocked on two locks at the same time. This causes hell with priority inheritance. So finally, what I'm doing now is to have the lock still pick the pending owner, and that owner is not blocked on the lock. If another process comes along and steals the lock, the robbed task goes back to the state it was in just before calling __down*. So when it wakes up, it checks to see if it is the pending owner, and if not, it tries to grab the lock again; if it doesn't get it, it just calls task_blocks_on_lock again. This is the easiest solution so far, but I would still like the stealer to put the robbed task back on the list. But until we get rid of the BKL that won't happen. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-30 19:56 ` Steven Rostedt @ 2005-03-30 21:39 ` Steven Rostedt 2005-03-30 21:43 ` Steven Rostedt 2005-03-31 11:03 ` Ingo Molnar 0 siblings, 2 replies; 92+ messages in thread From: Steven Rostedt @ 2005-03-30 21:39 UTC (permalink / raw) To: Esben Nielsen; +Cc: Ingo Molnar, LKML On Wed, 2005-03-30 at 14:56 -0500, Steven Rostedt wrote: > Because of the stupid BKL, I'm going with a combination of your idea and > my idea for the solution of pending owners. I originally wanted the > stealer of the lock to put the task that was robbed back on the list. > But because of the BKL, you end up with a state that a task can be > blocked by two locks at the same time. This causes hell with priority > inheritance. > > So finally, what I'm doing now is to have the lock still pick the > pending owner, and that owner is not blocked on the lock. If another > process comes along and steals the lock, the robbed task goes to a state > as if it was just before calling __down*. So when it wakes up, it checks > to see if it is the pending owner, and if not, then it tries to grab the > lock again, if it doesn't get it, it just calls task_blocks_on_lock > again. > > This is the easiest solution so far, but I still like the stealer to put > it back on the list. But until we get rid of the BKL that wont happen. Well, here it finally is. There's still things I don't like about it. But it seems to work, and that's the important part. I had to reluctantly add two items to the task_struct. I was hoping to avoid that. But because of race conditions it seemed to be the only way. The two are: pending_owner - probably should be pending_lock but it made sense with the state of the code when I first created it. This points to the lock that the task is pending on. 
I would have used the waiter's blocked_on->lock, but there existed a race condition: between testing it and acting on the result, the task could continue, and this waiter (being on the stack) would be garbage. The chance that it would be garbage yet contain the same lock is extremely low, but the chance exists, and talk about one hell of a bug to find! rt_flags - I would have just added the flag to flags, but since this field is protected by the lock and not by the task, I was afraid of a read-modify-write from schedule screwing this flag up. To get a lock, all the down functions call grab_lock. This function may be optimized too, but right now it is easy to read. It performs a series of tests to see if it can grab the lock, including whether the lock is free, and if not, whether the current task can steal it. All sleeping down functions also call capture_lock, which tests to see if the lock was stolen or not. If it was, then the task tries to get it again, and if that fails it just calls task_blocks_on_lock again. It's a relatively simple patch, but it took a lot of pain since I was trying very hard to have the stealer do the work. But the BKL proved to be too much. Also, Ingo, I hope you don't mind my little copyright notice :-)

-- Steve

Index: kernel/rt.c
===================================================================
--- kernel/rt.c	(revision 109)
+++ kernel/rt.c	(working copy)
@@ -17,6 +17,11 @@
  * Copyright (c) 2001 David Howells (dhowells@redhat.com).
  * - Derived partially from idea by Andrea Arcangeli <andrea@suse.de>
  * - Derived also from comments by Linus
+ *
+ * Pending ownership of locks and ownership stealing:
+ *
+ * Copyright (C) 2005, Kihon Technologies Inc., Steven Rostedt
+ *
  */
 #include <linux/config.h>
 #include <linux/sched.h>
@@ -28,6 +33,14 @@
 #include <linux/interrupt.h>
 
 /*
+ * These flags are used for allowing of stealing of ownerships.
+ */
+#define RT_PENDOWNER	1	/* pending owner on a lock */
+
+#define TASK_PENDING(task) \
+	((task)->rt_flags & RT_PENDOWNER)
+
+/*
  * This flag is good for debugging the PI code - it makes all tasks
  * in the system fall under PI handling. Normally only SCHED_FIFO/RR
  * tasks are PI-handled:
@@ -791,6 +804,113 @@
 }
 
 /*
+ * Try to grab a lock, and if it is owned but the owner
+ * hasn't woken up yet, see if we can steal it.
+ *
+ * Return: 1 if task can grab lock.
+ *         0 if not.
+ */
+static int grab_lock(struct rt_mutex *lock, struct task_struct *task)
+{
+	struct task_struct *owner = lock->owner;
+
+	if (!owner)
+		return 1;
+	/*
+	 * The lock is owned, but now test to see if the owner
+	 * is still sleeping and hasn't woken up to get the lock.
+	 */
+
+	/* Test the simple case first, is it already running? */
+	if (!TASK_PENDING(owner))
+		return 0;
+
+	/* The owner is pending on a lock, but is it this lock? */
+	if (owner->pending_owner != lock)
+		return 0;
+
+	/*
+	 * There's an owner, but it hasn't woken up to take the lock yet.
+	 * See if we should steal it from him.
+	 */
+	if (task->prio > owner->prio)
+		return 0;
+
+	/*
+	 * The BKL is a pain in the ass. Don't ever steal it
+	 */
+	if (lock == &kernel_sem.lock)
+		return 0;
+
+	/*
+	 * This task is of higher priority than the current pending
+	 * owner, so we may steal it.
+	 */
+	owner->rt_flags &= ~RT_PENDOWNER;
+	owner->pending_owner = NULL;
+
+#ifdef CONFIG_RT_DEADLOCK_DETECT
+	/*
+	 * This task will be taking the ownership away, and
+	 * when it does, the lock can't be on the held list.
+	 */
+	if (lock->debug) {
+		TRACE_WARN_ON(list_empty(&lock->held_list));
+		list_del_init(&lock->held_list);
+	}
+#endif
+	return 1;
+}
+
+/*
+ * Bring a task from pending ownership to owning a lock.
+ *
+ * Return 0 if we secured it, otherwise non-zero if it was
+ * stolen.
+ */
+static int capture_lock(struct rt_mutex_waiter *waiter, struct task_struct *task)
+{
+	struct rt_mutex *lock = waiter->lock;
+	unsigned long flags;
+	int ret = 0;
+
+	/*
+	 * The BKL is special, we always get it.
+	 */
+	if (lock == &kernel_sem.lock)
+		return 0;
+
+	trace_lock_irqsave(&trace_lock, flags);
+	spin_lock(&lock->wait_lock);
+
+	if (!(task->rt_flags & RT_PENDOWNER)) {
+		/* someone else stole it */
+		TRACE_BUG_ON(lock->owner == task);
+		if (grab_lock(lock,task)) {
+			/* we got it back! */
+			struct task_struct *old_owner = lock->owner;
+			spin_lock(&pi_lock);
+			set_new_owner(lock, old_owner, task, waiter->eip);
+			spin_unlock(&pi_lock);
+			ret = 0;
+		} else {
+			/* Add ourselves back to the list */
+			task_blocks_on_lock(waiter,task,lock,waiter->eip);
+			ret = 1;
+		}
+	} else {
+		task->rt_flags &= ~RT_PENDOWNER;
+		task->pending_owner = NULL;
+	}
+
+	spin_unlock(&lock->wait_lock);
+	trace_unlock_irqrestore(&trace_lock, flags);
+
+	return ret;
+}
+
+/*
  * lock it semaphore-style: no worries about missed wakeups.
  */
 static void __sched __down(struct rt_mutex *lock, unsigned long eip)
@@ -805,11 +925,12 @@
 
 	init_lists(lock);
 
-	if (!lock->owner) {
+	if (grab_lock(lock,task)) {
 		/* granted */
-		TRACE_WARN_ON(!list_empty(&lock->wait_list));
+		struct task_struct *old_owner = lock->owner;
+		TRACE_WARN_ON(!list_empty(&lock->wait_list) && !old_owner);
 		spin_lock(&pi_lock);
-		set_new_owner(lock, NULL, task, eip);
+		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
 		spin_unlock(&lock->wait_lock);
 		trace_unlock_irqrestore(&trace_lock, flags);
@@ -834,6 +955,7 @@
 #endif
 
 	current->flags &= ~PF_NOSCHED;
+ wait_again:
 	/* wait to be given the lock */
 	for (;;) {
 		if (!waiter.task)
@@ -841,6 +963,14 @@
 		schedule();
 		set_task_state(task, TASK_UNINTERRUPTIBLE);
 	}
+	/*
+	 * Check to see if we didn't have ownership stolen.
+ */ + if (capture_lock(&waiter,task)) { + set_task_state(task, TASK_UNINTERRUPTIBLE); + goto wait_again; + } + current->flags |= nosched_flag; task->state = TASK_RUNNING; } @@ -894,11 +1024,12 @@ init_lists(lock); - if (!lock->owner) { + if (grab_lock(lock,task)) { /* granted */ - TRACE_WARN_ON(!list_empty(&lock->wait_list)); + struct task_struct *old_owner = lock->owner; + TRACE_WARN_ON(!list_empty(&lock->wait_list) && !old_owner); spin_lock(&pi_lock); - set_new_owner(lock, NULL, task, eip); + set_new_owner(lock, old_owner, task, eip); spin_unlock(&pi_lock); spin_unlock(&lock->wait_lock); trace_unlock_irqrestore(&trace_lock, flags); @@ -933,6 +1064,7 @@ #endif current->flags &= ~PF_NOSCHED; + wait_again: /* wait to be given the lock */ for (;;) { unsigned long saved_flags = current->flags & PF_NOSCHED; @@ -949,6 +1081,16 @@ got_wakeup = 1; } /* + * Check to see if we didn't have ownership stolen. + */ + if (capture_lock(&waiter,task)) { + state = xchg(&task->state, TASK_UNINTERRUPTIBLE); + if (state == TASK_RUNNING) + got_wakeup = 1; + goto wait_again; + } + + /* * Only set the task's state to TASK_RUNNING if it got * a non-mutex wakeup. We keep the original state otherwise. 
* A mutex wakeup changes the task's state to TASK_RUNNING_MUTEX, @@ -1024,11 +1166,12 @@ init_lists(lock); - if (!lock->owner) { + if (grab_lock(lock,task)) { /* granted */ - TRACE_WARN_ON(!list_empty(&lock->wait_list)); + struct task_struct *old_owner = lock->owner; + TRACE_WARN_ON(!list_empty(&lock->wait_list) && !old_owner); spin_lock(&pi_lock); - set_new_owner(lock, NULL, task, eip); + set_new_owner(lock, old_owner, task, eip); spin_unlock(&pi_lock); spin_unlock(&lock->wait_lock); trace_unlock_irqrestore(&trace_lock, flags); @@ -1054,6 +1197,7 @@ current->flags &= ~PF_NOSCHED; ret = 0; + wait_again: /* wait to be given the lock */ for (;;) { if (signal_pending(current)) { @@ -1084,6 +1228,16 @@ schedule(); set_task_state(task, TASK_INTERRUPTIBLE); } + + /* + * Check to see if we didn't have ownership stolen. + */ + if (ret) { + if (capture_lock(&waiter,task)) { + set_task_state(task, TASK_INTERRUPTIBLE); + goto wait_again; + } + } task->state = TASK_RUNNING; current->flags |= nosched_flag; @@ -1112,11 +1266,12 @@ init_lists(lock); - if (!lock->owner) { + if (grab_lock(lock,task)) { /* granted */ - TRACE_WARN_ON(!list_empty(&lock->wait_list)); + struct task_struct *old_owner = lock->owner; + TRACE_WARN_ON(!list_empty(&lock->wait_list) && !old_owner); spin_lock(&pi_lock); - set_new_owner(lock, NULL, task, eip); + set_new_owner(lock, old_owner, task, eip); spin_unlock(&pi_lock); ret = 1; } @@ -1217,6 +1372,10 @@ if (prio != old_owner->prio) pi_setprio(lock, old_owner, prio); if (new_owner) { + if (lock != &kernel_sem.lock) { + new_owner->rt_flags |= RT_PENDOWNER; + new_owner->pending_owner = lock; + } if (save_state) wake_up_process_mutex(new_owner); else Index: include/linux/sched.h =================================================================== --- include/linux/sched.h (revision 109) +++ include/linux/sched.h (working copy) @@ -831,6 +831,9 @@ /* RT deadlock detection and priority inheritance handling */ struct rt_mutex_waiter *blocked_on; + struct rt_mutex 
*pending_owner; + unsigned long rt_flags; + #ifdef CONFIG_RT_DEADLOCK_DETECT void *last_kernel_lock; ^ permalink raw reply [flat|nested] 92+ messages in thread
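The steal decision in grab_lock() above can be illustrated as a stand-alone user-space model. The struct names and fields below are simplified stand-ins for the kernel's task_struct/rt_mutex, not the real types; as in the kernel, a lower ->prio value means higher priority.

```c
#include <assert.h>
#include <stddef.h>

struct lock;

struct task {
	int prio;		/* lower value = higher priority */
	int pending;		/* stands in for RT_PENDOWNER in rt_flags */
	struct lock *pending_on;/* stands in for ->pending_owner */
};

struct lock {
	struct task *owner;
	int is_bkl;		/* the BKL is never stolen */
};

/*
 * Return 1 if task may take the lock: it is free, or its owner is
 * still only a pending owner of this lock, is of lower priority,
 * and the lock is not the BKL. Stealing strips the pending-owner
 * state from the robbed task (in the kernel, set_new_owner would
 * then record the thief as the new pending owner).
 */
static int grab_lock(struct lock *lock, struct task *task)
{
	struct task *owner = lock->owner;

	if (!owner)
		return 1;
	if (!owner->pending || owner->pending_on != lock)
		return 0;	/* owner is really running with it */
	if (task->prio > owner->prio)
		return 0;	/* we are lower priority: no steal */
	if (lock->is_bkl)
		return 0;	/* never steal the BKL */

	owner->pending = 0;
	owner->pending_on = NULL;
	return 1;
}
```

A higher-priority waiter that reaches the lock before the pending owner has woken up thus takes ownership directly, instead of queueing behind a sleeping task.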
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-30 21:39 ` Steven Rostedt @ 2005-03-30 21:43 ` Steven Rostedt 2005-03-31 11:03 ` Ingo Molnar 1 sibling, 0 replies; 92+ messages in thread From: Steven Rostedt @ 2005-03-30 21:43 UTC (permalink / raw) To: Esben Nielsen; +Cc: Ingo Molnar, LKML On Wed, 2005-03-30 at 16:39 -0500, Steven Rostedt wrote: > On Wed, 2005-03-30 at 14:56 -0500, Steven Rostedt wrote: > > > Because of the stupid BKL, I'm going with a combination of your idea and > > my idea for the solution of pending owners. I originally wanted the > > stealer of the lock to put the task that was robbed back on the list. > > But because of the BKL, you end up with a state that a task can be > > blocked by two locks at the same time. This causes hell with priority > > inheritance. > > [snip] > It's a relatively simple patch, but it took a lot of pain since I was > trying very hard to have the stealer do the work. But the BKL proved to > be too much. Oh, I forgot, this is patched against V0.7.41-11. Ingo, you're moving so fast, I can't keep up. I'll download your latest later, and see if this patch still qualifies. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-30 21:39 ` Steven Rostedt 2005-03-30 21:43 ` Steven Rostedt @ 2005-03-31 11:03 ` Ingo Molnar 2005-03-31 12:03 ` Esben Nielsen ` (3 more replies) 1 sibling, 4 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-31 11:03 UTC (permalink / raw) To: Steven Rostedt; +Cc: Esben Nielsen, LKML * Steven Rostedt <rostedt@goodmis.org> wrote: > Well, here it finally is. There's still things I don't like about it. > But it seems to work, and that's the important part. > > I had to reluctantly add two items to the task_struct. I was hoping > to avoid that. But because of race conditions it seemed to be the only > way. well it's not a big problem, and we avoided having to add flags to the rt_lock structure, which is the important issue. your patch looks good, i've added it to my tree and have uploaded the -26-00 patch. It boots fine on my testbox, except for some new messages: knodemgrd_0/902: BUG in __down_complete at kernel/rt.c:1568 [<c0103956>] dump_stack+0x23/0x25 (20) [<c0130dcd>] down_trylock+0x1fb/0x200 (48) [<c0364ee2>] nodemgr_host_thread+0xd0/0x17b (48) [<c0100d4d>] kernel_thread_helper+0x5/0xb (136249364) --------------------------- | preempt count: 00000001 ] | 1-level deep critical section nesting: ---------------------------------------- .. [<c0133a75>] .... print_traces+0x1b/0x52 .....[<c0103956>] .. ( <= dump_stack+0x23/0x25) this goes away if i revert your patch. It seems the reason is that trylock hasn't been updated to use the pending-owner logic? Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 11:03 ` Ingo Molnar @ 2005-03-31 12:03 ` Esben Nielsen 2005-03-31 12:14 ` Steven Rostedt 2005-03-31 12:22 ` Steven Rostedt ` (2 subsequent siblings) 3 siblings, 1 reply; 92+ messages in thread From: Esben Nielsen @ 2005-03-31 12:03 UTC (permalink / raw) To: Ingo Molnar; +Cc: Steven Rostedt, LKML On Thu, 31 Mar 2005, Ingo Molnar wrote: > > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > Well, here it finally is. There's still things I don't like about it. > > But it seems to work, and that's the important part. > > > > I had to reluctantly add two items to the task_struct. I was hoping > > to avoid that. But because of race conditions it seemed to be the only > > way. > > well it's not a big problem, and we avoided having to add flags to the > rt_lock structure, which is the important issue. > I was going to say the opposite. I know that there are many more rt_locks around, and the fields will thus take more memory when put there, but I believe it is more logical to have the fields there. Esben ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 12:03 ` Esben Nielsen @ 2005-03-31 12:14 ` Steven Rostedt 2005-03-31 13:33 ` Ingo Molnar 0 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-31 12:14 UTC (permalink / raw) To: Esben Nielsen; +Cc: Ingo Molnar, LKML On Thu, 2005-03-31 at 13:03 +0100, Esben Nielsen wrote: > On Thu, 31 Mar 2005, Ingo Molnar wrote: > > > > > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > Well, here it finally is. There's still things I don't like about it. > > > But it seems to work, and that's the important part. > > > > > > I had to reluctantly add two items to the task_struct. I was hoping > > > to avoid that. But because of race conditions it seemed to be the only > > > way. > > > > well it's not a big problem, and we avoided having to add flags to the > > rt_lock structure, which is the important issue. > > > I was going to say the opposit. I know that there are many more rt_locks > locks around and the fields thus will take more memory when put there but > I believe it is more logical to have the fields there. It seems logical to be there, but in practice, it's not. The problem is that the flags represent a state of the task with respect to a single lock. When the task loses ownership of a lock, the state of the task changes. But the lock has a different state at that moment (it has a new owner). Now when it releases the lock, it might give the lock to another task, and that becomes the pending owner. Now the state of the lock is the same as in the beginning. But the first task needs to see this change. You can still pull this off by testing the state of the lock and comparing it to the current owner, but I too like the fact that you don't increase the size of the kernel statically. There are a lot more locks in the kernel than tasks on most systems. And those systems that will have more tasks than locks need a lot of memory anyway. 
So we only punish the big systems (that expect to be punished) and keep the little guys safe. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
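The encoding argued for above, with pending-owner state carried on the task rather than the lock, boils down to the check the woken waiter performs. This is a simplified model of what capture_lock tests (names are illustrative stand-ins, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

struct lock;

struct task {
	int pending;		/* stands in for RT_PENDOWNER in rt_flags */
	struct lock *pending_on;/* stands in for ->pending_owner */
};

struct lock {
	struct task *owner;
};

/*
 * A woken waiter consults only its own task state: if RT_PENDOWNER
 * is still set for this lock, it consumes the pending ownership and
 * really owns the lock; if a stealer cleared it while the waiter
 * slept, the caller must go back to the wait loop (which is what
 * capture_lock's nonzero return triggers in __down).
 */
static int ownership_was_stolen(struct task *task, struct lock *lock)
{
	if (task->pending && task->pending_on == lock) {
		task->pending = 0;
		task->pending_on = NULL;
		return 0;	/* we own it */
	}
	return 1;		/* stolen: wait again */
}
```

Because the waiter reads its own task_struct, it never has to chase the lock's possibly-changing owner field to find out what happened while it slept.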
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 12:14 ` Steven Rostedt @ 2005-03-31 13:33 ` Ingo Molnar 0 siblings, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-31 13:33 UTC (permalink / raw) To: Steven Rostedt; +Cc: Esben Nielsen, LKML * Steven Rostedt <rostedt@goodmis.org> wrote: > > I was going to say the opposit. I know that there are many more rt_locks > > locks around and the fields thus will take more memory when put there but > > I believe it is more logical to have the fields there. > > It seems logical to be there, but in practicality, it's not. > > The problem is that the flags represent a state of the task with > respect to a single lock. When the task loses ownership of a lock, > the state of the task changes. But the the lock has a different state > at that moment (it has a new onwner). Now when it releases the lock, > it might give the lock to another task, and that becomes the pending > owner. Now the state of the lock is the same as in the beginning. But > the first task needs to see this change. > > You can still pull this off by testing the state of the lock and > compare it to the current owner, but I too like the fact that you > don't increase the size of the kernel statically. There are a lot > more locks in the kernel than tasks on most systems. And those systems > that will have more tasks than locks, need a lot of memory anyway. So > we only punish the big systems (that expect to be punished) and keep > the little guys safe. no system is punished. Since task_struct embeds 2 locks already, moving the field(s) into task_struct is already a win. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 11:03 ` Ingo Molnar 2005-03-31 12:03 ` Esben Nielsen @ 2005-03-31 12:22 ` Steven Rostedt 2005-03-31 12:36 ` Steven Rostedt 2005-03-31 12:49 ` Steven Rostedt 3 siblings, 0 replies; 92+ messages in thread From: Steven Rostedt @ 2005-03-31 12:22 UTC (permalink / raw) To: Ingo Molnar; +Cc: Esben Nielsen, LKML On Thu, 2005-03-31 at 13:03 +0200, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > Well, here it finally is. There's still things I don't like about it. > > But it seems to work, and that's the important part. > > > > I had to reluctantly add two items to the task_struct. I was hoping > > to avoid that. But because of race conditions it seemed to be the only > > way. > > well it's not a big problem, and we avoided having to add flags to the > rt_lock structure, which is the important issue. > > your patch looks good, i've added it to my tree and have uploaded the > -26-00 patch. It boots fine on my testbox, except for some new messages: > > knodemgrd_0/902: BUG in __down_complete at kernel/rt.c:1568 > [<c0103956>] dump_stack+0x23/0x25 (20) > [<c0130dcd>] down_trylock+0x1fb/0x200 (48) > [<c0364ee2>] nodemgr_host_thread+0xd0/0x17b (48) > [<c0100d4d>] kernel_thread_helper+0x5/0xb (136249364) > --------------------------- > | preempt count: 00000001 ] > | 1-level deep critical section nesting: > ---------------------------------------- > .. [<c0133a75>] .... print_traces+0x1b/0x52 > .....[<c0103956>] .. ( <= dump_stack+0x23/0x25) > > this goes away if i revert your patch. It seems the reason is that > trylock hasnt been updated to use the pending-owner logic? Hmm, The pending owner logic in __down_trylock uses the grab_lock function. It doesn't need the capture_lock since it never sleeps. I'm downloading your 42-00-experimental now and installing it to see if I can get the same message. Did you try the patch against 41-11? Maybe the patch didn't go in so smoothly. 
Anyway, I'll take a look at it now and let you know what I find. Thanks, -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 11:03 ` Ingo Molnar 2005-03-31 12:03 ` Esben Nielsen 2005-03-31 12:22 ` Steven Rostedt @ 2005-03-31 12:36 ` Steven Rostedt 2005-03-31 12:58 ` Steven Rostedt 2005-03-31 12:49 ` Steven Rostedt 3 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-31 12:36 UTC (permalink / raw) To: Ingo Molnar; +Cc: Esben Nielsen, LKML On Thu, 2005-03-31 at 13:03 +0200, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > > your patch looks good, i've added it to my tree and have uploaded the > -26-00 patch. It boots fine on my testbox, except for some new messages: > > knodemgrd_0/902: BUG in __down_complete at kernel/rt.c:1568 > [<c0103956>] dump_stack+0x23/0x25 (20) > [<c0130dcd>] down_trylock+0x1fb/0x200 (48) > [<c0364ee2>] nodemgr_host_thread+0xd0/0x17b (48) > [<c0100d4d>] kernel_thread_helper+0x5/0xb (136249364) > --------------------------- > | preempt count: 00000001 ] > | 1-level deep critical section nesting: > ---------------------------------------- > .. [<c0133a75>] .... print_traces+0x1b/0x52 > .....[<c0103956>] .. ( <= dump_stack+0x23/0x25) > > this goes away if i revert your patch. It seems the reason is that > trylock hasnt been updated to use the pending-owner logic? Damn, now that I have a comparable kernel to look at, I see where that 1568 is. I did see these, but they went away when I changed the logic to handle the BKL, and I thought it was related. Since this happened with the trylock, do you see any way that a pending owner can cause problems? Maybe this has to do with is_locked. Now a pending owner makes this ambiguous. The lock has an owner, and a task can't get it if it is of lower priority than the pending owner, but it can get it if it is higher. Now is it locked? My implementation was to be safe and say that it is locked. I'll play around some more with this. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
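The "safe" choice described above, where a pending owner still counts as holding the lock, amounts to something like the following sketch (illustrative types, not the kernel's rt_mutex):

```c
#include <assert.h>
#include <stddef.h>

struct task {
	int prio;
};

struct lock {
	struct task *owner;	/* may be only a pending owner */
};

/*
 * Conservative is_locked under pending ownership: any owner, even
 * one that has not yet woken up to take the lock, reports locked.
 * A higher-priority task may still steal such a lock, but is_locked
 * never claims a stealable lock is free.
 */
static int is_locked(const struct lock *lock)
{
	return lock->owner != NULL;
}
```

The ambiguity is that the answer "locked" can be a false positive for a high-priority caller that could in fact steal the lock; reporting "locked" is merely the choice that never lets a trylock-style caller race with the pending owner.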
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 12:36 ` Steven Rostedt @ 2005-03-31 12:58 ` Steven Rostedt 2005-03-31 13:28 ` Ingo Molnar 0 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-31 12:58 UTC (permalink / raw) To: Ingo Molnar; +Cc: Esben Nielsen, LKML On Thu, 2005-03-31 at 07:36 -0500, Steven Rostedt wrote: > On Thu, 2005-03-31 at 13:03 +0200, Ingo Molnar wrote: > > * Steven Rostedt <rostedt@goodmis.org> wrote: > > Since this happened with the trylock, do you see anyway that a pending > owner can cause problems? Maybe this has to do with is_locked. Now a > pending owner makes this ambiguous. Since the lock has a owner, and a > task can't get it if it is of lower priority than the pending owner, but > it can get it if it is higher. Now is it locked? My implementation was > to be safe and say that it is locked. > > I'll play around some more with this. Oops! Found a little bug. Ingo, see if this fixes it. -- Steve --- ./kernel/rt.c.orig 2005-03-31 07:27:59.000000000 -0500 +++ ./kernel/rt.c 2005-03-31 07:53:14.913072893 -0500 @@ -1244,7 +1244,7 @@ /* * Check to see if we didn't have ownership stolen. */ - if (ret) { + if (!ret) { if (capture_lock(&waiter,task)) { set_task_state(task, TASK_INTERRUPTIBLE); goto wait_again; ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 12:58 ` Steven Rostedt @ 2005-03-31 13:28 ` Ingo Molnar 0 siblings, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-31 13:28 UTC (permalink / raw) To: Steven Rostedt; +Cc: Esben Nielsen, LKML * Steven Rostedt <rostedt@goodmis.org> wrote: > > I'll play around some more with this. > > Oops! Found a little bug. Ingo, see if this fixes it. yeah, that was it. I've uploaded -42-02 with the fix included. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 11:03 ` Ingo Molnar ` (2 preceding siblings ...) 2005-03-31 12:36 ` Steven Rostedt @ 2005-03-31 12:49 ` Steven Rostedt 2005-03-31 14:10 ` Ingo Molnar 3 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-31 12:49 UTC (permalink / raw) To: Ingo Molnar; +Cc: Esben Nielsen, LKML On Thu, 2005-03-31 at 13:03 +0200, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > knodemgrd_0/902: BUG in __down_complete at kernel/rt.c:1568 > [<c0103956>] dump_stack+0x23/0x25 (20) > [<c0130dcd>] down_trylock+0x1fb/0x200 (48) > [<c0364ee2>] nodemgr_host_thread+0xd0/0x17b (48) > [<c0100d4d>] kernel_thread_helper+0x5/0xb (136249364) > --------------------------- > | preempt count: 00000001 ] > | 1-level deep critical section nesting: > ---------------------------------------- > .. [<c0133a75>] .... print_traces+0x1b/0x52 > .....[<c0103956>] .. ( <= dump_stack+0x23/0x25) One more thing. Was this on SMP or UP? I haven't tested this on SMP yet. When my laptop (HT) gets done with its work, I'll give it a try there. Of course I need to disable NVidia on it first. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 12:49 ` Steven Rostedt @ 2005-03-31 14:10 ` Ingo Molnar 2005-03-31 17:41 ` Steven Rostedt 0 siblings, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-03-31 14:10 UTC (permalink / raw) To: Steven Rostedt; +Cc: Esben Nielsen, LKML * Steven Rostedt <rostedt@goodmis.org> wrote: > One more thing. Was this on SMP or UP. I haven't tested this on SMP > yet. When my laptop (HT) gets done with its work, I'll give it a try > there. Of course I need to disable NVidia on it first. i've booted the latest tree on a 4-way testbox and everything seems ok. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 14:10 ` Ingo Molnar @ 2005-03-31 17:41 ` Steven Rostedt 2005-03-31 17:49 ` Ingo Molnar 0 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-03-31 17:41 UTC (permalink / raw) To: Ingo Molnar; +Cc: Esben Nielsen, LKML On Thu, 2005-03-31 at 16:10 +0200, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > One more thing. Was this on SMP or UP. I haven't tested this on SMP > > yet. When my laptop (HT) gets done with its work, I'll give it a try > > there. Of course I need to disable NVidia on it first. > > i've booted the latest tree on a 4-way testbox and everything seems ok. Thanks Ingo. Oh, and did you really want to assign debug = .1? -- Steve Here you go: --- ./include/linux/rt_lock.h.orig 2005-03-31 12:38:31.583913080 -0500 +++ ./include/linux/rt_lock.h 2005-03-31 12:38:35.499061576 -0500 @@ -125,7 +125,7 @@ # ifdef CONFIG_RT_DEADLOCK_DETECT # define __RW_LOCK_UNLOCKED \ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED, .save_state = 1, \ - .debug = .1, .file = __FILE__, .line = __LINE__ + .debug = 1, .file = __FILE__, .line = __LINE__ # define _RW_LOCK_UNLOCKED(lock) \ (rwlock_t) { { { __RW_LOCK_UNLOCKED, .name = #lock } } } # define RW_LOCK_UNLOCKED \ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 17:41 ` Steven Rostedt @ 2005-03-31 17:49 ` Ingo Molnar 2005-03-31 18:17 ` Gene Heskett ` (2 more replies) 0 siblings, 3 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-31 17:49 UTC (permalink / raw) To: Steven Rostedt; +Cc: Esben Nielsen, LKML * Steven Rostedt <rostedt@goodmis.org> wrote: > Oh, and did your really want to assign debug = .1? > - .debug = .1, .file = __FILE__, .line = __LINE__ > + .debug = 1, .file = __FILE__, .line = __LINE__ doh - indeed. This means rwlocks were nondebug all along? Ouch. I've uploaded -42-08 with the fix. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 17:49 ` Ingo Molnar @ 2005-03-31 18:17 ` Gene Heskett 2005-03-31 21:00 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 (update) Gene Heskett 2005-03-31 20:22 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 Steven Rostedt 2005-04-01 0:59 ` Steven Rostedt 2 siblings, 1 reply; 92+ messages in thread From: Gene Heskett @ 2005-03-31 18:17 UTC (permalink / raw) To: linux-kernel; +Cc: Ingo Molnar, Steven Rostedt, Esben Nielsen On Thursday 31 March 2005 12:49, Ingo Molnar wrote: >* Steven Rostedt <rostedt@goodmis.org> wrote: >> Oh, and did your really want to assign debug = .1? >> >> - .debug = .1, .file = __FILE__, .line = __LINE__ >> + .debug = 1, .file = __FILE__, .line = __LINE__ > >doh - indeed. This means rwlocks were nondebug all along? Ouch. I've >uploaded -42-08 with the fix. > > Ingo It wasn't there when I looked yet, so I just built 42-05, Ingo. Almost everything works except I can't get any video out of tvtime, and the audio quality is still intermittently tinny, sometimes cured by an internal tvtime restart with earlier versions but not this version. Tinny sound seems to be forever now. Also, lots of logging from tvtime, as if it was getting the video but missing a frame occasionally, and it spit this out when I quit it. 
Mar 31 13:05:00 coyote kernel: rtc latency histogram of {tvtime/5251, 1303 samples}: Mar 31 13:05:00 coyote kernel: 4 1 Mar 31 13:05:00 coyote kernel: 5 190 Mar 31 13:05:00 coyote kernel: 6 455 Mar 31 13:05:00 coyote kernel: 7 264 Mar 31 13:05:00 coyote kernel: 8 84 Mar 31 13:05:00 coyote kernel: 9 42 Mar 31 13:05:00 coyote kernel: 10 22 Mar 31 13:05:00 coyote kernel: 11 34 Mar 31 13:05:00 coyote kernel: 12 15 Mar 31 13:05:00 coyote kernel: 13 21 Mar 31 13:05:00 coyote kernel: 14 19 Mar 31 13:05:00 coyote kernel: 15 18 Mar 31 13:05:00 coyote kernel: 16 10 Mar 31 13:05:00 coyote kernel: 17 6 Mar 31 13:05:00 coyote kernel: 18 8 Mar 31 13:05:00 coyote kernel: 19 6 Mar 31 13:05:00 coyote kernel: 20 8 Mar 31 13:05:00 coyote kernel: 21 1 Mar 31 13:05:00 coyote kernel: 22 2 Mar 31 13:05:00 coyote kernel: 23 2 Mar 31 13:05:00 coyote kernel: 25 1 Mar 31 13:05:00 coyote kernel: 26 2 Mar 31 13:05:00 coyote kernel: 29 1 Mar 31 13:05:00 coyote kernel: 30 2 Mar 31 13:05:00 coyote kernel: 47 1 Mar 31 13:05:00 coyote kernel: 9999 88 I have -08 now, so I'll go build that one next. Other than these, this one 'feels' good. >- >To unsubscribe from this list: send the line "unsubscribe > linux-kernel" in the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html >Please read the FAQ at http://www.tux.org/lkml/ -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.34% setiathome rank, not too shabby for a WV hillbilly Yahoo.com and AOL/TW attorneys please note, additions to the above message by Gene Heskett are: Copyright 2005 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 (update) 2005-03-31 18:17 ` Gene Heskett @ 2005-03-31 21:00 ` Gene Heskett 0 siblings, 0 replies; 92+ messages in thread From: Gene Heskett @ 2005-03-31 21:00 UTC (permalink / raw) To: linux-kernel; +Cc: Ingo Molnar, Steven Rostedt, Esben Nielsen On Thursday 31 March 2005 13:17, Gene Heskett wrote: >On Thursday 31 March 2005 12:49, Ingo Molnar wrote: >>* Steven Rostedt <rostedt@goodmis.org> wrote: >>> Oh, and did your really want to assign debug = .1? >>> >>> - .debug = .1, .file = __FILE__, .line = __LINE__ >>> + .debug = 1, .file = __FILE__, .line = __LINE__ >> >>doh - indeed. This means rwlocks were nondebug all along? Ouch. >> I've uploaded -42-08 with the fix. >> >> Ingo > >It wasn't there when I looked yet, so I just built 42-05, Ingo. >Almost everything works except I can't get any video out of tvtime, >and the audio quality is still intermittently tinny, sometimes > cured by an internal tvtime restart with earlier versions but not > this version. Tinny sound seem to be forever now. > >Also, lots of logging from tvtime, as if it was getting the video > but missing a frame occasionally, and spit this out when I quit it. 
> >Mar 31 13:05:00 coyote kernel: rtc latency histogram of > {tvtime/5251, 1303 samples}: >Mar 31 13:05:00 coyote kernel: 4 1 >Mar 31 13:05:00 coyote kernel: 5 190 >Mar 31 13:05:00 coyote kernel: 6 455 >Mar 31 13:05:00 coyote kernel: 7 264 >Mar 31 13:05:00 coyote kernel: 8 84 >Mar 31 13:05:00 coyote kernel: 9 42 >Mar 31 13:05:00 coyote kernel: 10 22 >Mar 31 13:05:00 coyote kernel: 11 34 >Mar 31 13:05:00 coyote kernel: 12 15 >Mar 31 13:05:00 coyote kernel: 13 21 >Mar 31 13:05:00 coyote kernel: 14 19 >Mar 31 13:05:00 coyote kernel: 15 18 >Mar 31 13:05:00 coyote kernel: 16 10 >Mar 31 13:05:00 coyote kernel: 17 6 >Mar 31 13:05:00 coyote kernel: 18 8 >Mar 31 13:05:00 coyote kernel: 19 6 >Mar 31 13:05:00 coyote kernel: 20 8 >Mar 31 13:05:00 coyote kernel: 21 1 >Mar 31 13:05:00 coyote kernel: 22 2 >Mar 31 13:05:00 coyote kernel: 23 2 >Mar 31 13:05:00 coyote kernel: 25 1 >Mar 31 13:05:00 coyote kernel: 26 2 >Mar 31 13:05:00 coyote kernel: 29 1 >Mar 31 13:05:00 coyote kernel: 30 2 >Mar 31 13:05:00 coyote kernel: 47 1 >Mar 31 13:05:00 coyote kernel: 9999 88 > > >I have -08 now, so I'll go build that one next. Other than these, >this one 'feels' good. > Now I'm on 42-08, it still feels good, but somebodies done something bad for tvtime when its accessing a pcHDTV-3000 card for the src. No video, blue screen. Thinking that loading it in rc.local was out of order (that works for 2.6.12-rc1), I've rmmod-ed the whole thing and reloaded after starting X, no difference. >From the logs when I executed a modprobe cx88-dvb: ----------------------------- Mar 31 15:18:53 coyote kernel: Linux video capture interface: v1.00 Mar 31 15:18:53 coyote kernel: cx2388x v4l2 driver version 0.0.4 loaded Mar 31 15:18:53 coyote kernel: ACPI: PCI interrupt 0000:01:07.0[A] -> GSI 11 (level, low) -> IRQ 11 Mar 31 15:18:53 coyote kernel: cx88[0]: subsystem: 7063:3000, board: pcHDTV HD3000 HDTV [card=22,autodetected] Mar 31 15:18:53 coyote kernel: i2c-algo-bit.o: cx88[0] passed test. 
Mar 31 15:18:53 coyote kernel: tuner 3-0061: chip found @ 0xc2 (cx88 [0]) Mar 31 15:18:53 coyote kernel: tuner 3-0061: type set to 52 (Thomson DDT 7610 (ATSC/NTSC)) Mar 31 15:18:53 coyote kernel: cx88[0]/0: found at 0000:01:07.0, rev: 5, irq: 11, latency: 32, mmio: 0xea000000 Mar 31 15:18:53 coyote kernel: cx88[0]/0: registered device video0 [v4l2] Mar 31 15:18:53 coyote kernel: cx88[0]/0: registered device vbi0 Mar 31 15:18:53 coyote kernel: cx88[0]/0: registered device radio0 Mar 31 15:19:03 coyote kernel: cx2388x dvb driver version 0.0.4 loaded Mar 31 15:19:03 coyote kernel: ACPI: PCI interrupt 0000:01:07.2[A] -> GSI 11 (level, low) -> IRQ 11 Mar 31 15:19:03 coyote kernel: cx88[0]/2: found at 0000:01:07.2, rev: 5, irq: 11, latency: 32, mmio: 0xeb000000 Mar 31 15:19:03 coyote kernel: cx88[0]/2: cx2388x based dvb card Mar 31 15:19:03 coyote kernel: DVB: registering new adapter (cx88[0]). Mar 31 15:19:03 coyote kernel: DVB: registering frontend 0 (pcHDTV HD3000 HDTV)... ---------------------------- An lsmod shows this for that card: --Module Size Used by cx88_dvb 5764 0 cx8802 8068 1 cx88_dvb mt352 6532 1 cx88_dvb or51132 9348 1 cx88_dvb dvb_pll 3972 2 cx88_dvb,or51132 video_buf_dvb 4740 1 cx88_dvb dvb_core 76072 1 video_buf_dvb cx8800 27276 0 cx88xx 48416 3 cx88_dvb,cx8802,cx8800 i2c_algo_bit 8456 1 cx88xx video_buf 17796 5 cx88_dvb,cx8802,video_buf_dvb,cx8800,cx88xx ir_common 6404 1 cx88xx tveeprom 11800 1 cx88xx v4l1_compat 12676 1 cx8800 v4l2_common 4992 1 cx8800 btcx_risc 4104 3 cx8802,cx8800,cx88xx tuner 25768 0 videodev 7680 2 cx8800,cx88xx ------------------- I've snipped the webcam and firewire stuff as its not applicable here. 
From the logs when I fired up tvtime:
-----------------------------
Mar 31 15:22:12 coyote kernel: cx88[0]: video y / packed - dma channel status dump
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: initial risc: 0x1ecce000
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: cdt base    : 0x00180440
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: cdt size    : 0x0000000c
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: iq base     : 0x00180400
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: iq size     : 0x00000010
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: risc pc     : 0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: iq wr ptr   : 0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: iq rd ptr   : 0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: cdt current : 0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: pci target  : 0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]: cmds: line / byte : 0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]: risc0: 0x00000000 [ INVALID count=0 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: risc1: 0x00000000 [ INVALID count=0 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: risc2: 0x00000000 [ INVALID count=0 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: risc3: 0x00000000 [ INVALID count=0 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 0: 0x80008000 [ sync resync count=0 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 1: 0x1c000500 [ write sol eol count=1280 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 2: 0x1b9a3000 [ arg #1 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 3: 0x1c000500 [ write sol eol count=1280 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 4: 0x1b9a3a00 [ arg #1 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 5: 0x1c000500 [ write sol eol count=1280 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 6: 0x194d4400 [ arg #1 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 7: 0x18000200 [ write sol count=512 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 8: 0x194d4e00 [ arg #1 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq 9: 0x14000300 [ write eol count=768 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq a: 0x194d5000 [ arg #1 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq b: 0x1c000500 [ write sol eol count=1280 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq c: 0x194d5800 [ arg #1 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq d: 0x0031c040 [ INVALID 21 20 cnt0 resync 14 count=64 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq e: 0x00000000 [ INVALID count=0 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: iq f: 0x00000011 [ INVALID count=17 ]
Mar 31 15:22:12 coyote kernel: cx88[0]: fifo: 0x00180c00 -> 0x183400
Mar 31 15:22:12 coyote kernel: cx88[0]: ctrl: 0x00180400 -> 0x180460
Mar 31 15:22:12 coyote kernel: cx88[0]: ptr1_reg: 0x00181d20
Mar 31 15:22:12 coyote kernel: cx88[0]: ptr2_reg: 0x00180478
Mar 31 15:22:12 coyote kernel: cx88[0]: cnt1_reg: 0x0000005a
Mar 31 15:22:12 coyote kernel: cx88[0]: cnt2_reg: 0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]/0: [f3aa07e0/0] timeout - dma=0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]/0: [f3aa01e0/1] timeout - dma=0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]/0: [edbc8a60/2] timeout - dma=0x00000000
Mar 31 15:22:12 coyote kernel: cx88[0]/0: [f7391540/3] timeout - dma=0x1ecce000
------------------------------
which appears to repeat for every frame it should display -- very
voluminous logging. It all works great when running 2.6.12-rc1, FWIW,
which I'll be doing again shortly; but first I'll check out kino & my
movie camera, and spca50x & my webcam.

kino works, including sound; this is the ieee1394/firewire stuff. And,
after a make clean; make; make install in the spca50x src dir, that
works too. My scanner works, and while I haven't tried to print, the
web interface says it's all ready and waiting to go. In other words,
the usb appears to be fine. Networking seems to be ok; mail is coming
in more or less normally.

So, does anyone have any suggestions as to what to check out in the
tvtime/pcHDTV area?
Humm, in the FWIW category: under the video stuff in a make xconfig,
under DVB broadcasting --> DVB for linux --> Core Support, way down at
the bottom of the list is "Customize DVB Frontends", and again clear at
the bottom of that list is "nxt2002 based". Checking it should (I
think) put a checkbox in front of the "OR51132 Based (pcHDTV)" entry,
but it does not. However, the command 'modprobe cx88-dvb' does load it,
as can be seen in the lsmod output above.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty: soap, ballot,
jury, and ammo. Please use in that order." -Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-31 17:49 ` Ingo Molnar 2005-03-31 18:17 ` Gene Heskett @ 2005-03-31 20:22 ` Steven Rostedt 2005-04-01 0:59 ` Steven Rostedt 2 siblings, 0 replies; 92+ messages in thread From: Steven Rostedt @ 2005-03-31 20:22 UTC (permalink / raw) To: Ingo Molnar; +Cc: Esben Nielsen, LKML On Thu, 2005-03-31 at 19:49 +0200, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > Oh, and did your really want to assign debug = .1? > > > - .debug = .1, .file = __FILE__, .line = __LINE__ > > + .debug = 1, .file = __FILE__, .line = __LINE__ > > doh - indeed. This means rwlocks were nondebug all along? Ouch. I've > uploaded -42-08 with the fix. I noticed it because starting with V0.7.41-25 (although I only actually noticed it with 42-07) not only were they nondebug, but they also didn't have any priority inheritance. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-03-31 17:49 ` Ingo Molnar
  2005-03-31 18:17 ` Gene Heskett
  2005-03-31 20:22 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 Steven Rostedt
@ 2005-04-01  0:59 ` Steven Rostedt
  2005-04-01  4:43 ` Ingo Molnar
  2 siblings, 1 reply; 92+ messages in thread
From: Steven Rostedt @ 2005-04-01 0:59 UTC (permalink / raw)
To: Ingo Molnar; +Cc: LKML

Hi Ingo,

I was wondering whether the bit_spin_lock issue has been put on the
back burner? I understand that changing it is a major problem if you
want to get into the mainline kernel, but I still believe it to be a
problem. kjournald spinning on a bit lock until it finishes its quota
can be bad for latencies. Although it only deadlocks on your system if
it is a real-time task, it still has the ability to become one: since
kjournald can hold two different locks at the time of the spin, if an
rt task tries to grab one, priority inheritance makes kjournald rt, and
if that just happened to be the highest priority task on the system,
you just created a deadlock.

It's not so much of an issue for me, since I'm working with a parallel
kernel and have implemented my earlier fixes to it. I just wanted to
know if there's any plan to deal with this on your end. I do see it
causing a random lockup once in a while, and wanted users to beware. Of
course, if you just avoid ext3, you won't have a problem.

-- Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-04-01  0:59 ` Steven Rostedt
@ 2005-04-01  4:43 ` Ingo Molnar
  2005-04-01  5:13 ` Steven Rostedt
  0 siblings, 1 reply; 92+ messages in thread
From: Ingo Molnar @ 2005-04-01 4:43 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML

* Steven Rostedt <rostedt@goodmis.org> wrote:

> Hi Ingo,
>
> I was wondering whether the bit_spin_lock issue has been put on the
> back burner? [...]

could you send me your latest patch for the bit-spin issue? My main
issue was cleanliness, so that the patch doesn't get stuck in the -RT
tree forever.

	Ingo

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-04-01  4:43 ` Ingo Molnar
@ 2005-04-01  5:13 ` Steven Rostedt
  2005-04-01  5:19 ` Ingo Molnar
  0 siblings, 1 reply; 92+ messages in thread
From: Steven Rostedt @ 2005-04-01 5:13 UTC (permalink / raw)
To: Ingo Molnar; +Cc: LKML

On Fri, 2005-04-01 at 06:43 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Hi Ingo,
> >
> > I was wondering whether the bit_spin_lock issue has been put on the
> > back burner? [...]
>
> could you send me your latest patch for the bit-spin issue? My main
> issue was cleanliness, so that the patch doesn't get stuck in the -RT
> tree forever.

I think that's the main problem: without changing the design of the
ext3 system, I don't think there is a clean patch. The implementation
that I finally settled on was to make the j_state and j_journal_head
locks two global locks. I had to make a few modifications in some spots
to avoid deadlocks, but this worked out well. The problem I was worried
about was this causing too much overhead, since all the ext3 buffers
would then contend with each other; I don't have any tests to see if
this has too much of an impact. If you are still interested, let me
know and I'll pull it out and send it to you.

I preferred this method over the other wait_on_bit approach, since
using normal spin_locks gives priority inheritance, but putting a full
lock into the buffer head seemed like too much overhead. Also, there
was that inverted_lock crap in commit.c that also caused problems.
There I just used the expensive wait-queue fix: instead of just calling
schedule, I put the task on a wait queue to wake up when the lock was
released, and had the unlock of j_state_lock wake it up. But this is
expensive, since every time j_state_lock is unlocked you need to try to
wake up a task that is most likely not there. This I still need to
optimize.

-- Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-04-01 5:13 ` Steven Rostedt @ 2005-04-01 5:19 ` Ingo Molnar 2005-04-01 12:27 ` Steven Rostedt 0 siblings, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-04-01 5:19 UTC (permalink / raw) To: Steven Rostedt; +Cc: LKML * Steven Rostedt <rostedt@goodmis.org> wrote: > > could you send me your latest patch for the bit-spin issue? My main > > issue was cleanliness, so that the patch doesnt get stuck in the -RT > > tree forever. > > I think that's the main problem. Without changing the design of the > ext3 system, I don't think there is a clean patch. The implementation > that I finally settled down with was to make the j_state and > j_journal_head locks two global locks. I had to make a few > modifications to some spots to avoid deadlocks, but this worked out > well. The problem I was worried about was this causing too much > overhead. So the ext3 buffers would have to contend with each other. I > don't have any tests to see if this had too much of an impact. yeah - i think Andrew said that a global lock at that particular place might not be that much of an issue. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-04-01 5:19 ` Ingo Molnar @ 2005-04-01 12:27 ` Steven Rostedt 2005-04-07 21:21 ` Steven Rostedt 0 siblings, 1 reply; 92+ messages in thread From: Steven Rostedt @ 2005-04-01 12:27 UTC (permalink / raw) To: Ingo Molnar; +Cc: LKML On Fri, 2005-04-01 at 07:19 +0200, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > > > > could you send me your latest patch for the bit-spin issue? My main > > > issue was cleanliness, so that the patch doesnt get stuck in the -RT > > > tree forever. > > > > I think that's the main problem. Without changing the design of the > > ext3 system, I don't think there is a clean patch. The implementation > > that I finally settled down with was to make the j_state and > > j_journal_head locks two global locks. I had to make a few > > modifications to some spots to avoid deadlocks, but this worked out > > well. The problem I was worried about was this causing too much > > overhead. So the ext3 buffers would have to contend with each other. I > > don't have any tests to see if this had too much of an impact. > > yeah - i think Andrew said that a global lock at that particular place > might not be that much of an issue. > OK, I'll start stripping it out of my kernel today and make a clean patch for you. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
  2005-04-01 12:27 ` Steven Rostedt
@ 2005-04-07 21:21 ` Steven Rostedt
  2005-04-10 10:31 ` Ingo Molnar
  0 siblings, 1 reply; 92+ messages in thread
From: Steven Rostedt @ 2005-04-07 21:21 UTC (permalink / raw)
To: Ingo Molnar; +Cc: LKML

On Fri, 2005-04-01 at 07:27 -0500, Steven Rostedt wrote:
> On Fri, 2005-04-01 at 07:19 +0200, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > > could you send me your latest patch for the bit-spin issue? My main
> > > > issue was cleanliness, so that the patch doesn't get stuck in the -RT
> > > > tree forever.
> > >
> > > I think that's the main problem. Without changing the design of the
> > > ext3 system, I don't think there is a clean patch. The implementation
> > > that I finally settled down with was to make the j_state and
> > > j_journal_head locks two global locks. I had to make a few
> > > modifications to some spots to avoid deadlocks, but this worked out
> > > well. The problem I was worried about was this causing too much
> > > overhead. So the ext3 buffers would have to contend with each other. I
> > > don't have any tests to see if this had too much of an impact.
> >
> > yeah - i think Andrew said that a global lock at that particular place
> > might not be that much of an issue.
> >
>
> OK, I'll start stripping it out of my kernel today and make a clean
> patch for you.

Ingo, I haven't forgotten about this; I've just been heavily
bug-whacking lately and haven't had the time to do it.

I've pulled out both the lock_bh_state and lock_bh_journal_head locks
and made them two global locks. I haven't noticed any slowdown here,
but then again I haven't run any real benchmarks. There's a BH flag
set to record when the lock is held on a specific buffer head.

I don't know how acceptable this patch is. Take a look, and if you
have any better ideas then let me know.
I prefer this patch over the wait_on_bit patch I sent you earlier,
since this patch still accounts for priority inheritance, which the
wait_on_bits don't.

-- Steve

Index: include/linux/jbd.h
===================================================================
--- include/linux/jbd.h	(revision 113)
+++ include/linux/jbd.h	(working copy)
@@ -313,47 +313,100 @@
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+BUFFER_FNS(State,state)
 
-static inline struct buffer_head *jh2bh(struct journal_head *jh)
+extern spinlock_t journal_bh_state_lock;
+extern spinlock_t journal_bh_journal_lock;
+
+static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	return jh->b_bh;
+	spin_lock(&journal_bh_state_lock);
+	BUG_ON(buffer_state(bh));
+	set_buffer_state(bh);
+	__acquire(journal_bh_state_lock);
 }
 
-static inline struct journal_head *bh2jh(struct buffer_head *bh)
+static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bh->b_private;
+	if (!spin_trylock(&journal_bh_state_lock))
+		return 0;
+
+	BUG_ON(buffer_state(bh));
+	set_buffer_state(bh);
+	__acquire(journal_bh_state_lock);
+	return 1;
 }
 
+static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
+{
+	return buffer_state(bh);
+}
+
+static inline void jbd_unlock_bh_state(struct buffer_head *bh)
+{
+	BUG_ON(!buffer_state(bh));
+	clear_buffer_state(bh);
+	spin_unlock(&journal_bh_state_lock);
+	__release(journal_bh_state_lock);
+}
+
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_lock(&journal_bh_journal_lock);
+	__acquire(journal_bh_journal_lock);
+}
+
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_unlock(&journal_bh_journal_lock);
+	__release(journal_bh_journal_lock);
+}
+
+#else /* !(CONFIG_SMP || CONFIG_DEBUG_SPINLOCK || CONFIG_PREEMPT) */
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	__acquire(journal_bh_state_lock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	__acquire(journal_bh_state_lock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return 1;
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	__release(journal_bh_state_lock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	__acquire(journal_bh_journal_lock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	__release(journal_bh_journal_lock);
 }
 
+#endif /* (CONFIG_SMP || CONFIG_DEBUG_SPINLOCK || CONFIG_PREEMPT) */
+
+static inline struct buffer_head *jh2bh(struct journal_head *jh)
+{
+	return jh->b_bh;
+}
+
+static inline struct journal_head *bh2jh(struct buffer_head *bh)
+{
+	return bh->b_private;
+}
+
 struct jbd_revoke_table_s;
 
 /**
Index: fs/jbd/journal.c
===================================================================
--- fs/jbd/journal.c	(revision 113)
+++ fs/jbd/journal.c	(working copy)
@@ -82,6 +82,14 @@
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+spinlock_t journal_bh_state_lock = SPIN_LOCK_UNLOCKED;
+spinlock_t journal_bh_journal_lock = SPIN_LOCK_UNLOCKED;
+
+EXPORT_SYMBOL(journal_bh_state_lock);
+EXPORT_SYMBOL(journal_bh_journal_lock);
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-04-07 21:21 ` Steven Rostedt @ 2005-04-10 10:31 ` Ingo Molnar 2005-04-10 15:06 ` Steven Rostedt 0 siblings, 1 reply; 92+ messages in thread From: Ingo Molnar @ 2005-04-10 10:31 UTC (permalink / raw) To: Steven Rostedt; +Cc: LKML * Steven Rostedt <rostedt@goodmis.org> wrote: > > > yeah - i think Andrew said that a global lock at that particular place > > > might not be that much of an issue. > > > > OK, I'll start stripping it out of my kernel today and make a clean > > patch for you. > > Ingo, I haven't forgotten about this, I just been heavily bug wacking > lately and just haven't had the time to do this. > > I've pulled out both the lock_bh_state and lock_bh_journal_head and > made them two global locks. I haven't noticed any slowing down here, > but then again I haven't ran any real benchmarks. There's a BH flag > set to know when the lock is pending on a specific buffer head. > > I don't know how acceptable this patch is. Take a look and if you have > any better ideas then let me know. I prefer this patch over the > wait_on_bit patch I sent you earlier since this patch still accounts > for priority inheritance, as the wait_on_bits don't. looks much cleaner than earlier ones. Would it be possible to make the locks per journal? I've applied it to the -44-05 kernel so that it gets some testing. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-04-10 10:31 ` Ingo Molnar @ 2005-04-10 15:06 ` Steven Rostedt 0 siblings, 0 replies; 92+ messages in thread From: Steven Rostedt @ 2005-04-10 15:06 UTC (permalink / raw) To: Ingo Molnar; +Cc: LKML On Sun, 2005-04-10 at 12:31 +0200, Ingo Molnar wrote: > looks much cleaner than earlier ones. Would it be possible to make the > locks per journal? [...] I've already looked into doing this, but it would be much more intrusive to implement. The problem lies where these locks are called with only the buffer head as a reference. The locks are used to attach or detach the buffer head from a journal or just see if it is already attached. So having the lock with the journal is difficult since you need to take the lock sometimes before you know which journal is needed. I'm sure this is possible but it will need modifying the code where the locks are called instead of just replacing the contents of the lock function. Maybe with the help of Stephen Tweedie, this can be done. But what I gave you was the cleanest and most reliable solution currently, without changing anything but the functions to take the locks. -- Steve ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-24 8:31 ` Steven Rostedt 2005-03-24 8:47 ` Steven Rostedt @ 2005-03-24 10:42 ` Ingo Molnar 1 sibling, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-24 10:42 UTC (permalink / raw) To: Steven Rostedt; +Cc: linux-kernel * Steven Rostedt <rostedt@goodmis.org> wrote: > This way when process C tries to get the lock again, it sees that it's > owned, but B hasn't executed yet. So it would be safe for C to take > the lock back from B, that is if C is greater priority than B, > otherwise it would act the same. agreed. In particular this would be a nice optimization for cases where tasks are delayed for a longer time due to CPU congestion (e.g. lots of tasks on the same SCHED_FIFO priority). So if a higher prio task comes along while the > If you agree with this approach, I would be willing to write a patch > for you. yeah - please do. Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-23 6:16 ` Paul E. McKenney 2005-03-23 6:33 ` Ingo Molnar @ 2005-03-23 9:40 ` Herbert Xu 2005-03-23 9:48 ` Herbert Xu 1 sibling, 1 reply; 92+ messages in thread From: Herbert Xu @ 2005-03-23 9:40 UTC (permalink / raw) To: paulmck; +Cc: mingo, linux-kernel, simlo Paul E. McKenney <paulmck@us.ibm.com> wrote: > > +void rcu_read_unlock(void) > +{ > + int cpu; > + > + if (--current->rcu_read_lock_nesting == 0) { > + atomic_dec(¤t->rcu_data->active_readers); > + /* > + * Check whether we have reached quiescent state. > + * Note! This is only for the local CPU, not for > + * current->rcu_data's CPU [which typically is the > + * current CPU, but may also be another CPU]. > + */ > + cpu = get_cpu(); > > And need an smp_mb() here, again for non-x86 CPUs. The atomic_dec is already a barrier. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 2005-03-23 9:40 ` Herbert Xu @ 2005-03-23 9:48 ` Herbert Xu 0 siblings, 0 replies; 92+ messages in thread From: Herbert Xu @ 2005-03-23 9:48 UTC (permalink / raw) To: paulmck; +Cc: mingo, linux-kernel, simlo On Wed, Mar 23, 2005 at 08:40:58PM +1100, Herbert Xu wrote: > Paul E. McKenney <paulmck@us.ibm.com> wrote: > > > > +void rcu_read_unlock(void) > > +{ > > + int cpu; > > + > > + if (--current->rcu_read_lock_nesting == 0) { > > + atomic_dec(¤t->rcu_data->active_readers); > > + /* > > + * Check whether we have reached quiescent state. > > + * Note! This is only for the local CPU, not for > > + * current->rcu_data's CPU [which typically is the > > + * current CPU, but may also be another CPU]. > > + */ > > + cpu = get_cpu(); > > > > And need an smp_mb() here, again for non-x86 CPUs. > > The atomic_dec is already a barrier. Sorry, I was confused. atomic_dec is not a barrier of course. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-05 2005-03-22 10:01 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-05 Ingo Molnar 2005-03-22 11:28 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07 Ingo Molnar @ 2005-03-23 5:23 ` Paul E. McKenney 1 sibling, 0 replies; 92+ messages in thread From: Paul E. McKenney @ 2005-03-23 5:23 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel On Tue, Mar 22, 2005 at 11:01:53AM +0100, Ingo Molnar wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > hm, another thing: i think call_rcu() needs to take the read-lock. > > Right now it assumes that it has the data structure private, but > > that's only statistically true on PREEMPT_RT: another CPU may have > > this CPU's RCU control structure in use. So IRQs-off (or preempt-off) > > is not a guarantee to have the data structure, the read lock has to be > > taken. > > i've reworked the code to use the read-lock to access the per-CPU data > RCU structures, and it boots with 2 CPUs and PREEMPT_RT now. The -40-05 > patch can be downloaded from the usual place: > > http://redhat.com/~mingo/realtime-preempt/ > > had to add two hacks though: > > static void rcu_advance_callbacks(struct rcu_data *rdp) > { > if (rdp->batch != rcu_ctrlblk.batch) { > if (rdp->donetail) // HACK > *rdp->donetail = rdp->waitlist; > ... > > void fastcall call_rcu(struct rcu_head *head, > void (*func)(struct rcu_head *rcu)) > [...] > rcu_advance_callbacks(rdp); > if (rdp->waittail) // HACK > *rdp->waittail = head; > ... > > without them it crashes during bootup. Probably also the cause of the memory leakage. More evidence that list macros are useful, just in case anyone needed any more. > maybe we are better off with the completely unlocked read path and the > long grace periods. Might well be... But doesn't there need to be an smp_mb() right after the increment in rcu_read_lock() and before the rcu_qsctr_inc(cpu) in rcu_read_unlock(), at least for non-x86 CPUs? 
On the long grace periods... It should be possible to force the grace
periods to end by providing a pair of active_readers counters per
rcu_data structure, one "current", the other "expiring". New
rcu_read_lock() invocations get a pointer to the "current"
active_readers counter for their current CPU, and rcu_read_unlock()
decrements whichever counter the corresponding rcu_read_lock() got a
pointer to.

When it is necessary to force a grace period, the roles of the
"current" and "expiring" counters are reversed. Once all CPUs'
"expiring" counters have gone to zero, we know that all RCU read-side
critical sections that were executing at the time of the role reversal
have completed. It is necessary to wait for all the "expiring" counters
to go to zero before doing a second role reversal.

This is very similar to a mechanism in K42 and in Dipankar's RCU
implementation. See page 18 of
http://www.rdrop.com/users/paulmck/RCU/rcu.2002.07.08.pdf for a better
description. There is a patch against a very early 2.5 that I can dig
up if anyone is interested.

Once I get the lock-based method working, I might have the confidence
to take this on. If someone wants to try it out in the meantime, I
would of course be happy to review their code.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-22 9:32 ` Ingo Molnar 2005-03-22 10:01 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-05 Ingo Molnar @ 2005-03-23 4:48 ` Paul E. McKenney 2005-03-23 6:21 ` Ingo Molnar 1 sibling, 1 reply; 92+ messages in thread From: Paul E. McKenney @ 2005-03-23 4:48 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel On Tue, Mar 22, 2005 at 10:32:01AM +0100, Ingo Molnar wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > seems to be a true SMP race: when i boot with 1 CPU it doesnt trigger, > > the same kernel image and 2 CPUs triggers it on CPU#1. (CPU#0 is the > > boot CPU) Note that the timing of the crash is not deterministic > > (sometimes i get it during net startup, sometimes during ACPI > > startup), but it always crashes within rcu_advance_callbacks(). > > > > one difference between your tests and mine is that your kernel is > > doing _synchronize_kernel() from preempt-off sections (correct?), > > while my kernel with PREEMPT_RT does it on preemptable sections. > > hm, another thing: i think call_rcu() needs to take the read-lock. Right > now it assumes that it has the data structure private, but that's only > statistically true on PREEMPT_RT: another CPU may have this CPU's RCU > control structure in use. So IRQs-off (or preempt-off) is not a > guarantee to have the data structure, the read lock has to be taken. !!! The difference is that in the stock kernel, rcu_check_callbacks() is invoked from irq. In PREEMPT_RT, it is invoked from process context and appears to be preemptible. This means that rcu_advance_callbacks() can be preempted, resulting in all sorts of problems. Need to disable preemption over this. There are probably other bugs of this sort, I will track them down. But, just to make sure I understand -- if I have preempt disabled over all accesses to a per-CPU variable, and that variable is -not- accessed from a real interrupt handler, then I am safe without a lock, right? 
Thanx, Paul PS. Fixing my stupid bugs that you pointed out in earlier email, as well. :-/ ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 2005-03-23 4:48 ` [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01 Paul E. McKenney @ 2005-03-23 6:21 ` Ingo Molnar 0 siblings, 0 replies; 92+ messages in thread From: Ingo Molnar @ 2005-03-23 6:21 UTC (permalink / raw) To: Paul E. McKenney; +Cc: linux-kernel * Paul E. McKenney <paulmck@us.ibm.com> wrote: > !!! The difference is that in the stock kernel, rcu_check_callbacks() > is invoked from irq. In PREEMPT_RT, it is invoked from process > context and appears to be preemptible. This means that > rcu_advance_callbacks() can be preempted, resulting in all sorts of > problems. Need to disable preemption over this. we can disable preemption in the PREEMPT_RT kernel too, but only for code we know the maximum execution length of. I.e. we dont want want to disable preemption while processing the callback _list_, but we can use preemption-off to protect the per-cpu data structures. > There are probably other bugs of this sort, I will track them down. > > But, just to make sure I understand -- if I have preempt disabled over > all accesses to a per-CPU variable, and that variable is -not- > accessed from a real interrupt handler, then I am safe without a lock, > right? correct, assuming that the pointer you get to the per-CPU data is truly pointing to the current CPU's data. (i.e. the current->rcu_data pointer approach breaks this assumption.) Ingo ^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-01
  2005-03-22  5:43 ` Paul E. McKenney
  2005-03-22  7:24 ` Ingo Molnar
@ 2005-03-22  8:59 ` Ingo Molnar
  1 sibling, 0 replies; 92+ messages in thread
From: Ingo Molnar @ 2005-03-22 8:59 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: linux-kernel

regarding this code:

        void synchronize_kernel(void)
        {
                long oldbatch;

                smp_mb();
                oldbatch = rcu_ctrlblk.batch;
                schedule_timeout(HZ/GRACE_PERIODS_PER_SEC);

what is the intent of that schedule_timeout() - a fixed-period delay?
Is that acceptable? In any case, it won't do what you think it does:
you should use mdelay(), or set current->state = TASK_UNINTERRUPTIBLE
first - otherwise the call will just return immediately, since we are
still in TASK_RUNNING.

	Ingo

^ permalink raw reply	[flat|nested] 92+ messages in thread
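A sketch of the fix Ingo describes (kernel-only code, not runnable
standalone; the mdelay argument is illustrative): schedule_timeout()
only sleeps if the task has left TASK_RUNNING first, so the delay must
be written as either a busy-wait or a genuine sleep:

```
	/* sketch: either busy-wait for the interval... */
	mdelay(1000 / GRACE_PERIODS_PER_SEC);

	/* ...or actually sleep, by changing the task state first */
	current->state = TASK_UNINTERRUPTIBLE;
	schedule_timeout(HZ / GRACE_PERIODS_PER_SEC);
```

Without the state change, schedule_timeout() returns as soon as the
scheduler picks the still-runnable task again, so the intended
grace-period delay silently never happens.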