* CONFIG_PREEMPT_RCU in next/mmotm
@ 2009-08-08 19:34 Hugh Dickins
2009-08-08 23:56 ` Paul E. McKenney
0 siblings, 1 reply; 14+ messages in thread
From: Hugh Dickins @ 2009-08-08 19:34 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel
Hi Paul,
Is CONFIG_PREEMPT_RCU=y expected to be working in -next (or mmotm)?
I ask because it appears to break down on PowerPC G5 when I try two
"make -j20" kernel builds in parallel. The "filp" slab which usually
contains a couple of thousand objects or so, jumps up to a couple of
hundred thousand before the builds complete, and continues rising
from then on - I think that's a sign of RCU in disgrace?
And rebooting hangs thereafter.
And I notice that include/linux/rcupreempt.h currently says:
static inline void synchronize_rcu_expedited(void)
{
synchronize_rcu(); /* Placeholder for new rcupreempt implementation. */
}
which gives an impression of work in progress?
CONFIG_PREEMPT_TREE=y seems okay on PowerPC; and when I briefly tried
CONFIG_PREEMPT_RCU=y on x86_64, it didn't show above symptoms there.
I did try bisecting yesterday's linux-next git, but that led me to
commit 8ca17c6082feee5841a7b0e91d00e18c3f85f063
Merge: dafcc6e... 7256cf0...
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Aug 3 15:50:37 2009 +0200
Merge branch 'core/rcu' into auto-sched-next
rather than to any particular patch of yours which that merges:
which seemed odd, but I'm not accustomed to bisecting next.
Another odd thing is that mmotm .DATE=2009-07-30-05-01, the last
I tried before Thursday's, showed no such symptoms: yet appears to
contain all the larger RCU changes, just lacking some more recent
on/offline race fixes. I didn't notice any likely difference
between those mmotm trees down in arch/powerpc either.
My guess is that there's some other issue which is triggering
the RCU disgrace, but that is just a guess.
Config attached. Any suggestions?
Thanks,
Hugh
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-08 19:34 CONFIG_PREEMPT_RCU in next/mmotm Hugh Dickins @ 2009-08-08 23:56 ` Paul E. McKenney 2009-08-09 3:32 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2009-08-08 23:56 UTC (permalink / raw) To: Hugh Dickins; +Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sat, Aug 08, 2009 at 08:34:03PM +0100, Hugh Dickins wrote: > Hi Paul, > > Is CONFIG_PREEMPT_RCU=y expected to be working in -next (or mmotm)? > > I ask because it appears to break down on PowerPC G5 when I try two > "make -j20" kernel builds in parallel. The "filp" slab which usually > contains a couple of thousand objects or so, jumps up to a couple of > hundred thousand before the builds complete, and continues rising > from then on - I think that's a sign of RCU in disgrace? > And rebooting hangs thereafter. That would indeed be bad. > And I notice that include/linux/rcupreempt.h currently says: > static inline void synchronize_rcu_expedited(void) > { > synchronize_rcu(); /* Placeholder for new rcupreempt implementation. */ > } > which gives an impression of work in progress? Yep, but I would be surprised if this was causing your problem. In fact, I would be surprised if this function was being invoked. Then again, I have been surprised more than once. For whatever it is worth, with luck, include/linux/rcupreempt.h disappears entirely at some point. > CONFIG_PREEMPT_TREE=y seems okay on PowerPC; and when I briefly tried CONFIG_TREE_RCU=y, you mean, right? > CONFIG_PREEMPT_RCU=y on x86_64, it didn't show above symptoms there. Interesting... > I did try bisecting yesterday's linux-next git, but that led me to > > commit 8ca17c6082feee5841a7b0e91d00e18c3f85f063 > Merge: dafcc6e... 7256cf0... > Author: Ingo Molnar <mingo@elte.hu> > Date: Mon Aug 3 15:50:37 2009 +0200 > > Merge branch 'core/rcu' into auto-sched-next > > rather than to any particular patch of yours which that merges: > which seemed odd, but I'm not accustomed to bisecting next. I honestly don't know how "git bisect" interacts with merges. > Another odd thing is that mmotm .DATE=2009-07-30-05-01, the last > I tried before Thursday's, showed no such symptoms: yet appears to > contain all the larger RCU changes, just lacking some more recent > on/offline race fixes. I didn't notice any likely difference > between those mmotm trees down in arch/powerpc either. > > My guess is that there's some other issue which is triggering > the RCU disgrace, but that is just a guess. The one other issue I know of is one that apparently happens only on one of Ingo's machines, which suddenly decides that it need not inform RCU when onlining a CPU (at boot time). But you seem to be making it past boot, and I would guess that you are not doing any hotplug CPU operations, right? > Config attached. Any suggestions? /me scratches head... Will try some more CONFIG_PREEMPT_RCU testing locally. Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-08 23:56 ` Paul E. McKenney @ 2009-08-09 3:32 ` Hugh Dickins 2009-08-09 5:33 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2009-08-09 3:32 UTC (permalink / raw) To: Paul E. McKenney Cc: Hugh Dickins, Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sat, 8 Aug 2009, Paul E. McKenney wrote: > On Sat, Aug 08, 2009 at 08:34:03PM +0100, Hugh Dickins wrote: > > > CONFIG_PREEMPT_TREE=y seems okay on PowerPC; and when I briefly tried > > CONFIG_TREE_RCU=y, you mean, right? Whoops, yes indeed. > > My guess is that there's some other issue which is triggering > > the RCU disgrace, but that is just a guess. > > The one other issue I know of is one that apparently happens only on > one of Ingo's machines, which suddenly decides that it need not inform > RCU when onlining a CPU (at boot time). > > But you seem to be making it past boot, and I would guess that you are > not doing any hotplug CPU operations, right? That's right, I'm well past boot, and don't explicitly get into CPU hotplug, nor using suspend/hibernate on that machine; and looking at the config, see I don't even have CONFIG_HOTPLUG_CPU set. In the morning I'll check that /proc/cpuinfo really shows all four cpus before I start, and is still showing four cpus when it goes bad, in case there's nevertheless an issue with that. Is there some particular attribute of a cpu, most relevant to its RCU grace status, that I can check on with a printk? > > > Config attached. Any suggestions? [ Of course I forgot the config, but sent it just to Paul later. ] > > /me scratches head... Will try some more CONFIG_PREEMPT_RCU testing > locally. Thanks: I'm kind of glad that it's interesting, though it may sink some of our time. Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-09 3:32 ` Hugh Dickins @ 2009-08-09 5:33 ` Paul E. McKenney 2009-08-09 11:24 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2009-08-09 5:33 UTC (permalink / raw) To: Hugh Dickins; +Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sun, Aug 09, 2009 at 04:32:28AM +0100, Hugh Dickins wrote: > On Sat, 8 Aug 2009, Paul E. McKenney wrote: > > On Sat, Aug 08, 2009 at 08:34:03PM +0100, Hugh Dickins wrote: > > > > > CONFIG_PREEMPT_TREE=y seems okay on PowerPC; and when I briefly tried > > > > CONFIG_TREE_RCU=y, you mean, right? > > Whoops, yes indeed. > > > > My guess is that there's some other issue which is triggering > > > the RCU disgrace, but that is just a guess. > > > > The one other issue I know of is one that apparently happens only on > > one of Ingo's machines, which suddenly decides that it need not inform > > RCU when onlining a CPU (at boot time). > > > > But you seem to be making it past boot, and I would guess that you are > > not doing any hotplug CPU operations, right? > > That's right, I'm well past boot, and don't explicitly get into CPU > hotplug, nor using suspend/hibernate on that machine; and looking at > the config, see I don't even have CONFIG_HOTPLUG_CPU set. > > In the morning I'll check that /proc/cpuinfo really shows all four > cpus before I start, and is still showing four cpus when it goes bad, > in case there's nevertheless an issue with that. > > Is there some particular attribute of a cpu, most relevant to its RCU > grace status, that I can check on with a printk? The debugfs files enabled by CONFIG_RCU_TRACE, except rcu/rcugp, which, since it will attempt to wait for a grace period, will probably wait forever. > > > Config attached. Any suggestions? > > [ Of course I forgot the config, but sent it just to Paul later. ] > > > > > /me scratches head... Will try some more CONFIG_PREEMPT_RCU testing > > locally. > > Thanks: I'm kind of glad that it's interesting, > though it may sink some of our time. Well, it certainly does have me a bit confused at the moment! Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-09 5:33 ` Paul E. McKenney @ 2009-08-09 11:24 ` Hugh Dickins 2009-08-09 13:06 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2009-08-09 11:24 UTC (permalink / raw) To: Paul E. McKenney Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sat, 8 Aug 2009, Paul E. McKenney wrote: > On Sun, Aug 09, 2009 at 04:32:28AM +0100, Hugh Dickins wrote: > > > > Is there some particular attribute of a cpu, most relevant to its RCU > > grace status, that I can check on with a printk? > > The debugfs files enabled by CONFIG_RCU_TRACE, except rcu/rcugp, which, > since it will attempt to wait for a grace period, will probably wait Here's ten second samples from just before and after it "goes": the number of filp objects (my own guide to it going), rcuctrs and rctstats. With ggp stuck on 35124, as you predicted, rcugp just hangs. filp 5616 objects: CPU last cur F M 0 0 0 0 1 1 0 0 0 0 2 0 -1 0 0 3 0 1 0 0 ggp = 33821, state = waitmb na=2751281 nl=64 wa=2751217 wl=176 da=2751041 dl=0 dr=2751041 di=2751023 1=353642 e1=193343 i1=33939 ie1=118 g1=33821 a1=38041 ae1=4220 a2=33821 z1=36834 ze1=3013 z2=33821 m1=51485 me1=17665 m2=33820 filp 5780 objects: CPU last cur F M 0 0 0 0 0 1 0 0 0 0 2 -1 0 0 0 3 1 0 0 0 ggp = 34518, state = waitack na=2826191 nl=18 wa=2826173 wl=71 da=2826102 dl=0 dr=2826102 di=2826058 1=362636 e1=198972 i1=34640 ie1=122 g1=34518 a1=38847 ae1=4330 a2=34517 z1=37702 ze1=3185 z2=34517 m1=52475 me1=17958 m2=34517 filp 13343 objects: CPU last cur F M 0 1 0 0 0 1 0 0 0 0 2 -1 0 0 0 3 0 0 0 0 ggp = 35124, state = waitzero na=2898045 nl=15380 wa=2882665 wl=150 da=2882515 dl=0 dr=2882515 di=2882465 1=371717 e1=204532 i1=35248 ie1=124 g1=35124 a1=39545 ae1=4421 a2=35124 z1=39077 ze1=3954 z2=35123 m1=53315 me1=18192 m2=35123 filp 67190 objects: CPU last cur F M 0 1 0 0 0 1 0 1 0 0 2 -1 0 0 0 3 0 0 0 0 ggp = 35124, state = waitzero na=2961758 nl=79093 wa=2882665 wl=150 da=2882515 dl=0 dr=2882515 di=2882465 1=383301 e1=212065 i1=35248 ie1=124 g1=35124 a1=39545 ae1=4421 a2=35124 z1=43128 ze1=8005 z2=35123 m1=53315 me1=18192 m2=35123 filp 137160 objects: CPU last cur F M 0 1 1 0 0 1 0 0 0 0 2 -1 0 0 0 3 0 0 0 0 ggp = 35124, state = waitzero na=3041696 nl=159031 wa=2882665 wl=150 da=2882515 dl=0 dr=2882515 di=2882465 1=394717 e1=219335 i1=35248 ie1=124 g1=35124 a1=39545 ae1=4421 a2=35124 z1=47274 ze1=12151 z2=35123 m1=53315 me1=18192 m2=35123 Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-09 11:24 ` Hugh Dickins @ 2009-08-09 13:06 ` Hugh Dickins 2009-08-09 18:57 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2009-08-09 13:06 UTC (permalink / raw) To: Paul E. McKenney Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sun, 9 Aug 2009, Hugh Dickins wrote: > > filp 13343 objects: > > CPU last cur F M > 0 1 0 0 0 > 1 0 0 0 0 > 2 -1 0 0 0 > 3 0 0 0 0 > ggp = 35124, state = waitzero Interesting that rcu_try_flip_waitzero() doesn't see 1 + 0 + -1 + 0 == 0. That's because rcu_cpu_online_map is 0x1 instead of the 0xf it should be. Which is because I don't have CONFIG_HOTPLUG_CPU=y on that PPC machine (unlike the x86s), and I think you've made some recent mods which accidentally made the rcu cpu initialization dependent on hotplug cpu notifiers? CONFIG_HOTPLUG_CPU=y and it works properly again. And TREE_RCU doesn't use an rcu_cpu_online_map (though it does expect some per-cpu initialization, but seems to get away without it). So I think that's the mystery solved - I'll let you decide the right fix! Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-09 13:06 ` Hugh Dickins @ 2009-08-09 18:57 ` Paul E. McKenney 2009-08-09 20:53 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2009-08-09 18:57 UTC (permalink / raw) To: Hugh Dickins; +Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sun, Aug 09, 2009 at 02:06:05PM +0100, Hugh Dickins wrote: > On Sun, 9 Aug 2009, Hugh Dickins wrote: > > > > filp 13343 objects: > > > > CPU last cur F M > > 0 1 0 0 0 > > 1 0 0 0 0 > > 2 -1 0 0 0 > > 3 0 0 0 0 > > ggp = 35124, state = waitzero > > Interesting that rcu_try_flip_waitzero() doesn't see 1 + 0 + -1 + 0 == 0. Indeed!!! You nailed it!!! > That's because rcu_cpu_online_map is 0x1 instead of the 0xf it should be. > > Which is because I don't have CONFIG_HOTPLUG_CPU=y on that PPC machine > (unlike the x86s), and I think you've made some recent mods which > accidentally made the rcu cpu initialization dependent on hotplug > cpu notifiers? CONFIG_HOTPLUG_CPU=y and it works properly again. Back to the drawing board at this end... > And TREE_RCU doesn't use an rcu_cpu_online_map (though it does expect > some per-cpu initialization, but seems to get away without it). Only by pure luck. The kind of luck that gets you into serious trouble later on. > So I think that's the mystery solved - I'll let you decide the right fix! Thank you -very- much for tracking this down, Hugh!!! I introduced the problem in commit 7fe616c5dd50a50f334edec1ea0580b90b7af0d9 by changing from register_cpu_notifier() to hotcpu_notifier(). The former lets you know when CPUs come on line unconditionally, the latter only when CONFIG_HOTPLUG_CPU is in effect. But hotcpu_notifier() is much nicer to use, so I propose introducing a cpu_notifier() that is invoked like hotcpu_notifier() is, but is unconditional in the same way that register_cpu_notifier(). Something like the following (untested, probably does not compile): diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 4d668e0..d5dfc1f 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -48,6 +48,15 @@ struct notifier_block; #ifdef CONFIG_SMP /* Need to know about CPUs going up/down? */ +#if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) +#define cpu_notifier(fn, pri) { \ + static struct notifier_block fn##_nb __cpuinitdata = \ + { .notifier_call = fn, .priority = pri }; \ + register_cpu_notifier(&fn##_nb); \ +} +#else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) +#endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ #ifdef CONFIG_HOTPLUG_CPU extern int register_cpu_notifier(struct notifier_block *nb); extern void unregister_cpu_notifier(struct notifier_block *nb); @@ -99,11 +108,7 @@ extern struct sysdev_class cpu_sysdev_class; extern void get_online_cpus(void); extern void put_online_cpus(void); -#define hotcpu_notifier(fn, pri) { \ - static struct notifier_block fn##_nb __cpuinitdata = \ - { .notifier_call = fn, .priority = pri }; \ - register_cpu_notifier(&fn##_nb); \ -} +#define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) int cpu_down(unsigned int cpu); diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c index 9f0584e..c1bbfd5 100644 --- a/kernel/rcupdate.c +++ b/kernel/rcupdate.c @@ -251,7 +251,7 @@ void __init rcu_init(void) int i; __rcu_init(); - hotcpu_notifier(rcu_barrier_cpu_hotplug, 0); + cpu_notifier(rcu_barrier_cpu_hotplug, 0); /* * We don't need protection against CPU-hotplug here because With this in place: diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c index 9f0584e..c1bbfd5 100644 --- a/kernel/rcupdate.c +++ b/kernel/rcupdate.c @@ -251,7 +251,7 @@ void __init rcu_init(void) int i; __rcu_init(); - hotcpu_notifier(rcu_barrier_cpu_hotplug, 0); + cpu_notifier(rcu_barrier_cpu_hotplug, 0); /* * We don't need protection against CPU-hotplug here because Thoughts? Thanx, Paul ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-09 18:57 ` Paul E. McKenney @ 2009-08-09 20:53 ` Hugh Dickins 2009-08-09 21:16 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2009-08-09 20:53 UTC (permalink / raw) To: Paul E. McKenney Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sun, 9 Aug 2009, Paul E. McKenney wrote: > > I introduced the problem in commit 7fe616c5dd50a50f334edec1ea0580b90b7af0d9 > by changing from register_cpu_notifier() to hotcpu_notifier(). The former > lets you know when CPUs come on line unconditionally, the latter only > when CONFIG_HOTPLUG_CPU is in effect. > > But hotcpu_notifier() is much nicer to use, so I propose introducing > a cpu_notifier() that is invoked like hotcpu_notifier() is, but is > unconditional in the same way that register_cpu_notifier(). > > Something like the following (untested, probably does not compile): > > diff --git a/include/linux/cpu.h b/include/linux/cpu.h > index 4d668e0..d5dfc1f 100644 > --- a/include/linux/cpu.h > +++ b/include/linux/cpu.h > @@ -48,6 +48,15 @@ struct notifier_block; > > #ifdef CONFIG_SMP > /* Need to know about CPUs going up/down? */ > +#if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) > +#define cpu_notifier(fn, pri) { \ > + static struct notifier_block fn##_nb __cpuinitdata = \ > + { .notifier_call = fn, .priority = pri }; \ > + register_cpu_notifier(&fn##_nb); \ > +} > +#else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ > +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) > +#endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ > #ifdef CONFIG_HOTPLUG_CPU > extern int register_cpu_notifier(struct notifier_block *nb); > extern void unregister_cpu_notifier(struct notifier_block *nb); > @@ -99,11 +108,7 @@ extern struct sysdev_class cpu_sysdev_class; > > extern void get_online_cpus(void); > extern void put_online_cpus(void); > -#define hotcpu_notifier(fn, pri) { \ > - static struct notifier_block fn##_nb __cpuinitdata = \ > - { .notifier_call = fn, .priority = pri }; \ > - register_cpu_notifier(&fn##_nb); \ > -} > +#define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) > #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) > #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) > int cpu_down(unsigned int cpu); > diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c > index 9f0584e..c1bbfd5 100644 > --- a/kernel/rcupdate.c > +++ b/kernel/rcupdate.c > @@ -251,7 +251,7 @@ void __init rcu_init(void) > int i; > > __rcu_init(); > - hotcpu_notifier(rcu_barrier_cpu_hotplug, 0); > + cpu_notifier(rcu_barrier_cpu_hotplug, 0); > > /* > * We don't need protection against CPU-hotplug here because > [ removed repeat of rcupdate.c patch ] > > Thoughts? That builds and works for me, with or without CONFIG_HOTPLUG_CPU. But I didn't get what you're achieving with the MODULE part of it; and (I'm not a notifier buff at all) it does seems rather baroque to me - a single callsite, why not stick with register_cpu_notifier()? Ah, perhaps it's your ambition to move others over to this (or perhaps it's your ambition to leave that to someone else ;-) Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-09 20:53 ` Hugh Dickins @ 2009-08-09 21:16 ` Paul E. McKenney 2009-08-10 3:39 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2009-08-09 21:16 UTC (permalink / raw) To: Hugh Dickins; +Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sun, Aug 09, 2009 at 09:53:53PM +0100, Hugh Dickins wrote: > On Sun, 9 Aug 2009, Paul E. McKenney wrote: > > > > I introduced the problem in commit 7fe616c5dd50a50f334edec1ea0580b90b7af0d9 > > by changing from register_cpu_notifier() to hotcpu_notifier(). The former > > lets you know when CPUs come on line unconditionally, the latter only > > when CONFIG_HOTPLUG_CPU is in effect. > > > > But hotcpu_notifier() is much nicer to use, so I propose introducing > > a cpu_notifier() that is invoked like hotcpu_notifier() is, but is > > unconditional in the same way that register_cpu_notifier(). > > > > Something like the following (untested, probably does not compile): > > > > diff --git a/include/linux/cpu.h b/include/linux/cpu.h > > index 4d668e0..d5dfc1f 100644 > > --- a/include/linux/cpu.h > > +++ b/include/linux/cpu.h > > @@ -48,6 +48,15 @@ struct notifier_block; > > > > #ifdef CONFIG_SMP > > /* Need to know about CPUs going up/down? */ > > +#if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) > > +#define cpu_notifier(fn, pri) { \ > > + static struct notifier_block fn##_nb __cpuinitdata = \ > > + { .notifier_call = fn, .priority = pri }; \ > > + register_cpu_notifier(&fn##_nb); \ > > +} > > +#else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ > > +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) > > +#endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ > > #ifdef CONFIG_HOTPLUG_CPU > > extern int register_cpu_notifier(struct notifier_block *nb); > > extern void unregister_cpu_notifier(struct notifier_block *nb); > > @@ -99,11 +108,7 @@ extern struct sysdev_class cpu_sysdev_class; > > > > extern void get_online_cpus(void); > > extern void put_online_cpus(void); > > -#define hotcpu_notifier(fn, pri) { \ > > - static struct notifier_block fn##_nb __cpuinitdata = \ > > - { .notifier_call = fn, .priority = pri }; \ > > - register_cpu_notifier(&fn##_nb); \ > > -} > > +#define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) > > #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) > > #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) > > int cpu_down(unsigned int cpu); > > diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c > > index 9f0584e..c1bbfd5 100644 > > --- a/kernel/rcupdate.c > > +++ b/kernel/rcupdate.c > > @@ -251,7 +251,7 @@ void __init rcu_init(void) > > int i; > > > > __rcu_init(); > > - hotcpu_notifier(rcu_barrier_cpu_hotplug, 0); > > + cpu_notifier(rcu_barrier_cpu_hotplug, 0); > > > > /* > > * We don't need protection against CPU-hotplug here because > > > [ removed repeat of rcupdate.c patch ] > > > > Thoughts? > > That builds and works for me, with or without CONFIG_HOTPLUG_CPU. > > But I didn't get what you're achieving with the MODULE part of it; > and (I'm not a notifier buff at all) it does seems rather baroque to > me - a single callsite, why not stick with register_cpu_notifier()? > > Ah, perhaps it's your ambition to move others over to this > (or perhaps it's your ambition to leave that to someone else ;-) Actually, nothing quite that clearly thought out. I was just following the pattern set for register_cpu_notifier(). My guess at the reasoning is that when !HOTPLUG_CPU, modules cannot be loaded until all the CPUs are online, so there is no point in letting a module set itself up for notification. But whatever their reasoning, mine was that there is no point in creating a struct notifier_block that wasn't going to be used. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-09 21:16 ` Paul E. McKenney @ 2009-08-10 3:39 ` Paul E. McKenney 2009-08-10 22:43 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2009-08-10 3:39 UTC (permalink / raw) To: Hugh Dickins; +Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sun, Aug 09, 2009 at 02:16:29PM -0700, Paul E. McKenney wrote: > On Sun, Aug 09, 2009 at 09:53:53PM +0100, Hugh Dickins wrote: > > On Sun, 9 Aug 2009, Paul E. McKenney wrote: > > > > > > I introduced the problem in commit 7fe616c5dd50a50f334edec1ea0580b90b7af0d9 > > > by changing from register_cpu_notifier() to hotcpu_notifier(). The former > > > lets you know when CPUs come on line unconditionally, the latter only > > > when CONFIG_HOTPLUG_CPU is in effect. > > > > > > But hotcpu_notifier() is much nicer to use, so I propose introducing > > > a cpu_notifier() that is invoked like hotcpu_notifier() is, but is > > > unconditional in the same way that register_cpu_notifier(). > > > > > > Something like the following (untested, probably does not compile): > > > > > > diff --git a/include/linux/cpu.h b/include/linux/cpu.h > > > index 4d668e0..d5dfc1f 100644 > > > --- a/include/linux/cpu.h > > > +++ b/include/linux/cpu.h > > > @@ -48,6 +48,15 @@ struct notifier_block; > > > > > > #ifdef CONFIG_SMP > > > /* Need to know about CPUs going up/down? */ > > > +#if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) > > > +#define cpu_notifier(fn, pri) { \ > > > + static struct notifier_block fn##_nb __cpuinitdata = \ > > > + { .notifier_call = fn, .priority = pri }; \ > > > + register_cpu_notifier(&fn##_nb); \ > > > +} > > > +#else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ > > > +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) > > > +#endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ > > > #ifdef CONFIG_HOTPLUG_CPU > > > extern int register_cpu_notifier(struct notifier_block *nb); > > > extern void unregister_cpu_notifier(struct notifier_block *nb); > > > @@ -99,11 +108,7 @@ extern struct sysdev_class cpu_sysdev_class; > > > > > > extern void get_online_cpus(void); > > > extern void put_online_cpus(void); > > > -#define hotcpu_notifier(fn, pri) { \ > > > - static struct notifier_block fn##_nb __cpuinitdata = \ > > > - { .notifier_call = fn, .priority = pri }; \ > > > - register_cpu_notifier(&fn##_nb); \ > > > -} > > > +#define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) > > > #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) > > > #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) > > > int cpu_down(unsigned int cpu); > > > diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c > > > index 9f0584e..c1bbfd5 100644 > > > --- a/kernel/rcupdate.c > > > +++ b/kernel/rcupdate.c > > > @@ -251,7 +251,7 @@ void __init rcu_init(void) > > > int i; > > > > > > __rcu_init(); > > > - hotcpu_notifier(rcu_barrier_cpu_hotplug, 0); > > > + cpu_notifier(rcu_barrier_cpu_hotplug, 0); > > > > > > /* > > > * We don't need protection against CPU-hotplug here because > > > > > [ removed repeat of rcupdate.c patch ] > > > > > > Thoughts? > > > > That builds and works for me, with or without CONFIG_HOTPLUG_CPU. > > > > But I didn't get what you're achieving with the MODULE part of it; > > and (I'm not a notifier buff at all) it does seems rather baroque to > > me - a single callsite, why not stick with register_cpu_notifier()? > > > > Ah, perhaps it's your ambition to move others over to this > > (or perhaps it's your ambition to leave that to someone else ;-) > > Actually, nothing quite that clearly thought out. I was just following > the pattern set for register_cpu_notifier(). My guess at the reasoning > is that when !HOTPLUG_CPU, modules cannot be loaded until all the CPUs > are online, so there is no point in letting a module set itself up for > notification. > > But whatever their reasoning, mine was that there is no point in > creating a struct notifier_block that wasn't going to be used. ;-) And the above patch fails for !CONFIG_SMP. Here is an update, testing in progress. Still not fully tested, but results are encouraging. In particular, this one is more likely to compile. Thanx, Paul ------------------------------------------------------------------------ diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 4d668e0..4753619 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -48,6 +48,15 @@ struct notifier_block; #ifdef CONFIG_SMP /* Need to know about CPUs going up/down? */ +#if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) +#define cpu_notifier(fn, pri) { \ + static struct notifier_block fn##_nb __cpuinitdata = \ + { .notifier_call = fn, .priority = pri }; \ + register_cpu_notifier(&fn##_nb); \ +} +#else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) +#endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ #ifdef CONFIG_HOTPLUG_CPU extern int register_cpu_notifier(struct notifier_block *nb); extern void unregister_cpu_notifier(struct notifier_block *nb); @@ -74,6 +83,8 @@ extern void cpu_maps_update_done(void); #else /* CONFIG_SMP */ +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) + static inline int register_cpu_notifier(struct notifier_block *nb) { return 0; @@ -99,11 +110,7 @@ extern struct sysdev_class cpu_sysdev_class; extern void get_online_cpus(void); extern void put_online_cpus(void); -#define hotcpu_notifier(fn, pri) { \ - static struct notifier_block fn##_nb __cpuinitdata = \ - { .notifier_call = fn, .priority = pri }; \ - register_cpu_notifier(&fn##_nb); \ -} +#define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) int cpu_down(unsigned int cpu); diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c index 9f0584e..c1bbfd5 100644 --- a/kernel/rcupdate.c +++ b/kernel/rcupdate.c @@ -251,7 +251,7 @@ void __init rcu_init(void) int i; __rcu_init(); - hotcpu_notifier(rcu_barrier_cpu_hotplug, 0); + cpu_notifier(rcu_barrier_cpu_hotplug, 0); /* * We don't need protection against CPU-hotplug here because ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-10 3:39 ` Paul E. McKenney @ 2009-08-10 22:43 ` Paul E. McKenney 2009-08-12 1:34 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2009-08-10 22:43 UTC (permalink / raw) To: Hugh Dickins; +Cc: Ingo Molnar, Ben Herrenschmidt, Andrew Morton, linux-kernel On Sun, Aug 09, 2009 at 08:39:07PM -0700, Paul E. McKenney wrote: > On Sun, Aug 09, 2009 at 02:16:29PM -0700, Paul E. McKenney wrote: > > On Sun, Aug 09, 2009 at 09:53:53PM +0100, Hugh Dickins wrote: > > > On Sun, 9 Aug 2009, Paul E. McKenney wrote: > > > > > > > > I introduced the problem in commit 7fe616c5dd50a50f334edec1ea0580b90b7af0d9 > > > > by changing from register_cpu_notifier() to hotcpu_notifier(). The former > > > > lets you know when CPUs come on line unconditionally, the latter only > > > > when CONFIG_HOTPLUG_CPU is in effect. > > > > > > > > But hotcpu_notifier() is much nicer to use, so I propose introducing > > > > a cpu_notifier() that is invoked like hotcpu_notifier() is, but is > > > > unconditional in the same way that register_cpu_notifier(). > > > > > > > > Something like the following (untested, probably does not compile): > > > > > > > > diff --git a/include/linux/cpu.h b/include/linux/cpu.h > > > > index 4d668e0..d5dfc1f 100644 > > > > --- a/include/linux/cpu.h > > > > +++ b/include/linux/cpu.h > > > > @@ -48,6 +48,15 @@ struct notifier_block; > > > > > > > > #ifdef CONFIG_SMP > > > > /* Need to know about CPUs going up/down? */ > > > > +#if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) > > > > +#define cpu_notifier(fn, pri) { \ > > > > + static struct notifier_block fn##_nb __cpuinitdata = \ > > > > + { .notifier_call = fn, .priority = pri }; \ > > > > + register_cpu_notifier(&fn##_nb); \ > > > > +} > > > > +#else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ > > > > +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) > > > > +#endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ > > > > #ifdef CONFIG_HOTPLUG_CPU > > > > extern int register_cpu_notifier(struct notifier_block *nb); > > > > extern void unregister_cpu_notifier(struct notifier_block *nb); > > > > @@ -99,11 +108,7 @@ extern struct sysdev_class cpu_sysdev_class; > > > > > > > > extern void get_online_cpus(void); > > > > extern void put_online_cpus(void); > > > > -#define hotcpu_notifier(fn, pri) { \ > > > > - static struct notifier_block fn##_nb __cpuinitdata = \ > > > > - { .notifier_call = fn, .priority = pri }; \ > > > > - register_cpu_notifier(&fn##_nb); \ > > > > -} > > > > +#define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) > > > > #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) > > > > #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) > > > > int cpu_down(unsigned int cpu); > > > > diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c > > > > index 9f0584e..c1bbfd5 100644 > > > > --- a/kernel/rcupdate.c > > > > +++ b/kernel/rcupdate.c > > > > @@ -251,7 +251,7 @@ void __init rcu_init(void) > > > > int i; > > > > > > > > __rcu_init(); > > > > - hotcpu_notifier(rcu_barrier_cpu_hotplug, 0); > > > > + cpu_notifier(rcu_barrier_cpu_hotplug, 0); > > > > > > > > /* > > > > * We don't need protection against CPU-hotplug here because > > > > > > > [ removed repeat of rcupdate.c patch ] > > > > > > > > Thoughts? > > > > > > That builds and works for me, with or without CONFIG_HOTPLUG_CPU. > > > > > > But I didn't get what you're achieving with the MODULE part of it; > > > and (I'm not a notifier buff at all) it does seems rather baroque to > > > me - a single callsite, why not stick with register_cpu_notifier()? > > > > > > Ah, perhaps it's your ambition to move others over to this > > > (or perhaps it's your ambition to leave that to someone else ;-) > > > > Actually, nothing quite that clearly thought out. I was just following > > the pattern set for register_cpu_notifier(). My guess at the reasoning > > is that when !HOTPLUG_CPU, modules cannot be loaded until all the CPUs > > are online, so there is no point in letting a module set itself up for > > notification. > > > > But whatever their reasoning, mine was that there is no point in > > creating a struct notifier_block that wasn't going to be used. ;-) > > And the above patch fails for !CONFIG_SMP. Here is an update, testing > in progress. Still not fully tested, but results are encouraging. > In particular, this one is more likely to compile. And this handled !CONFIG_SMP, but fails two of fifteen test cases. So better, but still far from perfect. Chasing the failures down. Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-10 22:43 ` Paul E. McKenney @ 2009-08-12 1:34 ` Paul E. McKenney 2009-08-12 9:22 ` Ingo Molnar 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2009-08-12 1:34 UTC (permalink / raw) To: Ingo Molnar; +Cc: Ben Herrenschmidt, Andrew Morton, linux-kernel, Hugh Dickins On Mon, Aug 10, 2009 at 03:43:45PM -0700, Paul E. McKenney wrote: > On Sun, Aug 09, 2009 at 08:39:07PM -0700, Paul E. McKenney wrote: > > On Sun, Aug 09, 2009 at 02:16:29PM -0700, Paul E. McKenney wrote: > > > On Sun, Aug 09, 2009 at 09:53:53PM +0100, Hugh Dickins wrote: [ . . . ] > > > > That builds and works for me, with or without CONFIG_HOTPLUG_CPU. > > > > > > > > But I didn't get what you're achieving with the MODULE part of it; > > > > and (I'm not a notifier buff at all) it does seems rather baroque to > > > > me - a single callsite, why not stick with register_cpu_notifier()? > > > > > > > > Ah, perhaps it's your ambition to move others over to this > > > > (or perhaps it's your ambition to leave that to someone else ;-) > > > > > > Actually, nothing quite that clearly thought out. I was just following > > > the pattern set for register_cpu_notifier(). My guess at the reasoning > > > is that when !HOTPLUG_CPU, modules cannot be loaded until all the CPUs > > > are online, so there is no point in letting a module set itself up for > > > notification. > > > > > > But whatever their reasoning, mine was that there is no point in > > > creating a struct notifier_block that wasn't going to be used. ;-) > > > > And the above patch fails for !CONFIG_SMP. Here is an update, testing > > in progress. Still not fully tested, but results are encouraging. > > In particular, this one is more likely to compile. > > And this handled !CONFIG_SMP, but fails two of fifteen test cases. > So better, but still far from perfect. > > Chasing the failures down. And I believe I have a patch that works for all of my test cases, but am rerunning the full set to double-check. Patch against tip/core/rcu below for your collective amusement. Should these tests pass... Unless someone tells me otherwise, I will make a patch series intended to replace tip/core/rcu commits 7fe616c5d ("Simplify RCU CPU-hotplug notification"), 04b06256c ("Fix RCU & CPU hotplug hang"), and 7256cf0e83b ("Add diagnostic check for a possible CPU-hotplug race"), re-run all tests on that patchset, and submit the series. I expect the resulting patch set to have three patches, one to split out boot-time initialization for RCU_TREE, a second to create the cpu_notifier() API, and the third to make RCU use it. I guess the lesson to me is that although I should send a patch quickly in response to bug reports, I need to nevertheless run my full set of RCU torture tests on it -- and verify that the specified kernel configuration parameters actually were in effect for those tests. :-/ Thanx, Paul ------------------------------------------------------------------------ diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 4d668e0..4753619 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -48,6 +48,15 @@ struct notifier_block; #ifdef CONFIG_SMP /* Need to know about CPUs going up/down? */ +#if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) +#define cpu_notifier(fn, pri) { \ + static struct notifier_block fn##_nb __cpuinitdata = \ + { .notifier_call = fn, .priority = pri }; \ + register_cpu_notifier(&fn##_nb); \ +} +#else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) +#endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */ #ifdef CONFIG_HOTPLUG_CPU extern int register_cpu_notifier(struct notifier_block *nb); extern void unregister_cpu_notifier(struct notifier_block *nb); @@ -74,6 +83,8 @@ extern void cpu_maps_update_done(void); #else /* CONFIG_SMP */ +#define cpu_notifier(fn, pri) do { (void)(fn); } while (0) + static inline int register_cpu_notifier(struct notifier_block *nb) { return 0; @@ -99,11 +110,7 @@ extern struct sysdev_class cpu_sysdev_class; extern void get_online_cpus(void); extern void put_online_cpus(void); -#define hotcpu_notifier(fn, pri) { \ - static struct notifier_block fn##_nb __cpuinitdata = \ - { .notifier_call = fn, .priority = pri }; \ - register_cpu_notifier(&fn##_nb); \ -} +#define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) int cpu_down(unsigned int cpu); diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c index 9f0584e..8df1156 100644 --- a/kernel/rcupdate.c +++ b/kernel/rcupdate.c @@ -238,7 +238,7 @@ static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self, call_rcu_bh(rcu_migrate_head, rcu_migrate_callback); call_rcu_sched(rcu_migrate_head + 1, rcu_migrate_callback); call_rcu(rcu_migrate_head + 2, rcu_migrate_callback); - } else if (action == CPU_DEAD) { + } else if (action == CPU_POST_DEAD) { /* rcu_migrate_head is protected by cpu_add_remove_lock */ wait_migrated_callbacks(); } @@ -251,7 +251,7 @@ void __init rcu_init(void) int i; __rcu_init(); - hotcpu_notifier(rcu_barrier_cpu_hotplug, 0); + cpu_notifier(rcu_barrier_cpu_hotplug, 0); /* * We don't need protection against CPU-hotplug here because ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-12 1:34 ` Paul E. McKenney @ 2009-08-12 9:22 ` Ingo Molnar 2009-08-13 0:51 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Ingo Molnar @ 2009-08-12 9:22 UTC (permalink / raw) To: Paul E. McKenney Cc: Ben Herrenschmidt, Andrew Morton, linux-kernel, Hugh Dickins * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > Should these tests pass... > > Unless someone tells me otherwise, I will make a patch series > intended to replace tip/core/rcu commits 7fe616c5d ("Simplify RCU > CPU-hotplug notification"), 04b06256c ("Fix RCU & CPU hotplug > hang"), and 7256cf0e83b ("Add diagnostic check for a possible > CPU-hotplug race"), re-run all tests on that patchset, and submit > the series. I expect the resulting patch set to have three > patches, one to split out boot-time initialization for RCU_TREE, a > second to create the cpu_notifier() API, and the third to make RCU > use it. Sure - we can reasonably rebase portions of that stack of commits: earth4:~/tip> gll linus..core/rcu 7256cf0: rcu: Add diagnostic check for a possible CPU-hotplug race 04b0625: rcu: Fix RCU & CPU hotplug hang 7fe616c: rcu: Simplify RCU CPU-hotplug notification 240ebbf: rcu: Add synchronize_sched_expedited() rcutorture doc + updates 0acc512: rcu: Add synchronize_sched_expedited() torture tests 03b042b: rcu: Add synchronize_sched_expedited() primitive c17ef45: rcu: Remove Classic RCU Please mention the magic words "please reset core/rcu to 240ebbf before applying these patches" in the mail to me, should i forget in the days to come. (hm, what was i supposed to not forget? Weird.) Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: CONFIG_PREEMPT_RCU in next/mmotm 2009-08-12 9:22 ` Ingo Molnar @ 2009-08-13 0:51 ` Paul E. McKenney 0 siblings, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2009-08-13 0:51 UTC (permalink / raw) To: Ingo Molnar; +Cc: Ben Herrenschmidt, Andrew Morton, linux-kernel, Hugh Dickins On Wed, Aug 12, 2009 at 11:22:50AM +0200, Ingo Molnar wrote: > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > Should these tests pass... Things are working much better, but I can still cause failures by hotplugging CPUs with zero wait time between them while concurrently modprobe-ing and rmmod-ing rcutorture repeatedly while running CONFIG_PREEMPT_RCU. I increased the kernel's lifespan by an order of magnitude or so by simplifying rcupreempt's hotplug code path. And I just -know- that I will deeply regret having described my test process, given that my life is much easier when my RCU testing is more rigorous than anyone else's. ;-) The remaining (known) problem appears to be due to a kthread_stop() deadlock between the migration threads and a few of rcutorture's kthreads. In this deadlock, rcutorture is waiting for one of the fakewriters to stop (via kthread_stop()), while the fakewriter is waiting for synchronize_rcu() to complete. The migration thread's CPU-hotplug notifier is blocked in kthread_stop() because rcutorture holds the kthread_stop() mutex. I could argue that CPU hotplug should allow RCU grace periods to proceed regardless, but I believe that the problem is that some thread is preempted in the middle of an RCU read-side critical section, but cannot be migrated to a CPU that could run it due to the fact that the migration kthread is stuck in its CPU-hotplug notifier. RCU being what it is, it seems reasonable for me to instead arrange so that rcutorture never invokes kthread_stop() unless it knows that the target thread cannot possibly be in the midst of synchronize_rcu(). That said, there is the concern that this general pattern might rear its ugly head elsewhere. > > Unless someone tells me otherwise, I will make a patch series > > intended to replace tip/core/rcu commits 7fe616c5d ("Simplify RCU > > CPU-hotplug notification"), 04b06256c ("Fix RCU & CPU hotplug > > hang"), and 7256cf0e83b ("Add diagnostic check for a possible > > CPU-hotplug race"), re-run all tests on that patchset, and submit > > the series. I expect the resulting patch set to have three > > patches, one to split out boot-time initialization for RCU_TREE, a > > second to create the cpu_notifier() API, and the third to make RCU > > use it. While thinking this over, I am rebasing as described above, and doing full-up testing at each step. No more Mr. Nice Guy!!! ;-) In the meantime, can anyone tell me why we only let one kthread stop at a time? > Sure - we can reasonably rebase portions of that stack of commits: > > earth4:~/tip> gll linus..core/rcu > 7256cf0: rcu: Add diagnostic check for a possible CPU-hotplug race > 04b0625: rcu: Fix RCU & CPU hotplug hang > 7fe616c: rcu: Simplify RCU CPU-hotplug notification > 240ebbf: rcu: Add synchronize_sched_expedited() rcutorture doc + updates > 0acc512: rcu: Add synchronize_sched_expedited() torture tests > 03b042b: rcu: Add synchronize_sched_expedited() primitive > c17ef45: rcu: Remove Classic RCU > > Please mention the magic words "please reset core/rcu to 240ebbf > before applying these patches" in the mail to me, should i forget in > the days to come. Will do! > (hm, what was i supposed to not forget? Weird.) I can't remember. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-08-13 0:51 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-08-08 19:34 CONFIG_PREEMPT_RCU in next/mmotm Hugh Dickins 2009-08-08 23:56 ` Paul E. McKenney 2009-08-09 3:32 ` Hugh Dickins 2009-08-09 5:33 ` Paul E. McKenney 2009-08-09 11:24 ` Hugh Dickins 2009-08-09 13:06 ` Hugh Dickins 2009-08-09 18:57 ` Paul E. McKenney 2009-08-09 20:53 ` Hugh Dickins 2009-08-09 21:16 ` Paul E. McKenney 2009-08-10 3:39 ` Paul E. McKenney 2009-08-10 22:43 ` Paul E. McKenney 2009-08-12 1:34 ` Paul E. McKenney 2009-08-12 9:22 ` Ingo Molnar 2009-08-13 0:51 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).