* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
[not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
@ 2008-07-10 9:03 ` Kamalesh Babulal
2008-07-10 11:02 ` Alexey Dobriyan
2008-08-01 21:09 ` Paul E. McKenney
2 siblings, 0 replies; 13+ messages in thread
From: Kamalesh Babulal @ 2008-07-10 9:03 UTC
To: Nick Piggin
Cc: Rafael J. Wysocki, Paul E. McKenney, Alexey Dobriyan,
Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
Linus Torvalds, Natalie Protasevich, Kernel Testers List
Nick Piggin wrote:
> On Wednesday 09 July 2008 07:37, Rafael J. Wysocki wrote:
>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11023
>> Subject : 2.6.26-rc8-git2 - kernel BUG at mm/page_alloc.c:585
>> Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
>> Date : 2008-07-02 11:55 (7 days old)
>> References : http://lkml.org/lkml/2008/7/2/32
>> Handled-By : Andrew Morton <akpm@linux-foundation.org>
>
> I expect Andrew probably doesn't have time to delve into this.
> Usual questions apply: is it reproducible, is it bisectable?
> Someone at IBM is probably best to handle it. Maybe try Mel or
> powerpc list?
>
This is reproducible. I have Cc'd the powerpc list on the bug report
sent to the list, and I will try to bisect the bug.
>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10906
>> Subject : repeatable slab corruption with LTP msgctl08
>> Submitter : Andrew Morton <akpm@linux-foundation.org>
>> Date : 2008-06-12 5:13 (27 days old)
>> References : http://marc.info/?l=linux-kernel&m=121324775927704&w=4
>> Handled-By : Pekka J Enberg <penberg@cs.helsinki.fi>
>> Christoph Lameter <clameter@sgi.com>
>> Manfred Spraul <manfred@colorfullife.com>
>> Andi Kleen <andi@firstfloor.org>
>
> I couldn't reproduce this one either. Maybe hardware failure?
>
>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10629
>> Subject : 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
>> Submitter : Alexey Dobriyan <adobriyan@gmail.com>
>> Date : 2008-05-05 09:59 (65 days old)
>> References : http://lkml.org/lkml/2008/5/5/28
>> Handled-By : Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> Attached is my fix for this problem. I don't think it is a regression
> as such, but it can't hurt to go into 2.6.26 IMO.
>
>
>
> ------------------------------------------------------------------------
>
> PREEMPT_RCU without HOTPLUG_CPU is broken. The rcu_online_cpu is called to
> initially populate rcu_cpu_online_map with all online CPUs when the hotplug
> event handler is installed, and also to populate the map with CPUs as they
> come online. The former case is meant to happen with and without HOTPLUG_CPU,
> but without HOTPLUG_CPU, the rcu_online_cpu function is no-oped -- while it
> still gets called, it does not set the rcu CPU map.
>
> With a blank RCU CPU map, grace periods get to tick by completely oblivious
> to active RCU read side critical sections. This results in free-before-grace
> bugs.
>
> Fix is obvious once the problem is known. (Also, change __devinit to
> __cpuinit so the function gets thrown away on !HOTPLUG_CPU kernels).
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
>
> Annoyed this wasn't a crazy obscure error in the algorithm I could fix :)
> I spent all day debugging it and had to make a special test case (rcutorture
> didn't seem to trigger it), and a big RCU state logging infrastructure to log
> millions of RCU state transitions and events. Oh well.
>
> Index: linux-2.6/kernel/rcupreempt.c
> ===================================================================
> --- linux-2.6.orig/kernel/rcupreempt.c 2008-07-10 17:08:56.000000000 +1000
> +++ linux-2.6/kernel/rcupreempt.c 2008-07-10 17:09:10.000000000 +1000
> @@ -925,26 +925,22 @@ void rcu_offline_cpu(int cpu)
>  	spin_unlock_irqrestore(&rdp->lock, flags);
>  }
>
> -void __devinit rcu_online_cpu(int cpu)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> -	cpu_set(cpu, rcu_cpu_online_map);
> -	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> -}
> -
>  #else /* #ifdef CONFIG_HOTPLUG_CPU */
>
>  void rcu_offline_cpu(int cpu)
>  {
>  }
>
> -void __devinit rcu_online_cpu(int cpu)
> +#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +
> +void __cpuinit rcu_online_cpu(int cpu)
>  {
> -}
> +	unsigned long flags;
>
> -#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> +	cpu_set(cpu, rcu_cpu_online_map);
> +	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> +}
>
>  static void rcu_process_callbacks(struct softirq_action *unused)
>  {
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
[not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
2008-07-10 9:03 ` Kamalesh Babulal
@ 2008-07-10 11:02 ` Alexey Dobriyan
2008-07-10 17:21 ` Linus Torvalds
2008-08-01 21:09 ` Paul E. McKenney
2 siblings, 1 reply; 13+ messages in thread
From: Alexey Dobriyan @ 2008-07-10 11:02 UTC
To: Nick Piggin
Cc: Rafael J. Wysocki, Kamalesh Babulal, Paul E. McKenney,
Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
Linus Torvalds, Natalie Protasevich, Kernel Testers List
On Thu, Jul 10, 2008 at 05:25:35PM +1000, Nick Piggin wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10629
> > Subject : 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
> > Submitter : Alexey Dobriyan <adobriyan@gmail.com>
> > Date : 2008-05-05 09:59 (65 days old)
> > References : http://lkml.org/lkml/2008/5/5/28
> > Handled-By : Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> Attached is my fix for this problem. I don't think it is a regression
> as such, but it can't hurt to go into 2.6.26 IMO.
> PREEMPT_RCU without HOTPLUG_CPU is broken.
Bastard!
rcutorture fixed here, starting cross-compile stuff (without much interest).
> Annoyed this wasn't a crazy obscure error in the algorithm I could fix :)
> I spent all day debugging it and had to make a special test case (rcutorture
> didn't seem to trigger it), and a big RCU state logging infrastructure to log
> millions of RCU state transitions and events. Oh well.
> --- linux-2.6.orig/kernel/rcupreempt.c
> +++ linux-2.6/kernel/rcupreempt.c
> @@ -925,26 +925,22 @@ void rcu_offline_cpu(int cpu)
>  	spin_unlock_irqrestore(&rdp->lock, flags);
>  }
>
> -void __devinit rcu_online_cpu(int cpu)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> -	cpu_set(cpu, rcu_cpu_online_map);
> -	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> -}
> -
>  #else /* #ifdef CONFIG_HOTPLUG_CPU */
>
>  void rcu_offline_cpu(int cpu)
>  {
>  }
>
> -void __devinit rcu_online_cpu(int cpu)
> +#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +
> +void __cpuinit rcu_online_cpu(int cpu)
>  {
> -}
> +	unsigned long flags;
>
> -#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> +	cpu_set(cpu, rcu_cpu_online_map);
> +	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> +}
>
>  static void rcu_process_callbacks(struct softirq_action *unused)
>  {
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
2008-07-10 11:02 ` Alexey Dobriyan
@ 2008-07-10 17:21 ` Linus Torvalds
2008-07-10 17:34 ` Ingo Molnar
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2008-07-10 17:21 UTC
To: Alexey Dobriyan
Cc: Nick Piggin, Rafael J. Wysocki, Kamalesh Babulal,
Paul E. McKenney, Linux Kernel Mailing List, Adrian Bunk,
Andrew Morton, Natalie Protasevich, Kernel Testers List
On Thu, 10 Jul 2008, Alexey Dobriyan wrote:
> On Thu, Jul 10, 2008 at 05:25:35PM +1000, Nick Piggin wrote:
> >
> > Attached is my fix for this problem. I don't think it is a regression
> > as such, but it can't hurt to go into 2.6.26 IMO.
Nick, you're a hero.
> > PREEMPT_RCU without HOTPLUG_CPU is broken.
>
> Bastard!
>
> rcutorture fixed here, starting cross-compile stuff (without much interest).
I'm marking this "tested-by" by you too, on the strength of that
rcutorture thing. I think Nick nailed this one.
Good jorb,
Linus
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
2008-07-10 17:21 ` Linus Torvalds
@ 2008-07-10 17:34 ` Ingo Molnar
2008-07-10 18:06 ` Ingo Molnar
0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2008-07-10 17:34 UTC
To: Linus Torvalds
Cc: Alexey Dobriyan, Nick Piggin, Rafael J. Wysocki, Kamalesh Babulal,
Paul E. McKenney, Linux Kernel Mailing List, Adrian Bunk,
Andrew Morton, Natalie Protasevich, Kernel Testers List
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, 10 Jul 2008, Alexey Dobriyan wrote:
>
> > On Thu, Jul 10, 2008 at 05:25:35PM +1000, Nick Piggin wrote:
> > >
> > > Attached is my fix for this problem. I don't think it is a
> > > regression as such, but it can't hurt to go into 2.6.26 IMO.
>
> Nick, you're a hero.
cool! :)
(hm, could anyone please resend Nick's original mail? The original one
is not in my lkml folder nor on lkml.org - only the quoted one.)
Ingo
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
2008-07-10 17:34 ` Ingo Molnar
@ 2008-07-10 18:06 ` Ingo Molnar
2008-07-11 4:11 ` Nick Piggin
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Ingo Molnar @ 2008-07-10 18:06 UTC
To: Linus Torvalds
Cc: Alexey Dobriyan, Nick Piggin, Rafael J. Wysocki, Kamalesh Babulal,
Paul E. McKenney, Linux Kernel Mailing List, Adrian Bunk,
Andrew Morton, Natalie Protasevich, Kernel Testers List,
Paul E. McKenney
* Ingo Molnar <mingo@elte.hu> wrote:
> cool! :)
>
> (hm, could anyone please resend Nick's original mail? The original one
> is not in my lkml folder nor on lkml.org - only the quoted one.)
ok, got the mail now:
| | Annoyed this wasn't a crazy obscure error in the algorithm I could
| | fix :) [...]
Paul recently ran a formal proof against all sorts of RCU details (and
found and fixed a few obscure races that way that no-one ever
triggered), so i'd be quite surprised if we found anything in the core
algorithm :-)
| | [...] I spent all day debugging it and had to make a special test
| | case (rcutorture didn't seem to trigger it), and a big RCU state
| | logging infrastructure to log millions of RCU state transitions and
| | events. Oh well.
nice debugging!
Acked-by: Ingo Molnar <mingo@elte.hu>
i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG_CPU +
PREEMPT_RCU kernels and never saw this. Nor did Paul. That aspect is
weird.
Ingo
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
2008-07-10 18:06 ` Ingo Molnar
@ 2008-07-11 4:11 ` Nick Piggin
2008-08-01 21:09 ` Paul E. McKenney
2008-08-01 21:09 ` Paul E. McKenney
[not found] ` <20080710204157.GG6877@linux.vnet.ibm.com>
2 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2008-07-11 4:11 UTC
To: Ingo Molnar
Cc: Linus Torvalds, Alexey Dobriyan, Rafael J. Wysocki,
Kamalesh Babulal, Paul E. McKenney, Linux Kernel Mailing List,
Adrian Bunk, Andrew Morton, Natalie Protasevich,
Kernel Testers List, Paul E. McKenney
On Friday 11 July 2008 04:06, Ingo Molnar wrote:
> i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG_CPU +
> PREEMPT_RCU kernels and never saw this. Nor did Paul. That aspect is
> weird.
It basically requires an active rcu reader to be preempted (preferably
by something doing a lot of call_rcu or other activity, i.e. the writer,
so it can tick along the different states quickly).
I found just 2 threads (reader and writer) bound to the same CPU would
trigger it fastest, my reader has quite a long rcu read section.
I'm not sure why rcutorture doesn't trigger for everyone. I'm surprised
it does not have much longer maximum read delays -- several ms I would
have thought should be useful to have a critical section open while the
rcu engine can run through a number of states...
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
2008-07-11 4:11 ` Nick Piggin
@ 2008-08-01 21:09 ` Paul E. McKenney
0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:09 UTC
To: Nick Piggin
Cc: Ingo Molnar, Linus Torvalds, Alexey Dobriyan, Rafael J. Wysocki,
Kamalesh Babulal, Linux Kernel Mailing List, Adrian Bunk,
Andrew Morton, Natalie Protasevich, Kernel Testers List
On Fri, Jul 11, 2008 at 02:11:59PM +1000, Nick Piggin wrote:
> On Friday 11 July 2008 04:06, Ingo Molnar wrote:
>
> > i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG_CPU +
> > PREEMPT_RCU kernels and never saw this. Nor did Paul. That aspect is
> > weird.
>
> It basically requires an active rcu reader to be preempted (preferably
> by something doing a lot of call_rcu or other activity, i.e. the writer,
> so it can tick along the different states quickly).
>
> I found just 2 threads (reader and writer) bound to the same CPU would
> trigger it fastest, my reader has quite a long rcu read section.
>
> I'm not sure why rcutorture doesn't trigger for everyone. I'm surprised
> it does not have much longer maximum read delays -- several ms I would
> have thought should be useful to have a critical section open while the
> rcu engine can run through a number of states...
Hit it in 10 seconds once I actually got HOTPLUG_CPU disabled.
The theory behind the default settings for rcutorture is as follows:
o	Having two reader threads for each CPU helps ensure interactions
	between those threads.
o	The writer is normally going to have to share a CPU with a
	reader or two, maybe three. This should force reader-writer
	interactions.
o	The read-hold time needs to be long enough to ensure interactions
	with the writer, but if it is too long, there are too few
	rcu_read_lock() and rcu_read_unlock() events to really stress
	the read-side processing.
o	The four fakewriters ensure interaction between multiple
	writers.
To Nick's point, I did use a hacked-up rcutorture with millisecond
read-side delays when debugging preemptable RCU, but I also used stock
rcutorture.
I will give this some thought and see if the defaults should change or
if more knobs are needed.
Thanx, Paul
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
2008-07-10 18:06 ` Ingo Molnar
2008-07-11 4:11 ` Nick Piggin
@ 2008-08-01 21:09 ` Paul E. McKenney
[not found] ` <20080710204157.GG6877@linux.vnet.ibm.com>
2 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:09 UTC
To: Ingo Molnar
Cc: Linus Torvalds, Alexey Dobriyan, Nick Piggin, Rafael J. Wysocki,
Kamalesh Babulal, Linux Kernel Mailing List, Adrian Bunk,
Andrew Morton, Natalie Protasevich, Kernel Testers List
On Thu, Jul 10, 2008 at 08:06:20PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > cool! :)
> >
> > (hm, could anyone please resend Nick's original mail? The original one
> > is not in my lkml folder nor on lkml.org - only the quoted one.)
>
> ok, got the mail now:
>
> | | Annoyed this wasn't a crazy obscure error in the algorithm I could
> | | fix :) [...]
>
> Paul recently ran a formal proof against all sorts of RCU details (and
> found and fixed a few obscure races that way that no-one ever
> triggered), so i'd be quite surprised if we found anything in the core
> algorithm :-)
>
> | | [...] I spent all day debugging it and had to make a special test
> | | case (rcutorture didn't seem to trigger it), and a big RCU state
> | | logging infrastructure to log millions of RCU state transitions and
> | | events. Oh well.
>
> nice debugging!
Indeed!!!
> Acked-by: Ingo Molnar <mingo@elte.hu>
>
> i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG_CPU +
> PREEMPT_RCU kernels and never saw this. Nor did Paul. That aspect is
> weird.
Turns out that my environment was silently re-enabling HOTPLUG_CPU, so I
only -thought- I was testing !HOTPLUG_CPU. Once I forced it to really
disable HOTPLUG_CPU (by manually also specifying CONFIG_SUSPEND=n and
CONFIG_HIBERNATION=n), then rcutorture complained within 10 seconds.
Sigh!!!
Thanx, Paul
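For anyone trying to reproduce this, the setup Paul describes would look
roughly like the .config fragment below. It is a sketch only: the
SUSPEND, HIBERNATION and HOTPLUG_CPU lines come from the message above,
while CONFIG_SMP, CONFIG_PREEMPT, CONFIG_PREEMPT_RCU and
CONFIG_RCU_TORTURE_TEST are assumed to be the other options a 2.6.26-era
test kernel needs, and the exact dependencies vary by architecture.

```
# Assumed prerequisites for a preemptible-RCU rcutorture run
CONFIG_SMP=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TORTURE_TEST=m
# Disabled explicitly so that nothing selects HOTPLUG_CPU back in
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set
# CONFIG_HOTPLUG_CPU is not set
```

It is worth checking the final .config after the build system has
processed it, since the point of the message above is that HOTPLUG_CPU
can be re-enabled silently.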
[parent not found: <20080710204157.GG6877@linux.vnet.ibm.com>]
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
[not found] ` <20080710204157.GG6877@linux.vnet.ibm.com>
@ 2008-08-01 21:09 ` Paul E. McKenney
0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:09 UTC
To: Ingo Molnar
Cc: Linus Torvalds, Alexey Dobriyan, Nick Piggin, Rafael J. Wysocki,
Kamalesh Babulal, Linux Kernel Mailing List, Adrian Bunk,
Andrew Morton, Natalie Protasevich, Kernel Testers List
On Thu, Jul 10, 2008 at 01:41:57PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 10, 2008 at 08:06:20PM +0200, Ingo Molnar wrote:
> >
> > * Ingo Molnar <mingo@elte.hu> wrote:
> >
> > > cool! :)
> > >
> > > (hm, could anyone please resend Nick's original mail? The original one
> > > is not in my lkml folder nor on lkml.org - only the quoted one.)
> >
> > ok, got the mail now:
> >
> > | | Annoyed this wasn't a crazy obscure error in the algorithm I could
> > | | fix :) [...]
> >
> > Paul recently ran a formal proof against all sorts of RCU details (and
> > found and fixed a few obscure races that way that no-one ever
> > triggered), so i'd be quite surprised if we found anything in the core
> > algorithm :-)
Yeah, it was instead the simple stuff that I messed up... :-/
> > | | [...] I spent all day debugging it and had to make a special test
> > | | case (rcutorture didn't seem to trigger it), and a big RCU state
> > | | logging infrastructure to log millions of RCU state transitions and
> > | | events. Oh well.
> >
> > nice debugging!
>
> Indeed!!!
>
> > Acked-by: Ingo Molnar <mingo@elte.hu>
> >
> > i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG_CPU +
> > PREEMPT_RCU kernels and never saw this. Nor did Paul. That aspect is
> > weird.
>
> Turns out that my environment was silently re-enabling HOTPLUG_CPU, so I
> only -thought- I was testing !HOTPLUG_CPU. Once I forced it to really
> disable HOTPLUG_CPU (by manually also specifying CONFIG_SUSPEND=n and
> CONFIG_HIBERNATION=n), then rcutorture complained within 10 seconds.
>
> Sigh!!!
And Nick's patch gets rid of the rcutorture failures for me as well,
now that I can reproduce them. ;-)
Thanx, Paul
* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
[not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
2008-07-10 9:03 ` Kamalesh Babulal
2008-07-10 11:02 ` Alexey Dobriyan
@ 2008-08-01 21:09 ` Paul E. McKenney
2 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:09 UTC
To: Nick Piggin
Cc: Rafael J. Wysocki, Kamalesh Babulal, Alexey Dobriyan,
Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
Linus Torvalds, Natalie Protasevich, Kernel Testers List
On Thu, Jul 10, 2008 at 05:25:35PM +1000, Nick Piggin wrote:
> On Wednesday 09 July 2008 07:37, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10629
> > Subject : 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
> > Submitter : Alexey Dobriyan <adobriyan@gmail.com>
> > Date : 2008-05-05 09:59 (65 days old)
> > References : http://lkml.org/lkml/2008/5/5/28
> > Handled-By : Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> Attached is my fix for this problem. I don't think it is a regression
> as such, but it can't hurt to go into 2.6.26 IMO.
>
> PREEMPT_RCU without HOTPLUG_CPU is broken. The rcu_online_cpu is called to
> initially populate rcu_cpu_online_map with all online CPUs when the hotplug
> event handler is installed, and also to populate the map with CPUs as they
> come online. The former case is meant to happen with and without HOTPLUG_CPU,
> but without HOTPLUG_CPU, the rcu_online_cpu function is no-oped -- while it
> still gets called, it does not set the rcu CPU map.
>
> With a blank RCU CPU map, grace periods get to tick by completely oblivious
> to active RCU read side critical sections. This results in free-before-grace
> bugs.
>
> Fix is obvious once the problem is known. (Also, change __devinit to
> __cpuinit so the function gets thrown away on !HOTPLUG_CPU kernels).
I officially feel extremely stupid. Thank you -very- much for tracking
this down, Nick!!! And especially for the fix!
I will give this a good testing. In the meantime:
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
>
> Annoyed this wasn't a crazy obscure error in the algorithm I could fix :)
> I spent all day debugging it and had to make a special test case (rcutorture
> didn't seem to trigger it), and a big RCU state logging infrastructure to log
> millions of RCU state transitions and events. Oh well.
>
> Index: linux-2.6/kernel/rcupreempt.c
> ===================================================================
> --- linux-2.6.orig/kernel/rcupreempt.c 2008-07-10 17:08:56.000000000 +1000
> +++ linux-2.6/kernel/rcupreempt.c 2008-07-10 17:09:10.000000000 +1000
> @@ -925,26 +925,22 @@ void rcu_offline_cpu(int cpu)
>  	spin_unlock_irqrestore(&rdp->lock, flags);
>  }
>
> -void __devinit rcu_online_cpu(int cpu)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> -	cpu_set(cpu, rcu_cpu_online_map);
> -	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> -}
> -
>  #else /* #ifdef CONFIG_HOTPLUG_CPU */
>
>  void rcu_offline_cpu(int cpu)
>  {
>  }
>
> -void __devinit rcu_online_cpu(int cpu)
> +#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +
> +void __cpuinit rcu_online_cpu(int cpu)
>  {
> -}
> +	unsigned long flags;
>
> -#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> +	cpu_set(cpu, rcu_cpu_online_map);
> +	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> +}
>
>  static void rcu_process_callbacks(struct softirq_action *unused)
>  {