* [PATCH] Make treercu safe for suspend and resume
@ 2009-01-04 19:41 Paul E. McKenney
2009-01-04 20:41 ` Eric Sesterhenn
2009-01-05 9:12 ` Ingo Molnar
0 siblings, 2 replies; 4+ messages in thread
From: Paul E. McKenney @ 2009-01-04 19:41 UTC (permalink / raw)
To: linux-kernel
Cc: dhaval, jens.axboe, mingo, snakebyte, andi, akpm, dvhltc, niv,
rostedt, tglx, manfred
Hello!
Kudos to both Dhaval Giani and Jens Axboe for finding a bug in treercu
that causes warnings after suspend-resume cycles in Dhaval's case and
during stress tests in Jens's case. It would also probably cause failures
if heavily stressed. The solution, ironically enough, is to revert to
rcupreempt's code for initializing the dynticks state. And the patch
even results in smaller code -- so what was I thinking???
This is 2.6.29 material, given that people really do suspend and resume
Linux these days. ;-)
Located-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Located-by: Jens Axboe <jens.axboe@oracle.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
rcutree.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index a342b03..e0a347f 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -79,7 +79,10 @@ struct rcu_state rcu_bh_state = RCU_STATE_INITIALIZER(rcu_bh_state);
 DEFINE_PER_CPU(struct rcu_data, rcu_bh_data);
 
 #ifdef CONFIG_NO_HZ
-DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks);
+DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
+	.dynticks_nesting = 1,
+	.dynticks = 1,
+};
 #endif /* #ifdef CONFIG_NO_HZ */
 
 static int blimit = 10;	/* Maximum callbacks per softirq. */
@@ -1379,13 +1382,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 
 static void __cpuinit rcu_online_cpu(int cpu)
 {
-#ifdef CONFIG_NO_HZ
-	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
-
-	rdtp->dynticks_nesting = 1;
-	rdtp->dynticks |= 1; /* need consecutive #s even for hotplug. */
-	rdtp->dynticks_nmi = (rdtp->dynticks_nmi + 1) & ~0x1;
-#endif /* #ifdef CONFIG_NO_HZ */
 	rcu_init_percpu_data(cpu, &rcu_state);
 	rcu_init_percpu_data(cpu, &rcu_bh_state);
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
* Re: [PATCH] Make treercu safe for suspend and resume
From: Eric Sesterhenn @ 2009-01-04 20:41 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-kernel, dhaval, jens.axboe, mingo, andi, akpm, dvhltc, niv,
rostedt, tglx, manfred
* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> Hello!
>
> Kudos to both Dhaval Giani and Jens Axboe for finding a bug in treercu
> that causes warnings after suspend-resume cycles in Dhaval's case and
> during stress tests in Jens's case. It would also probably cause failures
> if heavily stressed. The solution, ironically enough, is to revert to
> rcupreempt's code for initializing the dynticks state. And the patch
> even results in smaller code -- so what was I thinking???
>
> This is 2.6.29 material, given that people really do suspend and resume
> Linux these days. ;-)
Sadly, even with this patch I still get this oops when running:
modprobe rcutorture; sleep 2s; rmmod rcutorture
[ 74.413097] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 74.413424] IP: [<(null)>] (null)
[ 74.413651] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[ 74.413956] last sysfs file: /sys/block/ram9/range
[ 74.414039] Modules linked in: [last unloaded: rcutorture]
[ 74.414039]
[ 74.414039] Pid: 4997, comm: rcu_torture_wri Tainted: G W (2.6.28-05692-g7d3b56b-dirty #167) System Name
[ 74.414039] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
[ 74.414039] EIP is at 0x0
[ 74.414039] EAX: d0afd130 EBX: 00000000 ECX: c01612a6 EDX: 00000006
[ 74.414039] ESI: d0afd130 EDI: 0000001c EBP: c0b03fe0 ESP: c0b03fd4
[ 74.414039] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 74.414039] Process rcu_torture_wri (pid: 4997, ti=c0b03000 task=c98bce00 task.ti=c988b000)
[ 74.414039] Stack:
[ 74.414039] c01612ad 00000200 00000001 c0b03ff8 c012aa97 0000000a c988beac 00000046
[ 74.414039] c012aa28 c988bebc c01042c2
[ 74.414039] Call Trace:
[ 74.414039] [<c01612ad>] ? rcu_process_callbacks+0x65/0x79
[ 74.414039] [<c012aa97>] ? __do_softirq+0x6f/0xf6
[ 74.414039] [<c012aa28>] ? __do_softirq+0x0/0xf6
[ 74.414039] <IRQ> <0> [<c012a9a5>] ? irq_exit+0x40/0x7c
[ 74.414039] [<c0110ce1>] ? smp_apic_timer_interrupt+0x68/0x73
[ 74.414039] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
[ 74.414039] [<c01219f7>] ? finish_task_switch+0x4d/0x8b
[ 74.414039] [<c014007b>] ? tick_check_oneshot_change+0xb1/0xf9
[ 74.414039] [<c07a091f>] ? _spin_unlock_irq+0x2d/0x47
[ 74.414039] [<c01219f7>] ? finish_task_switch+0x4d/0x8b
[ 74.414039] [<c01219aa>] ? finish_task_switch+0x0/0x8b
[ 74.414039] [<c079e366>] ? schedule+0x404/0x450
[ 74.414039] [<c079e582>] ? schedule_timeout+0x70/0x95
[ 74.414039] [<c012e13a>] ? process_timeout+0x0/0xf
[ 74.414039] [<c079e57d>] ? schedule_timeout+0x6b/0x95
[ 74.414039] [<c079e5c0>] ? schedule_timeout_uninterruptible+0x19/0x1b
[ 74.414039] [<c0136bcc>] ? kthread+0x3e/0x66
[ 74.414039] [<c0136b8e>] ? kthread+0x0/0x66
[ 74.414039] [<c0103643>] ? kernel_thread_helper+0x7/0x10
[ 74.414039] Code: Bad EIP value.
[ 74.414039] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b03fd4
[ 74.422275] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 74.422406] Kernel panic - not syncing: Fatal exception in interrupt
Greetings Eric
* Re: [PATCH] Make treercu safe for suspend and resume
From: Paul E. McKenney @ 2009-01-04 21:14 UTC (permalink / raw)
To: Eric Sesterhenn
Cc: linux-kernel, dhaval, jens.axboe, mingo, andi, akpm, dvhltc, niv,
rostedt, tglx, manfred
On Sun, Jan 04, 2009 at 09:41:08PM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > Hello!
> >
> > Kudos to both Dhaval Giani and Jens Axboe for finding a bug in treercu
> > that causes warnings after suspend-resume cycles in Dhaval's case and
> > during stress tests in Jens's case. It would also probably cause failures
> > if heavily stressed. The solution, ironically enough, is to revert to
> > rcupreempt's code for initializing the dynticks state. And the patch
> > even results in smaller code -- so what was I thinking???
> >
> > This is 2.6.29 material, given that people really do suspend and resume
> > Linux these days. ;-)
>
> Sadly, even with this patch I still get this oops when running:
> modprobe rcutorture; sleep 2s; rmmod rcutorture
I would have been extremely surprised had that patch fixed this problem,
but thank you very much for trying it out! What can I say, I worked on
the easy ones first. ;-)
Thanx, Paul
> [ 74.413097] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 74.413424] IP: [<(null)>] (null)
> [ 74.413651] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> [ 74.413956] last sysfs file: /sys/block/ram9/range
> [ 74.414039] Modules linked in: [last unloaded: rcutorture]
> [ 74.414039]
> [ 74.414039] Pid: 4997, comm: rcu_torture_wri Tainted: G W (2.6.28-05692-g7d3b56b-dirty #167) System Name
> [ 74.414039] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
> [ 74.414039] EIP is at 0x0
> [ 74.414039] EAX: d0afd130 EBX: 00000000 ECX: c01612a6 EDX: 00000006
> [ 74.414039] ESI: d0afd130 EDI: 0000001c EBP: c0b03fe0 ESP: c0b03fd4
> [ 74.414039] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 74.414039] Process rcu_torture_wri (pid: 4997, ti=c0b03000 task=c98bce00 task.ti=c988b000)
> [ 74.414039] Stack:
> [ 74.414039] c01612ad 00000200 00000001 c0b03ff8 c012aa97 0000000a c988beac 00000046
> [ 74.414039] c012aa28 c988bebc c01042c2
> [ 74.414039] Call Trace:
> [ 74.414039] [<c01612ad>] ? rcu_process_callbacks+0x65/0x79
> [ 74.414039] [<c012aa97>] ? __do_softirq+0x6f/0xf6
> [ 74.414039] [<c012aa28>] ? __do_softirq+0x0/0xf6
> [ 74.414039] <IRQ> <0> [<c012a9a5>] ? irq_exit+0x40/0x7c
> [ 74.414039] [<c0110ce1>] ? smp_apic_timer_interrupt+0x68/0x73
> [ 74.414039] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
> [ 74.414039] [<c01219f7>] ? finish_task_switch+0x4d/0x8b
> [ 74.414039] [<c014007b>] ? tick_check_oneshot_change+0xb1/0xf9
> [ 74.414039] [<c07a091f>] ? _spin_unlock_irq+0x2d/0x47
> [ 74.414039] [<c01219f7>] ? finish_task_switch+0x4d/0x8b
> [ 74.414039] [<c01219aa>] ? finish_task_switch+0x0/0x8b
> [ 74.414039] [<c079e366>] ? schedule+0x404/0x450
> [ 74.414039] [<c079e582>] ? schedule_timeout+0x70/0x95
> [ 74.414039] [<c012e13a>] ? process_timeout+0x0/0xf
> [ 74.414039] [<c079e57d>] ? schedule_timeout+0x6b/0x95
> [ 74.414039] [<c079e5c0>] ? schedule_timeout_uninterruptible+0x19/0x1b
> [ 74.414039] [<c0136bcc>] ? kthread+0x3e/0x66
> [ 74.414039] [<c0136b8e>] ? kthread+0x0/0x66
> [ 74.414039] [<c0103643>] ? kernel_thread_helper+0x7/0x10
> [ 74.414039] Code: Bad EIP value.
> [ 74.414039] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b03fd4
> [ 74.422275] ---[ end trace 4eaa2a86a8e2da22 ]---
> [ 74.422406] Kernel panic - not syncing: Fatal exception in interrupt
>
> Greetings Eric
* Re: [PATCH] Make treercu safe for suspend and resume
From: Ingo Molnar @ 2009-01-05 9:12 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-kernel, dhaval, jens.axboe, snakebyte, andi, akpm, dvhltc,
niv, rostedt, tglx, manfred
* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> Hello!
>
> Kudos to both Dhaval Giani and Jens Axboe for finding a bug in treercu
> that causes warnings after suspend-resume cycles in Dhaval's case and
> during stress tests in Jens's case. It would also probably cause failures
> if heavily stressed. The solution, ironically enough, is to revert to
> rcupreempt's code for initializing the dynticks state. And the patch
> even results in smaller code -- so what was I thinking???
>
> This is 2.6.29 material, given that people really do suspend and resume
> Linux these days. ;-)
>
> Located-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> Located-by: Jens Axboe <jens.axboe@oracle.com>
> Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> Tested-by: Jens Axboe <jens.axboe@oracle.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>
> rcutree.c | 12 ++++--------
> 1 file changed, 4 insertions(+), 8 deletions(-)
applied to tip/core/urgent, thanks guys!
Ingo