From: Andi Kleen <andi@firstfloor.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
mingo@elte.hu, linux-kernel@vger.kernel.org, rjw@sisk.pl,
dipankar@in.ibm.com
Subject: Re: RCU hang on cpu re-hotplug with 2.6.27rc8
Date: Thu, 9 Oct 2008 10:22:30 +0200
Message-ID: <20081009082230.GE24560@one.firstfloor.org>
In-Reply-To: <alpine.LFD.2.00.0810090921320.3237@apollo>
On Thu, Oct 09, 2008 at 09:24:51AM +0200, Thomas Gleixner wrote:
> On Thu, 9 Oct 2008, Andi Kleen wrote:
> > It actually does. The stall detector makes the online echo return after three seconds,
> > although it's not 100% clear to me why.
> >
> > here's the backtrace
> >
> > RCU detected CPU 14 stall (t=4295149800/5928 jiffies)
> > Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #5
> >
> > Call Trace:
> > <IRQ> [<ffffffff8025d188>] __rcu_pending+0x6e/0x1d9
> > [<ffffffff8025d329>] rcu_pending+0x36/0x6e
> > [<ffffffff8023b480>] update_process_times+0x37/0x5b
> > [<ffffffff8024be72>] tick_periodic+0x68/0x74
> > [<ffffffff8024be9f>] tick_handle_periodic+0x21/0x66
> > [<ffffffff8021bcd2>] smp_apic_timer_interrupt+0x8a/0xa8
> > [<ffffffff8020bfe6>] apic_timer_interrupt+0x66/0x70
> > <EOI> [<ffffffff803adb39>] ? acpi_safe_halt+0x2b/0x3e
> > [<ffffffff803adbfa>] ? acpi_idle_enter_c1+0xae/0x102
> > [<ffffffff804ffdd6>] ? cpuidle_idle_call+0x70/0xa2
> > [<ffffffff8020a097>] ? cpu_idle+0x7e/0x9c
> > [<ffffffff805bef4a>] ? start_secondary+0x157/0x15c
> >
> > Timer issue?
>
> Hmm, this is periodic mode so rather unlikely, but who knows. Does
> this happen with nohz and/or highres as well ?
With nohz/highres enabled it takes much longer to trigger. Previously
it happened nearly always on the first try; now I had to let a loop
run for several minutes to trigger it.

But the strange thing is that the stall detector no longer reports
the hotplugged CPUs as stalling, but other, unrelated ones.
I only hotplug 14/15, but it reports 3 and 4. In periodic
mode the correct CPUs were reported.
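
For readers who want to try a similar test, the kind of loop described
above can be sketched as follows. This is not the script used for this
report, which is not shown in the message. It assumes the usual sysfs
layout (/sys/devices/system/cpu/cpuN/online); the function names, the
delay, and the iteration count are made up for illustration.

```python
import os
import time


def set_cpu_online(cpu, online, sysfs_root="/sys/devices/system/cpu"):
    """Write 1 or 0 to the CPU's sysfs 'online' file.

    On a real system this needs root. sysfs_root is a parameter only so
    the sketch can be exercised against a scratch directory.
    """
    path = os.path.join(sysfs_root, f"cpu{cpu}", "online")
    with open(path, "w") as f:
        f.write("1" if online else "0")


def hotplug_loop(cpus, iterations, sysfs_root="/sys/devices/system/cpu", delay=0.0):
    """Repeatedly take each CPU offline and bring it back online.

    Returns the number of completed offline/online cycles.
    """
    cycles = 0
    for _ in range(iterations):
        for cpu in cpus:
            set_cpu_online(cpu, False, sysfs_root)
            time.sleep(delay)
            set_cpu_online(cpu, True, sysfs_root)
            time.sleep(delay)
            cycles += 1
    return cycles
```

Calling hotplug_loop([14, 15], iterations=1000, delay=0.1) as root would
cycle the two CPUs mentioned above. Whether that reproduces the stall
will depend on the machine and kernel configuration.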
-Andi
Here are the backtraces
Switched to high resolution mode on CPU 14
CPU 15 is now offline
RCU detected CPU 3 stall (t=4294999688/3809 jiffies)
Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #6
Call Trace:
<IRQ> [<ffffffff8025f474>] __rcu_pending+0x6e/0x1d9
[<ffffffff8025f615>] rcu_pending+0x36/0x6e
[<ffffffff8023bc5d>] update_process_times+0x37/0x5b
[<ffffffff8024de88>] tick_sched_timer+0x81/0xb5
[<ffffffff80247538>] __run_hrtimer+0x56/0x96
[<ffffffff80248002>] hrtimer_interrupt+0xe6/0x14d
[<ffffffff8021bcd2>] smp_apic_timer_interrupt+0x8a/0xa8
[<ffffffff8020bff6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff803b0254>] ? acpi_idle_enter_bm+0x2a2/0x312
[<ffffffff803b024a>] ? acpi_idle_enter_bm+0x298/0x312
[<ffffffff805020c2>] ? cpuidle_idle_call+0x70/0xa2
[<ffffffff8020a0a1>] ? cpu_idle+0x88/0xae
[<ffffffff805c137a>] ? start_secondary+0x157/0x15c
RCU detected CPU 3 stall (t=4295007688/1250 jiffies)
Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #6
Call Trace:
<IRQ> [<ffffffff8025f474>] __rcu_pending+0x6e/0x1d9
[<ffffffff8025f615>] rcu_pending+0x36/0x6e
[<ffffffff8023bc5d>] update_process_times+0x37/0x5b
[<ffffffff8024de88>] tick_sched_timer+0x81/0xb5
[<ffffffff80247538>] __run_hrtimer+0x56/0x96
[<ffffffff80248002>] hrtimer_interrupt+0xe6/0x14d
[<ffffffff8021bcd2>] smp_apic_timer_interrupt+0x8a/0xa8
[<ffffffff8020bff6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff803b0254>] ? acpi_idle_enter_bm+0x2a2/0x312
[<ffffffff803b024a>] ? acpi_idle_enter_bm+0x298/0x312
[<ffffffff805020c2>] ? cpuidle_idle_call+0x70/0xa2
[<ffffffff8020a0a1>] ? cpu_idle+0x88/0xae
[<ffffffff805c137a>] ? start_secondary+0x157/0x15c
RCU detected CPU 3 stall (t=4295012121/2548 jiffies)
Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #6
Call Trace:
<IRQ> [<ffffffff8025f474>] __rcu_pending+0x6e/0x1d9
[<ffffffff8025f640>] rcu_pending+0x61/0x6e
[<ffffffff8023bc5d>] update_process_times+0x37/0x5b
[<ffffffff8024de88>] tick_sched_timer+0x81/0xb5
[<ffffffff80247538>] __run_hrtimer+0x56/0x96
[<ffffffff80248002>] hrtimer_interrupt+0xe6/0x14d
[<ffffffff8021bcd2>] smp_apic_timer_interrupt+0x8a/0xa8
[<ffffffff8020bff6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff803b0254>] ? acpi_idle_enter_bm+0x2a2/0x312
[<ffffffff803b024a>] ? acpi_idle_enter_bm+0x298/0x312
[<ffffffff805020c2>] ? cpuidle_idle_call+0x70/0xa2
[<ffffffff8020a0a1>] ? cpu_idle+0x88/0xae
[<ffffffff805c137a>] ? start_secondary+0x157/0x15c
RCU detected CPU 2 stall (t=4295014976/874 jiffies)
Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #6
Call Trace:
<IRQ> <3>RCU detected CPU 3 stall (t=4295014976/874 jiffies)
[<ffffffff8025f474>] __rcu_pending+0x6e/0x1d9
[<ffffffff8025f615>] rcu_pending+0x36/0x6e
[<ffffffff8023bc5d>] update_process_times+0x37/0x5b
[<ffffffff8024de88>] tick_sched_timer+0x81/0xb5
[<ffffffff80247538>] __run_hrtimer+0x56/0x96
[<ffffffff80248002>] hrtimer_interrupt+0xe6/0x14d
[<ffffffff8021bcd2>] smp_apic_timer_interrupt+0x8a/0xa8
Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #6
Call Trace:
<IRQ> [<ffffffff8020bff6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff8024e1b0>] ? tick_nohz_restart_sched_tick+0x15e/0x165
[<ffffffff8025f474>] __rcu_pending+0x6e/0x1d9
[<ffffffff8025f615>] rcu_pending+0x36/0x6e
[<ffffffff8023bc5d>] update_process_times+0x37/0x5b
[<ffffffff8024de88>] tick_sched_timer+0x81/0xb5
[<ffffffff80247538>] __run_hrtimer+0x56/0x96
[<ffffffff80248002>] hrtimer_interrupt+0xe6/0x14d
[<ffffffff8021bcd2>] smp_apic_timer_interrupt+0x8a/0xa8
[<ffffffff8020a0bd>] ? cpu_idle+0xa4/0xae
[<ffffffff805c137a>] ? start_secondary+0x157/0x15c
[<ffffffff8020bff6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff803b0254>] ? acpi_idle_enter_bm+0x2a2/0x312
[<ffffffff803b024a>] ? acpi_idle_enter_bm+0x298/0x312
[<ffffffff805020c2>] ? cpuidle_idle_call+0x70/0xa2
[<ffffffff8020a0a1>] ? cpu_idle+0x88/0xae
[<ffffffff805c137a>] ? start_secondary+0x157/0x15c
RCU detected CPU 4 stall (t=4295019871/4894 jiffies)
Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #6
Call Trace:
<IRQ> [<ffffffff8025f474>] __rcu_pending+0x6e/0x1d9
[<ffffffff8025f615>] rcu_pending+0x36/0x6e
[<ffffffff8023bc5d>] update_process_times+0x37/0x5b
[<ffffffff8024de88>] tick_sched_timer+0x81/0xb5
[<ffffffff80247538>] __run_hrtimer+0x56/0x96
RCU detected CPU 6 stall (t=4295019871/4894 jiffies)
Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #6
Call Trace:
<IRQ> [<ffffffff8025f474>] __rcu_pending+0x6e/0x1d9
[<ffffffff80248002>] hrtimer_interrupt+0xe6/0x14d
[<ffffffff8021bcd2>] smp_apic_timer_interrupt+0x8a/0xa8
[<ffffffff8020bff6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff803b0254>] ? acpi_idle_enter_bm+0x2a2/0x312
[<ffffffff8025f615>] rcu_pending+0x36/0x6e
[<ffffffff8023bc5d>] update_process_times+0x37/0x5b
[<ffffffff8024de88>] tick_sched_timer+0x81/0xb5
[<ffffffff80247538>] __run_hrtimer+0x56/0x96
[<ffffffff80248002>] hrtimer_interrupt+0xe6/0x14d
[<ffffffff8021bcd2>] smp_apic_timer_interrupt+0x8a/0xa8
[<ffffffff8020bff6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff8024e1b0>] ? tick_nohz_restart_sched_tick+0x15e/0x165
[<ffffffff803b024a>] ? acpi_idle_enter_bm+0x298/0x312
[<ffffffff805020c2>] ? cpuidle_idle_call+0x70/0xa2
[<ffffffff8020a0a1>] ? cpu_idle+0x88/0xae
[<ffffffff805c137a>] ? start_secondary+0x157/0x15c
[<ffffffff8020a0bd>] ? cpu_idle+0xa4/0xae
[<ffffffff805c137a>] ? start_secondary+0x157/0x15c
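
Because the reports from several CPUs are interleaved in the log above,
it can be easier to tally the stall messages mechanically than to read
them by eye. The following is a minimal sketch that only assumes the
"RCU detected CPU N stall" message format seen in this log; the
function name is hypothetical.

```python
import re
from collections import Counter

# Matches the stall message wherever it appears on a line, so reports
# that were interleaved with another CPU's backtrace are still counted.
STALL_RE = re.compile(r"RCU detected CPU (\d+) stall")


def stalled_cpus(log_text):
    """Count stall reports per CPU number in a block of kernel log text."""
    return Counter(int(m.group(1)) for m in STALL_RE.finditer(log_text))
```

Run over the log above, this counts four reports for CPU 3 and one each
for CPUs 2, 4 and 6, and none for the hotplugged CPUs 14 and 15.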
--
ak@linux.intel.com