From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Amit Shah <amit.shah@redhat.com>
Cc: linux-kernel@vger.kernel.org, riel@redhat.com, mingo@kernel.org,
laijs@cn.fujitsu.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
dvhart@linux.intel.com, fweisbec@gmail.com, oleg@redhat.com,
sbw@mit.edu
Subject: Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups
Date: Fri, 22 Aug 2014 14:57:20 -0700
Message-ID: <20140822215720.GA21092@linux.vnet.ibm.com>
In-Reply-To: <20140822215343.GH2663@linux.vnet.ibm.com>
On Fri, Aug 22, 2014 at 02:53:44PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 22, 2014 at 10:44:05PM +0530, Amit Shah wrote:
> > On (Fri) 22 Aug 2014 [07:48:19], Paul E. McKenney wrote:
> > > On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote:
> > > > On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> > > > > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > > > > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > > > > >
> > > > > > > The odds are low over the next few days. I am adding nastier rcutorture
> > > > > > > testing, however. It would still be very good to get debug information
> > > > > > > from your setup. One approach would be to convert the trace function
> > > > > > > calls into printk(), if that would help.
> > > > > >
> > > > > > I added a few printks on the lines of the traces in cases where
> > > > > > rcu_nocb_poll was checked -- since that reproduces the hang. Are the
> > > > > > following traces sufficient, or should I keep adding more printks?
> > > > > >
> > > > > > In the case of rcu-trace-nopoll.txt, the messages stop after a while
> > > > > > (when the guest locks up hard). That's when I kill the qemu process.
> > > > >
> > > > > And this is bt from gdb when the endless
> > > > >
> > > > > RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> > > > >
> > > > > messages are being spewed.
> > > > >
> > > > > I can't time it, but hope it gives some indication along with the printks.
> > > >
> > > > ... and after the system 'locks up', this is the state it's in:
> > > >
> > > > ^C
> > > > Program received signal SIGINT, Interrupt.
> > > > native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > > 50 }
> > > > (gdb) bt
> > > > #0 native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > > #1 0xffffffff8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111
> > > > #2 default_idle () at arch/x86/kernel/process.c:311
> > > > #3 0xffffffff8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
> > > > #4 0xffffffff8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
> > > > #5 cpu_idle_loop () at kernel/sched/idle.c:220
> > > > #6 cpu_startup_entry (state=<optimized out>) at kernel/sched/idle.c:268
> > > > #7 0xffffffff813e068b in rest_init () at init/main.c:418
> > > > #8 0xffffffff81a8cf5a in start_kernel () at init/main.c:680
> > > > #9 0xffffffff81a8c4ba in x86_64_start_reservations (real_mode_data=<optimized out>) at arch/x86/kernel/head64.c:193
> > > > #10 0xffffffff81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 <cpu_lock_stats+29184> <error: Cannot access memory at address 0x13f90>)
> > > > at arch/x86/kernel/head64.c:182
> > > > #11 0x0000000000000000 in ?? ()
> > > >
> > > >
> > > > Wondering why it's doing this. Am stepping through
> > > > cpu_startup_entry() to see if I get any clues.
> > >
> > > This looks to me like normal behavior in the x86 ACPI idle loop.
> > > My guess is that the lockup is caused by indefinite blocking, in
> > > which case we would expect all the CPUs to be in the idle loop.
> >
> > Hm, found it:
> >
> > The stall happens in do_initcalls().
> >
> > pm_sysrq_init() is the function that causes the hang. When I #if 0
> > the line
> >
> > register_sysrq_key('o', &sysrq_poweroff_op);
> >
> > in pm_sysrq_init(), the boot proceeds normally.
>
> Yow!!!
>
> > Now what this is, and what relation this has to rcu and that patch in
> > particular is next...
>
> Hmmm... Please try replacing the synchronize_rcu() in
> __sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
> I bet that gets rid of the hang. (And also introduces a low-probability
> bug, but should be OK for testing.)
>
> The other thing to try is to revert your patch that turned my event
> traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
> the synchronize_rcu() -- that might make it so that the ftrace data
> actually gets dumped out.
And one other thing to try...

Put a printk at the beginning of rcu_spawn_gp_kthread(), which is in
kernel/rcu/tree.c. If that printk does not appear before the call
to pm_sysrq_init(), that would be an important clue.
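
The ordering check described above can be sketched roughly as follows. This is a userspace mock, not kernel code: mock_printk(), msg_index() and gp_kthread_spawned_first() are hypothetical helpers invented for illustration, and the two functions named after rcu_spawn_gp_kthread() and pm_sysrq_init() only borrow those names and contain none of the real logic. The "RCUDEBUG" prefix is taken from the debug output quoted earlier in the thread; the message text itself is an assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Userspace mock of the suggested experiment, NOT kernel code.
 * Messages are recorded in the order they are issued so that the
 * relative order of the two functions can be checked afterwards.
 */
#define MAX_MSGS 8
static const char *msgs[MAX_MSGS];
static int nr_msgs;

/* Hypothetical stand-in for printk(): log the message and remember it. */
static void mock_printk(const char *msg)
{
	if (nr_msgs < MAX_MSGS)
		msgs[nr_msgs++] = msg;
	printf("%s\n", msg);
}

/* Mock: in the real experiment the printk goes at the top of this function. */
static void rcu_spawn_gp_kthread(void)
{
	mock_printk("RCUDEBUG rcu_spawn_gp_kthread entered");
}

/* Mock: stands for the initcall in which the boot was seen to stall. */
static void pm_sysrq_init(void)
{
	mock_printk("RCUDEBUG pm_sysrq_init entered");
}

/* Return the position of the first message containing tag, or -1 if absent. */
static int msg_index(const char *tag)
{
	for (int i = 0; i < nr_msgs; i++)
		if (strstr(msgs[i], tag))
			return i;
	return -1;
}

/*
 * The clue being looked for: returns 1 if the rcu_spawn_gp_kthread()
 * message was logged and no pm_sysrq_init() message precedes it,
 * 0 otherwise.
 */
static int gp_kthread_spawned_first(void)
{
	int spawn = msg_index("rcu_spawn_gp_kthread");
	int sysrq = msg_index("pm_sysrq_init");

	return spawn >= 0 && (sysrq < 0 || spawn < sysrq);
}
```

In the real experiment the change would be a single printk at the top of rcu_spawn_gp_kthread() in kernel/rcu/tree.c, and the "log" would be the boot console output. The mock only shows how to read the result: if gp_kthread_spawned_first() would return 0 for the messages seen on the console, that is the ordering flagged above as an important clue.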
Thanx, Paul