From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Amit Shah <amit.shah@redhat.com>
Cc: linux-kernel@vger.kernel.org, riel@redhat.com, mingo@kernel.org,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	dvhart@linux.intel.com, fweisbec@gmail.com, oleg@redhat.com,
	sbw@mit.edu
Subject: Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups
Date: Fri, 22 Aug 2014 14:53:44 -0700
Message-ID: <20140822215343.GH2663@linux.vnet.ibm.com>
In-Reply-To: <20140822171405.GJ16198@grmbl.mre>

On Fri, Aug 22, 2014 at 10:44:05PM +0530, Amit Shah wrote:
> On (Fri) 22 Aug 2014 [07:48:19], Paul E. McKenney wrote:
> > On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote:
> > > On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> > > > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > > > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > > > > 
> > > > > > The odds are low over the next few days.  I am adding nastier rcutorture
> > > > > > testing, however.  It would still be very good to get debug information
> > > > > > from your setup.  One approach would be to convert the trace function
> > > > > > calls into printk(), if that would help.
> > > > > 
> > > > > I added a few printks along the lines of the traces in cases where
> > > > > rcu_nocb_poll was checked -- since that reproduces the hang.  Are the
> > > > > following traces sufficient, or should I keep adding more printks?
> > > > > 
> > > > > In the case of rcu-trace-nopoll.txt, the messages stop after a while
> > > > > (when the guest locks up hard).  That's when I kill the qemu process.
> > > > 
> > > > And this is bt from gdb when the endless 
> > > > 
> > > >   RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> > > > 
> > > > messages are being spewed.
> > > > 
> > > > I can't time it, but hope it gives some indication along with the printks.
> > > 
> > > ... and after the system 'locks up', this is the state it's in:
> > > 
> > > ^C
> > > Program received signal SIGINT, Interrupt.
> > > native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > 50		 }
> > > (gdb) bt
> > > #0  native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > #1  0xffffffff8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111
> > > #2  default_idle () at arch/x86/kernel/process.c:311
> > > #3  0xffffffff8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
> > > #4  0xffffffff8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
> > > #5  cpu_idle_loop () at kernel/sched/idle.c:220
> > > #6  cpu_startup_entry (state=<optimized out>) at kernel/sched/idle.c:268
> > > #7  0xffffffff813e068b in rest_init () at init/main.c:418
> > > #8  0xffffffff81a8cf5a in start_kernel () at init/main.c:680
> > > #9  0xffffffff81a8c4ba in x86_64_start_reservations (real_mode_data=<optimized out>) at arch/x86/kernel/head64.c:193
> > > #10 0xffffffff81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 <cpu_lock_stats+29184> <error: Cannot access memory at address 0x13f90>)
> > >     at arch/x86/kernel/head64.c:182
> > > #11 0x0000000000000000 in ?? ()
> > > 
> > > 
> > > Wondering why it's doing this.  Am stepping through
> > > cpu_startup_entry() to see if I get any clues.
> > 
> > This looks to me like normal behavior in the x86 ACPI idle loop.
> > My guess is that the lockup is caused by indefinite blocking, in
> > which case we would expect all the CPUs to be in the idle loop.
> 
> Hm, found it:
> 
> The stall happens in do_initcalls().
> 
> pm_sysrq_init() is the function that causes the hang.  When I #if 0
> the line
> 
>     register_sysrq_key('o', &sysrq_poweroff_op);
> 
> in pm_sysrq_init(), the boot proceeds normally.

Yow!!!

> Now, what this is, and what relation it has to rcu and that patch in
> particular, is next...

Hmmm...  Please try replacing the synchronize_rcu() in
__sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
I bet that gets rid of the hang.  (And also introduces a low-probability
bug, but should be OK for testing.)
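
To make that experiment concrete, here is a rough user-space sketch of
the swap.  It is not the real drivers/tty/sysrq.c: sysrq_swap_tail() and
skip_grace_period are hypothetical names standing in for the tail of
__sysrq_swap_key_ops(), and HZ, synchronize_rcu() and
schedule_timeout_interruptible() are stubbed here only so that the
fragment builds outside the kernel.  In the kernel itself the change is
just the one call edited in place.

```c
/*
 * User-space sketch only.  The stubs below are hypothetical stand-ins;
 * in the kernel, HZ, synchronize_rcu() and
 * schedule_timeout_interruptible() come from the usual headers.
 */
#define HZ 1000				/* assumed tick rate for this sketch */

static int grace_period_waits;		/* calls to the stubbed synchronize_rcu() */
static long jiffies_slept;		/* jiffies "slept" by the stubbed timeout */
static int skip_grace_period = 1;	/* 1 = test hack, 0 = original behavior */

static void synchronize_rcu(void)
{
	grace_period_waits++;
}

static long schedule_timeout_interruptible(long timeout)
{
	jiffies_slept += timeout;
	return 0;			/* pretend the full timeout elapsed */
}

/* Stand-in for the end of __sysrq_swap_key_ops(), after the table update. */
static int sysrq_swap_tail(int retval)
{
	if (skip_grace_period)
		/* Testing only: sleep about 100ms instead of waiting for readers. */
		schedule_timeout_interruptible(HZ / 10);
	else
		synchronize_rcu();
	return retval;
}
```

The low-probability bug comes from skipping the grace-period wait: a
concurrent sysrq handler could in principle still be using the old op
when the function returns, so this variant is for testing only.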

The other thing to try is to revert your patch that turned my event
traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
the synchronize_rcu() -- that might make it so that the ftrace data
actually gets dumped out.
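
A similarly rough sketch of the ftrace_dump() placement follows.  Again
this is a user-space stand-in, not the real code: sysrq_swap_tail_traced()
is a hypothetical name for the tail of __sysrq_swap_key_ops(), and
synchronize_rcu(), ftrace_dump() and the dump-mode enum are stubbed so
that the fragment builds outside the kernel.

```c
/*
 * User-space sketch only.  In the kernel, ftrace_dump() and
 * enum ftrace_dump_mode are provided by the kernel headers; the
 * definitions below are hypothetical stubs for illustration.
 */
enum ftrace_dump_mode { DUMP_NONE, DUMP_ALL, DUMP_ORIG };

static int grace_period_waits;		/* calls to the stubbed synchronize_rcu() */
static int dumps_requested;		/* calls to the stubbed ftrace_dump() */
static enum ftrace_dump_mode last_dump_mode = DUMP_NONE;

static void synchronize_rcu(void)
{
	grace_period_waits++;
}

static void ftrace_dump(enum ftrace_dump_mode mode)
{
	dumps_requested++;
	last_dump_mode = mode;
}

/* Stand-in for the end of __sysrq_swap_key_ops(), after the table update. */
static int sysrq_swap_tail_traced(int retval)
{
	synchronize_rcu();
	/* Debugging only: dump the trace buffers of all CPUs to the console. */
	ftrace_dump(DUMP_ALL);
	return retval;
}
```

Note that with this placement the dump is reached only once
synchronize_rcu() has returned.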

							Thanx, Paul