From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932247AbaHVOsj (ORCPT ); Fri, 22 Aug 2014 10:48:39 -0400 Received: from e39.co.us.ibm.com ([32.97.110.160]:54873 "EHLO e39.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932153AbaHVOsi (ORCPT ); Fri, 22 Aug 2014 10:48:38 -0400 Date: Fri, 22 Aug 2014 07:48:19 -0700 From: "Paul E. McKenney" To: Amit Shah Cc: linux-kernel@vger.kernel.org, riel@redhat.com, mingo@kernel.org, laijs@cn.fujitsu.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com, dvhart@linux.intel.com, fweisbec@gmail.com, oleg@redhat.com, sbw@mit.edu Subject: Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups Message-ID: <20140822144819.GG2663@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20140812214151.GB3106@linux.vnet.ibm.com> <20140813054439.GA29913@grmbl.mre> <20140813130049.GS4752@linux.vnet.ibm.com> <20140815052411.GF1934@grmbl.mre> <20140815150402.GD4752@linux.vnet.ibm.com> <20140818175345.GD31856@grmbl.mre> <20140819040149.GJ4752@linux.vnet.ibm.com> <20140822122453.GG16198@grmbl.mre> <20140822123651.GH16198@grmbl.mre> <20140822125649.GI16198@grmbl.mre> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140822125649.GI16198@grmbl.mre> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14082214-9332-0000-0000-000001C5AE84 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote: > On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote: > > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote: > > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote: > > > > > > > The odds are low over the next few days. I am adding nastier rcutorture > > > > testing, however. It would still be very good to get debug information > > > > from your setup. One approach would be to convert the trace function > > > > calls into printk(), if that would help. > > > > > > I added a few printks on the lines of the traces in cases where > > > rcu_nocb_poll was checked -- since that reproduces the hang. Are the > > > following traces sufficient, or should I keep adding more printks? > > > > > > In the case of rcu-trace-nopoll.txt, the messages stop after a while > > > (when the guest locks up hard). That's when I kill the qemu process. > > > > And this is bt from gdb when the endless > > > > RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot > > > > messages are being spewed. > > > > I can't time it, but hope it gives some indication along with the printks. > > ... and after the system 'locks up', this is the state it's in: > > ^C > Program received signal SIGINT, Interrupt. > native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50 > 50 } > (gdb) bt > #0 native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50 > #1 0xffffffff8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111 > #2 default_idle () at arch/x86/kernel/process.c:311 > #3 0xffffffff8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302 > #4 0xffffffff8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120 > #5 cpu_idle_loop () at kernel/sched/idle.c:220 > #6 cpu_startup_entry (state=) at kernel/sched/idle.c:268 > #7 0xffffffff813e068b in rest_init () at init/main.c:418 > #8 0xffffffff81a8cf5a in start_kernel () at init/main.c:680 > #9 0xffffffff81a8c4ba in x86_64_start_reservations (real_mode_data=) at arch/x86/kernel/head64.c:193 > #10 0xffffffff81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 ) > at arch/x86/kernel/head64.c:182 > #11 0x0000000000000000 in ?? () > > > Wondering why it's doing this. Am stepping through > cpu_startup_entry() to see if I get any clues. This looks to me like normal behavior in the x86 ACPI idle loop. My guess is that the lockup is caused by indefinite blocking, in which case we would expect all the CPUs to be in the idle loop. Of course, this all assumes that your system is using ACPI for idle. (Is it?) Thanx, Paul