Date: Wed, 18 Jun 2014 14:20:22 -0700
From: "Paul E. McKenney"
To: Jiri Kosina
Cc: Linus Torvalds, Linux Kernel Mailing List, Michal Hocko, Jan Kara,
	Frederic Weisbecker, Steven Rostedt, Dave Anderson, Andrew Morton,
	Petr Mladek, Kay Sievers
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context
Message-ID: <20140618212022.GV4669@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20140610164641.GD1951@localhost.localdomain>
	<20140618143612.GC4669@linux.vnet.ibm.com>
	<20140618162117.GM4669@linux.vnet.ibm.com>
	<20140618210757.GU4669@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 18, 2014 at 11:12:48PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
>
> > > >  	/* Complain about tasks blocking the grace period. */
> > > > @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> > > >  	pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
> > > >  		jiffies - rsp->gp_start,
> > > >  		(long)rsp->gpnum, (long)rsp->completed, totqlen);
> > > > -	if (!trigger_all_cpu_backtrace())
> > > > -		dump_stack();
> > > > +	rcu_dump_cpu_stacks(rsp);
> > >
> > > This is prone to producing not really consistent stacktraces though,
> > > right?
> > > As the target task is still running at the time the stack is being
> > > walked, it might produce stacktraces that are potentially nonsensical.
> >
> > If a CPU is stuck, the stack trace down to where it is stuck is
> > likely to be static.  But yes, there is some potential for confusion.
> > My (admittedly limited) rcutorture testing produced sensible stack traces,
> > but things might be a bit uglier in other situations.
>
> I agree that it might work nicely for RCU stall detector indeed. I was
> looking for a solution that'd work nicely both for RCU and for sysrq-l
> (where we can't rely on processes being stuck in any way).

Agreed.  And if some more generally useful approach appears, I will be
quite happy to adjust RCU to use it.  In the meantime, I expect that my
patch will be helpful.

							Thanx, Paul

> > > How about sending NMI to the target CPU, so that the task is actually
> > > stopped, but printing its stacktrace from the CPU that detected the stall
> > > while it's stopped?
> > >
> > > That way, there is no printk()-from-NMI, but also the stacktrace is
> > > guaranteed to be self-consistent.
> >
> > I believe that this was what Steven was suggesting, though by using
> > tracing.
>
> My understanding was that Steven is suggesting using trace_printk() from
> NMI.
>
> > Of course, if my current approach isn't up to the job, then something
> > like this general approach would look quite good.
>
> Thanks,
>
> --
> Jiri Kosina
> SUSE Labs
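
The consistency concern raised in the thread can be illustrated with a
small userspace toy. This is only a sketch under stated assumptions: it
is not kernel code and not the patch under discussion, and every name in
it (sim_cpu, chain_a, chain_b, sim_step, walk_remote, is_real_chain) is
hypothetical. It models a target CPU that alternates between two call
chains, and a remote walker that reads one frame at a time. If the
target keeps running between reads, the walker can stitch together a
chain that never existed; if the target is parked first (as it would be
while held in an NMI handler), the walk matches a real chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEPTH 3

/* Hypothetical call chains; index 0 is the outermost frame. */
static const char *const chain_a[DEPTH] = { "main", "read_a", "helper_a" };
static const char *const chain_b[DEPTH] = { "main", "write_b", "helper_b" };

/* Toy model of a target CPU (an assumption, not a kernel structure). */
struct sim_cpu {
	int phase;   /* 0: executing chain_a, 1: executing chain_b */
	int stopped; /* nonzero while parked, as if held in an NMI handler */
};

/* Read one frame of whatever the target is executing right now. */
static const char *sim_frame(const struct sim_cpu *cpu, size_t i)
{
	assert(i < DEPTH);
	return cpu->phase ? chain_b[i] : chain_a[i];
}

/* One unit of target progress; a stopped CPU makes none. */
static void sim_step(struct sim_cpu *cpu)
{
	if (!cpu->stopped)
		cpu->phase = !cpu->phase;
}

/*
 * Walk the target's stack from "another CPU". The target gets a chance
 * to run after every frame read, which is what makes a live walk racy.
 */
static void walk_remote(struct sim_cpu *cpu, const char *out[DEPTH])
{
	for (size_t i = 0; i < DEPTH; i++) {
		out[i] = sim_frame(cpu, i);
		sim_step(cpu);
	}
}

/* Return 1 if the trace equals one of the chains the target really ran. */
static int is_real_chain(const char *trace[DEPTH])
{
	int a = 1, b = 1;

	for (size_t i = 0; i < DEPTH; i++) {
		if (strcmp(trace[i], chain_a[i]) != 0)
			a = 0;
		if (strcmp(trace[i], chain_b[i]) != 0)
			b = 0;
	}
	return a || b;
}
```

With a running target that starts in phase 0, walk_remote() collects
main, write_b, helper_a, which is_real_chain() rejects. With stopped set
to 1 before the walk, the same call collects chain_a unchanged. The toy
says nothing about how likely such tearing is on real hardware; as noted
in the thread, a CPU that is genuinely stuck tends to have a static
stack.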