All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Threaded irqs + 100% CPU RT task = RCU stall
Date: Wed, 13 Mar 2013 14:35:11 -0700	[thread overview]
Message-ID: <20130313213511.GA15969@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130313212802.GA3725@linux.vnet.ibm.com>

On Wed, Mar 13, 2013 at 02:28:02PM -0700, Paul E. McKenney wrote:
> On Wed, Mar 13, 2013 at 05:03:07PM -0400, Paul Gortmaker wrote:
> > [Re: Threaded irqs + 100% CPU RT task = RCU stall] On 06/03/2013 (Wed 13:45) Paul E. McKenney wrote:
> > 
> > [...]
> > 
> > > 
> > > Is this behavior OK?  If so, the following (untested) patch might do
> > > what you want.  ;-)
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > rcu: Add softirq-stall indications to stall-warning messages
> > 
> > [...]
> > 
> > > 
> > > +The "softirq=" portion of the message tracks the number of RCU softirq
> > > +handlers that the stalled CPU has executed.  The number before the "/"
> > > +is the number that had executed since boot at the time that this CPU
> > > +last noted the beginning of a grace period, which might be the current
> > > +(stalled) grace period, or it might be some earlier grace period (for
> > > +example, if the CPU might have been in dyntick-idle mode for an extended
> > > +time period.  The number after the "/" is the number that have executed
> > > +since boot until the current time.  If this latter number stays constant
> > > +across repeated stall-warning messages, it is possible that RCU's softirq
> > > +handlers are no longer able to execute on this CPU.  This can happen if
> > > +the stalled CPU is spinning with interrupts are disabled, or, in -rt
> > > +kernels, if a high-priority process is starving RCU's softirq handler.
> > 
> > Here is the output of two consecutive stalls (triggered exactly as I'd
> > described before) after applying the commit and enabling the new config
> > option for RCU_CPU_STALL_INFO (btw, do we need this?  we already have
> > the RCU_CPU_STALL_VERBOSE option, and the distinction isn't clear.)

I won't argue that the choice of names is particularly inspired.

RCU_CPU_STALL_VERBOSE prints per-task information for any tasks stalling
the current RCU grace period.

RCU_CPU_STALL_INFO prints additional debug info for each CPU.

Perhaps I should merge them or rename them.  I will give it some thought.

							Thanx, Paul

> > Looking at the output, it doesn't necessarily scream out "you are an
> > idiot" in a way that Joe Average can immediately parse and understand,
> > but I guess it does at least arm us with more information so that we
> > can tell Joe Average that he is an idiot (assuming he posts more than
> > just a single stall instance).
> 
> OK, will queue this patch for 3.10, then, with your Tested-by.
> 
> > Also note right after the <EOI> below, it looks like two stall
> > messages got interleavedi, or a carriage return got dropped...
> > (not suggesting that this patch caused that.).
> 
> No idea...  Will recheck synchronization.  Oh, wait...  The stall warnings
> for self-detected stalls are not synchronized.  This is a tradeoff.
> If I synchronize them, and there are multiple CPUs stalling concurrently
> and self-detecting those stalls, then I randomly lose stalls from some
> of the CPUs.  I could let the winner complain on behalf of all currently
> stalled CPUs, but remote stack tracing is inaccurate.
> 
> My thought is to leave it, unless someone has a cute idea for making it
> all work nicely.
> 
> 							Thanx, Paul
> 
> > Paul.
> > --
> > 
> > INFO: rcu_preempt self-detected stall on CPU
> > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > 	1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257 
> > 	(detected by 5, t=60002 jiffies, g=324, c=323, q=1368)
> > Task dump for CPU 1:
> > eatme-simple    R  running task        0  1487   1433 0x00000000
> >  ffff88042ef47f60 ffffffff81316de1 ffff88042e5f5810 ffff88042ef47fd8
> >  0000000000010c00 ffff88042ef47fd8 ffff88042f994210 ffff88042e5f5810
> >  0000000000000000 ffff88043f4fe980 ffffffff810a56e4 0000000000000203
> > Call Trace:
> >  [<ffffffff81316de1>] ? __schedule+0x62a/0x75e
> >  [<ffffffff810a56e4>] ? dput+0x20/0x15c
> >  [<ffffffff81096070>] ? __fput+0x1a1/0x1c8
> >  [<ffffffff810ab833>] ? mntput_no_expire+0x13/0x11f
> >  [<ffffffff8101c55c>] ? do_page_fault+0x1f/0x3b
> > 
> > 	1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257 
> > 	 (t=60082 jiffies g=324 c=323 q=1368)
> > Pid: 1487, comm: eatme-simple Not tainted 3.9.0-rc2+ #2
> > Call Trace:
> >  <IRQ>  [<ffffffff8105cac6>] ? rcu_check_callbacks+0x215/0x61a
> >  [<ffffffff8102cffc>] ? update_process_times+0x31/0x5c
> >  [<ffffffff8104bb5b>] ? tick_handle_periodic+0x18/0x52
> >  [<ffffffff81016328>] ? smp_apic_timer_interrupt+0x7d/0x8f
> >  [<ffffffff8131944a>] ? apic_timer_interrupt+0x6a/0x70
> >  <EOI> 
> > INFO: rcu_preempt self-detected stall on CPUINFO: rcu_preempt detected stalls on CPUs/tasks:
> > 	1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257 
> > 	(detected by 5, t=240007 jiffies, g=324, c=323, q=9386)
> > Task dump for CPU 1:
> > eatme-simple    R  running task        0  1487   1433 0x00000000
> >  ffff88042ef47f60 ffffffff81316dfd ffff88042e5f5810 ffff88042ef47fd8
> >  0000000000010c00 ffff88042ef47fd8 ffff88042e5f5810 ffff88042e5f5810
> >  0000000000000000 ffff88043f4fe980 ffffffff810a56e4 0000000000000203
> > Call Trace:
> >  [<ffffffff81316dfd>] ? __schedule+0x646/0x75e
> >  [<ffffffff810a56e4>] ? dput+0x20/0x15c
> >  [<ffffffff81096070>] ? __fput+0x1a1/0x1c8
> >  [<ffffffff810ab833>] ? mntput_no_expire+0x13/0x11f
> >  [<ffffffff8101c55c>] ? do_page_fault+0x1f/0x3b
> > 
> > 	1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257 
> > 	 (t=240087 jiffies g=324 c=323 q=9386)
> > Pid: 1487, comm: eatme-simple Not tainted 3.9.0-rc2+ #2
> > Call Trace:
> >  <IRQ>  [<ffffffff8105cac6>] ? rcu_check_callbacks+0x215/0x61a
> >  [<ffffffff8102cffc>] ? update_process_times+0x31/0x5c
> >  [<ffffffff8104bb5b>] ? tick_handle_periodic+0x18/0x52
> >  [<ffffffff81016328>] ? smp_apic_timer_interrupt+0x7d/0x8f
> >  [<ffffffff8131944a>] ? apic_timer_interrupt+0x6a/0x70
> > 

      reply	other threads:[~2013-03-13 21:35 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-06 15:49 Threaded irqs + 100% CPU RT task = RCU stall Paul Gortmaker
2013-03-06 15:58 ` Thomas Gleixner
2013-03-06 16:14   ` Paul Gortmaker
2013-03-06 17:16   ` Paul E. McKenney
2013-03-06 19:11     ` Thomas Gleixner
2013-03-06 21:45       ` Paul E. McKenney
2013-03-11 17:54         ` Sebastian Andrzej Siewior
2013-03-13 21:03         ` Paul Gortmaker
2013-03-13 21:28           ` Paul E. McKenney
2013-03-13 21:35             ` Paul E. McKenney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130313213511.GA15969@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=paul.gortmaker@windriver.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.