From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Threaded irqs + 100% CPU RT task = RCU stall
Date: Wed, 13 Mar 2013 14:28:02 -0700
Message-ID: <20130313212802.GA3725@linux.vnet.ibm.com>
In-Reply-To: <20130313210307.GA22448@windriver.com>

On Wed, Mar 13, 2013 at 05:03:07PM -0400, Paul Gortmaker wrote:
> [Re: Threaded irqs + 100% CPU RT task = RCU stall] On 06/03/2013 (Wed 13:45) Paul E. McKenney wrote:
>
> [...]
>
> >
> > Is this behavior OK? If so, the following (untested) patch might do
> > what you want. ;-)
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > rcu: Add softirq-stall indications to stall-warning messages
>
> [...]
>
> >
> > +The "softirq=" portion of the message tracks the number of RCU softirq
> > +handlers that the stalled CPU has executed. The number before the "/"
> > +is the number that had executed since boot at the time that this CPU
> > +last noted the beginning of a grace period, which might be the current
> > +(stalled) grace period, or it might be some earlier grace period (for
> > +example, if the CPU might have been in dyntick-idle mode for an extended
> > +time period). The number after the "/" is the number that have executed
> > +since boot until the current time. If this latter number stays constant
> > +across repeated stall-warning messages, it is possible that RCU's softirq
> > +handlers are no longer able to execute on this CPU. This can happen if
> > +the stalled CPU is spinning with interrupts disabled, or, in -rt
> > +kernels, if a high-priority process is starving RCU's softirq handler.
>
> Here is the output of two consecutive stalls (triggered exactly as I'd
> described before) after applying the commit and enabling the new config
> option for RCU_CPU_STALL_INFO (btw, do we need this? we already have
> the RCU_CPU_STALL_VERBOSE option, and the distinction isn't clear.)
>
> Looking at the output, it doesn't necessarily scream out "you are an
> idiot" in a way that Joe Average can immediately parse and understand,
> but I guess it does at least arm us with more information so that we
> can tell Joe Average that he is an idiot (assuming he posts more than
> just a single stall instance).

OK, will queue this patch for 3.10, then, with your Tested-by.

> Also note right after the <EOI> below, it looks like two stall
> messages got interleaved, or a carriage return got dropped...
> (not suggesting that this patch caused that).

No idea... Will recheck synchronization. Oh, wait... The stall warnings
for self-detected stalls are not synchronized. This is a tradeoff.
If I synchronize them, and there are multiple CPUs stalling concurrently
and self-detecting those stalls, then I randomly lose stalls from some
of the CPUs. I could let the winner complain on behalf of all currently
stalled CPUs, but remote stack tracing is inaccurate.

My thought is to leave it, unless someone has a cute idea for making it
all work nicely.
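
The tradeoff can be illustrated with a toy user-space model. This is only
a sketch, not kernel code: the function name self_report(), the "gate"
lock, and the two barriers are hypothetical, and it assumes that
"synchronized" means a winner-takes-all gate that each stalled CPU tries
once without blocking. The kernel's actual mechanism is not described in
this message.

```python
import threading


def self_report(stalled_cpus, synchronized):
    """Return the sorted list of CPUs whose self-detected report is emitted.

    Toy model only: each "CPU" is a thread that notices its own stall at
    about the same time as the others.
    """
    if not stalled_cpus:
        return []
    n = len(stalled_cpus)
    gate = threading.Lock()         # hypothetical winner-takes-all gate
    start = threading.Barrier(n)    # make all CPUs self-detect together
    tried = threading.Barrier(n)    # winner holds the gate until all have tried
    reported = []
    reported_lock = threading.Lock()

    def cpu(cpu_id):
        start.wait()
        # Synchronized: only the CPU that grabs the gate may report.
        # Unsynchronized: every CPU reports (output may interleave).
        won = gate.acquire(blocking=False) if synchronized else True
        if won:
            with reported_lock:
                reported.append(cpu_id)
        tried.wait()
        if synchronized and won:
            gate.release()

    threads = [threading.Thread(target=cpu, args=(c,)) for c in stalled_cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(reported)
```

In this model the unsynchronized case reports every stalled CPU, while
the synchronized case reports exactly one, and which one is a matter of
thread scheduling. That is the "randomly lose stalls" effect, at the
price of possibly interleaved output in the unsynchronized case.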

Thanx, Paul

> Paul.
> --
>
> INFO: rcu_preempt self-detected stall on CPU
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
> (detected by 5, t=60002 jiffies, g=324, c=323, q=1368)
> Task dump for CPU 1:
> eatme-simple R running task 0 1487 1433 0x00000000
> ffff88042ef47f60 ffffffff81316de1 ffff88042e5f5810 ffff88042ef47fd8
> 0000000000010c00 ffff88042ef47fd8 ffff88042f994210 ffff88042e5f5810
> 0000000000000000 ffff88043f4fe980 ffffffff810a56e4 0000000000000203
> Call Trace:
> [<ffffffff81316de1>] ? __schedule+0x62a/0x75e
> [<ffffffff810a56e4>] ? dput+0x20/0x15c
> [<ffffffff81096070>] ? __fput+0x1a1/0x1c8
> [<ffffffff810ab833>] ? mntput_no_expire+0x13/0x11f
> [<ffffffff8101c55c>] ? do_page_fault+0x1f/0x3b
>
> 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
> (t=60082 jiffies g=324 c=323 q=1368)
> Pid: 1487, comm: eatme-simple Not tainted 3.9.0-rc2+ #2
> Call Trace:
> <IRQ> [<ffffffff8105cac6>] ? rcu_check_callbacks+0x215/0x61a
> [<ffffffff8102cffc>] ? update_process_times+0x31/0x5c
> [<ffffffff8104bb5b>] ? tick_handle_periodic+0x18/0x52
> [<ffffffff81016328>] ? smp_apic_timer_interrupt+0x7d/0x8f
> [<ffffffff8131944a>] ? apic_timer_interrupt+0x6a/0x70
> <EOI>
> INFO: rcu_preempt self-detected stall on CPUINFO: rcu_preempt detected stalls on CPUs/tasks:
> 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
> (detected by 5, t=240007 jiffies, g=324, c=323, q=9386)
> Task dump for CPU 1:
> eatme-simple R running task 0 1487 1433 0x00000000
> ffff88042ef47f60 ffffffff81316dfd ffff88042e5f5810 ffff88042ef47fd8
> 0000000000010c00 ffff88042ef47fd8 ffff88042e5f5810 ffff88042e5f5810
> 0000000000000000 ffff88043f4fe980 ffffffff810a56e4 0000000000000203
> Call Trace:
> [<ffffffff81316dfd>] ? __schedule+0x646/0x75e
> [<ffffffff810a56e4>] ? dput+0x20/0x15c
> [<ffffffff81096070>] ? __fput+0x1a1/0x1c8
> [<ffffffff810ab833>] ? mntput_no_expire+0x13/0x11f
> [<ffffffff8101c55c>] ? do_page_fault+0x1f/0x3b
>
> 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
> (t=240087 jiffies g=324 c=323 q=9386)
> Pid: 1487, comm: eatme-simple Not tainted 3.9.0-rc2+ #2
> Call Trace:
> <IRQ> [<ffffffff8105cac6>] ? rcu_check_callbacks+0x215/0x61a
> [<ffffffff8102cffc>] ? update_process_times+0x31/0x5c
> [<ffffffff8104bb5b>] ? tick_handle_periodic+0x18/0x52
> [<ffffffff81016328>] ? smp_apic_timer_interrupt+0x7d/0x8f
> [<ffffffff8131944a>] ? apic_timer_interrupt+0x6a/0x70
>
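
For anyone wanting to check the "softirq=" field across repeated stall
warnings mechanically, something like the following might do. It is a
minimal sketch: the function names and the regular expression are my own
assumptions, based only on the line format shown in the output quoted
above, and a constant count is only a hint that RCU's softirq handlers
might be starved, not proof.

```python
import re

# Matches per-CPU stall lines such as
#   "> 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257"
# with or without leading email quote markers. The format is assumed
# from the log excerpt in this message.
SOFTIRQ_RE = re.compile(r"^\s*(?:>\s*)*(\d+): .*\bsoftirq=(\d+)/(\d+)")


def softirq_samples(log_text):
    """Yield (cpu, count_at_gp_start, count_now) for each per-CPU stall line."""
    for line in log_text.splitlines():
        m = SOFTIRQ_RE.match(line)
        if m:
            yield tuple(int(g) for g in m.groups())


def cpus_with_stuck_softirq(log_text):
    """Return CPUs whose second softirq= number never changed across
    two or more stall-warning lines."""
    seen = {}
    for cpu, _at_gp_start, now in softirq_samples(log_text):
        seen.setdefault(cpu, []).append(now)
    return sorted(
        cpu for cpu, counts in seen.items()
        if len(counts) >= 2 and len(set(counts)) == 1
    )
```

Fed the two stall reports above, every line for CPU 1 carries
softirq=2256/2257, so CPU 1 would be flagged. That is consistent with a
high-priority task starving the softirq handler on that CPU.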