* Threaded irqs + 100% CPU RT task = RCU stall
@ 2013-03-06 15:49 Paul Gortmaker
2013-03-06 15:58 ` Thomas Gleixner
0 siblings, 1 reply; 10+ messages in thread
From: Paul Gortmaker @ 2013-03-06 15:49 UTC (permalink / raw)
To: Paul E. McKenney, Thomas Gleixner; +Cc: linux-rt-users, linux-kernel
It turns out that if a person enables threaded irqs, takes a task
that uses 100% of a CPU, binds it to a core, and then chrt's it,
they will get an RCU stall.
This was seen on 3.6-RT (which implicitly forces "threadirqs=1") and
then reproduced on vanilla 3.6.11, 3.8, and 3.9-rc1 when explicitly
booting with threadirqs=1.
The problem (if you want to call it that, vs. an "educational issue")
is that Joe Average will never correlate the mysterious RCU stall
with its underlying root cause: that core's ksoftirqd never getting
to run.
It wasn't immediately evident to me either; I had to "cheat" by
doing a pseudo-bisect on RT, which got me looking at RT's patch
called "genirq-force-threading.patch", and then things started
to make more sense.
To be more specific, say we have user "Joe" who has a busy polling
loop in his RT task, and since Joe isn't that bright, he never
bothers to briefly yield the core or change other system thread
priorities.
We can simulate what Joe does as follows:

- boot a 3.9-rc1 SMP box with threadirqs=1 on the kernel command line and run:
echo "main() { for(;;);}" > eatme-simple.c
gcc eatme-simple.c -o eatme-simple
taskset -c 1 ./eatme-simple &
chrt -p 80 `pgrep eatme`
<wait ~2m>
INFO: rcu_preempt self-detected stall on CPU
INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 4, t=60002 jiffies, g=314, c=313, q=2800)
Task dump for CPU 1:
eatme-simple R running task 0 1485 1428 0x00000000
ffff88042b7a7f70 ffffffff81316396 ffff88042a90ad30 ffff88042b7a7fd8
0000000000010b00 ffff88042b7a7fd8 ffff88042dac2050 ffff88042a90ad30
0000000000000000 0000000000000000 0000000000000000 ffff88043fc4ceb0
Call Trace:
[<ffffffff81316396>] ? __schedule+0x617/0x74b
[<ffffffff810a4dfc>] ? dput+0x20/0x15c
[<ffffffff8109577c>] ? __fput+0x1a1/0x1c8
[<ffffffff810aaf4b>] ? mntput_no_expire+0x13/0x11f
{ 1} (t=60010 jiffies g=314 c=313 q=2800)
Pid: 1485, comm: eatme-simple Not tainted 3.9.0-rc1 #15
Call Trace:
<IRQ> [<ffffffff8105c2fc>] ? rcu_check_callbacks+0x1de/0x5ab
[<ffffffff8102ce08>] ? update_process_times+0x31/0x5c
[<ffffffff8104b947>] ? tick_handle_periodic+0x18/0x52
[<ffffffff8101616c>] ? smp_apic_timer_interrupt+0x7d/0x8f
[<ffffffff813189ca>] ? apic_timer_interrupt+0x6a/0x70
What happens is that the threadirqs=1 option changes the behaviour of
the invoke_softirq() function -- it will either call do_softirq()
directly, or, if threading is on, call wakeup_softirqd() instead
(i.e., defer the work to ksoftirqd).  But the default prio for
ksoftirqd on that core won't ever trump Joe's eatme task, so we never
do any do_softirq work at all on that core.  Eventually the RCU stall
detection sees it from the sched tick:
scheduling-clock interrupt
--> rcu_check_callbacks
--> if (rcu_pending) <--- Stall message!
--> invoke_rcu_core
--> raise_softirq(RCU_SOFTIRQ)
The stall comes from rcu_pending as follows:
rcu_pending
--> __rcu_pending
--> check_cpu_stall
--> print_cpu_stall
--> printk "self-detected stall.."
Note that softirq isn't implicated at all in the above path, and hence
even if Joe does a bit of digging, he probably still has no idea why
things don't work. He thinks this is just some as yet undiscovered
kernel/RCU bug.
To confirm that I was properly understanding what was really happening,
I added a quick hack (below) that sets a hard upper limit on the number
of consecutive times we'll defer to wakeup_softirqd() instead of running
do_softirq() directly.  Choosing an arbitrary limit of ~8k showed that
we'd hit it at roughly ~5s intervals with the busy-loop program running,
and forcing do_softirq() when we hit that limit avoided the stalls.
So, I guess the question is whether we want to try and make the system
fail in a more meaningful way -- kind of like the rt throttling message
does, as it lets users know they've hit the wall.  Something watching
for kstat_incr_softirqs traffic perhaps?  Or other options?
Paul.
--
diff --git a/kernel/softirq.c b/kernel/softirq.c
index b4d252f..a1ded00 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -205,6 +205,8 @@ EXPORT_SYMBOL(local_bh_enable_ip);
  */
 #define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
 
+DEFINE_PER_CPU(unsigned long, defer_count);
+
 asmlinkage void __do_softirq(void)
 {
 	struct softirq_action *h;
@@ -271,6 +273,9 @@ restart:
 		wakeup_softirqd();
 	}
 
+	/* we've really run the queue, so reset counter */
+	this_cpu_write(defer_count, 0);
+
 	lockdep_softirq_exit();
 
 	account_irq_exit_time(current);
@@ -321,19 +326,30 @@ void irq_enter(void)
 	__irq_enter();
 }
 
+/*
+ * if we've gone too many iterations without having really run,
+ * then genuinely force it, otherwise we'll be setting ourselves
+ * up for a backlog, and get rcu stalls and so on.
+ */
+#define MAX_DEFER 8192
 static inline void invoke_softirq(void)
 {
-	if (!force_irqthreads) {
+	bool overdue = (this_cpu_read(defer_count) > MAX_DEFER);
+
+	if (!force_irqthreads || overdue) {
 #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
 		__do_softirq();
 #else
 		do_softirq();
 #endif
+		if (overdue)
+			printk("defer limit reached: softirq execution was forced\n");
 	} else {
 		__local_bh_disable((unsigned long)__builtin_return_address(0),
 				SOFTIRQ_OFFSET);
 		wakeup_softirqd();
 		__local_bh_enable(SOFTIRQ_OFFSET);
+		this_cpu_inc(defer_count);
 	}
 }
--
1.8.1.2
^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Threaded irqs + 100% CPU RT task = RCU stall
  2013-03-06 15:49 Threaded irqs + 100% CPU RT task = RCU stall Paul Gortmaker
@ 2013-03-06 15:58 ` Thomas Gleixner
  2013-03-06 16:14   ` Paul Gortmaker
  2013-03-06 17:16   ` Paul E. McKenney
  0 siblings, 2 replies; 10+ messages in thread
From: Thomas Gleixner @ 2013-03-06 15:58 UTC (permalink / raw)
To: Paul Gortmaker; +Cc: Paul E. McKenney, linux-rt-users, linux-kernel

On Wed, 6 Mar 2013, Paul Gortmaker wrote:
> So, I guess the question is, whether we want to try and make the system
> fail in a more meaningful way -- kind of like the rt throttling message
> does - as it lets users know they've hit the wall? Something watching

That Joe Doe should have noticed the throttler message, which came
before the stall, shouldn't he?

> for kstat_incr_softirqs traffic perhaps? Or other options?

The rcu stall detector could use the softirq counter and if it did not
change in the stall period print: "Caused by softirq starvation" or
something like that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Threaded irqs + 100% CPU RT task = RCU stall
  2013-03-06 15:58 ` Thomas Gleixner
@ 2013-03-06 16:14   ` Paul Gortmaker
  2013-03-06 17:16   ` Paul E. McKenney
  1 sibling, 0 replies; 10+ messages in thread
From: Paul Gortmaker @ 2013-03-06 16:14 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Paul E. McKenney, linux-rt-users, linux-kernel

On 13-03-06 10:58 AM, Thomas Gleixner wrote:
> On Wed, 6 Mar 2013, Paul Gortmaker wrote:
>> So, I guess the question is, whether we want to try and make the system
>> fail in a more meaningful way -- kind of like the rt throttling message
>> does - as it lets users know they've hit the wall? Something watching
>
> That Joe Doe should have noticed the throttler message, which came
> before the stall, shouldn't he?

Actually it isn't showing up at all -- just the RCU stall itself.
Now that you mention it, I do wonder why the throttler message isn't
tripped though...

Paul.
--

>
>> for kstat_incr_softirqs traffic perhaps? Or other options?
>
> The rcu stall detector could use the softirq counter and if it did not
> change in the stall period print: "Caused by softirq starvation" or
> something like that.
>
> Thanks,
>
> tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Threaded irqs + 100% CPU RT task = RCU stall
  2013-03-06 15:58 ` Thomas Gleixner
  2013-03-06 16:14   ` Paul Gortmaker
@ 2013-03-06 17:16   ` Paul E. McKenney
  2013-03-06 19:11     ` Thomas Gleixner
  1 sibling, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2013-03-06 17:16 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Paul Gortmaker, linux-rt-users, linux-kernel

On Wed, Mar 06, 2013 at 04:58:54PM +0100, Thomas Gleixner wrote:
> On Wed, 6 Mar 2013, Paul Gortmaker wrote:
> > So, I guess the question is, whether we want to try and make the system
> > fail in a more meaningful way -- kind of like the rt throttling message
> > does - as it lets users know they've hit the wall? Something watching
>
> That Joe Doe should have noticed the throttler message, which came
> before the stall, shouldn't he?
>
> > for kstat_incr_softirqs traffic perhaps? Or other options?
>
> The rcu stall detector could use the softirq counter and if it did not
> change in the stall period print: "Caused by softirq starvation" or
> something like that.

The idea is to (at grace-period start) take a snapshot of the CPU's
value of kstat.softirqs[RCU_SOFTIRQ], then check it at stall time,
right?  Or do I have the wrong softirq counter?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Threaded irqs + 100% CPU RT task = RCU stall
  2013-03-06 17:16 ` Paul E. McKenney
@ 2013-03-06 19:11   ` Thomas Gleixner
  2013-03-06 21:45     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2013-03-06 19:11 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: Paul Gortmaker, linux-rt-users, linux-kernel

On Wed, 6 Mar 2013, Paul E. McKenney wrote:
> On Wed, Mar 06, 2013 at 04:58:54PM +0100, Thomas Gleixner wrote:
> > On Wed, 6 Mar 2013, Paul Gortmaker wrote:
> > > So, I guess the question is, whether we want to try and make the system
> > > fail in a more meaningful way -- kind of like the rt throttling message
> > > does - as it lets users know they've hit the wall? Something watching
> >
> > That Joe Doe should have noticed the throttler message, which came
> > before the stall, shouldn't he?
> >
> > > for kstat_incr_softirqs traffic perhaps? Or other options?
> >
> > The rcu stall detector could use the softirq counter and if it did not
> > change in the stall period print: "Caused by softirq starvation" or
> > something like that.
>
> The idea is to (at grace-period start) take a snapshot of the CPU's
> value of kstat.softirqs[RCU_SOFTIRQ], then check it at stall time, right?

Yep.

> Or do I have the wrong softirq counter?

kstat_softirqs_cpu(RCU_SOFTIRQ, cpu) is the function you want to use.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Threaded irqs + 100% CPU RT task = RCU stall
  2013-03-06 19:11 ` Thomas Gleixner
@ 2013-03-06 21:45   ` Paul E. McKenney
  2013-03-11 17:54     ` Sebastian Andrzej Siewior
  2013-03-13 21:03     ` Paul Gortmaker
  0 siblings, 2 replies; 10+ messages in thread
From: Paul E. McKenney @ 2013-03-06 21:45 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Paul Gortmaker, linux-rt-users, linux-kernel

On Wed, Mar 06, 2013 at 08:11:25PM +0100, Thomas Gleixner wrote:
> On Wed, 6 Mar 2013, Paul E. McKenney wrote:
> > On Wed, Mar 06, 2013 at 04:58:54PM +0100, Thomas Gleixner wrote:
> > > On Wed, 6 Mar 2013, Paul Gortmaker wrote:
> > > > So, I guess the question is, whether we want to try and make the system
> > > > fail in a more meaningful way -- kind of like the rt throttling message
> > > > does - as it lets users know they've hit the wall? Something watching
> > >
> > > That Joe Doe should have noticed the throttler message, which came
> > > before the stall, shouldn't he?
> > >
> > > > for kstat_incr_softirqs traffic perhaps? Or other options?
> > >
> > > The rcu stall detector could use the softirq counter and if it did not
> > > change in the stall period print: "Caused by softirq starvation" or
> > > something like that.
> >
> > The idea is to (at grace-period start) take a snapshot of the CPU's
> > value of kstat.softirqs[RCU_SOFTIRQ], then check it at stall time, right?
>
> Yep.
>
> > Or do I have the wrong softirq counter?
>
> kstat_softirqs_cpu(RCU_SOFTIRQ, cpu) is the function you want to use.

Good point!  So this is what I am considering doing:

o	Adding this under CONFIG_RCU_CPU_STALL_INFO=y.

o	When a given CPU sees a new grace period starting, it takes a
	snapshot of kstat_softirqs_cpu(RCU_SOFTIRQ, smp_processor_id()).

o	When a CPU detects a stall, for each CPU that is stalled,
	it prints kstat_softirqs_cpu(RCU_SOFTIRQ, cpu) as well as the
	last snapshot taken from that CPU.

This information will require some interpretation.  Here are a few
things that can happen:

o	The CPU saw the beginning of the stalled grace period, took
	a few more RCU softirqs, but got into a softirq-unfriendly
	state, thus causing the stall.  In this case, the stall-warning
	message would show a small difference between the snapshot and
	the stalled CPU's current value, but this difference would not
	change for additional stall-warning messages happening for the
	same CPU-stall event.  Also, because this CPU did see the
	beginning of the stalled grace period, the line will have a
	"(N ticks this GP)" string instead of an "(N GPs behind)" string.

o	The CPU entered a softirq-unfriendly state before the beginning
	of the stalled grace period.  In this case, the stall-warning
	message would show the number of RCU softirqs that the stalled
	CPU took since the last time it saw the beginning of some prior
	grace period.  This prior grace period might have been a really
	long time ago if the CPU was in dyntick-idle mode immediately
	before going into a softirq-unfriendly state.  In addition,
	the CPU's line in the stall-warning message will have an
	"(N GPs behind)" string.

o	If the CPU entered a softirq-unfriendly state just after the
	beginning of the last grace period that it saw, then and only
	then will it show zero softirq handlers having happened.
	It might have either a "(N ticks this GP)" string or an
	"(N GPs behind)" string, depending on which grace period was
	the last one that it saw.

Is this behavior OK?  If so, the following (untested) patch might do
what you want.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

rcu: Add softirq-stall indications to stall-warning messages

If RCU's softirq handler is prevented from executing, an RCU CPU stall
warning can result.  Ways to prevent RCU's softirq handler from executing
include: (1) CPU spinning with interrupts disabled, (2) infinite loop in
some softirq handler, and (3) in -rt kernels, an infinite loop in a set
of real-time threads running at priorities higher than that of RCU's
softirq handler.

Because this situation can be difficult to track down, this commit causes
the count of RCU softirq handler invocations to be printed with RCU
CPU stall warnings.  This information does require some interpretation,
as now documented in Documentation/RCU/stallwarn.txt.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 1927151..e38b8df 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -92,14 +92,14 @@ If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
 more information is printed with the stall-warning message, for example:
 
 	INFO: rcu_preempt detected stall on CPU
-	0: (63959 ticks this GP) idle=241/3fffffffffffffff/0
+	0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
 	   (t=65000 jiffies)
 
 In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
 printed:
 
 	INFO: rcu_preempt detected stall on CPU
-	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer not pending
+	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D
 	   (t=65000 jiffies)
 
 The "(64628 ticks this GP)" indicates that this CPU has taken more
@@ -116,13 +116,28 @@ number between the two "/"s is the value of the nesting, which will be
 a small positive number if in the idle loop and a very large positive
 number (as shown above) otherwise.
 
-For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the CPU is
-not in the process of trying to force itself into dyntick-idle state, the
-"." indicates that the CPU has not given up forcing RCU into dyntick-idle
-mode (it would be "H" otherwise), and the "timer not pending" indicates
-that the CPU has not recently forced RCU into dyntick-idle mode (it
-would otherwise indicate the number of microseconds remaining in this
-forced state).
+The "softirq=" portion of the message tracks the number of RCU softirq
+handlers that the stalled CPU has executed.  The number before the "/"
+is the number that had executed since boot at the time that this CPU
+last noted the beginning of a grace period, which might be the current
+(stalled) grace period, or it might be some earlier grace period (for
+example, if the CPU might have been in dyntick-idle mode for an extended
+time period.  The number after the "/" is the number that have executed
+since boot until the current time.  If this latter number stays constant
+across repeated stall-warning messages, it is possible that RCU's softirq
+handlers are no longer able to execute on this CPU.  This can happen if
+the stalled CPU is spinning with interrupts are disabled, or, in -rt
+kernels, if a high-priority process is starving RCU's softirq handler.
+
+For CONFIG_RCU_FAST_NO_HZ kernels, the "last_accelerate:" prints the
+low-order 16 bits (in hex) of the jiffies counter when this CPU last
+invoked rcu_try_advance_all_cbs() from rcu_needs_cpu() or last invoked
+rcu_accelerate_cbs() from rcu_prepare_for_idle().  The "nonlazy_posted:"
+prints the number of non-lazy callbacks posted since the last call to
+rcu_needs_cpu().  Finally, an "L" indicates that there are currently
+no non-lazy callbacks ("." is printed otherwise, as shown above) and
+"D" indicates that dyntick-idle processing is enabled ("." is printed
+otherwise, for example, if disabled via the "nohz=" kernel boot parameter).
 
 
 Multiple Warnings From One Stall
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index de6ad68..14ee407 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -326,6 +326,11 @@ struct rcu_data {
 	struct task_struct *nocb_kthread;
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 
+	/* 8) RCU CPU stall data. */
+#ifdef CONFIG_RCU_CPU_STALL_INFO
+	unsigned int softirq_snap;	/* Snapshot of softirq activity. */
+#endif /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
+
 	int cpu;
 	struct rcu_state *rsp;
 };
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 723af5f..d084ae3 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1913,10 +1913,11 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
 		ticks_value = rsp->gpnum - rdp->gpnum;
 	}
 	print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
-	printk(KERN_ERR "\t%d: (%lu %s) idle=%03x/%llx/%d %s\n",
+	printk(KERN_ERR "\t%d: (%lu %s) idle=%03x/%llx/%d softirq=%u/%u %s\n",
 	       cpu, ticks_value, ticks_title,
 	       atomic_read(&rdtp->dynticks) & 0xfff,
 	       rdtp->dynticks_nesting, rdtp->dynticks_nmi_nesting,
+	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
 	       fast_no_hz);
 }
@@ -1930,6 +1931,7 @@ static void print_cpu_stall_info_end(void)
 static void zero_cpu_stall_ticks(struct rcu_data *rdp)
 {
 	rdp->ticks_this_gp = 0;
+	rdp->softirq_snap = kstat_softirqs_cpu(RCU_SOFTIRQ, smp_processor_id());
 }
 
 /* Increment ->ticks_this_gp for all flavors of RCU. */

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: Threaded irqs + 100% CPU RT task = RCU stall
  2013-03-06 21:45 ` Paul E. McKenney
@ 2013-03-11 17:54   ` Sebastian Andrzej Siewior
  2013-03-13 21:03   ` Paul Gortmaker
  1 sibling, 0 replies; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-03-11 17:54 UTC (permalink / raw)
To: Paul Gortmaker
Cc: Thomas Gleixner, Paul E. McKenney, linux-rt-users, linux-kernel

* Paul E. McKenney | 2013-03-06 13:45:02 [-0800]:

>Is this behavior OK?  If so, the following (untested) patch might do
>what you want.  ;-)

Paul (Gortmaker), could you please test Paul's (McKenney) patch to see
if it makes things for Joe (Average) easier to spot?

Sebastian

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Threaded irqs + 100% CPU RT task = RCU stall
  2013-03-06 21:45 ` Paul E. McKenney
  2013-03-11 17:54   ` Sebastian Andrzej Siewior
@ 2013-03-13 21:03   ` Paul Gortmaker
  2013-03-13 21:28     ` Paul E. McKenney
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Gortmaker @ 2013-03-13 21:03 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: Thomas Gleixner, linux-rt-users, linux-kernel

[Re: Threaded irqs + 100% CPU RT task = RCU stall] On 06/03/2013 (Wed 13:45) Paul E. McKenney wrote:

[...]

>
> Is this behavior OK?  If so, the following (untested) patch might do
> what you want.  ;-)
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Add softirq-stall indications to stall-warning messages

[...]

>
> +The "softirq=" portion of the message tracks the number of RCU softirq
> +handlers that the stalled CPU has executed.  The number before the "/"
> +is the number that had executed since boot at the time that this CPU
> +last noted the beginning of a grace period, which might be the current
> +(stalled) grace period, or it might be some earlier grace period (for
> +example, if the CPU might have been in dyntick-idle mode for an extended
> +time period.  The number after the "/" is the number that have executed
> +since boot until the current time.  If this latter number stays constant
> +across repeated stall-warning messages, it is possible that RCU's softirq
> +handlers are no longer able to execute on this CPU.  This can happen if
> +the stalled CPU is spinning with interrupts are disabled, or, in -rt
> +kernels, if a high-priority process is starving RCU's softirq handler.

Here is the output of two consecutive stalls (triggered exactly as I'd
described before) after applying the commit and enabling the new config
option for RCU_CPU_STALL_INFO (btw, do we need this?  we already have
the RCU_CPU_STALL_VERBOSE option, and the distinction isn't clear.)

Looking at the output, it doesn't necessarily scream out "you are an
idiot" in a way that Joe Average can immediately parse and understand,
but I guess it does at least arm us with more information so that we
can tell Joe Average that he is an idiot (assuming he posts more than
just a single stall instance).

Also note right after the <EOI> below, it looks like two stall
messages got interleaved, or a carriage return got dropped
(not suggesting that this patch caused that).

Paul.
--

INFO: rcu_preempt self-detected stall on CPU
INFO: rcu_preempt detected stalls on CPUs/tasks:
	1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
	 (detected by 5, t=60002 jiffies, g=324, c=323, q=1368)
Task dump for CPU 1:
eatme-simple    R running task        0  1487   1433 0x00000000
 ffff88042ef47f60 ffffffff81316de1 ffff88042e5f5810 ffff88042ef47fd8
 0000000000010c00 ffff88042ef47fd8 ffff88042f994210 ffff88042e5f5810
 0000000000000000 ffff88043f4fe980 ffffffff810a56e4 0000000000000203
Call Trace:
 [<ffffffff81316de1>] ? __schedule+0x62a/0x75e
 [<ffffffff810a56e4>] ? dput+0x20/0x15c
 [<ffffffff81096070>] ? __fput+0x1a1/0x1c8
 [<ffffffff810ab833>] ? mntput_no_expire+0x13/0x11f
 [<ffffffff8101c55c>] ? do_page_fault+0x1f/0x3b

	1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
	 (t=60082 jiffies g=324 c=323 q=1368)
Pid: 1487, comm: eatme-simple Not tainted 3.9.0-rc2+ #2
Call Trace:
 <IRQ>  [<ffffffff8105cac6>] ? rcu_check_callbacks+0x215/0x61a
 [<ffffffff8102cffc>] ? update_process_times+0x31/0x5c
 [<ffffffff8104bb5b>] ? tick_handle_periodic+0x18/0x52
 [<ffffffff81016328>] ? smp_apic_timer_interrupt+0x7d/0x8f
 [<ffffffff8131944a>] ? apic_timer_interrupt+0x6a/0x70
 <EOI>
INFO: rcu_preempt self-detected stall on CPUINFO: rcu_preempt detected stalls on CPUs/tasks:
	1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
	 (detected by 5, t=240007 jiffies, g=324, c=323, q=9386)
Task dump for CPU 1:
eatme-simple    R running task        0  1487   1433 0x00000000
 ffff88042ef47f60 ffffffff81316dfd ffff88042e5f5810 ffff88042ef47fd8
 0000000000010c00 ffff88042ef47fd8 ffff88042e5f5810 ffff88042e5f5810
 0000000000000000 ffff88043f4fe980 ffffffff810a56e4 0000000000000203
Call Trace:
 [<ffffffff81316dfd>] ? __schedule+0x646/0x75e
 [<ffffffff810a56e4>] ? dput+0x20/0x15c
 [<ffffffff81096070>] ? __fput+0x1a1/0x1c8
 [<ffffffff810ab833>] ? mntput_no_expire+0x13/0x11f
 [<ffffffff8101c55c>] ? do_page_fault+0x1f/0x3b

	1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
	 (t=240087 jiffies g=324 c=323 q=9386)
Pid: 1487, comm: eatme-simple Not tainted 3.9.0-rc2+ #2
Call Trace:
 <IRQ>  [<ffffffff8105cac6>] ? rcu_check_callbacks+0x215/0x61a
 [<ffffffff8102cffc>] ? update_process_times+0x31/0x5c
 [<ffffffff8104bb5b>] ? tick_handle_periodic+0x18/0x52
 [<ffffffff81016328>] ? smp_apic_timer_interrupt+0x7d/0x8f
 [<ffffffff8131944a>] ? apic_timer_interrupt+0x6a/0x70

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Threaded irqs + 100% CPU RT task = RCU stall
  2013-03-13 21:03 ` Paul Gortmaker
@ 2013-03-13 21:28   ` Paul E. McKenney
  2013-03-13 21:35     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2013-03-13 21:28 UTC (permalink / raw)
To: Paul Gortmaker; +Cc: Thomas Gleixner, linux-rt-users, linux-kernel

On Wed, Mar 13, 2013 at 05:03:07PM -0400, Paul Gortmaker wrote:
> [Re: Threaded irqs + 100% CPU RT task = RCU stall] On 06/03/2013 (Wed 13:45) Paul E. McKenney wrote:
>
> [...]
>
> > Is this behavior OK?  If so, the following (untested) patch might do
> > what you want.  ;-)
> >
> > [...]
>
> Here is the output of two consecutive stalls (triggered exactly as I'd
> described before) after applying the commit and enabling the new config
> option for RCU_CPU_STALL_INFO (btw, do we need this?  we already have
> the RCU_CPU_STALL_VERBOSE option, and the distinction isn't clear.)
>
> Looking at the output, it doesn't necessarily scream out "you are an
> idiot" in a way that Joe Average can immediately parse and understand,
> but I guess it does at least arm us with more information so that we
> can tell Joe Average that he is an idiot (assuming he posts more than
> just a single stall instance).

OK, will queue this patch for 3.10, then, with your Tested-by.

> Also note right after the <EOI> below, it looks like two stall
> messages got interleaved, or a carriage return got dropped
> (not suggesting that this patch caused that).

No idea...  Will recheck synchronization.  Oh, wait...  The stall
warnings for self-detected stalls are not synchronized.  This is a
tradeoff.  If I synchronize them, and there are multiple CPUs stalling
concurrently and self-detecting those stalls, then I randomly lose
stalls from some of the CPUs.  I could let the winner complain on
behalf of all currently stalled CPUs, but remote stack tracing is
inaccurate.

My thought is to leave it, unless someone has a cute idea for making it
all work nicely.

							Thanx, Paul

> Paul.
> --
>
> [...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Threaded irqs + 100% CPU RT task = RCU stall 2013-03-13 21:28 ` Paul E. McKenney @ 2013-03-13 21:35 ` Paul E. McKenney 0 siblings, 0 replies; 10+ messages in thread From: Paul E. McKenney @ 2013-03-13 21:35 UTC (permalink / raw) To: Paul Gortmaker; +Cc: Thomas Gleixner, linux-rt-users, linux-kernel On Wed, Mar 13, 2013 at 02:28:02PM -0700, Paul E. McKenney wrote: > On Wed, Mar 13, 2013 at 05:03:07PM -0400, Paul Gortmaker wrote: > > [Re: Threaded irqs + 100% CPU RT task = RCU stall] On 06/03/2013 (Wed 13:45) Paul E. McKenney wrote: > > > > [...] > > > > > > > > Is this behavior OK? If so, the following (untested) patch might do > > > what you want. ;-) > > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > rcu: Add softirq-stall indications to stall-warning messages > > > > [...] > > > > > > > > +The "softirq=" portion of the message tracks the number of RCU softirq > > > +handlers that the stalled CPU has executed. The number before the "/" > > > +is the number that had executed since boot at the time that this CPU > > > +last noted the beginning of a grace period, which might be the current > > > +(stalled) grace period, or it might be some earlier grace period (for > > > +example, if the CPU might have been in dyntick-idle mode for an extended > > > +time period. The number after the "/" is the number that have executed > > > +since boot until the current time. If this latter number stays constant > > > +across repeated stall-warning messages, it is possible that RCU's softirq > > > +handlers are no longer able to execute on this CPU. This can happen if > > > +the stalled CPU is spinning with interrupts are disabled, or, in -rt > > > +kernels, if a high-priority process is starving RCU's softirq handler. 
> >
> > Here is the output of two consecutive stalls (triggered exactly as I'd
> > described before) after applying the commit and enabling the new config
> > option for RCU_CPU_STALL_INFO (btw, do we need this?  we already have
> > the RCU_CPU_STALL_VERBOSE option, and the distinction isn't clear.)

I won't argue that the choice of names is particularly inspired.
RCU_CPU_STALL_VERBOSE prints per-task information for any tasks stalling
the current RCU grace period.  RCU_CPU_STALL_INFO prints additional
debug info for each CPU.  Perhaps I should merge them or rename them.
I will give it some thought.

							Thanx, Paul

> > Looking at the output, it doesn't necessarily scream out "you are an
> > idiot" in a way that Joe Average can immediately parse and understand,
> > but I guess it does at least arm us with more information so that we
> > can tell Joe Average that he is an idiot (assuming he posts more than
> > just a single stall instance).
>
> OK, will queue this patch for 3.10, then, with your Tested-by.
>
> > Also note right after the <EOI> below, it looks like two stall
> > messages got interleaved, or a carriage return got dropped...
> > (not suggesting that this patch caused that.)
>
> No idea...  Will recheck synchronization.  Oh, wait...  The stall warnings
> for self-detected stalls are not synchronized.  This is a tradeoff.
> If I synchronize them, and there are multiple CPUs stalling concurrently
> and self-detecting those stalls, then I randomly lose stalls from some
> of the CPUs.  I could let the winner complain on behalf of all currently
> stalled CPUs, but remote stack tracing is inaccurate.
>
> My thought is to leave it, unless someone has a cute idea for making it
> all work nicely.
>
> 							Thanx, Paul
>
> > Paul.
> > --
> >
> > INFO: rcu_preempt self-detected stall on CPU
> > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
> >    (detected by 5, t=60002 jiffies, g=324, c=323, q=1368)
> > Task dump for CPU 1:
> > eatme-simple    R  running task        0  1487   1433 0x00000000
> > ffff88042ef47f60 ffffffff81316de1 ffff88042e5f5810 ffff88042ef47fd8
> > 0000000000010c00 ffff88042ef47fd8 ffff88042f994210 ffff88042e5f5810
> > 0000000000000000 ffff88043f4fe980 ffffffff810a56e4 0000000000000203
> > Call Trace:
> > [<ffffffff81316de1>] ? __schedule+0x62a/0x75e
> > [<ffffffff810a56e4>] ? dput+0x20/0x15c
> > [<ffffffff81096070>] ? __fput+0x1a1/0x1c8
> > [<ffffffff810ab833>] ? mntput_no_expire+0x13/0x11f
> > [<ffffffff8101c55c>] ? do_page_fault+0x1f/0x3b
> >
> > 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
> >    (t=60082 jiffies g=324 c=323 q=1368)
> > Pid: 1487, comm: eatme-simple Not tainted 3.9.0-rc2+ #2
> > Call Trace:
> > <IRQ>  [<ffffffff8105cac6>] ? rcu_check_callbacks+0x215/0x61a
> > [<ffffffff8102cffc>] ? update_process_times+0x31/0x5c
> > [<ffffffff8104bb5b>] ? tick_handle_periodic+0x18/0x52
> > [<ffffffff81016328>] ? smp_apic_timer_interrupt+0x7d/0x8f
> > [<ffffffff8131944a>] ? apic_timer_interrupt+0x6a/0x70
> > <EOI>
> > INFO: rcu_preempt self-detected stall on CPUINFO: rcu_preempt detected stalls on CPUs/tasks:
> > 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
> >    (detected by 5, t=240007 jiffies, g=324, c=323, q=9386)
> > Task dump for CPU 1:
> > eatme-simple    R  running task        0  1487   1433 0x00000000
> > ffff88042ef47f60 ffffffff81316dfd ffff88042e5f5810 ffff88042ef47fd8
> > 0000000000010c00 ffff88042ef47fd8 ffff88042e5f5810 ffff88042e5f5810
> > 0000000000000000 ffff88043f4fe980 ffffffff810a56e4 0000000000000203
> > Call Trace:
> > [<ffffffff81316dfd>] ? __schedule+0x646/0x75e
> > [<ffffffff810a56e4>] ? dput+0x20/0x15c
> > [<ffffffff81096070>] ? __fput+0x1a1/0x1c8
> > [<ffffffff810ab833>] ? mntput_no_expire+0x13/0x11f
> > [<ffffffff8101c55c>] ? do_page_fault+0x1f/0x3b
> >
> > 1: (1 GPs behind) idle=f4f/140000000000001/0 softirq=2256/2257
> >    (t=240087 jiffies g=324 c=323 q=9386)
> > Pid: 1487, comm: eatme-simple Not tainted 3.9.0-rc2+ #2
> > Call Trace:
> > <IRQ>  [<ffffffff8105cac6>] ? rcu_check_callbacks+0x215/0x61a
> > [<ffffffff8102cffc>] ? update_process_times+0x31/0x5c
> > [<ffffffff8104bb5b>] ? tick_handle_periodic+0x18/0x52
> > [<ffffffff81016328>] ? smp_apic_timer_interrupt+0x7d/0x8f
> > [<ffffffff8131944a>] ? apic_timer_interrupt+0x6a/0x70
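For reference, the two stall-debug options discussed in this reply map onto kernel configuration symbols. A sketch of a .config fragment with the option names as they appear in this era of kernels (prompts and dependencies may differ between versions; the timeout value of 60 is inferred from the t=60002-jiffy detection in the dumps above, not stated in the thread):

```
# Sketch of a 3.9-era .config fragment for reproducing this diagnosis:
CONFIG_RCU_CPU_STALL_TIMEOUT=60    # seconds before a stall is reported
CONFIG_RCU_CPU_STALL_VERBOSE=y     # per-task info for tasks stalling the grace period
CONFIG_RCU_CPU_STALL_INFO=y        # extra per-CPU state ("idle=", "softirq=") in the warning
```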
end of thread, other threads: [~2013-03-13 21:35 UTC | newest]

Thread overview: 10+ messages
2013-03-06 15:49 Threaded irqs + 100% CPU RT task = RCU stall Paul Gortmaker
2013-03-06 15:58 ` Thomas Gleixner
2013-03-06 16:14   ` Paul Gortmaker
2013-03-06 17:16     ` Paul E. McKenney
2013-03-06 19:11       ` Thomas Gleixner
2013-03-06 21:45         ` Paul E. McKenney
2013-03-11 17:54           ` Sebastian Andrzej Siewior
2013-03-13 21:03             ` Paul Gortmaker
2013-03-13 21:28               ` Paul E. McKenney
2013-03-13 21:35                 ` Paul E. McKenney