From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jochen Striepe <jochen@tolot.escape.de>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: 3.10.5: rcu_sched detected stalls on CPUs/tasks
Date: Thu, 5 Dec 2013 16:26:14 -0800
Message-ID: <20131206002614.GD15492@linux.vnet.ibm.com>
In-Reply-To: <20130923164931.GC2322@pompeji.miese-zwerge.org>
On Mon, Sep 23, 2013 at 06:49:31PM +0200, Jochen Striepe wrote:
> Hello again,
>
> On Sat, Sep 14, 2013 at 01:28:34PM +0200, Jochen Striepe wrote:
> > On Mon, Sep 09, 2013 at 03:27:51PM -0700, Paul E. McKenney wrote:
> > > rcu: Reject memory-order-induced stall-warning false positives
> >
> > I run this patch on top of 3.10.11 vanilla since Wednesday, so far
> > without any further stalls, on light to heavy loads. Works smooth
> > as pie.
>
> Hmm, perhaps it is not as easy as I thought. On exactly this machine
> with exactly this kernel (3.10.11 vanilla with your patch from this
> thread), some minutes ago another one came up. The system should have
> been mostly idle at that moment. Dmesg appended ... do you need
> anything else to have an educated guess? I waited 10 minutes after
> the stall message (following your earlier advice), but no further
> dmesg lines appeared after that.
Hmmm... Does the following patch help?
Thanx, Paul
------------------------------------------------------------------------
rcu: Kick CPU halfway to RCU CPU stall warning
When an RCU CPU stall warning occurs, the CPU invokes resched_cpu() on
itself. This can help move the grace period forward in some situations,
but it would be even better to do this -before- the RCU CPU stall warning.
This commit therefore causes resched_cpu() to be called every five jiffies
once the system is halfway to an RCU CPU stall warning.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index dd081987a8ec..5243ebea0fc1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -755,6 +755,12 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp,
}
/*
+ * This function really isn't for public consumption, but RCU is special in
+ * that context switches can allow the state machine to make progress.
+ */
+extern void resched_cpu(int cpu);
+
+/*
* Return true if the specified CPU has passed through a quiescent
* state by virtue of being in or having passed through an dynticks
* idle state since the last call to dyntick_save_progress_counter()
@@ -812,16 +818,34 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
*/
rcu_kick_nohz_cpu(rdp->cpu);
+ /*
+ * Alternatively, the CPU might be running in the kernel
+ * for an extended period of time without a quiescent state.
+ * Attempt to force the CPU through the scheduler to gain the
+ * needed quiescent state, but only if the grace period has gone
+ * on for an uncommonly long time. If there are many stuck CPUs,
+ * we will beat on the first one until it gets unstuck, then move
+ * to the next. Only do this for the primary flavor of RCU.
+ */
+ if (rdp->rsp == rcu_state &&
+ ULONG_CMP_GE(ACCESS_ONCE(jiffies), rdp->rsp->jiffies_resched)) {
+ rdp->rsp->jiffies_resched += 5;
+ resched_cpu(rdp->cpu);
+ }
+
return 0;
}
static void record_gp_stall_check_time(struct rcu_state *rsp)
{
unsigned long j = ACCESS_ONCE(jiffies);
+ unsigned long j1;
rsp->gp_start = j;
smp_wmb(); /* Record start time before stall time. */
- rsp->jiffies_stall = j + rcu_jiffies_till_stall_check();
+ j1 = rcu_jiffies_till_stall_check();
+ rsp->jiffies_stall = j + j1;
+ rsp->jiffies_resched = j + j1 / 2;
}
/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 52be957c9fe2..8e34d8674a4e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -453,6 +453,8 @@ struct rcu_state {
/* but in jiffies. */
unsigned long jiffies_stall; /* Time at which to check */
/* for CPU stalls. */
+ unsigned long jiffies_resched; /* Time at which to resched */
+ /* a reluctant CPU. */
unsigned long gp_max; /* Maximum GP duration in */
/* jiffies. */
const char *name; /* Name of structure. */
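
For readers who want to see the timing in isolation, here is a minimal
userspace sketch of the deadline arithmetic in the patch above. It is not
kernel code. The names struct gp_times, record_times(), maybe_kick(),
count_kicks() and demo_checks() are hypothetical, and the sketch assumes
the check runs once per jiffy, whereas the real check runs only when RCU
scans for quiescent states. ULONG_CMP_GE() is written the way the kernel
defines it, so the comparison stays correct when the jiffies counter wraps.

```c
#include <assert.h>
#include <limits.h>

/* Wraparound-safe "a >= b" for free-running unsigned long counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical stand-in for the three rcu_state fields the patch touches. */
struct gp_times {
	unsigned long gp_start;        /* When the grace period started. */
	unsigned long jiffies_stall;   /* When to emit the stall warning. */
	unsigned long jiffies_resched; /* When to next kick a reluctant CPU. */
};

/* Mirrors record_gp_stall_check_time(): first kick at half the timeout. */
static void record_times(struct gp_times *t, unsigned long now,
			 unsigned long till_stall)
{
	t->gp_start = now;
	t->jiffies_stall = now + till_stall;
	t->jiffies_resched = now + till_stall / 2;
}

/* Returns 1 if a kick is due at "now" and pushes the next kick out by 5. */
static int maybe_kick(struct gp_times *t, unsigned long now)
{
	if (!ULONG_CMP_GE(now, t->jiffies_resched))
		return 0;
	t->jiffies_resched += 5;
	return 1; /* The kernel would call resched_cpu() here. */
}

/* Counts kicks between grace-period start and the stall deadline,
 * ASSUMING one check per jiffy (a simplification of this sketch). */
int count_kicks(unsigned long start, unsigned long till_stall)
{
	struct gp_times t;
	unsigned long i;
	int kicks = 0;

	record_times(&t, start, till_stall);
	for (i = 0; i < till_stall; i++)
		kicks += maybe_kick(&t, start + i);
	return kicks;
}

/* Usage: a 100-jiffy timeout gives kicks at offsets 50, 55, ..., 95. */
void demo_checks(void)
{
	assert(count_kicks(1000, 100) == 10);
	/* Same result when the counter wraps during the grace period. */
	assert(count_kicks(ULONG_MAX - 20, 100) == 10);
}
```

Because jiffies_resched advances by a fixed 5 rather than being reset to
the current time plus 5, a check that runs less often than every five
jiffies would find the deadline already passed and kick on each pass
until it catches up. That follows from the arithmetic in this sketch and
is not a claim about measured kernel behavior.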