From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jochen Striepe <jochen@tolot.escape.de>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: 3.10.5: rcu_sched detected stalls on CPUs/tasks
Date: Thu, 5 Dec 2013 16:26:14 -0800
Message-ID: <20131206002614.GD15492@linux.vnet.ibm.com>
In-Reply-To: <20130923164931.GC2322@pompeji.miese-zwerge.org>

On Mon, Sep 23, 2013 at 06:49:31PM +0200, Jochen Striepe wrote:
> Hello again,
>
> On Sat, Sep 14, 2013 at 01:28:34PM +0200, Jochen Striepe wrote:
> > On Mon, Sep 09, 2013 at 03:27:51PM -0700, Paul E. McKenney wrote:
> > > rcu: Reject memory-order-induced stall-warning false positives
> >
> > I have been running this patch on top of 3.10.11 vanilla since
> > Wednesday, so far without any further stalls, on light to heavy
> > loads. Works smooth as pie.
>
> Hmm, perhaps it is not as easy as I thought. On exactly this machine
> with exactly this kernel (3.10.11 vanilla with your patch from this
> thread), some minutes ago another one came up. The system should have
> been mostly idle at that moment. Dmesg appended ... do you need
> anything else to make an educated guess? I waited 10 minutes after
> the stall message (following your earlier advice), but no further
> dmesg lines appeared after that.

Hmmm... Does the following patch help?

Thanx, Paul
------------------------------------------------------------------------

rcu: Kick CPU halfway to RCU CPU stall warning

When an RCU CPU stall warning occurs, the CPU invokes resched_cpu() on
itself. This can help move the grace period forward in some situations,
but it would be even better to do this -before- the RCU CPU stall warning.
This commit therefore causes resched_cpu() to be called at most once every
five jiffies after the grace period is halfway to an RCU CPU stall warning.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index dd081987a8ec..5243ebea0fc1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -755,6 +755,12 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp,
 }
 
 /*
+ * This function really isn't for public consumption, but RCU is special in
+ * that context switches can allow the state machine to make progress.
+ */
+extern void resched_cpu(int cpu);
+
+/*
  * Return true if the specified CPU has passed through a quiescent
  * state by virtue of being in or having passed through an dynticks
  * idle state since the last call to dyntick_save_progress_counter()
@@ -812,16 +818,34 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
 	 */
 	rcu_kick_nohz_cpu(rdp->cpu);
 
+	/*
+	 * Alternatively, the CPU might be running in the kernel
+	 * for an extended period of time without a quiescent state.
+	 * Attempt to force the CPU through the scheduler to gain the
+	 * needed quiescent state, but only if the grace period has gone
+	 * on for an uncommonly long time. If there are many stuck CPUs,
+	 * we will beat on the first one until it gets unstuck, then move
+	 * to the next. Only do this for the primary flavor of RCU.
+	 */
+	if (rdp->rsp == rcu_state &&
+	    ULONG_CMP_GE(ACCESS_ONCE(jiffies), rdp->rsp->jiffies_resched)) {
+		rdp->rsp->jiffies_resched += 5;
+		resched_cpu(rdp->cpu);
+	}
+
 	return 0;
 }
 
 static void record_gp_stall_check_time(struct rcu_state *rsp)
 {
 	unsigned long j = ACCESS_ONCE(jiffies);
+	unsigned long j1;
 
 	rsp->gp_start = j;
 	smp_wmb(); /* Record start time before stall time. */
-	rsp->jiffies_stall = j + rcu_jiffies_till_stall_check();
+	j1 = rcu_jiffies_till_stall_check();
+	rsp->jiffies_stall = j + j1;
+	rsp->jiffies_resched = j + j1 / 2;
 }
 
 /*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 52be957c9fe2..8e34d8674a4e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -453,6 +453,8 @@ struct rcu_state {
 						/* but in jiffies. */
 	unsigned long jiffies_stall;		/* Time at which to check */
 						/* for CPU stalls. */
+	unsigned long jiffies_resched;		/* Time at which to resched */
+						/* a reluctant CPU. */
 	unsigned long gp_max;			/* Maximum GP duration in */
 						/* jiffies. */
 	const char *name;			/* Name of structure. */