From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
laijs@cn.fujitsu.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
edumazet@google.com, darren@dvhart.com, sbw@mit.edu
Subject: Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine
Date: Wed, 17 Jul 2013 17:41:41 -0700 [thread overview]
Message-ID: <20130718004141.GI4161@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130717233119.GA2801@somewhere>
On Thu, Jul 18, 2013 at 01:31:21AM +0200, Frederic Weisbecker wrote:
> On Mon, Jul 08, 2013 at 06:30:05PM -0700, Paul E. McKenney wrote:
> > }
> >
> > /*
> > + * Unconditionally force exit from full system-idle state. This is
> > + * invoked when a normal CPU exits idle, but must be called separately
> > + * for the timekeeping CPU (tick_do_timer_cpu). The reason for this
> > + * is that the timekeeping CPU is permitted to take scheduling-clock
> > + * interrupts while the system is in system-idle state, and of course
> > + * rcu_sysidle_exit() has no way of distinguishing a scheduling-clock
> > + * interrupt from any other type of interrupt.
> > + */
> > +void rcu_sysidle_force_exit(void)
> > +{
> > + int oldstate = ACCESS_ONCE(full_sysidle_state);
> > + int newoldstate;
> > +
> > + /*
> > + * Each pass through the following loop attempts to exit full
> > + * system-idle state. If contention proves to be a problem,
> > + * a trylock-based contention tree could be used here.
> > + */
> > + while (oldstate > RCU_SYSIDLE_SHORT) {
>
> I'm missing a key here.
>
> Let's imagine that the timekeeper has finally set full_sysidle_state = RCU_SYSIDLE_FULL_NOTED
> with cmpxchg, what guarantees that this CPU is not seeing a stale RCU_SYSIDLE_SHORT value
> for example?
Good question! Let's see if I have a reasonable answer. ;-)
I am going to start with the large-CPU case, so that the state is advanced
only by the grace-period kthread.
1. Case 1: the timekeeper CPU invoked rcu_sysidle_force_exit().
In this case, this is the same CPU that set full_sysidle_state
to RCU_SYSIDLE_FULL_NOTED, so it is guaranteed not to see a
stale value.
2. Case 2: Some CPU came out of idle, and invoked rcu_sysidle_exit().
In this case, if this CPU reads a RCU_SYSIDLE_SHORT from
full_sysidle_state, this read must have come before the
cmpxchg() (on some other CPU) that set it to RCU_SYSIDLE_LONG.
Because this CPU's read from full_sysidle_state was preceded by
an atomic_inc() that updated this CPU's ->dynticks_idle, that
update must also precede the cmpxchg() that set full_sysidle_state
to RCU_SYSIDLE_LONG. Because the state advancing is done from
within a single thread, the subsequent scan is guaranteed to see
the first CPU's update of ->dynticks_idle, and will therefore
refrain from advancing full_sysidle_state to RCU_SYSIDLE_FULL.
This will in turn prevent the timekeeping thread from advancing
the state to RCU_SYSIDLE_FULL_NOTED, so this scenario cannot
happen.
Unfortunately, the reasoning in #2 above does not hold in the small-CPU
case because there is the possibility of both the timekeeping CPU and
the RCU grace-period kthread concurrently advancing the state machine.
This would be bad, good catch!!!
The patch below (untested) is an attempt to fix this. If it actually
works, I will merge it in with 6/7.
Anything else I missed? ;-)
Thanx, Paul
------------------------------------------------------------------------
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index aa3f525..fe83085 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1364,7 +1364,7 @@ int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
}
force_qs_rnp(rsp, dyntick_save_progress_counter,
&isidle, &maxj);
- rcu_sysidle_report(rsp, isidle, maxj);
+ rcu_sysidle_report_gp(rsp, isidle, maxj);
fqs_state = RCU_FORCE_QS;
} else {
/* Handle dyntick-idle and offline CPUs. */
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 1602c21..657b415 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -559,8 +559,8 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
unsigned long *maxj);
static bool is_sysidle_rcu_state(struct rcu_state *rsp);
static void rcu_bind_gp_kthread(void);
-static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
- unsigned long maxj);
+static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
+ unsigned long maxj);
static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
#endif /* #ifndef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index a4d44c3..f65d9c2 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2655,14 +2655,22 @@ static void rcu_sysidle_cancel(void)
* scan of the CPUs' dyntick-idle state.
*/
static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
- unsigned long maxj)
+ unsigned long maxj, bool gpkt)
{
if (rsp != rcu_sysidle_state)
return; /* Wrong flavor, ignore. */
- if (isidle)
- rcu_sysidle(maxj); /* More idle! */
- else
+ if (isidle) {
+ if (gpkt && nr_cpu_ids > RCU_SYSIDLE_SMALL)
+ rcu_sysidle(maxj); /* More idle! */
+ } else {
rcu_sysidle_cancel(); /* Idle is over. */
+ }
+}
+
+static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
+ unsigned long maxj)
+{
+ rcu_sysidle_report(rsp, isidle, maxj, true);
}
/* Callback and function for forcing an RCU grace period. */
@@ -2713,7 +2721,8 @@ bool rcu_sys_is_idle(void)
if (!isidle)
break;
}
- rcu_sysidle_report(rcu_sysidle_state, isidle, maxj);
+ rcu_sysidle_report(rcu_sysidle_state,
+ isidle, maxj, false);
oldrss = rss;
rss = ACCESS_ONCE(full_sysidle_state);
}
@@ -2776,8 +2785,8 @@ static void rcu_bind_gp_kthread(void)
{
}
-static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
- unsigned long maxj)
+static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
+ unsigned long maxj)
{
}
next prev parent reply other threads:[~2013-07-18 0:41 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-09 1:29 [PATCH RFC nohz_full 0/7] v3 Provide infrastructure for full-system idle Paul E. McKenney
2013-07-09 1:30 ` [PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-07-09 1:30 ` [PATCH RFC nohz_full 2/7] nohz_full: Add rcu_dyntick data " Paul E. McKenney
2013-07-09 9:37 ` Peter Zijlstra
2013-07-09 13:23 ` Paul E. McKenney
2013-07-09 1:30 ` [PATCH RFC nohz_full 3/7] nohz_full: Add per-CPU idle-state tracking Paul E. McKenney
2013-07-09 1:30 ` [PATCH RFC nohz_full 4/7] nohz_full: Add full-system idle states and variables Paul E. McKenney
2013-07-09 1:30 ` [PATCH RFC nohz_full 5/7] nohz_full: Add full-system-idle arguments to API Paul E. McKenney
2013-07-09 1:30 ` [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-17 23:31 ` Frederic Weisbecker
2013-07-18 0:41 ` Paul E. McKenney [this message]
2013-07-18 1:33 ` Frederic Weisbecker
2013-07-18 3:39 ` Paul E. McKenney
2013-07-18 14:24 ` Frederic Weisbecker
2013-07-18 16:47 ` Paul E. McKenney
2013-07-18 22:46 ` Frederic Weisbecker
2013-07-19 0:24 ` Paul E. McKenney
2013-07-19 2:12 ` Frederic Weisbecker
2013-07-19 5:06 ` Paul E. McKenney
2013-07-24 18:09 ` Frederic Weisbecker
2013-07-24 22:09 ` Paul E. McKenney
2013-07-24 23:26 ` Frederic Weisbecker
2013-07-26 22:52 ` Paul E. McKenney
2013-07-27 18:13 ` Frederic Weisbecker
2013-07-09 1:30 ` [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU Paul E. McKenney
-- strict thread matches above, loose matches on Subject: below --
2013-07-26 23:18 [PATCH RFC nohz_full 0/7] v4 Provide infrastructure for full-system idle Paul E. McKenney
2013-07-26 23:19 ` [PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-07-26 23:19 ` [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-29 8:19 ` Lai Jiangshan
2013-07-29 17:43 ` Paul E. McKenney
2013-08-09 16:20 ` Frederic Weisbecker
2013-08-14 3:07 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130718004141.GI4161@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=darren@dvhart.com \
--cc=dhowells@redhat.com \
--cc=dipankar@in.ibm.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=josh@joshtriplett.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mingo@elte.hu \
--cc=niv@us.ibm.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sbw@mit.edu \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).