All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, darren@dvhart.com, sbw@mit.edu
Subject: Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine
Date: Wed, 17 Jul 2013 17:41:41 -0700	[thread overview]
Message-ID: <20130718004141.GI4161@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130717233119.GA2801@somewhere>

On Thu, Jul 18, 2013 at 01:31:21AM +0200, Frederic Weisbecker wrote:
> On Mon, Jul 08, 2013 at 06:30:05PM -0700, Paul E. McKenney wrote:
> >  }
> >  
> >  /*
> > + * Unconditionally force exit from full system-idle state.  This is
> > + * invoked when a normal CPU exits idle, but must be called separately
> > + * for the timekeeping CPU (tick_do_timer_cpu).  The reason for this
> > + * is that the timekeeping CPU is permitted to take scheduling-clock
> > + * interrupts while the system is in system-idle state, and of course
> > + * rcu_sysidle_exit() has no way of distinguishing a scheduling-clock
> > + * interrupt from any other type of interrupt.
> > + */
> > +void rcu_sysidle_force_exit(void)
> > +{
> > +	int oldstate = ACCESS_ONCE(full_sysidle_state);
> > +	int newoldstate;
> > +
> > +	/*
> > +	 * Each pass through the following loop attempts to exit full
> > +	 * system-idle state.  If contention proves to be a problem,
> > +	 * a trylock-based contention tree could be used here.
> > +	 */
> > +	while (oldstate > RCU_SYSIDLE_SHORT) {
> 
> I'm missing a key here.
> 
> Let's imagine that the timekeeper has finally set full_sysidle_state = RCU_SYSIDLE_FULL_NOTED
> with cmpxchg, what guarantees that this CPU is not seeing a stale RCU_SYSIDLE_SHORT value
> for example?

Good question!  Let's see if I have a reasonable answer.  ;-)

I am going to start with the large-CPU case, so that the state is advanced
only by the grace-period kthread.

1.	Case 1: the timekeeper CPU invoked rcu_sysidle_force_exit().
	In this case, this is the same CPU that set full_sysidle_state
	to RCU_SYSIDLE_FULL_NOTED, so it is guaranteed not to see a
	stale value.

2.	Case 2: Some CPU came out of idle, and invoked rcu_sysidle_exit().
	In this case, if this CPU reads a RCU_SYSIDLE_SHORT from
	full_sysidle_state, this read must have come before the
	cmpxchg() (on some other CPU) that set it to RCU_SYSIDLE_LONG.
	Because this CPU's read from full_sysidle_state was preceded by
	an atomic_inc() that updated this CPU's ->dynticks_idle, that
	update must also precede the cmpxchg() that set full_sysidle_state
	to RCU_SYSIDLE_LONG.  Because the state advancing is done from
	within a single thread, the subsequent scan is guaranteed to see
	the first CPU's update of ->dynticks_idle, and will therefore
	refrain from advancing full_sysidle_state to RCU_SYSIDLE_FULL.

	This will in turn prevent the timekeeping thread from advancing
	the state to RCU_SYSIDLE_FULL_NOTED, so this scenario cannot
	happen.

Unfortunately, the reasoning in #2 above does not hold in the small-CPU
case because there is the possibility of both the timekeeping CPU and
the RCU grace-period kthread concurrently advancing the state machine.
This would be bad, good catch!!!

The patch below (untested) is an attempt to fix this.  If it actually
works, I will merge it in with 6/7.

Anything else I missed?  ;-)

							Thanx, Paul

------------------------------------------------------------------------

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index aa3f525..fe83085 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1364,7 +1364,7 @@ int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
 		}
 		force_qs_rnp(rsp, dyntick_save_progress_counter,
 			     &isidle, &maxj);
-		rcu_sysidle_report(rsp, isidle, maxj);
+		rcu_sysidle_report_gp(rsp, isidle, maxj);
 		fqs_state = RCU_FORCE_QS;
 	} else {
 		/* Handle dyntick-idle and offline CPUs. */
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 1602c21..657b415 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -559,8 +559,8 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
 				  unsigned long *maxj);
 static bool is_sysidle_rcu_state(struct rcu_state *rsp);
 static void rcu_bind_gp_kthread(void);
-static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
-			       unsigned long maxj);
+static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
+				  unsigned long maxj);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index a4d44c3..f65d9c2 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2655,14 +2655,22 @@ static void rcu_sysidle_cancel(void)
  * scan of the CPUs' dyntick-idle state.
  */
 static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
-			       unsigned long maxj)
+			       unsigned long maxj, bool gpkt)
 {
 	if (rsp != rcu_sysidle_state)
 		return;  /* Wrong flavor, ignore. */
-	if (isidle)
-		rcu_sysidle(maxj);    /* More idle! */
-	else
+	if (isidle) {
+		if (gpkt && nr_cpu_ids > RCU_SYSIDLE_SMALL)
+			rcu_sysidle(maxj);    /* More idle! */
+	} else {
 		rcu_sysidle_cancel(); /* Idle is over. */
+	}
+}
+
+static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
+				  unsigned long maxj)
+{
+	rcu_sysidle_report(rsp, isidle, maxj, true);
 }
 
 /* Callback and function for forcing an RCU grace period. */
@@ -2713,7 +2721,8 @@ bool rcu_sys_is_idle(void)
 				if (!isidle)
 					break;
 			}
-			rcu_sysidle_report(rcu_sysidle_state, isidle, maxj);
+			rcu_sysidle_report(rcu_sysidle_state,
+					   isidle, maxj, false);
 			oldrss = rss;
 			rss = ACCESS_ONCE(full_sysidle_state);
 		}
@@ -2776,8 +2785,8 @@ static void rcu_bind_gp_kthread(void)
 {
 }
 
-static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
-			       unsigned long maxj)
+static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
+				  unsigned long maxj)
 {
 }
 


  reply	other threads:[~2013-07-18  0:41 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-09  1:29 [PATCH RFC nohz_full 0/7] v3 Provide infrastructure for full-system idle Paul E. McKenney
2013-07-09  1:30 ` [PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 2/7] nohz_full: Add rcu_dyntick data " Paul E. McKenney
2013-07-09  9:37     ` Peter Zijlstra
2013-07-09 13:23       ` Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 3/7] nohz_full: Add per-CPU idle-state tracking Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 4/7] nohz_full: Add full-system idle states and variables Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 5/7] nohz_full: Add full-system-idle arguments to API Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-17 23:31     ` Frederic Weisbecker
2013-07-18  0:41       ` Paul E. McKenney [this message]
2013-07-18  1:33         ` Frederic Weisbecker
2013-07-18  3:39           ` Paul E. McKenney
2013-07-18 14:24             ` Frederic Weisbecker
2013-07-18 16:47               ` Paul E. McKenney
2013-07-18 22:46                 ` Frederic Weisbecker
2013-07-19  0:24                   ` Paul E. McKenney
2013-07-19  2:12                     ` Frederic Weisbecker
2013-07-19  5:06                       ` Paul E. McKenney
2013-07-24 18:09                         ` Frederic Weisbecker
2013-07-24 22:09                           ` Paul E. McKenney
2013-07-24 23:26                             ` Frederic Weisbecker
2013-07-26 22:52                               ` Paul E. McKenney
2013-07-27 18:13                                 ` Frederic Weisbecker
2013-07-09  1:30   ` [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2013-07-26 23:18 [PATCH RFC nohz_full 0/7] v4 Provide infrastructure for full-system idle Paul E. McKenney
2013-07-26 23:19 ` [PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-07-26 23:19   ` [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-29  8:19     ` Lai Jiangshan
2013-07-29 17:43       ` Paul E. McKenney
2013-08-09 16:20     ` Frederic Weisbecker
2013-08-14  3:07       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130718004141.GI4161@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=darren@dvhart.com \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=niv@us.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sbw@mit.edu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.