All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, darren@dvhart.com, sbw@mit.edu
Subject: Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine
Date: Wed, 17 Jul 2013 20:39:21 -0700	[thread overview]
Message-ID: <20130718033921.GL4161@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130718013259.GA7398@somewhere>

On Thu, Jul 18, 2013 at 03:33:01AM +0200, Frederic Weisbecker wrote:
> On Wed, Jul 17, 2013 at 05:41:41PM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 18, 2013 at 01:31:21AM +0200, Frederic Weisbecker wrote:
> > > I'm missing a key here.
> > > 
> > > Let's imagine that the timekeeper has finally set full_sysidle_state = RCU_SYSIDLE_FULL_NOTED
> > > with cmpxchg, what guarantees that this CPU is not seeing a stale RCU_SYSIDLE_SHORT value
> > > for example?
> > 
> > Good question!  Let's see if I have a reasonable answer.  ;-)
> > 
> > I am going to start with the large-CPU case, so that the state is advanced
> > only by the grace-period kthread.
> > 
> > 1.	Case 1: the timekeeper CPU invoked rcu_sysidle_force_exit().
> > 	In this case, this is the same CPU that set full_sysidle_state
> > 	to RCU_SYSIDLE_FULL_NOTED, so it is guaranteed not to see a
> > 	stale value.
> > 
> > 2.	Case 2: Some CPU came out of idle, and invoked rcu_sysidle_exit().
> > 	In this case, if this CPU reads a RCU_SYSIDLE_SHORT from
> > 	full_sysidle_state, this read must have come before the
> > 	cmpxchg() (on some other CPU) that set it to RCU_SYSIDLE_LONG.
> > 	Because this CPU's read from full_sysidle_state was preceded by
> > 	an atomic_inc() that updated this CPU's ->dynticks_idle, that
> > 	update must also precede the cmpxchg() that set full_sysidle_state
> > 	to RCU_SYSIDLE_LONG.  Because the state advancing is done from
> > 	within a single thread, the subsequent scan is guaranteed to see
> > 	the first CPU's update of ->dynticks_idle, and will therefore
> > 	refrain from advancing full_sysidle_state to RCU_SYSIDLE_FULL.
> > 
> > 	This will in turn prevent the timekeeping thread from advancing
> > 	the state to RCU_SYSIDLE_FULL_NOTED, so this scenario cannot
> > 	happen.
> 
> Ok, so IIUC the safety is guaranteed in the following ordering:
> 
>     CPU 0                                               CPU 1
> 
>     idle = 1                                            smp_mb()
>     //for each cpu
>     if (atomic_read(rdp(1)->dyntick_idle) & 1)          atomic_inc(rdp->dyntick_idle)
>           idle = 0                                      smp_mb()
> 
>     if (idle)
>          cmpxchg(full_sysidle_state, SHORT, LONG)       while (full_sysidle_state > SHORT)
>                                                             //reset with cmpxchg
> 
> So it's like:
> 
>     CPU 0                                              CPU 1
> 
>     read I                                             write I
>     smp_mb()                                           smp_mb()
>     cmpxchg S                                          read S
> 
> I still can't find what guarantees we don't read a value in CPU 1 that is way below
> what we want.

One key point is that there is a second cycle from LONG to FULL.

(Not saying that there is not a bug -- there might well be.  In fact,
I am starting to think that I need to do another Promela model...)

> > Unfortunately, the reasoning in #2 above does not hold in the small-CPU
> > case because there is the possibility of both the timekeeping CPU and
> > the RCU grace-period kthread concurrently advancing the state machine.
> > This would be bad, good catch!!!
> 
> It's not like I spotted anything myself but you're welcome :)

I will take them any way I can get them.  ;-)

> > The patch below (untested) is an attempt to fix this.  If it actually
> > works, I will merge it in with 6/7.
> > 
> > Anything else I missed?  ;-)
> 
> Well I guess I'll wait one more night before trying to understand
> the below ;)

The key point is that the added check means that either the timekeeping
CPU is advancing the state machine (if there are few CPUs) or the
RCU grace-period kthread is (if there are many CPUs), but never both.
Or that is the intent, anyway!

							Thanx, Paul

>     Thanks!
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index aa3f525..fe83085 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -1364,7 +1364,7 @@ int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
> >  		}
> >  		force_qs_rnp(rsp, dyntick_save_progress_counter,
> >  			     &isidle, &maxj);
> > -		rcu_sysidle_report(rsp, isidle, maxj);
> > +		rcu_sysidle_report_gp(rsp, isidle, maxj);
> >  		fqs_state = RCU_FORCE_QS;
> >  	} else {
> >  		/* Handle dyntick-idle and offline CPUs. */
> > diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> > index 1602c21..657b415 100644
> > --- a/kernel/rcutree.h
> > +++ b/kernel/rcutree.h
> > @@ -559,8 +559,8 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
> >  				  unsigned long *maxj);
> >  static bool is_sysidle_rcu_state(struct rcu_state *rsp);
> >  static void rcu_bind_gp_kthread(void);
> > -static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
> > -			       unsigned long maxj);
> > +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> > +				  unsigned long maxj);
> >  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
> >  
> >  #endif /* #ifndef RCU_TREE_NONCORE */
> > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > index a4d44c3..f65d9c2 100644
> > --- a/kernel/rcutree_plugin.h
> > +++ b/kernel/rcutree_plugin.h
> > @@ -2655,14 +2655,22 @@ static void rcu_sysidle_cancel(void)
> >   * scan of the CPUs' dyntick-idle state.
> >   */
> >  static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
> > -			       unsigned long maxj)
> > +			       unsigned long maxj, bool gpkt)
> >  {
> >  	if (rsp != rcu_sysidle_state)
> >  		return;  /* Wrong flavor, ignore. */
> > -	if (isidle)
> > -		rcu_sysidle(maxj);    /* More idle! */
> > -	else
> > +	if (isidle) {
> > +		if (gpkt && nr_cpu_ids > RCU_SYSIDLE_SMALL)
> > +			rcu_sysidle(maxj);    /* More idle! */
> > +	} else {
> >  		rcu_sysidle_cancel(); /* Idle is over. */
> > +	}
> > +}
> > +
> > +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> > +				  unsigned long maxj)
> > +{
> > +	rcu_sysidle_report(rsp, isidle, maxj, true);
> >  }
> >  
> >  /* Callback and function for forcing an RCU grace period. */
> > @@ -2713,7 +2721,8 @@ bool rcu_sys_is_idle(void)
> >  				if (!isidle)
> >  					break;
> >  			}
> > -			rcu_sysidle_report(rcu_sysidle_state, isidle, maxj);
> > +			rcu_sysidle_report(rcu_sysidle_state,
> > +					   isidle, maxj, false);
> >  			oldrss = rss;
> >  			rss = ACCESS_ONCE(full_sysidle_state);
> >  		}
> > @@ -2776,8 +2785,8 @@ static void rcu_bind_gp_kthread(void)
> >  {
> >  }
> >  
> > -static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
> > -			       unsigned long maxj)
> > +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> > +				  unsigned long maxj)
> >  {
> >  }
> >  
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 


  reply	other threads:[~2013-07-18  3:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-09  1:29 [PATCH RFC nohz_full 0/7] v3 Provide infrastructure for full-system idle Paul E. McKenney
2013-07-09  1:30 ` [PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 2/7] nohz_full: Add rcu_dyntick data " Paul E. McKenney
2013-07-09  9:37     ` Peter Zijlstra
2013-07-09 13:23       ` Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 3/7] nohz_full: Add per-CPU idle-state tracking Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 4/7] nohz_full: Add full-system idle states and variables Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 5/7] nohz_full: Add full-system-idle arguments to API Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-17 23:31     ` Frederic Weisbecker
2013-07-18  0:41       ` Paul E. McKenney
2013-07-18  1:33         ` Frederic Weisbecker
2013-07-18  3:39           ` Paul E. McKenney [this message]
2013-07-18 14:24             ` Frederic Weisbecker
2013-07-18 16:47               ` Paul E. McKenney
2013-07-18 22:46                 ` Frederic Weisbecker
2013-07-19  0:24                   ` Paul E. McKenney
2013-07-19  2:12                     ` Frederic Weisbecker
2013-07-19  5:06                       ` Paul E. McKenney
2013-07-24 18:09                         ` Frederic Weisbecker
2013-07-24 22:09                           ` Paul E. McKenney
2013-07-24 23:26                             ` Frederic Weisbecker
2013-07-26 22:52                               ` Paul E. McKenney
2013-07-27 18:13                                 ` Frederic Weisbecker
2013-07-09  1:30   ` [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2013-07-26 23:18 [PATCH RFC nohz_full 0/7] v4 Provide infrastructure for full-system idle Paul E. McKenney
2013-07-26 23:19 ` [PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-07-26 23:19   ` [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-29  8:19     ` Lai Jiangshan
2013-07-29 17:43       ` Paul E. McKenney
2013-08-09 16:20     ` Frederic Weisbecker
2013-08-14  3:07       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130718033921.GL4161@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=darren@dvhart.com \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=niv@us.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sbw@mit.edu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.