linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, darren@dvhart.com, sbw@mit.edu
Subject: Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine
Date: Thu, 18 Jul 2013 03:33:01 +0200	[thread overview]
Message-ID: <20130718013259.GA7398@somewhere> (raw)
In-Reply-To: <20130718004141.GI4161@linux.vnet.ibm.com>

On Wed, Jul 17, 2013 at 05:41:41PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 18, 2013 at 01:31:21AM +0200, Frederic Weisbecker wrote:
> > I'm missing a key here.
> > 
> > Let's imagine that the timekeeper has finally set full_sysidle_state = RCU_SYSIDLE_FULL_NOTED
> > with cmpxchg, what guarantees that this CPU is not seeing a stale RCU_SYSIDLE_SHORT value
> > for example?
> 
> Good question!  Let's see if I have a reasonable answer.  ;-)
> 
> I am going to start with the large-CPU case, so that the state is advanced
> only by the grace-period kthread.
> 
> 1.	Case 1: the timekeeper CPU invoked rcu_sysidle_force_exit().
> 	In this case, this is the same CPU that set full_sysidle_state
> 	to RCU_SYSIDLE_FULL_NOTED, so it is guaranteed not to see a
> 	stale value.
> 
> 2.	Case 2: Some CPU came out of idle, and invoked rcu_sysidle_exit().
> 	In this case, if this CPU reads a RCU_SYSIDLE_SHORT from
> 	full_sysidle_state, this read must have come before the
> 	cmpxchg() (on some other CPU) that set it to RCU_SYSIDLE_LONG.
> 	Because this CPU's read from full_sysidle_state was preceded by
> 	an atomic_inc() that updated this CPU's ->dynticks_idle, that
> 	update must also precede the cmpxchg() that set full_sysidle_state
> 	to RCU_SYSIDLE_LONG.  Because the state advancing is done from
> 	within a single thread, the subsequent scan is guaranteed to see
> 	the first CPU's update of ->dynticks_idle, and will therefore
> 	refrain from advancing full_sysidle_state to RCU_SYSIDLE_FULL.
> 
> 	This will in turn prevent the timekeeping thread from advancing
> 	the state to RCU_SYSIDLE_FULL_NOTED, so this scenario cannot
> 	happen.

Ok, so IIUC the safety is guaranteed in the following ordering:

    CPU 0                                               CPU 1

    idle = 1                                            smp_mb()
    //for each cpu
    if (atomic_read(rdp(1)->dyntick_idle) & 1)          atomic_inc(rdp->dyntick_idle)
          idle = 0                                      smp_mb()

    if (idle)
         cmpxchg(full_sysidle_state, SHORT, LONG)       while (full_sysidle_state > SHORT)
                                                            //reset with cmpxchg

So it's like:

    CPU 0                                              CPU 1

    read I                                             write I
    smp_mb()                                           smp_mb()
    cmpxchg S                                          read S

I still can't find what guarantees we don't read a value in CPU 1 that is way below
what we want.

> 
> Unfortunately, the reasoning in #2 above does not hold in the small-CPU
> case because there is the possibility of both the timekeeping CPU and
> the RCU grace-period kthread concurrently advancing the state machine.
> This would be bad, good catch!!!

It's not like I spotted anything myself but you're welcome :)

> 
> The patch below (untested) is an attempt to fix this.  If it actually
> works, I will merge it in with 6/7.
> 
> Anything else I missed?  ;-)

Well I guess I'll wait one more night before trying to understand
the below ;)

    Thanks!

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index aa3f525..fe83085 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1364,7 +1364,7 @@ int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
>  		}
>  		force_qs_rnp(rsp, dyntick_save_progress_counter,
>  			     &isidle, &maxj);
> -		rcu_sysidle_report(rsp, isidle, maxj);
> +		rcu_sysidle_report_gp(rsp, isidle, maxj);
>  		fqs_state = RCU_FORCE_QS;
>  	} else {
>  		/* Handle dyntick-idle and offline CPUs. */
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index 1602c21..657b415 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -559,8 +559,8 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
>  				  unsigned long *maxj);
>  static bool is_sysidle_rcu_state(struct rcu_state *rsp);
>  static void rcu_bind_gp_kthread(void);
> -static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
> -			       unsigned long maxj);
> +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> +				  unsigned long maxj);
>  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
>  
>  #endif /* #ifndef RCU_TREE_NONCORE */
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index a4d44c3..f65d9c2 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -2655,14 +2655,22 @@ static void rcu_sysidle_cancel(void)
>   * scan of the CPUs' dyntick-idle state.
>   */
>  static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
> -			       unsigned long maxj)
> +			       unsigned long maxj, bool gpkt)
>  {
>  	if (rsp != rcu_sysidle_state)
>  		return;  /* Wrong flavor, ignore. */
> -	if (isidle)
> -		rcu_sysidle(maxj);    /* More idle! */
> -	else
> +	if (isidle) {
> +		if (gpkt && nr_cpu_ids > RCU_SYSIDLE_SMALL)
> +			rcu_sysidle(maxj);    /* More idle! */
> +	} else {
>  		rcu_sysidle_cancel(); /* Idle is over. */
> +	}
> +}
> +
> +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> +				  unsigned long maxj)
> +{
> +	rcu_sysidle_report(rsp, isidle, maxj, true);
>  }
>  
>  /* Callback and function for forcing an RCU grace period. */
> @@ -2713,7 +2721,8 @@ bool rcu_sys_is_idle(void)
>  				if (!isidle)
>  					break;
>  			}
> -			rcu_sysidle_report(rcu_sysidle_state, isidle, maxj);
> +			rcu_sysidle_report(rcu_sysidle_state,
> +					   isidle, maxj, false);
>  			oldrss = rss;
>  			rss = ACCESS_ONCE(full_sysidle_state);
>  		}
> @@ -2776,8 +2785,8 @@ static void rcu_bind_gp_kthread(void)
>  {
>  }
>  
> -static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
> -			       unsigned long maxj)
> +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> +				  unsigned long maxj)
>  {
>  }
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

  reply	other threads:[~2013-07-18  1:33 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-09  1:29 [PATCH RFC nohz_full 0/7] v3 Provide infrastructure for full-system idle Paul E. McKenney
2013-07-09  1:30 ` [PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 2/7] nohz_full: Add rcu_dyntick data " Paul E. McKenney
2013-07-09  9:37     ` Peter Zijlstra
2013-07-09 13:23       ` Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 3/7] nohz_full: Add per-CPU idle-state tracking Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 4/7] nohz_full: Add full-system idle states and variables Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 5/7] nohz_full: Add full-system-idle arguments to API Paul E. McKenney
2013-07-09  1:30   ` [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-17 23:31     ` Frederic Weisbecker
2013-07-18  0:41       ` Paul E. McKenney
2013-07-18  1:33         ` Frederic Weisbecker [this message]
2013-07-18  3:39           ` Paul E. McKenney
2013-07-18 14:24             ` Frederic Weisbecker
2013-07-18 16:47               ` Paul E. McKenney
2013-07-18 22:46                 ` Frederic Weisbecker
2013-07-19  0:24                   ` Paul E. McKenney
2013-07-19  2:12                     ` Frederic Weisbecker
2013-07-19  5:06                       ` Paul E. McKenney
2013-07-24 18:09                         ` Frederic Weisbecker
2013-07-24 22:09                           ` Paul E. McKenney
2013-07-24 23:26                             ` Frederic Weisbecker
2013-07-26 22:52                               ` Paul E. McKenney
2013-07-27 18:13                                 ` Frederic Weisbecker
2013-07-09  1:30   ` [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2013-07-26 23:18 [PATCH RFC nohz_full 0/7] v4 Provide infrastructure for full-system idle Paul E. McKenney
2013-07-26 23:19 ` [PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-07-26 23:19   ` [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-29  8:19     ` Lai Jiangshan
2013-07-29 17:43       ` Paul E. McKenney
2013-08-09 16:20     ` Frederic Weisbecker
2013-08-14  3:07       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130718013259.GA7398@somewhere \
    --to=fweisbec@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=darren@dvhart.com \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=niv@us.ibm.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sbw@mit.edu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).