linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, josh@joshtriplett.org, niv@us.ibm.com,
	tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
	Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
	eric.dumazet@gmail.com, darren@dvhart.com, fweisbec@gmail.com,
	patches@linaro.org
Subject: Re: [PATCH RFC tip/core/rcu] rcu: direct algorithmic SRCU implementation
Date: Wed, 15 Feb 2012 09:51:44 -0500	[thread overview]
Message-ID: <20120215145144.GA6277@Krystal> (raw)
In-Reply-To: <20120215143116.GA1696@Krystal>

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> [...]
> > +/*
> > + * To be called from the update side after an index flip.  Returns true
> > + * if the modulo sum of the counters is stably zero, false if there is
> > + * some possibility of non-zero.
> > + */
> > +static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
> >  {
> >  	int cpu;
> > -	int sum;
> >  
> > -	sum = 0;
> > +	/*
> > +	 * Note that srcu_readers_active_idx() can incorrectly return
> > +	 * zero even though there is a pre-existing reader throughout.
> > +	 * To see this, suppose that task A is in a very long SRCU
> > +	 * read-side critical section that started on CPU 0, and that
> > +	 * no other reader exists, so that the modulo sum of the counters
> > +	 * is equal to one.  Then suppose that task B starts executing
> > +	 * srcu_readers_active_idx(), summing up to CPU 1, and then that
> > +	 * task C starts reading on CPU 0, so that its increment is not
> > +	 * summed, but finishes reading on CPU 2, so that its decrement
> > +	 * -is- summed.  Then when task B completes its sum, it will
> > +	 * incorrectly get zero, despite the fact that task A has been
> > +	 * in its SRCU read-side critical section the whole time.
> > +	 *
> > +	 * We therefore do a validation step should srcu_readers_active_idx()
> > +	 * return zero.
> > +	 */
> > +	if (srcu_readers_active_idx(sp, idx) != 0)
> > +		return false;
> > +
> > +	/*
> > +	 * Since the caller recently flipped ->completed, we can see at
> > +	 * most one increment of each CPU's counter from this point
> > +	 * forward.  The reason for this is that the reader CPU must have
> > +	 * fetched the index before srcu_readers_active_idx checked
> > +	 * that CPU's counter, but not yet incremented its counter.
> > +	 * Its eventual counter increment will follow the read in
> > +	 * srcu_readers_active_idx(), and that increment is immediately
> > +	 * followed by smp_mb() B.  Because smp_mb() D is between
> > +	 * the ->completed flip and srcu_readers_active_idx()'s read,
> > +	 * that CPU's subsequent load of ->completed must see the new
> > +	 * value, and therefore increment the counter in the other rank.
> > +	 */
> > +	smp_mb(); /* A */
> > +
> > +	/*
> > +	 * Now, we check the ->snap array that srcu_readers_active_idx()
> > +	 * filled in from the per-CPU counter values.  Since both
> > +	 * __srcu_read_lock() and __srcu_read_unlock() increment the
> > +	 * upper bits of the per-CPU counter, an increment/decrement
> > +	 * pair will change the value of the counter.  Since there is
> > +	 * only one possible increment, the only way to wrap the counter
> > +	 * is to have a huge number of counter decrements, which requires
> > +	 * a huge number of tasks and huge SRCU read-side critical-section
> > +	 * nesting levels, even on 32-bit systems.
> > +	 *
> > +	 * All of the ways of confusing the readings require that the scan
> > +	 * in srcu_readers_active_idx() see the read-side task's decrement,
> > +	 * but not its increment.  However, between that decrement and
> > +	 * increment are smb_mb() B and C.  Either or both of these pair
> > +	 * with smp_mb() A above to ensure that the scan below will see
> > +	 * the read-side tasks's increment, thus noting a difference in
> > +	 * the counter values between the two passes.
> 
> Hi Paul,
> 
> I think the implementation is correct, but the explanation above might
> be improved. Let's consider the following a scenario, where a reader is
> migrated between increment of the counter and issuing the memory barrier
> in the read lock:
> 
> A,B,C are readers
> D is synchronize_rcu (one flip'n'wait)
> 
> CPU A               CPU B           CPU C         CPU D
>                                     c[1]++
>                                     smp_mb(1)
>                                                   read c[0] -> 0
> c[0]++
> (implicit smp_mb (2))
>       -> migrated ->
>                     (implicit smp_mb (3))
>                     smp_mb (4)
>                     smp_mb (5)
>                     c[1]--
>                                                   read c[1] -> -1
>                                                   read c[2] -> 1
>                                                   (false 0 sum)
>                                                   smp_mb (6)
>                                                   re-check each.
>                                     c[1]--
> 
> re-check: because we observed c[1] == -1, thanks to the implicit memory
> barriers within thread migration (2 and 3), we can assume that we _will_
> observe the updated value of c[0] after smp_mb (6).
> 
> The current explanation states that memory barriers 4 and 5, along with  
> 6, are responsible for ensuring that the increment will be observed by 
> the re-check. However, I doubt they have anything to do with it: it's 
> rather the implicit memory barriers in thread migration, along with
> program order guarantees on writes to the same address, that seems to be
> the reason why we can do this ordering assumption.

Please disregard the part about program order: CPU A writes to c[0], and
CPU B writes to c[1], which are two different memory locations. The rest
of my discussion stands though.

Simply reasoning about write to c[0], memory barriers 2-3, write to
c[1], along with c[1] read, memory barrier 6, and then c[0] read is
enough to explain the ordering guarantees you need, without invoking
program order.

Thanks,

Mathieu

> 
> Does it make sense, or shall I get another coffee to wake myself up ?
> ;)
> 
> Thanks,
> 
> Mathieu
> 
> > +	 *
> > +	 * Therefore, if srcu_readers_active_idx() returned zero, and
> > +	 * none of the counters changed, we know that the zero was the
> > +	 * correct sum.
> > +	 *
> > +	 * Of course, it is possible that a task might be delayed
> > +	 * for a very long time in __srcu_read_lock() after fetching
> > +	 * the index but before incrementing its counter.  This
> > +	 * possibility will be dealt with in __synchronize_srcu().
> > +	 */
> >  	for_each_possible_cpu(cpu)
> > -		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
> > -	return sum;
> > +		if (sp->snap[cpu] !=
> > +		    ACCESS_ONCE(per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx]))
> > +			return false;  /* False zero reading! */
> > +	return true;
> >  }
> 
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2012-02-15 14:51 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-13  2:09 [PATCH RFC tip/core/rcu] rcu: direct algorithmic SRCU implementation Paul E. McKenney
2012-02-15 12:59 ` Peter Zijlstra
2012-02-16  6:35   ` Paul E. McKenney
2012-02-16 10:50     ` Mathieu Desnoyers
2012-02-16 10:52       ` Peter Zijlstra
2012-02-16 11:14         ` Mathieu Desnoyers
2012-02-15 14:31 ` Mathieu Desnoyers
2012-02-15 14:51   ` Mathieu Desnoyers [this message]
2012-02-16  6:38     ` Paul E. McKenney
2012-02-16 11:00       ` Mathieu Desnoyers
2012-02-16 11:51         ` Peter Zijlstra
2012-02-16 12:18           ` Mathieu Desnoyers
2012-02-16 12:44             ` Peter Zijlstra
2012-02-16 14:52               ` Mathieu Desnoyers
2012-02-16 14:58                 ` Peter Zijlstra
2012-02-16 15:13               ` Paul E. McKenney
2012-02-20  7:15 ` Lai Jiangshan
2012-02-20 17:44   ` Paul E. McKenney
2012-02-21  1:11     ` Lai Jiangshan
2012-02-21  1:50       ` Paul E. McKenney
2012-02-21  8:44         ` Lai Jiangshan
2012-02-21 17:24           ` Paul E. McKenney
2012-02-22  9:29             ` [PATCH 1/3 RFC paul/rcu/srcu] srcu: Remove fast check path Lai Jiangshan
2012-02-22  9:29             ` [PATCH 2/3 RFC paul/rcu/srcu] srcu: only increase the upper bit for srcu_read_lock() Lai Jiangshan
2012-02-22  9:50               ` Peter Zijlstra
2012-02-22 21:20               ` Paul E. McKenney
2012-02-22 21:26                 ` Paul E. McKenney
2012-02-22 21:39                   ` Steven Rostedt
2012-02-23  1:01                     ` Paul E. McKenney
2012-02-22  9:29             ` [PATCH 3/3 RFC paul/rcu/srcu] srcu: flip only once for every grace period Lai Jiangshan
2012-02-23  1:01               ` Paul E. McKenney
2012-02-24  8:06               ` Lai Jiangshan
2012-02-24 20:01                 ` Paul E. McKenney
2012-02-27  8:01                   ` [PATCH 1/2 RFC] srcu: change the comments of the wait algorithm Lai Jiangshan
2012-02-27  8:01                   ` [PATCH 2/2 RFC] srcu: implement Peter's checking algorithm Lai Jiangshan
2012-02-27 18:30                     ` Paul E. McKenney
2012-02-28  1:51                       ` Lai Jiangshan
2012-02-28 13:47                         ` Paul E. McKenney
2012-02-29 10:07                           ` Lai Jiangshan
2012-02-29 13:55                             ` Paul E. McKenney
2012-03-01  2:31                               ` Lai Jiangshan
2012-03-01 13:20                                 ` Paul E. McKenney
2012-03-10  3:41                                   ` Lai Jiangshan
2012-03-06  8:42             ` [RFC PATCH 0/6 paul/rcu/srcu] srcu: implement call_srcu() Lai Jiangshan
2012-03-06  9:57               ` [PATCH 1/6] remove unused srcu_barrier() Lai Jiangshan
2012-03-06  9:57                 ` [PATCH 2/6] Don't touch the snap in srcu_readers_active() Lai Jiangshan
2012-03-08 19:14                   ` Paul E. McKenney
2012-03-06  9:57                 ` [PATCH 3/6] use "int trycount" instead of "bool expedited" Lai Jiangshan
2012-03-08 19:25                   ` Paul E. McKenney
2012-03-06  9:57                 ` [PATCH 4/6] remove flip_idx_and_wait() Lai Jiangshan
2012-03-06 10:41                   ` Peter Zijlstra
2012-03-07  3:54                   ` [RFC PATCH 5/5 single-thread-version] implement per-domain single-thread state machine call_srcu() Lai Jiangshan
2012-03-08 13:04                     ` Peter Zijlstra
2012-03-08 14:17                       ` Lai Jiangshan
2012-03-08 13:08                     ` Peter Zijlstra
2012-03-08 20:35                     ` Paul E. McKenney
2012-03-10  3:16                       ` Lai Jiangshan
2012-03-12 18:03                         ` Paul E. McKenney
2012-03-14  7:47                           ` Lai Jiangshan
2012-04-10 20:15                             ` Paul E. McKenney
2012-03-06  9:57                 ` [RFC PATCH 5/6] implement per-cpu&per-domain " Lai Jiangshan
2012-03-06 10:47                   ` Peter Zijlstra
2012-03-08 19:44                     ` Paul E. McKenney
2012-03-06 10:58                   ` Peter Zijlstra
2012-03-06 15:17                     ` Lai Jiangshan
2012-03-06 15:38                       ` Peter Zijlstra
2012-03-08 19:49                         ` Paul E. McKenney
2012-03-10 10:12                           ` Peter Zijlstra
2012-03-12 17:52                             ` Paul E. McKenney
2012-03-06 11:16                   ` Peter Zijlstra
2012-03-06 15:12                     ` Lai Jiangshan
2012-03-06 15:34                       ` Peter Zijlstra
2012-03-08 19:58                         ` Paul E. McKenney
2012-03-10  3:32                           ` Lai Jiangshan
2012-03-10 10:09                           ` Peter Zijlstra
2012-03-12 17:54                             ` Paul E. McKenney
2012-03-12 17:58                               ` Peter Zijlstra
2012-03-12 18:32                                 ` Paul E. McKenney
2012-03-12 20:25                                   ` Peter Zijlstra
2012-03-12 23:15                                     ` Paul E. McKenney
2012-03-12 23:18                                       ` Peter Zijlstra
2012-03-12 23:38                                         ` Paul E. McKenney
2012-03-06 15:26                     ` Lai Jiangshan
2012-03-06 15:37                       ` Peter Zijlstra
2012-03-06 11:17                   ` Peter Zijlstra
2012-03-06 11:22                   ` Peter Zijlstra
2012-03-06 11:35                   ` Peter Zijlstra
2012-03-06 11:36                   ` Peter Zijlstra
2012-03-06 11:39                   ` Peter Zijlstra
2012-03-06 14:50                     ` Lai Jiangshan
2012-03-06 11:52                   ` Peter Zijlstra
2012-03-06 14:44                     ` Lai Jiangshan
2012-03-06 15:31                       ` Peter Zijlstra
2012-03-06 15:32                       ` Peter Zijlstra
2012-03-07  6:44                         ` Lai Jiangshan
2012-03-07  8:10                       ` Gilad Ben-Yossef
2012-03-07  9:21                         ` Lai Jiangshan
2012-03-06 14:47                     ` Lai Jiangshan
2012-03-06  9:57                 ` [PATCH 6/6] add srcu torture test Lai Jiangshan
2012-03-08 19:03                 ` [PATCH 1/6] remove unused srcu_barrier() Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120215145144.GA6277@Krystal \
    --to=compudj@krystal.dyndns.org \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=darren@dvhart.com \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=eric.dumazet@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=niv@us.ibm.com \
    --cc=patches@linaro.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).