From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lance Roy <ldr709@gmail.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
jiangshanlai@gmail.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
dvhart@linux.intel.com, fweisbec@gmail.com, oleg@redhat.com,
bobby.prani@gmail.com
Subject: Re: [PATCH v3 tip/core/rcu 1/4] srcu: Implement more-efficient reader counts
Date: Wed, 25 Jan 2017 13:03:36 -0800 [thread overview]
Message-ID: <20170125210336.GI3989@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170125101752.3f2efd6c@gmail.com>
On Wed, Jan 25, 2017 at 10:17:52AM -0800, Lance Roy wrote:
> Could you please use the new patch? The remark about ULONG_MAX - NR_CPUS is
> incorrect in this one.
Apologies -- I very carefully applied your patch, verified that it changed
only comments, and then very carefully forgot to rebase the rest of the
commits on top of it.
Fixed on -rcu branch dev.2017.01.25a.
Thanx, Paul
> Thanks,
> Lance
>
> On Tue, 24 Jan 2017 14:00:26 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > From: Lance Roy <ldr709@gmail.com>
> >
> > SRCU uses two per-cpu counters: a nesting counter to count the number of
> > active critical sections, and a sequence counter to ensure that the nesting
> > counters don't change while they are being added together in
> > srcu_readers_active_idx_check().
> >
> > This patch instead uses per-cpu lock and unlock counters. Because both
> > counters only increase and srcu_readers_active_idx_check() reads the unlock
> > counter before the lock counter, this achieves the same end without having
> > to increment two different counters in srcu_read_lock(). This also saves a
> > smp_mb() in srcu_readers_active_idx_check().
> >
> > A possible problem with this patch is that it can only handle
> > ULONG_MAX - NR_CPUS simultaneous readers, whereas the old version could
> > handle up to ULONG_MAX.
> >
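[ Aside: a minimal sketch of the scheme described above, with simplified
  control flow; sum_locks() and sum_unlocks() are hypothetical helpers
  standing in for the per-CPU summations srcu_readers_lock_idx() and
  srcu_readers_unlock_idx() in the patch below, which also adds
  READ_ONCE() and the memory barriers discussed in the comments.

	static bool readers_gone(struct srcu_struct *sp, int idx)
	{
		unsigned long unlocks = sum_unlocks(sp, idx); /* hypothetical */

		smp_mb(); /* Order the two summations; smp_mb() A below. */
		return sum_locks(sp, idx) == unlocks;         /* hypothetical */
	}

  Because both counters only ever increase, equality of the two sums
  implies that there was a moment with no readers on that index. ]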
> > Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Signed-off-by: Lance Roy <ldr709@gmail.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > ---
> > include/linux/srcu.h | 10 ++--
> > kernel/rcu/rcutorture.c | 19 +++++++-
> > kernel/rcu/srcu.c | 123 ++++++++++++++++++------------------------------
> > 3 files changed, 67 insertions(+), 85 deletions(-)
> >
> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index dc8eb63c6568..a598cf3ac70c 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -33,9 +33,9 @@
> > #include <linux/rcupdate.h>
> > #include <linux/workqueue.h>
> >
> > -struct srcu_struct_array {
> > - unsigned long c[2];
> > - unsigned long seq[2];
> > +struct srcu_array {
> > + unsigned long lock_count[2];
> > + unsigned long unlock_count[2];
> > };
> >
> > struct rcu_batch {
> > @@ -46,7 +46,7 @@ struct rcu_batch {
> >
> > struct srcu_struct {
> > unsigned long completed;
> > - struct srcu_struct_array __percpu *per_cpu_ref;
> > + struct srcu_array __percpu *per_cpu_ref;
> > spinlock_t queue_lock; /* protect ->batch_queue, ->running */
> > bool running;
> > /* callbacks just queued */
> > @@ -118,7 +118,7 @@ void process_srcu(struct work_struct *work);
> > * See include/linux/percpu-defs.h for the rules on per-CPU variables.
> > */
> > #define __DEFINE_SRCU(name, is_static) \
> > - static DEFINE_PER_CPU(struct srcu_struct_array, name##_srcu_array);\
> > + static DEFINE_PER_CPU(struct srcu_array, name##_srcu_array);\
> > is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
> > #define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
> > #define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index 87c51225ceec..d81345be730e 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -564,10 +564,25 @@ static void srcu_torture_stats(void)
> > pr_alert("%s%s per-CPU(idx=%d):",
> > torture_type, TORTURE_FLAG, idx);
> > for_each_possible_cpu(cpu) {
> > + unsigned long l0, l1;
> > + unsigned long u0, u1;
> > long c0, c1;
> > + struct srcu_array *counts = per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu);
> > - c0 = (long)per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu)->c[!idx];
> > - c1 = (long)per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu)->c[idx];
> > + u0 = counts->unlock_count[!idx];
> > + u1 = counts->unlock_count[idx];
> > +
> > + /*
> > + * Make sure that a lock is always counted if the corresponding
> > + * unlock is counted.
> > + */
> > + smp_rmb();
> > +
> > + l0 = counts->lock_count[!idx];
> > + l1 = counts->lock_count[idx];
> > +
> > + c0 = l0 - u0;
> > + c1 = l1 - u1;
> > pr_cont(" %d(%ld,%ld)", cpu, c0, c1);
> > }
> > pr_cont("\n");
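[ Aside: the read ordering in the hunk above is the load-side half of
  the reader's barrier pairing. Schematically (not code from the patch):

	/*
	 * Reader (patch below):        Stats reader (hunk above):
	 *   lock_count[idx]++;           u = unlock_count[idx];
	 *   smp_mb();  B                 smp_rmb();
	 *   ...critical section...       l = lock_count[idx];
	 *   smp_mb();  C
	 *   unlock_count[idx]++;
	 *
	 * Any unlock visible to the load of "u" has its matching lock
	 * visible to the later load of "l", so l - u never goes negative.
	 */
  ]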
> > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> > index 9b9cdd549caa..ddabf5fbf562 100644
> > --- a/kernel/rcu/srcu.c
> > +++ b/kernel/rcu/srcu.c
> > @@ -106,7 +106,7 @@ static int init_srcu_struct_fields(struct srcu_struct *sp)
> > rcu_batch_init(&sp->batch_check1);
> > rcu_batch_init(&sp->batch_done);
> > INIT_DELAYED_WORK(&sp->work, process_srcu);
> > - sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
> > + sp->per_cpu_ref = alloc_percpu(struct srcu_array);
> > return sp->per_cpu_ref ? 0 : -ENOMEM;
> > }
> >
> > @@ -141,114 +141,78 @@ EXPORT_SYMBOL_GPL(init_srcu_struct);
> > #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> >
> > /*
> > - * Returns approximate total of the readers' ->seq[] values for the
> > + * Returns approximate total of the readers' ->lock_count[] values for the
> > * rank of per-CPU counters specified by idx.
> > */
> > -static unsigned long srcu_readers_seq_idx(struct srcu_struct *sp, int idx)
> > +static unsigned long srcu_readers_lock_idx(struct srcu_struct *sp, int idx)
> > {
> > int cpu;
> > unsigned long sum = 0;
> > - unsigned long t;
> >
> > for_each_possible_cpu(cpu) {
> > - t = READ_ONCE(per_cpu_ptr(sp->per_cpu_ref, cpu)->seq[idx]);
> > - sum += t;
> > + struct srcu_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);
> > +
> > + sum += READ_ONCE(cpuc->lock_count[idx]);
> > }
> > return sum;
> > }
> >
> > /*
> > - * Returns approximate number of readers active on the specified rank
> > - * of the per-CPU ->c[] counters.
> > + * Returns approximate total of the readers' ->unlock_count[] values for the
> > + * rank of per-CPU counters specified by idx.
> > */
> > -static unsigned long srcu_readers_active_idx(struct srcu_struct *sp, int idx)
> > +static unsigned long srcu_readers_unlock_idx(struct srcu_struct *sp, int idx)
> > {
> > int cpu;
> > unsigned long sum = 0;
> > - unsigned long t;
> >
> > for_each_possible_cpu(cpu) {
> > - t = READ_ONCE(per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx]);
> > - sum += t;
> > + struct srcu_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);
> > +
> > + sum += READ_ONCE(cpuc->unlock_count[idx]);
> > }
> > return sum;
> > }
> >
> > /*
> > * Return true if the number of pre-existing readers is determined to
> > - * be stably zero. An example unstable zero can occur if the call
> > - * to srcu_readers_active_idx() misses an __srcu_read_lock() increment,
> > - * but due to task migration, sees the corresponding __srcu_read_unlock()
> > - * decrement. This can happen because srcu_readers_active_idx() takes
> > - * time to sum the array, and might in fact be interrupted or preempted
> > - * partway through the summation.
> > + * be zero.
> > */
> > static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
> > {
> > - unsigned long seq;
> > + unsigned long unlocks;
> >
> > - seq = srcu_readers_seq_idx(sp, idx);
> > + unlocks = srcu_readers_unlock_idx(sp, idx);
> >
> > /*
> > - * The following smp_mb() A pairs with the smp_mb() B located in
> > - * __srcu_read_lock(). This pairing ensures that if an
> > - * __srcu_read_lock() increments its counter after the summation
> > - * in srcu_readers_active_idx(), then the corresponding SRCU read-side
> > - * critical section will see any changes made prior to the start
> > - * of the current SRCU grace period.
> > + * Make sure that a lock is always counted if the corresponding unlock
> > + * is counted. Needs to be a smp_mb() as the read side may contain a
> > + * read from a variable that is written to before the synchronize_srcu()
> > + * in the write side. In this case smp_mb()s A and B act like the store
> > + * buffering pattern.
> > *
> > - * Also, if the above call to srcu_readers_seq_idx() saw the
> > - * increment of ->seq[], then the call to srcu_readers_active_idx()
> > - * must see the increment of ->c[].
> > + * This smp_mb() also pairs with smp_mb() C to prevent writes after the
> > + * synchronize_srcu() from being executed before the grace period ends.
> > */
> > smp_mb(); /* A */
> >
> > /*
> > - * Note that srcu_readers_active_idx() can incorrectly return
> > - * zero even though there is a pre-existing reader throughout.
> > - * To see this, suppose that task A is in a very long SRCU
> > - * read-side critical section that started on CPU 0, and that
> > - * no other reader exists, so that the sum of the counters
> > - * is equal to one. Then suppose that task B starts executing
> > - * srcu_readers_active_idx(), summing up to CPU 1, and then that
> > - * task C starts reading on CPU 0, so that its increment is not
> > - * summed, but finishes reading on CPU 2, so that its decrement
> > - * -is- summed. Then when task B completes its sum, it will
> > - * incorrectly get zero, despite the fact that task A has been
> > - * in its SRCU read-side critical section the whole time.
> > + * If the locks are the same as the unlocks, then there must have
> > + * been no readers on this index at some time in between. This does not
> > + * mean that there are no more readers, as one could have read the
> > + * current index but not have incremented the lock counter yet.
> > *
> > - * We therefore do a validation step should srcu_readers_active_idx()
> > - * return zero.
> > + * Note that there can be at most NR_CPUS worth of readers using the old
> > + * index that haven't incremented ->lock_count[] yet. Therefore, the
> > + * sum of the ->lock_count[]s cannot increment enough times to overflow
> > + * and end up equal the sum of the ->unlock_count[]s, as long as there
> > + * are at most ULONG_MAX - NR_CPUS readers at a time. (Yes, this does
> > + * mean that systems having more than a billion or so CPUs need to be
> > + * 64-bit systems.) Therefore, the only way that the return values of
> > + * the two calls to srcu_readers_(un)lock_idx() can be equal is if there
> > + * are no active readers using this index.
> > */
> > - if (srcu_readers_active_idx(sp, idx) != 0)
> > - return false;
> > -
> > - /*
> > - * The remainder of this function is the validation step.
> > - * The following smp_mb() D pairs with the smp_mb() C in
> > - * __srcu_read_unlock(). If the __srcu_read_unlock() was seen
> > - * by srcu_readers_active_idx() above, then any destructive
> > - * operation performed after the grace period will happen after
> > - * the corresponding SRCU read-side critical section.
> > - *
> > - * Note that there can be at most NR_CPUS worth of readers using
> > - * the old index, which is not enough to overflow even a 32-bit
> > - * integer. (Yes, this does mean that systems having more than
> > - * a billion or so CPUs need to be 64-bit systems.) Therefore,
> > - * the sum of the ->seq[] counters cannot possibly overflow.
> > - * Therefore, the only way that the return values of the two
> > - * calls to srcu_readers_seq_idx() can be equal is if there were
> > - * no increments of the corresponding rank of ->seq[] counts
> > - * in the interim. But the missed-increment scenario laid out
> > - * above includes an increment of the ->seq[] counter by
> > - * the corresponding __srcu_read_lock(). Therefore, if this
> > - * scenario occurs, the return values from the two calls to
> > - * srcu_readers_seq_idx() will differ, and thus the validation
> > - * step below suffices.
> > - */
> > - smp_mb(); /* D */
> > -
> > - return srcu_readers_seq_idx(sp, idx) == seq;
> > + return srcu_readers_lock_idx(sp, idx) == unlocks;
> > }
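[ Aside: the store-buffering pattern referenced in the comment above,
  sketched schematically with a hypothetical variable "x" written by the
  updater before synchronize_srcu():

	/*
	 * Reader                        Updater
	 * ------                        -------
	 * lock_count[idx]++;            x = 1;
	 * smp_mb();  B                  unlocks = srcu_readers_unlock_idx();
	 * r1 = x;                       smp_mb();  A
	 *                               locks = srcu_readers_lock_idx();
	 *
	 * If the updater finds locks == unlocks (reader's increment not
	 * counted), then B and A together forbid r1 == 0: at least one
	 * side must observe the other's store.
	 */
  ]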
> >
> > /**
> > @@ -266,8 +230,12 @@ static bool srcu_readers_active(struct srcu_struct *sp)
> > unsigned long sum = 0;
> >
> > for_each_possible_cpu(cpu) {
> > - sum += READ_ONCE(per_cpu_ptr(sp->per_cpu_ref, cpu)->c[0]);
> > - sum += READ_ONCE(per_cpu_ptr(sp->per_cpu_ref, cpu)->c[1]);
> > + struct srcu_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);
> > +
> > + sum += READ_ONCE(cpuc->lock_count[0]);
> > + sum += READ_ONCE(cpuc->lock_count[1]);
> > + sum -= READ_ONCE(cpuc->unlock_count[0]);
> > + sum -= READ_ONCE(cpuc->unlock_count[1]);
> > }
> > return sum;
> > }
> > @@ -298,9 +266,8 @@ int __srcu_read_lock(struct srcu_struct *sp)
> > int idx;
> >
> > idx = READ_ONCE(sp->completed) & 0x1;
> > - __this_cpu_inc(sp->per_cpu_ref->c[idx]);
> > + __this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
> > smp_mb(); /* B */ /* Avoid leaking the critical section. */
> > - __this_cpu_inc(sp->per_cpu_ref->seq[idx]);
> > return idx;
> > }
> > EXPORT_SYMBOL_GPL(__srcu_read_lock);
> > @@ -314,7 +281,7 @@ EXPORT_SYMBOL_GPL(__srcu_read_lock);
> > void __srcu_read_unlock(struct srcu_struct *sp, int idx)
> > {
> > smp_mb(); /* C */ /* Avoid leaking the critical section. */
> > - this_cpu_dec(sp->per_cpu_ref->c[idx]);
> > + this_cpu_inc(sp->per_cpu_ref->unlock_count[idx]);
> > }
> > EXPORT_SYMBOL_GPL(__srcu_read_unlock);
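[ Aside: the reader-side API is unchanged by this patch; a typical
  reader (illustrative only; "my_srcu", "gp", and do_something_with()
  are hypothetical) still looks like:

	int idx;
	struct foo *p;

	idx = srcu_read_lock(&my_srcu);
	p = srcu_dereference(gp, &my_srcu);
	if (p)
		do_something_with(p);
	srcu_read_unlock(&my_srcu, idx);

  Only the bookkeeping behind __srcu_read_lock()/__srcu_read_unlock()
  changes, from inc/dec of ->c[] to increment-only ->lock_count[] and
  ->unlock_count[]. ]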
> >
> > @@ -349,7 +316,7 @@ static bool try_check_zero(struct srcu_struct *sp, int idx, int trycount)
> > /*
> > * Increment the ->completed counter so that future SRCU readers will
> > - * use the other rank of the ->c[] and ->seq[] arrays. This allows
> > + * use the other rank of the ->(un)lock_count[] arrays. This allows
> > * us to wait for pre-existing readers in a starvation-free manner.
> > */
> > static void srcu_flip(struct srcu_struct *sp)
>
Thread overview: 43+ messages
2017-01-14 9:19 [PATCH tip/core/rcu 0/3] SRCU updates for 4.11 Paul E. McKenney
2017-01-14 9:19 ` [PATCH tip/core/rcu 1/3] srcu: More efficient reader counts Paul E. McKenney
2017-01-14 9:31 ` Ingo Molnar
2017-01-14 19:48 ` Paul E. McKenney
2017-01-14 9:20 ` [PATCH tip/core/rcu 2/3] srcu: Force full grace-period ordering Paul E. McKenney
2017-01-14 9:35 ` Ingo Molnar
2017-01-14 19:54 ` Paul E. McKenney
2017-01-14 21:41 ` Paul E. McKenney
2017-01-15 7:11 ` Ingo Molnar
2017-01-15 7:40 ` Paul E. McKenney
2017-01-15 7:57 ` Ingo Molnar
2017-01-15 9:24 ` Paul E. McKenney
2017-01-15 9:40 ` Ingo Molnar
2017-01-15 19:45 ` Paul E. McKenney
2017-01-16 6:56 ` Ingo Molnar
2017-01-23 8:12 ` Michael Ellerman
2017-01-24 2:45 ` Paul E. McKenney
2017-01-15 6:54 ` Ingo Molnar
2017-01-14 9:20 ` [PATCH tip/core/rcu 3/3] rcutorture: Add CBMC-based formal verification for SRCU Paul E. McKenney
2017-01-15 22:41 ` [PATCH tip/core/rcu v2 0/3] SRCU updates for 4.11 Paul E. McKenney
2017-01-15 22:42 ` [PATCH v2 tip/core/rcu 1/3] srcu: Implement more-efficient reader counts Paul E. McKenney
2017-01-23 20:17 ` Lance Roy
2017-01-23 20:17 ` [PATCH] SRCU: More efficient " Lance Roy
2017-01-23 20:35 ` Paul E. McKenney
2017-01-23 21:33 ` Lance Roy
2017-01-23 21:35 ` [PATCH] srcu: Implement more-efficient " Lance Roy
2017-01-24 0:42 ` Paul E. McKenney
2017-01-24 0:53 ` Lance Roy
2017-01-24 1:57 ` Paul E. McKenney
2017-01-24 3:26 ` Lance Roy
2017-01-24 17:07 ` Paul E. McKenney
2017-01-15 22:42 ` [PATCH v2 tip/core/rcu 2/3] srcu: Force full grace-period ordering Paul E. McKenney
2017-01-23 8:38 ` Lance Roy
2017-01-23 19:12 ` Paul E. McKenney
2017-01-23 20:06 ` Lance Roy
2017-01-15 22:42 ` [PATCH v2 tip/core/rcu 3/3] rcutorture: Add CBMC-based formal verification for SRCU Paul E. McKenney
2017-01-24 22:00 ` [PATCH v3 tip/core/rcu 0/4] SRCU updates for 4.11 Paul E. McKenney
2017-01-24 22:00 ` [PATCH v3 tip/core/rcu 1/4] srcu: Implement more-efficient reader counts Paul E. McKenney
2017-01-25 18:17 ` Lance Roy
2017-01-25 21:03 ` Paul E. McKenney [this message]
2017-01-24 22:00 ` [PATCH v3 tip/core/rcu 2/4] srcu: Force full grace-period ordering Paul E. McKenney
2017-01-24 22:00 ` [PATCH v3 tip/core/rcu 3/4] rcutorture: Add CBMC-based formal verification for SRCU Paul E. McKenney
2017-01-24 22:00 ` [PATCH v3 tip/core/rcu 4/4] srcu: Reduce probability of SRCU ->unlock_count[] counter overflow Paul E. McKenney