Re: [RFC][PATCH 5/5] percpu-rwsem: Optimize readers and reduce global impact

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: oleg@redhat.com, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at,
	dave@stgolabs.net, torvalds@linux-foundation.org,
	josh@joshtriplett.org
Subject: Re: [RFC][PATCH 5/5] percpu-rwsem: Optimize readers and reduce global impact
Date: Sat, 30 May 2015 10:18:06 -0700	[thread overview]
Message-ID: <20150530171806.GB14999@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150526120215.042527659@infradead.org>

On Tue, May 26, 2015 at 01:44:01PM +0200, Peter Zijlstra wrote:
> Currently the percpu-rwsem has two issues:
> 
>  - it switches to (global) atomic ops while a writer is waiting;
>    which could be quite a while and slows down releasing the readers.
> 
>  - it employs synchronize_sched_expedited() _twice_ which is evil and
>    should die -- it shoots IPIs around the machine.
> 
> This patch cures the first problem by ordering the reader-state vs
> reader-count (see the comments in __percpu_down_read() and
> percpu_down_write()). This changes a global atomic op into a full
> memory barrier, which doesn't have the global cacheline contention.
> 
> It cures the second problem by employing the rcu-sync primitives by
> Oleg which reduces to no sync_sched() calls in the 'normal' case of
> no write contention -- global locks had better be rare, and has a
> maximum of one sync_sched() call in case of contention.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/percpu-rwsem.h  |   62 +++++++++-
>  kernel/locking/percpu-rwsem.c |  238 +++++++++++++++++++++---------------------
>  2 files changed, 176 insertions(+), 124 deletions(-)

[ . . . ]

> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -8,158 +8,164 @@
>  #include <linux/sched.h>
>  #include <linux/errno.h>
> 
> -int __percpu_init_rwsem(struct percpu_rw_semaphore *brw,
> +enum { readers_slow, readers_block };
> +
> +int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
>  			const char *name, struct lock_class_key *rwsem_key)
>  {
> -	brw->fast_read_ctr = alloc_percpu(int);
> -	if (unlikely(!brw->fast_read_ctr))
> +	sem->refcount = alloc_percpu(unsigned int);
> +	if (unlikely(!sem->refcount))
>  		return -ENOMEM;
> 
> -	/* ->rw_sem represents the whole percpu_rw_semaphore for lockdep */
> -	__init_rwsem(&brw->rw_sem, name, rwsem_key);
> -	atomic_set(&brw->write_ctr, 0);
> -	atomic_set(&brw->slow_read_ctr, 0);
> -	init_waitqueue_head(&brw->write_waitq);
> +	sem->state = readers_slow;
> +	rcu_sync_init(&sem->rss, RCU_SCHED_SYNC);

But it looks like you need the RCU-sched variant.  Please see below for
an untested patch providing this support.  One benefit of this patch
is that it does not add any bloat to Tiny RCU.

							Thanx, Paul

------------------------------------------------------------------------

    rcu: Add RCU-sched flavors of get-state and cond-sync
    
    The get_state_synchronize_rcu() and cond_synchronize_rcu() functions
    allow polling for grace-period completion, with an actual wait for a
    grace period occuring only when cond_synchronize_rcu() is called too
    soon after the corresponding get_state_synchronize_rcu().  However,
    these functions work only for vanilla RCU.  This commit adds the
    get_state_synchronize_sched() and cond_synchronize_sched(), which provide
    the same capability for RCU-sched.
    
    Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 3df6c1ec4e25..ff968b7af3a4 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -37,6 +37,16 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
 	might_sleep();
 }
 
+static inline unsigned long get_state_synchronize_sched(void)
+{
+	return 0;
+}
+
+static inline void cond_synchronize_sched(unsigned long oldstate)
+{
+	might_sleep();
+}
+
 static inline void rcu_barrier_bh(void)
 {
 	wait_rcu_gp(call_rcu_bh);
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 3fa4a43ab415..80e68d344205 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -76,6 +76,8 @@ void rcu_barrier_bh(void);
 void rcu_barrier_sched(void);
 unsigned long get_state_synchronize_rcu(void);
 void cond_synchronize_rcu(unsigned long oldstate);
+unsigned long get_state_synchronize_sched(void);
+void cond_synchronize_sched(unsigned long oldstate);
 
 extern unsigned long rcutorture_testseq;
 extern unsigned long rcutorture_vernum;
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 1cead7806ca6..f256dee0f6b1 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -635,6 +635,8 @@ static struct rcu_torture_ops sched_ops = {
 	.deferred_free	= rcu_sched_torture_deferred_free,
 	.sync		= synchronize_sched,
 	.exp_sync	= synchronize_sched_expedited,
+	.get_state	= get_state_synchronize_sched,
+	.cond_sync	= cond_synchronize_sched,
 	.call		= call_rcu_sched,
 	.cb_barrier	= rcu_barrier_sched,
 	.fqs		= rcu_sched_force_quiescent_state,
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2fce662fa058..e33e1a8a8d08 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3259,6 +3259,58 @@ void cond_synchronize_rcu(unsigned long oldstate)
 }
 EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
 
+/**
+ * get_state_synchronize_sched - Snapshot current RCU-sched state
+ *
+ * Returns a cookie that is used by a later call to cond_synchronize_sched()
+ * to determine whether or not a full grace period has elapsed in the
+ * meantime.
+ */
+unsigned long get_state_synchronize_sched(void)
+{
+	/*
+	 * Any prior manipulation of RCU-protected data must happen
+	 * before the load from ->gpnum.
+	 */
+	smp_mb();  /* ^^^ */
+
+	/*
+	 * Make sure this load happens before the purportedly
+	 * time-consuming work between get_state_synchronize_sched()
+	 * and cond_synchronize_sched().
+	 */
+	return smp_load_acquire(&rcu_sched_state.gpnum);
+}
+EXPORT_SYMBOL_GPL(get_state_synchronize_sched);
+
+/**
+ * cond_synchronize_sched - Conditionally wait for an RCU-sched grace period
+ *
+ * @oldstate: return value from earlier call to get_state_synchronize_sched()
+ *
+ * If a full RCU-sched grace period has elapsed since the earlier call to
+ * get_state_synchronize_sched(), just return.  Otherwise, invoke
+ * synchronize_sched() to wait for a full grace period.
+ *
+ * Yes, this function does not take counter wrap into account.  But
+ * counter wrap is harmless.  If the counter wraps, we have waited for
+ * more than 2 billion grace periods (and way more on a 64-bit system!),
+ * so waiting for one additional grace period should be just fine.
+ */
+void cond_synchronize_sched(unsigned long oldstate)
+{
+	unsigned long newstate;
+
+	/*
+	 * Ensure that this load happens before any RCU-destructive
+	 * actions the caller might carry out after we return.
+	 */
+	newstate = smp_load_acquire(&rcu_sched_state.completed);
+	if (ULONG_CMP_GE(oldstate, newstate))
+		synchronize_sched();
+}
+EXPORT_SYMBOL_GPL(cond_synchronize_sched);
+
 static int synchronize_sched_expedited_cpu_stop(void *data)
 {
 	/*

next prev parent reply	other threads:[~2015-05-30 17:18 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-26 11:43 [RFC][PATCH 0/5] Optimize percpu-rwsem Peter Zijlstra
2015-05-26 11:43 ` [RFC][PATCH 1/5] rcu: Create rcu_sync infrastructure Peter Zijlstra
2015-05-30 16:58   ` Paul E. McKenney
2015-05-30 19:16     ` Oleg Nesterov
2015-05-30 19:25       ` Oleg Nesterov
2015-05-31 16:07       ` Paul E. McKenney
2015-05-26 11:43 ` [RFC][PATCH 2/5] rcusync: Introduce struct rcu_sync_ops Peter Zijlstra
2015-05-26 11:43 ` [RFC][PATCH 3/5] rcusync: Add the CONFIG_PROVE_RCU checks Peter Zijlstra
2015-05-26 11:44 ` [RFC][PATCH 4/5] rcusync: Introduce rcu_sync_dtor() Peter Zijlstra
2015-05-26 11:44 ` [RFC][PATCH 5/5] percpu-rwsem: Optimize readers and reduce global impact Peter Zijlstra
2015-05-29 19:45   ` Oleg Nesterov
2015-05-29 20:09     ` Oleg Nesterov
2015-05-29 20:41       ` Linus Torvalds
2015-05-30 20:49         ` Oleg Nesterov
2015-06-16 11:48           ` Peter Zijlstra
2015-05-30 17:18   ` Paul E. McKenney [this message]
2015-05-30 20:04     ` ring_buffer_attach && cond_synchronize_rcu (Was: percpu-rwsem: Optimize readers and reduce global impact) Oleg Nesterov
2015-06-16 11:08       ` Peter Zijlstra
2015-06-16 11:16         ` Peter Zijlstra
2015-06-16 19:03           ` Oleg Nesterov
2015-06-19 17:57       ` [tip:perf/urgent] perf: Fix ring_buffer_attach() RCU sync, again tip-bot for Oleg Nesterov
2015-05-26 18:12 ` [RFC][PATCH 0/5] Optimize percpu-rwsem Linus Torvalds
2015-05-26 18:34   ` Peter Zijlstra
2015-05-26 18:35   ` Tejun Heo
2015-05-26 18:42   ` Davidlohr Bueso
2015-05-26 21:57     ` Linus Torvalds
2015-05-27  9:28       ` Nicholas Mc Guire
2015-06-05  1:45       ` Al Viro
2015-06-05 21:08         ` Oleg Nesterov
2015-06-05 22:11           ` Al Viro
2015-06-05 23:36             ` Oleg Nesterov
2015-05-27  6:53     ` Peter Zijlstra
2015-05-26 18:57   ` Oleg Nesterov
2015-05-26 19:13     ` Oleg Nesterov
2015-05-26 19:29     ` Oleg Nesterov
2015-05-26 19:54 ` Davidlohr Bueso

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:3df6c1ec4e2 dfblob:ff968b7af3a dfblob:3fa4a43ab41
dfblob:80e68d34420 dfblob:1cead7806ca dfblob:f256dee0f6b
dfblob:2fce662fa05 dfblob:e33e1a8a8d0 )
 OR (
bs:"Re: [RFC][PATCH 5/5] percpu-rwsem: Optimize readers and reduce global impact" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150530171806.GB14999@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=dave@stgolabs.net \
    --cc=der.herr@hofr.at \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.