From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <peterz@infradead.org>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
Anton Arapov <anton@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] percpu-rw-semaphores: use rcu_read_lock_sched
Date: Wed, 24 Oct 2012 09:16:38 -0700
Message-ID: <20121024161638.GA2465@linux.vnet.ibm.com>
In-Reply-To: <Pine.LNX.4.64.1210221937550.25995@file.rdu.redhat.com>
On Mon, Oct 22, 2012 at 07:39:16PM -0400, Mikulas Patocka wrote:
> Use rcu_read_lock_sched / rcu_read_unlock_sched / synchronize_sched
> instead of rcu_read_lock / rcu_read_unlock / synchronize_rcu.
>
> This is an optimization. The RCU-protected region is very small, so
> there will be no latency problems if we disable preemption in this region.
>
> So we use rcu_read_lock_sched / rcu_read_unlock_sched, which translate
> to preempt_disable / preempt_enable. This is smaller (and supposedly
> faster) than the preemptible rcu_read_lock / rcu_read_unlock.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
OK, as promised/threatened, I finally got a chance to take a closer look.
The light_mb() and heavy_mb() definitions aren't doing much for me;
the code would be clearer with them expanded inline. And while the
approach of pairing barrier() with synchronize_sched() is interesting,
it would be simpler to rely on RCU's properties. The key point is that
if RCU cannot prove that a given RCU-sched read-side critical section
is seen by all CPUs to have started after a given synchronize_sched(),
then that synchronize_sched() must wait for that RCU-sched read-side
critical section to complete.
This means, as discussed earlier, that there will be a memory barrier
somewhere following the end of that RCU-sched read-side critical section,
and that this memory barrier executes before the completion of the
synchronize_sched().
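To illustrate this guarantee, here is a made-up litmus-test sketch (the
reader() and writer() functions and the shared variables x and y are
hypothetical, both initially zero). If reader() sees the store to y,
which follows the grace period, then its entire critical section is
guaranteed to follow everything preceding the grace period, including
the store to x:

        int x, y;       /* both initially zero */

        void reader(void)                       /* hypothetical */
        {
                int r1, r2;

                rcu_read_lock_sched();
                r1 = ACCESS_ONCE(y);
                r2 = ACCESS_ONCE(x);
                rcu_read_unlock_sched();
                BUG_ON(r1 == 1 && r2 == 0);     /* cannot happen */
        }

        void writer(void)                       /* hypothetical */
        {
                ACCESS_ONCE(x) = 1;
                synchronize_sched();    /* waits for pre-existing readers */
                ACCESS_ONCE(y) = 1;
        }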
So I suggest something like the following (untested!) implementation:
------------------------------------------------------------------------
struct percpu_rw_semaphore {
        unsigned __percpu *counters;
        bool locked;
        struct mutex mtx;
        wait_queue_head_t wq;
};

static inline void percpu_down_read(struct percpu_rw_semaphore *p)
{
        rcu_read_lock_sched();
        if (unlikely(p->locked)) {
                rcu_read_unlock_sched();

                /*
                 * There might (or might not) be a writer.  Acquire &p->mtx;
                 * it is always safe (if a bit slow) to do so.
                 */
                mutex_lock(&p->mtx);
                this_cpu_inc(*p->counters);
                mutex_unlock(&p->mtx);
                return;
        }

        /* No writer, proceed locklessly. */
        this_cpu_inc(*p->counters);
        rcu_read_unlock_sched();
}

static inline void percpu_up_read(struct percpu_rw_semaphore *p)
{
        /*
         * Decrement our count, but protected by RCU-sched so that
         * the writer can force proper serialization.
         */
        rcu_read_lock_sched();
        this_cpu_dec(*p->counters);
        rcu_read_unlock_sched();
}

static inline unsigned __percpu_count(unsigned __percpu *counters)
{
        unsigned total = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                total += ACCESS_ONCE(*per_cpu_ptr(counters, cpu));
        return total;
}

static inline void percpu_down_write(struct percpu_rw_semaphore *p)
{
        mutex_lock(&p->mtx);

        /* Wait for a previous writer, if necessary. */
        wait_event(p->wq, !ACCESS_ONCE(p->locked));

        /* Force the readers to acquire the lock when manipulating counts. */
        ACCESS_ONCE(p->locked) = true;

        /* Wait for all pre-existing readers' checks of ->locked to finish. */
        synchronize_sched();

        /*
         * At this point, all percpu_down_read() invocations will
         * acquire p->mtx.
         */

        /*
         * Wait for all pre-existing readers to complete their
         * percpu_up_read() calls.  Because ->locked is set and
         * because we hold ->mtx, there cannot be any new readers.
         * ->counters will therefore monotonically decrement to zero.
         */
        while (__percpu_count(p->counters))
                msleep(1);

        /*
         * Invoke synchronize_sched() in order to force the last
         * caller of percpu_up_read() to exit its RCU-sched read-side
         * critical section.  On SMP systems, this also forces the CPU
         * that invoked that percpu_up_read() to execute a full memory
         * barrier between the time it exited the RCU-sched read-side
         * critical section and the time that synchronize_sched() returns,
         * so that the critical section begun by this invocation of
         * percpu_down_write() will happen after the critical section
         * ended by percpu_up_read().
         */
        synchronize_sched();
}

static inline void percpu_up_write(struct percpu_rw_semaphore *p)
{
        /* Allow others to proceed, but not yet locklessly. */
        mutex_unlock(&p->mtx);

        /*
         * Ensure that all calls to percpu_down_read() that did not
         * start unambiguously after the above mutex_unlock() still
         * acquire the lock, forcing their critical sections to be
         * serialized with the one terminated by this call to
         * percpu_up_write().
         */
        synchronize_sched();

        /* Now it is safe to allow readers to proceed locklessly. */
        ACCESS_ONCE(p->locked) = false;

        /*
         * If there is another writer waiting, wake it up.  Note that
         * p->mtx properly serializes its critical section with the
         * critical section terminated by this call to percpu_up_write().
         */
        wake_up(&p->wq);
}

static inline int percpu_init_rwsem(struct percpu_rw_semaphore *p)
{
        p->counters = alloc_percpu(unsigned);
        if (unlikely(!p->counters))
                return -ENOMEM;
        p->locked = false;
        mutex_init(&p->mtx);
        init_waitqueue_head(&p->wq);
        return 0;
}

static inline void percpu_free_rwsem(struct percpu_rw_semaphore *p)
{
        free_percpu(p->counters);
        p->counters = NULL; /* catch use-after-free bugs */
}
------------------------------------------------------------------------
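For reference, here is how a caller might use the above (a hypothetical
sketch; my_sem, my_data, my_read(), and my_write() are made up for
illustration, and percpu_init_rwsem() must of course have been called
first):

        static struct percpu_rw_semaphore my_sem;       /* hypothetical */
        static int my_data;                     /* protected by my_sem */

        /* Read side: cheap and scalable in the common no-writer case. */
        static int my_read(void)
        {
                int val;

                percpu_down_read(&my_sem);
                val = my_data;
                percpu_up_read(&my_sem);
                return val;
        }

        /* Write side: rare and expensive (at least two grace periods). */
        static void my_write(int val)
        {
                percpu_down_write(&my_sem);
                my_data = val;
                percpu_up_write(&my_sem);
        }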
Of course, it would be nice to get rid of the extra synchronize_sched().
One way to do this is to use SRCU, which allows blocking operations in
its read-side critical sections, though at the cost of slightly higher
read-side overhead (this version is also untested):
------------------------------------------------------------------------
struct percpu_rw_semaphore {
        bool locked;
        struct mutex mtx; /* Could also be rw_semaphore. */
        struct srcu_struct s;
        wait_queue_head_t wq;
};

static inline int percpu_down_read(struct percpu_rw_semaphore *p)
{
        int idx;

        idx = srcu_read_lock(&p->s);
        if (unlikely(p->locked)) {
                srcu_read_unlock(&p->s, idx);

                /*
                 * There might (or might not) be a writer.  Acquire &p->mtx;
                 * it is always safe (if a bit slow) to do so.
                 */
                mutex_lock(&p->mtx);
                return -1;  /* srcu_read_lock() cannot return -1. */
        }
        return idx;
}

static inline void percpu_up_read(struct percpu_rw_semaphore *p, int idx)
{
        if (idx == -1)
                mutex_unlock(&p->mtx);
        else
                srcu_read_unlock(&p->s, idx);
}

static inline void percpu_down_write(struct percpu_rw_semaphore *p)
{
        mutex_lock(&p->mtx);

        /* Wait for a previous writer, if necessary. */
        wait_event(p->wq, !ACCESS_ONCE(p->locked));

        /* Force new readers to acquire the lock when manipulating counts. */
        ACCESS_ONCE(p->locked) = true;

        /* Wait for all pre-existing readers' checks of ->locked to finish. */
        synchronize_srcu(&p->s);

        /* At this point, all lockless readers have completed. */
}
static inline void percpu_up_write(struct percpu_rw_semaphore *p)
{
        /* Allow others to proceed, but not yet locklessly. */
        mutex_unlock(&p->mtx);

        /*
         * Ensure that all calls to percpu_down_read() that did not
         * start unambiguously after the above mutex_unlock() still
         * acquire the lock, forcing their critical sections to be
         * serialized with the one terminated by this call to
         * percpu_up_write().  Because the readers now use SRCU, an
         * SRCU grace period is required here.
         */
        synchronize_srcu(&p->s);

        /* Now it is safe to allow readers to proceed locklessly. */
        ACCESS_ONCE(p->locked) = false;

        /*
         * If there is another writer waiting, wake it up.  Note that
         * p->mtx properly serializes its critical section with the
         * critical section terminated by this call to percpu_up_write().
         */
        wake_up(&p->wq);
}
static inline int percpu_init_rwsem(struct percpu_rw_semaphore *p)
{
        p->locked = false;
        mutex_init(&p->mtx);
        if (unlikely(init_srcu_struct(&p->s)))
                return -ENOMEM;
        init_waitqueue_head(&p->wq);
        return 0;
}
static inline void percpu_free_rwsem(struct percpu_rw_semaphore *p)
{
        cleanup_srcu_struct(&p->s);
}
------------------------------------------------------------------------
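Usage is as before, except that the read side must thread the index
returned by percpu_down_read() through to the matching percpu_up_read()
(same hypothetical my_sem and my_data as in the earlier sketch):

        static int my_read(void)
        {
                int val;
                int idx;

                idx = percpu_down_read(&my_sem);
                val = my_data;
                percpu_up_read(&my_sem, idx);
                return val;
        }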
Of course, there was a question raised as to whether something already
exists that does this job...
And you guys did ask!
Thanx, Paul