public inbox for linux-kernel@vger.kernel.org
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: paulmck@linux.vnet.ibm.com, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at,
	dave@stgolabs.net, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 5/5] percpu-rwsem: Optimize readers and reduce global impact
Date: Fri, 29 May 2015 21:45:34 +0200
Message-ID: <20150529194534.GA31860@redhat.com>
In-Reply-To: <20150526120215.042527659@infradead.org>

Sorry for the delay, I finally found the time to read this series...
The code matches our previous discussion and I believe it is correct.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


Just one nit below,

On 05/26, Peter Zijlstra wrote:
>
>  struct percpu_rw_semaphore {
> -	unsigned int __percpu	*fast_read_ctr;
> -	atomic_t		write_ctr;
> +	unsigned int __percpu	*refcount;
> +	int			state;

....

> +enum { readers_slow, readers_block };

Now that we rely on rcu_sync and thus do not have "readers_fast",
I think that "bool reader_should_block" will look better.

> +void percpu_down_write(struct percpu_rw_semaphore *sem)
>  {
...

so it does

	rcu_sync_enter(&sem->rss);

	state = BLOCK;

	mb();

	wait_event(sem->writer, readers_active_check(sem));

and this looks correct.

The only nontrivial thing we need to ensure is that
per_cpu_sum(*sem->refcount) == 0 can't be a false positive. A false
negative is fine.

And this means that if we see the result of this_cpu_dec() we must
not miss the result of the previous this_cpu_inc(), whether it ran
on the same or on _another_ CPU.

And this is true because if the reader does dec() on another CPU
it does up_read() and this is only possible if down_read() didn't
see state == BLOCK.

But if it didn't see state == BLOCK then the writer must see the
result of the previous down_read()->inc().
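
To make the false positive concrete, here is a minimal userspace sketch
(not kernel code). It assumes a hypothetical two-CPU model; MODEL_NR_CPUS,
model_sum(), model_writer_view() and model_check() are illustrative names,
not part of the patch. Reader B takes the lock on CPU 1 and releases it on
CPU 0, while reader A still holds the lock taken on CPU 0. The flag
simulates a writer that observes B's dec() but not B's inc().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2		/* hypothetical: tiny two-CPU model */

/* Sum the per-CPU counters with unsigned wraparound, as per_cpu_sum() would. */
static unsigned int model_sum(const unsigned int *refcount)
{
	unsigned int sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += refcount[cpu];
	return sum;
}

/*
 * Build the counters as the writer would observe them.
 * writer_saw_b_inc == false simulates the forbidden case: the writer
 * sees B's dec() on CPU 0 but misses B's earlier inc() on CPU 1.
 */
static unsigned int model_writer_view(bool writer_saw_b_inc)
{
	unsigned int refcount[MODEL_NR_CPUS] = { 0, 0 };

	if (writer_saw_b_inc)
		refcount[1]++;	/* B: down_read() on CPU 1 */
	refcount[0]--;		/* B: up_read() on CPU 0, wraps around */
	refcount[0]++;		/* A: down_read() on CPU 0, still held */

	return model_sum(refcount);
}

/* Usage: A is active in both cases, only the consistent view shows it. */
static void model_check(void)
{
	assert(model_writer_view(true) == 1);	/* A is accounted for */
	assert(model_writer_view(false) == 0);	/* false positive: A is lost */
}
```

The wraparound is harmless as long as every dec() the writer sees is
paired with its inc(); the sum only goes wrong when the pair is split.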

IOW, we just rely on STORE-MB-LOAD, except that the writer does the
LOAD multiple times in per_cpu_sum():

DOWN_WRITE:			DOWN_READ on CPU X:

state = BLOCK;			refcount[X]++;

mb();				mb();

sum = 0;			if (state != BLOCK)
sum += refcount[0];			return;	 /* success */
sum += refcount[1];		
...				refcount[X]--;
sum += refcount[NR_CPUS - 1];


If the reader wins and takes the lock, then its addition to
refcount[X] must be accounted for by the writer.

The writer can obviously miss dec() from the reader, but we rely
on wake_up/wait_event in this case.
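
The two columns above can be modelled in userspace with C11 atomics. This
is only a sketch under stated assumptions: all model_* names and
MODEL_NR_CPUS are hypothetical, sequentially consistent atomics stand in
for the explicit mb() (stronger than the kernel needs), and the
rcu_sync and wake_up/wait_event parts are left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4			/* hypothetical CPU count */

enum { MODEL_SLOW, MODEL_BLOCK };	/* stand-ins for the state values */

static atomic_uint model_refcount[MODEL_NR_CPUS];	/* zero-initialized */
static atomic_int model_state;				/* zero == MODEL_SLOW */

/* Reader: STORE (inc), then LOAD (state). Returns true if the lock is taken. */
static bool model_down_read(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	atomic_fetch_add(&model_refcount[cpu], 1);	/* refcount[X]++; mb() */
	if (atomic_load(&model_state) != MODEL_BLOCK)
		return true;				/* success */
	/* Writer is pending: undo the inc. Real code would now wait. */
	atomic_fetch_sub(&model_refcount[cpu], 1);	/* refcount[X]-- */
	return false;
}

/* May run on a different "CPU" than the matching model_down_read(). */
static void model_up_read(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	atomic_fetch_sub(&model_refcount[cpu], 1);	/* real code also wakes the writer */
}

/* Writer: STORE (state), before any LOAD of the counters. */
static void model_down_write_begin(void)
{
	atomic_store(&model_state, MODEL_BLOCK);	/* state = BLOCK; mb() */
}

/* Writer: the repeated LOADs; true means no reader holds the lock. */
static bool model_readers_active_check(void)
{
	unsigned int sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += atomic_load(&model_refcount[cpu]);
	return sum == 0;
}
```

In this model a writer would call model_down_write_begin() once and then
poll model_readers_active_check(), where the real code sleeps in
wait_event(). A reader that got true from model_down_read(0) may later
call model_up_read(3); the unsigned sum still returns to zero.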

Oleg.


