All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	der.herr@hofr.at, Davidlohr Bueso <dave@stgolabs.net>
Subject: Re: [RFC][PATCH 0/5] Optimize percpu-rwsem
Date: Tue, 26 May 2015 21:13:43 +0200	[thread overview]
Message-ID: <20150526191343.GA5794@redhat.com> (raw)
In-Reply-To: <20150526185727.GA3763@redhat.com>

On 05/26, Oleg Nesterov wrote:
>
> On 05/26, Linus Torvalds wrote:
> >
> >
> > We literally have one single percpu-rwsem IN THE WHOLE KERNEL TREE.
> >
> > One.
>
> Well. IIRC Tejun is going to turn signal_struct->group_rwsem into
> percpu-rwsem.
>
> And it can have more users. Say, __sb_start_write/etc  does something
> similar, and last time I checked this code it looked buggy to me.

I have found my old email, see below. Perhaps this code was changed
since 2013 when I sent this email, I didn't verify... but in any
case this logic doesn't look simple, imo it would be nice to rely
on the generic helpers from kernel/locking.

Oleg.

------------------------------------------------------------------------------
When I look at __sb_start_write/etc I am not sure this locking
is correct. OK, __sb_start_write() does:

	percpu_counter_inc();

	mb();

	if (sb->s_writers.frozen)
		abort_and_retry;

freeze_super() does STORE + mb + LOAD in reverse order so either
__sb_start_write() must see SB_FREEZE_WRITE or freeze_super() must
see the change in ->s_writers.counter. This is correct.

Still I am not sure sb_wait_write() can trust percpu_counter_sum(),
because it can also see _other_ changes.

To simplify the discussion, suppose that percpu_counter doesn't have
lock/count/batch/whatever and  inc/dec/sum only uses "__percpu *counters".
Lets denote sb->s_writers.counter[level] as CTR[cpu].

Suppose that some thread did __sb_start_write() on CPU_1 and sleeps
"forever". CTR[0] == 0, CTR_[1] == 1, freezer_super() should block.

Now:

	1. freeze_super() sets SB_FREEZE_WRITE, does mb(), and
	   starts sb_wait_write()->percpu_counter_sum().

	2. __percpu_counter_sum() does for_each_online_cpu(),
	   reads CTR[0] == 0. ret = 0.

	3. Another thread comes, calls __sb_start_write() on CPU_0,
	   increments CTR[0].

	   Then it notices sb->s_writers.frozen >= level and starts
	   __sb_end_write() before retry.

	   Then it migrates to CPU_1. And decrements CTR[1] before
	   __percpu_counter_sum() reads it.

	   So CTR[0] == 1, CTR[1] == 0. Everything is fine except
	   sb_wait_write() has already read CTR[0].

	 4. __percpu_counter_sum() continues, reads CTR[1] == 0
	    and returns ret == 0.

sb_wait_write() returns while it should not?


  reply	other threads:[~2015-05-26 19:14 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-26 11:43 [RFC][PATCH 0/5] Optimize percpu-rwsem Peter Zijlstra
2015-05-26 11:43 ` [RFC][PATCH 1/5] rcu: Create rcu_sync infrastructure Peter Zijlstra
2015-05-30 16:58   ` Paul E. McKenney
2015-05-30 19:16     ` Oleg Nesterov
2015-05-30 19:25       ` Oleg Nesterov
2015-05-31 16:07       ` Paul E. McKenney
2015-05-26 11:43 ` [RFC][PATCH 2/5] rcusync: Introduce struct rcu_sync_ops Peter Zijlstra
2015-05-26 11:43 ` [RFC][PATCH 3/5] rcusync: Add the CONFIG_PROVE_RCU checks Peter Zijlstra
2015-05-26 11:44 ` [RFC][PATCH 4/5] rcusync: Introduce rcu_sync_dtor() Peter Zijlstra
2015-05-26 11:44 ` [RFC][PATCH 5/5] percpu-rwsem: Optimize readers and reduce global impact Peter Zijlstra
2015-05-29 19:45   ` Oleg Nesterov
2015-05-29 20:09     ` Oleg Nesterov
2015-05-29 20:41       ` Linus Torvalds
2015-05-30 20:49         ` Oleg Nesterov
2015-06-16 11:48           ` Peter Zijlstra
2015-05-30 17:18   ` Paul E. McKenney
2015-05-30 20:04     ` ring_buffer_attach && cond_synchronize_rcu (Was: percpu-rwsem: Optimize readers and reduce global impact) Oleg Nesterov
2015-06-16 11:08       ` Peter Zijlstra
2015-06-16 11:16         ` Peter Zijlstra
2015-06-16 19:03           ` Oleg Nesterov
2015-06-19 17:57       ` [tip:perf/urgent] perf: Fix ring_buffer_attach() RCU sync, again tip-bot for Oleg Nesterov
2015-05-26 18:12 ` [RFC][PATCH 0/5] Optimize percpu-rwsem Linus Torvalds
2015-05-26 18:34   ` Peter Zijlstra
2015-05-26 18:35   ` Tejun Heo
2015-05-26 18:42   ` Davidlohr Bueso
2015-05-26 21:57     ` Linus Torvalds
2015-05-27  9:28       ` Nicholas Mc Guire
2015-06-05  1:45       ` Al Viro
2015-06-05 21:08         ` Oleg Nesterov
2015-06-05 22:11           ` Al Viro
2015-06-05 23:36             ` Oleg Nesterov
2015-05-27  6:53     ` Peter Zijlstra
2015-05-26 18:57   ` Oleg Nesterov
2015-05-26 19:13     ` Oleg Nesterov [this message]
2015-05-26 19:29     ` Oleg Nesterov
2015-05-26 19:54 ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150526191343.GA5794@redhat.com \
    --to=oleg@redhat.com \
    --cc=dave@stgolabs.net \
    --cc=der.herr@hofr.at \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.