Re: [PATCH] locking/percpu_rwsem: Rewrite to not use rwsem

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Will Deacon <will@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, bigeasy@linutronix.de,
	juri.lelli@redhat.com, williams@redhat.com, bristot@redhat.com,
	longman@redhat.com, dave@stgolabs.net, jack@suse.com
Subject: Re: [PATCH] locking/percpu_rwsem: Rewrite to not use rwsem
Date: Tue, 29 Oct 2019 20:06:24 +0100	[thread overview]
Message-ID: <20191029190624.GB3079@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <20190807144305.v55fohssujsqtegb@willie-the-truck>

On Wed, Aug 07, 2019 at 03:45:34PM +0100, Will Deacon wrote:
> Hi Peter,
> 
> I've mostly been spared the joys of pcpu rwsem, but I took a look anyway.
> Comments of questionable quality below.

Thanks for having a look, and sorry for being tardy.

> On Mon, Aug 05, 2019 at 04:02:41PM +0200, Peter Zijlstra wrote:

> > +/*
> > + * Called with preemption disabled, and returns with preemption disabled,
> > + * except when it fails the trylock.
> > + */
> > +bool __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
> >  {
> >  	/*
> >  	 * Due to having preemption disabled the decrement happens on
> >  	 * the same CPU as the increment, avoiding the
> >  	 * increment-on-one-CPU-and-decrement-on-another problem.
> >  	 *
> > -	 * If the reader misses the writer's assignment of readers_block, then
> > -	 * the writer is guaranteed to see the reader's increment.
> > +	 * If the reader misses the writer's assignment of sem->block, then the
> > +	 * writer is guaranteed to see the reader's increment.
> >  	 *
> >  	 * Conversely, any readers that increment their sem->read_count after
> > -	 * the writer looks are guaranteed to see the readers_block value,
> > -	 * which in turn means that they are guaranteed to immediately
> > -	 * decrement their sem->read_count, so that it doesn't matter that the
> > -	 * writer missed them.
> > +	 * the writer looks are guaranteed to see the sem->block value, which
> > +	 * in turn means that they are guaranteed to immediately decrement
> > +	 * their sem->read_count, so that it doesn't matter that the writer
> > +	 * missed them.
> >  	 */
> >  
> > +again:
> >  	smp_mb(); /* A matches D */
> >  
> >  	/*
> > -	 * If !readers_block the critical section starts here, matched by the
> > +	 * If !sem->block the critical section starts here, matched by the
> >  	 * release in percpu_up_write().
> >  	 */
> > -	if (likely(!smp_load_acquire(&sem->readers_block)))
> > -		return 1;
> > +	if (likely(!atomic_read_acquire(&sem->block)))
> > +		return true;
> >  
> >  	/*
> >  	 * Per the above comment; we still have preemption disabled and
> >  	 * will thus decrement on the same CPU as we incremented.
> >  	 */
> > -	__percpu_up_read(sem);
> > +	__percpu_up_read(sem); /* implies preempt_enable() */
> 
> Irritatingly, it also implies an smp_mb() which I don't think we need here.

Hurm.. yes, I think you're right. We have:

	__this_cpu_inc()
	smp_mb()
	if (!atomic_read_acquire(&sem->block))
		return true;
	__percpu_up_read();

Between that smp_mb() and the ACQUIRE there is no way the
__this_cpu_dec() can get hoisted. That is, except if we rely on that
stronger transitivity guarantee (I forgot what we ended up calling it,
and I can't ever find anything in a hurry in memory-barriers.txt).

That said, with Oleg's suggestion getting here is much reduced and when
we do, the cost of actual wakeups is much higher than that of the memory
barrier, so I'm inclined to leave things as they are.

> >  	if (try)
> > -		return 0;
> > +		return false;
> >  
> > -	/*
> > -	 * We either call schedule() in the wait, or we'll fall through
> > -	 * and reschedule on the preempt_enable() in percpu_down_read().
> > -	 */
> > -	preempt_enable_no_resched();
> > +	wait_event(sem->waiters, !atomic_read_acquire(&sem->block));
> 
> Why do you need acquire semantics here? Is the control dependency to the
> increment not enough?

Uuuhhh... let me think about that. Clearly we want ACQUIRE for the
'return true' case, but I'm not sure why I used it here.

> Talking of control dependencies, could we replace the smp_mb() in
> readers_active_check() with smp_acquire__after_ctrl_dep()? In fact, perhaps
> we could remove it altogether, given that our writes will be ordered by
> the dependency and I don't think we care about ordering our reads wrt
> previous readers. Hmm. Either way, clearly not for this patch.

I think we can, but we've never had the need to optimize the slow path
here. The design of this thing has always been that if you hit the
slow-path, you're doing it wrong.

That said, I think cgroups use a variant of percpu-rwsem that wreck rss
on purpose and always take the slowpaths. So it might actually be worth
it.

next prev parent reply	other threads:[~2019-10-29 19:06 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-05 14:02 [PATCH] locking/percpu_rwsem: Rewrite to not use rwsem Peter Zijlstra
2019-08-05 14:43 ` Boqun Feng
2019-08-05 14:58   ` Boqun Feng
2019-08-05 15:43     ` Peter Zijlstra
2019-08-06 14:15       ` Boqun Feng
2019-08-06 16:17 ` Oleg Nesterov
2019-08-06 17:15   ` Peter Zijlstra
2019-08-07  9:56     ` Oleg Nesterov
2019-10-29 18:47       ` Peter Zijlstra
2019-10-30 14:21         ` Oleg Nesterov
2019-10-30 16:09           ` Peter Zijlstra
2019-10-30 17:52         ` Peter Zijlstra
2019-10-30 18:47           ` Peter Zijlstra
2019-10-30 19:31           ` Peter Zijlstra
2019-10-31  6:11           ` Peter Zijlstra
2019-08-07 14:45 ` Will Deacon
2019-10-29 19:06   ` Peter Zijlstra [this message]
2019-10-30 15:57     ` Oleg Nesterov
2019-10-30 16:47       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191029190624.GB3079@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bigeasy@linutronix.de \
    --cc=bristot@redhat.com \
    --cc=dave@stgolabs.net \
    --cc=jack@suse.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=williams@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox