Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Will Deacon <will.deacon@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org,
	longman@redhat.com, andrea.parri@amarulasolutions.com,
	tglx@linutronix.de
Subject: Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86
Date: Tue, 2 Oct 2018 14:19:53 +0100	[thread overview]
Message-ID: <20181002131952.GD16422@arm.com> (raw)
In-Reply-To: <20181001200028.GO3439@hirez.programming.kicks-ass.net>

On Mon, Oct 01, 2018 at 10:00:28PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 01, 2018 at 06:17:00PM +0100, Will Deacon wrote:
> > Thanks for chewing up my afternoon ;)
> 
> I'll get you a beer in EDI ;-)

Just one?!

> > But actually,
> > consider this scenario with your patch:
> > 
> > 1. CPU0 sees a locked val, and is about to do your xchg_relaxed() to set
> >    pending.
> > 
> > 2. CPU1 comes in and sets pending, spins on locked
> > 
> > 3. CPU2 sees a pending and locked val, and is about to enter the head of
> >    the waitqueue (i.e. it's right before xchg_tail()).
> > 
> > 4. The locked holder unlock()s, CPU1 takes the lock() and then unlock()s
> >    it, so pending and locked are now 0.
> > 
> > 5. CPU0 sets pending and reads back zeroes for the other fields
> > 
> > 6. CPU0 clears pending and sets locked -- it now has the lock
> > 
> > 7. CPU2 updates tail, sees it's at the head of the waitqueue and spins
> >    for locked and pending to go clear. However, it reads a stale value
> >    from step (4) and attempts the atomic_try_cmpxchg() to take the lock.
> > 
> > 8. CPU2 will fail the cmpxchg(), but then go ahead and set locked. At this
> >    point we're hosed, because both CPU2 and CPU0 have the lock.
> 
> Let me draw a picture of that..
> 
> 
>   CPU0		CPU1		CPU2		CPU3
> 
> 0)						lock
> 						  trylock -> (0,0,1)
> 1)lock
>     trylock /* fail */
> 
> 2)		lock
> 		  trylock /* fail */
> 		  tas-pending -> (0,1,1)
> 		  wait-locked
> 
> 3)				lock
> 				  trylock /* fail */
> 				  tas-pending /* fail */
> 
> 4)						unlock -> (0,1,0)
> 		  clr_pnd_set_lck -> (0,0,1)
> 		  unlock -> (0,0,0)
> 
> 5)  tas-pending -> (0,1,0)
>     read-val -> (0,1,0)
> 6)  clr_pnd_set_lck -> (0,0,1)
> 7)				  xchg_tail -> (n,0,1)
> 				  load_acquire <- (n,0,0) (from-4)
> 8)				  cmpxchg /* fail */
> 				  set_locked()
> 
> > Is there something I'm missing that means this can't happen? I suppose
> > cacheline granularity ends up giving serialisation between (4) and (7),
> > but I'd *much* prefer not to rely on that because it feels horribly
> > fragile.
> 
> Well, on x86 atomics are fully ordered, so the xchg_tail() does in
> fact have smp_mb() in and that should order it sufficient for that not
> to happen I think.

Hmm, does that actually help, though? I still think you're relying on the
cache-coherence protocol to serialise the xchg() on pending before the
xchg_tail(), which I think is fragile because they don't actually overlap.

> But in general, yes ick. Alternatively, making xchg_tail an ACQUIRE
> doesn't seem too far out..
> 
> > Another idea I was playing with was adding test_and_set_bit_acquire()
> > for this, because x86 has an instruction for that, right?
> 
> LOCK BTS, yes. So it can do a full 32bit RmW, but it cannot return the
> old value of the word, just the old bit (in CF).
> 
> I suppose you get rid of the whole mixed size thing, but you still have
> the whole two instruction thing.

I really think we need that set of pending to operate on the whole lock
word.

> > > +	/*
> > > +	 * Ensures the tail load happens after the xchg().
> > > +	 *
> > > +	 *	   lock  unlock    (a)
> > > +	 *   xchg ---------------.
> > > +	 *    (b)  lock  unlock  +----- fetch_or
> > > +	 *   load ---------------'
> > > +	 *	   lock  unlock    (c)
> > > +	 *
> > 
> > I failed miserably at parsing this comment :(
> > 
> > I think the main issue is that I don't understand how to read the little
> > diagram you've got.
> 
> Where fetch_or() is indivisible and has happens-before (a) and
> happens-after (c), the new thing is in fact divisible and has
> happens-in-between (b).
> 
> Of the happens-in-between (b), we can either get a new concurrent
> locker, or make progress of an extant concurrent locker because an
> unlock happened.
> 
> But the rest of the text might indeed be very confused. I think I wrote
> the bulk of that when I was in fact doing a xchg16 on locked_pending,
> but that's fundamentally broken. I did edit it afterwards, but that
> might have just made it worse.

Ok, maybe just remove it :)

Will

next prev parent reply	other threads:[~2018-10-02 13:19 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-26 11:01 [RFC][PATCH 0/3] locking/qspinlock: Improve determinism for x86 Peter Zijlstra
2018-09-26 11:01 ` [RFC][PATCH 1/3] locking/qspinlock: Re-order code Peter Zijlstra
2018-10-01 17:17   ` Will Deacon
2018-09-26 11:01 ` [RFC][PATCH 2/3] locking/qspinlock: Rework some comments Peter Zijlstra
2018-10-01 17:17   ` Will Deacon
2018-10-01 19:10     ` Peter Zijlstra
2018-10-02 13:20       ` Will Deacon
2018-10-02 13:43         ` Peter Zijlstra
2018-09-26 11:01 ` [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86 Peter Zijlstra
2018-09-26 16:30   ` Waiman Long
2018-09-26 17:54     ` Peter Zijlstra
2018-09-27  7:29       ` Peter Zijlstra
2018-09-26 20:52   ` Andrea Parri
2018-09-27  7:17     ` Peter Zijlstra
2018-09-27  7:47       ` Andrea Parri
2018-09-27  7:59         ` Peter Zijlstra
2018-09-27  8:13           ` Andrea Parri
2018-09-27  8:57             ` Peter Zijlstra
2018-09-27 12:16   ` David Laight
2018-10-01 17:17   ` Will Deacon
2018-10-01 20:00     ` Peter Zijlstra
2018-10-02 13:19       ` Will Deacon [this message]
2018-10-02 14:14         ` Peter Zijlstra
2018-10-02 12:31     ` Andrea Parri
2018-10-02 13:22       ` Will Deacon
2018-10-02 13:44         ` Andrea Parri
2018-09-26 15:01 ` [RFC][PATCH 0/3] locking/qspinlock: Improve determinism " Sebastian Andrzej Siewior
2018-09-26 15:08   ` Thomas Gleixner
2018-09-26 15:38     ` Sebastian Andrzej Siewior
2018-09-26 16:20 ` Waiman Long
2018-09-26 17:51   ` Peter Zijlstra
2018-09-26 23:21     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181002131952.GD16422@arm.com \
    --to=will.deacon@arm.com \
    --cc=andrea.parri@amarulasolutions.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.