From: Will Deacon <will.deacon@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>,
linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
manfred@colorfullife.com, dave@stgolabs.net,
paulmck@linux.vnet.ibm.com, Waiman.Long@hpe.com, tj@kernel.org,
pablo@netfilter.org, kaber@trash.net, davem@davemloft.net,
oleg@redhat.com, netfilter-devel@vger.kernel.org,
sasha.levin@oracle.com, hofrat@osadl.org, jejb@parisc-linux.org,
chris@zankel.net, rth@twiddle.net, dhowells@redhat.com,
schwidefsky@de.ibm.com, mpe@ellerman.id.au, ralf@linux-mips.org,
linux@armlinux.org.uk, rkuo@codeaurora.org, vgupta@synopsys.com,
james.hogan@imgtec.com, realmz6@gmail.com,
ysato@users.sourceforge.jp, tony.luck@intel.com,
cmetcalf@mellanox.com
Subject: Re: [PATCH -v4 5/7] locking, arch: Update spin_unlock_wait()
Date: Fri, 3 Jun 2016 13:47:34 +0100
Message-ID: <20160603124734.GK9915@arm.com>
In-Reply-To: <20160602215119.GF3190@twins.programming.kicks-ass.net>
Hi Peter,
On Thu, Jun 02, 2016 at 11:51:19PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 02, 2016 at 06:57:00PM +0100, Will Deacon wrote:
> > > +++ b/include/asm-generic/qspinlock.h
> > > @@ -28,30 +28,13 @@
> > > */
> > > static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
> > > {
> > > + /*
> > > + * See queued_spin_unlock_wait().
> > > *
> > > + * Any !0 state indicates it is locked, even if _Q_LOCKED_VAL
> > > + * isn't immediately observable.
> > > */
> > > - smp_mb();
> > > + return !!atomic_read(&lock->val);
> > > }
> >
> > I'm failing to keep up here :(
> >
> > The fast-path code in queued_spin_lock is just an atomic_cmpxchg_acquire.
> > If that's built out of LL/SC instructions, then why don't we need a barrier
> > here in queued_spin_is_locked?
> >
> > Or is the decision now that only spin_unlock_wait is required to enforce
> > this ordering?
>
> (warning: long, somewhat rambling, email)
Thanks for taking the time to write this up. Comments inline.
> You're talking about the smp_mb() that went missing?
Right -- I think you still need it.
> So that wasn't the reason that smp_mb() existed..... but that makes the
> atomic_foo_acquire() things somewhat hard to use, because I don't think
> we want to unconditionally put the smp_mb() in there just in case.
>
> Now, the normal atomic_foo_acquire() stuff uses smp_mb() as per
> smp_mb__after_atomic(), it's just ARM64 and PPC that go all 'funny' and
> need this extra barrier. Blergh. So let's shelve this issue for a bit.
Hmm... I certainly plan to get qspinlock up and running for arm64 in the
near future, so I'm not keen on shelving it for very long.
> Let me write some text to hopefully explain where it did come from and
> why I now removed it.
>
>
> So the spin_is_locked() correctness issue comes from something like:
>
>	CPU0				CPU1
>
>	global_lock();			local_lock(i)
>	  spin_lock(&G)			  spin_lock(&L[i])
>	  for (i)			  if (!spin_is_locked(&G)) {
>	    spin_unlock_wait(&L[i]);	    smp_acquire__after_ctrl_dep();
>					    return;
>					  }
>					  /* deal with fail */
>
> Where it is important CPU1 sees G locked or CPU0 sees L[i] locked such
> that there is exclusion between the two critical sections.
Yes, and there's also a version of this where CPU0 is using spin_is_locked
(see 51d7d5205d338 "powerpc: Add smp_mb() to arch_spin_is_locked()").
> The load from spin_is_locked(&G) is constrained by the ACQUIRE from
> spin_lock(&L[i]), and similarly the load(s) from spin_unlock_wait(&L[i])
> are constrained by the ACQUIRE from spin_lock(&G).
>
> Similarly, later stuff is constrained by the ACQUIRE from CTRL+RMB.
>
> Given a simple (SC) test-and-set spinlock the above is fairly
> straightforward and 'correct', right?
Well, the same issue that you want to shelve can manifest here with
test-and-set locks too, allowing the spin_is_locked(&G) to be speculated
before the spin_lock(&L[i]) has finished taking the lock on CPU1.
Even ignoring that, I'm not convinced this would work for test-and-set
locks without barriers in unlock_wait and is_locked. For example, a
cut-down version of your test looks like:
	CPU0:			CPU1:

	LOCK x			LOCK y
	Read y			Read x

and you don't want the reads to both return "unlocked".
Even on x86, I think you need a fence here:
	X86 lock
	{
	}
	 P0                | P1                ;
	 MOV EAX,$1        | MOV EAX,$1        ;
	 LOCK XCHG [x],EAX | LOCK XCHG [y],EAX ;
	 MOV EBX,[y]       | MOV EBX,[x]       ;
	exists
	(0:EAX=0 /\ 0:EBX=0 /\ 1:EAX=0 /\ 1:EBX=0)

is permitted by herd.
> Now, the 'problem' with qspinlock is that one possible acquire path goes
> like (there are more, but this is the easiest):
>
>	smp_cond_acquire(!(atomic_read(&lock->val) & _Q_LOCKED_MASK));
>	clear_pending_set_locked(lock);
>
> And one possible implementation of clear_pending_set_locked() is:
>
>	WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL);
>
> IOW, we load-acquire the locked byte until it's cleared, at which point
> we know the pending byte to be 1. Then we consider the lock owned by us
> and issue a regular unordered store to flip the pending and locked
> bytes.
>
> Normal mutual exclusion is fine with this, no new pending can happen
> until this store becomes visible at which time the locked byte is
> visibly taken.
>
> This unordered store however, can be delayed (store buffer) such that
> the loads from spin_unlock_wait/spin_is_locked can pass up before it
> (even on TSO arches).
Right, and this is surprisingly similar to the LL/SC problem imo.
> _IF_ spin_unlock_wait/spin_is_locked only look at the locked byte, this
> is a problem because at that point the crossed store-load pattern
> becomes uncrossed and we lose our guarantee. That is, what used to be:
>
>	[S] G.locked = 1		[S] L[i].locked = 1
>	[MB]				[MB]
>	[L] L[i].locked			[L] G.locked
>
> becomes:
>
>	[S] G.locked = 1		[S] L[i].locked = 1
>	[L] L[i].locked			[L] G.locked
>
> Which we can reorder at will and bad things follow.
>
> The previous fix for this was to include an smp_mb() in both
> spin_is_locked() and spin_unlock_wait() to restore that ordering.
>
>
> So at this point spin_is_locked() looks like:
>
>	smp_mb();
>	while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
>		cpu_relax();
I have something similar queued for arm64's ticket locks.
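For concreteness, roughly the shape I have in mind (a sketch only,
assuming the pre-patch asm-generic definitions; untested):

```c
/*
 * Sketch: queued_spin_is_locked() with the full barrier restored.
 */
static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
{
	/*
	 * Order the caller's prior accesses against the lock-word
	 * load, so a store-buffered (or LL/SC-delayed) lock
	 * acquisition cannot slip past this test; see the crossed
	 * store-load pattern above.
	 */
	smp_mb();
	return atomic_read(&lock->val) & _Q_LOCKED_MASK;
}
```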
> But for spin_unlock_wait() there is a second correctness issue, namely:
>
>	CPU0				CPU1
>
>	flag = set;
>	smp_mb();			spin_lock(&l)
>	spin_unlock_wait(&l);		if (flag)
>					  /* bail */
>
>					/* add to lockless list */
>					spin_unlock(&l);
>
>	/* iterate lockless list */
>
>
> Which ensures that CPU1 will stop adding bits to the list and CPU0 will
> observe the last entry on the list (if spin_unlock_wait() had ACQUIRE
> semantics, etc.).
>
> This however, is still broken.. nothing ensures CPU0 sees l.locked
> before CPU1 tests flag.
Yup.
> So while we fixed the first correctness case (global / local locking as
> employed by ipc/sem and nf_conntrack) this is still very much broken.
>
> My patch today rewrites spin_unlock_wait() and spin_is_locked() to rely
> on more information to (hopefully -- I really need sleep) fix both.
>
> The primary observation is that even though the l.locked store is
> delayed, there has been a prior atomic operation on the lock word to
> register the contending lock (in the above scenario, set the pending
> byte, in the other paths, queue onto the tail word).
>
> This means that any load passing the .locked byte store must at least
> observe that state.
That's what I'm not sure about. Just because there was an atomic operation
writing that state, I don't think it means that it's visible to a normal
load.
Will