[PATCH 07/18] arm64: locks: patch in lse instructions when supported by the CPU

All of lore.kernel.org
 help / color / mirror / Atom feed

From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 07/18] arm64: locks: patch in lse instructions when supported by the CPU
Date: Tue, 21 Jul 2015 18:29:18 +0100	[thread overview]
Message-ID: <20150721172918.GO31095@arm.com> (raw)
In-Reply-To: <20150721165338.GE7250@e104818-lin.cambridge.arm.com>

Hi Catalin,

On Tue, Jul 21, 2015 at 05:53:39PM +0100, Catalin Marinas wrote:
> On Mon, Jul 13, 2015 at 10:25:08AM +0100, Will Deacon wrote:
> > @@ -67,15 +78,25 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
> >  	unsigned int tmp;
> >  	arch_spinlock_t lockval;
> >  
> > -	asm volatile(
> > -"	prfm	pstl1strm, %2\n"
> > -"1:	ldaxr	%w0, %2\n"
> > -"	eor	%w1, %w0, %w0, ror #16\n"
> > -"	cbnz	%w1, 2f\n"
> > -"	add	%w0, %w0, %3\n"
> > -"	stxr	%w1, %w0, %2\n"
> > -"	cbnz	%w1, 1b\n"
> > -"2:"
> > +	asm volatile(ARM64_LSE_ATOMIC_INSN(
> > +	/* LL/SC */
> > +	"	prfm	pstl1strm, %2\n"
> > +	"1:	ldaxr	%w0, %2\n"
> > +	"	eor	%w1, %w0, %w0, ror #16\n"
> > +	"	cbnz	%w1, 2f\n"
> > +	"	add	%w0, %w0, %3\n"
> > +	"	stxr	%w1, %w0, %2\n"
> > +	"	cbnz	%w1, 1b\n"
> > +	"2:",
> > +	/* LSE atomics */
> > +	"	ldar	%w0, %2\n"
> 
> Do we still need an acquire if we fail to take the lock?

Good point, I think we can drop that. Read-after-read ordering is sufficient
to order LDR -> CASA in a trylock loop.

> 
> > +	"	eor	%w1, %w0, %w0, ror #16\n"
> > +	"	cbnz	%w1, 1f\n"
> > +	"	add	%w1, %w0, %3\n"
> > +	"	casa	%w0, %w1, %2\n"
> > +	"	and	%w1, %w1, #0xffff\n"
> > +	"	eor	%w1, %w1, %w0, lsr #16\n"
> > +	"1:")
> >  	: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
> >  	: "I" (1 << TICKET_SHIFT)
> >  	: "memory");
> 
> I wonder if this is any faster with LSE. CAS would have to re-load the
> lock but we already have the value loaded (though most likely the reload
> will be from cache and we save a cbnz that can be mispredicted). I guess
> we'll re-test when we get some real hardware.

Yeah, I'll definitely want to evaluate this on real hardware when it shows
up. For now, I'm basically trying to avoid explicit dirtying at L1, so
the CAS at least gives hardware the chance to go far in the face of
contention.

> Does prfm help in any way with the LDAR?

We'd need to prfm for write, which would end up forcing everything near
(which I'm avoiding for now).

> > @@ -85,10 +106,19 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
> >  
> >  static inline void arch_spin_unlock(arch_spinlock_t *lock)
> >  {
> > -	asm volatile(
> > -"	stlrh	%w1, %0\n"
> > -	: "=Q" (lock->owner)
> > -	: "r" (lock->owner + 1)
> > +	unsigned long tmp;
> > +
> > +	asm volatile(ARM64_LSE_ATOMIC_INSN(
> > +	/* LL/SC */
> > +	"	ldr	%w1, %0\n"
> > +	"	add	%w1, %w1, #1\n"
> > +	"	stlrh	%w1, %0",
> > +	/* LSE atomics */
> > +	"	mov	%w1, #1\n"
> > +	"	nop\n"
> > +	"	staddlh	%w1, %0")
> > +	: "=Q" (lock->owner), "=&r" (tmp)
> > +	:
> >  	: "memory");
> >  }
> >  
> > @@ -125,11 +155,19 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
> >  
> >  	asm volatile(
> >  	"	sevl\n"
> > +	ARM64_LSE_ATOMIC_INSN(
> > +	/* LL/SC */
> >  	"1:	wfe\n"
> >  	"2:	ldaxr	%w0, %1\n"
> >  	"	cbnz	%w0, 1b\n"
> >  	"	stxr	%w0, %w2, %1\n"
> > -	"	cbnz	%w0, 2b\n"
> > +	"	cbnz	%w0, 2b",
> > +	/* LSE atomics */
> > +	"1:	wfe\n"
> > +	"	mov	%w0, wzr\n"
> > +	"	casa	%w0, %w2, %1\n"
> > +	"	nop\n"
> > +	"	cbnz	%w0, 1b")
> >  	: "=&r" (tmp), "+Q" (rw->lock)
> >  	: "r" (0x80000000)
> >  	: "memory");
> 
> With WFE in the LL/SC case, we rely on LDAXR to set the exclusive
> monitor and an event would be generated every time it gets cleared. With
> CAS, we no longer have this behaviour, so what guarantees a SEV?

My understanding was that failed CAS will set the exclusive monitor, but
what I have for a spec doesn't actually comment on this behaviour. I'll
go digging...

> > @@ -139,11 +177,16 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
> >  {
> >  	unsigned int tmp;
> >  
> > -	asm volatile(
> > +	asm volatile(ARM64_LSE_ATOMIC_INSN(
> > +	/* LL/SC */
> >  	"	ldaxr	%w0, %1\n"
> >  	"	cbnz	%w0, 1f\n"
> >  	"	stxr	%w0, %w2, %1\n"
> 
> Not a comment to your patch but to the original code: don't we need a
> branch back to ldaxr if stxr fails, like we do for arch_spin_trylock?
> The same comment for arch_read_trylock.

I don't think Linux specifies the behaviour here, to be honest. C11
distinguishes between "weak" and "strong" cmpxchg to try and address this
sort of thing. Since we're not sure, I suppose we should loop (like we
do for arch/arm/).

Bear in mind that I'm currently moving us over to the qrwlock once I've
got this out of the way and added generic support for relaxed atomics,
which will result in the deletion of all of this anyway.

> 
> > -	"1:\n"
> > +	"1:",
> > +	/* LSE atomics */
> > +	"	mov	%w0, wzr\n"
> > +	"	casa	%w0, %w2, %1\n"
> > +	"	nop")
> >  	: "=&r" (tmp), "+Q" (rw->lock)
> >  	: "r" (0x80000000)
> >  	: "memory");
> > @@ -153,9 +196,10 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
> >  
> >  static inline void arch_write_unlock(arch_rwlock_t *rw)
> >  {
> > -	asm volatile(
> > -	"	stlr	%w1, %0\n"
> > -	: "=Q" (rw->lock) : "r" (0) : "memory");
> > +	asm volatile(ARM64_LSE_ATOMIC_INSN(
> > +	"	stlr	wzr, %0",
> > +	"	swpl	wzr, wzr, %0")
> > +	: "=Q" (rw->lock) :: "memory");
> 
> Is this any better than just STLR? We don't need the memory read.

Again, I'm trying to avoid explicit dirtying at L1.

Will

next prev parent reply	other threads:[~2015-07-21 17:29 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-13  9:25 [PATCH 00/18] arm64: support for 8.1 LSE atomic instructions Will Deacon
2015-07-13  9:25 ` [PATCH 01/18] arm64: cpufeature.h: add missing #include of kernel.h Will Deacon
2015-07-13  9:25 ` [PATCH 02/18] arm64: atomics: move ll/sc atomics into separate header file Will Deacon
2015-07-13  9:25 ` [PATCH 03/18] arm64: elf: advertise 8.1 atomic instructions as new hwcap Will Deacon
2015-07-17 13:48   ` Catalin Marinas
2015-07-17 13:57     ` Russell King - ARM Linux
2015-07-13  9:25 ` [PATCH 04/18] arm64: alternatives: add cpu feature for lse atomics Will Deacon
2015-07-13  9:25 ` [PATCH 05/18] arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics Will Deacon
2015-07-17 16:32   ` Catalin Marinas
2015-07-17 17:25     ` Will Deacon
2015-07-13  9:25 ` [PATCH 06/18] arm64: atomics: patch in lse instructions when supported by the CPU Will Deacon
2015-07-13  9:25 ` [PATCH 07/18] arm64: locks: " Will Deacon
2015-07-21 16:53   ` Catalin Marinas
2015-07-21 17:29     ` Will Deacon [this message]
2015-07-23 13:39       ` Will Deacon
2015-07-23 14:14         ` Catalin Marinas
2015-07-13  9:25 ` [PATCH 08/18] arm64: bitops: " Will Deacon
2015-07-13  9:25 ` [PATCH 09/18] arm64: xchg: " Will Deacon
2015-07-13  9:25 ` [PATCH 10/18] arm64: cmpxchg: " Will Deacon
2015-07-13  9:25 ` [PATCH 11/18] arm64: cmpxchg_dbl: " Will Deacon
2015-07-13  9:25 ` [PATCH 12/18] arm64: cmpxchg: avoid "cc" clobber in ll/sc routines Will Deacon
2015-07-21 17:16   ` Catalin Marinas
2015-07-21 17:32     ` Will Deacon
2015-07-13  9:25 ` [PATCH 13/18] arm64: cmpxchg: avoid memory barrier on comparison failure Will Deacon
2015-07-13 10:28   ` Peter Zijlstra
2015-07-13 11:22     ` Will Deacon
2015-07-13 13:39       ` Peter Zijlstra
2015-07-13 14:52         ` Will Deacon
2015-07-13 15:32           ` Peter Zijlstra
2015-07-13 15:58             ` Will Deacon
2015-08-03 16:59               ` [tip:locking/core] locking/pvqspinlock: Order pv_unhash() after cmpxchg() on unlock slowpath tip-bot for Will Deacon
2015-07-13  9:25 ` [PATCH 14/18] arm64: atomics: tidy up common atomic{,64}_* macros Will Deacon
2015-07-13  9:25 ` [PATCH 15/18] arm64: atomics: prefetch the destination word for write prior to stxr Will Deacon
2015-07-13  9:25 ` [PATCH 16/18] arm64: atomics: implement atomic{, 64}_cmpxchg using cmpxchg Will Deacon
2015-07-13  9:25 ` [PATCH 17/18] arm64: atomic64_dec_if_positive: fix incorrect branch condition Will Deacon
2015-07-13  9:25 ` [PATCH 18/18] arm64: kconfig: select HAVE_CMPXCHG_LOCAL Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150721172918.GO31095@arm.com \
    --to=will.deacon@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.