All of lore.kernel.org
 help / color / mirror / Atom feed
From: steve.capper@linaro.org (Steve Capper)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: percpu: Implement this_cpu operations
Date: Fri, 7 Nov 2014 13:52:06 +0000	[thread overview]
Message-ID: <20141107135205.GA7591@linaro.org> (raw)
In-Reply-To: <20141106122753.GC31605@arm.com>

On Thu, Nov 06, 2014 at 12:27:53PM +0000, Will Deacon wrote:
> Hi Steve,
> 
> Thanks for looking at this!

Hey Will,
No problem, it's quite beneficial for performance.

> 
> On Thu, Nov 06, 2014 at 11:12:57AM +0000, Steve Capper wrote:
> > The generic this_cpu operations disable interrupts to ensure that the
> > requested operation is protected from pre-emption. For arm64, this is
> > overkill and can hurt throughput and latency.
> > 
> > This patch provides arm64 specific implementations for the this_cpu
> > operations. Rather than disable interrupts, we use the exclusive
> > monitor or atomic operations as appropriate.
> > 
> > The following operations are implemented: add, add_return, and, or,
> > read, write, xchg. We also wire up a cmpxchg implementation from
> > cmpxchg.h.
> > 
> > Testing was performed using the percpu_test module and hackbench on a
> > Juno board running 3.18-rc3.
> 
> [...]
> 
> > diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
> > index 3e02245..3e51f49 100644
> > --- a/arch/arm64/include/asm/cmpxchg.h
> > +++ b/arch/arm64/include/asm/cmpxchg.h
> > @@ -237,8 +237,10 @@ static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
> >  	__ret; \
> >  })
> >  
> > -#define this_cpu_cmpxchg_8(ptr, o, n) \
> > -	cmpxchg(raw_cpu_ptr(&(ptr)), o, n);
> > +#define this_cpu_cmpxchg_1(ptr, o, n) cmpxchg(raw_cpu_ptr(&(ptr)), o, n)
> > +#define this_cpu_cmpxchg_2(ptr, o, n) cmpxchg(raw_cpu_ptr(&(ptr)), o, n)
> > +#define this_cpu_cmpxchg_4(ptr, o, n) cmpxchg(raw_cpu_ptr(&(ptr)), o, n)
> > +#define this_cpu_cmpxchg_8(ptr, o, n) cmpxchg(raw_cpu_ptr(&(ptr)), o, n)
> 
> You can use cmpxchg_local here, as we don't require barrier semantics.

Agreed, thanks, I will update that.

> 
> >  #define this_cpu_cmpxchg_double_8(ptr1, ptr2, o1, o2, n1, n2) \
> >  	cmpxchg_double(raw_cpu_ptr(&(ptr1)), raw_cpu_ptr(&(ptr2)), \
> > diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
> > index 5279e57..e751681 100644
> > --- a/arch/arm64/include/asm/percpu.h
> > +++ b/arch/arm64/include/asm/percpu.h
> > @@ -44,6 +44,237 @@ static inline unsigned long __my_cpu_offset(void)
> >  
> >  #endif /* CONFIG_SMP */
> >  
> > +#define PERCPU_OP(op, asm_op)						\
> > +static inline unsigned long __percpu_##op(void *ptr,			\
> > +			unsigned long val, int size)			\
> > +{									\
> > +	unsigned long loop, ret;					\
> > +									\
> > +	switch (size) {							\
> > +	case 1:								\
> > +		do {							\
> > +			asm ("//__per_cpu_" #op "_1\n"			\
> > +			"ldxrb	  %w[ret], %[ptr]\n"			\
> > +			#asm_op " %w[ret], %w[ret], %w[val]\n"		\
> > +			"stxrb	  %w[loop], %w[ret], %[ptr]\n"		\
> > +			: [loop] "=&r" (loop), [ret] "=&r" (ret),	\
> > +			  [ptr] "+Q"(*(u8 *)ptr)			\
> > +			: [val] "Ir" (val));				\
> > +		} while (loop);						\
> > +		break;							\
> 
> Curious, but do you see any difference in code generation over an explicit
> cbnz, like we use in the ATOMIC_OP macro?

I've not noticed any substancial difference in the code paths I've
inspected. Theoretically, a compiler that is extremely averse to
branches could do some unrolling with this.

I decided to give the compiler as much control as possible, so elected
to put the minimum amount of assembler in.

> 
> > +	case 2:								\
> > +		do {							\
> > +			asm ("//__per_cpu_" #op "_2\n"			\
> > +			"ldxrh	  %w[ret], %[ptr]\n"			\
> > +			#asm_op " %w[ret], %w[ret], %w[val]\n"		\
> > +			"stxrh	  %w[loop], %w[ret], %[ptr]\n"		\
> > +			: [loop] "=&r" (loop), [ret] "=&r" (ret),	\
> > +			  [ptr]  "+Q"(*(u16 *)ptr)			\
> > +			: [val] "Ir" (val));				\
> > +		} while (loop);						\
> > +		break;							\
> > +	case 4:								\
> > +		do {							\
> > +			asm ("//__per_cpu_" #op "_4\n"			\
> > +			"ldxr	  %w[ret], %[ptr]\n"			\
> > +			#asm_op " %w[ret], %w[ret], %w[val]\n"		\
> > +			"stxr	  %w[loop], %w[ret], %[ptr]\n"		\
> > +			: [loop] "=&r" (loop), [ret] "=&r" (ret),	\
> > +			  [ptr] "+Q"(*(u32 *)ptr)			\
> > +			: [val] "Ir" (val));				\
> > +		} while (loop);						\
> > +		break;							\
> > +	case 8:								\
> > +		do {							\
> > +			asm ("//__per_cpu_" #op "_8\n"			\
> > +			"ldxr	  %[ret], %[ptr]\n"			\
> > +			#asm_op " %[ret], %[ret], %[val]\n"		\
> > +			"stxr	  %w[loop], %[ret], %[ptr]\n"		\
> > +			: [loop] "=&r" (loop), [ret] "=&r" (ret),	\
> > +			  [ptr] "+Q"(*(u64 *)ptr)			\
> > +			: [val] "Ir" (val));				\
> > +		} while (loop);						\
> > +		break;							\
> > +	default:							\
> > +		BUILD_BUG();						\
> > +	}								\
> > +									\
> > +	return ret;							\
> > +}
> > +
> > +PERCPU_OP(add, add)
> > +PERCPU_OP(and, and)
> > +PERCPU_OP(or, orr)
> 
> Can you use these to generate local_t versions too?

Sure, I forgot about them, I will add them to V3.

Cheers,
-- 
Steve

  reply	other threads:[~2014-11-07 13:52 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-06 11:12 [PATCH] arm64: percpu: Implement this_cpu operations Steve Capper
2014-11-06 11:26 ` Ard Biesheuvel
2014-11-06 11:52   ` Steve Capper
2014-11-06 12:23     ` [PATCH V2] " Steve Capper
2014-11-06 12:27 ` [PATCH] " Will Deacon
2014-11-07 13:52   ` Steve Capper [this message]
2014-11-14 11:37     ` [PATCH V3] " Steve Capper
2014-11-14 13:46       ` Will Deacon
2014-11-14 15:03         ` [PATCH V4] " Steve Capper
2014-11-17 10:40           ` Will Deacon
2014-11-19 15:49             ` Steve Capper
2014-11-19 16:53               ` [PATCH V5] " Steve Capper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141107135205.GA7591@linaro.org \
    --to=steve.capper@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.