All of lore.kernel.org
 help / color / mirror / Atom feed
From: robherring2@gmail.com (Rob Herring)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] ARM: implement optimized percpu variable access
Date: Fri, 23 Nov 2012 11:06:07 -0600	[thread overview]
Message-ID: <50AFACFF.4000907@gmail.com> (raw)
In-Reply-To: <20121122113401.GC3113@mudshark.cambridge.arm.com>

On 11/22/2012 05:34 AM, Will Deacon wrote:
> Hi Rob,
> 
> On Sun, Nov 11, 2012 at 03:20:40AM +0000, Rob Herring wrote:
>> From: Rob Herring <rob.herring@calxeda.com>
>>
>> Use the previously unused TPIDRPRW register to store percpu offsets.
>> TPIDRPRW is only accessible in PL1, so it can only be used in the kernel.
>>
>> This saves 2 loads for each percpu variable access which should yield
>> improved performance, but the improvement has not been quantified.
>>
>> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
>> ---
>>  arch/arm/include/asm/Kbuild   |    1 -
>>  arch/arm/include/asm/percpu.h |   44 +++++++++++++++++++++++++++++++++++++++++
>>  arch/arm/kernel/smp.c         |    3 +++
>>  3 files changed, 47 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/arm/include/asm/percpu.h
> 
> Russell pointed out to me that this patch will break on v6 CPUs if they don't
> have the thread ID registers and we're running with SMP_ON_UP=y. Looking at
> the TRMs, the only case we care about is 1136 < r1p0, but it does indeed break
> there (I have a board on my desk).

Are there any non ARM Ltd. cores without v6K we need to worry about? I
wouldn't think there are many 1136 < r1p0 out there (your desk being an
obvious exception).

> There are a few ways to fix this:
> 
> (1) Use the SMP alternates to patch the code when running on UP systems. I
>     tried this and the code is absolutely diabolical (see below).
> 
> (2) Rely on the registers being RAZ/WI rather than undef (which seems to be
>     the case on my board) and add on the pcpu delta manually. This is also
>     really horrible.
> 
> (3) Just make the thing depend on __LINUX_ARM_ARCH__ >= 7. Yes, we lose on
>     11MPCore, but we win on A8 and the code is much, much simpler.

I would lean towards this option. It really only has to depend on v6K
and !v6. We can refine the multi-platform selection to allow v7 and v6K
only builds in addition to v7 and v6. I think we are going to have
enough optimizations with v7 (gcc optimizations, thumb2, unaligned
accesses, etc.) that most people will do v7 only builds.

Rob

> 
> As an aside, you also need to make the asm block volatile in
> __my_cpu_offset -- I can see it being re-ordered before the set for
> secondary CPUs otherwise.
> 
> Will
> 
> --->8
> 
> From b12ab049b864c2b6f0be1558c934d7213871b223 Mon Sep 17 00:00:00 2001
> From: Will Deacon <will.deacon@arm.com>
> Date: Wed, 21 Nov 2012 10:53:28 +0000
> Subject: [PATCH 1/2] ARM: smp: move SMP_ON_UP alternate macros to common
>  header file
> 
> The ALT_{UP,SMP} macros are used to patch instructions at runtime
> depending on whether we are running on an SMP platform or not. This is
> generally done in out-of-line assembly code, but there is also the need
> for this functionality in inline assembly (e.g. spinlocks).
> 
> This patch moves the macros into their own header file, which provides
> the correct definitions depending on __ASSEMBLY__.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm/include/asm/assembler.h | 29 +------------------
>  arch/arm/include/asm/smp_on_up.h | 60 ++++++++++++++++++++++++++++++++++++++++
>  arch/arm/include/asm/spinlock.h  | 21 +++++---------
>  arch/arm/kernel/head.S           |  1 +
>  arch/arm/kernel/module.c         |  8 ++----
>  5 files changed, 71 insertions(+), 48 deletions(-)
>  create mode 100644 arch/arm/include/asm/smp_on_up.h
> 
> diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
> index 2ef9581..7da33e0 100644
> --- a/arch/arm/include/asm/assembler.h
> +++ b/arch/arm/include/asm/assembler.h
> @@ -23,6 +23,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/domain.h>
>  #include <asm/opcodes-virt.h>
> +#include <asm/smp_on_up.h>
>  
>  #define IOMEM(x)	(x)
>  
> @@ -166,34 +167,6 @@
>  	.long	9999b,9001f;			\
>  	.popsection
>  
> -#ifdef CONFIG_SMP
> -#define ALT_SMP(instr...)					\
> -9998:	instr
> -/*
> - * Note: if you get assembler errors from ALT_UP() when building with
> - * CONFIG_THUMB2_KERNEL, you almost certainly need to use
> - * ALT_SMP( W(instr) ... )
> - */
> -#define ALT_UP(instr...)					\
> -	.pushsection ".alt.smp.init", "a"			;\
> -	.long	9998b						;\
> -9997:	instr							;\
> -	.if . - 9997b != 4					;\
> -		.error "ALT_UP() content must assemble to exactly 4 bytes";\
> -	.endif							;\
> -	.popsection
> -#define ALT_UP_B(label)					\
> -	.equ	up_b_offset, label - 9998b			;\
> -	.pushsection ".alt.smp.init", "a"			;\
> -	.long	9998b						;\
> -	W(b)	. + up_b_offset					;\
> -	.popsection
> -#else
> -#define ALT_SMP(instr...)
> -#define ALT_UP(instr...) instr
> -#define ALT_UP_B(label) b label
> -#endif
> -
>  /*
>   * Instruction barrier
>   */
> diff --git a/arch/arm/include/asm/smp_on_up.h b/arch/arm/include/asm/smp_on_up.h
> new file mode 100644
> index 0000000..cc9c527
> --- /dev/null
> +++ b/arch/arm/include/asm/smp_on_up.h
> @@ -0,0 +1,60 @@
> +#ifndef __ASM_ARM_SMP_ON_UP_H_
> +#define __ASM_ARM_SMP_ON_UP_H_
> +
> +#ifndef __ASSEMBLY__
> +#ifdef CONFIG_SMP_ON_UP
> +extern int fixup_smp(const void *addr, unsigned long size);
> +#else
> +static inline int fixup_smp(const void *addr, unsigned long size)
> +{
> +	return -EINVAL;
> +}
> +#endif	/* CONFIG_SMP_ON_UP */
> +#endif	/* __ASSEMBLY__ */
> +
> +/*
> + * Note: if you get assembler errors from ALT_UP() when building with
> + * CONFIG_THUMB2_KERNEL, you almost certainly need to use
> + * ALT_SMP( W(instr) ... )
> + */
> +#ifdef CONFIG_SMP
> +#ifndef __ASSEMBLY__
> +
> +#define ALT_SMP(instr)						\
> +	"9998:	" instr "\n"
> +
> +#define ALT_UP(instr)						\
> +	"	.pushsection \".alt.smp.init\", \"a\"\n"	\
> +	"	.long	9998b\n"				\
> +	"	" instr "\n"					\
> +	"	.popsection\n"
> +
> +#else
> +
> +#define ALT_SMP(instr...)					\
> +9998:	instr
> +
> +#define ALT_UP(instr...)					\
> +	.pushsection ".alt.smp.init", "a"			;\
> +	.long	9998b						;\
> +9997:	instr							;\
> +	.if . - 9997b != 4					;\
> +		.error "ALT_UP() content must assemble to exactly 4 bytes";\
> +	.endif							;\
> +	.popsection
> +
> +#define ALT_UP_B(label)						\
> +	.equ	up_b_offset, label - 9998b			;\
> +	.pushsection ".alt.smp.init", "a"			;\
> +	.long	9998b						;\
> +	W(b)	. + up_b_offset					;\
> +	.popsection
> +
> +#endif	/* __ASSEMBLY */
> +#else
> +#define ALT_SMP(instr...)
> +#define ALT_UP(instr...) instr
> +#define ALT_UP_B(label) b label
> +#endif	/* CONFIG_SMP */
> +
> +#endif	/* __ASM_ARM_SMP_ON_UP_H_ */
> diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
> index b4ca707..58d2f42 100644
> --- a/arch/arm/include/asm/spinlock.h
> +++ b/arch/arm/include/asm/spinlock.h
> @@ -6,20 +6,14 @@
>  #endif
>  
>  #include <asm/processor.h>
> +#include <asm/smp_on_up.h>
>  
>  /*
>   * sev and wfe are ARMv6K extensions.  Uniprocessor ARMv6 may not have the K
>   * extensions, so when running on UP, we have to patch these instructions away.
>   */
> -#define ALT_SMP(smp, up)					\
> -	"9998:	" smp "\n"					\
> -	"	.pushsection \".alt.smp.init\", \"a\"\n"	\
> -	"	.long	9998b\n"				\
> -	"	" up "\n"					\
> -	"	.popsection\n"
> -
>  #ifdef CONFIG_THUMB2_KERNEL
> -#define SEV		ALT_SMP("sev.w", "nop.w")
> +#define SEV		ALT_SMP("sev.w")	ALT_UP("nop.w")
>  /*
>   * For Thumb-2, special care is needed to ensure that the conditional WFE
>   * instruction really does assemble to exactly 4 bytes (as required by
> @@ -33,13 +27,12 @@
>   */
>  #define WFE(cond)	ALT_SMP(		\
>  	"it " cond "\n\t"			\
> -	"wfe" cond ".n",			\
> -						\
> -	"nop.w"					\
> -)
> +	"wfe" cond ".n")			\
> +			ALT_UP(			\
> +	"nop.w")
>  #else
> -#define SEV		ALT_SMP("sev", "nop")
> -#define WFE(cond)	ALT_SMP("wfe" cond, "nop")
> +#define SEV		ALT_SMP("sev")		ALT_UP("nop")
> +#define WFE(cond)	ALT_SMP("wfe" cond)	ALT_UP("nop")
>  #endif
>  
>  static inline void dsb_sev(void)
> diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
> index 4eee351..c44b51f 100644
> --- a/arch/arm/kernel/head.S
> +++ b/arch/arm/kernel/head.S
> @@ -495,6 +495,7 @@ smp_on_up:
>  	.text
>  __do_fixup_smp_on_up:
>  	cmp	r4, r5
> +	movhs	r0, #0
>  	movhs	pc, lr
>  	ldmia	r4!, {r0, r6}
>   ARM(	str	r6, [r0, r3]	)
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 1e9be5d..f2299d0 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -23,6 +23,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/sections.h>
>  #include <asm/smp_plat.h>
> +#include <asm/smp_on_up.h>
>  #include <asm/unwind.h>
>  
>  #ifdef CONFIG_XIP_KERNEL
> @@ -266,7 +267,6 @@ static const Elf_Shdr *find_mod_section(const Elf32_Ehdr *hdr,
>  }
>  
>  extern void fixup_pv_table(const void *, unsigned long);
> -extern void fixup_smp(const void *, unsigned long);
>  
>  int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
>  		    struct module *mod)
> @@ -323,11 +323,7 @@ int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
>  #endif
>  	s = find_mod_section(hdr, sechdrs, ".alt.smp.init");
>  	if (s && !is_smp())
> -#ifdef CONFIG_SMP_ON_UP
> -		fixup_smp((void *)s->sh_addr, s->sh_size);
> -#else
> -		return -EINVAL;
> -#endif
> +		return fixup_smp((void *)s->sh_addr, s->sh_size);
>  	return 0;
>  }
>  
> 

  parent reply	other threads:[~2012-11-23 17:06 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-11  3:20 [PATCH] ARM: implement optimized percpu variable access Rob Herring
2012-11-12 10:23 ` Will Deacon
2012-11-12 13:03   ` Rob Herring
2012-11-12 13:28     ` Will Deacon
2012-11-12 14:03       ` Rob Herring
2012-11-27 17:29     ` Nicolas Pitre
2012-11-12 14:21   ` Rob Herring
2012-11-12 14:41     ` Will Deacon
2012-11-12 16:51       ` Will Deacon
2012-11-12 21:01         ` Rob Herring
2012-11-13 10:40           ` Will Deacon
2012-11-22 11:34 ` Will Deacon
2012-11-22 11:39   ` Russell King - ARM Linux
2012-11-23 17:06   ` Rob Herring [this message]
2012-11-23 17:12     ` Russell King - ARM Linux
2012-11-23 17:16     ` Will Deacon
2012-11-23 20:34       ` Tony Lindgren
2012-11-23 20:32   ` Tony Lindgren
2012-11-25 18:46   ` Rob Herring
2012-11-26 11:13     ` Will Deacon
2012-11-26 15:15       ` Will Deacon
2012-11-26 17:30         ` Rob Herring
2012-11-27 13:17           ` Will Deacon
2012-11-27 13:26             ` Russell King - ARM Linux
2012-11-26 21:58         ` Jamie Lokier
2012-11-26 23:50           ` Jamie Lokier
2012-11-27  1:02         ` Jamie Lokier
2012-11-27 22:02           ` Rob Herring
2012-11-28 12:34           ` Will Deacon
2012-11-27 17:35         ` Nicolas Pitre
2012-11-27 19:27           ` Nicolas Pitre
2012-11-27 17:19 ` Nicolas Pitre
2012-11-27 19:37   ` Rob Herring
2012-11-27 20:42     ` Rob Herring
2012-11-27 22:02       ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50AFACFF.4000907@gmail.com \
    --to=robherring2@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.