Re: [PATCH 2/2] xen: arm: update arm32 assembly primitives to Linux v3.16-rc6

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Julien Grall <julien.grall@linaro.org>
To: Ian Campbell <ian.campbell@citrix.com>, xen-devel@lists.xen.org
Cc: tim@xen.org, stefano.stabellini@eu.citrix.com
Subject: Re: [PATCH 2/2] xen: arm: update arm32 assembly primitives to Linux v3.16-rc6
Date: Fri, 25 Jul 2014 16:42:43 +0100	[thread overview]
Message-ID: <53D27AF3.5070706@linaro.org> (raw)
In-Reply-To: <2c06427f1180cf408a3e9750de3040dde0afe2ea.1406301772.git.ian.campbell@citrix.com>

Hi Ian,

On 07/25/2014 04:22 PM, Ian Campbell wrote:
> bitops, cmpxchg, atomics: Import:
>   c32ffce ARM: 7984/1: prefetch: add prefetchw invocations for barriered atomics

Compare to Linux we don't have specific prefetch* helpers. We directly
use the compiler builtin ones. Shouldn't we import the ARM specific
helpers to gain in performance?

Regards,

>     Author: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> atomics: In addition to the above import:
>   db38ee8 ARM: 7983/1: atomics: implement a better __atomic_add_unless for v6+
>     Author: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> spinlocks: We have diverged from Linux, so no updates but note this in the README.
> 
> mem* and str*: Import:
>   d98b90e ARM: 7990/1: asm: rename logical shift macros push pull into lspush lspull
>     Author: Victor Kamensky <victor.kamensky@linaro.org>
>     Suggested-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
>     Acked-by: Nicolas Pitre <nico@linaro.org>
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
>   For some reason str* were mentioned under mem* in the README, fix.
> 
> libgcc: No changes, update baseline
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/README.LinuxPrimitives    |   17 +++++++--------
>  xen/arch/arm/arm32/lib/assembler.h     |    8 +++----
>  xen/arch/arm/arm32/lib/bitops.h        |    5 +++++
>  xen/arch/arm/arm32/lib/copy_template.S |   36 ++++++++++++++++----------------
>  xen/arch/arm/arm32/lib/memmove.S       |   36 ++++++++++++++++----------------
>  xen/include/asm-arm/arm32/atomic.h     |   32 ++++++++++++++++++++++++++++
>  xen/include/asm-arm/arm32/cmpxchg.h    |    5 +++++
>  7 files changed, 90 insertions(+), 49 deletions(-)
> 
> diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
> index 69eeb70..7e15b04 100644
> --- a/xen/arch/arm/README.LinuxPrimitives
> +++ b/xen/arch/arm/README.LinuxPrimitives
> @@ -65,7 +65,7 @@ linux/arch/arm64/lib/copy_page.S        unused in Xen
>  arm32
>  =====================================================================
>  
> -bitops: last sync @ v3.14-rc7 (last commit: b7ec699)
> +bitops: last sync @ v3.16-rc6 (last commit: c32ffce0f66e)
>  
>  linux/arch/arm/lib/bitops.h             xen/arch/arm/arm32/lib/bitops.h
>  linux/arch/arm/lib/changebit.S          xen/arch/arm/arm32/lib/changebit.S
> @@ -83,13 +83,13 @@ done
>  
>  ---------------------------------------------------------------------
>  
> -cmpxchg: last sync @ v3.14-rc7 (last commit: 775ebcc)
> +cmpxchg: last sync @ v3.16-rc6 (last commit: c32ffce0f66e)
>  
>  linux/arch/arm/include/asm/cmpxchg.h    xen/include/asm-arm/arm32/cmpxchg.h
>  
>  ---------------------------------------------------------------------
>  
> -atomics: last sync @ v3.14-rc7 (last commit: aed3a4e)
> +atomics: last sync @ v3.16-rc6 (last commit: 030d0178bdbd)
>  
>  linux/arch/arm/include/asm/atomic.h     xen/include/asm-arm/arm32/atomic.h
>  
> @@ -99,6 +99,8 @@ spinlocks: last sync: 15e7e5c1ebf5
>  
>  linux/arch/arm/include/asm/spinlock.h   xen/include/asm-arm/arm32/spinlock.h
>  
> +*** Linux has switched to ticket locks but we still use bitlocks.
> +
>  resync to v3.14-rc7:
>  
>    7c8746a ARM: 7955/1: spinlock: ensure we have a compiler barrier before sev
> @@ -111,7 +113,7 @@ resync to v3.14-rc7:
>  
>  ---------------------------------------------------------------------
>  
> -mem*: last sync @ v3.14-rc7 (last commit: 418df63a)
> +mem*: last sync @ v3.16-rc6 (last commit: d98b90ea22b0)
>  
>  linux/arch/arm/lib/copy_template.S      xen/arch/arm/arm32/lib/copy_template.S
>  linux/arch/arm/lib/memchr.S             xen/arch/arm/arm32/lib/memchr.S
> @@ -120,9 +122,6 @@ linux/arch/arm/lib/memmove.S            xen/arch/arm/arm32/lib/memmove.S
>  linux/arch/arm/lib/memset.S             xen/arch/arm/arm32/lib/memset.S
>  linux/arch/arm/lib/memzero.S            xen/arch/arm/arm32/lib/memzero.S
>  
> -linux/arch/arm/lib/strchr.S             xen/arch/arm/arm32/lib/strchr.S
> -linux/arch/arm/lib/strrchr.S            xen/arch/arm/arm32/lib/strrchr.S
> -
>  for i in copy_template.S memchr.S memcpy.S memmove.S memset.S \
>           memzero.S ; do
>      diff -u linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i
> @@ -130,7 +129,7 @@ done
>  
>  ---------------------------------------------------------------------
>  
> -str*: last sync @ v3.13-rc7 (last commit: 93ed397)
> +str*: last sync @ v3.16-rc6 (last commit: d98b90ea22b0)
>  
>  linux/arch/arm/lib/strchr.S             xen/arch/arm/arm32/lib/strchr.S
>  linux/arch/arm/lib/strrchr.S            xen/arch/arm/arm32/lib/strrchr.S
> @@ -145,7 +144,7 @@ clear_page == memset
>  
>  ---------------------------------------------------------------------
>  
> -libgcc: last sync @ v3.14-rc7 (last commit: 01885bc)
> +libgcc: last sync @ v3.16-rc6 (last commit: 01885bc)
>  
>  linux/arch/arm/lib/lib1funcs.S          xen/arch/arm/arm32/lib/lib1funcs.S
>  linux/arch/arm/lib/lshrdi3.S            xen/arch/arm/arm32/lib/lshrdi3.S
> diff --git a/xen/arch/arm/arm32/lib/assembler.h b/xen/arch/arm/arm32/lib/assembler.h
> index f8d4b3a..6de2638 100644
> --- a/xen/arch/arm/arm32/lib/assembler.h
> +++ b/xen/arch/arm/arm32/lib/assembler.h
> @@ -36,8 +36,8 @@
>   * Endian independent macros for shifting bytes within registers.
>   */
>  #ifndef __ARMEB__
> -#define pull            lsr
> -#define push            lsl
> +#define lspull          lsr
> +#define lspush          lsl
>  #define get_byte_0      lsl #0
>  #define get_byte_1	lsr #8
>  #define get_byte_2	lsr #16
> @@ -47,8 +47,8 @@
>  #define put_byte_2	lsl #16
>  #define put_byte_3	lsl #24
>  #else
> -#define pull            lsl
> -#define push            lsr
> +#define lspull          lsl
> +#define lspush          lsr
>  #define get_byte_0	lsr #24
>  #define get_byte_1	lsr #16
>  #define get_byte_2	lsr #8
> diff --git a/xen/arch/arm/arm32/lib/bitops.h b/xen/arch/arm/arm32/lib/bitops.h
> index 25784c3..a167c2d 100644
> --- a/xen/arch/arm/arm32/lib/bitops.h
> +++ b/xen/arch/arm/arm32/lib/bitops.h
> @@ -37,6 +37,11 @@ UNWIND(	.fnstart	)
>  	add	r1, r1, r0, lsl #2	@ Get word offset
>  	mov	r3, r2, lsl r3		@ create mask
>  	smp_dmb
> +#if __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
> +	.arch_extension	mp
> +	ALT_SMP(W(pldw)	[r1])
> +	ALT_UP(W(nop))
> +#endif
>  1:	ldrex	r2, [r1]
>  	ands	r0, r2, r3		@ save old value of bit
>  	\instr	r2, r2, r3		@ toggle bit
> diff --git a/xen/arch/arm/arm32/lib/copy_template.S b/xen/arch/arm/arm32/lib/copy_template.S
> index 805e3f8..3bc8eb8 100644
> --- a/xen/arch/arm/arm32/lib/copy_template.S
> +++ b/xen/arch/arm/arm32/lib/copy_template.S
> @@ -197,24 +197,24 @@
>  
>  12:	PLD(	pld	[r1, #124]		)
>  13:		ldr4w	r1, r4, r5, r6, r7, abort=19f
> -		mov	r3, lr, pull #\pull
> +		mov	r3, lr, lspull #\pull
>  		subs	r2, r2, #32
>  		ldr4w	r1, r8, r9, ip, lr, abort=19f
> -		orr	r3, r3, r4, push #\push
> -		mov	r4, r4, pull #\pull
> -		orr	r4, r4, r5, push #\push
> -		mov	r5, r5, pull #\pull
> -		orr	r5, r5, r6, push #\push
> -		mov	r6, r6, pull #\pull
> -		orr	r6, r6, r7, push #\push
> -		mov	r7, r7, pull #\pull
> -		orr	r7, r7, r8, push #\push
> -		mov	r8, r8, pull #\pull
> -		orr	r8, r8, r9, push #\push
> -		mov	r9, r9, pull #\pull
> -		orr	r9, r9, ip, push #\push
> -		mov	ip, ip, pull #\pull
> -		orr	ip, ip, lr, push #\push
> +		orr	r3, r3, r4, lspush #\push
> +		mov	r4, r4, lspull #\pull
> +		orr	r4, r4, r5, lspush #\push
> +		mov	r5, r5, lspull #\pull
> +		orr	r5, r5, r6, lspush #\push
> +		mov	r6, r6, lspull #\pull
> +		orr	r6, r6, r7, lspush #\push
> +		mov	r7, r7, lspull #\pull
> +		orr	r7, r7, r8, lspush #\push
> +		mov	r8, r8, lspull #\pull
> +		orr	r8, r8, r9, lspush #\push
> +		mov	r9, r9, lspull #\pull
> +		orr	r9, r9, ip, lspush #\push
> +		mov	ip, ip, lspull #\pull
> +		orr	ip, ip, lr, lspush #\push
>  		str8w	r0, r3, r4, r5, r6, r7, r8, r9, ip, , abort=19f
>  		bge	12b
>  	PLD(	cmn	r2, #96			)
> @@ -225,10 +225,10 @@
>  14:		ands	ip, r2, #28
>  		beq	16f
>  
> -15:		mov	r3, lr, pull #\pull
> +15:		mov	r3, lr, lspull #\pull
>  		ldr1w	r1, lr, abort=21f
>  		subs	ip, ip, #4
> -		orr	r3, r3, lr, push #\push
> +		orr	r3, r3, lr, lspush #\push
>  		str1w	r0, r3, abort=21f
>  		bgt	15b
>  	CALGN(	cmp	r2, #0			)
> diff --git a/xen/arch/arm/arm32/lib/memmove.S b/xen/arch/arm/arm32/lib/memmove.S
> index 4e142b8..18634c3 100644
> --- a/xen/arch/arm/arm32/lib/memmove.S
> +++ b/xen/arch/arm/arm32/lib/memmove.S
> @@ -148,24 +148,24 @@ ENTRY(memmove)
>  
>  12:	PLD(	pld	[r1, #-128]		)
>  13:		ldmdb   r1!, {r7, r8, r9, ip}
> -		mov     lr, r3, push #\push
> +		mov     lr, r3, lspush #\push
>  		subs    r2, r2, #32
>  		ldmdb   r1!, {r3, r4, r5, r6}
> -		orr     lr, lr, ip, pull #\pull
> -		mov     ip, ip, push #\push
> -		orr     ip, ip, r9, pull #\pull
> -		mov     r9, r9, push #\push
> -		orr     r9, r9, r8, pull #\pull
> -		mov     r8, r8, push #\push
> -		orr     r8, r8, r7, pull #\pull
> -		mov     r7, r7, push #\push
> -		orr     r7, r7, r6, pull #\pull
> -		mov     r6, r6, push #\push
> -		orr     r6, r6, r5, pull #\pull
> -		mov     r5, r5, push #\push
> -		orr     r5, r5, r4, pull #\pull
> -		mov     r4, r4, push #\push
> -		orr     r4, r4, r3, pull #\pull
> +		orr     lr, lr, ip, lspull #\pull
> +		mov     ip, ip, lspush #\push
> +		orr     ip, ip, r9, lspull #\pull
> +		mov     r9, r9, lspush #\push
> +		orr     r9, r9, r8, lspull #\pull
> +		mov     r8, r8, lspush #\push
> +		orr     r8, r8, r7, lspull #\pull
> +		mov     r7, r7, lspush #\push
> +		orr     r7, r7, r6, lspull #\pull
> +		mov     r6, r6, lspush #\push
> +		orr     r6, r6, r5, lspull #\pull
> +		mov     r5, r5, lspush #\push
> +		orr     r5, r5, r4, lspull #\pull
> +		mov     r4, r4, lspush #\push
> +		orr     r4, r4, r3, lspull #\pull
>  		stmdb   r0!, {r4 - r9, ip, lr}
>  		bge	12b
>  	PLD(	cmn	r2, #96			)
> @@ -176,10 +176,10 @@ ENTRY(memmove)
>  14:		ands	ip, r2, #28
>  		beq	16f
>  
> -15:		mov     lr, r3, push #\push
> +15:		mov     lr, r3, lspush #\push
>  		ldr	r3, [r1, #-4]!
>  		subs	ip, ip, #4
> -		orr	lr, lr, r3, pull #\pull
> +		orr	lr, lr, r3, lspull #\pull
>  		str	lr, [r0, #-4]!
>  		bgt	15b
>  	CALGN(	cmp	r2, #0			)
> diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
> index 3d601d1..7ec712f 100644
> --- a/xen/include/asm-arm/arm32/atomic.h
> +++ b/xen/include/asm-arm/arm32/atomic.h
> @@ -39,6 +39,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
>  	int result;
>  
>  	smp_mb();
> +	prefetchw(&v->counter);
>  
>  	__asm__ __volatile__("@ atomic_add_return\n"
>  "1:	ldrex	%0, [%3]\n"
> @@ -78,6 +79,7 @@ static inline int atomic_sub_return(int i, atomic_t *v)
>  	int result;
>  
>  	smp_mb();
> +	prefetchw(&v->counter);
>  
>  	__asm__ __volatile__("@ atomic_sub_return\n"
>  "1:	ldrex	%0, [%3]\n"
> @@ -100,6 +102,7 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
>  	unsigned long res;
>  
>  	smp_mb();
> +	prefetchw(&ptr->counter);
>  
>  	do {
>  		__asm__ __volatile__("@ atomic_cmpxchg\n"
> @@ -117,6 +120,35 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
>  	return oldval;
>  }
>  
> +static inline int __atomic_add_unless(atomic_t *v, int a, int u)
> +{
> +	int oldval, newval;
> +	unsigned long tmp;
> +
> +	smp_mb();
> +	prefetchw(&v->counter);
> +
> +	__asm__ __volatile__ ("@ atomic_add_unless\n"
> +"1:	ldrex	%0, [%4]\n"
> +"	teq	%0, %5\n"
> +"	beq	2f\n"
> +"	add	%1, %0, %6\n"
> +"	strex	%2, %1, [%4]\n"
> +"	teq	%2, #0\n"
> +"	bne	1b\n"
> +"2:"
> +	: "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
> +	: "r" (&v->counter), "r" (u), "r" (a)
> +	: "cc");
> +
> +	if (oldval != u)
> +		smp_mb();
> +
> +	return oldval;
> +}
> +
> +#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
> +
>  #define atomic_inc(v)		atomic_add(1, v)
>  #define atomic_dec(v)		atomic_sub(1, v)
>  
> diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
> index 9a511f2..03e0bed 100644
> --- a/xen/include/asm-arm/arm32/cmpxchg.h
> +++ b/xen/include/asm-arm/arm32/cmpxchg.h
> @@ -1,6 +1,8 @@
>  #ifndef __ASM_ARM32_CMPXCHG_H
>  #define __ASM_ARM32_CMPXCHG_H
>  
> +#include <xen/prefetch.h>
> +
>  extern void __bad_xchg(volatile void *, int);
>  
>  static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
> @@ -9,6 +11,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>  	unsigned int tmp;
>  
>  	smp_mb();
> +	prefetchw((const void *)ptr);
>  
>  	switch (size) {
>  	case 1:
> @@ -56,6 +59,8 @@ static always_inline unsigned long __cmpxchg(
>  {
>  	unsigned long oldval, res;
>  
> +	prefetchw((const void *)ptr);
> +
>  	switch (size) {
>  	case 1:
>  		do {
> 


-- 
Julien Grall

next prev parent reply	other threads:[~2014-07-25 15:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-25 15:22 [PATCH 1/2] xen: arm: update arm64 assembly primitives to Linux v3.16-rc6 Ian Campbell
2014-07-25 15:22 ` [PATCH 2/2] xen: arm: update arm32 " Ian Campbell
2014-07-25 15:42   ` Julien Grall [this message]
2014-07-25 15:48     ` Ian Campbell
2014-07-25 15:48       ` Julien Grall
2014-07-25 16:03         ` Ian Campbell
2014-07-25 16:13           ` Ian Campbell
2014-07-25 16:20             ` Julien Grall
2014-07-25 16:17           ` Julien Grall
2014-07-25 16:23             ` Ian Campbell
2014-07-25 15:36 ` [PATCH 1/2] xen: arm: update arm64 " Julien Grall
2014-08-04 16:16   ` Ian Campbell
2014-07-25 15:43 ` Ian Campbell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53D27AF3.5070706@linaro.org \
    --to=julien.grall@linaro.org \
    --cc=ian.campbell@citrix.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.