* [PATCH 00/17] xen: arm: resync low level asm primitive from Linux
@ 2014-03-20 15:45 Ian Campbell
  2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
                   ` (16 more replies)
  0 siblings, 17 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Julien Grall, Stefano Stabellini, Keir Fraser, Tim Deegan,
	Jan Beulich

(Jan/Keir -- only the first patch is of interest to you)

The following resyncs the bitops, atomics, cmpxchg and various optimised
library functions (str*, mem*, clear_page) from Linux. It also adds
various additional optimised variants, especially for arm64 which was
lacking them in Linux when we started.

One area which I have skipped is spinlocks: the generic infrastructure
is pretty different between Xen and Linux, so this would need more
thought (it would have included a switch to ticket locks on arm64, for
example).

I've combined multiple Linux changes into a single Xen change where I
thought it made sense, i.e. for smaller changes even if they are
independent, but for large and complicated changes I've kept things
separate.

As part of this I've also reinstated Linux coding style (in particular
the use of hard tabs) to make life easier when comparing things. This
was always the intention but it seems one or two files got accidentally
reindented at some point.

This booted a guest on both Midway and Xgene. I haven't done any actual
perf measurement, having assumed that whoever wrote these for Linux found
them to be worthwhile enough...

Ian.


* [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch()
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 16:12   ` Jan Beulich
  2014-03-20 15:45 ` [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7 Ian Campbell
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, stefano.stabellini, julien.grall, tim,
	Jan Beulich

Quoting Andi Kleen in Linux b483570a13be from 2007:
    gcc 3.2+ supports __builtin_prefetch, so it's possible to use it on all
    architectures. Change the generic fallback in linux/prefetch.h to use it
    instead of noping it out. gcc should do the right thing when the
    architecture doesn't support prefetching

    Undefine the x86-64 inline assembler version and use the fallback.

ARM wants to use the builtins.
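
As a rough illustration of what the generic fallbacks expand to after this
change, here is a minimal sketch in C, assuming a GCC-compatible compiler.
The two macros mirror the patch; the helper names sum_with_prefetch and
fill_with_prefetchw and the lookahead of 16 elements are hypothetical and
are not part of the patch.

```c
#include <stddef.h>

/* Same shape as the generic fallbacks introduced by the patch. */
#define prefetch(x)  __builtin_prefetch(x)
#define prefetchw(x) __builtin_prefetch(x, 1)

/* Hypothetical lookahead distance, in elements (an assumption). */
#define SKETCH_LOOKAHEAD 16

/* Hypothetical helper: sum an array while prefetching ahead for read. */
static long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + SKETCH_LOOKAHEAD < n)
            prefetch(&data[i + SKETCH_LOOKAHEAD]);
        total += data[i];
    }
    return total;
}

/* Hypothetical helper: fill an array while prefetching ahead for write. */
static void fill_with_prefetchw(int *data, size_t n, int value)
{
    for (size_t i = 0; i < n; i++) {
        if (i + SKETCH_LOOKAHEAD < n)
            prefetchw(&data[i + SKETCH_LOOKAHEAD]);
        data[i] = value;
    }
}
```

The builtin is only a hint: on a target without a prefetch instruction the
compiler is expected to emit nothing, which is why it can serve as the
fallback on every architecture.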

Fix a pair of spelling errors, one of which was from Lucas De Marchi in the
Linux tree.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/include/xen/prefetch.h |   13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/xen/include/xen/prefetch.h b/xen/include/xen/prefetch.h
index 8d7d3ff..ba73998 100644
--- a/xen/include/xen/prefetch.h
+++ b/xen/include/xen/prefetch.h
@@ -28,24 +28,17 @@
 	prefetchw(x)	- prefetches the cacheline at "x" for write
 	spin_lock_prefetch(x) - prefectches the spinlock *x for taking
 	
-	there is also PREFETCH_STRIDE which is the architecure-prefered 
+	there is also PREFETCH_STRIDE which is the architecture-preferred
 	"lookahead" size for prefetching streamed operations.
 	
 */
 
-/*
- *	These cannot be do{}while(0) macros. See the mental gymnastics in
- *	the loop macro.
- */
- 
 #ifndef ARCH_HAS_PREFETCH
-#define ARCH_HAS_PREFETCH
-static inline void prefetch(const void *x) {;}
+#define prefetch(x) __builtin_prefetch(x)
 #endif
 
 #ifndef ARCH_HAS_PREFETCHW
-#define ARCH_HAS_PREFETCHW
-static inline void prefetchw(const void *x) {;}
+#define prefetchw(x) __builtin_prefetch(x,1)
 #endif
 
 #ifndef ARCH_HAS_SPINLOCK_PREFETCH
-- 
1.7.10.4


* [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
  2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:13   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics Ian Campbell
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This pulls in the following Linux commits:

commit c36ef4b1762302a493c6cb754073bded084700e2
Author: Will Deacon <will.deacon@arm.com>
Date:   Wed Nov 23 11:28:25 2011 +0100

    ARM: 7171/1: unwind: add unwind directives to bitops assembly macros

    The bitops functions (e.g. _test_and_set_bit) on ARM do not have unwind
    annotations and therefore the kernel cannot backtrace out of them on a
    fatal error (for example, NULL pointer dereference).

    This patch annotates the bitops assembly macros with UNWIND annotations
    so that we can produce a meaningful backtrace on error. Callers of the
    macros are modified to pass their function name as a macro parameter,
    enforcing that the macros are used as standalone function implementations.

    Acked-by: Dave Martin <dave.martin@linaro.org>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

commit d779c07dd72098a7416d907494f958213b7726f3
Author: Will Deacon <will.deacon@arm.com>
Date:   Thu Jun 27 12:01:51 2013 +0100

    ARM: bitops: prefetch the destination word for write prior to strex

    The cost of changing a cacheline from shared to exclusive state can be
    significant, especially when this is triggered by an exclusive store,
    since it may result in having to retry the transaction.

    This patch prefixes our atomic bitops implementation with prefetchw,
    to try and grab the line in exclusive state from the start. The testop
    macro is left alone, since the barrier semantics limit the usefulness
    of prefetching data.

    Acked-by: Nicolas Pitre <nico@linaro.org>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

commit b7ec699405f55667caeb46d96229d75bf33a83ad
Author: Will Deacon <will.deacon@arm.com>
Date:   Tue Nov 19 15:46:11 2013 +0100

    ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP

    Uwe reported a build failure when targetting a NOMMU platform with my
    recent prefetch changes:

      arch/arm/lib/changebit.S: Assembler messages:
      arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
                        not allowed for the current base architecture

    This is due to use of the .arch_extension mp directive immediately prior
    to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
    nothing if !CONFIG_SMP, gas will still choke on the directive.

    This patch fixes the issue by only emitting the sequence (including the
    directive) if CONFIG_SMP=y.

    Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
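
For readers less familiar with the assembly, the word/mask arithmetic and
the prefetch-for-write step described in the commits above can be sketched
in C. This is only an approximation under stated assumptions: it uses the
GCC __atomic builtins instead of an explicit ldrex/strex loop, it omits the
word-alignment assertion, and the names sketch_set_bit and
sketch_test_and_set_bit are hypothetical.

```c
#include <stdint.h>

/* Hypothetical C rendering of the "bitop" macro with "orr". */
static inline void sketch_set_bit(unsigned int nr, uint32_t *base)
{
    uint32_t *word = base + (nr >> 5);        /* word holding the bit */
    uint32_t mask = (uint32_t)1 << (nr & 31); /* bit offset in that word */

    /* Hint that the line is about to be written, as pldw does. */
    __builtin_prefetch(word, 1);
    __atomic_fetch_or(word, mask, __ATOMIC_RELAXED);
}

/*
 * Hypothetical C rendering of the "testop" macro with "orreq, streq".
 * No prefetch here: this variant is fully ordered, and the commit message
 * notes that barriers limit the usefulness of prefetching.
 */
static inline int sketch_test_and_set_bit(unsigned int nr, uint32_t *base)
{
    uint32_t *word = base + (nr >> 5);
    uint32_t mask = (uint32_t)1 << (nr & 31);
    uint32_t old = __atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST);

    return (old & mask) != 0;                 /* previous state of the bit */
}
```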

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/arm32/lib/bitops.h        |   17 +++++++++++++++--
 xen/arch/arm/arm32/lib/changebit.S     |    4 +---
 xen/arch/arm/arm32/lib/clearbit.S      |    4 +---
 xen/arch/arm/arm32/lib/setbit.S        |    4 +---
 xen/arch/arm/arm32/lib/testchangebit.S |    4 +---
 xen/arch/arm/arm32/lib/testclearbit.S  |    4 +---
 xen/arch/arm/arm32/lib/testsetbit.S    |    4 +---
 7 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/arm32/lib/bitops.h b/xen/arch/arm/arm32/lib/bitops.h
index 689f2e8..25784c3 100644
--- a/xen/arch/arm/arm32/lib/bitops.h
+++ b/xen/arch/arm/arm32/lib/bitops.h
@@ -1,13 +1,20 @@
 #include <xen/config.h>
 
 #if __LINUX_ARM_ARCH__ >= 6
-	.macro	bitop, instr
+	.macro	bitop, name, instr
+ENTRY(	\name		)
+UNWIND(	.fnstart	)
 	ands	ip, r1, #3
 	strneb	r1, [ip]		@ assert word-aligned
 	mov	r2, #1
 	and	r3, r0, #31		@ Get bit offset
 	mov	r0, r0, lsr #5
 	add	r1, r1, r0, lsl #2	@ Get word offset
+#if __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
+	.arch_extension	mp
+	ALT_SMP(W(pldw)	[r1])
+	ALT_UP(W(nop))
+#endif
 	mov	r3, r2, lsl r3
 1:	ldrex	r2, [r1]
 	\instr	r2, r2, r3
@@ -15,9 +22,13 @@
 	cmp	r0, #0
 	bne	1b
 	bx	lr
+UNWIND(	.fnend		)
+ENDPROC(\name		)
 	.endm
 
-	.macro	testop, instr, store
+	.macro	testop, name, instr, store
+ENTRY(	\name		)
+UNWIND(	.fnstart	)
 	ands	ip, r1, #3
 	strneb	r1, [ip]		@ assert word-aligned
 	mov	r2, #1
@@ -36,6 +47,8 @@
 	cmp	r0, #0
 	movne	r0, #1
 2:	bx	lr
+UNWIND(	.fnend		)
+ENDPROC(\name		)
 	.endm
 #else
 	.macro	bitop, name, instr
diff --git a/xen/arch/arm/arm32/lib/changebit.S b/xen/arch/arm/arm32/lib/changebit.S
index 62954bc..11f41d2 100644
--- a/xen/arch/arm/arm32/lib/changebit.S
+++ b/xen/arch/arm/arm32/lib/changebit.S
@@ -13,6 +13,4 @@
 #include "bitops.h"
                 .text
 
-ENTRY(_change_bit)
-	bitop	eor
-ENDPROC(_change_bit)
+bitop	_change_bit, eor
diff --git a/xen/arch/arm/arm32/lib/clearbit.S b/xen/arch/arm/arm32/lib/clearbit.S
index 42ce416..1b6a569 100644
--- a/xen/arch/arm/arm32/lib/clearbit.S
+++ b/xen/arch/arm/arm32/lib/clearbit.S
@@ -14,6 +14,4 @@
 #include "bitops.h"
                 .text
 
-ENTRY(_clear_bit)
-	bitop	bic
-ENDPROC(_clear_bit)
+bitop	_clear_bit, bic
diff --git a/xen/arch/arm/arm32/lib/setbit.S b/xen/arch/arm/arm32/lib/setbit.S
index c828851..1f4ef56 100644
--- a/xen/arch/arm/arm32/lib/setbit.S
+++ b/xen/arch/arm/arm32/lib/setbit.S
@@ -13,6 +13,4 @@
 #include "bitops.h"
 	.text
 
-ENTRY(_set_bit)
-	bitop	orr
-ENDPROC(_set_bit)
+bitop	_set_bit, orr
diff --git a/xen/arch/arm/arm32/lib/testchangebit.S b/xen/arch/arm/arm32/lib/testchangebit.S
index a7f527c..7f4635c 100644
--- a/xen/arch/arm/arm32/lib/testchangebit.S
+++ b/xen/arch/arm/arm32/lib/testchangebit.S
@@ -13,6 +13,4 @@
 #include "bitops.h"
                 .text
 
-ENTRY(_test_and_change_bit)
-	testop	eor, str
-ENDPROC(_test_and_change_bit)
+testop	_test_and_change_bit, eor, str
diff --git a/xen/arch/arm/arm32/lib/testclearbit.S b/xen/arch/arm/arm32/lib/testclearbit.S
index 8f39c72..4d4152f 100644
--- a/xen/arch/arm/arm32/lib/testclearbit.S
+++ b/xen/arch/arm/arm32/lib/testclearbit.S
@@ -13,6 +13,4 @@
 #include "bitops.h"
                 .text
 
-ENTRY(_test_and_clear_bit)
-	testop	bicne, strne
-ENDPROC(_test_and_clear_bit)
+testop	_test_and_clear_bit, bicne, strne
diff --git a/xen/arch/arm/arm32/lib/testsetbit.S b/xen/arch/arm/arm32/lib/testsetbit.S
index 1b8d273..54f48f9 100644
--- a/xen/arch/arm/arm32/lib/testsetbit.S
+++ b/xen/arch/arm/arm32/lib/testsetbit.S
@@ -13,6 +13,4 @@
 #include "bitops.h"
                 .text
 
-ENTRY(_test_and_set_bit)
-	testop	orreq, streq
-ENDPROC(_test_and_set_bit)
+testop	_test_and_set_bit, orreq, streq
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
  2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
  2014-03-20 15:45 ` [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7 Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:22   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h Ian Campbell
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Unrelated reads/writes should not pass the xchg.

Provide cmpxchg_local for parity with arm64, although it appears to be unused.
It also helps make the reason for the separation of __cmpxchg_mb more
apparent.

With this our cmpxchg is in sync with Linux v3.14-rc7.
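
To make the distinction between the two variants concrete, here is a
minimal sketch in C using the GCC __atomic builtins rather than the
ldrex/strex implementation in system.h. The sketch_* names are hypothetical,
and the sequentially consistent fences merely stand in for smp_mb(); this is
an illustration of the layering, not the Xen code.

```c
/* Barrier-free compare-and-exchange, standing in for __cmpxchg(). */
static inline unsigned long sketch_cmpxchg_local(unsigned long *ptr,
                                                 unsigned long old,
                                                 unsigned long new_val)
{
    unsigned long expected = old;

    /*
     * On success *ptr becomes new_val and expected still holds old.
     * On failure expected is overwritten with the current value of *ptr.
     * Either way expected ends up holding the value that was observed.
     */
    __atomic_compare_exchange_n(ptr, &expected, new_val, 0,
                                __ATOMIC_RELAXED, __ATOMIC_RELAXED);
    return expected;
}

/* Fully ordered variant, standing in for __cmpxchg_mb(). */
static inline unsigned long sketch_cmpxchg(unsigned long *ptr,
                                           unsigned long old,
                                           unsigned long new_val)
{
    unsigned long ret;

    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* plays the role of smp_mb() */
    ret = sketch_cmpxchg_local(ptr, old, new_val);
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* plays the role of smp_mb() */

    return ret;
}
```

Keeping the barrier-free primitive separate means the ordered version is
just a wrapper, which is the same split the patch makes between __cmpxchg
and __cmpxchg_mb.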

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
We got our cmpxchg implementation from Linux which AFAICS has always had these
additional barriers. I don't recall us having decided that Xen barriers should
not have this property as well, and if we did we were remiss in not adding a
comment etc... If my memory is faulty then I am happy to replace this patch
with one which adds a comment instead.
---
 xen/include/asm-arm/arm32/system.h |   26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
index 9f233fe..dfaa3b6 100644
--- a/xen/include/asm-arm/arm32/system.h
+++ b/xen/include/asm-arm/arm32/system.h
@@ -113,9 +113,29 @@ static always_inline unsigned long __cmpxchg(
     return oldval;
 }
 
-#define cmpxchg(ptr,o,n)                                                \
-    ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
-                                   (unsigned long)(n),sizeof(*(ptr))))
+static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
+					 unsigned long new, int size)
+{
+	unsigned long ret;
+
+	smp_mb();
+	ret = __cmpxchg(ptr, old, new, size);
+	smp_mb();
+
+	return ret;
+}
+
+#define cmpxchg(ptr,o,n)						\
+	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
+					  (unsigned long)(o),		\
+					  (unsigned long)(n),		\
+					  sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n)						\
+	((__typeof__(*(ptr)))__cmpxchg((ptr),				\
+				       (unsigned long)(o),		\
+				       (unsigned long)(n),		\
+				       sizeof(*(ptr))))
 
 #define local_irq_disable() asm volatile ( "cpsid i @ local_irq_disable\n" : : : "cc" )
 #define local_irq_enable()  asm volatile ( "cpsie i @ local_irq_enable\n" : : : "cc" )
-- 
1.7.10.4


* [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (2 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:23   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7 Ian Campbell
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This file is from Linux and the intention was to keep the formatting the same
to make resyncing easier. Put the hard tabs back and adjust the emacs magic to
reflect the desired use of whitespace.

Adjust the 64-bit emacs magic too.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/arm32/atomic.h |  166 ++++++++++++++++++------------------
 xen/include/asm-arm/arm64/atomic.h |    4 +-
 2 files changed, 85 insertions(+), 85 deletions(-)

diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
index 523c745..3f024d4 100644
--- a/xen/include/asm-arm/arm32/atomic.h
+++ b/xen/include/asm-arm/arm32/atomic.h
@@ -18,122 +18,122 @@
  */
 static inline void atomic_add(int i, atomic_t *v)
 {
-        unsigned long tmp;
-        int result;
-
-        __asm__ __volatile__("@ atomic_add\n"
-"1:     ldrex   %0, [%3]\n"
-"       add     %0, %0, %4\n"
-"       strex   %1, %0, [%3]\n"
-"       teq     %1, #0\n"
-"       bne     1b"
-        : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-        : "r" (&v->counter), "Ir" (i)
-        : "cc");
+	unsigned long tmp;
+	int result;
+
+	__asm__ __volatile__("@ atomic_add\n"
+"1:	ldrex	%0, [%3]\n"
+"	add	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
+"	teq	%1, #0\n"
+"	bne	1b"
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+	: "r" (&v->counter), "Ir" (i)
+	: "cc");
 }
 
 static inline int atomic_add_return(int i, atomic_t *v)
 {
-        unsigned long tmp;
-        int result;
+	unsigned long tmp;
+	int result;
 
-        smp_mb();
+	smp_mb();
 
-        __asm__ __volatile__("@ atomic_add_return\n"
-"1:     ldrex   %0, [%3]\n"
-"       add     %0, %0, %4\n"
-"       strex   %1, %0, [%3]\n"
-"       teq     %1, #0\n"
-"       bne     1b"
-        : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-        : "r" (&v->counter), "Ir" (i)
-        : "cc");
+	__asm__ __volatile__("@ atomic_add_return\n"
+"1:	ldrex	%0, [%3]\n"
+"	add	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
+"	teq	%1, #0\n"
+"	bne	1b"
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+	: "r" (&v->counter), "Ir" (i)
+	: "cc");
 
-        smp_mb();
+	smp_mb();
 
-        return result;
+	return result;
 }
 
 static inline void atomic_sub(int i, atomic_t *v)
 {
-        unsigned long tmp;
-        int result;
-
-        __asm__ __volatile__("@ atomic_sub\n"
-"1:     ldrex   %0, [%3]\n"
-"       sub     %0, %0, %4\n"
-"       strex   %1, %0, [%3]\n"
-"       teq     %1, #0\n"
-"       bne     1b"
-        : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-        : "r" (&v->counter), "Ir" (i)
-        : "cc");
+	unsigned long tmp;
+	int result;
+
+	__asm__ __volatile__("@ atomic_sub\n"
+"1:	ldrex	%0, [%3]\n"
+"	sub	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
+"	teq	%1, #0\n"
+"	bne	1b"
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+	: "r" (&v->counter), "Ir" (i)
+	: "cc");
 }
 
 static inline int atomic_sub_return(int i, atomic_t *v)
 {
-        unsigned long tmp;
-        int result;
+	unsigned long tmp;
+	int result;
 
-        smp_mb();
+	smp_mb();
 
-        __asm__ __volatile__("@ atomic_sub_return\n"
-"1:     ldrex   %0, [%3]\n"
-"       sub     %0, %0, %4\n"
-"       strex   %1, %0, [%3]\n"
-"       teq     %1, #0\n"
-"       bne     1b"
-        : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-        : "r" (&v->counter), "Ir" (i)
-        : "cc");
+	__asm__ __volatile__("@ atomic_sub_return\n"
+"1:	ldrex	%0, [%3]\n"
+"	sub	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
+"	teq	%1, #0\n"
+"	bne	1b"
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+	: "r" (&v->counter), "Ir" (i)
+	: "cc");
 
-        smp_mb();
+	smp_mb();
 
-        return result;
+	return result;
 }
 
 static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
 {
-        unsigned long oldval, res;
+	unsigned long oldval, res;
 
-        smp_mb();
+	smp_mb();
 
-        do {
-                __asm__ __volatile__("@ atomic_cmpxchg\n"
-                "ldrex  %1, [%3]\n"
-                "mov    %0, #0\n"
-                "teq    %1, %4\n"
-                "strexeq %0, %5, [%3]\n"
-                    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
-                    : "r" (&ptr->counter), "Ir" (old), "r" (new)
-                    : "cc");
-        } while (res);
+	do {
+		__asm__ __volatile__("@ atomic_cmpxchg\n"
+		"ldrex	%1, [%3]\n"
+		"mov	%0, #0\n"
+		"teq	%1, %4\n"
+		"strexeq %0, %5, [%3]\n"
+		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
+		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
+		    : "cc");
+	} while (res);
 
-        smp_mb();
+	smp_mb();
 
-        return oldval;
+	return oldval;
 }
 
 static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
 {
-        unsigned long tmp, tmp2;
-
-        __asm__ __volatile__("@ atomic_clear_mask\n"
-"1:     ldrex   %0, [%3]\n"
-"       bic     %0, %0, %4\n"
-"       strex   %1, %0, [%3]\n"
-"       teq     %1, #0\n"
-"       bne     1b"
-        : "=&r" (tmp), "=&r" (tmp2), "+Qo" (*addr)
-        : "r" (addr), "Ir" (mask)
-        : "cc");
+	unsigned long tmp, tmp2;
+
+	__asm__ __volatile__("@ atomic_clear_mask\n"
+"1:	ldrex	%0, [%3]\n"
+"	bic	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
+"	teq	%1, #0\n"
+"	bne	1b"
+	: "=&r" (tmp), "=&r" (tmp2), "+Qo" (*addr)
+	: "r" (addr), "Ir" (mask)
+	: "cc");
 }
 
-#define atomic_inc(v)           atomic_add(1, v)
-#define atomic_dec(v)           atomic_sub(1, v)
+#define atomic_inc(v)		atomic_add(1, v)
+#define atomic_dec(v)		atomic_sub(1, v)
 
-#define atomic_inc_and_test(v)  (atomic_add_return(1, v) == 0)
-#define atomic_dec_and_test(v)  (atomic_sub_return(1, v) == 0)
+#define atomic_inc_and_test(v)	(atomic_add_return(1, v) == 0)
+#define atomic_dec_and_test(v)	(atomic_sub_return(1, v) == 0)
 #define atomic_inc_return(v)    (atomic_add_return(1, v))
 #define atomic_dec_return(v)    (atomic_sub_return(1, v))
 #define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
@@ -145,7 +145,7 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
- * c-basic-offset: 4
- * indent-tabs-mode: nil
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
  * End:
  */
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index a279755..b04e6d5 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -157,7 +157,7 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u)
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
- * c-basic-offset: 4
- * indent-tabs-mode: nil
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
  * End:
  */
-- 
1.7.10.4


* [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (3 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:27   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7 Ian Campbell
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Almost, because I am omitting aed3a4e "ARM: 7868/1: arm/arm64: remove
atomic_clear_mask() ..." which I will apply to both arm32 and arm64
simultaneously in a later patch.

This pulls in the following Linux patches:

commit f38d999c4d16fc0fce4270374f15fbb2d8713c09
Author: Will Deacon <will.deacon@arm.com>
Date:   Thu Jul 4 11:43:18 2013 +0100

    ARM: atomics: prefetch the destination word for write prior to strex

    The cost of changing a cacheline from shared to exclusive state can be
    significant, especially when this is triggered by an exclusive store,
    since it may result in having to retry the transaction.

    This patch prefixes our atomic access implementations with pldw
    instructions (on CPUs which support them) to try and grab the line in
    exclusive state from the start. Only the barrier-less functions are
    updated, since memory barriers can limit the usefulness of prefetching
    data.

    Acked-by: Nicolas Pitre <nico@linaro.org>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

commit 4dcc1cf7316a26e112f5c9fcca531ff98ef44700
Author: Chen Gang <gang.chen@asianux.com>
Date:   Sat Oct 26 15:07:25 2013 +0100

    ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval

    For atomic_cmpxchg(), the type of 'oldval' need be 'int' to match the
    type of "*ptr" (used by 'ldrex' instruction) and 'old' (used by 'teq'
    instruction).

    Reviewed-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Chen Gang <gang.chen@asianux.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
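
The two changes pulled in above can be sketched in portable C, assuming a
GCC-compatible compiler. The sketch_* names and the sketch_atomic_t type are
hypothetical, and the __atomic builtins replace the ldrex/strex loops of the
real header, so this shows the intent rather than the actual implementation.

```c
/* Hypothetical stand-in for atomic_t. */
typedef struct {
    int counter;
} sketch_atomic_t;

/*
 * Barrier-less add: prefetch the destination for write first, so the
 * cache line is (ideally) already held exclusively when the store lands.
 */
static inline void sketch_atomic_add(int i, sketch_atomic_t *v)
{
    __builtin_prefetch(&v->counter, 1);
    __atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
}

/*
 * Fully ordered cmpxchg: no prefetch, and the observed value is an int
 * so that it matches the type of the counter and of "old".
 */
static inline int sketch_atomic_cmpxchg(sketch_atomic_t *ptr,
                                        int old, int new_val)
{
    int oldval = old;

    /* On failure oldval is overwritten with the value found in memory. */
    __atomic_compare_exchange_n(&ptr->counter, &oldval, new_val, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return oldval;
}
```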

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/arm32/atomic.h |    6 +++++-
 xen/include/asm-arm/atomic.h       |    1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
index 3f024d4..d309f66 100644
--- a/xen/include/asm-arm/arm32/atomic.h
+++ b/xen/include/asm-arm/arm32/atomic.h
@@ -21,6 +21,7 @@ static inline void atomic_add(int i, atomic_t *v)
 	unsigned long tmp;
 	int result;
 
+	prefetchw(&v->counter);
 	__asm__ __volatile__("@ atomic_add\n"
 "1:	ldrex	%0, [%3]\n"
 "	add	%0, %0, %4\n"
@@ -59,6 +60,7 @@ static inline void atomic_sub(int i, atomic_t *v)
 	unsigned long tmp;
 	int result;
 
+	prefetchw(&v->counter);
 	__asm__ __volatile__("@ atomic_sub\n"
 "1:	ldrex	%0, [%3]\n"
 "	sub	%0, %0, %4\n"
@@ -94,7 +96,8 @@ static inline int atomic_sub_return(int i, atomic_t *v)
 
 static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
 {
-	unsigned long oldval, res;
+	int oldval;
+	unsigned long res;
 
 	smp_mb();
 
@@ -118,6 +121,7 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
 {
 	unsigned long tmp, tmp2;
 
+	prefetchw(addr);
 	__asm__ __volatile__("@ atomic_clear_mask\n"
 "1:	ldrex	%0, [%3]\n"
 "	bic	%0, %0, %4\n"
diff --git a/xen/include/asm-arm/atomic.h b/xen/include/asm-arm/atomic.h
index 69c8f3f..2c92de9 100644
--- a/xen/include/asm-arm/atomic.h
+++ b/xen/include/asm-arm/atomic.h
@@ -2,6 +2,7 @@
 #define __ARCH_ARM_ATOMIC__
 
 #include <xen/config.h>
+#include <xen/prefetch.h>
 #include <asm/system.h>
 
 #define build_atomic_read(name, size, width, type, reg)\
-- 
1.7.10.4


* [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (4 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7 Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:29   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 07/17] xen: arm32: add optimised memchr routine Ian Campbell
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This pulls in the following Linux commits:
commit 455bd4c430b0c0a361f38e8658a0d6cb469942b5
Author: Ivan Djelic <ivan.djelic@parrot.com>
Date:   Wed Mar 6 20:09:27 2013 +0100

    ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimi

    Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
    assumptions about the implementation of memset and similar functions.
    The current ARM optimized memset code does not return the value of
    its first argument, as is usually expected from standard implementations.

    For instance in the following function:

    void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waite
    {
        memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
        waiter->magic = waiter;
        INIT_LIST_HEAD(&waiter->list);
    }

    compiled as:

    800554d0 <debug_mutex_lock_common>:
    800554d0:       e92d4008        push    {r3, lr}
    800554d4:       e1a00001        mov     r0, r1
    800554d8:       e3a02010        mov     r2, #16 ; 0x10
    800554dc:       e3a01011        mov     r1, #17 ; 0x11
    800554e0:       eb04426e        bl      80165ea0 <memset>
    800554e4:       e1a03000        mov     r3, r0
    800554e8:       e583000c        str     r0, [r3, #12]
    800554ec:       e5830000        str     r0, [r3]
    800554f0:       e5830004        str     r0, [r3, #4]
    800554f4:       e8bd8008        pop     {r3, pc}

    GCC assumes memset returns the value of pointer 'waiter' in register r0; ca
    register/memory corruptions.

    This patch fixes the return value of the assembly version of memset.
    It adds a 'mov' instruction and merges an additional load+store into
    existing load/store instructions.
    For ease of review, here is a breakdown of the patch into 4 simple steps:

    Step 1
    ======
    Perform the following substitutions:
    ip -> r8, then
    r0 -> ip,
    and insert 'mov ip, r0' as the first statement of the function.
    At this point, we have a memset() implementation returning the proper resul
    but corrupting r8 on some paths (the ones that were using ip).

    Step 2
    ======
    Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

    save r8:
    -       str     lr, [sp, #-4]!
    +       stmfd   sp!, {r8, lr}

    and restore r8 on both exit paths:
    -       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
    +       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
    (...)
            tst     r2, #16
            stmneia ip!, {r1, r3, r8, lr}
    -       ldr     lr, [sp], #4
    +       ldmfd   sp!, {r8, lr}

    Step 3
    ======
    Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

    save r8:
    -       stmfd   sp!, {r4-r7, lr}
    +       stmfd   sp!, {r4-r8, lr}

    and restore r8 on both exit paths:
            bgt     3b
    -       ldmeqfd sp!, {r4-r7, pc}
    +       ldmeqfd sp!, {r4-r8, pc}
    (...)
            tst     r2, #16
            stmneia ip!, {r4-r7}
    -       ldmfd   sp!, {r4-r7, lr}
    +       ldmfd   sp!, {r4-r8, lr}

    Step 4
    ======
    Rewrite register list "r4-r7, r8" as "r4-r8".

    Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
    Reviewed-by: Nicolas Pitre <nico@linaro.org>
    Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

commit 418df63adac56841ef6b0f1fcf435bc64d4ed177
Author: Nicolas Pitre <nicolas.pitre@linaro.org>
Date:   Tue Mar 12 13:00:42 2013 +0100

    ARM: 7670/1: fix the memset fix

    Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
    recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
    with the memset return value.  However the memset itself became broken
    by that patch for misaligned pointers.

    This fixes the above by branching over the entry code from the
    misaligned fixup code to avoid reloading the original pointer.

    Also, because the function entry alignment is wrong in the Thumb mode
    compilation, that fixup code is moved to the end.

    While at it, the entry instructions are slightly reworked to help dual
    issue pipelines.

    Signed-off-by: Nicolas Pitre <nico@linaro.org>
    Tested-by: Alexander Holler <holler@ahsoftware.de>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
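
The contract that the first commit above restores, namely that memset
returns its first argument while a separate working pointer advances, can be
sketched in C as follows. The byte-at-a-time loop and the name sketch_memset
are hypothetical simplifications; the real routine is the optimised assembly
in memset.S.

```c
#include <stddef.h>

/* Hypothetical, unoptimised memset that honours the return-value contract. */
static void *sketch_memset(void *s, int c, size_t n)
{
    unsigned char *p = s;        /* working pointer, like ip in the fix */

    while (n--)
        *p++ = (unsigned char)c; /* only the working pointer advances */

    return s;                    /* original pointer, like r0 */
}
```

A caller compiled as in the disassembly quoted above, which reuses the
returned pointer for subsequent stores, then writes through the start of
the buffer rather than through whatever the routine left in r0.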

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/arm32/lib/memset.S |  100 +++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 52 deletions(-)

diff --git a/xen/arch/arm/arm32/lib/memset.S b/xen/arch/arm/arm32/lib/memset.S
index d2937a3..c8ab257 100644
--- a/xen/arch/arm/arm32/lib/memset.S
+++ b/xen/arch/arm/arm32/lib/memset.S
@@ -16,27 +16,15 @@
 
 	.text
 	.align	5
-	.word	0
-
-1:	subs	r2, r2, #4		@ 1 do we have enough
-	blt	5f			@ 1 bytes to align with?
-	cmp	r3, #2			@ 1
-	strltb	r1, [r0], #1		@ 1
-	strleb	r1, [r0], #1		@ 1
-	strb	r1, [r0], #1		@ 1
-	add	r2, r2, r3		@ 1 (r2 = r2 - (4 - r3))
-/*
- * The pointer is now aligned and the length is adjusted.  Try doing the
- * memset again.
- */
 
 ENTRY(memset)
 	ands	r3, r0, #3		@ 1 unaligned?
-	bne	1b			@ 1
+	mov	ip, r0			@ preserve r0 as return value
+	bne	6f			@ 1
 /*
- * we know that the pointer in r0 is aligned to a word boundary.
+ * we know that the pointer in ip is aligned to a word boundary.
  */
-	orr	r1, r1, r1, lsl #8
+1:	orr	r1, r1, r1, lsl #8
 	orr	r1, r1, r1, lsl #16
 	mov	r3, r1
 	cmp	r2, #16
@@ -45,29 +33,28 @@ ENTRY(memset)
 #if ! CALGN(1)+0
 
 /*
- * We need an extra register for this loop - save the return address and
- * use the LR
+ * We need 2 extra registers for this loop - use r8 and the LR
  */
-	str	lr, [sp, #-4]!
-	mov	ip, r1
+	stmfd	sp!, {r8, lr}
+	mov	r8, r1
 	mov	lr, r1
 
 2:	subs	r2, r2, #64
-	stmgeia	r0!, {r1, r3, ip, lr}	@ 64 bytes at a time.
-	stmgeia	r0!, {r1, r3, ip, lr}
-	stmgeia	r0!, {r1, r3, ip, lr}
-	stmgeia	r0!, {r1, r3, ip, lr}
+	stmgeia	ip!, {r1, r3, r8, lr}	@ 64 bytes at a time.
+	stmgeia	ip!, {r1, r3, r8, lr}
+	stmgeia	ip!, {r1, r3, r8, lr}
+	stmgeia	ip!, {r1, r3, r8, lr}
 	bgt	2b
-	ldmeqfd	sp!, {pc}		@ Now <64 bytes to go.
+	ldmeqfd	sp!, {r8, pc}		@ Now <64 bytes to go.
 /*
  * No need to correct the count; we're only testing bits from now on
  */
 	tst	r2, #32
-	stmneia	r0!, {r1, r3, ip, lr}
-	stmneia	r0!, {r1, r3, ip, lr}
+	stmneia	ip!, {r1, r3, r8, lr}
+	stmneia	ip!, {r1, r3, r8, lr}
 	tst	r2, #16
-	stmneia	r0!, {r1, r3, ip, lr}
-	ldr	lr, [sp], #4
+	stmneia	ip!, {r1, r3, r8, lr}
+	ldmfd	sp!, {r8, lr}
 
 #else
 
@@ -76,54 +63,63 @@ ENTRY(memset)
  * whole cache lines at once.
  */
 
-	stmfd	sp!, {r4-r7, lr}
+	stmfd	sp!, {r4-r8, lr}
 	mov	r4, r1
 	mov	r5, r1
 	mov	r6, r1
 	mov	r7, r1
-	mov	ip, r1
+	mov	r8, r1
 	mov	lr, r1
 
 	cmp	r2, #96
-	tstgt	r0, #31
+	tstgt	ip, #31
 	ble	3f
 
-	and	ip, r0, #31
-	rsb	ip, ip, #32
-	sub	r2, r2, ip
-	movs	ip, ip, lsl #(32 - 4)
-	stmcsia	r0!, {r4, r5, r6, r7}
-	stmmiia	r0!, {r4, r5}
-	tst	ip, #(1 << 30)
-	mov	ip, r1
-	strne	r1, [r0], #4
+	and	r8, ip, #31
+	rsb	r8, r8, #32
+	sub	r2, r2, r8
+	movs	r8, r8, lsl #(32 - 4)
+	stmcsia	ip!, {r4, r5, r6, r7}
+	stmmiia	ip!, {r4, r5}
+	tst	r8, #(1 << 30)
+	mov	r8, r1
+	strne	r1, [ip], #4
 
 3:	subs	r2, r2, #64
-	stmgeia	r0!, {r1, r3-r7, ip, lr}
-	stmgeia	r0!, {r1, r3-r7, ip, lr}
+	stmgeia	ip!, {r1, r3-r8, lr}
+	stmgeia	ip!, {r1, r3-r8, lr}
 	bgt	3b
-	ldmeqfd	sp!, {r4-r7, pc}
+	ldmeqfd	sp!, {r4-r8, pc}
 
 	tst	r2, #32
-	stmneia	r0!, {r1, r3-r7, ip, lr}
+	stmneia	ip!, {r1, r3-r8, lr}
 	tst	r2, #16
-	stmneia	r0!, {r4-r7}
-	ldmfd	sp!, {r4-r7, lr}
+	stmneia	ip!, {r4-r7}
+	ldmfd	sp!, {r4-r8, lr}
 
 #endif
 
 4:	tst	r2, #8
-	stmneia	r0!, {r1, r3}
+	stmneia	ip!, {r1, r3}
 	tst	r2, #4
-	strne	r1, [r0], #4
+	strne	r1, [ip], #4
 /*
  * When we get here, we've got less than 4 bytes to zero.  We
  * may have an unaligned pointer as well.
  */
 5:	tst	r2, #2
-	strneb	r1, [r0], #1
-	strneb	r1, [r0], #1
+	strneb	r1, [ip], #1
+	strneb	r1, [ip], #1
 	tst	r2, #1
-	strneb	r1, [r0], #1
+	strneb	r1, [ip], #1
 	mov	pc, lr
+
+6:	subs	r2, r2, #4		@ 1 do we have enough
+	blt	5b			@ 1 bytes to align with?
+	cmp	r3, #2			@ 1
+	strltb	r1, [ip], #1		@ 1
+	strleb	r1, [ip], #1		@ 1
+	strb	r1, [ip], #1		@ 1
+	add	r2, r2, r3		@ 1 (r2 = r2 - (4 - r3))
+	b	1b
 ENDPROC(memset)
-- 
1.7.10.4

* [PATCH 07/17] xen: arm32: add optimised memchr routine
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (5 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7 Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:32   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines Ian Campbell
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This isn't used enough to be critical, but it completes the set of mem*.

Taken from Linux v3.14-rc7.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
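As a reading aid, the loop in memchr.S corresponds roughly to the C below.
This is a sketch under assumptions, not the imported code: memchr_sketch is a
hypothetical name, and it converts c to unsigned char as ISO C describes,
whereas the assembly as posted compares the loaded byte against r1 unmasked.

```c
#include <stddef.h>

/* Hypothetical reference version, not part of this patch. */
void *memchr_sketch(const void *s, int c, size_t n)
{
	const unsigned char *p = s;

	/* Count down n, as the subs/bmi pair does. */
	while (n--) {
		if (*p == (unsigned char)c)
			return (void *)p;	/* pointer to the match */
		p++;
	}
	return NULL;				/* count exhausted */
}
```
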
 xen/arch/arm/arm32/lib/Makefile |    2 +-
 xen/arch/arm/arm32/lib/memchr.S |   28 ++++++++++++++++++++++++++++
 xen/include/asm-arm/string.h    |    3 +++
 3 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/arm32/lib/memchr.S

diff --git a/xen/arch/arm/arm32/lib/Makefile b/xen/arch/arm/arm32/lib/Makefile
index 4cf41f4..fa4e241 100644
--- a/xen/arch/arm/arm32/lib/Makefile
+++ b/xen/arch/arm/arm32/lib/Makefile
@@ -1,4 +1,4 @@
-obj-y += memcpy.o memmove.o memset.o memzero.o
+obj-y += memcpy.o memmove.o memset.o memchr.o memzero.o
 obj-y += findbit.o setbit.o
 obj-y += setbit.o clearbit.o changebit.o
 obj-y += testsetbit.o testclearbit.o testchangebit.o
diff --git a/xen/arch/arm/arm32/lib/memchr.S b/xen/arch/arm/arm32/lib/memchr.S
new file mode 100644
index 0000000..fd64ed8
--- /dev/null
+++ b/xen/arch/arm/arm32/lib/memchr.S
@@ -0,0 +1,28 @@
+/*
+ *  linux/arch/arm/lib/memchr.S
+ *
+ *  Copyright (C) 1995-2000 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ *  ASM optimised string functions
+ */
+
+#include <xen/config.h>
+
+#include "assembler.h"
+
+	.text
+	.align	5
+ENTRY(memchr)
+1:	subs	r2, r2, #1
+	bmi	2f
+	ldrb	r3, [r0], #1
+	teq	r3, r1
+	bne	1b
+	sub	r0, r0, #1
+2:	movne	r0, #0
+	mov	pc, lr
+ENDPROC(memchr)
diff --git a/xen/include/asm-arm/string.h b/xen/include/asm-arm/string.h
index abfa9d2..2c9f4f7 100644
--- a/xen/include/asm-arm/string.h
+++ b/xen/include/asm-arm/string.h
@@ -14,6 +14,9 @@ extern void *memmove(void *dest, const void *src, size_t n);
 #define __HAVE_ARCH_MEMSET
 extern void * memset(void *, int, __kernel_size_t);
 
+#define __HAVE_ARCH_MEMCHR
+extern void * memchr(const void *, int, __kernel_size_t);
+
 extern void __memzero(void *ptr, __kernel_size_t n);
 
 #define memset(p,v,n)                                                   \
-- 
1.7.10.4

* [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (6 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 07/17] xen: arm32: add optimised memchr routine Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:33   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 09/17] xen: arm: remove atomic_clear_mask() Ian Campbell
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Taken from Linux v3.14-rc7.

These aren't widely used enough to be critical, but we may as well have them.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
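For review purposes, the two routines behave roughly like the C below. These
are sketches with hypothetical names (strchr_sketch, strrchr_sketch), not the
imported code. Both treat the terminating NUL as part of the string, so
searching for 0 returns a pointer to the terminator. The strrchr sketch masks
c to a byte for symmetry; the assembly as posted only does so in strchr.

```c
#include <stddef.h>

/* Hypothetical reference versions, not part of this patch. */
char *strchr_sketch(const char *s, int c)
{
	unsigned char ch = (unsigned char)c;	/* and r1, r1, #0xff */

	for (;; s++) {
		if ((unsigned char)*s == ch)
			return (char *)s;	/* first match */
		if (*s == '\0')
			return NULL;		/* end of string, no match */
	}
}

char *strrchr_sketch(const char *s, int c)
{
	const char *last = NULL;		/* mov r3, #0 */
	unsigned char ch = (unsigned char)c;

	do {
		if ((unsigned char)*s == ch)
			last = s;		/* remember the latest match */
	} while (*s++ != '\0');

	return (char *)last;
}
```
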
 xen/arch/arm/arm32/lib/Makefile  |    1 +
 xen/arch/arm/arm32/lib/strchr.S  |   29 +++++++++++++++++++++++++++++
 xen/arch/arm/arm32/lib/strrchr.S |   28 ++++++++++++++++++++++++++++
 xen/include/asm-arm/string.h     |   12 ++++++++++++
 4 files changed, 70 insertions(+)
 create mode 100644 xen/arch/arm/arm32/lib/strchr.S
 create mode 100644 xen/arch/arm/arm32/lib/strrchr.S

diff --git a/xen/arch/arm/arm32/lib/Makefile b/xen/arch/arm/arm32/lib/Makefile
index fa4e241..e9fbc59 100644
--- a/xen/arch/arm/arm32/lib/Makefile
+++ b/xen/arch/arm/arm32/lib/Makefile
@@ -2,4 +2,5 @@ obj-y += memcpy.o memmove.o memset.o memchr.o memzero.o
 obj-y += findbit.o setbit.o
 obj-y += setbit.o clearbit.o changebit.o
 obj-y += testsetbit.o testclearbit.o testchangebit.o
+obj-y += strchr.o strrchr.o
 obj-y += lib1funcs.o lshrdi3.o div64.o
diff --git a/xen/arch/arm/arm32/lib/strchr.S b/xen/arch/arm/arm32/lib/strchr.S
new file mode 100644
index 0000000..f01740e
--- /dev/null
+++ b/xen/arch/arm/arm32/lib/strchr.S
@@ -0,0 +1,29 @@
+/*
+ *  linux/arch/arm/lib/strchr.S
+ *
+ *  Copyright (C) 1995-2000 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ *  ASM optimised string functions
+ */
+
+#include <xen/config.h>
+	
+#include "assembler.h"
+
+		.text
+		.align	5
+ENTRY(strchr)
+		and	r1, r1, #0xff
+1:		ldrb	r2, [r0], #1
+		teq	r2, r1
+		teqne	r2, #0
+		bne	1b
+		teq	r2, r1
+		movne	r0, #0
+		subeq	r0, r0, #1
+		mov	pc, lr
+ENDPROC(strchr)
diff --git a/xen/arch/arm/arm32/lib/strrchr.S b/xen/arch/arm/arm32/lib/strrchr.S
new file mode 100644
index 0000000..88fc0de
--- /dev/null
+++ b/xen/arch/arm/arm32/lib/strrchr.S
@@ -0,0 +1,28 @@
+/*
+ *  linux/arch/arm/lib/strrchr.S
+ *
+ *  Copyright (C) 1995-2000 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ *  ASM optimised string functions
+ */
+
+#include <xen/config.h>
+	
+#include "assembler.h"
+
+		.text
+		.align	5
+ENTRY(strrchr)
+		mov	r3, #0
+1:		ldrb	r2, [r0], #1
+		teq	r2, r1
+		subeq	r3, r0, #1
+		teq	r2, #0
+		bne	1b
+		mov	r0, r3
+		mov	pc, lr
+ENDPROC(strrchr)
diff --git a/xen/include/asm-arm/string.h b/xen/include/asm-arm/string.h
index 2c9f4f7..7d8b35a 100644
--- a/xen/include/asm-arm/string.h
+++ b/xen/include/asm-arm/string.h
@@ -4,6 +4,18 @@
 #include <xen/config.h>
 
 #if defined(CONFIG_ARM_32)
+
+/*
+ * We don't do inline string functions, since the
+ * optimised inline asm versions are not small.
+ */
+
+#define __HAVE_ARCH_STRRCHR
+extern char * strrchr(const char * s, int c);
+
+#define __HAVE_ARCH_STRCHR
+extern char * strchr(const char * s, int c);
+
 #define __HAVE_ARCH_MEMCPY
 extern void * memcpy(void *, const void *, __kernel_size_t);
 
-- 
1.7.10.4

* [PATCH 09/17] xen: arm: remove atomic_clear_mask()
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (7 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:35   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This has no users.

This brings arm32 atomic.h into sync with Linux v3.14-rc7.

arm64/atomic.h requires other patches for this to be the case.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/arm32/atomic.h |   16 ----------------
 xen/include/asm-arm/arm64/atomic.h |   14 --------------
 2 files changed, 30 deletions(-)

diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
index d309f66..3d601d1 100644
--- a/xen/include/asm-arm/arm32/atomic.h
+++ b/xen/include/asm-arm/arm32/atomic.h
@@ -117,22 +117,6 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
 	return oldval;
 }
 
-static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
-{
-	unsigned long tmp, tmp2;
-
-	prefetchw(addr);
-	__asm__ __volatile__("@ atomic_clear_mask\n"
-"1:	ldrex	%0, [%3]\n"
-"	bic	%0, %0, %4\n"
-"	strex	%1, %0, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (tmp), "=&r" (tmp2), "+Qo" (*addr)
-	: "r" (addr), "Ir" (mask)
-	: "cc");
-}
-
 #define atomic_inc(v)		atomic_add(1, v)
 #define atomic_dec(v)		atomic_sub(1, v)
 
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index b04e6d5..6b37945 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -110,20 +110,6 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
 	return oldval;
 }
 
-static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
-{
-	unsigned long tmp, tmp2;
-
-	asm volatile("// atomic_clear_mask\n"
-"1:	ldxr	%0, %2\n"
-"	bic	%0, %0, %3\n"
-"	stxr	%w1, %0, %2\n"
-"	cbnz	%w1, 1b"
-	: "=&r" (tmp), "=&r" (tmp2), "+Q" (*addr)
-	: "Ir" (mask)
-	: "cc");
-}
-
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 
 static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-- 
1.7.10.4

* [PATCH 10/17] xen: arm64: disable alignment traps
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (8 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 09/17] xen: arm: remove atomic_clear_mask() Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 15:57   ` Andrew Cooper
  2014-03-20 17:54   ` Julien Grall
  2014-03-20 15:45 ` [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics Ian Campbell
                   ` (6 subsequent siblings)
  16 siblings, 2 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

The mem* primitives which I am about to import from Linux in a subsequent
patch rely on the hardware handling misalignment.

The benefits of an optimised memcpy etc. outweigh the downsides.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/arm64/head.S |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 9547ef5..22d0030 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -241,7 +241,7 @@ skip_bss:
          * I-cache enabled,
          * Alignment checking enabled,
          * MMU translation disabled (for now). */
-        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
+        ldr   x0, =(HSCTLR_BASE)
         msr   SCTLR_EL2, x0
 
         /* Rebuild the boot pagetable's first-level entries. The structure
-- 
1.7.10.4

* [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (9 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
  2014-03-20 17:43   ` Julien Grall
  2014-03-20 15:46 ` [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg Ian Campbell
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Xen, like Linux, expects full barrier semantics for bitops, atomics and
cmpxchgs. Our arm64 implementations of these come from Linux, where they
were found not to provide such semantics. Quoting Will Deacon in Linux
commit 8e86f0b409a4 for the gory details:

    Linux requires a number of atomic operations to provide full barrier
    semantics, that is no memory accesses after the operation can be
    observed before any accesses up to and including the operation in
    program order.

    On arm64, these operations have been incorrectly implemented as follows:

        // A, B, C are independent memory locations

        <Access [A]>

        // atomic_op (B)
    1:  ldaxr   x0, [B]         // Exclusive load with acquire
        <op(B)>
        stlxr   w1, x0, [B]     // Exclusive store with release
        cbnz    w1, 1b

        <Access [C]>

    The assumption here being that two half barriers are equivalent to a
    full barrier, so the only permitted ordering would be A -> B -> C
    (where B is the atomic operation involving both a load and a store).

    Unfortunately, this is not the case by the letter of the architecture
    and, in fact, the accesses to A and C are permitted to pass their
    nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
    or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
    store-release on B). This is a clear violation of the full barrier
    requirement.

    The simple way to fix this is to implement the same algorithm as ARMv7
    using explicit barriers:

        <Access [A]>

        // atomic_op (B)
        dmb     ish             // Full barrier
    1:  ldxr    x0, [B]         // Exclusive load
        <op(B)>
        stxr    w1, x0, [B]     // Exclusive store
        cbnz    w1, 1b
        dmb     ish             // Full barrier

        <Access [C]>

    but this has the undesirable effect of introducing *two* full barrier
    instructions. A better approach is actually the following, non-intuitive
    sequence:

        <Access [A]>

        // atomic_op (B)
    1:  ldxr    x0, [B]         // Exclusive load
        <op(B)>
        stlxr   w1, x0, [B]     // Exclusive store with release
        cbnz    w1, 1b
        dmb     ish             // Full barrier

        <Access [C]>

    The simple observations here are:

      - The dmb ensures that no subsequent accesses (e.g. the access to C)
        can enter or pass the atomic sequence.

      - The dmb also ensures that no prior accesses (e.g. the access to A)
        can pass the atomic sequence.

      - Therefore, no prior access can pass a subsequent access, or
        vice-versa (i.e. A is strictly ordered before C).

      - The stlxr ensures that no prior access can pass the store component
        of the atomic operation.

    The only tricky part remaining is the ordering between the ldxr and the
    access to A, since the absence of the first dmb means that we're now
    permitting re-ordering between the ldxr and any prior accesses.

    From an (arbitrary) observer's point of view, there are two scenarios:

      1. We have observed the ldxr. This means that if we perform a store to
         [B], the ldxr will still return older data. If we can observe the
         ldxr, then we can potentially observe the permitted re-ordering
         with the access to A, which is clearly an issue when compared to
         the dmb variant of the code. Thankfully, the exclusive monitor will
         save us here since it will be cleared as a result of the store and
         the ldxr will retry. Notice that any use of a later memory
         observation to imply observation of the ldxr will also imply
         observation of the access to A, since the stlxr/dmb ensure strict
         ordering.

      2. We have not observed the ldxr. This means we can perform a store
         and influence the later ldxr. However, that doesn't actually tell
         us anything about the access to [A], so we've not lost anything
         here either when compared to the dmb variant.

    This patch implements this solution for our barriered atomic operations,
    ensuring that we satisfy the full barrier requirements where they are
    needed.

    Cc: <stable@vger.kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
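As a rough portable analogue of the final sequence quoted above, the shape can
be written with C11 atomics. This is only a sketch: add_return_sketch is a
hypothetical name, a compare-exchange loop stands in for the ldxr/stlxr pair,
and the C11 memory model is not the ARMv8 one, so it illustrates the ordering
choices (plain load, release store, trailing full fence) rather than proving
anything about the generated code.

```c
#include <stdatomic.h>

/* Hypothetical helper, not part of this patch. */
int add_return_sketch(atomic_int *v, int i)
{
	/* Plain load, no acquire: the role of ldxr. */
	int old = atomic_load_explicit(v, memory_order_relaxed);

	/* Retry until the store succeeds, like the cbnz loop.
	 * Success ordering is release: the role of stlxr.
	 * On failure, old is refreshed with the current value. */
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + i,
						      memory_order_release,
						      memory_order_relaxed))
		;

	/* Single trailing full fence: the role of dmb ish. */
	atomic_thread_fence(memory_order_seq_cst);
	return old + i;
}
```

The design point carried over from the quoted commit is that only one full
fence is issued, after the loop, instead of one on each side.
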
 xen/arch/arm/arm64/lib/bitops.S    |    3 +-
 xen/include/asm-arm/arm64/atomic.h |   13 +++++---
 xen/include/asm-arm/arm64/system.h |   61 ++++++++++++++++++------------------
 3 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/xen/arch/arm/arm64/lib/bitops.S b/xen/arch/arm/arm64/lib/bitops.S
index 80cc903..e1ad239 100644
--- a/xen/arch/arm/arm64/lib/bitops.S
+++ b/xen/arch/arm/arm64/lib/bitops.S
@@ -46,11 +46,12 @@ ENTRY(	\name	)
 	mov	x2, #1
 	add	x1, x1, x0, lsr #3	// Get word offset
 	lsl	x4, x2, x3		// Create mask
-1:	ldaxr	w2, [x1]
+1:	ldxr	w2, [x1]
 	lsr	w0, w2, w3		// Save old value of bit
 	\instr	w2, w2, w4		// toggle bit
 	stlxr	w5, w2, [x1]
 	cbnz	w5, 1b
+	dmb	ish
 	and	w0, w0, #1
 3:	ret
 ENDPROC(\name	)
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index 6b37945..3f37ed5 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -48,7 +48,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
 	int result;
 
 	asm volatile("// atomic_add_return\n"
-"1:	ldaxr	%w0, %2\n"
+"1:	ldxr	%w0, %2\n"
 "	add	%w0, %w0, %w3\n"
 "	stlxr	%w1, %w0, %2\n"
 "	cbnz	%w1, 1b"
@@ -56,6 +56,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
 	: "Ir" (i)
 	: "cc", "memory");
 
+	smp_mb();
 	return result;
 }
 
@@ -80,7 +81,7 @@ static inline int atomic_sub_return(int i, atomic_t *v)
 	int result;
 
 	asm volatile("// atomic_sub_return\n"
-"1:	ldaxr	%w0, %2\n"
+"1:	ldxr	%w0, %2\n"
 "	sub	%w0, %w0, %w3\n"
 "	stlxr	%w1, %w0, %2\n"
 "	cbnz	%w1, 1b"
@@ -88,6 +89,7 @@ static inline int atomic_sub_return(int i, atomic_t *v)
 	: "Ir" (i)
 	: "cc", "memory");
 
+	smp_mb();
 	return result;
 }
 
@@ -96,17 +98,20 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
 	unsigned long tmp;
 	int oldval;
 
+	smp_mb();
+
 	asm volatile("// atomic_cmpxchg\n"
-"1:	ldaxr	%w1, %2\n"
+"1:	ldxr	%w1, %2\n"
 "	cmp	%w1, %w3\n"
 "	b.ne	2f\n"
-"	stlxr	%w0, %w4, %2\n"
+"	stxr	%w0, %w4, %2\n"
 "	cbnz	%w0, 1b\n"
 "2:"
 	: "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
 	: "Ir" (old), "r" (new)
 	: "cc", "memory");
 
+	smp_mb();
 	return oldval;
 }
 
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 570af5c..0db96e0 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -8,49 +8,50 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 {
         unsigned long ret, tmp;
 
-        switch (size) {
-        case 1:
-                asm volatile("//        __xchg1\n"
-                "1:     ldaxrb  %w0, %2\n"
-                "       stlxrb  %w1, %w3, %2\n"
-                "       cbnz    %w1, 1b\n"
-                        : "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
+	switch (size) {
+	case 1:
+		asm volatile("//	__xchg1\n"
+		"1:	ldxrb	%w0, %2\n"
+		"	stlxrb	%w1, %w3, %2\n"
+		"	cbnz	%w1, 1b\n"
+			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
                         : "r" (x)
                         : "cc", "memory");
-                break;
-        case 2:
-                asm volatile("//        __xchg2\n"
-                "1:     ldaxrh  %w0, %2\n"
-                "       stlxrh  %w1, %w3, %2\n"
-                "       cbnz    %w1, 1b\n"
-                        : "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
+		break;
+	case 2:
+		asm volatile("//	__xchg2\n"
+		"1:	ldxrh	%w0, %2\n"
+		"	stlxrh	%w1, %w3, %2\n"
+		"	cbnz	%w1, 1b\n"
+			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
                         : "r" (x)
                         : "cc", "memory");
-                break;
-        case 4:
-                asm volatile("//        __xchg4\n"
-                "1:     ldaxr   %w0, %2\n"
-                "       stlxr   %w1, %w3, %2\n"
-                "       cbnz    %w1, 1b\n"
-                        : "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
+		break;
+	case 4:
+		asm volatile("//	__xchg4\n"
+		"1:	ldxr	%w0, %2\n"
+		"	stlxr	%w1, %w3, %2\n"
+		"	cbnz	%w1, 1b\n"
+			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
                         : "r" (x)
                         : "cc", "memory");
-                break;
-        case 8:
-                asm volatile("//        __xchg8\n"
-                "1:     ldaxr   %0, %2\n"
-                "       stlxr   %w1, %3, %2\n"
-                "       cbnz    %w1, 1b\n"
-                        : "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
+		break;
+	case 8:
+		asm volatile("//	__xchg8\n"
+		"1:	ldxr	%0, %2\n"
+		"	stlxr	%w1, %3, %2\n"
+		"	cbnz	%w1, 1b\n"
+			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
                         : "r" (x)
                         : "cc", "memory");
                 break;
         default:
                 __bad_xchg(ptr, size), ret = 0;
                 break;
-        }
+	}
 
-        return ret;
+	smp_mb();
+	return ret;
 }
 
 #define xchg(ptr,x) \
-- 
1.7.10.4

* [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (10 preceding siblings ...)
  2014-03-20 15:45 ` [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
  2014-03-20 17:44   ` Julien Grall
  2014-03-20 15:46 ` [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers Ian Campbell
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

These functions are from Linux and the intention was to keep the formatting
the same to make resyncing easier.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/arm64/system.h |  196 ++++++++++++++++++------------------
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 0db96e0..9fa698b 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -6,7 +6,7 @@ extern void __bad_xchg(volatile void *, int);
 
 static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
 {
-        unsigned long ret, tmp;
+	unsigned long ret, tmp;
 
 	switch (size) {
 	case 1:
@@ -15,8 +15,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		"	stlxrb	%w1, %w3, %2\n"
 		"	cbnz	%w1, 1b\n"
 			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
-                        : "r" (x)
-                        : "cc", "memory");
+			: "r" (x)
+			: "cc", "memory");
 		break;
 	case 2:
 		asm volatile("//	__xchg2\n"
@@ -24,8 +24,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		"	stlxrh	%w1, %w3, %2\n"
 		"	cbnz	%w1, 1b\n"
 			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
-                        : "r" (x)
-                        : "cc", "memory");
+			: "r" (x)
+			: "cc", "memory");
 		break;
 	case 4:
 		asm volatile("//	__xchg4\n"
@@ -33,8 +33,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		"	stlxr	%w1, %w3, %2\n"
 		"	cbnz	%w1, 1b\n"
 			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
-                        : "r" (x)
-                        : "cc", "memory");
+			: "r" (x)
+			: "cc", "memory");
 		break;
 	case 8:
 		asm volatile("//	__xchg8\n"
@@ -42,12 +42,12 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		"	stlxr	%w1, %3, %2\n"
 		"	cbnz	%w1, 1b\n"
 			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
-                        : "r" (x)
-                        : "cc", "memory");
-                break;
-        default:
-                __bad_xchg(ptr, size), ret = 0;
-                break;
+			: "r" (x)
+			: "cc", "memory");
+		break;
+	default:
+		__bad_xchg(ptr, size), ret = 0;
+		break;
 	}
 
 	smp_mb();
@@ -55,107 +55,107 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 }
 
 #define xchg(ptr,x) \
-        ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
+	((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
 
 extern void __bad_cmpxchg(volatile void *ptr, int size);
 
 static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
-                                      unsigned long new, int size)
+				      unsigned long new, int size)
 {
-        unsigned long oldval = 0, res;
-
-        switch (size) {
-        case 1:
-                do {
-                        asm volatile("// __cmpxchg1\n"
-                        "       ldxrb   %w1, %2\n"
-                        "       mov     %w0, #0\n"
-                        "       cmp     %w1, %w3\n"
-                        "       b.ne    1f\n"
-                        "       stxrb   %w0, %w4, %2\n"
-                        "1:\n"
-                                : "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
-                                : "Ir" (old), "r" (new)
-                                : "cc");
-                } while (res);
-                break;
-
-        case 2:
-                do {
-                        asm volatile("// __cmpxchg2\n"
-                        "       ldxrh   %w1, %2\n"
-                        "       mov     %w0, #0\n"
-                        "       cmp     %w1, %w3\n"
-                        "       b.ne    1f\n"
-                        "       stxrh   %w0, %w4, %2\n"
-                        "1:\n"
-                                : "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
-                                : "Ir" (old), "r" (new)
-                                : "cc");
-                } while (res);
-                break;
-
-        case 4:
-                do {
-                        asm volatile("// __cmpxchg4\n"
-                        "       ldxr    %w1, %2\n"
-                        "       mov     %w0, #0\n"
-                        "       cmp     %w1, %w3\n"
-                        "       b.ne    1f\n"
-                        "       stxr    %w0, %w4, %2\n"
-                        "1:\n"
-                                : "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
-                                : "Ir" (old), "r" (new)
-                                : "cc");
-                } while (res);
-                break;
-
-        case 8:
-                do {
-                        asm volatile("// __cmpxchg8\n"
-                        "       ldxr    %1, %2\n"
-                        "       mov     %w0, #0\n"
-                        "       cmp     %1, %3\n"
-                        "       b.ne    1f\n"
-                        "       stxr    %w0, %4, %2\n"
-                        "1:\n"
-                                : "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
-                                : "Ir" (old), "r" (new)
-                                : "cc");
-                } while (res);
-                break;
-
-        default:
+	unsigned long oldval = 0, res;
+
+	switch (size) {
+	case 1:
+		do {
+			asm volatile("// __cmpxchg1\n"
+			"	ldxrb	%w1, %2\n"
+			"	mov	%w0, #0\n"
+			"	cmp	%w1, %w3\n"
+			"	b.ne	1f\n"
+			"	stxrb	%w0, %w4, %2\n"
+			"1:\n"
+				: "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
+				: "Ir" (old), "r" (new)
+				: "cc");
+		} while (res);
+		break;
+
+	case 2:
+		do {
+			asm volatile("// __cmpxchg2\n"
+			"	ldxrh	%w1, %2\n"
+			"	mov	%w0, #0\n"
+			"	cmp	%w1, %w3\n"
+			"	b.ne	1f\n"
+			"	stxrh	%w0, %w4, %2\n"
+			"1:\n"
+				: "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
+				: "Ir" (old), "r" (new)
+				: "cc");
+		} while (res);
+		break;
+
+	case 4:
+		do {
+			asm volatile("// __cmpxchg4\n"
+			"	ldxr	%w1, %2\n"
+			"	mov	%w0, #0\n"
+			"	cmp	%w1, %w3\n"
+			"	b.ne	1f\n"
+			"	stxr	%w0, %w4, %2\n"
+			"1:\n"
+				: "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
+				: "Ir" (old), "r" (new)
+				: "cc");
+		} while (res);
+		break;
+
+	case 8:
+		do {
+			asm volatile("// __cmpxchg8\n"
+			"	ldxr	%1, %2\n"
+			"	mov	%w0, #0\n"
+			"	cmp	%1, %3\n"
+			"	b.ne	1f\n"
+			"	stxr	%w0, %4, %2\n"
+			"1:\n"
+				: "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
+				: "Ir" (old), "r" (new)
+				: "cc");
+		} while (res);
+		break;
+
+	default:
 		__bad_cmpxchg(ptr, size);
 		oldval = 0;
-        }
+	}
 
-        return oldval;
+	return oldval;
 }
 
 static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
-                                         unsigned long new, int size)
+					 unsigned long new, int size)
 {
-        unsigned long ret;
+	unsigned long ret;
 
-        smp_mb();
-        ret = __cmpxchg(ptr, old, new, size);
-        smp_mb();
+	smp_mb();
+	ret = __cmpxchg(ptr, old, new, size);
+	smp_mb();
 
-        return ret;
+	return ret;
 }
 
-#define cmpxchg(ptr,o,n)                                                \
-        ((__typeof__(*(ptr)))__cmpxchg_mb((ptr),                        \
-                                          (unsigned long)(o),           \
-                                          (unsigned long)(n),           \
-                                          sizeof(*(ptr))))
-
-#define cmpxchg_local(ptr,o,n)                                          \
-        ((__typeof__(*(ptr)))__cmpxchg((ptr),                           \
-                                       (unsigned long)(o),              \
-                                       (unsigned long)(n),              \
-                                       sizeof(*(ptr))))
+#define cmpxchg(ptr,o,n)						\
+	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
+					  (unsigned long)(o),		\
+					  (unsigned long)(n),		\
+					  sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n)						\
+	((__typeof__(*(ptr)))__cmpxchg((ptr),				\
+				       (unsigned long)(o),		\
+				       (unsigned long)(n),		\
+				       sizeof(*(ptr))))
 
 /* Uses uimm4 as a bitmask to select the clearing of one or more of
  * the DAIF exception mask bits:
-- 
1.7.10.4

* [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (11 preceding siblings ...)
  2014-03-20 15:46 ` [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
  2014-03-20 17:45   ` Julien Grall
  2014-03-20 15:46 ` [PATCH 14/17] xen: arm64: assembly optimised mem* and str* Ian Campbell
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

This resyncs atomics and cmpxchgs with Linux v3.14-rc7 by importing:
commit 95c4189689f92fba7ecf9097173404d4928c6e9b
Author: Will Deacon <will.deacon@arm.com>
Date:   Tue Feb 4 12:29:13 2014 +0000

    arm64: asm: remove redundant "cc" clobbers

    cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
    from inline asm blocks that only use these instructions to implement
    conditional branches.

    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
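As a rough illustration of why the clobber can go (a sketch, not the Xen or
Linux code): cbnz tests a general-purpose register and branches without
touching the NZCV flags, so a load/store-exclusive retry loop that branches
only with cbnz leaves the condition codes alone. The helper name
sketch_atomic_add is hypothetical, and the non-arm64 branch is only an assumed
fallback using the GCC/Clang __atomic builtins so that the sketch builds on
other hosts.

```c
/* Hypothetical helper for illustration; not Xen's atomic_add(). */
static inline void sketch_atomic_add(int i, int *counter)
{
#if defined(__aarch64__)
	unsigned long tmp;
	int result;

	asm volatile("// sketch_atomic_add\n"
	"1:	ldxr	%w0, %2\n"
	"	add	%w0, %w0, %w3\n"	/* plain add, not adds: no flag update */
	"	stxr	%w1, %w0, %2\n"
	"	cbnz	%w1, 1b"		/* retry on a register value, flags unused */
	: "=&r" (result), "=&r" (tmp), "+Q" (*counter)
	: "Ir" (i));				/* hence no "cc" clobber */
#else
	/* Assumed fallback so the sketch compiles off arm64. */
	__atomic_fetch_add(counter, i, __ATOMIC_RELAXED);
#endif
}
```

By contrast, atomic_cmpxchg below keeps its "cc" clobber because it still uses
cmp and b.ne, which do write and read the flags.
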
 xen/include/asm-arm/arm64/atomic.h   |   12 +++++-------
 xen/include/asm-arm/arm64/spinlock.h |    6 +++---
 xen/include/asm-arm/arm64/system.h   |    8 ++++----
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index 3f37ed5..b5d50f2 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -38,8 +38,7 @@ static inline void atomic_add(int i, atomic_t *v)
 "	stxr	%w1, %w0, %2\n"
 "	cbnz	%w1, 1b"
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
-	: "Ir" (i)
-	: "cc");
+	: "Ir" (i));
 }
 
 static inline int atomic_add_return(int i, atomic_t *v)
@@ -54,7 +53,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
 "	cbnz	%w1, 1b"
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
 	: "Ir" (i)
-	: "cc", "memory");
+	: "memory");
 
 	smp_mb();
 	return result;
@@ -71,8 +70,7 @@ static inline void atomic_sub(int i, atomic_t *v)
 "	stxr	%w1, %w0, %2\n"
 "	cbnz	%w1, 1b"
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
-	: "Ir" (i)
-	: "cc");
+	: "Ir" (i));
 }
 
 static inline int atomic_sub_return(int i, atomic_t *v)
@@ -87,7 +85,7 @@ static inline int atomic_sub_return(int i, atomic_t *v)
 "	cbnz	%w1, 1b"
 	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
 	: "Ir" (i)
-	: "cc", "memory");
+	: "memory");
 
 	smp_mb();
 	return result;
@@ -109,7 +107,7 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
 "2:"
 	: "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
 	: "Ir" (old), "r" (new)
-	: "cc", "memory");
+	: "cc");
 
 	smp_mb();
 	return oldval;
diff --git a/xen/include/asm-arm/arm64/spinlock.h b/xen/include/asm-arm/arm64/spinlock.h
index 3a36cfd..04300bc 100644
--- a/xen/include/asm-arm/arm64/spinlock.h
+++ b/xen/include/asm-arm/arm64/spinlock.h
@@ -70,7 +70,7 @@ static always_inline int _raw_read_trylock(raw_rwlock_t *rw)
         "1:\n"
         : "=&r" (tmp), "+r" (tmp2), "+Q" (rw->lock)
         :
-        : "cc", "memory");
+        : "memory");
 
     return !tmp2;
 }
@@ -86,7 +86,7 @@ static always_inline int _raw_write_trylock(raw_rwlock_t *rw)
         "1:\n"
         : "=&r" (tmp), "+Q" (rw->lock)
         : "r" (0x80000000)
-        : "cc", "memory");
+        : "memory");
 
     return !tmp;
 }
@@ -102,7 +102,7 @@ static inline void _raw_read_unlock(raw_rwlock_t *rw)
         "       cbnz    %w1, 1b\n"
         : "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
         :
-        : "cc", "memory");
+        : "memory");
 }
 
 static inline void _raw_write_unlock(raw_rwlock_t *rw)
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 9fa698b..fa50ead 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -16,7 +16,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		"	cbnz	%w1, 1b\n"
 			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
 			: "r" (x)
-			: "cc", "memory");
+			: "memory");
 		break;
 	case 2:
 		asm volatile("//	__xchg2\n"
@@ -25,7 +25,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		"	cbnz	%w1, 1b\n"
 			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
 			: "r" (x)
-			: "cc", "memory");
+			: "memory");
 		break;
 	case 4:
 		asm volatile("//	__xchg4\n"
@@ -34,7 +34,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		"	cbnz	%w1, 1b\n"
 			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
 			: "r" (x)
-			: "cc", "memory");
+			: "memory");
 		break;
 	case 8:
 		asm volatile("//	__xchg8\n"
@@ -43,7 +43,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		"	cbnz	%w1, 1b\n"
 			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
 			: "r" (x)
-			: "cc", "memory");
+			: "memory");
 		break;
 	default:
 		__bad_xchg(ptr, size), ret = 0;
-- 
1.7.10.4

* [PATCH 14/17] xen: arm64: assembly optimised mem* and str*
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (12 preceding siblings ...)
  2014-03-20 15:46 ` [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
  2014-03-20 17:48   ` Julien Grall
  2014-03-20 15:46 ` [PATCH 15/17] xen: arm64: optimised clear_page Ian Campbell
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Taken from Linux v3.14-rc7.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
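For readers who prefer C to assembly, the strategy of two of these routines can
be modelled roughly as follows. This is only a sketch under stated assumptions:
the sketch_* names are hypothetical, the memmove model copies byte by byte
rather than in 8/4/2/1-byte chunks, and none of this is the imported code
itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of memset.S: broadcast the byte to 64 bits, then store 8/4/2/1. */
static void *sketch_memset(void *buf, int c, size_t n)
{
	unsigned char *p = buf;
	uint64_t v = (unsigned char)c;	/* and	w1, w1, #0xff */

	v |= v << 8;			/* orr	w1, w1, w1, lsl #8 */
	v |= v << 16;			/* orr	w1, w1, w1, lsl #16 */
	v |= v << 32;			/* orr	x1, x1, x1, lsl #32 */

	for (; n >= 8; n -= 8, p += 8)
		memcpy(p, &v, 8);	/* main 64-bit store loop */
	if (n >= 4) { memcpy(p, &v, 4); p += 4; n -= 4; }
	if (n >= 2) { memcpy(p, &v, 2); p += 2; n -= 2; }
	if (n >= 1)
		*p = (unsigned char)v;
	return buf;
}

/* Model of memmove.S: forward copy if dest <= src, else copy from the end. */
static void *sketch_memmove(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	size_t i;

	if ((uintptr_t)d <= (uintptr_t)s) {
		/* b.ls memcpy: an ascending copy cannot overwrite unread source */
		for (i = 0; i < n; i++)
			d[i] = s[i];
	} else {
		/* dest above src: descend so overlapping bytes are read first */
		while (n--)
			d[n] = s[n];
	}
	return dest;
}
```

The direction test is what makes overlapping moves safe: with dest above src a
forward copy would overwrite source bytes before they had been read.
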
 xen/arch/arm/arm64/lib/Makefile  |    2 ++
 xen/arch/arm/arm64/lib/memchr.S  |   43 +++++++++++++++++++++++++++++
 xen/arch/arm/arm64/lib/memcpy.S  |   52 +++++++++++++++++++++++++++++++++++
 xen/arch/arm/arm64/lib/memmove.S |   56 ++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/arm64/lib/memset.S  |   52 +++++++++++++++++++++++++++++++++++
 xen/arch/arm/arm64/lib/strchr.S  |   41 ++++++++++++++++++++++++++++
 xen/arch/arm/arm64/lib/strrchr.S |   42 ++++++++++++++++++++++++++++
 xen/include/asm-arm/string.h     |    4 +--
 8 files changed, 290 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/arm64/lib/memchr.S
 create mode 100644 xen/arch/arm/arm64/lib/memcpy.S
 create mode 100644 xen/arch/arm/arm64/lib/memmove.S
 create mode 100644 xen/arch/arm/arm64/lib/memset.S
 create mode 100644 xen/arch/arm/arm64/lib/strchr.S
 create mode 100644 xen/arch/arm/arm64/lib/strrchr.S

diff --git a/xen/arch/arm/arm64/lib/Makefile b/xen/arch/arm/arm64/lib/Makefile
index 32c02c4..9f3b236 100644
--- a/xen/arch/arm/arm64/lib/Makefile
+++ b/xen/arch/arm/arm64/lib/Makefile
@@ -1 +1,3 @@
+obj-y += memcpy.o memmove.o memset.o memchr.o
 obj-y += bitops.o find_next_bit.o
+obj-y += strchr.o strrchr.o
diff --git a/xen/arch/arm/arm64/lib/memchr.S b/xen/arch/arm/arm64/lib/memchr.S
new file mode 100644
index 0000000..3cc1b01
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/memchr.S
@@ -0,0 +1,43 @@
+/*
+ * Based on arch/arm/lib/memchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Find a character in an area of memory.
+ *
+ * Parameters:
+ *	x0 - buf
+ *	x1 - c
+ *	x2 - n
+ * Returns:
+ *	x0 - address of first occurrence of 'c' or 0
+ */
+ENTRY(memchr)
+	and	w1, w1, #0xff
+1:	subs	x2, x2, #1
+	b.mi	2f
+	ldrb	w3, [x0], #1
+	cmp	w3, w1
+	b.ne	1b
+	sub	x0, x0, #1
+	ret
+2:	mov	x0, #0
+	ret
+ENDPROC(memchr)
diff --git a/xen/arch/arm/arm64/lib/memcpy.S b/xen/arch/arm/arm64/lib/memcpy.S
new file mode 100644
index 0000000..c8197c6
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/memcpy.S
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Copy a buffer from src to dest (alignment handled by the hardware)
+ *
+ * Parameters:
+ *	x0 - dest
+ *	x1 - src
+ *	x2 - n
+ * Returns:
+ *	x0 - dest
+ */
+ENTRY(memcpy)
+	mov	x4, x0
+	subs	x2, x2, #8
+	b.mi	2f
+1:	ldr	x3, [x1], #8
+	subs	x2, x2, #8
+	str	x3, [x4], #8
+	b.pl	1b
+2:	adds	x2, x2, #4
+	b.mi	3f
+	ldr	w3, [x1], #4
+	sub	x2, x2, #4
+	str	w3, [x4], #4
+3:	adds	x2, x2, #2
+	b.mi	4f
+	ldrh	w3, [x1], #2
+	sub	x2, x2, #2
+	strh	w3, [x4], #2
+4:	adds	x2, x2, #1
+	b.mi	5f
+	ldrb	w3, [x1]
+	strb	w3, [x4]
+5:	ret
+ENDPROC(memcpy)
diff --git a/xen/arch/arm/arm64/lib/memmove.S b/xen/arch/arm/arm64/lib/memmove.S
new file mode 100644
index 0000000..1bf0936
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/memmove.S
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Move a buffer from src to test (alignment handled by the hardware).
+ * If dest <= src, call memcpy, otherwise copy in reverse order.
+ *
+ * Parameters:
+ *	x0 - dest
+ *	x1 - src
+ *	x2 - n
+ * Returns:
+ *	x0 - dest
+ */
+ENTRY(memmove)
+	cmp	x0, x1
+	b.ls	memcpy
+	add	x4, x0, x2
+	add	x1, x1, x2
+	subs	x2, x2, #8
+	b.mi	2f
+1:	ldr	x3, [x1, #-8]!
+	subs	x2, x2, #8
+	str	x3, [x4, #-8]!
+	b.pl	1b
+2:	adds	x2, x2, #4
+	b.mi	3f
+	ldr	w3, [x1, #-4]!
+	sub	x2, x2, #4
+	str	w3, [x4, #-4]!
+3:	adds	x2, x2, #2
+	b.mi	4f
+	ldrh	w3, [x1, #-2]!
+	sub	x2, x2, #2
+	strh	w3, [x4, #-2]!
+4:	adds	x2, x2, #1
+	b.mi	5f
+	ldrb	w3, [x1, #-1]
+	strb	w3, [x4, #-1]
+5:	ret
+ENDPROC(memmove)
diff --git a/xen/arch/arm/arm64/lib/memset.S b/xen/arch/arm/arm64/lib/memset.S
new file mode 100644
index 0000000..25a4fb6
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/memset.S
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Fill in the buffer with character c (alignment handled by the hardware)
+ *
+ * Parameters:
+ *	x0 - buf
+ *	x1 - c
+ *	x2 - n
+ * Returns:
+ *	x0 - buf
+ */
+ENTRY(memset)
+	mov	x4, x0
+	and	w1, w1, #0xff
+	orr	w1, w1, w1, lsl #8
+	orr	w1, w1, w1, lsl #16
+	orr	x1, x1, x1, lsl #32
+	subs	x2, x2, #8
+	b.mi	2f
+1:	str	x1, [x4], #8
+	subs	x2, x2, #8
+	b.pl	1b
+2:	adds	x2, x2, #4
+	b.mi	3f
+	sub	x2, x2, #4
+	str	w1, [x4], #4
+3:	adds	x2, x2, #2
+	b.mi	4f
+	sub	x2, x2, #2
+	strh	w1, [x4], #2
+4:	adds	x2, x2, #1
+	b.mi	5f
+	strb	w1, [x4]
+5:	ret
+ENDPROC(memset)
diff --git a/xen/arch/arm/arm64/lib/strchr.S b/xen/arch/arm/arm64/lib/strchr.S
new file mode 100644
index 0000000..9e265e4
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/strchr.S
@@ -0,0 +1,41 @@
+/*
+ * Based on arch/arm/lib/strchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Find the first occurrence of a character in a string.
+ *
+ * Parameters:
+ *	x0 - str
+ *	x1 - c
+ * Returns:
+ *	x0 - address of first occurrence of 'c' or 0
+ */
+ENTRY(strchr)
+	and	w1, w1, #0xff
+1:	ldrb	w2, [x0], #1
+	cmp	w2, w1
+	ccmp	w2, wzr, #4, ne
+	b.ne	1b
+	sub	x0, x0, #1
+	cmp	w2, w1
+	csel	x0, x0, xzr, eq
+	ret
+ENDPROC(strchr)
diff --git a/xen/arch/arm/arm64/lib/strrchr.S b/xen/arch/arm/arm64/lib/strrchr.S
new file mode 100644
index 0000000..3791754
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/strrchr.S
@@ -0,0 +1,42 @@
+/*
+ * Based on arch/arm/lib/strrchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Find the last occurrence of a character in a string.
+ *
+ * Parameters:
+ *	x0 - str
+ *	x1 - c
+ * Returns:
+ *	x0 - address of last occurrence of 'c' or 0
+ */
+ENTRY(strrchr)
+	mov	x3, #0
+	and	w1, w1, #0xff
+1:	ldrb	w2, [x0], #1
+	cbz	w2, 2f
+	cmp	w2, w1
+	b.ne	1b
+	sub	x3, x0, #1
+	b	1b
+2:	mov	x0, x3
+	ret
+ENDPROC(strrchr)
diff --git a/xen/include/asm-arm/string.h b/xen/include/asm-arm/string.h
index 7d8b35a..3242762 100644
--- a/xen/include/asm-arm/string.h
+++ b/xen/include/asm-arm/string.h
@@ -3,8 +3,6 @@
 
 #include <xen/config.h>
 
-#if defined(CONFIG_ARM_32)
-
 /*
  * We don't do inline string functions, since the
  * optimised inline asm versions are not small.
@@ -29,6 +27,8 @@ extern void * memset(void *, int, __kernel_size_t);
 #define __HAVE_ARCH_MEMCHR
 extern void * memchr(const void *, int, __kernel_size_t);
 
+#if defined(CONFIG_ARM_32)
+
 extern void __memzero(void *ptr, __kernel_size_t n);
 
 #define memset(p,v,n)                                                   \
-- 
1.7.10.4

* [PATCH 15/17] xen: arm64: optimised clear_page
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (13 preceding siblings ...)
  2014-03-20 15:46 ` [PATCH 14/17] xen: arm64: assembly optimised mem* and str* Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
  2014-03-20 15:46 ` [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers Ian Campbell
  2014-03-20 15:46 ` [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux Ian Campbell
  16 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Taken from Linux v3.14-rc7.

The clear_page declaration now needs to be within the !__ASSEMBLY__ block,
since on arm64 it is an extern C prototype rather than a macro.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
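The loop in clear_page.S can be modelled in C roughly as below. This is a
sketch under assumptions, not the imported routine: the low four bits of
DCZID_EL0 (the BS field) give log2 of the zeroing block size in 4-byte words,
so the step is 4 << BS bytes. The sketch_* names and SKETCH_PAGE_SIZE are
hypothetical, memset stands in for "dc zva", the register value is passed in as
a plain argument, and the block size is assumed not to exceed the page size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed 4K pages */

/* Bytes zeroed by one "dc zva", derived from a DCZID_EL0 value. */
static size_t sketch_zva_block_bytes(uint64_t dczid)
{
	return (size_t)4 << (dczid & 0xf);	/* and w1, w1, #0xf; lsl x1, x2, x1 */
}

/* Zero one page in block-sized steps; returns the number of steps taken. */
static unsigned int sketch_clear_page(unsigned char *page, uint64_t dczid)
{
	size_t step = sketch_zva_block_bytes(dczid);
	size_t off = 0;
	unsigned int iterations = 0;

	do {
		memset(page + off, 0, step);	/* stands in for: dc zva, x0 */
		off += step;			/* add x0, x0, x1 */
		iterations++;
	} while (off & (SKETCH_PAGE_SIZE - 1));	/* tst x0, #(PAGE_SIZE - 1) */

	return iterations;
}
```

With BS = 4, for example, each step covers 64 bytes and a 4K page takes 64
iterations. The real routine tests the address itself, so it relies on being
handed a page-aligned pointer.
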
 xen/arch/arm/arm64/lib/Makefile     |    1 +
 xen/arch/arm/arm64/lib/clear_page.S |   36 +++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/page.h          |    9 +++++++--
 3 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/arm64/lib/clear_page.S

diff --git a/xen/arch/arm/arm64/lib/Makefile b/xen/arch/arm/arm64/lib/Makefile
index 9f3b236..b895afa 100644
--- a/xen/arch/arm/arm64/lib/Makefile
+++ b/xen/arch/arm/arm64/lib/Makefile
@@ -1,3 +1,4 @@
 obj-y += memcpy.o memmove.o memset.o memchr.o
+obj-y += clear_page.o
 obj-y += bitops.o find_next_bit.o
 obj-y += strchr.o strrchr.o
diff --git a/xen/arch/arm/arm64/lib/clear_page.S b/xen/arch/arm/arm64/lib/clear_page.S
new file mode 100644
index 0000000..8d5cadb
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/clear_page.S
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Clear page @dest
+ *
+ * Parameters:
+ *	x0 - dest
+ */
+ENTRY(clear_page)
+	mrs	x1, dczid_el0
+	and	w1, w1, #0xf
+	mov	x2, #4
+	lsl	x1, x2, x1
+
+1:	dc	zva, x0
+	add	x0, x0, x1
+	tst	x0, #(PAGE_SIZE - 1)
+	b.ne	1b
+	ret
+ENDPROC(clear_page)
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index d18ec2a..e880ae8 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -341,6 +341,13 @@ static inline int gva_to_ipa(vaddr_t va, paddr_t *paddr)
 /* Bits in the PAR returned by va_to_par */
 #define PAR_FAULT 0x1
 
+
+#ifdef CONFIG_ARM_32
+#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
+#else
+extern void clear_page(void *to);
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 /*
@@ -382,8 +389,6 @@ static inline int gva_to_ipa(vaddr_t va, paddr_t *paddr)
 #define third_table_offset(va)  TABLE_OFFSET(third_linear_offset(va))
 #define zeroeth_table_offset(va)  TABLE_OFFSET(zeroeth_linear_offset(va))
 
-#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
-
 #define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)
 
 #endif /* __ARM_PAGE_H__ */
-- 
1.7.10.4

* [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (14 preceding siblings ...)
  2014-03-20 15:46 ` [PATCH 15/17] xen: arm64: optimised clear_page Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
  2014-03-20 17:52   ` Julien Grall
  2014-03-20 15:46 ` [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux Ian Campbell
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

Since these functions are taken from Linux this makes it easier to compare
against the Linux cmpxchg.h headers (which were split out from Linux's
system.h a while back).

Since these functions are from Linux the intention is to use Linux coding
style, therefore include a suitable emacs magic block.

For this reason also fix up the indentation in the 32-bit version to use hard
tabs while moving it. The 64-bit version was already correct.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
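As a reminder of the calling convention these headers implement (compare OLD
with MEM, store NEW if identical, return the initial value in MEM), here is a
small portable model. It is only a sketch: sketch_cmpxchg and sketch_add are
hypothetical names, and the model is built on the GCC/Clang
__atomic_compare_exchange_n builtin rather than on the exclusive load/store
sequences used in the real headers.

```c
/* Model of cmpxchg(): returns the value found in *ptr before the attempt. */
static inline unsigned long sketch_cmpxchg(unsigned long *ptr,
					   unsigned long old,
					   unsigned long new_val)
{
	unsigned long expected = old;

	/* On failure the builtin writes the current *ptr into 'expected'. */
	__atomic_compare_exchange_n(ptr, &expected, new_val, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;
}

/* Typical caller: retry until the value we read is the one we replaced. */
static inline unsigned long sketch_add(unsigned long *ptr, unsigned long delta)
{
	unsigned long seen, prev;

	prev = __atomic_load_n(ptr, __ATOMIC_RELAXED);
	do {
		seen = prev;
		prev = sketch_cmpxchg(ptr, seen, seen + delta);
	} while (prev != seen);		/* success iff return value == OLD */

	return seen + delta;
}
```

The caller detects success by comparing the return value with the OLD it
passed in, which is why the retry loop needs no separate status flag.
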
 xen/include/asm-arm/arm32/cmpxchg.h |  146 ++++++++++++++++++++++++++++++
 xen/include/asm-arm/arm32/system.h  |  135 +---------------------------
 xen/include/asm-arm/arm64/cmpxchg.h |  167 +++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/arm64/system.h  |  155 +-------------------------------
 4 files changed, 315 insertions(+), 288 deletions(-)
 create mode 100644 xen/include/asm-arm/arm32/cmpxchg.h
 create mode 100644 xen/include/asm-arm/arm64/cmpxchg.h

diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
new file mode 100644
index 0000000..70c6090
--- /dev/null
+++ b/xen/include/asm-arm/arm32/cmpxchg.h
@@ -0,0 +1,146 @@
+#ifndef __ASM_ARM32_CMPXCHG_H
+#define __ASM_ARM32_CMPXCHG_H
+
+extern void __bad_xchg(volatile void *, int);
+
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
+{
+	unsigned long ret;
+	unsigned int tmp;
+
+	smp_mb();
+
+	switch (size) {
+	case 1:
+		asm volatile("@	__xchg1\n"
+		"1:	ldrexb	%0, [%3]\n"
+		"	strexb	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+	case 4:
+		asm volatile("@	__xchg4\n"
+		"1:	ldrex	%0, [%3]\n"
+		"	strex	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+	default:
+		__bad_xchg(ptr, size), ret = 0;
+		break;
+	}
+	smp_mb();
+
+	return ret;
+}
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+
+extern void __bad_cmpxchg(volatile void *ptr, int size);
+
+static always_inline unsigned long __cmpxchg(
+    volatile void *ptr, unsigned long old, unsigned long new, int size)
+{
+	unsigned long oldval, res;
+
+	switch (size) {
+	case 1:
+		do {
+			asm volatile("@ __cmpxchg1\n"
+			"	ldrexb	%1, [%2]\n"
+			"	mov	%0, #0\n"
+			"	teq	%1, %3\n"
+			"	strexbeq %0, %4, [%2]\n"
+				: "=&r" (res), "=&r" (oldval)
+				: "r" (ptr), "Ir" (old), "r" (new)
+				: "memory", "cc");
+		} while (res);
+		break;
+	case 2:
+		do {
+			asm volatile("@ __cmpxchg2\n"
+			"	ldrexh	%1, [%2]\n"
+			"	mov	%0, #0\n"
+			"	teq	%1, %3\n"
+			"	strexheq %0, %4, [%2]\n"
+				: "=&r" (res), "=&r" (oldval)
+				: "r" (ptr), "Ir" (old), "r" (new)
+				: "memory", "cc");
+		} while (res);
+		break;
+	case 4:
+		do {
+			asm volatile("@ __cmpxchg4\n"
+			"	ldrex	%1, [%2]\n"
+			"	mov	%0, #0\n"
+			"	teq	%1, %3\n"
+			"	strexeq	%0, %4, [%2]\n"
+				: "=&r" (res), "=&r" (oldval)
+				: "r" (ptr), "Ir" (old), "r" (new)
+				: "memory", "cc");
+		} while (res);
+		break;
+#if 0
+	case 8:
+		do {
+			asm volatile("@ __cmpxchg8\n"
+			"	ldrexd	%1, [%2]\n"
+			"	mov	%0, #0\n"
+			"	teq	%1, %3\n"
+			"	strexdeq %0, %4, [%2]\n"
+				: "=&r" (res), "=&r" (oldval)
+				: "r" (ptr), "Ir" (old), "r" (new)
+				: "memory", "cc");
+		} while (res);
+		break;
+#endif
+	default:
+		__bad_cmpxchg(ptr, size);
+		oldval = 0;
+	}
+
+	return oldval;
+}
+
+static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
+					 unsigned long new, int size)
+{
+	unsigned long ret;
+
+	smp_mb();
+	ret = __cmpxchg(ptr, old, new, size);
+	smp_mb();
+
+	return ret;
+}
+
+#define cmpxchg(ptr,o,n)						\
+	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
+					  (unsigned long)(o),		\
+					  (unsigned long)(n),		\
+					  sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n)						\
+	((__typeof__(*(ptr)))__cmpxchg((ptr),				\
+				       (unsigned long)(o),		\
+				       (unsigned long)(n),		\
+				       sizeof(*(ptr))))
+#endif
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
+ * End:
+ */
diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
index dfaa3b6..b47b942 100644
--- a/xen/include/asm-arm/arm32/system.h
+++ b/xen/include/asm-arm/arm32/system.h
@@ -2,140 +2,7 @@
 #ifndef __ASM_ARM32_SYSTEM_H
 #define __ASM_ARM32_SYSTEM_H
 
-extern void __bad_xchg(volatile void *, int);
-
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
-{
-        unsigned long ret;
-        unsigned int tmp;
-
-        smp_mb();
-
-        switch (size) {
-        case 1:
-                asm volatile("@ __xchg1\n"
-                "1:     ldrexb  %0, [%3]\n"
-                "       strexb  %1, %2, [%3]\n"
-                "       teq     %1, #0\n"
-                "       bne     1b"
-                        : "=&r" (ret), "=&r" (tmp)
-                        : "r" (x), "r" (ptr)
-                        : "memory", "cc");
-                break;
-        case 4:
-                asm volatile("@ __xchg4\n"
-                "1:     ldrex   %0, [%3]\n"
-                "       strex   %1, %2, [%3]\n"
-                "       teq     %1, #0\n"
-                "       bne     1b"
-                        : "=&r" (ret), "=&r" (tmp)
-                        : "r" (x), "r" (ptr)
-                        : "memory", "cc");
-                break;
-        default:
-                __bad_xchg(ptr, size), ret = 0;
-                break;
-        }
-        smp_mb();
-
-        return ret;
-}
-
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-
-extern void __bad_cmpxchg(volatile void *ptr, int size);
-
-static always_inline unsigned long __cmpxchg(
-    volatile void *ptr, unsigned long old, unsigned long new, int size)
-{
-    unsigned long /*long*/ oldval, res;
-
-    switch (size) {
-    case 1:
-        do {
-            asm volatile("@ __cmpxchg1\n"
-                         "       ldrexb  %1, [%2]\n"
-                         "       mov     %0, #0\n"
-                         "       teq     %1, %3\n"
-                         "       strexbeq %0, %4, [%2]\n"
-                         : "=&r" (res), "=&r" (oldval)
-                         : "r" (ptr), "Ir" (old), "r" (new)
-                         : "memory", "cc");
-        } while (res);
-        break;
-    case 2:
-        do {
-            asm volatile("@ __cmpxchg2\n"
-                         "       ldrexh  %1, [%2]\n"
-                         "       mov     %0, #0\n"
-                         "       teq     %1, %3\n"
-                         "       strexheq %0, %4, [%2]\n"
-                         : "=&r" (res), "=&r" (oldval)
-                         : "r" (ptr), "Ir" (old), "r" (new)
-                         : "memory", "cc");
-        } while (res);
-        break;
-    case 4:
-        do {
-            asm volatile("@ __cmpxchg4\n"
-                         "       ldrex   %1, [%2]\n"
-                         "       mov     %0, #0\n"
-                         "       teq     %1, %3\n"
-                         "       strexeq %0, %4, [%2]\n"
-                         : "=&r" (res), "=&r" (oldval)
-                         : "r" (ptr), "Ir" (old), "r" (new)
-                         : "memory", "cc");
-        } while (res);
-        break;
-#if 0
-    case 8:
-        do {
-            asm volatile("@ __cmpxchg8\n"
-                         "       ldrexd   %1, [%2]\n"
-                         "       mov      %0, #0\n"
-                         "       teq      %1, %3\n"
-                         "       strexdeq %0, %4, [%2]\n"
-                         : "=&r" (res), "=&r" (oldval)
-                         : "r" (ptr), "Ir" (old), "r" (new)
-                         : "memory", "cc");
-        } while (res);
-        break;
-#endif
-    default:
-        __bad_cmpxchg(ptr, size);
-        oldval = 0;
-    }
-
-    return oldval;
-}
-
-static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
-					 unsigned long new, int size)
-{
-	unsigned long ret;
-
-	smp_mb();
-	ret = __cmpxchg(ptr, old, new, size);
-	smp_mb();
-
-	return ret;
-}
-
-#define cmpxchg(ptr,o,n)						\
-	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
-					  (unsigned long)(o),		\
-					  (unsigned long)(n),		\
-					  sizeof(*(ptr))))
-
-#define cmpxchg_local(ptr,o,n)						\
-	((__typeof__(*(ptr)))__cmpxchg((ptr),				\
-				       (unsigned long)(o),		\
-				       (unsigned long)(n),		\
-				       sizeof(*(ptr))))
+#include <asm/arm32/cmpxchg.h>
 
 #define local_irq_disable() asm volatile ( "cpsid i @ local_irq_disable\n" : : : "cc" )
 #define local_irq_enable()  asm volatile ( "cpsie i @ local_irq_enable\n" : : : "cc" )
diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h
new file mode 100644
index 0000000..4e930ce
--- /dev/null
+++ b/xen/include/asm-arm/arm64/cmpxchg.h
@@ -0,0 +1,167 @@
+#ifndef __ASM_ARM64_CMPXCHG_H
+#define __ASM_ARM64_CMPXCHG_H
+
+extern void __bad_xchg(volatile void *, int);
+
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
+{
+	unsigned long ret, tmp;
+
+	switch (size) {
+	case 1:
+		asm volatile("//	__xchg1\n"
+		"1:	ldxrb	%w0, %2\n"
+		"	stlxrb	%w1, %w3, %2\n"
+		"	cbnz	%w1, 1b\n"
+			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
+			: "r" (x)
+			: "memory");
+		break;
+	case 2:
+		asm volatile("//	__xchg2\n"
+		"1:	ldxrh	%w0, %2\n"
+		"	stlxrh	%w1, %w3, %2\n"
+		"	cbnz	%w1, 1b\n"
+			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
+			: "r" (x)
+			: "memory");
+		break;
+	case 4:
+		asm volatile("//	__xchg4\n"
+		"1:	ldxr	%w0, %2\n"
+		"	stlxr	%w1, %w3, %2\n"
+		"	cbnz	%w1, 1b\n"
+			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
+			: "r" (x)
+			: "memory");
+		break;
+	case 8:
+		asm volatile("//	__xchg8\n"
+		"1:	ldxr	%0, %2\n"
+		"	stlxr	%w1, %3, %2\n"
+		"	cbnz	%w1, 1b\n"
+			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
+			: "r" (x)
+			: "memory");
+		break;
+	default:
+		__bad_xchg(ptr, size), ret = 0;
+		break;
+	}
+
+	smp_mb();
+	return ret;
+}
+
+#define xchg(ptr,x) \
+	((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
+
+extern void __bad_cmpxchg(volatile void *ptr, int size);
+
+static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
+				      unsigned long new, int size)
+{
+	unsigned long oldval = 0, res;
+
+	switch (size) {
+	case 1:
+		do {
+			asm volatile("// __cmpxchg1\n"
+			"	ldxrb	%w1, %2\n"
+			"	mov	%w0, #0\n"
+			"	cmp	%w1, %w3\n"
+			"	b.ne	1f\n"
+			"	stxrb	%w0, %w4, %2\n"
+			"1:\n"
+				: "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
+				: "Ir" (old), "r" (new)
+				: "cc");
+		} while (res);
+		break;
+
+	case 2:
+		do {
+			asm volatile("// __cmpxchg2\n"
+			"	ldxrh	%w1, %2\n"
+			"	mov	%w0, #0\n"
+			"	cmp	%w1, %w3\n"
+			"	b.ne	1f\n"
+			"	stxrh	%w0, %w4, %2\n"
+			"1:\n"
+				: "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
+				: "Ir" (old), "r" (new)
+				: "cc");
+		} while (res);
+		break;
+
+	case 4:
+		do {
+			asm volatile("// __cmpxchg4\n"
+			"	ldxr	%w1, %2\n"
+			"	mov	%w0, #0\n"
+			"	cmp	%w1, %w3\n"
+			"	b.ne	1f\n"
+			"	stxr	%w0, %w4, %2\n"
+			"1:\n"
+				: "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
+				: "Ir" (old), "r" (new)
+				: "cc");
+		} while (res);
+		break;
+
+	case 8:
+		do {
+			asm volatile("// __cmpxchg8\n"
+			"	ldxr	%1, %2\n"
+			"	mov	%w0, #0\n"
+			"	cmp	%1, %3\n"
+			"	b.ne	1f\n"
+			"	stxr	%w0, %4, %2\n"
+			"1:\n"
+				: "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
+				: "Ir" (old), "r" (new)
+				: "cc");
+		} while (res);
+		break;
+
+	default:
+		__bad_cmpxchg(ptr, size);
+		oldval = 0;
+	}
+
+	return oldval;
+}
+
+static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
+					 unsigned long new, int size)
+{
+	unsigned long ret;
+
+	smp_mb();
+	ret = __cmpxchg(ptr, old, new, size);
+	smp_mb();
+
+	return ret;
+}
+
+#define cmpxchg(ptr,o,n)						\
+	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
+					  (unsigned long)(o),		\
+					  (unsigned long)(n),		\
+					  sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n)						\
+	((__typeof__(*(ptr)))__cmpxchg((ptr),				\
+				       (unsigned long)(o),		\
+				       (unsigned long)(n),		\
+				       sizeof(*(ptr))))
+
+#endif
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
+ * End:
+ */
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index fa50ead..6efced3 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -2,160 +2,7 @@
 #ifndef __ASM_ARM64_SYSTEM_H
 #define __ASM_ARM64_SYSTEM_H
 
-extern void __bad_xchg(volatile void *, int);
-
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
-{
-	unsigned long ret, tmp;
-
-	switch (size) {
-	case 1:
-		asm volatile("//	__xchg1\n"
-		"1:	ldxrb	%w0, %2\n"
-		"	stlxrb	%w1, %w3, %2\n"
-		"	cbnz	%w1, 1b\n"
-			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
-			: "r" (x)
-			: "memory");
-		break;
-	case 2:
-		asm volatile("//	__xchg2\n"
-		"1:	ldxrh	%w0, %2\n"
-		"	stlxrh	%w1, %w3, %2\n"
-		"	cbnz	%w1, 1b\n"
-			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
-			: "r" (x)
-			: "memory");
-		break;
-	case 4:
-		asm volatile("//	__xchg4\n"
-		"1:	ldxr	%w0, %2\n"
-		"	stlxr	%w1, %w3, %2\n"
-		"	cbnz	%w1, 1b\n"
-			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
-			: "r" (x)
-			: "memory");
-		break;
-	case 8:
-		asm volatile("//	__xchg8\n"
-		"1:	ldxr	%0, %2\n"
-		"	stlxr	%w1, %3, %2\n"
-		"	cbnz	%w1, 1b\n"
-			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
-			: "r" (x)
-			: "memory");
-		break;
-	default:
-		__bad_xchg(ptr, size), ret = 0;
-		break;
-	}
-
-	smp_mb();
-	return ret;
-}
-
-#define xchg(ptr,x) \
-	((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
-
-extern void __bad_cmpxchg(volatile void *ptr, int size);
-
-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
-				      unsigned long new, int size)
-{
-	unsigned long oldval = 0, res;
-
-	switch (size) {
-	case 1:
-		do {
-			asm volatile("// __cmpxchg1\n"
-			"	ldxrb	%w1, %2\n"
-			"	mov	%w0, #0\n"
-			"	cmp	%w1, %w3\n"
-			"	b.ne	1f\n"
-			"	stxrb	%w0, %w4, %2\n"
-			"1:\n"
-				: "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
-				: "Ir" (old), "r" (new)
-				: "cc");
-		} while (res);
-		break;
-
-	case 2:
-		do {
-			asm volatile("// __cmpxchg2\n"
-			"	ldxrh	%w1, %2\n"
-			"	mov	%w0, #0\n"
-			"	cmp	%w1, %w3\n"
-			"	b.ne	1f\n"
-			"	stxrh	%w0, %w4, %2\n"
-			"1:\n"
-				: "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
-				: "Ir" (old), "r" (new)
-				: "cc");
-		} while (res);
-		break;
-
-	case 4:
-		do {
-			asm volatile("// __cmpxchg4\n"
-			"	ldxr	%w1, %2\n"
-			"	mov	%w0, #0\n"
-			"	cmp	%w1, %w3\n"
-			"	b.ne	1f\n"
-			"	stxr	%w0, %w4, %2\n"
-			"1:\n"
-				: "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
-				: "Ir" (old), "r" (new)
-				: "cc");
-		} while (res);
-		break;
-
-	case 8:
-		do {
-			asm volatile("// __cmpxchg8\n"
-			"	ldxr	%1, %2\n"
-			"	mov	%w0, #0\n"
-			"	cmp	%1, %3\n"
-			"	b.ne	1f\n"
-			"	stxr	%w0, %4, %2\n"
-			"1:\n"
-				: "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
-				: "Ir" (old), "r" (new)
-				: "cc");
-		} while (res);
-		break;
-
-	default:
-		__bad_cmpxchg(ptr, size);
-		oldval = 0;
-	}
-
-	return oldval;
-}
-
-static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
-					 unsigned long new, int size)
-{
-	unsigned long ret;
-
-	smp_mb();
-	ret = __cmpxchg(ptr, old, new, size);
-	smp_mb();
-
-	return ret;
-}
-
-#define cmpxchg(ptr,o,n)						\
-	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
-					  (unsigned long)(o),		\
-					  (unsigned long)(n),		\
-					  sizeof(*(ptr))))
-
-#define cmpxchg_local(ptr,o,n)						\
-	((__typeof__(*(ptr)))__cmpxchg((ptr),				\
-				       (unsigned long)(o),		\
-				       (unsigned long)(n),		\
-				       sizeof(*(ptr))))
+#include <asm/arm64/cmpxchg.h>
 
 /* Uses uimm4 as a bitmask to select the clearing of one or more of
  * the DAIF exception mask bits:
-- 
1.7.10.4
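
For readers unfamiliar with the calling convention, here is a minimal
sketch of how a caller typically uses a cmpxchg()-style primitive. It is
not part of the patch: cmpxchg_ul() and add_capped() are hypothetical
names, and a GCC __sync builtin stands in for the LDXR/STXR loops in the
header above.

```c
/*
 * Stand-in for cmpxchg(): returns the value that was found at *ptr.
 * The store of new_val happened only if that value equals old.
 * Assumption: a GCC/Clang builtin is used here, not the asm from the patch.
 */
static unsigned long cmpxchg_ul(unsigned long *ptr, unsigned long old,
				unsigned long new_val)
{
	return __sync_val_compare_and_swap(ptr, old, new_val);
}

/*
 * Hypothetical helper: atomically add inc to *ptr, saturating at cap.
 * Retries whenever another writer changed *ptr between load and store.
 */
static unsigned long add_capped(unsigned long *ptr, unsigned long inc,
				unsigned long cap)
{
	unsigned long cur = __atomic_load_n(ptr, __ATOMIC_RELAXED);
	unsigned long prev, next;

	for (;;) {
		next = cur + inc;
		if (next > cap)
			next = cap;

		prev = cmpxchg_ul(ptr, cur, next);
		if (prev == cur)
			return next;	/* our store won */

		cur = prev;		/* lost the race, retry with fresh value */
	}
}
```

The point of the pattern is that success is detected by comparing the
return value with the expected old value, which is why the macros in the
header cast the result back to the pointed-to type.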

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux
  2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
                   ` (15 preceding siblings ...)
  2014-03-20 15:46 ` [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
  2014-03-20 16:23   ` Ian Campbell
  16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini

As part of the recent update I had to reverse engineer what we had, which was
very tedious. Check in my notes so that I have a reference for next time.

Now the secret is to remember to update this file every time!

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/README.LinuxPrimitives |  159 +++++++++++++++++++++++++++++++++++
 1 file changed, 159 insertions(+)
 create mode 100644 xen/arch/arm/README.LinuxPrimitives

diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
new file mode 100644
index 0000000..5656c11
--- /dev/null
+++ b/xen/arch/arm/README.LinuxPrimitives
@@ -0,0 +1,159 @@
+Xen on ARM uses various low level assembly primitives from the Linux
+kernel. This file tracks what files have been imported and when they
+were last updated.
+
+=====================================================================
+arm64:
+=====================================================================
+
+bitops: last sync @ v3.14-rc7 (last commit: 8e86f0b)
+
+linux/arch/arm64/lib/bitops.S           xen/arch/arm/arm64/lib/bitops.S
+linux/arch/arm64/include/asm/bitops.h   xen/include/asm-arm/arm64/bitops.h
+
+---------------------------------------------------------------------
+
+cmpxchg: last sync @ v3.14-rc7 (last commit: 95c4189)
+
+linux/arch/arm64/include/asm/cmpxchg.h  xen/include/asm-arm/arm64/cmpxchg.h
+
+Skipped:
+  60010e5 arm64: cmpxchg: update macros to prevent warnings
+
+---------------------------------------------------------------------
+
+atomics: last sync @ v3.14-rc7 (last commit: 95c4189)
+
+linux/arch/arm64/include/asm/atomic.h   xen/include/asm-arm/arm64/atomic.h
+
+---------------------------------------------------------------------
+
+spinlocks: last sync @ v3.14-rc7 (last commit: 95c4189)
+
+linux/arch/arm64/include/asm/spinlock.h xen/include/asm-arm/arm64/spinlock.h
+
+Skipped:
+  5686b06 arm64: lockref: add support for lockless lockrefs using cmpxchg
+  52ea2a5 arm64: locks: introduce ticket-based spinlock implementation
+
+---------------------------------------------------------------------
+
+mem*: last sync @ v3.14-rc7 (last commit: 4a89922)
+
+linux/arch/arm64/lib/memchr.S             xen/arch/arm/arm64/lib/memchr.S
+linux/arch/arm64/lib/memcpy.S             xen/arch/arm/arm64/lib/memcpy.S
+linux/arch/arm64/lib/memmove.S            xen/arch/arm/arm64/lib/memmove.S
+linux/arch/arm64/lib/memset.S             xen/arch/arm/arm64/lib/memset.S
+
+for i in memchr.S memcpy.S memmove.S memset.S ; do
+    diff -u linux/arch/arm64/lib/$i xen/arch/arm/arm64/lib/$i
+done
+
+---------------------------------------------------------------------
+
+str*: last sync @ v3.14-rc7 (last commit: 2b8cac8)
+
+linux/arch/arm64/lib/strchr.S           xen/arch/arm/arm64/lib/strchr.S
+linux/arch/arm64/lib/strrchr.S          xen/arch/arm/arm64/lib/strrchr.S
+
+---------------------------------------------------------------------
+
+{clear,copy}_page: last sync @ v3.14-rc7 (last commit: f27bb13)
+
+linux/arch/arm64/lib/clear_page.S       xen/arch/arm/arm64/lib/clear_page.S
+linux/arch/arm64/lib/copy_page.S        unused in Xen
+
+=====================================================================
+arm32
+=====================================================================
+
+bitops: last sync @ v3.14-rc7 (last commit: b7ec699)
+
+                                        xen/arch/arm/arm32/lib/assembler.h
+linux/arch/arm/lib/bitops.h             xen/arch/arm/arm32/lib/bitops.h
+linux/arch/arm/lib/changebit.S          xen/arch/arm/arm32/lib/changebit.S
+linux/arch/arm/lib/clearbit.S           xen/arch/arm/arm32/lib/clearbit.S
+linux/arch/arm/lib/findbit.S            xen/arch/arm/arm32/lib/findbit.S
+linux/arch/arm/lib/setbit.S             xen/arch/arm/arm32/lib/setbit.S
+linux/arch/arm/lib/testchangebit.S      xen/arch/arm/arm32/lib/testchangebit.S
+linux/arch/arm/lib/testclearbit.S       xen/arch/arm/arm32/lib/testclearbit.S
+linux/arch/arm/lib/testsetbit.S         xen/arch/arm/arm32/lib/testsetbit.S
+
+for i in assembler.h bitops.h changebit.S clearbit.S findbit.S \
+         setbit.S testchangebit.S testclearbit.S testsetbit.S; do 
+    diff -u ../linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i;
+done
+
+---------------------------------------------------------------------
+
+cmpxchg: last sync @ v3.14-rc7 (last commit: 775ebcc)
+
+linux/arch/arm/include/asm/cmpxchg.h    xen/include/asm-arm/arm32/cmpxchg.h
+
+---------------------------------------------------------------------
+
+atomics: last sync @ v3.14-rc7 (last commit: aed3a4e)
+
+linux/arch/arm/include/asm/atomic.h     xen/include/asm-arm/arm32/atomic.h
+
+---------------------------------------------------------------------
+
+spinlocks: last sync: 15e7e5c1ebf5
+
+linux/arch/arm/include/asm/spinlock.h   xen/include/asm-arm/arm32/spinlock.h
+
+resync to v3.14-rc7:
+
+  7c8746a ARM: 7955/1: spinlock: ensure we have a compiler barrier before sev
+  0cbad9c ARM: 7854/1: lockref: add support for lockless lockrefs using cmpxchg64
+  9bb17be ARM: locks: prefetch the destination word for write prior to strex
+  27a8479 ARM: smp_on_up: move inline asm ALT_SMP patching macro out of spinlock.
+  00efaa0 ARM: 7812/1: rwlocks: retry trylock operation if strex fails on free lo
+  afa31d8 ARM: 7811/1: locks: use early clobber in arch_spin_trylock
+  73a6fdc ARM: spinlock: use inner-shareable dsb variant prior to sev instruction
+
+---------------------------------------------------------------------
+
+mem*: last sync @ v3.14-rc7 (last commit: 418df63a)
+
+linux/arch/arm/lib/copy_template.S      xen/arch/arm/arm32/lib/copy_template.S
+linux/arch/arm/lib/memchr.S             xen/arch/arm/arm32/lib/memchr.S
+linux/arch/arm/lib/memcpy.S             xen/arch/arm/arm32/lib/memcpy.S
+linux/arch/arm/lib/memmove.S            xen/arch/arm/arm32/lib/memmove.S
+linux/arch/arm/lib/memset.S             xen/arch/arm/arm32/lib/memset.S
+linux/arch/arm/lib/memzero.S            xen/arch/arm/arm32/lib/memzero.S
+
+linux/arch/arm/lib/strchr.S             xen/arch/arm/arm32/lib/strchr.S
+linux/arch/arm/lib/strrchr.S            xen/arch/arm/arm32/lib/strrchr.S
+
+for i in copy_template.S memchr.S memcpy.S memmove.S memset.S \
+         memzero.S ; do
+    diff -u linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i
+done
+
+---------------------------------------------------------------------
+
+str*: last sync @ v3.13-rc7 (last commit: 93ed397)
+
+linux/arch/arm/lib/strchr.S             xen/arch/arm/arm32/lib/strchr.S
+linux/arch/arm/lib/strrchr.S            xen/arch/arm/arm32/lib/strrchr.S
+
+---------------------------------------------------------------------
+
+{clear,copy}_page: last sync: Never
+
+linux/arch/arm/lib/copy_page.S          unused in Xen
+
+clear_page == memset
+
+---------------------------------------------------------------------
+
+libgcc: last sync @ v3.14-rc7 (last commit: 01885bc)
+
+linux/arch/arm/lib/lib1funcs.S          xen/arch/arm/arm32/lib/lib1funcs.S
+linux/arch/arm/lib/lshrdi3.S            xen/arch/arm/arm32/lib/lshrdi3.S
+linux/arch/arm/lib/div64.S              xen/arch/arm/arm32/lib/div64.S
+
+for i in lib1funcs.S lshrdi3.S div64.S ; do
+    diff -u linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i
+done
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/17] xen: arm64: disable alignment traps
  2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
@ 2014-03-20 15:57   ` Andrew Cooper
  2014-03-20 15:59     ` Ian Campbell
  2014-03-20 17:54   ` Julien Grall
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2014-03-20 15:57 UTC (permalink / raw)
  To: Ian Campbell; +Cc: julien.grall, tim, stefano.stabellini, xen-devel

On 20/03/14 15:45, Ian Campbell wrote:
> The mem* primitives which I am about to import from Linux in a subsequent
> patch rely on the hardware handling misalignment.
>
> The benefits of an optimised memcpy etc oughtweigh the downsides.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/arm64/head.S |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 9547ef5..22d0030 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -241,7 +241,7 @@ skip_bss:
>           * I-cache enabled,
>           * Alignment checking enabled,

Is this comment still true?

~Andrew

>           * MMU translation disabled (for now). */
> -        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
> +        ldr   x0, =(HSCTLR_BASE)
>          msr   SCTLR_EL2, x0
>  
>          /* Rebuild the boot pagetable's first-level entries. The structure

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/17] xen: arm64: disable alignment traps
  2014-03-20 15:57   ` Andrew Cooper
@ 2014-03-20 15:59     ` Ian Campbell
  2014-03-20 16:21       ` Gordan Bobic
  0 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: julien.grall, tim, stefano.stabellini, xen-devel

On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> On 20/03/14 15:45, Ian Campbell wrote:
> > The mem* primitives which I am about to import from Linux in a subsequent
> > patch rely on the hardware handling misalignment.
> >
> > The benefits of an optimised memcpy etc oughtweigh the downsides.

Ahem, "outweigh".

> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > ---
> >  xen/arch/arm/arm64/head.S |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 9547ef5..22d0030 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -241,7 +241,7 @@ skip_bss:
> >           * I-cache enabled,
> >           * Alignment checking enabled,
> 
> Is this comment still true?

Oh balls, no it is not. I had a meeting between deciding to make this
change and actually making it...

Ian.

> 
> ~Andrew
> 
> >           * MMU translation disabled (for now). */
> > -        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
> > +        ldr   x0, =(HSCTLR_BASE)
> >          msr   SCTLR_EL2, x0
> >  
> >          /* Rebuild the boot pagetable's first-level entries. The structure
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch()
  2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
@ 2014-03-20 16:12   ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2014-03-20 16:12 UTC (permalink / raw)
  To: Ian Campbell; +Cc: KeirFraser, tim, julien.grall, xen-devel, stefano.stabellini

>>> On 20.03.14 at 16:45, Ian Campbell <ian.campbell@citrix.com> wrote:
> Quoting Andi Kleen in Linux b483570a13be from 2007:
>     gcc 3.2+ supports __builtin_prefetch, so it's possible to use it on all
>     architectures. Change the generic fallback in linux/prefetch.h to use it
>     instead of noping it out. gcc should do the right thing when the
>     architecture doesn't support prefetching
> 
>     Undefine the x86-64 inline assembler version and use the fallback.
> 
> ARM wants to use the builtins.
> 
> Fix a pair of spelling errors, one of which was from Lucas De Marchi in the
> Linux tree.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Keir Fraser <keir@xen.org>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
>  xen/include/xen/prefetch.h |   13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/include/xen/prefetch.h b/xen/include/xen/prefetch.h
> index 8d7d3ff..ba73998 100644
> --- a/xen/include/xen/prefetch.h
> +++ b/xen/include/xen/prefetch.h
> @@ -28,24 +28,17 @@
>  	prefetchw(x)	- prefetches the cacheline at "x" for write
>  	spin_lock_prefetch(x) - prefectches the spinlock *x for taking
>  	
> -	there is also PREFETCH_STRIDE which is the architecure-prefered 
> +	there is also PREFETCH_STRIDE which is the architecture-preferred
>  	"lookahead" size for prefetching streamed operations.
>  	
>  */
>  
> -/*
> - *	These cannot be do{}while(0) macros. See the mental gymnastics in
> - *	the loop macro.
> - */
> - 
>  #ifndef ARCH_HAS_PREFETCH
> -#define ARCH_HAS_PREFETCH
> -static inline void prefetch(const void *x) {;}
> +#define prefetch(x) __builtin_prefetch(x)
>  #endif
>  
>  #ifndef ARCH_HAS_PREFETCHW
> -#define ARCH_HAS_PREFETCHW
> -static inline void prefetchw(const void *x) {;}
> +#define prefetchw(x) __builtin_prefetch(x,1)
>  #endif
>  
>  #ifndef ARCH_HAS_SPINLOCK_PREFETCH
> -- 
> 1.7.10.4
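
As an illustration of the generic fallbacks in the patch quoted above, a
minimal sketch of a caller follows. The prefetch()/prefetchw() macros
match the patch; struct node, sum_list() and bump_all() are hypothetical
and exist only for this example.

```c
#include <stddef.h>

/* Same definitions as the generic fallbacks in the patch. */
#define prefetch(x)  __builtin_prefetch(x)
#define prefetchw(x) __builtin_prefetch(x, 1)

/* Hypothetical list node, not a Xen type. */
struct node {
	struct node *next;
	long value;
};

/* Read-only walk: hint the next node into the cache for reading. */
static long sum_list(const struct node *n)
{
	long total = 0;

	while (n) {
		/* Only a hint; a NULL next pointer does not fault. */
		prefetch(n->next);
		total += n->value;
		n = n->next;
	}
	return total;
}

/* Modifying walk: hint the next node into the cache for writing. */
static void bump_all(struct node *n)
{
	while (n) {
		prefetchw(n->next);
		n->value++;
		n = n->next;
	}
}
```

Whether the hint helps depends on the target and the access pattern; on
architectures without a prefetch instruction the builtin should compile
to nothing.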

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/17] xen: arm64: disable alignment traps
  2014-03-20 15:59     ` Ian Campbell
@ 2014-03-20 16:21       ` Gordan Bobic
  2014-03-20 16:27         ` Ian Campbell
  0 siblings, 1 reply; 42+ messages in thread
From: Gordan Bobic @ 2014-03-20 16:21 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Cooper, xen-devel, julien.grall, tim, stefano.stabellini

On 2014-03-20 15:59, Ian Campbell wrote:
> On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
>> On 20/03/14 15:45, Ian Campbell wrote:
>> > The mem* primitives which I am about to import from Linux in a subsequent
>> > patch rely on the hardware handling misalignment.
>> >
>> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> 
> Ahem, "outweigh".

Just FYI, the slow-down from heavy unaligned accesses (with
hardware alignment fixup, you can't disable it using
/proc/cpu/alignment) on Cortex A15 is about 40x.

Most of the commonly used code has been fixed recently, but
there are still some packages that exhibit misaligned access
traps during their test suites and/or normal operation.

Whether the hardware alignment fixup is less overheady on
ARM64, I don't know - I haven't been able to get my hands
on the hardware yet.

Gordan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux
  2014-03-20 15:46 ` [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux Ian Campbell
@ 2014-03-20 16:23   ` Ian Campbell
  0 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 16:23 UTC (permalink / raw)
  To: xen-devel; +Cc: julien.grall, tim, stefano.stabellini

On Thu, 2014-03-20 at 15:46 +0000, Ian Campbell wrote:
> As part of the recent update I had to reverse engineer what we had, which was
> very tedious. Check in my notes so that I have a reference for next time.
> 
> Now the secret is to remember to update this file every time!
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/README.LinuxPrimitives |  159 +++++++++++++++++++++++++++++++++++
>  1 file changed, 159 insertions(+)
>  create mode 100644 xen/arch/arm/README.LinuxPrimitives
> 
> diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
> new file mode 100644
> index 0000000..5656c11
> --- /dev/null
> +++ b/xen/arch/arm/README.LinuxPrimitives

Apparently I forgot to git commit --amend the following in to this
patch. I'll incorporate next time.

diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
index 5656c11..6cd03ca 100644
--- a/xen/arch/arm/README.LinuxPrimitives
+++ b/xen/arch/arm/README.LinuxPrimitives
@@ -69,7 +69,6 @@ arm32
 
 bitops: last sync @ v3.14-rc7 (last commit: b7ec699)
 
-                                        xen/arch/arm/arm32/lib/assembler.h
 linux/arch/arm/lib/bitops.h             xen/arch/arm/arm32/lib/bitops.h
 linux/arch/arm/lib/changebit.S          xen/arch/arm/arm32/lib/changebit.S
 linux/arch/arm/lib/clearbit.S           xen/arch/arm/arm32/lib/clearbit.S
@@ -79,8 +78,8 @@ linux/arch/arm/lib/testchangebit.S      xen/arch/arm/arm32/lib/testchangebit.S
 linux/arch/arm/lib/testclearbit.S       xen/arch/arm/arm32/lib/testclearbit.S
 linux/arch/arm/lib/testsetbit.S         xen/arch/arm/arm32/lib/testsetbit.S
 
-for i in assembler.h bitops.h changebit.S clearbit.S findbit.S \
-         setbit.S testchangebit.S testclearbit.S testsetbit.S; do 
+for i in bitops.h changebit.S clearbit.S findbit.S setbit.S testchangebit.S \
+         testclearbit.S testsetbit.S; do
     diff -u ../linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i;
 done

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/17] xen: arm64: disable alignment traps
  2014-03-20 16:21       ` Gordan Bobic
@ 2014-03-20 16:27         ` Ian Campbell
  2014-03-20 16:43           ` Gordan Bobic
  0 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 16:27 UTC (permalink / raw)
  To: Gordan Bobic
  Cc: Andrew Cooper, xen-devel, julien.grall, tim, stefano.stabellini

On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
> On 2014-03-20 15:59, Ian Campbell wrote:
> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> >> On 20/03/14 15:45, Ian Campbell wrote:
> >> > The mem* primitives which I am about to import from Linux in a subsequent
> >> > patch rely on the hardware handling misalignment.
> >> >
> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> > 
> > Ahem, "outweigh".
> 
> Just FYI, the slow-down from heavy unaligned accesses (with
> hardware alignment fixup, you can't disable it using
> /proc/cpu/alignment) on Cortex A15 is about 40x.

That's pretty staggering -- are you positive this wasn't the kernel
doing the fixups?

> Most of the commonly used code has been fixed recently, but
> there are still some packages that exhibit misaligned access
> traps during their test suites and/or normal operation.
>
> Whether the hardware alignment fixup is less overheady on
> ARM64, I don't know - I haven't been able to get my hands
> on the hardware yet.

arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
taking it on trust that whoever implemented memcpy.S etc found that
memcpy.S with hardware alignment was better than the dumb loop, even if
it wasn't as good as a clever memcpy.S which avoided the alignments.

Ian.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/17] xen: arm64: disable alignment traps
  2014-03-20 16:27         ` Ian Campbell
@ 2014-03-20 16:43           ` Gordan Bobic
  2014-03-20 16:54             ` Ian Campbell
  0 siblings, 1 reply; 42+ messages in thread
From: Gordan Bobic @ 2014-03-20 16:43 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Cooper, xen-devel, julien.grall, tim, stefano.stabellini

On 2014-03-20 16:27, Ian Campbell wrote:
> On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
>> On 2014-03-20 15:59, Ian Campbell wrote:
>> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
>> >> On 20/03/14 15:45, Ian Campbell wrote:
>> >> > The mem* primitives which I am about to import from Linux in a subsequent
>> >> > patch rely on the hardware handling misalignment.
>> >> >
>> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
>> >
>> > Ahem, "outweigh".
>> 
>> Just FYI, the slow-down from heavy unaligned accesses (with
>> hardware alignment fixup, you can't disable it using
>> /proc/cpu/alignment) on Cortex A15 is about 40x.
> 
> That's pretty staggering -- are you positive this wasn't the kernel
> doing the fixups?

I'm not sure if this is easily checkable:

# echo 0 > /proc/cpu/alignment
# cat /proc/cpu/alignment
User:		0
System:		631040
Skipped:	0
Half:		0
Word:		631040
DWord:		0
Multi:		0
User faults:	2 (fixup)

i.e. I can't disable it.

This is on a Samsung Exynos Chromebook with the
standard ChromeOS kernel.

Here is a recent thread from the Fedora ARM mailing list
which contains links to a simple test program that can
be used to test the alignment related slowdown:

http://www.mail-archive.com/arm@lists.fedoraproject.org/msg06121.html

>> Most of the commonly used code has been fixed recently, but
>> there are still some packages that exhibit misaligned access
>> traps during their test suites and/or normal operation.
>> 
>> Whether the hardware alignment fixup is less overheady on
>> ARM64, I don't know - I haven't been able to get my hands
>> on the hardware yet.
> 
> arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
> taking it on trust that whoever implemented memcpy.S etc found that
> memcpy.S with hardware alignment was better than the dumb loop, even if
> it wasn't as good as a clever memcpy.S which avoided the alignments.

I am inclined to agree - it shouldn't be the job of the kernel or the
hypervisor to do this. It is up to the application developers to know
what they are doing and not do things that introduce misaligned
accesses. Unfortunately, there is far too little push-back on buggy
code because most developers have only ever used x86 and have no idea
that until recently everything else wasn't forgiving of such things.
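
To make the point about misaligned accesses concrete, here is a minimal
sketch of the usual portable fix in application code. load_u32() is a
hypothetical helper, not something taken from the linked test program.

```c
#include <stdint.h>
#include <string.h>

/*
 * Portable unaligned load.  memcpy() lets the compiler choose an access
 * sequence that is valid for the target, so no alignment trap or fixup
 * is involved.  The pattern that causes trouble is the pointer cast:
 *
 *     uint32_t v = *(const uint32_t *)(buf + 1);   // may be misaligned
 */
static uint32_t load_u32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Compilers generally turn the fixed-size memcpy() into a single load where
the architecture allows unaligned accesses, so the portable form need not
cost anything on x86.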

Gordan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/17] xen: arm64: disable alignment traps
  2014-03-20 16:43           ` Gordan Bobic
@ 2014-03-20 16:54             ` Ian Campbell
  0 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 16:54 UTC (permalink / raw)
  To: Gordan Bobic
  Cc: Andrew Cooper, xen-devel, julien.grall, tim, stefano.stabellini

On Thu, 2014-03-20 at 16:43 +0000, Gordan Bobic wrote:
> On 2014-03-20 16:27, Ian Campbell wrote:
> > On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
> >> On 2014-03-20 15:59, Ian Campbell wrote:
> >> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> >> >> On 20/03/14 15:45, Ian Campbell wrote:
> >> >> > The mem* primitives which I am about to import from Linux in a subsequent
> >> >> > patch rely on the hardware handling misalignment.
> >> >> >
> >> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> >> >
> >> > Ahem, "outweigh".
> >> 
> >> Just FYI, the slow-down from heavy unaligned accesses (with
> >> hardware alignment fixup, you can't disable it using
> >> /proc/cpu/alignment) on Cortex A15 is about 40x.
> > 
> > That's pretty staggering -- are you positive this wasn't the kernel
> > doing the fixups?
> 
> I'm not sure if this is easily checkable:
> 
> # echo 0 > /proc/cpu/alignment
> # cat /proc/cpu/alignment
> User:		0
> System:		631040
> Skipped:	0
> Half:		0
> Word:		631040
> DWord:		0
> Multi:		0
> User faults:	2 (fixup)
> 
> i.e. I can't disable it.

That "fixup" implies to me that the kernel will be fixing things up. 

linux/Documentation/arm/mem_alignment describes what happens here.

> 
> This is on a Samsung Exynos Chromebook with the
> standard ChromeOS kernel.

I've no idea if this sets SCTLR.A but it sounds like it does.

> 
> Here is a recent thread from the Fedora ARM mailing list
> which contains links to a simple test program that can
> be used to test the alignment related slowdown:
> 
> http://www.mail-archive.com/arm@lists.fedoraproject.org/msg06121.html
> 
> >> Most of the commonly used code has been fixed recently, but
> >> there are still some packages that exhibit misaligned access
> >> traps during their test suites and/or normal operation.
> >> 
> >> Whether the hardware alignment fixup is less overheady on
> >> ARM64, I don't know - I haven't been able to get my hands
> >> on the hardware yet.
> > 
> > arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
> > taking it on trust that whoever implemented memcpy.S etc found that
> > memcpy.S with hardware alignment was better than the dumb loop, even if
> > it wasn't as good as a clever memcpy.S which avoided the alignments.
> 
> I am inclined to agree - it shouldn't be the job of the kernel or the
> hypervisor to do this.

This patch is only changing the alignment trap behaviour for the
hypervisor itself, it has no impact on either guest kernel or userspace,
which have their own control bits for operation in those modes.
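
A minimal sketch of the bit manipulation behind this patch, assuming only
that SCTLR.A is bit 1 of the register. EXAMPLE_SCTLR_BASE is an
illustrative value with the A bit clear and should not be read as Xen's
definition of HSCTLR_BASE; alignment_traps_enabled() is a hypothetical
helper.

```c
#include <stdint.h>

/* SCTLR.A, the alignment check enable bit, is bit 1 of the register. */
#define SCTLR_A			(UINT32_C(1) << 1)

/* Illustrative base value with the A bit clear (an assumption). */
#define EXAMPLE_SCTLR_BASE	UINT32_C(0x30c51878)

/* Non-zero when the given SCTLR value enables alignment checking. */
static int alignment_traps_enabled(uint32_t sctlr)
{
	return (sctlr & SCTLR_A) != 0;
}
```

In these terms the patch changes the value written to SCTLR_EL2 from
(base | SCTLR_A) to the plain base, which only affects accesses made
while running in the hypervisor itself.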

Ian.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7
  2014-03-20 15:45 ` [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7 Ian Campbell
@ 2014-03-20 17:13   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:13 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This pulls in the following Linux commits:
> 
> commit c36ef4b1762302a493c6cb754073bded084700e2
> Author: Will Deacon <will.deacon@arm.com>
> Date:   Wed Nov 23 11:28:25 2011 +0100
> 
>     ARM: 7171/1: unwind: add unwind directives to bitops assembly macros
> 
>     The bitops functions (e.g. _test_and_set_bit) on ARM do not have unwind
>     annotations and therefore the kernel cannot backtrace out of them on a
>     fatal error (for example, NULL pointer dereference).
> 
>     This patch annotates the bitops assembly macros with UNWIND annotations
>     so that we can produce a meaningful backtrace on error. Callers of the
>     macros are modified to pass their function name as a macro parameter,
>     enforcing that the macros are used as standalone function implementations.
> 
>     Acked-by: Dave Martin <dave.martin@linaro.org>
>     Signed-off-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> commit d779c07dd72098a7416d907494f958213b7726f3
> Author: Will Deacon <will.deacon@arm.com>
> Date:   Thu Jun 27 12:01:51 2013 +0100
> 
>     ARM: bitops: prefetch the destination word for write prior to strex
> 
>     The cost of changing a cacheline from shared to exclusive state can be
>     significant, especially when this is triggered by an exclusive store,
>     since it may result in having to retry the transaction.
> 
>     This patch prefixes our atomic bitops implementation with prefetchw,
>     to try and grab the line in exclusive state from the start. The testop
>     macro is left alone, since the barrier semantics limit the usefulness
>     of prefetching data.
> 
>     Acked-by: Nicolas Pitre <nico@linaro.org>
>     Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> commit b7ec699405f55667caeb46d96229d75bf33a83ad
> Author: Will Deacon <will.deacon@arm.com>
> Date:   Tue Nov 19 15:46:11 2013 +0100
> 
>     ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP
> 
>     Uwe reported a build failure when targeting a NOMMU platform with my
>     recent prefetch changes:
> 
>       arch/arm/lib/changebit.S: Assembler messages:
>       arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
>                         not allowed for the current base architecture
> 
>     This is due to use of the .arch_extension mp directive immediately prior
>     to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
>     nothing if !CONFIG_SMP, gas will still choke on the directive.
> 
>     This patch fixes the issue by only emitting the sequence (including the
>     directive) if CONFIG_SMP=y.
> 
>     Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>     Signed-off-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

> ---
>  xen/arch/arm/arm32/lib/bitops.h        |   17 +++++++++++++++--
>  xen/arch/arm/arm32/lib/changebit.S     |    4 +---
>  xen/arch/arm/arm32/lib/clearbit.S      |    4 +---
>  xen/arch/arm/arm32/lib/setbit.S        |    4 +---
>  xen/arch/arm/arm32/lib/testchangebit.S |    4 +---
>  xen/arch/arm/arm32/lib/testclearbit.S  |    4 +---
>  xen/arch/arm/arm32/lib/testsetbit.S    |    4 +---
>  7 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/arch/arm/arm32/lib/bitops.h b/xen/arch/arm/arm32/lib/bitops.h
> index 689f2e8..25784c3 100644
> --- a/xen/arch/arm/arm32/lib/bitops.h
> +++ b/xen/arch/arm/arm32/lib/bitops.h
> @@ -1,13 +1,20 @@
>  #include <xen/config.h>
>  
>  #if __LINUX_ARM_ARCH__ >= 6
> -	.macro	bitop, instr
> +	.macro	bitop, name, instr
> +ENTRY(	\name		)
> +UNWIND(	.fnstart	)
>  	ands	ip, r1, #3
>  	strneb	r1, [ip]		@ assert word-aligned
>  	mov	r2, #1
>  	and	r3, r0, #31		@ Get bit offset
>  	mov	r0, r0, lsr #5
>  	add	r1, r1, r0, lsl #2	@ Get word offset
> +#if __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
> +	.arch_extension	mp
> +	ALT_SMP(W(pldw)	[r1])
> +	ALT_UP(W(nop))
> +#endif
>  	mov	r3, r2, lsl r3
>  1:	ldrex	r2, [r1]
>  	\instr	r2, r2, r3
> @@ -15,9 +22,13 @@
>  	cmp	r0, #0
>  	bne	1b
>  	bx	lr
> +UNWIND(	.fnend		)
> +ENDPROC(\name		)
>  	.endm
>  
> -	.macro	testop, instr, store
> +	.macro	testop, name, instr, store
> +ENTRY(	\name		)
> +UNWIND(	.fnstart	)
>  	ands	ip, r1, #3
>  	strneb	r1, [ip]		@ assert word-aligned
>  	mov	r2, #1
> @@ -36,6 +47,8 @@
>  	cmp	r0, #0
>  	movne	r0, #1
>  2:	bx	lr
> +UNWIND(	.fnend		)
> +ENDPROC(\name		)
>  	.endm
>  #else
>  	.macro	bitop, name, instr
> diff --git a/xen/arch/arm/arm32/lib/changebit.S b/xen/arch/arm/arm32/lib/changebit.S
> index 62954bc..11f41d2 100644
> --- a/xen/arch/arm/arm32/lib/changebit.S
> +++ b/xen/arch/arm/arm32/lib/changebit.S
> @@ -13,6 +13,4 @@
>  #include "bitops.h"
>                  .text
>  
> -ENTRY(_change_bit)
> -	bitop	eor
> -ENDPROC(_change_bit)
> +bitop	_change_bit, eor
> diff --git a/xen/arch/arm/arm32/lib/clearbit.S b/xen/arch/arm/arm32/lib/clearbit.S
> index 42ce416..1b6a569 100644
> --- a/xen/arch/arm/arm32/lib/clearbit.S
> +++ b/xen/arch/arm/arm32/lib/clearbit.S
> @@ -14,6 +14,4 @@
>  #include "bitops.h"
>                  .text
>  
> -ENTRY(_clear_bit)
> -	bitop	bic
> -ENDPROC(_clear_bit)
> +bitop	_clear_bit, bic
> diff --git a/xen/arch/arm/arm32/lib/setbit.S b/xen/arch/arm/arm32/lib/setbit.S
> index c828851..1f4ef56 100644
> --- a/xen/arch/arm/arm32/lib/setbit.S
> +++ b/xen/arch/arm/arm32/lib/setbit.S
> @@ -13,6 +13,4 @@
>  #include "bitops.h"
>  	.text
>  
> -ENTRY(_set_bit)
> -	bitop	orr
> -ENDPROC(_set_bit)
> +bitop	_set_bit, orr
> diff --git a/xen/arch/arm/arm32/lib/testchangebit.S b/xen/arch/arm/arm32/lib/testchangebit.S
> index a7f527c..7f4635c 100644
> --- a/xen/arch/arm/arm32/lib/testchangebit.S
> +++ b/xen/arch/arm/arm32/lib/testchangebit.S
> @@ -13,6 +13,4 @@
>  #include "bitops.h"
>                  .text
>  
> -ENTRY(_test_and_change_bit)
> -	testop	eor, str
> -ENDPROC(_test_and_change_bit)
> +testop	_test_and_change_bit, eor, str
> diff --git a/xen/arch/arm/arm32/lib/testclearbit.S b/xen/arch/arm/arm32/lib/testclearbit.S
> index 8f39c72..4d4152f 100644
> --- a/xen/arch/arm/arm32/lib/testclearbit.S
> +++ b/xen/arch/arm/arm32/lib/testclearbit.S
> @@ -13,6 +13,4 @@
>  #include "bitops.h"
>                  .text
>  
> -ENTRY(_test_and_clear_bit)
> -	testop	bicne, strne
> -ENDPROC(_test_and_clear_bit)
> +testop	_test_and_clear_bit, bicne, strne
> diff --git a/xen/arch/arm/arm32/lib/testsetbit.S b/xen/arch/arm/arm32/lib/testsetbit.S
> index 1b8d273..54f48f9 100644
> --- a/xen/arch/arm/arm32/lib/testsetbit.S
> +++ b/xen/arch/arm/arm32/lib/testsetbit.S
> @@ -13,6 +13,4 @@
>  #include "bitops.h"
>                  .text
>  
> -ENTRY(_test_and_set_bit)
> -	testop	orreq, streq
> -ENDPROC(_test_and_set_bit)
> +testop	_test_and_set_bit, orreq, streq
> 
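For anyone reading the macro cold, here is a rough C sketch of the index arithmetic the bitop/testop macros perform. It is not the Xen or Linux code: the sketch_* names are made up for illustration, and C11 atomics stand in for the ldrex/strex loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* "mov r0, r0, lsr #5": which 32-bit word holds bit number nr. */
static inline unsigned int sketch_bit_word(unsigned int nr)
{
    return nr >> 5;
}

/* "and r3, r0, #31" then "mov r3, r2, lsl r3": mask for the bit in its word. */
static inline uint32_t sketch_bit_mask(unsigned int nr)
{
    return UINT32_C(1) << (nr & 31);
}

/* Hypothetical stand-in for _set_bit (the "bitop ..., orr" expansion). */
static inline void sketch_set_bit(unsigned int nr, _Atomic uint32_t *addr)
{
    atomic_fetch_or(&addr[sketch_bit_word(nr)], sketch_bit_mask(nr));
}

/* Hypothetical stand-in for _test_and_set_bit: returns the old bit as 0 or 1. */
static inline int sketch_test_and_set_bit(unsigned int nr, _Atomic uint32_t *addr)
{
    uint32_t mask = sketch_bit_mask(nr);
    uint32_t old = atomic_fetch_or(&addr[sketch_bit_word(nr)], mask);

    return (old & mask) != 0;
}
```

So bit 33 lands in word 1 with mask 0x2, which is what the lsr #5 / and #31 pair computes in the assembly.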


-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics
  2014-03-20 15:45 ` [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics Ian Campbell
@ 2014-03-20 17:22   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:22 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

Hi Ian,

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> Unrelated reads/writes should not pass the xchg.
> 
> Provide cmpxchg_local for parity with arm64, although it appears to be unused.
> It also helps make the reason for the separation of __cmpxchg_mb more
> apparent.
> 
> With this our cmpxchg is in sync with Linux v3.14-rc7.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
> We got our cmpxchg implementation from Linux which AFAICS has always had these
> additional barriers. I don't recall us having decided that Xen barriers should
> not have this property as well, and if we did we were remiss in not adding a
> comment etc... If my memory is faulty then I am happy to replace this patch
> with one which adds a comment instead.

I think the barrier is good for Xen. We may have some places where both
of these macros are used as a "barrier".
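To make the local/barriered split concrete, a minimal C11 sketch follows. It assumes the barriered variant simply brackets the unbarriered one with full fences, which is how I read the separation of __cmpxchg_mb; the sketch_* names are hypothetical and this is not the code from the patch.

```c
#include <stdatomic.h>

/* Like cmpxchg_local: atomic, but imposes no ordering on other accesses. */
static inline unsigned long sketch_cmpxchg_local(_Atomic unsigned long *ptr,
                                                 unsigned long old,
                                                 unsigned long new_val)
{
    /* On failure 'old' is overwritten with the value actually found. */
    atomic_compare_exchange_strong_explicit(ptr, &old, new_val,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    return old;
}

/* Like cmpxchg: unrelated reads/writes must not pass the exchange. */
static inline unsigned long sketch_cmpxchg(_Atomic unsigned long *ptr,
                                           unsigned long old,
                                           unsigned long new_val)
{
    unsigned long ret;

    atomic_thread_fence(memory_order_seq_cst);  /* full barrier before */
    ret = sketch_cmpxchg_local(ptr, old, new_val);
    atomic_thread_fence(memory_order_seq_cst);  /* full barrier after */

    return ret;
}
```

Both return the value found at ptr, so the caller detects success by comparing the result with old.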

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h
  2014-03-20 15:45 ` [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h Ian Campbell
@ 2014-03-20 17:23   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:23 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

Hi Ian,

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This file is from Linux and the intention was to keep the formatting the same
> to make resyncing easier. Put the hardtabs back and adjust the emacs magic to
> reflect the desired use of whitespace.
> 
> Adjust the 64-bit emacs magic too.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>


I guess it was just a mechanical replace:

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7
  2014-03-20 15:45 ` [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7 Ian Campbell
@ 2014-03-20 17:27   ` Julien Grall
  2014-03-21  8:41     ` Ian Campbell
  0 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:27 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

Hi Ian,

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
> index 3f024d4..d309f66 100644
> --- a/xen/include/asm-arm/arm32/atomic.h
> +++ b/xen/include/asm-arm/arm32/atomic.h
> @@ -21,6 +21,7 @@ static inline void atomic_add(int i, atomic_t *v)
>  	unsigned long tmp;
>  	int result;
>  
> +	prefetchw(&v->counter);

Xen on ARM doesn't provide the prefetch* helpers. Shall we implement them?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7
  2014-03-20 15:45 ` [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7 Ian Campbell
@ 2014-03-20 17:29   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:29 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This pulls in the following Linux commits:
> 
> commit 455bd4c430b0c0a361f38e8658a0d6cb469942b5
> Author: Ivan Djelic <ivan.djelic@parrot.com>
> Date:   Wed Mar 6 20:09:27 2013 +0100
> 
>     ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimi
> 
>     Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
>     assumptions about the implementation of memset and similar functions.
>     The current ARM optimized memset code does not return the value of
>     its first argument, as is usually expected from standard implementations.
> 
>     For instance in the following function:
> 
>     void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waite
>     {
>         memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
>         waiter->magic = waiter;
>         INIT_LIST_HEAD(&waiter->list);
>     }
> 
>     compiled as:
> 
>     800554d0 <debug_mutex_lock_common>:
>     800554d0:       e92d4008        push    {r3, lr}
>     800554d4:       e1a00001        mov     r0, r1
>     800554d8:       e3a02010        mov     r2, #16 ; 0x10
>     800554dc:       e3a01011        mov     r1, #17 ; 0x11
>     800554e0:       eb04426e        bl      80165ea0 <memset>
>     800554e4:       e1a03000        mov     r3, r0
>     800554e8:       e583000c        str     r0, [r3, #12]
>     800554ec:       e5830000        str     r0, [r3]
>     800554f0:       e5830004        str     r0, [r3, #4]
>     800554f4:       e8bd8008        pop     {r3, pc}
> 
>     GCC assumes memset returns the value of pointer 'waiter' in register r0; ca
>     register/memory corruptions.
> 
>     This patch fixes the return value of the assembly version of memset.
>     It adds a 'mov' instruction and merges an additional load+store into
>     existing load/store instructions.
>     For ease of review, here is a breakdown of the patch into 4 simple steps:
> 
>     Step 1
>     ======
>     Perform the following substitutions:
>     ip -> r8, then
>     r0 -> ip,
>     and insert 'mov ip, r0' as the first statement of the function.
>     At this point, we have a memset() implementation returning the proper resul
>     but corrupting r8 on some paths (the ones that were using ip).
> 
>     Step 2
>     ======
>     Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
> 
>     save r8:
>     -       str     lr, [sp, #-4]!
>     +       stmfd   sp!, {r8, lr}
> 
>     and restore r8 on both exit paths:
>     -       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
>     +       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
>     (...)
>             tst     r2, #16
>             stmneia ip!, {r1, r3, r8, lr}
>     -       ldr     lr, [sp], #4
>     +       ldmfd   sp!, {r8, lr}
> 
>     Step 3
>     ======
>     Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:
> 
>     save r8:
>     -       stmfd   sp!, {r4-r7, lr}
>     +       stmfd   sp!, {r4-r8, lr}
> 
>     and restore r8 on both exit paths:
>             bgt     3b
>     -       ldmeqfd sp!, {r4-r7, pc}
>     +       ldmeqfd sp!, {r4-r8, pc}
>     (...)
>             tst     r2, #16
>             stmneia ip!, {r4-r7}
>     -       ldmfd   sp!, {r4-r7, lr}
>     +       ldmfd   sp!, {r4-r8, lr}
> 
>     Step 4
>     ======
>     Rewrite register list "r4-r7, r8" as "r4-r8".
> 
>     Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
>     Reviewed-by: Nicolas Pitre <nico@linaro.org>
>     Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> commit 418df63adac56841ef6b0f1fcf435bc64d4ed177
> Author: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date:   Tue Mar 12 13:00:42 2013 +0100
> 
>     ARM: 7670/1: fix the memset fix
> 
>     Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
>     recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
>     with the memset return value.  However the memset itself became broken
>     by that patch for misaligned pointers.
> 
>     This fixes the above by branching over the entry code from the
>     misaligned fixup code to avoid reloading the original pointer.
> 
>     Also, because the function entry alignment is wrong in the Thumb mode
>     compilation, that fixup code is moved to the end.
> 
>     While at it, the entry instructions are slightly reworked to help dual
>     issue pipelines.
> 
>     Signed-off-by: Nicolas Pitre <nico@linaro.org>
>     Tested-by: Alexander Holler <holler@ahsoftware.de>
>     Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
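
The contract at stake in the first commit is easy to state in C. The following is only an illustrative byte-loop memset, not the optimised assembly (sketch_memset is a made-up name); what matters is the final return, which is what GCC relies on when it reuses r0 after the call.

```c
#include <stddef.h>

/* Illustrative only: a plain byte-at-a-time memset. */
void *sketch_memset(void *s, int c, size_t n)
{
    unsigned char *p = s;

    while (n--)
        *p++ = (unsigned char)c;

    /* The standard requires returning the original pointer; the pre-fix
     * assembly left a different value in r0, hence the corruption. */
    return s;
}
```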

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 07/17] xen: arm32: add optimised memchr routine
  2014-03-20 15:45 ` [PATCH 07/17] xen: arm32: add optimised memchr routine Ian Campbell
@ 2014-03-20 17:32   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:32 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This isn't used enough to be critical, but it completes the set of mem*.
> 
> Taken from Linux v3.14-rc7.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines
  2014-03-20 15:45 ` [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines Ian Campbell
@ 2014-03-20 17:33   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:33 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> Taken from Linux v3.14-rc7.
> 
> These aren't widely used enough to be critical, but we may as well have them.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/17] xen: arm: remove atomic_clear_mask()
  2014-03-20 15:45 ` [PATCH 09/17] xen: arm: remove atomic_clear_mask() Ian Campbell
@ 2014-03-20 17:35   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:35 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This has no users.
> 
> This brings arm32 atomic.h into sync with Linux v3.14-rc7.
> 
> arm64/atomic.h requires other patches for this to be the case.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics
  2014-03-20 15:45 ` [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics Ian Campbell
@ 2014-03-20 17:43   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:43 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> Xen, like Linux, expects full barrier semantics for bitops, atomics and
> cmpxchgs. This issue was discovered on Linux and we get our implementation of
> these from Linux so quoting Will Deacon in Linux commit 8e86f0b409a4 for the
> gory details:
> 
>     Linux requires a number of atomic operations to provide full barrier
>     semantics, that is no memory accesses after the operation can be
>     observed before any accesses up to and including the operation in
>     program order.
> 
>     On arm64, these operations have been incorrectly implemented as follows:
> 
>         // A, B, C are independent memory locations
> 
>         <Access [A]>
> 
>         // atomic_op (B)
>     1:  ldaxr   x0, [B]         // Exclusive load with acquire
>         <op(B)>
>         stlxr   w1, x0, [B]     // Exclusive store with release
>         cbnz    w1, 1b
> 
>         <Access [C]>
> 
>     The assumption here being that two half barriers are equivalent to a
>     full barrier, so the only permitted ordering would be A -> B -> C
>     (where B is the atomic operation involving both a load and a store).
> 
>     Unfortunately, this is not the case by the letter of the architecture
>     and, in fact, the accesses to A and C are permitted to pass their
>     nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
>     or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
>     store-release on B). This is a clear violation of the full barrier
>     requirement.
> 
>     The simple way to fix this is to implement the same algorithm as ARMv7
>     using explicit barriers:
> 
>         <Access [A]>
> 
>         // atomic_op (B)
>         dmb     ish             // Full barrier
>     1:  ldxr    x0, [B]         // Exclusive load
>         <op(B)>
>         stxr    w1, x0, [B]     // Exclusive store
>         cbnz    w1, 1b
>         dmb     ish             // Full barrier
> 
>         <Access [C]>
> 
>     but this has the undesirable effect of introducing *two* full barrier
>     instructions. A better approach is actually the following, non-intuitive
>     sequence:
> 
>         <Access [A]>
> 
>         // atomic_op (B)
>     1:  ldxr    x0, [B]         // Exclusive load
>         <op(B)>
>         stlxr   w1, x0, [B]     // Exclusive store with release
>         cbnz    w1, 1b
>         dmb     ish             // Full barrier
> 
>         <Access [C]>
> 
>     The simple observations here are:
> 
>       - The dmb ensures that no subsequent accesses (e.g. the access to C)
>         can enter or pass the atomic sequence.
> 
>       - The dmb also ensures that no prior accesses (e.g. the access to A)
>         can pass the atomic sequence.
> 
>       - Therefore, no prior access can pass a subsequent access, or
>         vice-versa (i.e. A is strictly ordered before C).
> 
>       - The stlxr ensures that no prior access can pass the store component
>         of the atomic operation.
> 
>     The only tricky part remaining is the ordering between the ldxr and the
>     access to A, since the absence of the first dmb means that we're now
>     permitting re-ordering between the ldxr and any prior accesses.
> 
>     From an (arbitrary) observer's point of view, there are two scenarios:
> 
>       1. We have observed the ldxr. This means that if we perform a store to
>          [B], the ldxr will still return older data. If we can observe the
>          ldxr, then we can potentially observe the permitted re-ordering
>          with the access to A, which is clearly an issue when compared to
>          the dmb variant of the code. Thankfully, the exclusive monitor will
>          save us here since it will be cleared as a result of the store and
>          the ldxr will retry. Notice that any use of a later memory
>          observation to imply observation of the ldxr will also imply
>          observation of the access to A, since the stlxr/dmb ensure strict
>          ordering.
> 
>       2. We have not observed the ldxr. This means we can perform a store
>          and influence the later ldxr. However, that doesn't actually tell
>          us anything about the access to [A], so we've not lost anything
>          here either when compared to the dmb variant.
> 
>     This patch implements this solution for our barriered atomic operations,
>     ensuring that we satisfy the full barrier requirements where they are
>     needed.
> 
>     Cc: <stable@vger.kernel.org>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Signed-off-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
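
As a loose C11 analogue of the final sequence quoted above (exclusive load, store-release, then one full barrier), something like the following captures the ordering intent. This is only a sketch: the compiler is free to pick different instructions, the C11 memory model is not the same thing as the architecture's, and sketch_atomic_add_return is a hypothetical name.

```c
#include <stdatomic.h>

static inline int sketch_atomic_add_return(int i, atomic_int *v)
{
    /* Release-ordered read-modify-write: prior accesses may not pass
     * the store half (the role played by stlxr). */
    int old = atomic_fetch_add_explicit(v, i, memory_order_release);

    /* One trailing full fence (the role played by the single dmb ish). */
    atomic_thread_fence(memory_order_seq_cst);

    return old + i;
}
```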

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg
  2014-03-20 15:46 ` [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg Ian Campbell
@ 2014-03-20 17:44   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:44 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:46 PM, Ian Campbell wrote:
> These functions are from Linux and the intention was to keep the formatting
> the same to make resyncing easier.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers
  2014-03-20 15:46 ` [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers Ian Campbell
@ 2014-03-20 17:45   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:45 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:46 PM, Ian Campbell wrote:
> This resyncs atomics and cmpxchgs with Linux v3.14-rc7 by importing:
> 
> commit 95c4189689f92fba7ecf9097173404d4928c6e9b
> Author: Will Deacon <will.deacon@arm.com>
> Date:   Tue Feb 4 12:29:13 2014 +0000
> 
>     arm64: asm: remove redundant "cc" clobbers
> 
>     cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
>     from inline asm blocks that only use these instructions to implement
>     conditional branches.
> 
>     Signed-off-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/17] xen: arm64: assembly optimised mem* and str*
  2014-03-20 15:46 ` [PATCH 14/17] xen: arm64: assembly optimised mem* and str* Ian Campbell
@ 2014-03-20 17:48   ` Julien Grall
  0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

On 03/20/2014 03:46 PM, Ian Campbell wrote:
> Taken from Linux v3.14-rc7.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers
  2014-03-20 15:46 ` [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers Ian Campbell
@ 2014-03-20 17:52   ` Julien Grall
  2014-03-21  8:42     ` Ian Campbell
  0 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:52 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

Hi Ian,

On 03/20/2014 03:46 PM, Ian Campbell wrote:
> diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
> new file mode 100644
> index 0000000..70c6090
> --- /dev/null
> +++ b/xen/include/asm-arm/arm32/cmpxchg.h
> +static always_inline unsigned long __cmpxchg(
> +    volatile void *ptr, unsigned long old, unsigned long new, int size)
> +{
> +	unsigned long oldval, res;
> +
> +	switch (size) {
> +	case 1:
> +		do {
> +			asm volatile("@ __cmpxchg1\n"
> +			"	ldrexb	%1, [%2]\n"
> +			"	mov	%0, #0\n"
> +			"	teq	%1, %3\n"
> +			"	strexbeq %0, %4, [%2]\n"
> +				: "=&r" (res), "=&r" (oldval)
> +				: "r" (ptr), "Ir" (old), "r" (new)
> +				: "memory", "cc");
> +		} while (res);
> +		break;
> +	case 2:
> +		do {
> +			asm volatile("@ __cmpxchg2\n"
> +			"	ldrexh	%1, [%2]\n"
> +			"	mov	%0, #0\n"
> +			"	teq	%1, %3\n"
> +			"	strexheq %0, %4, [%2]\n"
> +				: "=&r" (res), "=&r" (oldval)
> +				: "r" (ptr), "Ir" (old), "r" (new)
> +				: "memory", "cc");
> +		} while (res);
> +		break;
> +	case 4:
> +		do {
> +			asm volatile("@ __cmpxchg4\n"
> +			"	ldrex	%1, [%2]\n"
> +			"	mov	%0, #0\n"
> +			"	teq	%1, %3\n"
> +			"	strexeq	%0, %4, [%2]\n"
> +				: "=&r" (res), "=&r" (oldval)
> +				: "r" (ptr), "Ir" (old), "r" (new)
> +				: "memory", "cc");
> +	    } while (res);
> +	    break;
> +#if 0
> +	case 8:
> +		do {
> +			asm volatile("@ __cmpxchg8\n"
> +			"	ldrexd	%1, [%2]\n"
> +			"	mov	%0, #0\n"
> +			"	teq	%1, %3\n"
> +			"	strexdeq %0, %4, [%2]\n"
> +				: "=&r" (res), "=&r" (oldval)
> +				: "r" (ptr), "Ir" (old), "r" (new)
> +				: "memory", "cc");
> +		} while (res);
> +		break;
> +#endif

Is it really useful to leave the dead code in the header?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/17] xen: arm64: disable alignment traps
  2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
  2014-03-20 15:57   ` Andrew Cooper
@ 2014-03-20 17:54   ` Julien Grall
  1 sibling, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel

Hi Ian,

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> The mem* primitives which I am about to import from Linux in a subsequent
> patch rely on the hardware handling misalignment.
> 
> The benefits of an optimised memcpy etc outweigh the downsides.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

With both of the minor changes that Andrew and you spotted:

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7
  2014-03-20 17:27   ` Julien Grall
@ 2014-03-21  8:41     ` Ian Campbell
  0 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-21  8:41 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Thu, 2014-03-20 at 17:27 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 03/20/2014 03:45 PM, Ian Campbell wrote:
> > diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
> > index 3f024d4..d309f66 100644
> > --- a/xen/include/asm-arm/arm32/atomic.h
> > +++ b/xen/include/asm-arm/arm32/atomic.h
> > @@ -21,6 +21,7 @@ static inline void atomic_add(int i, atomic_t *v)
> >  	unsigned long tmp;
> >  	int result;
> >  
> > +	prefetchw(&v->counter);
> 
> > Xen on ARM doesn't provide the prefetch* helpers. Shall we implement them?

It comes from generic code after the first patch in this series.
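
Roughly, and only as a sketch of the idea rather than the actual header from patch 01, generic helpers built on the compiler builtin look like this. The sketch_* names and the cut-down atomic type are made up for illustration; __builtin_prefetch is the GCC/Clang builtin, whose second argument is 0 for a read and 1 for a write.

```c
/* Generic prefetch helpers on top of the compiler builtin. */
static inline void sketch_prefetch(const void *p)
{
    __builtin_prefetch(p, 0);   /* expect a read */
}

static inline void sketch_prefetchw(const void *p)
{
    __builtin_prefetch(p, 1);   /* expect a write */
}

/* Hypothetical cut-down atomic type, for illustration only. */
typedef struct { int counter; } sketch_atomic_t;

static inline void sketch_atomic_add(int i, sketch_atomic_t *v)
{
    /* Hint that the line is about to be written, as in the hunk above. */
    sketch_prefetchw(&v->counter);
    __atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
}
```

The prefetch is purely a hint, so the result is the same with or without it.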

Ian.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers
  2014-03-20 17:52   ` Julien Grall
@ 2014-03-21  8:42     ` Ian Campbell
  0 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-21  8:42 UTC (permalink / raw)
  To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel

On Thu, 2014-03-20 at 17:52 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 03/20/2014 03:46 PM, Ian Campbell wrote:
> > diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
> > new file mode 100644
> > index 0000000..70c6090
> > --- /dev/null
> > +++ b/xen/include/asm-arm/arm32/cmpxchg.h
> > +static always_inline unsigned long __cmpxchg(
> > +    volatile void *ptr, unsigned long old, unsigned long new, int size)
> > +{
> > +	unsigned long oldval, res;
> > +
> > +	switch (size) {
> > +	case 1:
> > +		do {
> > +			asm volatile("@ __cmpxchg1\n"
> > +			"	ldrexb	%1, [%2]\n"
> > +			"	mov	%0, #0\n"
> > +			"	teq	%1, %3\n"
> > +			"	strexbeq %0, %4, [%2]\n"
> > +				: "=&r" (res), "=&r" (oldval)
> > +				: "r" (ptr), "Ir" (old), "r" (new)
> > +				: "memory", "cc");
> > +		} while (res);
> > +		break;
> > +	case 2:
> > +		do {
> > +			asm volatile("@ __cmpxchg2\n"
> > +			"	ldrexh	%1, [%2]\n"
> > +			"	mov	%0, #0\n"
> > +			"	teq	%1, %3\n"
> > +			"	strexheq %0, %4, [%2]\n"
> > +				: "=&r" (res), "=&r" (oldval)
> > +				: "r" (ptr), "Ir" (old), "r" (new)
> > +				: "memory", "cc");
> > +		} while (res);
> > +		break;
> > +	case 4:
> > +		do {
> > +			asm volatile("@ __cmpxchg4\n"
> > +			"	ldrex	%1, [%2]\n"
> > +			"	mov	%0, #0\n"
> > +			"	teq	%1, %3\n"
> > +			"	strexeq	%0, %4, [%2]\n"
> > +				: "=&r" (res), "=&r" (oldval)
> > +				: "r" (ptr), "Ir" (old), "r" (new)
> > +				: "memory", "cc");
> > +	    } while (res);
> > +	    break;
> > +#if 0
> > +	case 8:
> > +		do {
> > +			asm volatile("@ __cmpxchg8\n"
> > +			"	ldrexd	%1, [%2]\n"
> > +			"	mov	%0, #0\n"
> > +			"	teq	%1, %3\n"
> > +			"	strexdeq %0, %4, [%2]\n"
> > +				: "=&r" (res), "=&r" (oldval)
> > +				: "r" (ptr), "Ir" (old), "r" (new)
> > +				: "memory", "cc");
> > +		} while (res);
> > +		break;
> > +#endif
> 
> > Is it really useful to leave the dead code in the header?

This was a pure code motion patch so I'm not going to remove it here.

In any case this will come in handy the first time someone tries to
cmpxchg an 8 byte value.
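
For illustration, the size dispatch with the 8 byte case enabled looks something like this when written with the GCC/Clang __atomic builtins instead of ldrex/strex. It is a sketch under that assumption, not the header's code: sketch_cmpxchg_sized is a hypothetical name, and on some 32-bit targets the 8 byte case may need libatomic.

```c
#include <stdint.h>

/* Returns the value found at ptr; equal to 'old' when the swap happened. */
static inline uint64_t sketch_cmpxchg_sized(volatile void *ptr, uint64_t old,
                                            uint64_t new_val, int size)
{
    switch (size) {
    case 1: {
        uint8_t o = (uint8_t)old;
        __atomic_compare_exchange_n((volatile uint8_t *)ptr, &o,
                                    (uint8_t)new_val, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return o;
    }
    case 2: {
        uint16_t o = (uint16_t)old;
        __atomic_compare_exchange_n((volatile uint16_t *)ptr, &o,
                                    (uint16_t)new_val, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return o;
    }
    case 4: {
        uint32_t o = (uint32_t)old;
        __atomic_compare_exchange_n((volatile uint32_t *)ptr, &o,
                                    (uint32_t)new_val, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return o;
    }
    case 8: {
        uint64_t o = old;
        __atomic_compare_exchange_n((volatile uint64_t *)ptr, &o, new_val, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return o;
    }
    }
    /* Unsupported size: nothing is written; a real header would make
     * this a build or link failure rather than return silently. */
    return old;
}
```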

Ian.

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2014-03-21  8:42 UTC | newest]

Thread overview: 42+ messages
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
2014-03-20 16:12   ` Jan Beulich
2014-03-20 15:45 ` [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7 Ian Campbell
2014-03-20 17:13   ` Julien Grall
2014-03-20 15:45 ` [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics Ian Campbell
2014-03-20 17:22   ` Julien Grall
2014-03-20 15:45 ` [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h Ian Campbell
2014-03-20 17:23   ` Julien Grall
2014-03-20 15:45 ` [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7 Ian Campbell
2014-03-20 17:27   ` Julien Grall
2014-03-21  8:41     ` Ian Campbell
2014-03-20 15:45 ` [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7 Ian Campbell
2014-03-20 17:29   ` Julien Grall
2014-03-20 15:45 ` [PATCH 07/17] xen: arm32: add optimised memchr routine Ian Campbell
2014-03-20 17:32   ` Julien Grall
2014-03-20 15:45 ` [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines Ian Campbell
2014-03-20 17:33   ` Julien Grall
2014-03-20 15:45 ` [PATCH 09/17] xen: arm: remove atomic_clear_mask() Ian Campbell
2014-03-20 17:35   ` Julien Grall
2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
2014-03-20 15:57   ` Andrew Cooper
2014-03-20 15:59     ` Ian Campbell
2014-03-20 16:21       ` Gordan Bobic
2014-03-20 16:27         ` Ian Campbell
2014-03-20 16:43           ` Gordan Bobic
2014-03-20 16:54             ` Ian Campbell
2014-03-20 17:54   ` Julien Grall
2014-03-20 15:45 ` [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics Ian Campbell
2014-03-20 17:43   ` Julien Grall
2014-03-20 15:46 ` [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg Ian Campbell
2014-03-20 17:44   ` Julien Grall
2014-03-20 15:46 ` [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers Ian Campbell
2014-03-20 17:45   ` Julien Grall
2014-03-20 15:46 ` [PATCH 14/17] xen: arm64: assembly optimised mem* and str* Ian Campbell
2014-03-20 17:48   ` Julien Grall
2014-03-20 15:46 ` [PATCH 15/17] xen: arm64: optimised clear_page Ian Campbell
2014-03-20 15:46 ` [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers Ian Campbell
2014-03-20 17:52   ` Julien Grall
2014-03-21  8:42     ` Ian Campbell
2014-03-20 15:46 ` [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux Ian Campbell
2014-03-20 16:23   ` Ian Campbell
