* [PATCH 00/17] xen: arm: resync low level asm primitive from Linux
@ 2014-03-20 15:45 Ian Campbell
2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
` (16 more replies)
0 siblings, 17 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel
Cc: Julien Grall, Stefano Stabellini, Keir Fraser, Tim Deegan,
Jan Beulich
(Jan/Keir -- only the first patch is of interest to you)
The following resyncs the bitops, atomics, cmpxchg and various optimised
library functions (str*, mem*, clear_page) from Linux. It also adds
various additional optimised variants, especially for arm64 which was
lacking them in Linux when we started.
One area which I have skipped is spinlocks: the generic infrastructure
is pretty different between Xen and Linux so this would need more
thought (it would have included a switch to ticket locks on arm64 for
example..).
I've combined multiple Linux changes into a single Xen change where I
thought it made sense, i.e. for smaller changes even if they are
independent, but for large and complicated changes I've kept things
separate.
As part of this I've also reinstated Linux coding style (in particular
the use of hard tabs) to make life easier when comparing things. This
was always the intention but it seems one or two files got accidentally
reindented at some point.
This booted a guest on both Midway and Xgene. I haven't done any actual
perf measurement, having assumed that whoever wrote these for Linux found
them to be worthwhile enough...
Ian.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch()
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 16:12 ` Jan Beulich
2014-03-20 15:45 ` [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7 Ian Campbell
` (15 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel
Cc: Keir Fraser, Ian Campbell, stefano.stabellini, julien.grall, tim,
Jan Beulich
Quoting Andi Kleen in Linux b483570a13be from 2007:
gcc 3.2+ supports __builtin_prefetch, so it's possible to use it on all
architectures. Change the generic fallback in linux/prefetch.h to use it
instead of noping it out. gcc should do the right thing when the
architecture doesn't support prefetching
Undefine the x86-64 inline assembler version and use the fallback.
ARM wants to use the builtins.
Fix a pair of spelling errors, one of which was from Lucas De Marchi in the
Linux tree.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
xen/include/xen/prefetch.h | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
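For anyone unfamiliar with the builtin, here is a minimal sketch (not part of
this patch) of how the generic fallback looks from a caller's point of view.
The two macros have the same shape as the ones added below; sum_ahead() and
the look-ahead distance AHEAD are hypothetical names made up for illustration,
and the distance is not a tuned value.

```c
#include <stddef.h>

/* Same shape as the generic fallbacks this patch introduces. */
#define prefetch(x)  __builtin_prefetch(x)
#define prefetchw(x) __builtin_prefetch(x, 1)

/* Hypothetical look-ahead distance, in elements (an assumption). */
#define AHEAD 8

/* Sum an array, hinting the cache about elements that will be read soon. */
static long sum_ahead(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /*
         * A prefetch is only a hint: it does not change the result,
         * and the compiler may emit nothing at all on targets
         * without a prefetch instruction.
         */
        if (i + AHEAD < n)
            prefetch(&a[i + AHEAD]);
        total += a[i];
    }

    return total;
}
```

Since the hint has no architectural side effects, the function returns the
same sum whether or not the target implements prefetching.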
diff --git a/xen/include/xen/prefetch.h b/xen/include/xen/prefetch.h
index 8d7d3ff..ba73998 100644
--- a/xen/include/xen/prefetch.h
+++ b/xen/include/xen/prefetch.h
@@ -28,24 +28,17 @@
prefetchw(x) - prefetches the cacheline at "x" for write
spin_lock_prefetch(x) - prefectches the spinlock *x for taking
- there is also PREFETCH_STRIDE which is the architecure-prefered
+ there is also PREFETCH_STRIDE which is the architecture-preferred
"lookahead" size for prefetching streamed operations.
*/
-/*
- * These cannot be do{}while(0) macros. See the mental gymnastics in
- * the loop macro.
- */
-
#ifndef ARCH_HAS_PREFETCH
-#define ARCH_HAS_PREFETCH
-static inline void prefetch(const void *x) {;}
+#define prefetch(x) __builtin_prefetch(x)
#endif
#ifndef ARCH_HAS_PREFETCHW
-#define ARCH_HAS_PREFETCHW
-static inline void prefetchw(const void *x) {;}
+#define prefetchw(x) __builtin_prefetch(x,1)
#endif
#ifndef ARCH_HAS_SPINLOCK_PREFETCH
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:13 ` Julien Grall
2014-03-20 15:45 ` [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics Ian Campbell
` (14 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
This pulls in the following Linux commits:
commit c36ef4b1762302a493c6cb754073bded084700e2
Author: Will Deacon <will.deacon@arm.com>
Date: Wed Nov 23 11:28:25 2011 +0100
ARM: 7171/1: unwind: add unwind directives to bitops assembly macros
The bitops functions (e.g. _test_and_set_bit) on ARM do not have unwind
annotations and therefore the kernel cannot backtrace out of them on a
fatal error (for example, NULL pointer dereference).
This patch annotates the bitops assembly macros with UNWIND annotations
so that we can produce a meaningful backtrace on error. Callers of the
macros are modified to pass their function name as a macro parameter,
enforcing that the macros are used as standalone function implementations.
Acked-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
commit d779c07dd72098a7416d907494f958213b7726f3
Author: Will Deacon <will.deacon@arm.com>
Date: Thu Jun 27 12:01:51 2013 +0100
ARM: bitops: prefetch the destination word for write prior to strex
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.
This patch prefixes our atomic bitops implementation with prefetchw,
to try and grab the line in exclusive state from the start. The testop
macro is left alone, since the barrier semantics limit the usefulness
of prefetching data.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
commit b7ec699405f55667caeb46d96229d75bf33a83ad
Author: Will Deacon <will.deacon@arm.com>
Date: Tue Nov 19 15:46:11 2013 +0100
ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP
Uwe reported a build failure when targetting a NOMMU platform with my
recent prefetch changes:
arch/arm/lib/changebit.S: Assembler messages:
arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
not allowed for the current base architecture
This is due to use of the .arch_extension mp directive immediately prior
to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
nothing if !CONFIG_SMP, gas will still choke on the directive.
This patch fixes the issue by only emitting the sequence (including the
directive) if CONFIG_SMP=y.
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/arch/arm/arm32/lib/bitops.h | 17 +++++++++++++++--
xen/arch/arm/arm32/lib/changebit.S | 4 +---
xen/arch/arm/arm32/lib/clearbit.S | 4 +---
xen/arch/arm/arm32/lib/setbit.S | 4 +---
xen/arch/arm/arm32/lib/testchangebit.S | 4 +---
xen/arch/arm/arm32/lib/testclearbit.S | 4 +---
xen/arch/arm/arm32/lib/testsetbit.S | 4 +---
7 files changed, 21 insertions(+), 20 deletions(-)
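As a reading aid for the assembly below, here is a rough C sketch (not part of
this patch) of what the bitop and testop macros compute: a word offset of
nr >> 5, a mask of 1 << (nr & 31), then an atomic read-modify-write of that
word. It assumes the GCC __atomic builtins in place of the ldrex/strex loop,
and the sketch_* names are hypothetical. The relaxed ordering on the plain
bitop and the full ordering on the testop reflect my reading of the macros,
so treat them as assumptions.

```c
#include <stdint.h>

/* Roughly "bitop _set_bit, orr": no return value, no ordering implied. */
static void sketch_set_bit(unsigned int nr, volatile uint32_t *base)
{
    volatile uint32_t *word = base + (nr >> 5);  /* word offset */
    uint32_t mask = UINT32_C(1) << (nr & 31);    /* bit offset  */

    /* Stands in for pldw: ask for the line in a writable state. */
    __builtin_prefetch((const void *)word, 1);
    __atomic_fetch_or(word, mask, __ATOMIC_RELAXED);
}

/* Roughly "testop _test_and_set_bit, orreq, streq": returns the old bit. */
static int sketch_test_and_set_bit(unsigned int nr, volatile uint32_t *base)
{
    volatile uint32_t *word = base + (nr >> 5);
    uint32_t mask = UINT32_C(1) << (nr & 31);
    uint32_t old;

    /* No prefetch here, matching the testop macro being left alone. */
    old = __atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST);

    return (old & mask) != 0;
}
```

For example, bit 33 lives in word 1 under mask 0x2, so setting it in a zeroed
two-word bitmap leaves base[1] == 2.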
diff --git a/xen/arch/arm/arm32/lib/bitops.h b/xen/arch/arm/arm32/lib/bitops.h
index 689f2e8..25784c3 100644
--- a/xen/arch/arm/arm32/lib/bitops.h
+++ b/xen/arch/arm/arm32/lib/bitops.h
@@ -1,13 +1,20 @@
#include <xen/config.h>
#if __LINUX_ARM_ARCH__ >= 6
- .macro bitop, instr
+ .macro bitop, name, instr
+ENTRY( \name )
+UNWIND( .fnstart )
ands ip, r1, #3
strneb r1, [ip] @ assert word-aligned
mov r2, #1
and r3, r0, #31 @ Get bit offset
mov r0, r0, lsr #5
add r1, r1, r0, lsl #2 @ Get word offset
+#if __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
+ .arch_extension mp
+ ALT_SMP(W(pldw) [r1])
+ ALT_UP(W(nop))
+#endif
mov r3, r2, lsl r3
1: ldrex r2, [r1]
\instr r2, r2, r3
@@ -15,9 +22,13 @@
cmp r0, #0
bne 1b
bx lr
+UNWIND( .fnend )
+ENDPROC(\name )
.endm
- .macro testop, instr, store
+ .macro testop, name, instr, store
+ENTRY( \name )
+UNWIND( .fnstart )
ands ip, r1, #3
strneb r1, [ip] @ assert word-aligned
mov r2, #1
@@ -36,6 +47,8 @@
cmp r0, #0
movne r0, #1
2: bx lr
+UNWIND( .fnend )
+ENDPROC(\name )
.endm
#else
.macro bitop, name, instr
diff --git a/xen/arch/arm/arm32/lib/changebit.S b/xen/arch/arm/arm32/lib/changebit.S
index 62954bc..11f41d2 100644
--- a/xen/arch/arm/arm32/lib/changebit.S
+++ b/xen/arch/arm/arm32/lib/changebit.S
@@ -13,6 +13,4 @@
#include "bitops.h"
.text
-ENTRY(_change_bit)
- bitop eor
-ENDPROC(_change_bit)
+bitop _change_bit, eor
diff --git a/xen/arch/arm/arm32/lib/clearbit.S b/xen/arch/arm/arm32/lib/clearbit.S
index 42ce416..1b6a569 100644
--- a/xen/arch/arm/arm32/lib/clearbit.S
+++ b/xen/arch/arm/arm32/lib/clearbit.S
@@ -14,6 +14,4 @@
#include "bitops.h"
.text
-ENTRY(_clear_bit)
- bitop bic
-ENDPROC(_clear_bit)
+bitop _clear_bit, bic
diff --git a/xen/arch/arm/arm32/lib/setbit.S b/xen/arch/arm/arm32/lib/setbit.S
index c828851..1f4ef56 100644
--- a/xen/arch/arm/arm32/lib/setbit.S
+++ b/xen/arch/arm/arm32/lib/setbit.S
@@ -13,6 +13,4 @@
#include "bitops.h"
.text
-ENTRY(_set_bit)
- bitop orr
-ENDPROC(_set_bit)
+bitop _set_bit, orr
diff --git a/xen/arch/arm/arm32/lib/testchangebit.S b/xen/arch/arm/arm32/lib/testchangebit.S
index a7f527c..7f4635c 100644
--- a/xen/arch/arm/arm32/lib/testchangebit.S
+++ b/xen/arch/arm/arm32/lib/testchangebit.S
@@ -13,6 +13,4 @@
#include "bitops.h"
.text
-ENTRY(_test_and_change_bit)
- testop eor, str
-ENDPROC(_test_and_change_bit)
+testop _test_and_change_bit, eor, str
diff --git a/xen/arch/arm/arm32/lib/testclearbit.S b/xen/arch/arm/arm32/lib/testclearbit.S
index 8f39c72..4d4152f 100644
--- a/xen/arch/arm/arm32/lib/testclearbit.S
+++ b/xen/arch/arm/arm32/lib/testclearbit.S
@@ -13,6 +13,4 @@
#include "bitops.h"
.text
-ENTRY(_test_and_clear_bit)
- testop bicne, strne
-ENDPROC(_test_and_clear_bit)
+testop _test_and_clear_bit, bicne, strne
diff --git a/xen/arch/arm/arm32/lib/testsetbit.S b/xen/arch/arm/arm32/lib/testsetbit.S
index 1b8d273..54f48f9 100644
--- a/xen/arch/arm/arm32/lib/testsetbit.S
+++ b/xen/arch/arm/arm32/lib/testsetbit.S
@@ -13,6 +13,4 @@
#include "bitops.h"
.text
-ENTRY(_test_and_set_bit)
- testop orreq, streq
-ENDPROC(_test_and_set_bit)
+testop _test_and_set_bit, orreq, streq
--
1.7.10.4
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
2014-03-20 15:45 ` [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7 Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:22 ` Julien Grall
2014-03-20 15:45 ` [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h Ian Campbell
` (13 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
Unrelated reads/writes should not pass the xchg.
Provide cmpxchg_local for parity with arm64, although it appears to be unused.
It also helps make the reason for the separation of __cmpxchg_mb more
apparent.
With this our cmpxchg is in sync with Linux v3.14-rc7.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
We got our cmpxchg implementation from Linux which AFAICS has always had these
additional barriers. I don't recall us having decided that Xen barriers should
not have this property as well, and if we did we were remiss in not adding a
comment etc... If my memory is faulty then I am happy to replace this patch
with one which adds a comment instead.
---
xen/include/asm-arm/arm32/system.h | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
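To make the split between the two variants concrete, here is a portable sketch
(not part of this patch) of the same layering. It assumes the GCC __atomic
builtins: a relaxed compare-and-swap stands in for the barrier-less
__cmpxchg(), and a sequentially consistent fence stands in for smp_mb(). The
sketch_* names are hypothetical.

```c
/* Barrier-less compare-and-swap: the role of __cmpxchg()/cmpxchg_local(). */
static unsigned long sketch_cmpxchg_local(unsigned long *ptr,
                                          unsigned long old,
                                          unsigned long new_val)
{
    /*
     * On success 'old' is left untouched; on failure the builtin
     * stores the value it found into 'old'. Either way 'old' ends
     * up holding the previous contents of *ptr.
     */
    __atomic_compare_exchange_n(ptr, &old, new_val, 0,
                                __ATOMIC_RELAXED, __ATOMIC_RELAXED);
    return old;
}

/* Fully ordered variant: the role of __cmpxchg_mb()/cmpxchg(). */
static unsigned long sketch_cmpxchg(unsigned long *ptr,
                                    unsigned long old,
                                    unsigned long new_val)
{
    unsigned long ret;

    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* stands in for smp_mb() */
    ret = sketch_cmpxchg_local(ptr, old, new_val);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* stands in for smp_mb() */

    return ret;
}
```

Both variants return the value previously held in *ptr, so a caller tests for
success by comparing the return value against the expected old value.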
diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
index 9f233fe..dfaa3b6 100644
--- a/xen/include/asm-arm/arm32/system.h
+++ b/xen/include/asm-arm/arm32/system.h
@@ -113,9 +113,29 @@ static always_inline unsigned long __cmpxchg(
return oldval;
}
-#define cmpxchg(ptr,o,n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o), \
- (unsigned long)(n),sizeof(*(ptr))))
+static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
+ unsigned long new, int size)
+{
+ unsigned long ret;
+
+ smp_mb();
+ ret = __cmpxchg(ptr, old, new, size);
+ smp_mb();
+
+ return ret;
+}
+
+#define cmpxchg(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
#define local_irq_disable() asm volatile ( "cpsid i @ local_irq_disable\n" : : : "cc" )
#define local_irq_enable() asm volatile ( "cpsie i @ local_irq_enable\n" : : : "cc" )
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (2 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:23 ` Julien Grall
2014-03-20 15:45 ` [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7 Ian Campbell
` (12 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
This file is from Linux and the intention was to keep the formatting the same
to make resyncing easier. Put the hard tabs back and adjust the emacs magic to
reflect the desired use of whitespace.
Adjust the 64-bit emacs magic too.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/include/asm-arm/arm32/atomic.h | 166 ++++++++++++++++++------------------
xen/include/asm-arm/arm64/atomic.h | 4 +-
2 files changed, 85 insertions(+), 85 deletions(-)
diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
index 523c745..3f024d4 100644
--- a/xen/include/asm-arm/arm32/atomic.h
+++ b/xen/include/asm-arm/arm32/atomic.h
@@ -18,122 +18,122 @@
*/
static inline void atomic_add(int i, atomic_t *v)
{
- unsigned long tmp;
- int result;
-
- __asm__ __volatile__("@ atomic_add\n"
-"1: ldrex %0, [%3]\n"
-" add %0, %0, %4\n"
-" strex %1, %0, [%3]\n"
-" teq %1, #0\n"
-" bne 1b"
- : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
- : "r" (&v->counter), "Ir" (i)
- : "cc");
+ unsigned long tmp;
+ int result;
+
+ __asm__ __volatile__("@ atomic_add\n"
+"1: ldrex %0, [%3]\n"
+" add %0, %0, %4\n"
+" strex %1, %0, [%3]\n"
+" teq %1, #0\n"
+" bne 1b"
+ : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
}
static inline int atomic_add_return(int i, atomic_t *v)
{
- unsigned long tmp;
- int result;
+ unsigned long tmp;
+ int result;
- smp_mb();
+ smp_mb();
- __asm__ __volatile__("@ atomic_add_return\n"
-"1: ldrex %0, [%3]\n"
-" add %0, %0, %4\n"
-" strex %1, %0, [%3]\n"
-" teq %1, #0\n"
-" bne 1b"
- : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
- : "r" (&v->counter), "Ir" (i)
- : "cc");
+ __asm__ __volatile__("@ atomic_add_return\n"
+"1: ldrex %0, [%3]\n"
+" add %0, %0, %4\n"
+" strex %1, %0, [%3]\n"
+" teq %1, #0\n"
+" bne 1b"
+ : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
- smp_mb();
+ smp_mb();
- return result;
+ return result;
}
static inline void atomic_sub(int i, atomic_t *v)
{
- unsigned long tmp;
- int result;
-
- __asm__ __volatile__("@ atomic_sub\n"
-"1: ldrex %0, [%3]\n"
-" sub %0, %0, %4\n"
-" strex %1, %0, [%3]\n"
-" teq %1, #0\n"
-" bne 1b"
- : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
- : "r" (&v->counter), "Ir" (i)
- : "cc");
+ unsigned long tmp;
+ int result;
+
+ __asm__ __volatile__("@ atomic_sub\n"
+"1: ldrex %0, [%3]\n"
+" sub %0, %0, %4\n"
+" strex %1, %0, [%3]\n"
+" teq %1, #0\n"
+" bne 1b"
+ : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
}
static inline int atomic_sub_return(int i, atomic_t *v)
{
- unsigned long tmp;
- int result;
+ unsigned long tmp;
+ int result;
- smp_mb();
+ smp_mb();
- __asm__ __volatile__("@ atomic_sub_return\n"
-"1: ldrex %0, [%3]\n"
-" sub %0, %0, %4\n"
-" strex %1, %0, [%3]\n"
-" teq %1, #0\n"
-" bne 1b"
- : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
- : "r" (&v->counter), "Ir" (i)
- : "cc");
+ __asm__ __volatile__("@ atomic_sub_return\n"
+"1: ldrex %0, [%3]\n"
+" sub %0, %0, %4\n"
+" strex %1, %0, [%3]\n"
+" teq %1, #0\n"
+" bne 1b"
+ : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
- smp_mb();
+ smp_mb();
- return result;
+ return result;
}
static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
{
- unsigned long oldval, res;
+ unsigned long oldval, res;
- smp_mb();
+ smp_mb();
- do {
- __asm__ __volatile__("@ atomic_cmpxchg\n"
- "ldrex %1, [%3]\n"
- "mov %0, #0\n"
- "teq %1, %4\n"
- "strexeq %0, %5, [%3]\n"
- : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
- : "r" (&ptr->counter), "Ir" (old), "r" (new)
- : "cc");
- } while (res);
+ do {
+ __asm__ __volatile__("@ atomic_cmpxchg\n"
+ "ldrex %1, [%3]\n"
+ "mov %0, #0\n"
+ "teq %1, %4\n"
+ "strexeq %0, %5, [%3]\n"
+ : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
+ : "r" (&ptr->counter), "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
- smp_mb();
+ smp_mb();
- return oldval;
+ return oldval;
}
static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
{
- unsigned long tmp, tmp2;
-
- __asm__ __volatile__("@ atomic_clear_mask\n"
-"1: ldrex %0, [%3]\n"
-" bic %0, %0, %4\n"
-" strex %1, %0, [%3]\n"
-" teq %1, #0\n"
-" bne 1b"
- : "=&r" (tmp), "=&r" (tmp2), "+Qo" (*addr)
- : "r" (addr), "Ir" (mask)
- : "cc");
+ unsigned long tmp, tmp2;
+
+ __asm__ __volatile__("@ atomic_clear_mask\n"
+"1: ldrex %0, [%3]\n"
+" bic %0, %0, %4\n"
+" strex %1, %0, [%3]\n"
+" teq %1, #0\n"
+" bne 1b"
+ : "=&r" (tmp), "=&r" (tmp2), "+Qo" (*addr)
+ : "r" (addr), "Ir" (mask)
+ : "cc");
}
-#define atomic_inc(v) atomic_add(1, v)
-#define atomic_dec(v) atomic_sub(1, v)
+#define atomic_inc(v) atomic_add(1, v)
+#define atomic_dec(v) atomic_sub(1, v)
-#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
-#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
+#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
+#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
#define atomic_inc_return(v) (atomic_add_return(1, v))
#define atomic_dec_return(v) (atomic_sub_return(1, v))
#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
@@ -145,7 +145,7 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
* Local variables:
* mode: C
* c-file-style: "BSD"
- * c-basic-offset: 4
- * indent-tabs-mode: nil
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
* End:
*/
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index a279755..b04e6d5 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -157,7 +157,7 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u)
* Local variables:
* mode: C
* c-file-style: "BSD"
- * c-basic-offset: 4
- * indent-tabs-mode: nil
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
* End:
*/
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (3 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:27 ` Julien Grall
2014-03-20 15:45 ` [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7 Ian Campbell
` (11 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
Almost because I am omitting aed3a4e "ARM: 7868/1: arm/arm64: remove
atomic_clear_mask() ..." which I will apply to both arm32 and arm64
simultaneously in a later patch.
This pulls in the following Linux patches:
commit f38d999c4d16fc0fce4270374f15fbb2d8713c09
Author: Will Deacon <will.deacon@arm.com>
Date: Thu Jul 4 11:43:18 2013 +0100
ARM: atomics: prefetch the destination word for write prior to strex
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.
This patch prefixes our atomic access implementations with pldw
instructions (on CPUs which support them) to try and grab the line in
exclusive state from the start. Only the barrier-less functions are
updated, since memory barriers can limit the usefulness of prefetching
data.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
commit 4dcc1cf7316a26e112f5c9fcca531ff98ef44700
Author: Chen Gang <gang.chen@asianux.com>
Date: Sat Oct 26 15:07:25 2013 +0100
ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval
For atomic_cmpxchg(), the type of 'oldval' need be 'int' to match the
type of "*ptr" (used by 'ldrex' instruction) and 'old' (used by 'teq'
instruction).
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/include/asm-arm/arm32/atomic.h | 6 +++++-
xen/include/asm-arm/atomic.h | 1 +
2 files changed, 6 insertions(+), 1 deletion(-)
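For illustration only (not part of this patch), the two changes can be
sketched in portable C. This assumes the GCC __atomic builtins, with a weak
compare-and-swap retry loop standing in for the ldrex/strex loop and
__builtin_prefetch(addr, 1) standing in for prefetchw(). sketch_atomic_t and
the sketch_* functions are hypothetical names.

```c
typedef struct { int counter; } sketch_atomic_t;

/* Barrier-less add: prefetch for write first, then retry until the store lands. */
static void sketch_atomic_add(int i, sketch_atomic_t *v)
{
    int old, new_val;

    __builtin_prefetch(&v->counter, 1);  /* prefetchw(&v->counter) */

    old = __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
    do {
        new_val = old + i;
        /*
         * A weak CAS may fail spuriously, much as strex can; on
         * failure 'old' is refreshed with the current value.
         */
    } while (!__atomic_compare_exchange_n(&v->counter, &old, new_val, 1,
                                          __ATOMIC_RELAXED,
                                          __ATOMIC_RELAXED));
}

/* Fully ordered cmpxchg; 'oldval' is an int to match ptr->counter and 'old'. */
static int sketch_atomic_cmpxchg(sketch_atomic_t *ptr, int old, int new_val)
{
    int oldval = old;

    __atomic_compare_exchange_n(&ptr->counter, &oldval, new_val, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return oldval;
}
```

Only the barrier-less add carries the prefetch in this sketch, mirroring the
reasoning quoted above that barriers limit how useful the prefetch can be.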
diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
index 3f024d4..d309f66 100644
--- a/xen/include/asm-arm/arm32/atomic.h
+++ b/xen/include/asm-arm/arm32/atomic.h
@@ -21,6 +21,7 @@ static inline void atomic_add(int i, atomic_t *v)
unsigned long tmp;
int result;
+ prefetchw(&v->counter);
__asm__ __volatile__("@ atomic_add\n"
"1: ldrex %0, [%3]\n"
" add %0, %0, %4\n"
@@ -59,6 +60,7 @@ static inline void atomic_sub(int i, atomic_t *v)
unsigned long tmp;
int result;
+ prefetchw(&v->counter);
__asm__ __volatile__("@ atomic_sub\n"
"1: ldrex %0, [%3]\n"
" sub %0, %0, %4\n"
@@ -94,7 +96,8 @@ static inline int atomic_sub_return(int i, atomic_t *v)
static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
{
- unsigned long oldval, res;
+ int oldval;
+ unsigned long res;
smp_mb();
@@ -118,6 +121,7 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
{
unsigned long tmp, tmp2;
+ prefetchw(addr);
__asm__ __volatile__("@ atomic_clear_mask\n"
"1: ldrex %0, [%3]\n"
" bic %0, %0, %4\n"
diff --git a/xen/include/asm-arm/atomic.h b/xen/include/asm-arm/atomic.h
index 69c8f3f..2c92de9 100644
--- a/xen/include/asm-arm/atomic.h
+++ b/xen/include/asm-arm/atomic.h
@@ -2,6 +2,7 @@
#define __ARCH_ARM_ATOMIC__
#include <xen/config.h>
+#include <xen/prefetch.h>
#include <asm/system.h>
#define build_atomic_read(name, size, width, type, reg)\
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (4 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7 Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:29 ` Julien Grall
2014-03-20 15:45 ` [PATCH 07/17] xen: arm32: add optimised memchr routine Ian Campbell
` (10 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
This pulls in the following Linux commits:
commit 455bd4c430b0c0a361f38e8658a0d6cb469942b5
Author: Ivan Djelic <ivan.djelic@parrot.com>
Date: Wed Mar 6 20:09:27 2013 +0100
ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimi
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.
For instance in the following function:
void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waite
{
memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
waiter->magic = waiter;
INIT_LIST_HEAD(&waiter->list);
}
compiled as:
800554d0 <debug_mutex_lock_common>:
800554d0: e92d4008 push {r3, lr}
800554d4: e1a00001 mov r0, r1
800554d8: e3a02010 mov r2, #16 ; 0x10
800554dc: e3a01011 mov r1, #17 ; 0x11
800554e0: eb04426e bl 80165ea0 <memset>
800554e4: e1a03000 mov r3, r0
800554e8: e583000c str r0, [r3, #12]
800554ec: e5830000 str r0, [r3]
800554f0: e5830004 str r0, [r3, #4]
800554f4: e8bd8008 pop {r3, pc}
GCC assumes memset returns the value of pointer 'waiter' in register r0; ca
register/memory corruptions.
This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:
Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper resul
but corrupting r8 on some paths (the ones that were using ip).
Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
save r8:
- str lr, [sp, #-4]!
+ stmfd sp!, {r8, lr}
and restore r8 on both exit paths:
- ldmeqfd sp!, {pc} @ Now <64 bytes to go.
+ ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go.
(...)
tst r2, #16
stmneia ip!, {r1, r3, r8, lr}
- ldr lr, [sp], #4
+ ldmfd sp!, {r8, lr}
Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:
save r8:
- stmfd sp!, {r4-r7, lr}
+ stmfd sp!, {r4-r8, lr}
and restore r8 on both exit paths:
bgt 3b
- ldmeqfd sp!, {r4-r7, pc}
+ ldmeqfd sp!, {r4-r8, pc}
(...)
tst r2, #16
stmneia ip!, {r4-r7}
- ldmfd sp!, {r4-r7, lr}
+ ldmfd sp!, {r4-r8, lr}
Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".
Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
commit 418df63adac56841ef6b0f1fcf435bc64d4ed177
Author: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Tue Mar 12 13:00:42 2013 +0100
ARM: 7670/1: fix the memset fix
Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
with the memset return value. However the memset itself became broken
by that patch for misaligned pointers.
This fixes the above by branching over the entry code from the
misaligned fixup code to avoid reloading the original pointer.
Also, because the function entry alignment is wrong in the Thumb mode
compilation, that fixup code is moved to the end.
While at it, the entry instructions are slightly reworked to help dual
issue pipelines.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/arch/arm/arm32/lib/memset.S | 100 +++++++++++++++++++--------------------
1 file changed, 48 insertions(+), 52 deletions(-)
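The contract being restored here is easiest to see in a plain C reference
version (a sketch only, not the optimised routine in this patch):
sketch_memset() is a hypothetical name, and its working pointer p plays the
role that ip takes over in the assembly so that the original pointer survives
to be returned.

```c
#include <stddef.h>

/* Byte-at-a-time reference memset that honours the return-value contract. */
static void *sketch_memset(void *s, int c, size_t n)
{
    unsigned char *p = s;  /* working pointer, advanced while storing */

    while (n--)
        *p++ = (unsigned char)c;

    return s;              /* callers may rely on getting 's' back */
}
```

A compiler that knows this contract may reuse the return value in place of the
original pointer, which is what goes wrong when the assembly version leaves
something else in r0.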
diff --git a/xen/arch/arm/arm32/lib/memset.S b/xen/arch/arm/arm32/lib/memset.S
index d2937a3..c8ab257 100644
--- a/xen/arch/arm/arm32/lib/memset.S
+++ b/xen/arch/arm/arm32/lib/memset.S
@@ -16,27 +16,15 @@
.text
.align 5
- .word 0
-
-1: subs r2, r2, #4 @ 1 do we have enough
- blt 5f @ 1 bytes to align with?
- cmp r3, #2 @ 1
- strltb r1, [r0], #1 @ 1
- strleb r1, [r0], #1 @ 1
- strb r1, [r0], #1 @ 1
- add r2, r2, r3 @ 1 (r2 = r2 - (4 - r3))
-/*
- * The pointer is now aligned and the length is adjusted. Try doing the
- * memset again.
- */
ENTRY(memset)
ands r3, r0, #3 @ 1 unaligned?
- bne 1b @ 1
+ mov ip, r0 @ preserve r0 as return value
+ bne 6f @ 1
/*
- * we know that the pointer in r0 is aligned to a word boundary.
+ * we know that the pointer in ip is aligned to a word boundary.
*/
- orr r1, r1, r1, lsl #8
+1: orr r1, r1, r1, lsl #8
orr r1, r1, r1, lsl #16
mov r3, r1
cmp r2, #16
@@ -45,29 +33,28 @@ ENTRY(memset)
#if ! CALGN(1)+0
/*
- * We need an extra register for this loop - save the return address and
- * use the LR
+ * We need 2 extra registers for this loop - use r8 and the LR
*/
- str lr, [sp, #-4]!
- mov ip, r1
+ stmfd sp!, {r8, lr}
+ mov r8, r1
mov lr, r1
2: subs r2, r2, #64
- stmgeia r0!, {r1, r3, ip, lr} @ 64 bytes at a time.
- stmgeia r0!, {r1, r3, ip, lr}
- stmgeia r0!, {r1, r3, ip, lr}
- stmgeia r0!, {r1, r3, ip, lr}
+ stmgeia ip!, {r1, r3, r8, lr} @ 64 bytes at a time.
+ stmgeia ip!, {r1, r3, r8, lr}
+ stmgeia ip!, {r1, r3, r8, lr}
+ stmgeia ip!, {r1, r3, r8, lr}
bgt 2b
- ldmeqfd sp!, {pc} @ Now <64 bytes to go.
+ ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go.
/*
* No need to correct the count; we're only testing bits from now on
*/
tst r2, #32
- stmneia r0!, {r1, r3, ip, lr}
- stmneia r0!, {r1, r3, ip, lr}
+ stmneia ip!, {r1, r3, r8, lr}
+ stmneia ip!, {r1, r3, r8, lr}
tst r2, #16
- stmneia r0!, {r1, r3, ip, lr}
- ldr lr, [sp], #4
+ stmneia ip!, {r1, r3, r8, lr}
+ ldmfd sp!, {r8, lr}
#else
@@ -76,54 +63,63 @@ ENTRY(memset)
* whole cache lines at once.
*/
- stmfd sp!, {r4-r7, lr}
+ stmfd sp!, {r4-r8, lr}
mov r4, r1
mov r5, r1
mov r6, r1
mov r7, r1
- mov ip, r1
+ mov r8, r1
mov lr, r1
cmp r2, #96
- tstgt r0, #31
+ tstgt ip, #31
ble 3f
- and ip, r0, #31
- rsb ip, ip, #32
- sub r2, r2, ip
- movs ip, ip, lsl #(32 - 4)
- stmcsia r0!, {r4, r5, r6, r7}
- stmmiia r0!, {r4, r5}
- tst ip, #(1 << 30)
- mov ip, r1
- strne r1, [r0], #4
+ and r8, ip, #31
+ rsb r8, r8, #32
+ sub r2, r2, r8
+ movs r8, r8, lsl #(32 - 4)
+ stmcsia ip!, {r4, r5, r6, r7}
+ stmmiia ip!, {r4, r5}
+ tst r8, #(1 << 30)
+ mov r8, r1
+ strne r1, [ip], #4
3: subs r2, r2, #64
- stmgeia r0!, {r1, r3-r7, ip, lr}
- stmgeia r0!, {r1, r3-r7, ip, lr}
+ stmgeia ip!, {r1, r3-r8, lr}
+ stmgeia ip!, {r1, r3-r8, lr}
bgt 3b
- ldmeqfd sp!, {r4-r7, pc}
+ ldmeqfd sp!, {r4-r8, pc}
tst r2, #32
- stmneia r0!, {r1, r3-r7, ip, lr}
+ stmneia ip!, {r1, r3-r8, lr}
tst r2, #16
- stmneia r0!, {r4-r7}
- ldmfd sp!, {r4-r7, lr}
+ stmneia ip!, {r4-r7}
+ ldmfd sp!, {r4-r8, lr}
#endif
4: tst r2, #8
- stmneia r0!, {r1, r3}
+ stmneia ip!, {r1, r3}
tst r2, #4
- strne r1, [r0], #4
+ strne r1, [ip], #4
/*
* When we get here, we've got less than 4 bytes to zero. We
* may have an unaligned pointer as well.
*/
5: tst r2, #2
- strneb r1, [r0], #1
- strneb r1, [r0], #1
+ strneb r1, [ip], #1
+ strneb r1, [ip], #1
tst r2, #1
- strneb r1, [r0], #1
+ strneb r1, [ip], #1
mov pc, lr
+
+6: subs r2, r2, #4 @ 1 do we have enough
+ blt 5b @ 1 bytes to align with?
+ cmp r3, #2 @ 1
+ strltb r1, [ip], #1 @ 1
+ strleb r1, [ip], #1 @ 1
+ strb r1, [ip], #1 @ 1
+ add r2, r2, r3 @ 1 (r2 = r2 - (4 - r3))
+ b 1b
ENDPROC(memset)
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 07/17] xen: arm32: add optimised memchr routine
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (5 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7 Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:32 ` Julien Grall
2014-03-20 15:45 ` [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines Ian Campbell
` (9 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
This isn't used enough to be critical, but it completes the set of mem*.
Taken from Linux v3.14-rc7.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
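Not part of the patch: for readers comparing against the assembly below, here
is a minimal C sketch of the behaviour the routine is expected to provide. The
name ref_memchr is hypothetical and chosen only to avoid clashing with the real
symbol. It follows the ISO C definition, which compares each byte against c
converted to unsigned char, whereas the assembly compares against r1 as passed.

```c
#include <stddef.h>

/*
 * Hypothetical reference helper, not part of the patch: scan at most n
 * bytes of buf and return a pointer to the first byte equal to c
 * (converted to unsigned char), or NULL if there is none.
 */
static void *ref_memchr(const void *buf, int c, size_t n)
{
	const unsigned char *p = buf;

	while (n--) {
		if (*p == (unsigned char)c)
			return (void *)p;
		p++;
	}

	return NULL;
}
```

A zero length returns NULL without reading any byte, which corresponds to the
early exit taken by the subs/bmi pair in the assembly.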
xen/arch/arm/arm32/lib/Makefile | 2 +-
xen/arch/arm/arm32/lib/memchr.S | 28 ++++++++++++++++++++++++++++
xen/include/asm-arm/string.h | 3 +++
3 files changed, 32 insertions(+), 1 deletion(-)
create mode 100644 xen/arch/arm/arm32/lib/memchr.S
diff --git a/xen/arch/arm/arm32/lib/Makefile b/xen/arch/arm/arm32/lib/Makefile
index 4cf41f4..fa4e241 100644
--- a/xen/arch/arm/arm32/lib/Makefile
+++ b/xen/arch/arm/arm32/lib/Makefile
@@ -1,4 +1,4 @@
-obj-y += memcpy.o memmove.o memset.o memzero.o
+obj-y += memcpy.o memmove.o memset.o memchr.o memzero.o
obj-y += findbit.o setbit.o
obj-y += setbit.o clearbit.o changebit.o
obj-y += testsetbit.o testclearbit.o testchangebit.o
diff --git a/xen/arch/arm/arm32/lib/memchr.S b/xen/arch/arm/arm32/lib/memchr.S
new file mode 100644
index 0000000..fd64ed8
--- /dev/null
+++ b/xen/arch/arm/arm32/lib/memchr.S
@@ -0,0 +1,28 @@
+/*
+ * linux/arch/arm/lib/memchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * ASM optimised string functions
+ */
+
+#include <xen/config.h>
+
+#include "assembler.h"
+
+ .text
+ .align 5
+ENTRY(memchr)
+1: subs r2, r2, #1
+ bmi 2f
+ ldrb r3, [r0], #1
+ teq r3, r1
+ bne 1b
+ sub r0, r0, #1
+2: movne r0, #0
+ mov pc, lr
+ENDPROC(memchr)
diff --git a/xen/include/asm-arm/string.h b/xen/include/asm-arm/string.h
index abfa9d2..2c9f4f7 100644
--- a/xen/include/asm-arm/string.h
+++ b/xen/include/asm-arm/string.h
@@ -14,6 +14,9 @@ extern void *memmove(void *dest, const void *src, size_t n);
#define __HAVE_ARCH_MEMSET
extern void * memset(void *, int, __kernel_size_t);
+#define __HAVE_ARCH_MEMCHR
+extern void * memchr(const void *, int, __kernel_size_t);
+
extern void __memzero(void *ptr, __kernel_size_t n);
#define memset(p,v,n) \
--
1.7.10.4
* [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (6 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 07/17] xen: arm32: add optimised memchr routine Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:33 ` Julien Grall
2014-03-20 15:45 ` [PATCH 09/17] xen: arm: remove atomic_clear_mask() Ian Campbell
` (8 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
Taken from Linux v3.14-rc7.
These aren't widely used enough to be critical, but we may as well have them.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
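Not part of the patch: a minimal C sketch of the semantics the two routines
below are expected to implement, for comparison with the assembly. The names
ref_strchr and ref_strrchr are hypothetical. Both treat the terminating NUL as
part of the string, so searching for '\0' returns a pointer to the terminator.

```c
#include <stddef.h>

/* Hypothetical reference helper: first occurrence of c in s, or NULL. */
static char *ref_strchr(const char *s, int c)
{
	const char ch = (char)c;

	for (;; s++) {
		if (*s == ch)
			return (char *)s;
		if (*s == '\0')
			return NULL;
	}
}

/* Hypothetical reference helper: last occurrence of c in s, or NULL. */
static char *ref_strrchr(const char *s, int c)
{
	const char ch = (char)c;
	const char *last = NULL;

	/* Walk the whole string, terminator included, remembering matches. */
	do {
		if (*s == ch)
			last = s;
	} while (*s++ != '\0');

	return (char *)last;
}
```

As in the assembly, ref_strrchr always walks to the end of the string, while
ref_strchr stops at the first match.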
xen/arch/arm/arm32/lib/Makefile | 1 +
xen/arch/arm/arm32/lib/strchr.S | 29 +++++++++++++++++++++++++++++
xen/arch/arm/arm32/lib/strrchr.S | 28 ++++++++++++++++++++++++++++
xen/include/asm-arm/string.h | 12 ++++++++++++
4 files changed, 70 insertions(+)
create mode 100644 xen/arch/arm/arm32/lib/strchr.S
create mode 100644 xen/arch/arm/arm32/lib/strrchr.S
diff --git a/xen/arch/arm/arm32/lib/Makefile b/xen/arch/arm/arm32/lib/Makefile
index fa4e241..e9fbc59 100644
--- a/xen/arch/arm/arm32/lib/Makefile
+++ b/xen/arch/arm/arm32/lib/Makefile
@@ -2,4 +2,5 @@ obj-y += memcpy.o memmove.o memset.o memchr.o memzero.o
obj-y += findbit.o setbit.o
obj-y += setbit.o clearbit.o changebit.o
obj-y += testsetbit.o testclearbit.o testchangebit.o
+obj-y += strchr.o strrchr.o
obj-y += lib1funcs.o lshrdi3.o div64.o
diff --git a/xen/arch/arm/arm32/lib/strchr.S b/xen/arch/arm/arm32/lib/strchr.S
new file mode 100644
index 0000000..f01740e
--- /dev/null
+++ b/xen/arch/arm/arm32/lib/strchr.S
@@ -0,0 +1,29 @@
+/*
+ * linux/arch/arm/lib/strchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * ASM optimised string functions
+ */
+
+#include <xen/config.h>
+
+#include "assembler.h"
+
+ .text
+ .align 5
+ENTRY(strchr)
+ and r1, r1, #0xff
+1: ldrb r2, [r0], #1
+ teq r2, r1
+ teqne r2, #0
+ bne 1b
+ teq r2, r1
+ movne r0, #0
+ subeq r0, r0, #1
+ mov pc, lr
+ENDPROC(strchr)
diff --git a/xen/arch/arm/arm32/lib/strrchr.S b/xen/arch/arm/arm32/lib/strrchr.S
new file mode 100644
index 0000000..88fc0de
--- /dev/null
+++ b/xen/arch/arm/arm32/lib/strrchr.S
@@ -0,0 +1,28 @@
+/*
+ * linux/arch/arm/lib/strrchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * ASM optimised string functions
+ */
+
+#include <xen/config.h>
+
+#include "assembler.h"
+
+ .text
+ .align 5
+ENTRY(strrchr)
+ mov r3, #0
+1: ldrb r2, [r0], #1
+ teq r2, r1
+ subeq r3, r0, #1
+ teq r2, #0
+ bne 1b
+ mov r0, r3
+ mov pc, lr
+ENDPROC(strrchr)
diff --git a/xen/include/asm-arm/string.h b/xen/include/asm-arm/string.h
index 2c9f4f7..7d8b35a 100644
--- a/xen/include/asm-arm/string.h
+++ b/xen/include/asm-arm/string.h
@@ -4,6 +4,18 @@
#include <xen/config.h>
#if defined(CONFIG_ARM_32)
+
+/*
+ * We don't do inline string functions, since the
+ * optimised inline asm versions are not small.
+ */
+
+#define __HAVE_ARCH_STRRCHR
+extern char * strrchr(const char * s, int c);
+
+#define __HAVE_ARCH_STRCHR
+extern char * strchr(const char * s, int c);
+
#define __HAVE_ARCH_MEMCPY
extern void * memcpy(void *, const void *, __kernel_size_t);
--
1.7.10.4
* [PATCH 09/17] xen: arm: remove atomic_clear_mask()
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (7 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:35 ` Julien Grall
2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
` (7 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
This has no users.
This brings arm32 atomic.h into sync with Linux v3.14-rc7.
arm64/atomic.h requires other patches for this to be the case.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/include/asm-arm/arm32/atomic.h | 16 ----------------
xen/include/asm-arm/arm64/atomic.h | 14 --------------
2 files changed, 30 deletions(-)
diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
index d309f66..3d601d1 100644
--- a/xen/include/asm-arm/arm32/atomic.h
+++ b/xen/include/asm-arm/arm32/atomic.h
@@ -117,22 +117,6 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
return oldval;
}
-static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
-{
- unsigned long tmp, tmp2;
-
- prefetchw(addr);
- __asm__ __volatile__("@ atomic_clear_mask\n"
-"1: ldrex %0, [%3]\n"
-" bic %0, %0, %4\n"
-" strex %1, %0, [%3]\n"
-" teq %1, #0\n"
-" bne 1b"
- : "=&r" (tmp), "=&r" (tmp2), "+Qo" (*addr)
- : "r" (addr), "Ir" (mask)
- : "cc");
-}
-
#define atomic_inc(v) atomic_add(1, v)
#define atomic_dec(v) atomic_sub(1, v)
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index b04e6d5..6b37945 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -110,20 +110,6 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
return oldval;
}
-static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
-{
- unsigned long tmp, tmp2;
-
- asm volatile("// atomic_clear_mask\n"
-"1: ldxr %0, %2\n"
-" bic %0, %0, %3\n"
-" stxr %w1, %0, %2\n"
-" cbnz %w1, 1b"
- : "=&r" (tmp), "=&r" (tmp2), "+Q" (*addr)
- : "Ir" (mask)
- : "cc");
-}
-
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
static inline int __atomic_add_unless(atomic_t *v, int a, int u)
--
1.7.10.4
* [PATCH 10/17] xen: arm64: disable alignment traps
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (8 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 09/17] xen: arm: remove atomic_clear_mask() Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 15:57 ` Andrew Cooper
2014-03-20 17:54 ` Julien Grall
2014-03-20 15:45 ` [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics Ian Campbell
` (6 subsequent siblings)
16 siblings, 2 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
The mem* primitives which I am about to import from Linux in a subsequent
patch rely on the hardware handling misalignment.
The benefits of an optimised memcpy etc. outweigh the downsides.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/arch/arm/arm64/head.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 9547ef5..22d0030 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -241,7 +241,7 @@ skip_bss:
* I-cache enabled,
* Alignment checking enabled,
* MMU translation disabled (for now). */
- ldr x0, =(HSCTLR_BASE|SCTLR_A)
+ ldr x0, =(HSCTLR_BASE)
msr SCTLR_EL2, x0
/* Rebuild the boot pagetable's first-level entries. The structure
--
1.7.10.4
* [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (9 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
@ 2014-03-20 15:45 ` Ian Campbell
2014-03-20 17:43 ` Julien Grall
2014-03-20 15:46 ` [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg Ian Campbell
` (5 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:45 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
Xen, like Linux, expects full barrier semantics for bitops, atomics and
cmpxchgs. The issue was discovered in Linux, and our implementation of these
comes from Linux, so I quote Will Deacon in Linux commit 8e86f0b409a4 for the
gory details:
Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.
On arm64, these operations have been incorrectly implemented as follows:
// A, B, C are independent memory locations
<Access [A]>
// atomic_op (B)
1: ldaxr x0, [B] // Exclusive load with acquire
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
<Access [C]>
The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).
Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.
The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:
<Access [A]>
// atomic_op (B)
dmb ish // Full barrier
1: ldxr x0, [B] // Exclusive load
<op(B)>
stxr w1, x0, [B] // Exclusive store
cbnz w1, 1b
dmb ish // Full barrier
<Access [C]>
but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:
<Access [A]>
// atomic_op (B)
1: ldxr x0, [B] // Exclusive load
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
dmb ish // Full barrier
<Access [C]>
The simple observations here are:
- The dmb ensures that no subsequent accesses (e.g. the access to C)
can enter or pass the atomic sequence.
- The dmb also ensures that no prior accesses (e.g. the access to A)
can pass the atomic sequence.
- Therefore, no prior access can pass a subsequent access, or
vice-versa (i.e. A is strictly ordered before C).
- The stlxr ensures that no prior access can pass the store component
of the atomic operation.
The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.
From an (arbitrary) observer's point of view, there are two scenarios:
1. We have observed the ldxr. This means that if we perform a store to
[B], the ldxr will still return older data. If we can observe the
ldxr, then we can potentially observe the permitted re-ordering
with the access to A, which is clearly an issue when compared to
the dmb variant of the code. Thankfully, the exclusive monitor will
save us here since it will be cleared as a result of the store and
the ldxr will retry. Notice that any use of a later memory
observation to imply observation of the ldxr will also imply
observation of the access to A, since the stlxr/dmb ensure strict
ordering.
2. We have not observed the ldxr. This means we can perform a store
and influence the later ldxr. However, that doesn't actually tell
us anything about the access to [A], so we've not lost anything
here either when compared to the dmb variant.
This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.
Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
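Not part of the patch: a portable sketch of the "explicit barriers" shape
described in the commit message (a plain atomic update bracketed by two full
fences), written with C11 atomics so it can be compiled anywhere. The name
sketch_atomic_add_return is hypothetical. C11 fences are only an analogy for
dmb ish / smp_mb() here; the C11 memory model is not the same as the one the
hypervisor code relies on, so this shows the structure rather than the exact
guarantees.

```c
#include <stdatomic.h>

/*
 * Hypothetical illustration only: an add-and-return whose update is
 * itself unordered (relaxed) and which gets its ordering from the
 * surrounding full fences, mirroring the two-barrier variant quoted
 * in the commit message.
 */
static int sketch_atomic_add_return(atomic_int *v, int i)
{
	int old;

	/* Stands in for the leading full barrier. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Stands in for the ldxr / add / stxr retry loop. */
	old = atomic_fetch_add_explicit(v, i, memory_order_relaxed);

	/* Stands in for the trailing full barrier. */
	atomic_thread_fence(memory_order_seq_cst);

	return old + i;
}
```

The patch itself uses the cheaper sequence from the commit message for most
operations: a plain exclusive load, a store-release and a single trailing
barrier. atomic_cmpxchg is the exception and keeps barriers on both sides.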
xen/arch/arm/arm64/lib/bitops.S | 3 +-
xen/include/asm-arm/arm64/atomic.h | 13 +++++---
xen/include/asm-arm/arm64/system.h | 61 ++++++++++++++++++------------------
3 files changed, 42 insertions(+), 35 deletions(-)
diff --git a/xen/arch/arm/arm64/lib/bitops.S b/xen/arch/arm/arm64/lib/bitops.S
index 80cc903..e1ad239 100644
--- a/xen/arch/arm/arm64/lib/bitops.S
+++ b/xen/arch/arm/arm64/lib/bitops.S
@@ -46,11 +46,12 @@ ENTRY( \name )
mov x2, #1
add x1, x1, x0, lsr #3 // Get word offset
lsl x4, x2, x3 // Create mask
-1: ldaxr w2, [x1]
+1: ldxr w2, [x1]
lsr w0, w2, w3 // Save old value of bit
\instr w2, w2, w4 // toggle bit
stlxr w5, w2, [x1]
cbnz w5, 1b
+ dmb ish
and w0, w0, #1
3: ret
ENDPROC(\name )
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index 6b37945..3f37ed5 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -48,7 +48,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
int result;
asm volatile("// atomic_add_return\n"
-"1: ldaxr %w0, %2\n"
+"1: ldxr %w0, %2\n"
" add %w0, %w0, %w3\n"
" stlxr %w1, %w0, %2\n"
" cbnz %w1, 1b"
@@ -56,6 +56,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
: "Ir" (i)
: "cc", "memory");
+ smp_mb();
return result;
}
@@ -80,7 +81,7 @@ static inline int atomic_sub_return(int i, atomic_t *v)
int result;
asm volatile("// atomic_sub_return\n"
-"1: ldaxr %w0, %2\n"
+"1: ldxr %w0, %2\n"
" sub %w0, %w0, %w3\n"
" stlxr %w1, %w0, %2\n"
" cbnz %w1, 1b"
@@ -88,6 +89,7 @@ static inline int atomic_sub_return(int i, atomic_t *v)
: "Ir" (i)
: "cc", "memory");
+ smp_mb();
return result;
}
@@ -96,17 +98,20 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
unsigned long tmp;
int oldval;
+ smp_mb();
+
asm volatile("// atomic_cmpxchg\n"
-"1: ldaxr %w1, %2\n"
+"1: ldxr %w1, %2\n"
" cmp %w1, %w3\n"
" b.ne 2f\n"
-" stlxr %w0, %w4, %2\n"
+" stxr %w0, %w4, %2\n"
" cbnz %w0, 1b\n"
"2:"
: "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
: "Ir" (old), "r" (new)
: "cc", "memory");
+ smp_mb();
return oldval;
}
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 570af5c..0db96e0 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -8,49 +8,50 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
{
unsigned long ret, tmp;
- switch (size) {
- case 1:
- asm volatile("// __xchg1\n"
- "1: ldaxrb %w0, %2\n"
- " stlxrb %w1, %w3, %2\n"
- " cbnz %w1, 1b\n"
- : "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
+ switch (size) {
+ case 1:
+ asm volatile("// __xchg1\n"
+ "1: ldxrb %w0, %2\n"
+ " stlxrb %w1, %w3, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
: "r" (x)
: "cc", "memory");
- break;
- case 2:
- asm volatile("// __xchg2\n"
- "1: ldaxrh %w0, %2\n"
- " stlxrh %w1, %w3, %2\n"
- " cbnz %w1, 1b\n"
- : "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
+ break;
+ case 2:
+ asm volatile("// __xchg2\n"
+ "1: ldxrh %w0, %2\n"
+ " stlxrh %w1, %w3, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
: "r" (x)
: "cc", "memory");
- break;
- case 4:
- asm volatile("// __xchg4\n"
- "1: ldaxr %w0, %2\n"
- " stlxr %w1, %w3, %2\n"
- " cbnz %w1, 1b\n"
- : "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
+ break;
+ case 4:
+ asm volatile("// __xchg4\n"
+ "1: ldxr %w0, %2\n"
+ " stlxr %w1, %w3, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
: "r" (x)
: "cc", "memory");
- break;
- case 8:
- asm volatile("// __xchg8\n"
- "1: ldaxr %0, %2\n"
- " stlxr %w1, %3, %2\n"
- " cbnz %w1, 1b\n"
- : "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
+ break;
+ case 8:
+ asm volatile("// __xchg8\n"
+ "1: ldxr %0, %2\n"
+ " stlxr %w1, %3, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
: "r" (x)
: "cc", "memory");
break;
default:
__bad_xchg(ptr, size), ret = 0;
break;
- }
+ }
- return ret;
+ smp_mb();
+ return ret;
}
#define xchg(ptr,x) \
--
1.7.10.4
* [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (10 preceding siblings ...)
2014-03-20 15:45 ` [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
2014-03-20 17:44 ` Julien Grall
2014-03-20 15:46 ` [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers Ian Campbell
` (4 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
These functions are from Linux and the intention was to keep the formatting
the same to make resyncing easier.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/include/asm-arm/arm64/system.h | 196 ++++++++++++++++++------------------
1 file changed, 98 insertions(+), 98 deletions(-)
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 0db96e0..9fa698b 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -6,7 +6,7 @@ extern void __bad_xchg(volatile void *, int);
static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
{
- unsigned long ret, tmp;
+ unsigned long ret, tmp;
switch (size) {
case 1:
@@ -15,8 +15,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
" stlxrb %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
- : "r" (x)
- : "cc", "memory");
+ : "r" (x)
+ : "cc", "memory");
break;
case 2:
asm volatile("// __xchg2\n"
@@ -24,8 +24,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
" stlxrh %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
- : "r" (x)
- : "cc", "memory");
+ : "r" (x)
+ : "cc", "memory");
break;
case 4:
asm volatile("// __xchg4\n"
@@ -33,8 +33,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
" stlxr %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
- : "r" (x)
- : "cc", "memory");
+ : "r" (x)
+ : "cc", "memory");
break;
case 8:
asm volatile("// __xchg8\n"
@@ -42,12 +42,12 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
" stlxr %w1, %3, %2\n"
" cbnz %w1, 1b\n"
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
- : "r" (x)
- : "cc", "memory");
- break;
- default:
- __bad_xchg(ptr, size), ret = 0;
- break;
+ : "r" (x)
+ : "cc", "memory");
+ break;
+ default:
+ __bad_xchg(ptr, size), ret = 0;
+ break;
}
smp_mb();
@@ -55,107 +55,107 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
}
#define xchg(ptr,x) \
- ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
+ ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
extern void __bad_cmpxchg(volatile void *ptr, int size);
static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
+ unsigned long new, int size)
{
- unsigned long oldval = 0, res;
-
- switch (size) {
- case 1:
- do {
- asm volatile("// __cmpxchg1\n"
- " ldxrb %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxrb %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
- case 2:
- do {
- asm volatile("// __cmpxchg2\n"
- " ldxrh %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxrh %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
- case 4:
- do {
- asm volatile("// __cmpxchg4\n"
- " ldxr %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxr %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
- case 8:
- do {
- asm volatile("// __cmpxchg8\n"
- " ldxr %1, %2\n"
- " mov %w0, #0\n"
- " cmp %1, %3\n"
- " b.ne 1f\n"
- " stxr %w0, %4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
- default:
+ unsigned long oldval = 0, res;
+
+ switch (size) {
+ case 1:
+ do {
+ asm volatile("// __cmpxchg1\n"
+ " ldxrb %w1, %2\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxrb %w0, %w4, %2\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
+ : "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ case 2:
+ do {
+ asm volatile("// __cmpxchg2\n"
+ " ldxrh %w1, %2\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxrh %w0, %w4, %2\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
+ : "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ case 4:
+ do {
+ asm volatile("// __cmpxchg4\n"
+ " ldxr %w1, %2\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxr %w0, %w4, %2\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
+ : "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ case 8:
+ do {
+ asm volatile("// __cmpxchg8\n"
+ " ldxr %1, %2\n"
+ " mov %w0, #0\n"
+ " cmp %1, %3\n"
+ " b.ne 1f\n"
+ " stxr %w0, %4, %2\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
+ : "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ default:
__bad_cmpxchg(ptr, size);
oldval = 0;
- }
+ }
- return oldval;
+ return oldval;
}
static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
+ unsigned long new, int size)
{
- unsigned long ret;
+ unsigned long ret;
- smp_mb();
- ret = __cmpxchg(ptr, old, new, size);
- smp_mb();
+ smp_mb();
+ ret = __cmpxchg(ptr, old, new, size);
+ smp_mb();
- return ret;
+ return ret;
}
-#define cmpxchg(ptr,o,n) \
- ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
- (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-
-#define cmpxchg_local(ptr,o,n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), \
- (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+#define cmpxchg(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
/* Uses uimm4 as a bitmask to select the clearing of one or more of
* the DAIF exception mask bits:
--
1.7.10.4
* [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (11 preceding siblings ...)
2014-03-20 15:46 ` [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
2014-03-20 17:45 ` Julien Grall
2014-03-20 15:46 ` [PATCH 14/17] xen: arm64: assembly optimised mem* and str* Ian Campbell
` (3 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
This resyncs atomics and cmpxchgs with Linux v3.14-rc7 by importing:
commit 95c4189689f92fba7ecf9097173404d4928c6e9b
Author: Will Deacon <will.deacon@arm.com>
Date: Tue Feb 4 12:29:13 2014 +0000
arm64: asm: remove redundant "cc" clobbers
cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/include/asm-arm/arm64/atomic.h | 12 +++++-------
xen/include/asm-arm/arm64/spinlock.h | 6 +++---
xen/include/asm-arm/arm64/system.h | 8 ++++----
3 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index 3f37ed5..b5d50f2 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -38,8 +38,7 @@ static inline void atomic_add(int i, atomic_t *v)
" stxr %w1, %w0, %2\n"
" cbnz %w1, 1b"
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
- : "Ir" (i)
- : "cc");
+ : "Ir" (i));
}
static inline int atomic_add_return(int i, atomic_t *v)
@@ -54,7 +53,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
" cbnz %w1, 1b"
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
: "Ir" (i)
- : "cc", "memory");
+ : "memory");
smp_mb();
return result;
@@ -71,8 +70,7 @@ static inline void atomic_sub(int i, atomic_t *v)
" stxr %w1, %w0, %2\n"
" cbnz %w1, 1b"
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
- : "Ir" (i)
- : "cc");
+ : "Ir" (i));
}
static inline int atomic_sub_return(int i, atomic_t *v)
@@ -87,7 +85,7 @@ static inline int atomic_sub_return(int i, atomic_t *v)
" cbnz %w1, 1b"
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
: "Ir" (i)
- : "cc", "memory");
+ : "memory");
smp_mb();
return result;
@@ -109,7 +107,7 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
"2:"
: "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
: "Ir" (old), "r" (new)
- : "cc", "memory");
+ : "cc");
smp_mb();
return oldval;
diff --git a/xen/include/asm-arm/arm64/spinlock.h b/xen/include/asm-arm/arm64/spinlock.h
index 3a36cfd..04300bc 100644
--- a/xen/include/asm-arm/arm64/spinlock.h
+++ b/xen/include/asm-arm/arm64/spinlock.h
@@ -70,7 +70,7 @@ static always_inline int _raw_read_trylock(raw_rwlock_t *rw)
"1:\n"
: "=&r" (tmp), "+r" (tmp2), "+Q" (rw->lock)
:
- : "cc", "memory");
+ : "memory");
return !tmp2;
}
@@ -86,7 +86,7 @@ static always_inline int _raw_write_trylock(raw_rwlock_t *rw)
"1:\n"
: "=&r" (tmp), "+Q" (rw->lock)
: "r" (0x80000000)
- : "cc", "memory");
+ : "memory");
return !tmp;
}
@@ -102,7 +102,7 @@ static inline void _raw_read_unlock(raw_rwlock_t *rw)
" cbnz %w1, 1b\n"
: "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
:
- : "cc", "memory");
+ : "memory");
}
static inline void _raw_write_unlock(raw_rwlock_t *rw)
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 9fa698b..fa50ead 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -16,7 +16,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
" cbnz %w1, 1b\n"
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
: "r" (x)
- : "cc", "memory");
+ : "memory");
break;
case 2:
asm volatile("// __xchg2\n"
@@ -25,7 +25,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
" cbnz %w1, 1b\n"
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
: "r" (x)
- : "cc", "memory");
+ : "memory");
break;
case 4:
asm volatile("// __xchg4\n"
@@ -34,7 +34,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
" cbnz %w1, 1b\n"
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
: "r" (x)
- : "cc", "memory");
+ : "memory");
break;
case 8:
asm volatile("// __xchg8\n"
@@ -43,7 +43,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
" cbnz %w1, 1b\n"
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
: "r" (x)
- : "cc", "memory");
+ : "memory");
break;
default:
__bad_xchg(ptr, size), ret = 0;
--
1.7.10.4
* [PATCH 14/17] xen: arm64: assembly optimised mem* and str*
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (12 preceding siblings ...)
2014-03-20 15:46 ` [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
2014-03-20 17:48 ` Julien Grall
2014-03-20 15:46 ` [PATCH 15/17] xen: arm64: optimised clear_page Ian Campbell
` (2 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
Taken from Linux v3.14-rc7.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/arch/arm/arm64/lib/Makefile | 2 ++
xen/arch/arm/arm64/lib/memchr.S | 43 +++++++++++++++++++++++++++++
xen/arch/arm/arm64/lib/memcpy.S | 52 +++++++++++++++++++++++++++++++++++
xen/arch/arm/arm64/lib/memmove.S | 56 ++++++++++++++++++++++++++++++++++++++
xen/arch/arm/arm64/lib/memset.S | 52 +++++++++++++++++++++++++++++++++++
xen/arch/arm/arm64/lib/strchr.S | 41 ++++++++++++++++++++++++++++
xen/arch/arm/arm64/lib/strrchr.S | 42 ++++++++++++++++++++++++++++
xen/include/asm-arm/string.h | 4 +--
8 files changed, 290 insertions(+), 2 deletions(-)
create mode 100644 xen/arch/arm/arm64/lib/memchr.S
create mode 100644 xen/arch/arm/arm64/lib/memcpy.S
create mode 100644 xen/arch/arm/arm64/lib/memmove.S
create mode 100644 xen/arch/arm/arm64/lib/memset.S
create mode 100644 xen/arch/arm/arm64/lib/strchr.S
create mode 100644 xen/arch/arm/arm64/lib/strrchr.S
diff --git a/xen/arch/arm/arm64/lib/Makefile b/xen/arch/arm/arm64/lib/Makefile
index 32c02c4..9f3b236 100644
--- a/xen/arch/arm/arm64/lib/Makefile
+++ b/xen/arch/arm/arm64/lib/Makefile
@@ -1 +1,3 @@
+obj-y += memcpy.o memmove.o memset.o memchr.o
obj-y += bitops.o find_next_bit.o
+obj-y += strchr.o strrchr.o
diff --git a/xen/arch/arm/arm64/lib/memchr.S b/xen/arch/arm/arm64/lib/memchr.S
new file mode 100644
index 0000000..3cc1b01
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/memchr.S
@@ -0,0 +1,43 @@
+/*
+ * Based on arch/arm/lib/memchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Find a character in an area of memory.
+ *
+ * Parameters:
+ * x0 - buf
+ * x1 - c
+ * x2 - n
+ * Returns:
+ * x0 - address of first occurrence of 'c' or 0
+ */
+ENTRY(memchr)
+ and w1, w1, #0xff
+1: subs x2, x2, #1
+ b.mi 2f
+ ldrb w3, [x0], #1
+ cmp w3, w1
+ b.ne 1b
+ sub x0, x0, #1
+ ret
+2: mov x0, #0
+ ret
+ENDPROC(memchr)
diff --git a/xen/arch/arm/arm64/lib/memcpy.S b/xen/arch/arm/arm64/lib/memcpy.S
new file mode 100644
index 0000000..c8197c6
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/memcpy.S
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Copy a buffer from src to dest (alignment handled by the hardware)
+ *
+ * Parameters:
+ * x0 - dest
+ * x1 - src
+ * x2 - n
+ * Returns:
+ * x0 - dest
+ */
+ENTRY(memcpy)
+ mov x4, x0
+ subs x2, x2, #8
+ b.mi 2f
+1: ldr x3, [x1], #8
+ subs x2, x2, #8
+ str x3, [x4], #8
+ b.pl 1b
+2: adds x2, x2, #4
+ b.mi 3f
+ ldr w3, [x1], #4
+ sub x2, x2, #4
+ str w3, [x4], #4
+3: adds x2, x2, #2
+ b.mi 4f
+ ldrh w3, [x1], #2
+ sub x2, x2, #2
+ strh w3, [x4], #2
+4: adds x2, x2, #1
+ b.mi 5f
+ ldrb w3, [x1]
+ strb w3, [x4]
+5: ret
+ENDPROC(memcpy)
diff --git a/xen/arch/arm/arm64/lib/memmove.S b/xen/arch/arm/arm64/lib/memmove.S
new file mode 100644
index 0000000..1bf0936
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/memmove.S
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Move a buffer from src to dest (alignment handled by the hardware).
+ * If dest <= src, call memcpy, otherwise copy in reverse order.
+ *
+ * Parameters:
+ * x0 - dest
+ * x1 - src
+ * x2 - n
+ * Returns:
+ * x0 - dest
+ */
+ENTRY(memmove)
+ cmp x0, x1
+ b.ls memcpy
+ add x4, x0, x2
+ add x1, x1, x2
+ subs x2, x2, #8
+ b.mi 2f
+1: ldr x3, [x1, #-8]!
+ subs x2, x2, #8
+ str x3, [x4, #-8]!
+ b.pl 1b
+2: adds x2, x2, #4
+ b.mi 3f
+ ldr w3, [x1, #-4]!
+ sub x2, x2, #4
+ str w3, [x4, #-4]!
+3: adds x2, x2, #2
+ b.mi 4f
+ ldrh w3, [x1, #-2]!
+ sub x2, x2, #2
+ strh w3, [x4, #-2]!
+4: adds x2, x2, #1
+ b.mi 5f
+ ldrb w3, [x1, #-1]
+ strb w3, [x4, #-1]
+5: ret
+ENDPROC(memmove)
diff --git a/xen/arch/arm/arm64/lib/memset.S b/xen/arch/arm/arm64/lib/memset.S
new file mode 100644
index 0000000..25a4fb6
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/memset.S
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Fill in the buffer with character c (alignment handled by the hardware)
+ *
+ * Parameters:
+ * x0 - buf
+ * x1 - c
+ * x2 - n
+ * Returns:
+ * x0 - buf
+ */
+ENTRY(memset)
+ mov x4, x0
+ and w1, w1, #0xff
+ orr w1, w1, w1, lsl #8
+ orr w1, w1, w1, lsl #16
+ orr x1, x1, x1, lsl #32
+ subs x2, x2, #8
+ b.mi 2f
+1: str x1, [x4], #8
+ subs x2, x2, #8
+ b.pl 1b
+2: adds x2, x2, #4
+ b.mi 3f
+ sub x2, x2, #4
+ str w1, [x4], #4
+3: adds x2, x2, #2
+ b.mi 4f
+ sub x2, x2, #2
+ strh w1, [x4], #2
+4: adds x2, x2, #1
+ b.mi 5f
+ strb w1, [x4]
+5: ret
+ENDPROC(memset)
diff --git a/xen/arch/arm/arm64/lib/strchr.S b/xen/arch/arm/arm64/lib/strchr.S
new file mode 100644
index 0000000..9e265e4
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/strchr.S
@@ -0,0 +1,41 @@
+/*
+ * Based on arch/arm/lib/strchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Find the first occurrence of a character in a string.
+ *
+ * Parameters:
+ * x0 - str
+ * x1 - c
+ * Returns:
+ * x0 - address of first occurrence of 'c' or 0
+ */
+ENTRY(strchr)
+ and w1, w1, #0xff
+1: ldrb w2, [x0], #1
+ cmp w2, w1
+ ccmp w2, wzr, #4, ne
+ b.ne 1b
+ sub x0, x0, #1
+ cmp w2, w1
+ csel x0, x0, xzr, eq
+ ret
+ENDPROC(strchr)
diff --git a/xen/arch/arm/arm64/lib/strrchr.S b/xen/arch/arm/arm64/lib/strrchr.S
new file mode 100644
index 0000000..3791754
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/strrchr.S
@@ -0,0 +1,42 @@
+/*
+ * Based on arch/arm/lib/strrchr.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Find the last occurrence of a character in a string.
+ *
+ * Parameters:
+ * x0 - str
+ * x1 - c
+ * Returns:
+ * x0 - address of last occurrence of 'c' or 0
+ */
+ENTRY(strrchr)
+ mov x3, #0
+ and w1, w1, #0xff
+1: ldrb w2, [x0], #1
+ cbz w2, 2f
+ cmp w2, w1
+ b.ne 1b
+ sub x3, x0, #1
+ b 1b
+2: mov x0, x3
+ ret
+ENDPROC(strrchr)
diff --git a/xen/include/asm-arm/string.h b/xen/include/asm-arm/string.h
index 7d8b35a..3242762 100644
--- a/xen/include/asm-arm/string.h
+++ b/xen/include/asm-arm/string.h
@@ -3,8 +3,6 @@
#include <xen/config.h>
-#if defined(CONFIG_ARM_32)
-
/*
* We don't do inline string functions, since the
* optimised inline asm versions are not small.
@@ -29,6 +27,8 @@ extern void * memset(void *, int, __kernel_size_t);
#define __HAVE_ARCH_MEMCHR
extern void * memchr(const void *, int, __kernel_size_t);
+#if defined(CONFIG_ARM_32)
+
extern void __memzero(void *ptr, __kernel_size_t n);
#define memset(p,v,n) \
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 15/17] xen: arm64: optimised clear_page
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (13 preceding siblings ...)
2014-03-20 15:46 ` [PATCH 14/17] xen: arm64: assembly optimised mem* and str* Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
2014-03-20 15:46 ` [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers Ian Campbell
2014-03-20 15:46 ` [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux Ian Campbell
16 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
Taken from Linux v3.14-rc7.
The clear_page declaration now needs to be within the !__ASSEMBLY__ block,
since on arm64 it is a C function prototype rather than a macro.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
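As a rough guide to what clear_page.S below does: it reads the low four
bits of DCZID_EL0 (the BS field, log2 of the block size in 4-byte words),
turns that into a byte count of 4 << BS, and issues "dc zva" once per block
until the address is page aligned again. A C sketch of that loop, under the
assumptions that the page is page aligned and that memset stands in for
"dc zva", might look like this. The names zva_block_bytes,
model_clear_page and MODEL_PAGE_SIZE are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed 4K pages for this sketch */

/* Block size in bytes from a DCZID_EL0 value: 4 << BS, BS = bits [3:0]. */
static size_t zva_block_bytes(uint64_t dczid)
{
	return (size_t)4 << (dczid & 0xf);
}

/*
 * Zero one page, one "dc zva" sized block at a time.  The assembly tests
 * the address against (PAGE_SIZE - 1); the model tests the offset, which
 * is equivalent when the page itself is page aligned.
 */
static void model_clear_page(unsigned char *page, uint64_t dczid)
{
	size_t step = zva_block_bytes(dczid);
	size_t off = 0;

	assert(step <= MODEL_PAGE_SIZE);
	do {
		memset(page + off, 0, step);	/* stands in for "dc zva" */
		off += step;
	} while (off & (MODEL_PAGE_SIZE - 1));
}
```

Note that, like the assembly, the sketch does not look at bit 4 of
DCZID_EL0 (DZP), which indicates that use of "dc zva" is prohibited.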
xen/arch/arm/arm64/lib/Makefile | 1 +
xen/arch/arm/arm64/lib/clear_page.S | 36 +++++++++++++++++++++++++++++++++++
xen/include/asm-arm/page.h | 9 +++++++--
3 files changed, 44 insertions(+), 2 deletions(-)
create mode 100644 xen/arch/arm/arm64/lib/clear_page.S
diff --git a/xen/arch/arm/arm64/lib/Makefile b/xen/arch/arm/arm64/lib/Makefile
index 9f3b236..b895afa 100644
--- a/xen/arch/arm/arm64/lib/Makefile
+++ b/xen/arch/arm/arm64/lib/Makefile
@@ -1,3 +1,4 @@
obj-y += memcpy.o memmove.o memset.o memchr.o
+obj-y += clear_page.o
obj-y += bitops.o find_next_bit.o
obj-y += strchr.o strrchr.o
diff --git a/xen/arch/arm/arm64/lib/clear_page.S b/xen/arch/arm/arm64/lib/clear_page.S
new file mode 100644
index 0000000..8d5cadb
--- /dev/null
+++ b/xen/arch/arm/arm64/lib/clear_page.S
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+
+/*
+ * Clear page @dest
+ *
+ * Parameters:
+ * x0 - dest
+ */
+ENTRY(clear_page)
+ mrs x1, dczid_el0
+ and w1, w1, #0xf
+ mov x2, #4
+ lsl x1, x2, x1
+
+1: dc zva, x0
+ add x0, x0, x1
+ tst x0, #(PAGE_SIZE - 1)
+ b.ne 1b
+ ret
+ENDPROC(clear_page)
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index d18ec2a..e880ae8 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -341,6 +341,13 @@ static inline int gva_to_ipa(vaddr_t va, paddr_t *paddr)
/* Bits in the PAR returned by va_to_par */
#define PAR_FAULT 0x1
+
+#ifdef CONFIG_ARM_32
+#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
+#else
+extern void clear_page(void *to);
+#endif
+
#endif /* __ASSEMBLY__ */
/*
@@ -382,8 +389,6 @@ static inline int gva_to_ipa(vaddr_t va, paddr_t *paddr)
#define third_table_offset(va) TABLE_OFFSET(third_linear_offset(va))
#define zeroeth_table_offset(va) TABLE_OFFSET(zeroeth_linear_offset(va))
-#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
-
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)
#endif /* __ARM_PAGE_H__ */
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (14 preceding siblings ...)
2014-03-20 15:46 ` [PATCH 15/17] xen: arm64: optimised clear_page Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
2014-03-20 17:52 ` Julien Grall
2014-03-20 15:46 ` [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux Ian Campbell
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
Since these functions are taken from Linux, this makes it easier to compare
against the Linux cmpxchg.h headers (which were split out from Linux's
system.h a while back).
Since these functions are from Linux the intention is to use Linux coding
style, therefore include a suitable emacs magic block.
For this reason also fix up the indentation in the 32-bit version to use hard
tabs while moving it. The 64-bit version was already correct.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
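For anyone unfamiliar with the calling convention of the cmpxchg() macro
being moved here: it returns the value that was in memory before the
operation, and the caller detects success by comparing that return value
with the "old" argument. The following sketch models that contract using
C11 atomics instead of the ldrex/strex and ldxr/stxr sequences in the
headers. model_cmpxchg and model_add_return are hypothetical names used
only for illustration, not anything in Xen or Linux.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Compare *ptr with old; if equal, store new_val.  Always return the
 * value seen in *ptr, so success is "return value == old".
 */
static unsigned long model_cmpxchg(_Atomic unsigned long *ptr,
				   unsigned long old, unsigned long new_val)
{
	unsigned long expected = old;

	/* On failure, expected is overwritten with the current value. */
	atomic_compare_exchange_strong(ptr, &expected, new_val);
	return expected;
}

/* Typical caller: retry until the compare-and-exchange succeeds. */
static unsigned long model_add_return(_Atomic unsigned long *ptr,
				      unsigned long delta)
{
	unsigned long prev = atomic_load(ptr);

	for (;;) {
		unsigned long seen = model_cmpxchg(ptr, prev, prev + delta);

		if (seen == prev)
			return prev + delta;
		prev = seen;	/* lost a race, retry with the fresh value */
	}
}
```

The default C11 ordering used here is sequentially consistent, which is
closest in spirit to cmpxchg() (wrapped in smp_mb() on both sides) rather
than to cmpxchg_local().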
xen/include/asm-arm/arm32/cmpxchg.h | 146 ++++++++++++++++++++++++++++++
xen/include/asm-arm/arm32/system.h | 135 +---------------------------
xen/include/asm-arm/arm64/cmpxchg.h | 167 +++++++++++++++++++++++++++++++++++
xen/include/asm-arm/arm64/system.h | 155 +-------------------------------
4 files changed, 315 insertions(+), 288 deletions(-)
create mode 100644 xen/include/asm-arm/arm32/cmpxchg.h
create mode 100644 xen/include/asm-arm/arm64/cmpxchg.h
diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
new file mode 100644
index 0000000..70c6090
--- /dev/null
+++ b/xen/include/asm-arm/arm32/cmpxchg.h
@@ -0,0 +1,146 @@
+#ifndef __ASM_ARM32_CMPXCHG_H
+#define __ASM_ARM32_CMPXCHG_H
+
+extern void __bad_xchg(volatile void *, int);
+
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
+{
+ unsigned long ret;
+ unsigned int tmp;
+
+ smp_mb();
+
+ switch (size) {
+ case 1:
+ asm volatile("@ __xchg1\n"
+ "1: ldrexb %0, [%3]\n"
+ " strexb %1, %2, [%3]\n"
+ " teq %1, #0\n"
+ " bne 1b"
+ : "=&r" (ret), "=&r" (tmp)
+ : "r" (x), "r" (ptr)
+ : "memory", "cc");
+ break;
+ case 4:
+ asm volatile("@ __xchg4\n"
+ "1: ldrex %0, [%3]\n"
+ " strex %1, %2, [%3]\n"
+ " teq %1, #0\n"
+ " bne 1b"
+ : "=&r" (ret), "=&r" (tmp)
+ : "r" (x), "r" (ptr)
+ : "memory", "cc");
+ break;
+ default:
+ __bad_xchg(ptr, size), ret = 0;
+ break;
+ }
+ smp_mb();
+
+ return ret;
+}
+
+/*
+ * Atomic compare and exchange. Compare OLD with MEM, if identical,
+ * store NEW in MEM. Return the initial value in MEM. Success is
+ * indicated by comparing RETURN with OLD.
+ */
+
+extern void __bad_cmpxchg(volatile void *ptr, int size);
+
+static always_inline unsigned long __cmpxchg(
+ volatile void *ptr, unsigned long old, unsigned long new, int size)
+{
+ unsigned long oldval, res;
+
+ switch (size) {
+ case 1:
+ do {
+ asm volatile("@ __cmpxchg1\n"
+ " ldrexb %1, [%2]\n"
+ " mov %0, #0\n"
+ " teq %1, %3\n"
+ " strexbeq %0, %4, [%2]\n"
+ : "=&r" (res), "=&r" (oldval)
+ : "r" (ptr), "Ir" (old), "r" (new)
+ : "memory", "cc");
+ } while (res);
+ break;
+ case 2:
+ do {
+ asm volatile("@ __cmpxchg2\n"
+ " ldrexh %1, [%2]\n"
+ " mov %0, #0\n"
+ " teq %1, %3\n"
+ " strexheq %0, %4, [%2]\n"
+ : "=&r" (res), "=&r" (oldval)
+ : "r" (ptr), "Ir" (old), "r" (new)
+ : "memory", "cc");
+ } while (res);
+ break;
+ case 4:
+ do {
+ asm volatile("@ __cmpxchg4\n"
+ " ldrex %1, [%2]\n"
+ " mov %0, #0\n"
+ " teq %1, %3\n"
+ " strexeq %0, %4, [%2]\n"
+ : "=&r" (res), "=&r" (oldval)
+ : "r" (ptr), "Ir" (old), "r" (new)
+ : "memory", "cc");
+ } while (res);
+ break;
+#if 0
+ case 8:
+ do {
+ asm volatile("@ __cmpxchg8\n"
+ " ldrexd %1, [%2]\n"
+ " mov %0, #0\n"
+ " teq %1, %3\n"
+ " strexdeq %0, %4, [%2]\n"
+ : "=&r" (res), "=&r" (oldval)
+ : "r" (ptr), "Ir" (old), "r" (new)
+ : "memory", "cc");
+ } while (res);
+ break;
+#endif
+ default:
+ __bad_cmpxchg(ptr, size);
+ oldval = 0;
+ }
+
+ return oldval;
+}
+
+static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
+ unsigned long new, int size)
+{
+ unsigned long ret;
+
+ smp_mb();
+ ret = __cmpxchg(ptr, old, new, size);
+ smp_mb();
+
+ return ret;
+}
+
+#define cmpxchg(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
+#endif
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
+ * End:
+ */
diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
index dfaa3b6..b47b942 100644
--- a/xen/include/asm-arm/arm32/system.h
+++ b/xen/include/asm-arm/arm32/system.h
@@ -2,140 +2,7 @@
#ifndef __ASM_ARM32_SYSTEM_H
#define __ASM_ARM32_SYSTEM_H
-extern void __bad_xchg(volatile void *, int);
-
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
-{
- unsigned long ret;
- unsigned int tmp;
-
- smp_mb();
-
- switch (size) {
- case 1:
- asm volatile("@ __xchg1\n"
- "1: ldrexb %0, [%3]\n"
- " strexb %1, %2, [%3]\n"
- " teq %1, #0\n"
- " bne 1b"
- : "=&r" (ret), "=&r" (tmp)
- : "r" (x), "r" (ptr)
- : "memory", "cc");
- break;
- case 4:
- asm volatile("@ __xchg4\n"
- "1: ldrex %0, [%3]\n"
- " strex %1, %2, [%3]\n"
- " teq %1, #0\n"
- " bne 1b"
- : "=&r" (ret), "=&r" (tmp)
- : "r" (x), "r" (ptr)
- : "memory", "cc");
- break;
- default:
- __bad_xchg(ptr, size), ret = 0;
- break;
- }
- smp_mb();
-
- return ret;
-}
-
-/*
- * Atomic compare and exchange. Compare OLD with MEM, if identical,
- * store NEW in MEM. Return the initial value in MEM. Success is
- * indicated by comparing RETURN with OLD.
- */
-
-extern void __bad_cmpxchg(volatile void *ptr, int size);
-
-static always_inline unsigned long __cmpxchg(
- volatile void *ptr, unsigned long old, unsigned long new, int size)
-{
- unsigned long /*long*/ oldval, res;
-
- switch (size) {
- case 1:
- do {
- asm volatile("@ __cmpxchg1\n"
- " ldrexb %1, [%2]\n"
- " mov %0, #0\n"
- " teq %1, %3\n"
- " strexbeq %0, %4, [%2]\n"
- : "=&r" (res), "=&r" (oldval)
- : "r" (ptr), "Ir" (old), "r" (new)
- : "memory", "cc");
- } while (res);
- break;
- case 2:
- do {
- asm volatile("@ __cmpxchg2\n"
- " ldrexh %1, [%2]\n"
- " mov %0, #0\n"
- " teq %1, %3\n"
- " strexheq %0, %4, [%2]\n"
- : "=&r" (res), "=&r" (oldval)
- : "r" (ptr), "Ir" (old), "r" (new)
- : "memory", "cc");
- } while (res);
- break;
- case 4:
- do {
- asm volatile("@ __cmpxchg4\n"
- " ldrex %1, [%2]\n"
- " mov %0, #0\n"
- " teq %1, %3\n"
- " strexeq %0, %4, [%2]\n"
- : "=&r" (res), "=&r" (oldval)
- : "r" (ptr), "Ir" (old), "r" (new)
- : "memory", "cc");
- } while (res);
- break;
-#if 0
- case 8:
- do {
- asm volatile("@ __cmpxchg8\n"
- " ldrexd %1, [%2]\n"
- " mov %0, #0\n"
- " teq %1, %3\n"
- " strexdeq %0, %4, [%2]\n"
- : "=&r" (res), "=&r" (oldval)
- : "r" (ptr), "Ir" (old), "r" (new)
- : "memory", "cc");
- } while (res);
- break;
-#endif
- default:
- __bad_cmpxchg(ptr, size);
- oldval = 0;
- }
-
- return oldval;
-}
-
-static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long ret;
-
- smp_mb();
- ret = __cmpxchg(ptr, old, new, size);
- smp_mb();
-
- return ret;
-}
-
-#define cmpxchg(ptr,o,n) \
- ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
- (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-
-#define cmpxchg_local(ptr,o,n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), \
- (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+#include <asm/arm32/cmpxchg.h>
#define local_irq_disable() asm volatile ( "cpsid i @ local_irq_disable\n" : : : "cc" )
#define local_irq_enable() asm volatile ( "cpsie i @ local_irq_enable\n" : : : "cc" )
diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h
new file mode 100644
index 0000000..4e930ce
--- /dev/null
+++ b/xen/include/asm-arm/arm64/cmpxchg.h
@@ -0,0 +1,167 @@
+#ifndef __ASM_ARM64_CMPXCHG_H
+#define __ASM_ARM64_CMPXCHG_H
+
+extern void __bad_xchg(volatile void *, int);
+
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
+{
+ unsigned long ret, tmp;
+
+ switch (size) {
+ case 1:
+ asm volatile("// __xchg1\n"
+ "1: ldxrb %w0, %2\n"
+ " stlxrb %w1, %w3, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
+ : "r" (x)
+ : "memory");
+ break;
+ case 2:
+ asm volatile("// __xchg2\n"
+ "1: ldxrh %w0, %2\n"
+ " stlxrh %w1, %w3, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
+ : "r" (x)
+ : "memory");
+ break;
+ case 4:
+ asm volatile("// __xchg4\n"
+ "1: ldxr %w0, %2\n"
+ " stlxr %w1, %w3, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
+ : "r" (x)
+ : "memory");
+ break;
+ case 8:
+ asm volatile("// __xchg8\n"
+ "1: ldxr %0, %2\n"
+ " stlxr %w1, %3, %2\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
+ : "r" (x)
+ : "memory");
+ break;
+ default:
+ __bad_xchg(ptr, size), ret = 0;
+ break;
+ }
+
+ smp_mb();
+ return ret;
+}
+
+#define xchg(ptr,x) \
+ ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
+
+extern void __bad_cmpxchg(volatile void *ptr, int size);
+
+static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
+ unsigned long new, int size)
+{
+ unsigned long oldval = 0, res;
+
+ switch (size) {
+ case 1:
+ do {
+ asm volatile("// __cmpxchg1\n"
+ " ldxrb %w1, %2\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxrb %w0, %w4, %2\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
+ : "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ case 2:
+ do {
+ asm volatile("// __cmpxchg2\n"
+ " ldxrh %w1, %2\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxrh %w0, %w4, %2\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
+ : "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ case 4:
+ do {
+ asm volatile("// __cmpxchg4\n"
+ " ldxr %w1, %2\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxr %w0, %w4, %2\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
+ : "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ case 8:
+ do {
+ asm volatile("// __cmpxchg8\n"
+ " ldxr %1, %2\n"
+ " mov %w0, #0\n"
+ " cmp %1, %3\n"
+ " b.ne 1f\n"
+ " stxr %w0, %4, %2\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
+ : "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ default:
+ __bad_cmpxchg(ptr, size);
+ oldval = 0;
+ }
+
+ return oldval;
+}
+
+static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
+ unsigned long new, int size)
+{
+ unsigned long ret;
+
+ smp_mb();
+ ret = __cmpxchg(ptr, old, new, size);
+ smp_mb();
+
+ return ret;
+}
+
+#define cmpxchg(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
+
+#endif
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
+ * End:
+ */
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index fa50ead..6efced3 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -2,160 +2,7 @@
#ifndef __ASM_ARM64_SYSTEM_H
#define __ASM_ARM64_SYSTEM_H
-extern void __bad_xchg(volatile void *, int);
-
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
-{
- unsigned long ret, tmp;
-
- switch (size) {
- case 1:
- asm volatile("// __xchg1\n"
- "1: ldxrb %w0, %2\n"
- " stlxrb %w1, %w3, %2\n"
- " cbnz %w1, 1b\n"
- : "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
- : "r" (x)
- : "memory");
- break;
- case 2:
- asm volatile("// __xchg2\n"
- "1: ldxrh %w0, %2\n"
- " stlxrh %w1, %w3, %2\n"
- " cbnz %w1, 1b\n"
- : "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
- : "r" (x)
- : "memory");
- break;
- case 4:
- asm volatile("// __xchg4\n"
- "1: ldxr %w0, %2\n"
- " stlxr %w1, %w3, %2\n"
- " cbnz %w1, 1b\n"
- : "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
- : "r" (x)
- : "memory");
- break;
- case 8:
- asm volatile("// __xchg8\n"
- "1: ldxr %0, %2\n"
- " stlxr %w1, %3, %2\n"
- " cbnz %w1, 1b\n"
- : "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
- : "r" (x)
- : "memory");
- break;
- default:
- __bad_xchg(ptr, size), ret = 0;
- break;
- }
-
- smp_mb();
- return ret;
-}
-
-#define xchg(ptr,x) \
- ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
-
-extern void __bad_cmpxchg(volatile void *ptr, int size);
-
-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long oldval = 0, res;
-
- switch (size) {
- case 1:
- do {
- asm volatile("// __cmpxchg1\n"
- " ldxrb %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxrb %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
- case 2:
- do {
- asm volatile("// __cmpxchg2\n"
- " ldxrh %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxrh %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
- case 4:
- do {
- asm volatile("// __cmpxchg4\n"
- " ldxr %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxr %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
- case 8:
- do {
- asm volatile("// __cmpxchg8\n"
- " ldxr %1, %2\n"
- " mov %w0, #0\n"
- " cmp %1, %3\n"
- " b.ne 1f\n"
- " stxr %w0, %4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
- default:
- __bad_cmpxchg(ptr, size);
- oldval = 0;
- }
-
- return oldval;
-}
-
-static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long ret;
-
- smp_mb();
- ret = __cmpxchg(ptr, old, new, size);
- smp_mb();
-
- return ret;
-}
-
-#define cmpxchg(ptr,o,n) \
- ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
- (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
-
-#define cmpxchg_local(ptr,o,n) \
- ((__typeof__(*(ptr)))__cmpxchg((ptr), \
- (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))))
+#include <asm/arm64/cmpxchg.h>
/* Uses uimm4 as a bitmask to select the clearing of one or more of
* the DAIF exception mask bits:
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
` (15 preceding siblings ...)
2014-03-20 15:46 ` [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers Ian Campbell
@ 2014-03-20 15:46 ` Ian Campbell
2014-03-20 16:23 ` Ian Campbell
16 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:46 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, Ian Campbell, stefano.stabellini
As part of the recent update I had to reverse engineer what we had, which was
very tedious. Check in my notes so that I have a reference for next time.
Now the secret is to remember to update this file every time!
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
xen/arch/arm/README.LinuxPrimitives | 159 +++++++++++++++++++++++++++++++++++
1 file changed, 159 insertions(+)
create mode 100644 xen/arch/arm/README.LinuxPrimitives
diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
new file mode 100644
index 0000000..5656c11
--- /dev/null
+++ b/xen/arch/arm/README.LinuxPrimitives
@@ -0,0 +1,159 @@
+Xen on ARM uses various low level assembly primitives from the Linux
+kernel. This file tracks what files have been imported and when they
+were last updated.
+
+=====================================================================
+arm64:
+=====================================================================
+
+bitops: last sync @ v3.14-rc7 (last commit: 8e86f0b)
+
+linux/arch/arm64/lib/bitops.S xen/arch/arm/arm64/lib/bitops.S
+linux/arch/arm64/include/asm/bitops.h xen/include/asm-arm/arm64/bitops.h
+
+---------------------------------------------------------------------
+
+cmpxchg: last sync @ v3.14-rc7 (last commit: 95c4189)
+
+linux/arch/arm64/include/asm/cmpxchg.h xen/include/asm-arm/arm64/cmpxchg.h
+
+Skipped:
+ 60010e5 arm64: cmpxchg: update macros to prevent warnings
+
+---------------------------------------------------------------------
+
+atomics: last sync @ v3.14-rc7 (last commit: 95c4189)
+
+linux/arch/arm64/include/asm/atomic.h xen/include/asm-arm/arm64/atomic.h
+
+---------------------------------------------------------------------
+
+spinlocks: last sync @ v3.14-rc7 (last commit: 95c4189)
+
+linux/arch/arm64/include/asm/spinlock.h xen/include/asm-arm/arm64/spinlock.h
+
+Skipped:
+ 5686b06 arm64: lockref: add support for lockless lockrefs using cmpxchg
+ 52ea2a5 arm64: locks: introduce ticket-based spinlock implementation
+
+---------------------------------------------------------------------
+
+mem*: last sync @ v3.14-rc7 (last commit: 4a89922)
+
+linux/arch/arm64/lib/memchr.S xen/arch/arm/arm64/lib/memchr.S
+linux/arch/arm64/lib/memcpy.S xen/arch/arm/arm64/lib/memcpy.S
+linux/arch/arm64/lib/memmove.S xen/arch/arm/arm64/lib/memmove.S
+linux/arch/arm64/lib/memset.S xen/arch/arm/arm64/lib/memset.S
+
+for i in memchr.S memcpy.S memmove.S memset.S ; do
+ diff -u linux/arch/arm64/lib/$i xen/arch/arm/arm64/lib/$i
+done
+
+---------------------------------------------------------------------
+
+str*: last sync @ v3.14-rc7 (last commit: 2b8cac8)
+
+linux/arch/arm64/lib/strchr.S xen/arch/arm/arm64/lib/strchr.S
+linux/arch/arm64/lib/strrchr.S xen/arch/arm/arm64/lib/strrchr.S
+
+---------------------------------------------------------------------
+
+{clear,copy}_page: last sync @ v3.14-rc7 (last commit: f27bb13)
+
+linux/arch/arm64/lib/clear_page.S xen/arch/arm/arm64/lib/clear_page.S
+linux/arch/arm64/lib/copy_page.S unused in Xen
+
+=====================================================================
+arm32
+=====================================================================
+
+bitops: last sync @ v3.14-rc7 (last commit: b7ec699)
+
+ xen/arch/arm/arm32/lib/assembler.h
+linux/arch/arm/lib/bitops.h xen/arch/arm/arm32/lib/bitops.h
+linux/arch/arm/lib/changebit.S xen/arch/arm/arm32/lib/changebit.S
+linux/arch/arm/lib/clearbit.S xen/arch/arm/arm32/lib/clearbit.S
+linux/arch/arm/lib/findbit.S xen/arch/arm/arm32/lib/findbit.S
+linux/arch/arm/lib/setbit.S xen/arch/arm/arm32/lib/setbit.S
+linux/arch/arm/lib/testchangebit.S xen/arch/arm/arm32/lib/testchangebit.S
+linux/arch/arm/lib/testclearbit.S xen/arch/arm/arm32/lib/testclearbit.S
+linux/arch/arm/lib/testsetbit.S xen/arch/arm/arm32/lib/testsetbit.S
+
+for i in assembler.h bitops.h changebit.S clearbit.S findbit.S \
+ setbit.S testchangebit.S testclearbit.S testsetbit.S; do
+ diff -u ../linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i;
+done
+
+---------------------------------------------------------------------
+
+cmpxchg: last sync @ v3.14-rc7 (last commit: 775ebcc)
+
+linux/arch/arm/include/asm/cmpxchg.h xen/include/asm-arm/arm32/cmpxchg.h
+
+---------------------------------------------------------------------
+
+atomics: last sync @ v3.14-rc7 (last commit: aed3a4e)
+
+linux/arch/arm/include/asm/atomic.h xen/include/asm-arm/arm32/atomic.h
+
+---------------------------------------------------------------------
+
+spinlocks: last sync: 15e7e5c1ebf5
+
+linux/arch/arm/include/asm/spinlock.h xen/include/asm-arm/arm32/spinlock.h
+
+resync to v3.14-rc7:
+
+ 7c8746a ARM: 7955/1: spinlock: ensure we have a compiler barrier before sev
+ 0cbad9c ARM: 7854/1: lockref: add support for lockless lockrefs using cmpxchg64
+ 9bb17be ARM: locks: prefetch the destination word for write prior to strex
+ 27a8479 ARM: smp_on_up: move inline asm ALT_SMP patching macro out of spinlock.
+ 00efaa0 ARM: 7812/1: rwlocks: retry trylock operation if strex fails on free lo
+ afa31d8 ARM: 7811/1: locks: use early clobber in arch_spin_trylock
+ 73a6fdc ARM: spinlock: use inner-shareable dsb variant prior to sev instruction
+
+---------------------------------------------------------------------
+
+mem*: last sync @ v3.14-rc7 (last commit: 418df63a)
+
+linux/arch/arm/lib/copy_template.S xen/arch/arm/arm32/lib/copy_template.S
+linux/arch/arm/lib/memchr.S xen/arch/arm/arm32/lib/memchr.S
+linux/arch/arm/lib/memcpy.S xen/arch/arm/arm32/lib/memcpy.S
+linux/arch/arm/lib/memmove.S xen/arch/arm/arm32/lib/memmove.S
+linux/arch/arm/lib/memset.S xen/arch/arm/arm32/lib/memset.S
+linux/arch/arm/lib/memzero.S xen/arch/arm/arm32/lib/memzero.S
+
+linux/arch/arm/lib/strchr.S xen/arch/arm/arm32/lib/strchr.S
+linux/arch/arm/lib/strrchr.S xen/arch/arm/arm32/lib/strrchr.S
+
+for i in copy_template.S memchr.S memcpy.S memmove.S memset.S \
+ memzero.S ; do
+ diff -u linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i
+done
+
+---------------------------------------------------------------------
+
+str*: last sync @ v3.13-rc7 (last commit: 93ed397)
+
+linux/arch/arm/lib/strchr.S xen/arch/arm/arm32/lib/strchr.S
+linux/arch/arm/lib/strrchr.S xen/arch/arm/arm32/lib/strrchr.S
+
+---------------------------------------------------------------------
+
+{clear,copy}_page: last sync: Never
+
+linux/arch/arm/lib/copy_page.S unused in Xen
+
+clear_page == memset
+
+---------------------------------------------------------------------
+
+libgcc: last sync @ v3.14-rc7 (last commit: 01885bc)
+
+linux/arch/arm/lib/lib1funcs.S xen/arch/arm/arm32/lib/lib1funcs.S
+linux/arch/arm/lib/lshrdi3.S xen/arch/arm/arm32/lib/lshrdi3.S
+linux/arch/arm/lib/div64.S xen/arch/arm/arm32/lib/div64.S
+
+for i in lib1funcs.S lshrdi3.S div64.S ; do
+ diff -u linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i
+done
--
1.7.10.4
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 10/17] xen: arm64: disable alignment traps
2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
@ 2014-03-20 15:57 ` Andrew Cooper
2014-03-20 15:59 ` Ian Campbell
2014-03-20 17:54 ` Julien Grall
1 sibling, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2014-03-20 15:57 UTC (permalink / raw)
To: Ian Campbell; +Cc: julien.grall, tim, stefano.stabellini, xen-devel
On 20/03/14 15:45, Ian Campbell wrote:
> The mem* primitives which I am about to import from Linux in a subsequent
> patch rely on the hardware handling misalignment.
>
> The benefits of an optimised memcpy etc oughtweigh the downsides.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
> xen/arch/arm/arm64/head.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 9547ef5..22d0030 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -241,7 +241,7 @@ skip_bss:
> * I-cache enabled,
> * Alignment checking enabled,
Is this comment still true?
~Andrew
> * MMU translation disabled (for now). */
> - ldr x0, =(HSCTLR_BASE|SCTLR_A)
> + ldr x0, =(HSCTLR_BASE)
> msr SCTLR_EL2, x0
>
> /* Rebuild the boot pagetable's first-level entries. The structure
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 10/17] xen: arm64: disable alignment traps
2014-03-20 15:57 ` Andrew Cooper
@ 2014-03-20 15:59 ` Ian Campbell
2014-03-20 16:21 ` Gordan Bobic
0 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 15:59 UTC (permalink / raw)
To: Andrew Cooper; +Cc: julien.grall, tim, stefano.stabellini, xen-devel
On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> On 20/03/14 15:45, Ian Campbell wrote:
> > The mem* primitives which I am about to import from Linux in a subsequent
> > patch rely on the hardware handling misalignment.
> >
> > The benefits of an optimised memcpy etc oughtweigh the downsides.
Ahem, "outweigh".
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > ---
> > xen/arch/arm/arm64/head.S | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 9547ef5..22d0030 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -241,7 +241,7 @@ skip_bss:
> > * I-cache enabled,
> > * Alignment checking enabled,
>
> Is this comment still true?
Oh balls, no it is not. I had a meeting between deciding to make this
change and actually making it...
Ian.
>
> ~Andrew
>
> > * MMU translation disabled (for now). */
> > - ldr x0, =(HSCTLR_BASE|SCTLR_A)
> > + ldr x0, =(HSCTLR_BASE)
> > msr SCTLR_EL2, x0
> >
> > /* Rebuild the boot pagetable's first-level entries. The structure
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch()
2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
@ 2014-03-20 16:12 ` Jan Beulich
0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2014-03-20 16:12 UTC (permalink / raw)
To: Ian Campbell; +Cc: KeirFraser, tim, julien.grall, xen-devel, stefano.stabellini
>>> On 20.03.14 at 16:45, Ian Campbell <ian.campbell@citrix.com> wrote:
> Quoting Andi Kleen in Linux b483570a13be from 2007:
> gcc 3.2+ supports __builtin_prefetch, so it's possible to use it on all
> architectures. Change the generic fallback in linux/prefetch.h to use it
> instead of noping it out. gcc should do the right thing when the
> architecture doesn't support prefetching
>
> Undefine the x86-64 inline assembler version and use the fallback.
>
> ARM wants to use the builtins.
>
> Fix a pair of spelling errors, one of which was from Lucas De Marchi in the
> Linux tree.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Keir Fraser <keir@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> xen/include/xen/prefetch.h | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/xen/include/xen/prefetch.h b/xen/include/xen/prefetch.h
> index 8d7d3ff..ba73998 100644
> --- a/xen/include/xen/prefetch.h
> +++ b/xen/include/xen/prefetch.h
> @@ -28,24 +28,17 @@
> prefetchw(x) - prefetches the cacheline at "x" for write
> spin_lock_prefetch(x) - prefectches the spinlock *x for taking
>
> - there is also PREFETCH_STRIDE which is the architecure-prefered
> + there is also PREFETCH_STRIDE which is the architecture-preferred
> "lookahead" size for prefetching streamed operations.
>
> */
>
> -/*
> - * These cannot be do{}while(0) macros. See the mental gymnastics in
> - * the loop macro.
> - */
> -
> #ifndef ARCH_HAS_PREFETCH
> -#define ARCH_HAS_PREFETCH
> -static inline void prefetch(const void *x) {;}
> +#define prefetch(x) __builtin_prefetch(x)
> #endif
>
> #ifndef ARCH_HAS_PREFETCHW
> -#define ARCH_HAS_PREFETCHW
> -static inline void prefetchw(const void *x) {;}
> +#define prefetchw(x) __builtin_prefetch(x,1)
> #endif
>
> #ifndef ARCH_HAS_SPINLOCK_PREFETCH
> --
> 1.7.10.4
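For readers unfamiliar with the builtin, the generic fallbacks in the patch
above can be exercised roughly as follows. This is a minimal sketch that
assumes GCC or Clang; struct node and sum_list() are hypothetical names used
only for illustration and are not part of Xen.

```c
#include <stddef.h>

/* Generic fallbacks, written as in the quoted patch (GCC/Clang builtin). */
#define prefetch(x)  __builtin_prefetch(x)
#define prefetchw(x) __builtin_prefetch(x, 1)

/* Hypothetical list node, for illustration only. */
struct node {
    struct node *next;
    int value;
};

/* Sum a list, hinting the next node while the current one is processed. */
static int sum_list(const struct node *n)
{
    int sum = 0;

    while (n != NULL) {
        /* Only a hint: the builtin does not fault on an invalid address. */
        prefetch(n->next);
        sum += n->value;
        n = n->next;
    }
    return sum;
}
```

Because the builtin is only a hint, the compiler is free to emit nothing on
targets without a prefetch instruction, which is the behaviour the quoted
commit message relies on.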
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 10/17] xen: arm64: disable alignment traps
2014-03-20 15:59 ` Ian Campbell
@ 2014-03-20 16:21 ` Gordan Bobic
2014-03-20 16:27 ` Ian Campbell
0 siblings, 1 reply; 42+ messages in thread
From: Gordan Bobic @ 2014-03-20 16:21 UTC (permalink / raw)
To: Ian Campbell
Cc: Andrew Cooper, xen-devel, julien.grall, tim, stefano.stabellini
On 2014-03-20 15:59, Ian Campbell wrote:
> On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
>> On 20/03/14 15:45, Ian Campbell wrote:
>> > The mem* primitives which I am about to import from Linux in a subsequent
>> > patch rely on the hardware handling misalignment.
>> >
>> > The benefits of an optimised memcpy etc oughtweigh the downsides.
>
> Ahem, "outweigh".
Just FYI, the slow-down from heavy unaligned accesses (with
hardware alignment fixup, you can't disable it using
/proc/cpu/alignment) on Cortex A15 is about 40x.
Most of the commonly used code has been fixed recently, but
there are still some packages that exhibit misaligned access
traps during their test suites and/or normal operation.
Whether the hardware alignment fixup is less overheady on
ARM64, I don't know - I haven't been able to get my hands
on the hardware yet.
Gordan
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux
2014-03-20 15:46 ` [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux Ian Campbell
@ 2014-03-20 16:23 ` Ian Campbell
0 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 16:23 UTC (permalink / raw)
To: xen-devel; +Cc: julien.grall, tim, stefano.stabellini
On Thu, 2014-03-20 at 15:46 +0000, Ian Campbell wrote:
> As part of the recent update I had to reverse engineer what we had, which was
> very tedious. Check in my notes so that I have a reference for next time.
>
> Now the secret is to remember to update this file every time!
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
> xen/arch/arm/README.LinuxPrimitives | 159 +++++++++++++++++++++++++++++++++++
> 1 file changed, 159 insertions(+)
> create mode 100644 xen/arch/arm/README.LinuxPrimitives
>
> diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
> new file mode 100644
> index 0000000..5656c11
> --- /dev/null
> +++ b/xen/arch/arm/README.LinuxPrimitives
Apparently I forgot to git commit --amend the following into this
patch. I'll incorporate it next time.
diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
index 5656c11..6cd03ca 100644
--- a/xen/arch/arm/README.LinuxPrimitives
+++ b/xen/arch/arm/README.LinuxPrimitives
@@ -69,7 +69,6 @@ arm32
bitops: last sync @ v3.14-rc7 (last commit: b7ec699)
- xen/arch/arm/arm32/lib/assembler.h
linux/arch/arm/lib/bitops.h xen/arch/arm/arm32/lib/bitops.h
linux/arch/arm/lib/changebit.S xen/arch/arm/arm32/lib/changebit.S
linux/arch/arm/lib/clearbit.S xen/arch/arm/arm32/lib/clearbit.S
@@ -79,8 +78,8 @@ linux/arch/arm/lib/testchangebit.S xen/arch/arm/arm32/lib/testchangebit.S
linux/arch/arm/lib/testclearbit.S xen/arch/arm/arm32/lib/testclearbit.S
linux/arch/arm/lib/testsetbit.S xen/arch/arm/arm32/lib/testsetbit.S
-for i in assembler.h bitops.h changebit.S clearbit.S findbit.S \
- setbit.S testchangebit.S testclearbit.S testsetbit.S; do
+for i in bitops.h changebit.S clearbit.S findbit.S setbit.S testchangebit.S \
+ testclearbit.S testsetbit.S; do
diff -u ../linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i;
done
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 10/17] xen: arm64: disable alignment traps
2014-03-20 16:21 ` Gordan Bobic
@ 2014-03-20 16:27 ` Ian Campbell
2014-03-20 16:43 ` Gordan Bobic
0 siblings, 1 reply; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 16:27 UTC (permalink / raw)
To: Gordan Bobic
Cc: Andrew Cooper, xen-devel, julien.grall, tim, stefano.stabellini
On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
> On 2014-03-20 15:59, Ian Campbell wrote:
> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> >> On 20/03/14 15:45, Ian Campbell wrote:
> >> > The mem* primitives which I am about to import from Linux in a subsequent
> >> > patch rely on the hardware handling misalignment.
> >> >
> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> >
> > Ahem, "outweigh".
>
> Just FYI, the slow-down from heavy unaligned accesses (with
> hardware alignment fixup, you can't disable it using
> /proc/cpu/alignment) on Cortex A15 is about 40x.
That's pretty staggering -- are you positive this wasn't the kernel
doing the fixups?
> Most of the commonly used code has been fixed recently, but
> there are still some packages that exhibit misaligned access
> traps during their test suites and/or normal operation.
>
> Whether the hardware alignment fixup is less overheady on
> ARM64, I don't know - I haven't been able to get my hands
> on the hardware yet.
arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
taking it on trust that whoever implemented memcpy.S etc found that
memcpy.S with hardware alignment was better than the dumb loop, even if
it wasn't as good as a clever memcpy.S which avoided the alignments.
Ian.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 10/17] xen: arm64: disable alignment traps
2014-03-20 16:27 ` Ian Campbell
@ 2014-03-20 16:43 ` Gordan Bobic
2014-03-20 16:54 ` Ian Campbell
0 siblings, 1 reply; 42+ messages in thread
From: Gordan Bobic @ 2014-03-20 16:43 UTC (permalink / raw)
To: Ian Campbell
Cc: Andrew Cooper, xen-devel, julien.grall, tim, stefano.stabellini
On 2014-03-20 16:27, Ian Campbell wrote:
> On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
>> On 2014-03-20 15:59, Ian Campbell wrote:
>> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
>> >> On 20/03/14 15:45, Ian Campbell wrote:
>> >> > The mem* primitives which I am about to import from Linux in a subsequent
>> >> > patch rely on the hardware handling misalignment.
>> >> >
>> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
>> >
>> > Ahem, "outweigh".
>>
>> Just FYI, the slow-down from heavy unaligned accesses (with
>> hardware alignment fixup, you can't disable it using
>> /proc/cpu/alignment) on Cortex A15 is about 40x.
>
> That's pretty staggering -- are you positive this wasn't the kernel
> doing the fixups?
I'm not sure if this is easily checkable:
# echo 0 > /proc/cpu/alignment
# cat /proc/cpu/alignment
User: 0
System: 631040
Skipped: 0
Half: 0
Word: 631040
DWord: 0
Multi: 0
User faults: 2 (fixup)
i.e. I can't disable it.
This is on a Samsung Exynos Chromebook with the
standard ChromeOS kernel.
Here is a recent thread from the Fedora ARM mailing list
which contains links to a simple test program that can
be used to test the alignment related slowdown:
http://www.mail-archive.com/arm@lists.fedoraproject.org/msg06121.html
>> Most of the commonly used code has been fixed recently, but
>> there are still some packages that exhibit misaligned access
>> traps during their test suites and/or normal operation.
>>
>> Whether the hardware alignment fixup is less overheady on
>> ARM64, I don't know - I haven't been able to get my hands
>> on the hardware yet.
>
> arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
> taking it on trust that whoever implemented memcpy.S etc found that
> memcpy.S with hardware alignment was better than the dumb loop, even if
> it wasn't as good as a clever memcpy.S which avoided the alignments.
I am inclined to agree - it shouldn't be the job of the kernel or the
hypervisor to do this. It is up to the application developers to know
what they are doing and not do things that introduce misaligned
accesses. Unfortunately, there is far too little push-back on buggy
code because most developers have only ever used x86 and have no idea
that until recently everything else wasn't forgiving of such things.
Gordan
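As a minimal sketch of the kind of application-side fix referred to above,
a value can be read from a possibly misaligned buffer with memcpy() instead
of a pointer cast. load_u32() is a hypothetical helper, not taken from any
of the packages mentioned.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value from a buffer that may not be
 * 4-byte aligned. Dereferencing (const uint32_t *)buf directly would be
 * undefined behaviour in C when buf is misaligned, and is the sort of
 * access that ends up being fixed up or trapped.
 */
static uint32_t load_u32(const unsigned char *buf)
{
    uint32_t v;

    /* memcpy lets the compiler pick an access sequence that is safe. */
    memcpy(&v, buf, sizeof(v));
    return v;
}
```

Compilers typically turn the memcpy() into a single load on targets where
unaligned loads are cheap, so the portable form usually costs nothing there.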
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 10/17] xen: arm64: disable alignment traps
2014-03-20 16:43 ` Gordan Bobic
@ 2014-03-20 16:54 ` Ian Campbell
0 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-20 16:54 UTC (permalink / raw)
To: Gordan Bobic
Cc: Andrew Cooper, xen-devel, julien.grall, tim, stefano.stabellini
On Thu, 2014-03-20 at 16:43 +0000, Gordan Bobic wrote:
> On 2014-03-20 16:27, Ian Campbell wrote:
> > On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
> >> On 2014-03-20 15:59, Ian Campbell wrote:
> >> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> >> >> On 20/03/14 15:45, Ian Campbell wrote:
> >> >> > The mem* primitives which I am about to import from Linux in a subsequent
> >> >> > patch rely on the hardware handling misalignment.
> >> >> >
> >> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> >> >
> >> > Ahem, "outweigh".
> >>
> >> Just FYI, the slow-down from heavy unaligned accesses (with
> >> hardware alignment fixup, you can't disable it using
> >> /proc/cpu/alignment) on Cortex A15 is about 40x.
> >
> > That's pretty staggering -- are you positive this wasn't the kernel
> > doing the fixups?
>
> I'm not sure if this is easily checkable:
>
> # echo 0 > /proc/cpu/alignment
> # cat /proc/cpu/alignment
> User: 0
> System: 631040
> Skipped: 0
> Half: 0
> Word: 631040
> DWord: 0
> Multi: 0
> User faults: 2 (fixup)
>
> i.e. I can't disable it.
That "fixup" implies to me that the kernel will be fixing things up.
linux/Documentation/arm/mem_alignment describes what happens here.
>
> This is on a Samsung Exynos Chromebook with the
> standard ChromeOS kernel.
I've no idea if this sets SCTLR.A but it sounds like it does.
>
> Here is a recent thread from the Fedora ARM mailing list
> which contains links to a simple test program that can
> be used to test the alignment related slowdown:
>
> http://www.mail-archive.com/arm@lists.fedoraproject.org/msg06121.html
>
> >> Most of the commonly used code has been fixed recently, but
> >> there are still some packages that exhibit misaligned access
> >> traps during their test suites and/or normal operation.
> >>
> >> Whether the hardware alignment fixup is less overheady on
> >> ARM64, I don't know - I haven't been able to get my hands
> >> on the hardware yet.
> >
> > arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
> > taking it on trust that whoever implemented memcpy.S etc found that
> > memcpy.S with hardware alignment was better than the dumb loop, even if
> > it wasn't as good as a clever memcpy.S which avoided the alignments.
>
> I am inclined to agree - it shouldn't be the job of the kernel or the
> hypervisor to do this.
This patch is only changing the alignment trap behaviour for the
hypervisor itself, it has no impact on either guest kernel or userspace,
which have their own control bits for operation in those modes.
Ian.
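To make the change concrete, here is a minimal sketch of the bit being
dropped from the value written to SCTLR_EL2. SCTLR_A is the architectural
alignment-check enable bit (bit 1). EXAMPLE_SCTLR_BASE and sctlr_value() are
hypothetical and only stand in for HSCTLR_BASE; the real value is not shown
in this thread.

```c
#include <stdint.h>

/* Alignment check enable: bit 1 of the system control register. */
#define SCTLR_A (UINT32_C(1) << 1)

/*
 * Hypothetical placeholder for the remaining control bits. This is NOT
 * Xen's HSCTLR_BASE value; it only sets the I-cache and D-cache bits.
 */
#define EXAMPLE_SCTLR_BASE UINT32_C(0x00001004)

/* Build the register value with or without alignment trapping. */
static uint32_t sctlr_value(int trap_unaligned)
{
    uint32_t v = EXAMPLE_SCTLR_BASE;

    if (trap_unaligned)
        v |= SCTLR_A;   /* old behaviour: HSCTLR_BASE|SCTLR_A */
    return v;           /* new behaviour: HSCTLR_BASE only    */
}
```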
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7
2014-03-20 15:45 ` [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7 Ian Campbell
@ 2014-03-20 17:13 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:13 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This pulls in the following Linux commits:
>
> commit c36ef4b1762302a493c6cb754073bded084700e2
> Author: Will Deacon <will.deacon@arm.com>
> Date: Wed Nov 23 11:28:25 2011 +0100
>
> ARM: 7171/1: unwind: add unwind directives to bitops assembly macros
>
> The bitops functions (e.g. _test_and_set_bit) on ARM do not have unwind
> annotations and therefore the kernel cannot backtrace out of them on a
> fatal error (for example, NULL pointer dereference).
>
> This patch annotates the bitops assembly macros with UNWIND annotations
> so that we can produce a meaningful backtrace on error. Callers of the
> macros are modified to pass their function name as a macro parameter,
> enforcing that the macros are used as standalone function implementations.
>
> Acked-by: Dave Martin <dave.martin@linaro.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>
> commit d779c07dd72098a7416d907494f958213b7726f3
> Author: Will Deacon <will.deacon@arm.com>
> Date: Thu Jun 27 12:01:51 2013 +0100
>
> ARM: bitops: prefetch the destination word for write prior to strex
>
> The cost of changing a cacheline from shared to exclusive state can be
> significant, especially when this is triggered by an exclusive store,
> since it may result in having to retry the transaction.
>
> This patch prefixes our atomic bitops implementation with prefetchw,
> to try and grab the line in exclusive state from the start. The testop
> macro is left alone, since the barrier semantics limit the usefulness
> of prefetching data.
>
> Acked-by: Nicolas Pitre <nico@linaro.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
>
> commit b7ec699405f55667caeb46d96229d75bf33a83ad
> Author: Will Deacon <will.deacon@arm.com>
> Date: Tue Nov 19 15:46:11 2013 +0100
>
> ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP
>
> Uwe reported a build failure when targetting a NOMMU platform with my
> recent prefetch changes:
>
> arch/arm/lib/changebit.S: Assembler messages:
> arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
> not allowed for the current base architecture
>
> This is due to use of the .arch_extension mp directive immediately prior
> to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
> nothing if !CONFIG_SMP, gas will still choke on the directive.
>
> This patch fixes the issue by only emitting the sequence (including the
> directive) if CONFIG_SMP=y.
>
> Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
> ---
> xen/arch/arm/arm32/lib/bitops.h | 17 +++++++++++++++--
> xen/arch/arm/arm32/lib/changebit.S | 4 +---
> xen/arch/arm/arm32/lib/clearbit.S | 4 +---
> xen/arch/arm/arm32/lib/setbit.S | 4 +---
> xen/arch/arm/arm32/lib/testchangebit.S | 4 +---
> xen/arch/arm/arm32/lib/testclearbit.S | 4 +---
> xen/arch/arm/arm32/lib/testsetbit.S | 4 +---
> 7 files changed, 21 insertions(+), 20 deletions(-)
>
> diff --git a/xen/arch/arm/arm32/lib/bitops.h b/xen/arch/arm/arm32/lib/bitops.h
> index 689f2e8..25784c3 100644
> --- a/xen/arch/arm/arm32/lib/bitops.h
> +++ b/xen/arch/arm/arm32/lib/bitops.h
> @@ -1,13 +1,20 @@
> #include <xen/config.h>
>
> #if __LINUX_ARM_ARCH__ >= 6
> - .macro bitop, instr
> + .macro bitop, name, instr
> +ENTRY( \name )
> +UNWIND( .fnstart )
> ands ip, r1, #3
> strneb r1, [ip] @ assert word-aligned
> mov r2, #1
> and r3, r0, #31 @ Get bit offset
> mov r0, r0, lsr #5
> add r1, r1, r0, lsl #2 @ Get word offset
> +#if __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
> + .arch_extension mp
> + ALT_SMP(W(pldw) [r1])
> + ALT_UP(W(nop))
> +#endif
> mov r3, r2, lsl r3
> 1: ldrex r2, [r1]
> \instr r2, r2, r3
> @@ -15,9 +22,13 @@
> cmp r0, #0
> bne 1b
> bx lr
> +UNWIND( .fnend )
> +ENDPROC(\name )
> .endm
>
> - .macro testop, instr, store
> + .macro testop, name, instr, store
> +ENTRY( \name )
> +UNWIND( .fnstart )
> ands ip, r1, #3
> strneb r1, [ip] @ assert word-aligned
> mov r2, #1
> @@ -36,6 +47,8 @@
> cmp r0, #0
> movne r0, #1
> 2: bx lr
> +UNWIND( .fnend )
> +ENDPROC(\name )
> .endm
> #else
> .macro bitop, name, instr
> diff --git a/xen/arch/arm/arm32/lib/changebit.S b/xen/arch/arm/arm32/lib/changebit.S
> index 62954bc..11f41d2 100644
> --- a/xen/arch/arm/arm32/lib/changebit.S
> +++ b/xen/arch/arm/arm32/lib/changebit.S
> @@ -13,6 +13,4 @@
> #include "bitops.h"
> .text
>
> -ENTRY(_change_bit)
> - bitop eor
> -ENDPROC(_change_bit)
> +bitop _change_bit, eor
> diff --git a/xen/arch/arm/arm32/lib/clearbit.S b/xen/arch/arm/arm32/lib/clearbit.S
> index 42ce416..1b6a569 100644
> --- a/xen/arch/arm/arm32/lib/clearbit.S
> +++ b/xen/arch/arm/arm32/lib/clearbit.S
> @@ -14,6 +14,4 @@
> #include "bitops.h"
> .text
>
> -ENTRY(_clear_bit)
> - bitop bic
> -ENDPROC(_clear_bit)
> +bitop _clear_bit, bic
> diff --git a/xen/arch/arm/arm32/lib/setbit.S b/xen/arch/arm/arm32/lib/setbit.S
> index c828851..1f4ef56 100644
> --- a/xen/arch/arm/arm32/lib/setbit.S
> +++ b/xen/arch/arm/arm32/lib/setbit.S
> @@ -13,6 +13,4 @@
> #include "bitops.h"
> .text
>
> -ENTRY(_set_bit)
> - bitop orr
> -ENDPROC(_set_bit)
> +bitop _set_bit, orr
> diff --git a/xen/arch/arm/arm32/lib/testchangebit.S b/xen/arch/arm/arm32/lib/testchangebit.S
> index a7f527c..7f4635c 100644
> --- a/xen/arch/arm/arm32/lib/testchangebit.S
> +++ b/xen/arch/arm/arm32/lib/testchangebit.S
> @@ -13,6 +13,4 @@
> #include "bitops.h"
> .text
>
> -ENTRY(_test_and_change_bit)
> - testop eor, str
> -ENDPROC(_test_and_change_bit)
> +testop _test_and_change_bit, eor, str
> diff --git a/xen/arch/arm/arm32/lib/testclearbit.S b/xen/arch/arm/arm32/lib/testclearbit.S
> index 8f39c72..4d4152f 100644
> --- a/xen/arch/arm/arm32/lib/testclearbit.S
> +++ b/xen/arch/arm/arm32/lib/testclearbit.S
> @@ -13,6 +13,4 @@
> #include "bitops.h"
> .text
>
> -ENTRY(_test_and_clear_bit)
> - testop bicne, strne
> -ENDPROC(_test_and_clear_bit)
> +testop _test_and_clear_bit, bicne, strne
> diff --git a/xen/arch/arm/arm32/lib/testsetbit.S b/xen/arch/arm/arm32/lib/testsetbit.S
> index 1b8d273..54f48f9 100644
> --- a/xen/arch/arm/arm32/lib/testsetbit.S
> +++ b/xen/arch/arm/arm32/lib/testsetbit.S
> @@ -13,6 +13,4 @@
> #include "bitops.h"
> .text
>
> -ENTRY(_test_and_set_bit)
> - testop orreq, streq
> -ENDPROC(_test_and_set_bit)
> +testop _test_and_set_bit, orreq, streq
>
--
Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
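For anyone reading the assembly macros above without an ARM reference to
hand, the word/bit arithmetic can be sketched in C. This is an analogy using
C11 atomics and the GCC/Clang prefetch builtin, not the Xen or Linux
implementation; example_set_bit() and example_test_and_set_bit() are
hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Set bit nr in the bitmap at p (analogous to "bitop _set_bit, orr"). */
static void example_set_bit(unsigned int nr, _Atomic uint32_t *p)
{
    uint32_t mask = UINT32_C(1) << (nr & 31);   /* bit offset  */

    p += nr >> 5;                               /* word offset */
    /* Like pldw: hint that the word is about to be written. */
    __builtin_prefetch((const void *)p, 1);
    /* bitop has no barrier semantics, so relaxed ordering is used. */
    atomic_fetch_or_explicit(p, mask, memory_order_relaxed);
}

/* Set bit nr and return its previous value (analogous to testop). */
static int example_test_and_set_bit(unsigned int nr, _Atomic uint32_t *p)
{
    uint32_t mask = UINT32_C(1) << (nr & 31);

    p += nr >> 5;
    /* Ordered variant; no prefetch, mirroring the quoted commit message. */
    return (atomic_fetch_or(p, mask) & mask) != 0;
}
```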
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics
2014-03-20 15:45 ` [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics Ian Campbell
@ 2014-03-20 17:22 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:22 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
Hi Ian,
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> Unrelated reads/writes should not pass the xchg.
>
> Provide cmpxchg_local for parity with arm64, although it appears to be unused.
> It also helps make the reason for the separation of __cmpxchg_mb more
> apparent.
>
> With this our cmpxchg is in sync with Linux v3.14-rc7.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
> We got our cmpxchg implementation from Linux which AFAICS has always had these
> additional barriers. I don't recall us having decided that Xen barriers should
> not have this property as well, and if we did we were remiss in not adding a
> comment etc... If my memory is faulty then I am happy to replace this patch
> with one which adds a comment instead.
I think the barrier is good for Xen. We may have some places where both
of these macros are used as a "barrier".
Acked-by: Julien Grall <julien.grall@linaro.org>
Regards,
--
Julien Grall
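The distinction between the fully ordered cmpxchg and cmpxchg_local can be
approximated with C11 atomics. This is only a sketch: sequentially consistent
ordering is used here as a stand-in for the barriers discussed above, and
example_cmpxchg() and example_cmpxchg_local() are hypothetical names, not the
Xen macros.

```c
#include <stdatomic.h>

/* Fully ordered compare-and-exchange: returns the value found at *ptr. */
static int example_cmpxchg(_Atomic int *ptr, int old, int new_val)
{
    atomic_compare_exchange_strong_explicit(ptr, &old, new_val,
                                            memory_order_seq_cst,
                                            memory_order_seq_cst);
    /* On failure, old has been updated to the value actually observed. */
    return old;
}

/* Unordered variant, in the spirit of cmpxchg_local. */
static int example_cmpxchg_local(_Atomic int *ptr, int old, int new_val)
{
    atomic_compare_exchange_strong_explicit(ptr, &old, new_val,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    return old;
}
```

A caller detects success by comparing the return value with the expected
old value, which is the same convention the cmpxchg macro uses.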
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h
2014-03-20 15:45 ` [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h Ian Campbell
@ 2014-03-20 17:23 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:23 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
Hi Ian,
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This file is from Linux and the intention was to keep the formatting the same
> to make resyncing easier. Put the hardtabs back and adjust the emacs magic to
> reflect the desired use of whitespace.
>
> Adjust the 64-bit emacs magic too.
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
I guess it was just a mechanical replacement:
Acked-by: Julien Grall <julien.grall@linaro.org>
Regards,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7
2014-03-20 15:45 ` [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7 Ian Campbell
@ 2014-03-20 17:27 ` Julien Grall
2014-03-21 8:41 ` Ian Campbell
0 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:27 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
Hi Ian,
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
> index 3f024d4..d309f66 100644
> --- a/xen/include/asm-arm/arm32/atomic.h
> +++ b/xen/include/asm-arm/arm32/atomic.h
> @@ -21,6 +21,7 @@ static inline void atomic_add(int i, atomic_t *v)
> unsigned long tmp;
> int result;
>
> + prefetchw(&v->counter);
Xen on ARM doesn't provide the prefetch* helpers. Shall we implement them?
Regards,
--
Julien Grall
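For context, patch 01/17 in this series defines a generic prefetchw()
fallback as __builtin_prefetch(x,1). A minimal sketch of how such a helper
sits in front of an atomic update follows; it uses a C11 compare-and-swap
loop in place of ldrex/strex, and example_atomic_t and example_atomic_add()
are hypothetical names rather than the Xen code.

```c
#include <stdatomic.h>

/* Generic fallback, as in patch 01/17 (GCC/Clang builtin). */
#define prefetchw(x) __builtin_prefetch(x, 1)

/* Hypothetical counter type, for illustration only. */
typedef struct {
    _Atomic int counter;
} example_atomic_t;

/* Add i to v->counter, hinting the cache line for write first. */
static void example_atomic_add(int i, example_atomic_t *v)
{
    int old;

    prefetchw(&v->counter);
    old = atomic_load_explicit(&v->counter, memory_order_relaxed);
    /* Retry until the exchange succeeds; old is refreshed on failure. */
    while (!atomic_compare_exchange_weak_explicit(&v->counter, &old, old + i,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
}
```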
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7
2014-03-20 15:45 ` [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7 Ian Campbell
@ 2014-03-20 17:29 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:29 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This pulls in the following Linux commits:
> commit 455bd4c430b0c0a361f38e8658a0d6cb469942b5
> Author: Ivan Djelic <ivan.djelic@parrot.com>
> Date: Wed Mar 6 20:09:27 2013 +0100
>
> ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimi
>
> Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
> assumptions about the implementation of memset and similar functions.
> The current ARM optimized memset code does not return the value of
> its first argument, as is usually expected from standard implementations.
>
> For instance in the following function:
>
> void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waite
> {
> memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
> waiter->magic = waiter;
> INIT_LIST_HEAD(&waiter->list);
> }
>
> compiled as:
>
> 800554d0 <debug_mutex_lock_common>:
> 800554d0: e92d4008 push {r3, lr}
> 800554d4: e1a00001 mov r0, r1
> 800554d8: e3a02010 mov r2, #16 ; 0x10
> 800554dc: e3a01011 mov r1, #17 ; 0x11
> 800554e0: eb04426e bl 80165ea0 <memset>
> 800554e4: e1a03000 mov r3, r0
> 800554e8: e583000c str r0, [r3, #12]
> 800554ec: e5830000 str r0, [r3]
> 800554f0: e5830004 str r0, [r3, #4]
> 800554f4: e8bd8008 pop {r3, pc}
>
> GCC assumes memset returns the value of pointer 'waiter' in register r0; ca
> register/memory corruptions.
>
> This patch fixes the return value of the assembly version of memset.
> It adds a 'mov' instruction and merges an additional load+store into
> existing load/store instructions.
> For ease of review, here is a breakdown of the patch into 4 simple steps:
>
> Step 1
> ======
> Perform the following substitutions:
> ip -> r8, then
> r0 -> ip,
> and insert 'mov ip, r0' as the first statement of the function.
> At this point, we have a memset() implementation returning the proper resul
> but corrupting r8 on some paths (the ones that were using ip).
>
> Step 2
> ======
> Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
>
> save r8:
> - str lr, [sp, #-4]!
> + stmfd sp!, {r8, lr}
>
> and restore r8 on both exit paths:
> - ldmeqfd sp!, {pc} @ Now <64 bytes to go.
> + ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go.
> (...)
> tst r2, #16
> stmneia ip!, {r1, r3, r8, lr}
> - ldr lr, [sp], #4
> + ldmfd sp!, {r8, lr}
>
> Step 3
> ======
> Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:
>
> save r8:
> - stmfd sp!, {r4-r7, lr}
> + stmfd sp!, {r4-r8, lr}
>
> and restore r8 on both exit paths:
> bgt 3b
> - ldmeqfd sp!, {r4-r7, pc}
> + ldmeqfd sp!, {r4-r8, pc}
> (...)
> tst r2, #16
> stmneia ip!, {r4-r7}
> - ldmfd sp!, {r4-r7, lr}
> + ldmfd sp!, {r4-r8, lr}
>
> Step 4
> ======
> Rewrite register list "r4-r7, r8" as "r4-r8".
>
> Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
> Reviewed-by: Nicolas Pitre <nico@linaro.org>
> Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>
> commit 418df63adac56841ef6b0f1fcf435bc64d4ed177
> Author: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Tue Mar 12 13:00:42 2013 +0100
>
> ARM: 7670/1: fix the memset fix
>
> Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
> recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
> with the memset return value. However the memset itself became broken
> by that patch for misaligned pointers.
>
> This fixes the above by branching over the entry code from the
> misaligned fixup code to avoid reloading the original pointer.
>
> Also, because the function entry alignment is wrong in the Thumb mode
> compilation, that fixup code is moved to the end.
>
> While at it, the entry instructions are slightly reworked to help dual
> issue pipelines.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> Tested-by: Alexander Holler <holler@ahsoftware.de>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
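
For reference, the contract that the first imported commit restores can be written down in a few lines of C. This is only a minimal sketch: sketch_memset is a hypothetical byte-at-a-time reference, not the optimised ARM assembly. The point is that the working pointer advances while the value handed back is the untouched first argument, which is what GCC assumes when it reuses r0 after the call.

```c
#include <stddef.h>

/* Hypothetical reference implementation, not the ARM assembly routine. */
static void *sketch_memset(void *s, int c, size_t n)
{
	unsigned char *p = s;	/* working pointer; 's' itself is never modified */

	while (n--)
		*p++ = (unsigned char)c;

	return s;		/* the original pointer, not the advanced one */
}
```

A caller may therefore legitimately write "p = sketch_memset(p, 0, len);" and keep using p afterwards.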
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 07/17] xen: arm32: add optimised memchr routine
2014-03-20 15:45 ` [PATCH 07/17] xen: arm32: add optimised memchr routine Ian Campbell
@ 2014-03-20 17:32 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:32 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This isn't used enough to be critical, but it completes the set of mem*.
>
> Taken from Linux v3.14-rc7.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines
2014-03-20 15:45 ` [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines Ian Campbell
@ 2014-03-20 17:33 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:33 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> Taken from Linux v3.14-rc7.
>
> These aren't widely used enough to be critical, but we may as well have them.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 09/17] xen: arm: remove atomic_clear_mask()
2014-03-20 15:45 ` [PATCH 09/17] xen: arm: remove atomic_clear_mask() Ian Campbell
@ 2014-03-20 17:35 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:35 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> This has no users.
>
> This brings arm32 atomic.h into sync with Linux v3.14-rc7.
>
> arm64/atomic.h requires other patches for this to be the case.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics
2014-03-20 15:45 ` [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics Ian Campbell
@ 2014-03-20 17:43 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:43 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> Xen, like Linux, expects full barrier semantics for bitops, atomics and
> cmpxchgs. This issue was discovered on Linux, and since we get our
> implementation of these from Linux I am quoting Will Deacon in Linux commit
> 8e86f0b409a4 for the gory details:
> Linux requires a number of atomic operations to provide full barrier
> semantics, that is no memory accesses after the operation can be
> observed before any accesses up to and including the operation in
> program order.
>
> On arm64, these operations have been incorrectly implemented as follows:
>
> // A, B, C are independent memory locations
>
> <Access [A]>
>
> // atomic_op (B)
> 1: ldaxr x0, [B] // Exclusive load with acquire
> <op(B)>
> stlxr w1, x0, [B] // Exclusive store with release
> cbnz w1, 1b
>
> <Access [C]>
>
> The assumption here being that two half barriers are equivalent to a
> full barrier, so the only permitted ordering would be A -> B -> C
> (where B is the atomic operation involving both a load and a store).
>
> Unfortunately, this is not the case by the letter of the architecture
> and, in fact, the accesses to A and C are permitted to pass their
> nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
> or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
> store-release on B). This is a clear violation of the full barrier
> requirement.
>
> The simple way to fix this is to implement the same algorithm as ARMv7
> using explicit barriers:
>
> <Access [A]>
>
> // atomic_op (B)
> dmb ish // Full barrier
> 1: ldxr x0, [B] // Exclusive load
> <op(B)>
> stxr w1, x0, [B] // Exclusive store
> cbnz w1, 1b
> dmb ish // Full barrier
>
> <Access [C]>
>
> but this has the undesirable effect of introducing *two* full barrier
> instructions. A better approach is actually the following, non-intuitive
> sequence:
>
> <Access [A]>
>
> // atomic_op (B)
> 1: ldxr x0, [B] // Exclusive load
> <op(B)>
> stlxr w1, x0, [B] // Exclusive store with release
> cbnz w1, 1b
> dmb ish // Full barrier
>
> <Access [C]>
>
> The simple observations here are:
>
> - The dmb ensures that no subsequent accesses (e.g. the access to C)
> can enter or pass the atomic sequence.
>
> - The dmb also ensures that no prior accesses (e.g. the access to A)
> can pass the atomic sequence.
>
> - Therefore, no prior access can pass a subsequent access, or
> vice-versa (i.e. A is strictly ordered before C).
>
> - The stlxr ensures that no prior access can pass the store component
> of the atomic operation.
>
> The only tricky part remaining is the ordering between the ldxr and the
> access to A, since the absence of the first dmb means that we're now
> permitting re-ordering between the ldxr and any prior accesses.
>
> From an (arbitrary) observer's point of view, there are two scenarios:
>
> 1. We have observed the ldxr. This means that if we perform a store to
> [B], the ldxr will still return older data. If we can observe the
> ldxr, then we can potentially observe the permitted re-ordering
> with the access to A, which is clearly an issue when compared to
> the dmb variant of the code. Thankfully, the exclusive monitor will
> save us here since it will be cleared as a result of the store and
> the ldxr will retry. Notice that any use of a later memory
> observation to imply observation of the ldxr will also imply
> observation of the access to A, since the stlxr/dmb ensure strict
> ordering.
>
> 2. We have not observed the ldxr. This means we can perform a store
> and influence the later ldxr. However, that doesn't actually tell
> us anything about the access to [A], so we've not lost anything
> here either when compared to the dmb variant.
>
> This patch implements this solution for our barriered atomic operations,
> ensuring that we satisfy the full barrier requirements where they are
> needed.
>
> Cc: <stable@vger.kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
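
The shape of the final sequence (plain exclusive load, store with release, one trailing full barrier) has a rough analogue in C11 atomics. The following is only a sketch under that analogy: sketch_add_return is a hypothetical name, and the compiler is free to pick different instructions than the ldxr/stlxr/dmb ish shown in the commit message.

```c
#include <stdatomic.h>

/* Hypothetical helper, not a Xen or Linux API. */
static inline int sketch_add_return(int i, atomic_int *v)
{
	/*
	 * Release ordering on the read-modify-write: earlier accesses
	 * cannot pass the store half of the operation (the stlxr role).
	 */
	int old = atomic_fetch_add_explicit(v, i, memory_order_release);

	/* One trailing full fence, playing the role of the "dmb ish". */
	atomic_thread_fence(memory_order_seq_cst);

	return old + i;
}
```

Dropping the fence would leave only release ordering, so later accesses could again be observed before the update.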
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg
2014-03-20 15:46 ` [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg Ian Campbell
@ 2014-03-20 17:44 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:44 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:46 PM, Ian Campbell wrote:
> These functions are from Linux and the intention was to keep the formatting
> the same to make resyncing easier.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers
2014-03-20 15:46 ` [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers Ian Campbell
@ 2014-03-20 17:45 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:45 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:46 PM, Ian Campbell wrote:
> This resyncs atomics and cmpxchgs with Linux v3.14-rc7 by importing:
> commit 95c4189689f92fba7ecf9097173404d4928c6e9b
> Author: Will Deacon <will.deacon@arm.com>
> Date: Tue Feb 4 12:29:13 2014 +0000
>
> arm64: asm: remove redundant "cc" clobbers
>
> cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
> from inline asm blocks that only use these instructions to implement
> conditional branches.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
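
To illustrate why the clobber is redundant, here is a small sketch of an inline asm block whose only branch is a cbnz. sketch_is_nonzero is a hypothetical function made up for this example; the non-arm64 branch is a plain C fallback so that the snippet builds anywhere.

```c
/* Hypothetical example, not code from the patch. */
static inline int sketch_is_nonzero(unsigned long x)
{
#if defined(__aarch64__)
	int r;

	/*
	 * cbnz tests the register directly and leaves the NZCV flags
	 * alone, so no "cc" clobber is listed.
	 */
	asm volatile(
	"	mov	%w0, #1\n"
	"	cbnz	%1, 1f\n"
	"	mov	%w0, #0\n"
	"1:\n"
	: "=&r" (r)
	: "r" (x));

	return r;
#else
	return x != 0;	/* portable fallback for other architectures */
#endif
}
```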
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 14/17] xen: arm64: assembly optimised mem* and str*
2014-03-20 15:46 ` [PATCH 14/17] xen: arm64: assembly optimised mem* and str* Ian Campbell
@ 2014-03-20 17:48 ` Julien Grall
0 siblings, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:48 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
On 03/20/2014 03:46 PM, Ian Campbell wrote:
> Taken from Linux v3.14-rc7.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers
2014-03-20 15:46 ` [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers Ian Campbell
@ 2014-03-20 17:52 ` Julien Grall
2014-03-21 8:42 ` Ian Campbell
0 siblings, 1 reply; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:52 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
Hi Ian,
On 03/20/2014 03:46 PM, Ian Campbell wrote:
> diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
> new file mode 100644
> index 0000000..70c6090
> --- /dev/null
> +++ b/xen/include/asm-arm/arm32/cmpxchg.h
> +static always_inline unsigned long __cmpxchg(
> + volatile void *ptr, unsigned long old, unsigned long new, int size)
> +{
> + unsigned long oldval, res;
> +
> + switch (size) {
> + case 1:
> + do {
> + asm volatile("@ __cmpxchg1\n"
> + " ldrexb %1, [%2]\n"
> + " mov %0, #0\n"
> + " teq %1, %3\n"
> + " strexbeq %0, %4, [%2]\n"
> + : "=&r" (res), "=&r" (oldval)
> + : "r" (ptr), "Ir" (old), "r" (new)
> + : "memory", "cc");
> + } while (res);
> + break;
> + case 2:
> + do {
> + asm volatile("@ __cmpxchg2\n"
> + " ldrexh %1, [%2]\n"
> + " mov %0, #0\n"
> + " teq %1, %3\n"
> + " strexheq %0, %4, [%2]\n"
> + : "=&r" (res), "=&r" (oldval)
> + : "r" (ptr), "Ir" (old), "r" (new)
> + : "memory", "cc");
> + } while (res);
> + break;
> + case 4:
> + do {
> + asm volatile("@ __cmpxchg4\n"
> + " ldrex %1, [%2]\n"
> + " mov %0, #0\n"
> + " teq %1, %3\n"
> + " strexeq %0, %4, [%2]\n"
> + : "=&r" (res), "=&r" (oldval)
> + : "r" (ptr), "Ir" (old), "r" (new)
> + : "memory", "cc");
> + } while (res);
> + break;
> +#if 0
> + case 8:
> + do {
> + asm volatile("@ __cmpxchg8\n"
> + " ldrexd %1, [%2]\n"
> + " mov %0, #0\n"
> + " teq %1, %3\n"
> + " strexdeq %0, %4, [%2]\n"
> + : "=&r" (res), "=&r" (oldval)
> + : "r" (ptr), "Ir" (old), "r" (new)
> + : "memory", "cc");
> + } while (res);
> + break;
> +#endif
Is it really useful to leave the dead code in the header?
Regards,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 10/17] xen: arm64: disable alignment traps
2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
2014-03-20 15:57 ` Andrew Cooper
@ 2014-03-20 17:54 ` Julien Grall
1 sibling, 0 replies; 42+ messages in thread
From: Julien Grall @ 2014-03-20 17:54 UTC (permalink / raw)
To: Ian Campbell; +Cc: stefano.stabellini, tim, xen-devel
Hi Ian,
On 03/20/2014 03:45 PM, Ian Campbell wrote:
> The mem* primitives which I am about to import from Linux in a subsequent
> patch rely on the hardware handling misalignment.
>
> The benefits of an optimised memcpy etc. outweigh the downsides.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
With both of the minor changes that Andrew and you spotted:
Acked-by: Julien Grall <julien.grall@linaro.org>
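
As background, this is what a misaligned access looks like when expressed portably in C. It is only a sketch and sketch_load_u64 is a hypothetical name. Compilers commonly turn the memcpy into a single load when the target allows unaligned accesses, and that is the kind of load an optimised mem* routine issues directly, hence the need for the hardware to handle it rather than trap.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes from an address of any alignment. */
static inline uint64_t sketch_load_u64(const void *p)
{
	uint64_t v;

	/* memcpy makes no alignment assumption about 'p'. */
	memcpy(&v, p, sizeof(v));
	return v;
}
```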
Regards,
--
Julien Grall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7
2014-03-20 17:27 ` Julien Grall
@ 2014-03-21 8:41 ` Ian Campbell
0 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-21 8:41 UTC (permalink / raw)
To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel
On Thu, 2014-03-20 at 17:27 +0000, Julien Grall wrote:
> Hi Ian,
>
> On 03/20/2014 03:45 PM, Ian Campbell wrote:
> > diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
> > index 3f024d4..d309f66 100644
> > --- a/xen/include/asm-arm/arm32/atomic.h
> > +++ b/xen/include/asm-arm/arm32/atomic.h
> > @@ -21,6 +21,7 @@ static inline void atomic_add(int i, atomic_t *v)
> > unsigned long tmp;
> > int result;
> >
> > + prefetchw(&v->counter);
>
> Xen on ARM doesn't provide a prefetch* helper. Shall we implement one?
It comes from generic code after the first patch in this series.
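
Roughly, and only as a sketch of the idea rather than the exact definitions in patch 01, a write prefetch built on the compiler builtin looks like the following. The sketch_ names are hypothetical and the atomic add uses a GCC builtin instead of the ldrex/strex loop in atomic.h.

```c
/* Second argument 1 asks for a prefetch in anticipation of a write. */
#define sketch_prefetchw(p)	__builtin_prefetch((p), 1)

/* Hypothetical stand-in for atomic_t. */
struct sketch_atomic {
	int counter;
};

static inline void sketch_atomic_add(int i, struct sketch_atomic *v)
{
	/* Hint that the line holding the counter is about to be written. */
	sketch_prefetchw(&v->counter);
	__atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
}
```

The prefetch is purely a hint, so the result is the same with or without it.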
Ian.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers
2014-03-20 17:52 ` Julien Grall
@ 2014-03-21 8:42 ` Ian Campbell
0 siblings, 0 replies; 42+ messages in thread
From: Ian Campbell @ 2014-03-21 8:42 UTC (permalink / raw)
To: Julien Grall; +Cc: stefano.stabellini, tim, xen-devel
On Thu, 2014-03-20 at 17:52 +0000, Julien Grall wrote:
> Hi Ian,
>
> On 03/20/2014 03:46 PM, Ian Campbell wrote:
> > diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
> > new file mode 100644
> > index 0000000..70c6090
> > --- /dev/null
> > +++ b/xen/include/asm-arm/arm32/cmpxchg.h
> > +static always_inline unsigned long __cmpxchg(
> > + volatile void *ptr, unsigned long old, unsigned long new, int size)
> > +{
> > + unsigned long oldval, res;
> > +
> > + switch (size) {
> > + case 1:
> > + do {
> > + asm volatile("@ __cmpxchg1\n"
> > + " ldrexb %1, [%2]\n"
> > + " mov %0, #0\n"
> > + " teq %1, %3\n"
> > + " strexbeq %0, %4, [%2]\n"
> > + : "=&r" (res), "=&r" (oldval)
> > + : "r" (ptr), "Ir" (old), "r" (new)
> > + : "memory", "cc");
> > + } while (res);
> > + break;
> > + case 2:
> > + do {
> > + asm volatile("@ __cmpxchg2\n"
> > + " ldrexh %1, [%2]\n"
> > + " mov %0, #0\n"
> > + " teq %1, %3\n"
> > + " strexheq %0, %4, [%2]\n"
> > + : "=&r" (res), "=&r" (oldval)
> > + : "r" (ptr), "Ir" (old), "r" (new)
> > + : "memory", "cc");
> > + } while (res);
> > + break;
> > + case 4:
> > + do {
> > + asm volatile("@ __cmpxchg4\n"
> > + " ldrex %1, [%2]\n"
> > + " mov %0, #0\n"
> > + " teq %1, %3\n"
> > + " strexeq %0, %4, [%2]\n"
> > + : "=&r" (res), "=&r" (oldval)
> > + : "r" (ptr), "Ir" (old), "r" (new)
> > + : "memory", "cc");
> > + } while (res);
> > + break;
> > +#if 0
> > + case 8:
> > + do {
> > + asm volatile("@ __cmpxchg8\n"
> > + " ldrexd %1, [%2]\n"
> > + " mov %0, #0\n"
> > + " teq %1, %3\n"
> > + " strexdeq %0, %4, [%2]\n"
> > + : "=&r" (res), "=&r" (oldval)
> > + : "r" (ptr), "Ir" (old), "r" (new)
> > + : "memory", "cc");
> > + } while (res);
> > + break;
> > +#endif
>
> Is it really useful to leave the dead code in the header?
This was a pure code motion patch so I'm not going to remove it here.
In any case this will come in handy the first time someone tries to
cmpxchg an 8 byte value.
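
For the sake of illustration, the semantics an 8 byte variant would need can be sketched with the GCC atomic builtins. sketch_cmpxchg64 is a hypothetical name and this is not the ldrexd/strexd implementation; on some 32-bit targets the builtin may need library support.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store new_val at ptr only if it currently holds
 * old, and return the value that was found there either way.
 */
static inline uint64_t sketch_cmpxchg64(volatile uint64_t *ptr,
					uint64_t old, uint64_t new_val)
{
	uint64_t expected = old;

	/*
	 * On failure the builtin writes the current contents of *ptr
	 * into 'expected'; on success 'expected' still equals old.
	 */
	__atomic_compare_exchange_n(ptr, &expected, new_val, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

	return expected;
}
```

As with the existing sizes, the caller detects success by comparing the return value with old.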
Ian.
^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2014-03-21 8:42 UTC | newest]
Thread overview: 42+ messages
2014-03-20 15:45 [PATCH 00/17] xen: arm: resync low level asm primitive from Linux Ian Campbell
2014-03-20 15:45 ` [PATCH 01/17] xen: x86 & generic: change to __builtin_prefetch() Ian Campbell
2014-03-20 16:12 ` Jan Beulich
2014-03-20 15:45 ` [PATCH 02/17] xen: arm32: resync bitops with Linux v3.14-rc7 Ian Campbell
2014-03-20 17:13 ` Julien Grall
2014-03-20 15:45 ` [PATCH 03/17] xen: arm32: ensure cmpxchg has full barrier semantics Ian Campbell
2014-03-20 17:22 ` Julien Grall
2014-03-20 15:45 ` [PATCH 04/17] xen: arm32: replace hard tabs in atomics.h Ian Campbell
2014-03-20 17:23 ` Julien Grall
2014-03-20 15:45 ` [PATCH 05/17] xen: arm32: resync atomics with (almost) v3.14-rc7 Ian Campbell
2014-03-20 17:27 ` Julien Grall
2014-03-21 8:41 ` Ian Campbell
2014-03-20 15:45 ` [PATCH 06/17] xen: arm32: resync mem* with Linux v3.14-rc7 Ian Campbell
2014-03-20 17:29 ` Julien Grall
2014-03-20 15:45 ` [PATCH 07/17] xen: arm32: add optimised memchr routine Ian Campbell
2014-03-20 17:32 ` Julien Grall
2014-03-20 15:45 ` [PATCH 08/17] xen: arm32: add optimised strchr and strrchr routines Ian Campbell
2014-03-20 17:33 ` Julien Grall
2014-03-20 15:45 ` [PATCH 09/17] xen: arm: remove atomic_clear_mask() Ian Campbell
2014-03-20 17:35 ` Julien Grall
2014-03-20 15:45 ` [PATCH 10/17] xen: arm64: disable alignment traps Ian Campbell
2014-03-20 15:57 ` Andrew Cooper
2014-03-20 15:59 ` Ian Campbell
2014-03-20 16:21 ` Gordan Bobic
2014-03-20 16:27 ` Ian Campbell
2014-03-20 16:43 ` Gordan Bobic
2014-03-20 16:54 ` Ian Campbell
2014-03-20 17:54 ` Julien Grall
2014-03-20 15:45 ` [PATCH 11/17] xen: arm64: atomics: fix use of acquire + release for full barrier semantics Ian Campbell
2014-03-20 17:43 ` Julien Grall
2014-03-20 15:46 ` [PATCH 12/17] xen: arm64: reinstate hard tabs in system.h cmpxchg Ian Campbell
2014-03-20 17:44 ` Julien Grall
2014-03-20 15:46 ` [PATCH 13/17] xen: arm64: asm: remove redundant "cc" clobbers Ian Campbell
2014-03-20 17:45 ` Julien Grall
2014-03-20 15:46 ` [PATCH 14/17] xen: arm64: assembly optimised mem* and str* Ian Campbell
2014-03-20 17:48 ` Julien Grall
2014-03-20 15:46 ` [PATCH 15/17] xen: arm64: optimised clear_page Ian Campbell
2014-03-20 15:46 ` [PATCH 16/17] xen: arm: refactor xchg and cmpxchg into their own headers Ian Campbell
2014-03-20 17:52 ` Julien Grall
2014-03-21 8:42 ` Ian Campbell
2014-03-20 15:46 ` [PATCH 17/17] xen: arm: document what low level primitives we have imported from Linux Ian Campbell
2014-03-20 16:23 ` Ian Campbell