linux-arm-kernel.lists.infradead.org archive mirror
* __xchg for sizes other than 32bit
@ 2010-03-10 16:22 Imre Deak
  2010-03-10 17:35 ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Imre Deak @ 2010-03-10 16:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

recently you've added support for __cmpxchg for ARMv6+. There the assumption
is that ldrex[bh]/strex[bh] is only supported on platforms with the 32v6K
extension.

Currently the __xchg code uses these even without the extension. Should this
be fixed?

I'm looking at this since I need to add support for short (2-byte) arguments
to __xchg, which is needed at least by the cgroups code.
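
For reference, the kind of use I have in mind is just an xchg() on an
unsigned short, along these lines (illustrative only - the name is made
up, this is not the actual cgroups code):

	static unsigned short claim_flags(unsigned short *flags)
	{
		/* sizeof(*flags) == 2, so this ends up in the 2-byte case of __xchg() */
		return xchg(flags, 0);
	}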

Thanks,
Imre


* __xchg for sizes other than 32bit
  2010-03-10 16:22 __xchg for sizes other than 32bit Imre Deak
@ 2010-03-10 17:35 ` Russell King - ARM Linux
  2010-03-10 20:02   ` [PATCH] ARM support single byte cmpxchg and cmpxchg_local on ARMv6 Mathieu Desnoyers
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2010-03-10 17:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Mar 10, 2010 at 06:22:21PM +0200, Imre Deak wrote:
> recently you've added support for __cmpxchg for ARMv6+. There the assumption
> is that ldrex[bh]/strex[bh] is only supported on platforms with the 32v6K
> extension.
> 
> Currently the __xchg code uses these even without the extension. Should this
> be fixed?

From what I remember, the half-word versions definitely aren't supported
on anything without V6K extensions.  I think that the byte and word
versions are supported on V6 and up though.

That'd make both __cmpxchg and __xchg slightly buggy, in different ways.

What it does mean is that atomic operations on unsigned shorts using
ldrex/strex will only be possible on V6K and up.

* [PATCH] ARM support single byte cmpxchg and cmpxchg_local on ARMv6
  2010-03-10 17:35 ` Russell King - ARM Linux
@ 2010-03-10 20:02   ` Mathieu Desnoyers
  2010-03-10 20:30     ` Russell King - ARM Linux
  2010-03-10 23:16   ` __xchg for sizes other than 32bit Jamie Lokier
  2010-03-18  9:29   ` [PATCH 1/1] [RFC] arm: add half-word __xchg Alexander Shishkin
  2 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-03-10 20:02 UTC (permalink / raw)
  To: linux-arm-kernel

* Russell King - ARM Linux (linux at arm.linux.org.uk) wrote:
> On Wed, Mar 10, 2010 at 06:22:21PM +0200, Imre Deak wrote:
> > recently you've added support for __cmpxchg for ARMv6+. There the assumption
> > is that ldrex[bh]/strex[bh] is only supported on platforms with the 32v6K
> > extension.
> > 
> > Currently the __xchg code uses these even without the extension. Should this
> > be fixed?
> 
> From what I remember, the half-word versions definitely aren't supported
> on anything without V6K extensions.  I think that the byte and word
> versions are supported on V6 and up though.
> 
> That'd make both __cmpxchg and __xchg slightly buggy, in different ways.
> 
> What it does mean is that atomic operations on unsigned shorts using
> ldrex/strex will only be possible on V6K and up.

Does the following help (addressing the single byte cmpxchg part of the
problem) ?


ARM support single byte cmpxchg and cmpxchg_local on ARMv6

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King - ARM Linux <linux@arm.linux.org.uk>
CC: Imre Deak <imre.deak@nokia.com>
---
 arch/arm/include/asm/system.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6-lttng.git/arch/arm/include/asm/system.h
===================================================================
--- linux-2.6-lttng.git.orig/arch/arm/include/asm/system.h	2010-03-10 14:49:59.000000000 -0500
+++ linux-2.6-lttng.git/arch/arm/include/asm/system.h	2010-03-10 14:57:41.000000000 -0500
@@ -352,7 +352,6 @@
 	unsigned long oldval, res;
 
 	switch (size) {
-#ifdef CONFIG_CPU_32v6K
 	case 1:
 		do {
 			asm volatile("@ __cmpxchg1\n"
@@ -365,6 +364,7 @@
 				: "memory", "cc");
 		} while (res);
 		break;
+#ifdef CONFIG_CPU_32v6K
 	case 2:
 		do {
 			asm volatile("@ __cmpxchg1\n"
@@ -424,7 +424,6 @@
 
 	switch (size) {
 #ifndef CONFIG_CPU_32v6K
-	case 1:
 	case 2:
 		ret = __cmpxchg_local_generic(ptr, old, new, size);
 		break;


-- 
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com


* [PATCH] ARM support single byte cmpxchg and cmpxchg_local on ARMv6
  2010-03-10 20:02   ` [PATCH] ARM support single byte cmpxchg and cmpxchg_local on ARMv6 Mathieu Desnoyers
@ 2010-03-10 20:30     ` Russell King - ARM Linux
  2010-03-10 21:15       ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2010-03-10 20:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Mar 10, 2010 at 03:02:36PM -0500, Mathieu Desnoyers wrote:
> * Russell King - ARM Linux (linux at arm.linux.org.uk) wrote:
> > On Wed, Mar 10, 2010 at 06:22:21PM +0200, Imre Deak wrote:
> > > recently you've added support for __cmpxchg for ARMv6+. There the assumption
> > > is that ldrex[bh]/strex[bh] is only supported on platforms with the 32v6K
> > > extension.
> > > 
> > > Currently the __xchg code uses these even without the extension. Should this
> > > be fixed?
> > 
> > From what I remember, the half-word versions definitely aren't supported
> > on anything without V6K extensions.  I think that the byte and word
> > versions are supported on V6 and up though.
> > 
> > That'd make both __cmpxchg and __xchg slightly buggy, in different ways.
> > 
> > What it does mean is that atomic operations on unsigned shorts using
> > ldrex/strex will only be possible on V6K and up.
> 
> Does the following help (addressing the single byte cmpxchg part of the
> problem) ?

First I'd need to look up what the real conditions are on the ldrex
instructions - which I'm not doing at the moment on account of being
down with the lurgy this week...


* [PATCH] ARM support single byte cmpxchg and cmpxchg_local on ARMv6
  2010-03-10 20:30     ` Russell King - ARM Linux
@ 2010-03-10 21:15       ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-03-10 21:15 UTC (permalink / raw)
  To: linux-arm-kernel

* Russell King - ARM Linux (linux at arm.linux.org.uk) wrote:
> On Wed, Mar 10, 2010 at 03:02:36PM -0500, Mathieu Desnoyers wrote:
> > * Russell King - ARM Linux (linux at arm.linux.org.uk) wrote:
> > > On Wed, Mar 10, 2010 at 06:22:21PM +0200, Imre Deak wrote:
> > > > recently you've added support for __cmpxchg for ARMv6+. There the assumption
> > > > is that ldrex[bh]/strex[bh] is only supported on platforms with the 32v6K
> > > > extension.
> > > > 
> > > > Currently the __xchg code uses these even without the extension. Should this
> > > > be fixed?
> > > 
> > > From what I remember, the half-word versions definitely aren't supported
> > > on anything without V6K extensions.  I think that the byte and word
> > > versions are supported on V6 and up though.
> > > 
> > > That'd make both __cmpxchg and __xchg slightly buggy, in different ways.
> > > 
> > > What it does mean is that atomic operations on unsigned shorts using
> > > ldrex/strex will only be possible on V6K and up.
> > 
> > Does the following help (addressing the single byte cmpxchg part of the
> > problem) ?
> 
> First I'd need to look up what the real conditions are on the ldrex
> instructions - which I'm not doing at the moment on account of being
> down with the lurgy this week...

No problem, the patch can wait.

Take care,

Mathieu



-- 
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com


* __xchg for sizes other than 32bit
  2010-03-10 17:35 ` Russell King - ARM Linux
  2010-03-10 20:02   ` [PATCH] ARM support single byte cmpxchg and cmpxchg_local on ARMv6 Mathieu Desnoyers
@ 2010-03-10 23:16   ` Jamie Lokier
  2010-03-18  9:29   ` [PATCH 1/1] [RFC] arm: add half-word __xchg Alexander Shishkin
  2 siblings, 0 replies; 25+ messages in thread
From: Jamie Lokier @ 2010-03-10 23:16 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Wed, Mar 10, 2010 at 06:22:21PM +0200, Imre Deak wrote:
> > recently you've added support for __cmpxchg for ARMv6+. There the assumption
> > is that ldrex[bh]/strex[bh] is only supported on platforms with the 32v6K
> > extension.
> > 
> > Currently the __xchg code uses these even without the extension. Should this
> > be fixed?
> 
> From what I remember, the half-word versions definitely aren't supported
> on anything without V6K extensions.  I think that the byte and word
> versions are supported on V6 and up though.
> 
> That'd make both __cmpxchg and __xchg slightly buggy, in different ways.
> 
> What it does mean is that atomic operations on unsigned shorts using
> ldrex/strex will only be possible on V6K and up.

It's possible to emulate 16-bit cmpxchg using 32-bit cmpxchg.

xchg can't be emulated with wider xchg, but of course it can use wider
cmpxchg in a loop.

That sort of thing probably should be in asm-generic somewhere.
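
Roughly along these lines (an untested sketch; __cmpxchg2 and __cmpxchg4
are made-up names, __cmpxchg4 standing for the 32-bit case of the existing
__cmpxchg(), and the shift assumes little-endian and a 2-byte aligned ptr):

	static inline u16 __cmpxchg2(u16 *ptr, u16 old, u16 new)
	{
		u32 *ptrbig = (u32 *)((unsigned long)ptr & ~3UL);
		int shift = ((unsigned long)ptr & 2) * 8;
		u32 mask = ~((u32)0xffff << shift);
		u32 oldbig = *ptrbig;

		for (;;) {
			u32 assumed = (oldbig & mask) | ((u32)old << shift);
			u32 wanted  = (oldbig & mask) | ((u32)new << shift);
			u32 got = __cmpxchg4(ptrbig, assumed, wanted);

			if (got == assumed)
				return old;			/* stored; our halfword was "old" */
			if (((got >> shift) & 0xffff) != old)
				return (got >> shift) & 0xffff;	/* halfword differs: fail */
			oldbig = got;				/* other halfword changed: retry */
		}
	}

The 16-bit xchg emulation is the same loop with the "assumed" value taken
from the last read instead of from the caller.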

-- Jamie


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-10 17:35 ` Russell King - ARM Linux
  2010-03-10 20:02   ` [PATCH] ARM support single byte cmpxchg and cmpxchg_local on ARMv6 Mathieu Desnoyers
  2010-03-10 23:16   ` __xchg for sizes other than 32bit Jamie Lokier
@ 2010-03-18  9:29   ` Alexander Shishkin
  2010-03-18 12:32     ` Mathieu Desnoyers
  2 siblings, 1 reply; 25+ messages in thread
From: Alexander Shishkin @ 2010-03-18  9:29 UTC (permalink / raw)
  To: linux-arm-kernel

On systems where ldrexh/strexh are not available, use a generic local
version.

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: linux-arm-kernel-bounces at lists.infradead.org
CC: Imre Deak <imre.deak@nokia.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/arm/include/asm/system.h |   61 ++++++++++++++++++++++++++++++++++-------
 1 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index d65b2f5..a2ee41f 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -218,6 +218,39 @@ do {									\
 	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
 } while (0)
 
+static inline unsigned long __xchg_local_generic(unsigned long x,
+						 volatile void *ptr, int size)
+{
+	extern void __bad_xchg(volatile void *, int);
+	unsigned long ret;
+	unsigned long flags;
+
+	switch (size) {
+	case 1:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned char *)ptr;
+		*(volatile unsigned char *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+
+	case 2:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned short *)ptr;
+		*(volatile unsigned short *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+
+	case 4:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned long *)ptr;
+		*(volatile unsigned long *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+	}
+
+	return ret;
+}
+
 #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
 /*
  * On the StrongARM, "swp" is terminally broken since it bypasses the
@@ -262,6 +295,22 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 			: "r" (x), "r" (ptr)
 			: "memory", "cc");
 		break;
+#ifdef CONFIG_CPU_32v6K
+	case 2:
+		asm volatile("@	__xchg2\n"
+		"1:	ldrexh	%0, [%3]\n"
+		"	strexh	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#else
+	case 2:
+		ret = __xchg_local_generic(x, ptr, 2);
+		break;
+#endif
 	case 4:
 		asm volatile("@	__xchg4\n"
 		"1:	ldrex	%0, [%3]\n"
@@ -277,17 +326,9 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 #error SMP is not supported on this platform
 #endif
 	case 1:
-		raw_local_irq_save(flags);
-		ret = *(volatile unsigned char *)ptr;
-		*(volatile unsigned char *)ptr = x;
-		raw_local_irq_restore(flags);
-		break;
-
+	case 2:
 	case 4:
-		raw_local_irq_save(flags);
-		ret = *(volatile unsigned long *)ptr;
-		*(volatile unsigned long *)ptr = x;
-		raw_local_irq_restore(flags);
+		ret = __xchg_local_generic(x, ptr, size);
 		break;
 #else
 	case 1:
-- 
1.6.3.3


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18  9:29   ` [PATCH 1/1] [RFC] arm: add half-word __xchg Alexander Shishkin
@ 2010-03-18 12:32     ` Mathieu Desnoyers
  2010-03-18 12:37       ` Alexander Shishkin
  2010-03-18 13:33       ` Alexander Shishkin
  0 siblings, 2 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-03-18 12:32 UTC (permalink / raw)
  To: linux-arm-kernel

* Alexander Shishkin (virtuoso at slind.org) wrote:
> On systems where ldrexh/strexh are not available, use a generic local
> version.
> 
> Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> CC: linux-arm-kernel-bounces at lists.infradead.org
> CC: Imre Deak <imre.deak@nokia.com>
> CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> ---
>  arch/arm/include/asm/system.h |   61 ++++++++++++++++++++++++++++++++++-------
>  1 files changed, 51 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> index d65b2f5..a2ee41f 100644
> --- a/arch/arm/include/asm/system.h
> +++ b/arch/arm/include/asm/system.h
> @@ -218,6 +218,39 @@ do {									\
>  	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
>  } while (0)
>  
> +static inline unsigned long __xchg_local_generic(unsigned long x,
> +						 volatile void *ptr, int size)
> +{
> +	extern void __bad_xchg(volatile void *, int);
> +	unsigned long ret;
> +	unsigned long flags;
> +
> +	switch (size) {
> +	case 1:
> +		raw_local_irq_save(flags);
> +		ret = *(volatile unsigned char *)ptr;
> +		*(volatile unsigned char *)ptr = x;
> +		raw_local_irq_restore(flags);
> +		break;
> +
> +	case 2:
> +		raw_local_irq_save(flags);
> +		ret = *(volatile unsigned short *)ptr;
> +		*(volatile unsigned short *)ptr = x;
> +		raw_local_irq_restore(flags);
> +		break;
> +
> +	case 4:
> +		raw_local_irq_save(flags);
> +		ret = *(volatile unsigned long *)ptr;
> +		*(volatile unsigned long *)ptr = x;
> +		raw_local_irq_restore(flags);
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
>  /*
>   * On the StrongARM, "swp" is terminally broken since it bypasses the
> @@ -262,6 +295,22 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>  			: "r" (x), "r" (ptr)
>  			: "memory", "cc");
>  		break;
> +#ifdef CONFIG_CPU_32v6K
> +	case 2:
> +		asm volatile("@	__xchg2\n"
> +		"1:	ldrexh	%0, [%3]\n"
> +		"	strexh	%1, %2, [%3]\n"
> +		"	teq	%1, #0\n"
> +		"	bne	1b"
> +			: "=&r" (ret), "=&r" (tmp)
> +			: "r" (x), "r" (ptr)
> +			: "memory", "cc");
> +		break;
> +#else
> +	case 2:
> +		ret = __xchg_local_generic(x, ptr, 2);
> +		break;

Calling __xchg_local_generic as an underlying xchg() implementation on
SMP is buggy. I think we have to surround this by "#ifndef CONFIG_SMP",
so it generates a "__bad_xchg(ptr, size), ret = 0;" error for short xchg
on SMP. That's odd though, as a program that builds on UP will break on
SMP.
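
The structure I have in mind, roughly (sketch only):

#else	/* !CONFIG_CPU_32v6K */
	case 2:
#ifndef CONFIG_SMP
		ret = __xchg_local_generic(x, ptr, 2);
#else
		__bad_xchg(ptr, size), ret = 0;	/* no safe 2-byte xchg here on SMP */
#endif
		break;
#endif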

Thanks,

Mathieu

> +#endif
>  	case 4:
>  		asm volatile("@	__xchg4\n"
>  		"1:	ldrex	%0, [%3]\n"
> @@ -277,17 +326,9 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>  #error SMP is not supported on this platform
>  #endif
>  	case 1:
> -		raw_local_irq_save(flags);
> -		ret = *(volatile unsigned char *)ptr;
> -		*(volatile unsigned char *)ptr = x;
> -		raw_local_irq_restore(flags);
> -		break;
> -
> +	case 2:
>  	case 4:
> -		raw_local_irq_save(flags);
> -		ret = *(volatile unsigned long *)ptr;
> -		*(volatile unsigned long *)ptr = x;
> -		raw_local_irq_restore(flags);
> +		ret = __xchg_local_generic(x, ptr, size);
>  		break;
>  #else
>  	case 1:
> -- 
> 1.6.3.3
> 

-- 
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18 12:32     ` Mathieu Desnoyers
@ 2010-03-18 12:37       ` Alexander Shishkin
  2010-03-18 13:33       ` Alexander Shishkin
  1 sibling, 0 replies; 25+ messages in thread
From: Alexander Shishkin @ 2010-03-18 12:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 18 March 2010 14:32, Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> * Alexander Shishkin (virtuoso at slind.org) wrote:
>> On systems where ldrexh/strexh are not available, use a generic local
>> version.
>>
>> Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
>> CC: linux-arm-kernel-bounces at lists.infradead.org
>> CC: Imre Deak <imre.deak@nokia.com>
>> CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
>> ---
>> ?arch/arm/include/asm/system.h | ? 61 ++++++++++++++++++++++++++++++++++-------
>> ?1 files changed, 51 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
>> index d65b2f5..a2ee41f 100644
>> --- a/arch/arm/include/asm/system.h
>> +++ b/arch/arm/include/asm/system.h
>> @@ -218,6 +218,39 @@ do { ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? \
>> ? ? ? last = __switch_to(prev,task_thread_info(prev), task_thread_info(next)); ? ? ? ?\
>> ?} while (0)
>>
>> +static inline unsigned long __xchg_local_generic(unsigned long x,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?volatile void *ptr, int size)
>> +{
>> + ? ? extern void __bad_xchg(volatile void *, int);
>> + ? ? unsigned long ret;
>> + ? ? unsigned long flags;
>> +
>> + ? ? switch (size) {
>> + ? ? case 1:
>> + ? ? ? ? ? ? raw_local_irq_save(flags);
>> + ? ? ? ? ? ? ret = *(volatile unsigned char *)ptr;
>> + ? ? ? ? ? ? *(volatile unsigned char *)ptr = x;
>> + ? ? ? ? ? ? raw_local_irq_restore(flags);
>> + ? ? ? ? ? ? break;
>> +
>> + ? ? case 2:
>> + ? ? ? ? ? ? raw_local_irq_save(flags);
>> + ? ? ? ? ? ? ret = *(volatile unsigned short *)ptr;
>> + ? ? ? ? ? ? *(volatile unsigned short *)ptr = x;
>> + ? ? ? ? ? ? raw_local_irq_restore(flags);
>> + ? ? ? ? ? ? break;
>> +
>> + ? ? case 4:
>> + ? ? ? ? ? ? raw_local_irq_save(flags);
>> + ? ? ? ? ? ? ret = *(volatile unsigned long *)ptr;
>> + ? ? ? ? ? ? *(volatile unsigned long *)ptr = x;
>> + ? ? ? ? ? ? raw_local_irq_restore(flags);
>> + ? ? ? ? ? ? break;
>> + ? ? }
>> +
>> + ? ? return ret;
>> +}
>> +
>> ?#if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
>> ?/*
>> ? * On the StrongARM, "swp" is terminally broken since it bypasses the
>> @@ -262,6 +295,22 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>> ? ? ? ? ? ? ? ? ? ? ? : "r" (x), "r" (ptr)
>> ? ? ? ? ? ? ? ? ? ? ? : "memory", "cc");
>> ? ? ? ? ? ? ? break;
>> +#ifdef CONFIG_CPU_32v6K
>> + ? ? case 2:
>> + ? ? ? ? ? ? asm volatile("@ __xchg2\n"
>> + ? ? ? ? ? ? "1: ? ? ldrexh ?%0, [%3]\n"
>> + ? ? ? ? ? ? " ? ? ? strexh ?%1, %2, [%3]\n"
>> + ? ? ? ? ? ? " ? ? ? teq ? ? %1, #0\n"
>> + ? ? ? ? ? ? " ? ? ? bne ? ? 1b"
>> + ? ? ? ? ? ? ? ? ? ? : "=&r" (ret), "=&r" (tmp)
>> + ? ? ? ? ? ? ? ? ? ? : "r" (x), "r" (ptr)
>> + ? ? ? ? ? ? ? ? ? ? : "memory", "cc");
>> + ? ? ? ? ? ? break;
>> +#else
>> + ? ? case 2:
>> + ? ? ? ? ? ? ret = __xchg_local_generic(x, ptr, 2);
>> + ? ? ? ? ? ? break;
>
> Calling __xchg_local_generic as an underlying xchg() implementation on
> SMP is buggy. I think we have to surround this by "#ifndef CONFIG_SMP",
> so it generates a "__bad_xchg(ptr, size), ret = 0;" error for short xchg
> on SMP. That's odd though, as a program that builds on UP will break on
> SMP.

Point taken, thanks.

Regards,
--
Alex


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18 12:32     ` Mathieu Desnoyers
  2010-03-18 12:37       ` Alexander Shishkin
@ 2010-03-18 13:33       ` Alexander Shishkin
  2010-03-18 13:50         ` Mathieu Desnoyers
  1 sibling, 1 reply; 25+ messages in thread
From: Alexander Shishkin @ 2010-03-18 13:33 UTC (permalink / raw)
  To: linux-arm-kernel

On systems where ldrexh/strexh are not available, use a generic local
version or __bad_xchg() on SMP.

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: linux-arm-kernel-bounces at lists.infradead.org
CC: Imre Deak <imre.deak@nokia.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/arm/include/asm/system.h |   65 ++++++++++++++++++++++++++++++++++------
 1 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index d65b2f5..82248ae 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -218,6 +218,39 @@ do {									\
 	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
 } while (0)
 
+static inline unsigned long __xchg_local_generic(unsigned long x,
+						 volatile void *ptr, int size)
+{
+	extern void __bad_xchg(volatile void *, int);
+	unsigned long ret;
+	unsigned long flags;
+
+	switch (size) {
+	case 1:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned char *)ptr;
+		*(volatile unsigned char *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+
+	case 2:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned short *)ptr;
+		*(volatile unsigned short *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+
+	case 4:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned long *)ptr;
+		*(volatile unsigned long *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+	}
+
+	return ret;
+}
+
 #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
 /*
  * On the StrongARM, "swp" is terminally broken since it bypasses the
@@ -262,6 +295,26 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 			: "r" (x), "r" (ptr)
 			: "memory", "cc");
 		break;
+#ifdef CONFIG_CPU_32v6K
+	case 2:
+		asm volatile("@	__xchg2\n"
+		"1:	ldrexh	%0, [%3]\n"
+		"	strexh	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#else
+	case 2:
+#ifdef CONFIG_SMP
+		__bad_xchg(ptr, size), ret = 0;
+#else
+		ret = __xchg_local_generic(x, ptr, 2);
+#endif
+		break;
+#endif
 	case 4:
 		asm volatile("@	__xchg4\n"
 		"1:	ldrex	%0, [%3]\n"
@@ -277,17 +330,9 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 #error SMP is not supported on this platform
 #endif
 	case 1:
-		raw_local_irq_save(flags);
-		ret = *(volatile unsigned char *)ptr;
-		*(volatile unsigned char *)ptr = x;
-		raw_local_irq_restore(flags);
-		break;
-
+	case 2:
 	case 4:
-		raw_local_irq_save(flags);
-		ret = *(volatile unsigned long *)ptr;
-		*(volatile unsigned long *)ptr = x;
-		raw_local_irq_restore(flags);
+		ret = __xchg_local_generic(x, ptr, size);
 		break;
 #else
 	case 1:
-- 
1.6.3.3


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18 13:33       ` Alexander Shishkin
@ 2010-03-18 13:50         ` Mathieu Desnoyers
  2010-03-18 16:33           ` Imre Deak
                             ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-03-18 13:50 UTC (permalink / raw)
  To: linux-arm-kernel

* Alexander Shishkin (virtuoso at slind.org) wrote:
> On systems where ldrexh/strexh are not available, use a generic local
> version or __bad_xchg() on SMP.
> 
> Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> CC: linux-arm-kernel-bounces at lists.infradead.org
> CC: Imre Deak <imre.deak@nokia.com>
> CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> ---
>  arch/arm/include/asm/system.h |   65 ++++++++++++++++++++++++++++++++++------
>  1 files changed, 55 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> index d65b2f5..82248ae 100644
> --- a/arch/arm/include/asm/system.h
> +++ b/arch/arm/include/asm/system.h
> @@ -218,6 +218,39 @@ do {									\
>  	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
>  } while (0)
>  
> +static inline unsigned long __xchg_local_generic(unsigned long x,
> +						 volatile void *ptr, int size)
> +{
> +	extern void __bad_xchg(volatile void *, int);
> +	unsigned long ret;
> +	unsigned long flags;
> +
> +	switch (size) {
> +	case 1:
> +		raw_local_irq_save(flags);
> +		ret = *(volatile unsigned char *)ptr;
> +		*(volatile unsigned char *)ptr = x;
> +		raw_local_irq_restore(flags);
> +		break;
> +
> +	case 2:
> +		raw_local_irq_save(flags);
> +		ret = *(volatile unsigned short *)ptr;
> +		*(volatile unsigned short *)ptr = x;
> +		raw_local_irq_restore(flags);
> +		break;
> +
> +	case 4:
> +		raw_local_irq_save(flags);
> +		ret = *(volatile unsigned long *)ptr;
> +		*(volatile unsigned long *)ptr = x;
> +		raw_local_irq_restore(flags);
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
>  /*
>   * On the StrongARM, "swp" is terminally broken since it bypasses the
> @@ -262,6 +295,26 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>  			: "r" (x), "r" (ptr)
>  			: "memory", "cc");
>  		break;
> +#ifdef CONFIG_CPU_32v6K
> +	case 2:
> +		asm volatile("@	__xchg2\n"
> +		"1:	ldrexh	%0, [%3]\n"
> +		"	strexh	%1, %2, [%3]\n"
> +		"	teq	%1, #0\n"
> +		"	bne	1b"
> +			: "=&r" (ret), "=&r" (tmp)
> +			: "r" (x), "r" (ptr)
> +			: "memory", "cc");
> +		break;
> +#else
> +	case 2:
> +#ifdef CONFIG_SMP
> +		__bad_xchg(ptr, size), ret = 0;

You don't need to put this one explicitly. See the default case of the
switch.

But.. thinking about it. It's bad to have a 2-byte xchg primitive that
only works on UP and breaks the build on SMP. We should instead
implement a workaround based on __cmpxchg4 to perform the 2-byte xchg().

Thanks,

Mathieu

> +#else
> +		ret = __xchg_local_generic(x, ptr, 2);
> +#endif
> +		break;
> +#endif
>  	case 4:
>  		asm volatile("@	__xchg4\n"
>  		"1:	ldrex	%0, [%3]\n"
> @@ -277,17 +330,9 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>  #error SMP is not supported on this platform
>  #endif
>  	case 1:
> -		raw_local_irq_save(flags);
> -		ret = *(volatile unsigned char *)ptr;
> -		*(volatile unsigned char *)ptr = x;
> -		raw_local_irq_restore(flags);
> -		break;
> -
> +	case 2:
>  	case 4:
> -		raw_local_irq_save(flags);
> -		ret = *(volatile unsigned long *)ptr;
> -		*(volatile unsigned long *)ptr = x;
> -		raw_local_irq_restore(flags);
> +		ret = __xchg_local_generic(x, ptr, size);
>  		break;
>  #else
>  	case 1:
> -- 
> 1.6.3.3
> 

-- 
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18 13:50         ` Mathieu Desnoyers
@ 2010-03-18 16:33           ` Imre Deak
  2010-03-18 17:21             ` Mathieu Desnoyers
  2010-03-19  1:49           ` Jamie Lokier
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Imre Deak @ 2010-03-18 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 18, 2010 at 02:50:08PM +0100, ext Mathieu Desnoyers wrote:
> * Alexander Shishkin (virtuoso at slind.org) wrote:
> > On systems where ldrexh/strexh are not available, use a generic local
> > version or __bad_xchg() on SMP.
> > 
> > Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> > CC: linux-arm-kernel-bounces at lists.infradead.org
> > CC: Imre Deak <imre.deak@nokia.com>
> > CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > ---
> >  arch/arm/include/asm/system.h |   65 ++++++++++++++++++++++++++++++++++------
> >  1 files changed, 55 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> > index d65b2f5..82248ae 100644
> > --- a/arch/arm/include/asm/system.h
> > +++ b/arch/arm/include/asm/system.h
> > @@ -218,6 +218,39 @@ do {									\
> >  	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
> >  } while (0)
> >  
> > +static inline unsigned long __xchg_local_generic(unsigned long x,
> > +						 volatile void *ptr, int size)
> > +{
> > +	extern void __bad_xchg(volatile void *, int);
> > +	unsigned long ret;
> > +	unsigned long flags;
> > +
> > +	switch (size) {
> > +	case 1:
> > +		raw_local_irq_save(flags);
> > +		ret = *(volatile unsigned char *)ptr;
> > +		*(volatile unsigned char *)ptr = x;
> > +		raw_local_irq_restore(flags);
> > +		break;
> > +
> > +	case 2:
> > +		raw_local_irq_save(flags);
> > +		ret = *(volatile unsigned short *)ptr;
> > +		*(volatile unsigned short *)ptr = x;
> > +		raw_local_irq_restore(flags);
> > +		break;
> > +
> > +	case 4:
> > +		raw_local_irq_save(flags);
> > +		ret = *(volatile unsigned long *)ptr;
> > +		*(volatile unsigned long *)ptr = x;
> > +		raw_local_irq_restore(flags);
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> >  #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
> >  /*
> >   * On the StrongARM, "swp" is terminally broken since it bypasses the
> > @@ -262,6 +295,26 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> >  			: "r" (x), "r" (ptr)
> >  			: "memory", "cc");
> >  		break;
> > +#ifdef CONFIG_CPU_32v6K
> > +	case 2:
> > +		asm volatile("@	__xchg2\n"
> > +		"1:	ldrexh	%0, [%3]\n"
> > +		"	strexh	%1, %2, [%3]\n"
> > +		"	teq	%1, #0\n"
> > +		"	bne	1b"
> > +			: "=&r" (ret), "=&r" (tmp)
> > +			: "r" (x), "r" (ptr)
> > +			: "memory", "cc");
> > +		break;
> > +#else
> > +	case 2:
> > +#ifdef CONFIG_SMP
> > +		__bad_xchg(ptr, size), ret = 0;
> 

On a related note, why can't we have __bad_xchg as an undefined function
for detecting the error already in build time? That seems to be the
approach in similar places.
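
(The usual trick being an extern declaration with no definition anywhere,
so a bad size turns into a link failure instead of a runtime trap.
Roughly, with made-up names:

	extern void __xchg_bad_size(void);	/* declared, intentionally never defined */

	static inline unsigned long my_xchg(volatile void *ptr,
					    unsigned long x, int size)
	{
		switch (size) {
		case 1:
		case 4:
			return __xchg(x, ptr, size);
		default:
			__xchg_bad_size();	/* undefined reference if ever emitted */
			return 0;
		}
	}

As long as the compiler can prove the default case unreachable for every
caller, the call is optimised away and nothing is left for the linker to
complain about.)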

--Imre


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18 16:33           ` Imre Deak
@ 2010-03-18 17:21             ` Mathieu Desnoyers
  2010-03-18 19:00               ` Imre Deak
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-03-18 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

* Imre Deak (imre.deak at nokia.com) wrote:
> On Thu, Mar 18, 2010 at 02:50:08PM +0100, ext Mathieu Desnoyers wrote:
> > * Alexander Shishkin (virtuoso at slind.org) wrote:
> > > On systems where ldrexh/strexh are not available, use a generic local
> > > version or __bad_xchg() on SMP.
> > > 
> > > Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> > > CC: linux-arm-kernel-bounces at lists.infradead.org
> > > CC: Imre Deak <imre.deak@nokia.com>
> > > CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > > ---
> > >  arch/arm/include/asm/system.h |   65 ++++++++++++++++++++++++++++++++++------
> > >  1 files changed, 55 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> > > index d65b2f5..82248ae 100644
> > > --- a/arch/arm/include/asm/system.h
> > > +++ b/arch/arm/include/asm/system.h
> > > @@ -218,6 +218,39 @@ do {									\
> > >  	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
> > >  } while (0)
> > >  
> > > +static inline unsigned long __xchg_local_generic(unsigned long x,
> > > +						 volatile void *ptr, int size)
> > > +{
> > > +	extern void __bad_xchg(volatile void *, int);
> > > +	unsigned long ret;
> > > +	unsigned long flags;
> > > +
> > > +	switch (size) {
> > > +	case 1:
> > > +		raw_local_irq_save(flags);
> > > +		ret = *(volatile unsigned char *)ptr;
> > > +		*(volatile unsigned char *)ptr = x;
> > > +		raw_local_irq_restore(flags);
> > > +		break;
> > > +
> > > +	case 2:
> > > +		raw_local_irq_save(flags);
> > > +		ret = *(volatile unsigned short *)ptr;
> > > +		*(volatile unsigned short *)ptr = x;
> > > +		raw_local_irq_restore(flags);
> > > +		break;
> > > +
> > > +	case 4:
> > > +		raw_local_irq_save(flags);
> > > +		ret = *(volatile unsigned long *)ptr;
> > > +		*(volatile unsigned long *)ptr = x;
> > > +		raw_local_irq_restore(flags);
> > > +		break;
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > >  #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
> > >  /*
> > >   * On the StrongARM, "swp" is terminally broken since it bypasses the
> > > @@ -262,6 +295,26 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> > >  			: "r" (x), "r" (ptr)
> > >  			: "memory", "cc");
> > >  		break;
> > > +#ifdef CONFIG_CPU_32v6K
> > > +	case 2:
> > > +		asm volatile("@	__xchg2\n"
> > > +		"1:	ldrexh	%0, [%3]\n"
> > > +		"	strexh	%1, %2, [%3]\n"
> > > +		"	teq	%1, #0\n"
> > > +		"	bne	1b"
> > > +			: "=&r" (ret), "=&r" (tmp)
> > > +			: "r" (x), "r" (ptr)
> > > +			: "memory", "cc");
> > > +		break;
> > > +#else
> > > +	case 2:
> > > +#ifdef CONFIG_SMP
> > > +		__bad_xchg(ptr, size), ret = 0;
> > 
> 
> On a related note, why can't we have __bad_xchg as an undefined function
> for detecting the error already in build time? That seems to be the
> approach in similar places.

And, unless I am missing your point entirely, this is exactly what is
done here.

Thanks,

Mathieu

> 
> --Imre
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18 17:21             ` Mathieu Desnoyers
@ 2010-03-18 19:00               ` Imre Deak
  2010-03-18 19:30                 ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Imre Deak @ 2010-03-18 19:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 18, 2010 at 06:21:41PM +0100, ext Mathieu Desnoyers wrote:
> * Imre Deak (imre.deak at nokia.com) wrote:
> > On Thu, Mar 18, 2010 at 02:50:08PM +0100, ext Mathieu Desnoyers wrote:
> > > * Alexander Shishkin (virtuoso at slind.org) wrote:
> > > > On systems where ldrexh/strexh are not available, use a generic local
> > > > version or __bad_xchg() on SMP.
> > > > 
> > > > Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> > > > CC: linux-arm-kernel-bounces at lists.infradead.org
> > > > CC: Imre Deak <imre.deak@nokia.com>
> > > > CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > > > ---
> > > >  arch/arm/include/asm/system.h |   65 ++++++++++++++++++++++++++++++++++------
> > > >  1 files changed, 55 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> > > > index d65b2f5..82248ae 100644
> > > > --- a/arch/arm/include/asm/system.h
> > > > +++ b/arch/arm/include/asm/system.h
> > > > @@ -218,6 +218,39 @@ do {									\
> > > >  	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
> > > >  } while (0)
> > > >  
> > > > +static inline unsigned long __xchg_local_generic(unsigned long x,
> > > > +						 volatile void *ptr, int size)
> > > > +{
> > > > +	extern void __bad_xchg(volatile void *, int);
> > > > +	unsigned long ret;
> > > > +	unsigned long flags;
> > > > +
> > > > +	switch (size) {
> > > > +	case 1:
> > > > +		raw_local_irq_save(flags);
> > > > +		ret = *(volatile unsigned char *)ptr;
> > > > +		*(volatile unsigned char *)ptr = x;
> > > > +		raw_local_irq_restore(flags);
> > > > +		break;
> > > > +
> > > > +	case 2:
> > > > +		raw_local_irq_save(flags);
> > > > +		ret = *(volatile unsigned short *)ptr;
> > > > +		*(volatile unsigned short *)ptr = x;
> > > > +		raw_local_irq_restore(flags);
> > > > +		break;
> > > > +
> > > > +	case 4:
> > > > +		raw_local_irq_save(flags);
> > > > +		ret = *(volatile unsigned long *)ptr;
> > > > +		*(volatile unsigned long *)ptr = x;
> > > > +		raw_local_irq_restore(flags);
> > > > +		break;
> > > > +	}
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > >  #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
> > > >  /*
> > > >   * On the StrongARM, "swp" is terminally broken since it bypasses the
> > > > @@ -262,6 +295,26 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> > > >  			: "r" (x), "r" (ptr)
> > > >  			: "memory", "cc");
> > > >  		break;
> > > > +#ifdef CONFIG_CPU_32v6K
> > > > +	case 2:
> > > > +		asm volatile("@	__xchg2\n"
> > > > +		"1:	ldrexh	%0, [%3]\n"
> > > > +		"	strexh	%1, %2, [%3]\n"
> > > > +		"	teq	%1, #0\n"
> > > > +		"	bne	1b"
> > > > +			: "=&r" (ret), "=&r" (tmp)
> > > > +			: "r" (x), "r" (ptr)
> > > > +			: "memory", "cc");
> > > > +		break;
> > > > +#else
> > > > +	case 2:
> > > > +#ifdef CONFIG_SMP
> > > > +		__bad_xchg(ptr, size), ret = 0;
> > > 
> > 
> > On a related note, why can't we have __bad_xchg as an undefined function
> > for detecting the error already in build time? That seems to be the
> > approach in similar places.
> 
> And, unless I am missing your point entirely, this is exactly what is
> done here.

__bad_xchg is defined in arch/arm/kernel/traps.c, so at the moment
we'll only get a runtime error.

--Imre


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18 19:00               ` Imre Deak
@ 2010-03-18 19:30                 ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-03-18 19:30 UTC (permalink / raw)
  To: linux-arm-kernel

* Imre Deak (imre.deak at nokia.com) wrote:
> On Thu, Mar 18, 2010 at 06:21:41PM +0100, ext Mathieu Desnoyers wrote:
> > * Imre Deak (imre.deak at nokia.com) wrote:
> > > On Thu, Mar 18, 2010 at 02:50:08PM +0100, ext Mathieu Desnoyers wrote:
> > > > * Alexander Shishkin (virtuoso at slind.org) wrote:
> > > > > On systems where ldrexh/strexh are not available, use a generic local
> > > > > version or __bad_xchg() on SMP.
> > > > > 
> > > > > Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> > > > > CC: linux-arm-kernel-bounces at lists.infradead.org
> > > > > CC: Imre Deak <imre.deak@nokia.com>
> > > > > CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > > > > ---
> > > > >  arch/arm/include/asm/system.h |   65 ++++++++++++++++++++++++++++++++++------
> > > > >  1 files changed, 55 insertions(+), 10 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> > > > > index d65b2f5..82248ae 100644
> > > > > --- a/arch/arm/include/asm/system.h
> > > > > +++ b/arch/arm/include/asm/system.h
> > > > > @@ -218,6 +218,39 @@ do {									\
> > > > >  	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
> > > > >  } while (0)
> > > > >  
> > > > > +static inline unsigned long __xchg_local_generic(unsigned long x,
> > > > > +						 volatile void *ptr, int size)
> > > > > +{
> > > > > +	extern void __bad_xchg(volatile void *, int);
> > > > > +	unsigned long ret;
> > > > > +	unsigned long flags;
> > > > > +
> > > > > +	switch (size) {
> > > > > +	case 1:
> > > > > +		raw_local_irq_save(flags);
> > > > > +		ret = *(volatile unsigned char *)ptr;
> > > > > +		*(volatile unsigned char *)ptr = x;
> > > > > +		raw_local_irq_restore(flags);
> > > > > +		break;
> > > > > +
> > > > > +	case 2:
> > > > > +		raw_local_irq_save(flags);
> > > > > +		ret = *(volatile unsigned short *)ptr;
> > > > > +		*(volatile unsigned short *)ptr = x;
> > > > > +		raw_local_irq_restore(flags);
> > > > > +		break;
> > > > > +
> > > > > +	case 4:
> > > > > +		raw_local_irq_save(flags);
> > > > > +		ret = *(volatile unsigned long *)ptr;
> > > > > +		*(volatile unsigned long *)ptr = x;
> > > > > +		raw_local_irq_restore(flags);
> > > > > +		break;
> > > > > +	}
> > > > > +
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > >  #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
> > > > >  /*
> > > > >   * On the StrongARM, "swp" is terminally broken since it bypasses the
> > > > > @@ -262,6 +295,26 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> > > > >  			: "r" (x), "r" (ptr)
> > > > >  			: "memory", "cc");
> > > > >  		break;
> > > > > +#ifdef CONFIG_CPU_32v6K
> > > > > +	case 2:
> > > > > +		asm volatile("@	__xchg2\n"
> > > > > +		"1:	ldrexh	%0, [%3]\n"
> > > > > +		"	strexh	%1, %2, [%3]\n"
> > > > > +		"	teq	%1, #0\n"
> > > > > +		"	bne	1b"
> > > > > +			: "=&r" (ret), "=&r" (tmp)
> > > > > +			: "r" (x), "r" (ptr)
> > > > > +			: "memory", "cc");
> > > > > +		break;
> > > > > +#else
> > > > > +	case 2:
> > > > > +#ifdef CONFIG_SMP
> > > > > +		__bad_xchg(ptr, size), ret = 0;
> > > > 
> > > 
> > > On a related note, why can't we have __bad_xchg as an undefined function
> > > for detecting the error already in build time? That seems to be the
> > > approach in similar places.
> > 
> > And, unless I am missing your point entirely, this is exactly what is
> > done here.
> 
> __bad_xchg is defined in arch/arm/kernel/traps.c, so at the moment
> we'll only get a runtime error.

*blink* oh.. why is it defined at all ? A linker error would be fine
here. That's what I thought was being done in the first place.

Mathieu

> 
> --Imre
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-18 13:50         ` Mathieu Desnoyers
  2010-03-18 16:33           ` Imre Deak
@ 2010-03-19  1:49           ` Jamie Lokier
  2010-03-19  2:12             ` Mathieu Desnoyers
  2010-03-25 15:52           ` [PATCH 1/1] [RFCv2] " Alexander Shishkin
  2010-03-25 16:42           ` Alexander Shishkin
  3 siblings, 1 reply; 25+ messages in thread
From: Jamie Lokier @ 2010-03-19  1:49 UTC (permalink / raw)
  To: linux-arm-kernel

Mathieu Desnoyers wrote:
> But.. thinking about it. It's bad to have a 2-byte xchg primitive that
> only works on UP and breaks the build on SMP. We should instead
> implement a workaround based on __cmpxchg4 to perform the 2-byte xchg().

This exposes why there should be __cmpxchg_bool() versions, which do
not loop, preferably in the generic kernel API, because the workaround
using __cmpxchg4 has to add yet another pointless loop nesting to all
cmpxchg users.

--- Jamie


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-19  1:49           ` Jamie Lokier
@ 2010-03-19  2:12             ` Mathieu Desnoyers
  2010-03-19  3:36               ` Jamie Lokier
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-03-19  2:12 UTC (permalink / raw)
  To: linux-arm-kernel

* Jamie Lokier (jamie at shareable.org) wrote:
> Mathieu Desnoyers wrote:
> > But.. thinking about it. It's bad to have a 2-byte xchg primitive that
> > only works on UP and breaks the build on SMP. We should instead
> > implement a workaround based on __cmpxchg4 to perform the 2-byte xchg().
> 
> This exposes why there should be __cmpxchg_bool() versions, which do
> not loop, preferably in the generic kernel API, because the workaround
> using __cmpxchg4 has to add yet another pointless loop nesting to all
> cmpxchg users.

The workaround I propose is to use __cmpxchg4 in a loop for 2-byte xchg()
fallback on arm; it is not related to cmpxchg() in any way. I don't
think one single use-case justifies the creation of a __cmpxchg_bool()
across all architectures.

Also, I've never seen where having a cmpxchg primitive returning a
boolean is more efficient than returning the value read, ever. If we
have to loop, then we can re-use the value that has been read by
cmpxchg, while the version returning a boolean would need a read to
re-read the next value.

So.. I am afraid I am probably missing your point entirely, because I
don't see how this can improve anything.

Thanks,

Mathieu

> 
> --- Jamie

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* [PATCH 1/1] [RFC] arm: add half-word __xchg
  2010-03-19  2:12             ` Mathieu Desnoyers
@ 2010-03-19  3:36               ` Jamie Lokier
  0 siblings, 0 replies; 25+ messages in thread
From: Jamie Lokier @ 2010-03-19  3:36 UTC (permalink / raw)
  To: linux-arm-kernel

Mathieu Desnoyers wrote:
> * Jamie Lokier (jamie at shareable.org) wrote:
> > Mathieu Desnoyers wrote:
> > > But.. thinking about it. It's bad to have a 2-byte xchg primitive that
> > > only works on UP and breaks the build on SMP. We should instead
> > > implement a workaround based on __cmpxchg4 to perform the 2-byte xchg().
> > 
> > This exposes why there should be __cmpxchg_bool() versions, which do
> > not loop, preferably in the generic kernel API, because the workaround
> > using __cmpxchg4 has to add yet another pointless loop nesting to all
> > cmpxchg users.
> 
> The workaround I propose is to use __cmpxchg4 in a loop for 2-byte xchg()
> fallback on arm; it is not related to cmpxchg() in any way. I don't
> think one single use-case justifies the creation of a __cmpxchg_bool()
> across all architectures.
> 
> Also, I've never seen where having a cmpxchg primitive returning a
> boolean is more efficient than returning the value read, ever. If we
> have to loop, then we can re-use the value that has been read by
> cmpxchg, while the version returning a boolean would need a read to
> re-read the next value.

Your workaround is an example of that extra loop nesting (but we are both
right; the best I can come up with is described at the end).  This is a
version of __xchg2 as you've described (please ignore any irrelevant
mistakes :-):

static inline u16 __xchg2(u16 *ptr, u16 new)
{
	int shift = (~(unsigned)ptr & 2) * 8; /* Assumes big-endian. */
	u32 mask = ~(0x0000ffff << shift), add = new << shift;
	u32 *ptrbig = (u32 *)((char *)ptr - ((unsigned)ptr & 2));

	u32 oldbig = *ptrbig;
	while (1) {
		u32 tmp = __cmpxchg4(ptrbig, oldbig, (oldbig & mask) | add);
		if (tmp == oldbig)
			break;
		oldbig = tmp;
	}

	return (u16)(oldbig >> shift);
}

This is __cmpxchg4() from the __cmpxchg() macro in <asm/system.h>,
types changed for sanity:

static inline u32 __cmpxchg4(u32 *ptr, u32 old, u32 new)
{
	u32 oldval, res;
	do {
		asm volatile("@ __cmpxchg4\n"
		"       ldrex   %1, [%2]\n"
		"       mov     %0, #0\n"
		"       teq     %1, %3\n"
		"       strexeq %0, %4, [%2]\n"
			: "=&r" (res), "=&r" (oldval)
			: "r" (ptr), "Ir" (old), "r" (new)
			: "memory", "cc");
	} while (res);
	return oldval;
}

Now let's imagine a __cmpxchg4_bool, which returns non-zero if the new
value was stored, and zero if the new value was not stored:

static inline int __cmpxchg4_bool(u32 *ptr, u32 old, u32 new)
{
	u32 tmp, failed;
	asm volatile("@ __cmpxchg4_bool\n"
	"       ldrex   %1, [%2]\n"
	"       mov     %0, #1\n"
	"       teq     %1, %3\n"
	"       strexeq %0, %4, [%2]\n"
		: "=&r" (res), "=&r" (tmp)
		: "r" (ptr), "Ir" (old), "r" (new)
		: "memory", "cc");
	return !failed;
}

And the alternative loop part of __xchg2 (setup and return aren't
interesting):

	do {
		oldbig = *ptrbig;
	} while (!__cmpxchg4_bool(ptrbig, oldbig, (oldbig & mask) | add));

	
I'm sure you can already see where this is going.

Let's compile __xchg2 using __cmpxchg4 by hand.  We're not interested
in the setup, just the loop, so let's imagine the code starting at
"oldbig = *ptrbig".  The names are all symbolic register names:

	ldr	oldbig, [ptrbig]

	@ start of loop in __xchg2
	and	new, oldbig, mask
	or	new, new, add
1:
	@ start of __cmpxchg4
2:
	ldrex	tmp, [ptrbig]
	mov	res, #0
	teq	tmp, oldbig
	strexeq	res, newbig, [ptrbig]

	@ end of asm, but do..while(res) still needed
	teq	res, #0
	bne	2b
	@ end of __cmpxchg4

	@ remainder of loop in __xchg2
	teq	tmp, oldbig
	mov	oldbig, tmp
	bne	1b

The result is (oldbig >> shift).

Now for comparison, the equivalent part of __xchg2 using __cmpxchg4_bool:

	@ start of loop in __xchg2
1:
	ldr	oldbig, [ptrbig]
	and	new, oldbig, mask
	or	new, new, add

	@ start of __cmpxchg4_bool
	ldrex	tmp, [ptrbig]
	mov	failed, #1
	teq	tmp, oldbig
	strexeq	failed, newbig, [ptrbig]
	@ end of __cmpxchg4_bool (!failed is optimised by caller's !)

	@ remainder of loop in __xchg2
	teq	failed, #0
	bne	1b

As you can see, the bool version has one less loop nesting, and is 9
instructions instead of 11 for the non-bool version because of that.

Although the bool version is shorter and the double-loop is pointless,
you are right that the ldrex result can be recycled to avoid a load
(and is probably faster).

Meaning the "ideal" API for the bool-flavoured call would return
_both_ the old value and whether the new value was stored.  A tidy way
to do that would be to take &old instead, and modify the pointed-to value.

Combining both observations:

static inline int __cmpxchg4_bool2(u32 *ptr, u32 *old, u32 new)
{
	u32 failed;
	asm volatile("@ __cmpxchg4_bool2\n"
	"       ldrex   %1, [%2]\n"
	"       mov     %0, #1\n"
	"       teq     %1, %3\n"
	"       strexeq %0, %4, [%2]\n"
		: "=&r" (res), "=&r" (*old)
		: "r" (ptr), "Ir" (*old), "r" (new)
		: "memory", "cc");
	return !failed;
}

The new loop part of __xchg2:

	u32 oldbig = *ptrbig;
	while (!__cmpxchg4_bool2(ptrbig, &oldbig, (oldbig & mask) | add)) {}

The result (assuming GCC does its job optimising *&oldbig to oldbig):
	
	ldr	oldbig, [ptrbig]

	@ start of loop in __xchg2
1:
	and	new, oldbig, mask
	or	new, new, add

	@ start of __cmpxchg4_bool2
	ldrex	tmp, [ptrbig]
	mov	failed, #1
	teq	tmp, oldbig
	strexeq	failed, newbig, [ptrbig]
	mov	oldbig, tmp
	@ end of __cmpxchg4_bool2 (!failed is optimised by caller's !)

	@ remainder of loop in __xchg2
	teq	failed, #0
	bne	1b

That eliminates the load and the double-loop.

Quite a lot of __cmpxchg calls in the kernel are like this, with an
unnecessary extra loop, but by no means all are.  It's not just
__xchg2, but obviously all the callers would have to change to take
advantage of a __cmpxchg_bool2-like function.

The difference becomes more evident when writing __cmpxchg2 emulation
using __cmpxchg4.  Then you get an unnecessary *triple* loop at many
call sites: From __cmpxchg4 itself, __cmpxchg2 wrapped around it, and
the caller is also looping.  Only the caller's loop is necessary.

Obviously we can only go so far.  The shortest version of __xchg2 puts
the and+or between ldrex and strex, but that really isn't something
that can be exported safely in an arch-generic way:

	@ start of loop in __xchg2_smallest
1:
	ldrex	oldbig, [ptrbig]
	and	new, oldbig, mask
	or	new, new, add
	strex	failed, newbig, [ptrbig]
	teq	failed, #0
	bne	1b

The difference between 6 instructions and the original's 11, and the
fact that it is simple and clear, leads me to suggest using the last one
as the ARM version of __xchg2 when {ld,st}rexh are not available.  Its
only downside is that it can't go in asm-generic.  That is, unless we
had an __atomic_and_xor() arch function for asm-generic to use :-)

-- Jamie


* [PATCH 1/1] [RFCv2] arm: add half-word __xchg
  2010-03-18 13:50         ` Mathieu Desnoyers
  2010-03-18 16:33           ` Imre Deak
  2010-03-19  1:49           ` Jamie Lokier
@ 2010-03-25 15:52           ` Alexander Shishkin
  2010-03-25 16:42           ` Alexander Shishkin
  3 siblings, 0 replies; 25+ messages in thread
From: Alexander Shishkin @ 2010-03-25 15:52 UTC (permalink / raw)
  To: linux-arm-kernel

On systems where ldrexh/strexh are not available,
  * for pre-v6 systems, use a generic local version,
  * for v6 without v6K, emulate xchg2 using 32-bit cmpxchg()
    (it is not yet clear if xchg1 has to be emulated on such
    systems as well, thus the "size" parameter).

The __xchg_generic() function is based on the code that Jamie
posted earlier.

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: linux-arm-kernel-bounces at lists.infradead.org
CC: Imre Deak <imre.deak@nokia.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jamie Lokier <jamie@shareable.org>
---
 arch/arm/include/asm/system.h |   53 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index d65b2f5..e7a17ab 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -218,6 +218,36 @@ do {									\
 	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
 } while (0)
 
+#ifdef __LINUX_ARM_ARCH__ >= 6
+static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
+				      unsigned long new, int size);
+
+/*
+ * emulate __xchg() using 32-bit __cmpxchg()
+ */
+static inline unsigned long __xchg_generic(unsigned long x,
+						 volatile void *ptr, int size)
+{
+	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr & ~3UL);
+	int shift = ((unsigned)ptr - (unsigned)ptrbig) * 8;
+	unsigned long mask, add, ret;
+
+	mask = ~(((1 << (size * 8)) - 1) << shift);
+	add = x << shift;
+
+	ret = *ptrbig;
+	while (1) {
+		unsigned long tmp = __cmpxchg(ptrbig, ret, (ret & mask) | add,
+					      4);
+		if (tmp == ret)
+			break;
+		ret = tmp;
+	}
+
+	return ret;
+}
+#endif
+
 #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
 /*
  * On the StrongARM, "swp" is terminally broken since it bypasses the
@@ -262,6 +292,22 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 			: "r" (x), "r" (ptr)
 			: "memory", "cc");
 		break;
+#ifdef CONFIG_CPU_32v6K
+	case 2:
+		asm volatile("@	__xchg2\n"
+		"1:	ldrexh	%0, [%3]\n"
+		"	strexh	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#else
+	case 2:
+		ret = __xchg_generic(x, ptr, 2);
+		break;
+#endif
 	case 4:
 		asm volatile("@	__xchg4\n"
 		"1:	ldrex	%0, [%3]\n"
@@ -283,6 +329,13 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		raw_local_irq_restore(flags);
 		break;
 
+	case 2:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned short *)ptr;
+		*(volatile unsigned short *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+
 	case 4:
 		raw_local_irq_save(flags);
 		ret = *(volatile unsigned long *)ptr;
-- 
1.6.3.3


* [PATCH 1/1] [RFCv2] arm: add half-word __xchg
  2010-03-18 13:50         ` Mathieu Desnoyers
                             ` (2 preceding siblings ...)
  2010-03-25 15:52           ` [PATCH 1/1] [RFCv2] " Alexander Shishkin
@ 2010-03-25 16:42           ` Alexander Shishkin
  2010-03-27 22:52             ` Russell King - ARM Linux
  3 siblings, 1 reply; 25+ messages in thread
From: Alexander Shishkin @ 2010-03-25 16:42 UTC (permalink / raw)
  To: linux-arm-kernel

[fixed #if __LINUX_ARM_ARCH__]
On systems where ldrexh/strexh are not available,
  * for pre-v6 systems, use a generic local version,
  * for v6 without v6K, emulate xchg2 using 32-bit cmpxchg()
    (it is not yet clear if xchg1 has to be emulated on such
    systems as well, thus the "size" parameter).

The __xchg_generic() function is based on the code that Jamie
posted earlier.

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: linux-arm-kernel-bounces at lists.infradead.org
CC: Imre Deak <imre.deak@nokia.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jamie Lokier <jamie@shareable.org>
---
 arch/arm/include/asm/system.h |   53 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index d65b2f5..b9b5ec7 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -218,6 +218,36 @@ do {									\
 	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
 } while (0)
 
+#if __LINUX_ARM_ARCH__ >= 6
+static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
+				      unsigned long new, int size);
+
+/*
+ * emulate __xchg() using 32-bit __cmpxchg()
+ */
+static inline unsigned long __xchg_generic(unsigned long x,
+						 volatile void *ptr, int size)
+{
+	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr & ~3UL);
+	int shift = ((unsigned)ptr - (unsigned)ptrbig) * 8;
+	unsigned long mask, add, ret;
+
+	mask = ~(((1 << (size * 8)) - 1) << shift);
+	add = x << shift;
+
+	ret = *ptrbig;
+	while (1) {
+		unsigned long tmp = __cmpxchg(ptrbig, ret, (ret & mask) | add,
+					      4);
+		if (tmp == ret)
+			break;
+		ret = tmp;
+	}
+
+	return ret;
+}
+#endif
+
 #if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
 /*
  * On the StrongARM, "swp" is terminally broken since it bypasses the
@@ -262,6 +292,22 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 			: "r" (x), "r" (ptr)
 			: "memory", "cc");
 		break;
+#ifdef CONFIG_CPU_32v6K
+	case 2:
+		asm volatile("@	__xchg2\n"
+		"1:	ldrexh	%0, [%3]\n"
+		"	strexh	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#else
+	case 2:
+		ret = __xchg_generic(x, ptr, 2);
+		break;
+#endif
 	case 4:
 		asm volatile("@	__xchg4\n"
 		"1:	ldrex	%0, [%3]\n"
@@ -283,6 +329,13 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 		raw_local_irq_restore(flags);
 		break;
 
+	case 2:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned short *)ptr;
+		*(volatile unsigned short *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+
 	case 4:
 		raw_local_irq_save(flags);
 		ret = *(volatile unsigned long *)ptr;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 1/1] [RFCv2] arm: add half-word __xchg
  2010-03-25 16:42           ` Alexander Shishkin
@ 2010-03-27 22:52             ` Russell King - ARM Linux
  2010-03-28  0:14               ` Jamie Lokier
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2010-03-27 22:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 25, 2010 at 06:42:46PM +0200, Alexander Shishkin wrote:
> +/*
> + * emulate __xchg() using 32-bit __cmpxchg()
> + */
> +static inline unsigned long __xchg_generic(unsigned long x,
> +						 volatile void *ptr, int size)
> +{
> +	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr & ~3UL);
> +	int shift = ((unsigned)ptr - (unsigned)ptrbig) * 8;

I wonder if we should be using __alignof__ here.

	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr &
		(__alignof__(unsigned long) - 1));

Rest of the patch looks fine; it'd be nice to get some tested-by's for it
though.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/1] [RFCv2] arm: add half-word __xchg
  2010-03-27 22:52             ` Russell King - ARM Linux
@ 2010-03-28  0:14               ` Jamie Lokier
  2010-03-28  0:18                 ` Russell King - ARM Linux
  2010-03-28  1:00                 ` Mathieu Desnoyers
  0 siblings, 2 replies; 25+ messages in thread
From: Jamie Lokier @ 2010-03-28  0:14 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> I wonder if we should be using __alignof__ here.
> 
> 	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr &
> 		(__alignof__(unsigned long) - 1));

Are there ARM targets with a smaller value from __alignof__()?
I think you meant:

	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr &
		~(unsigned long)(__alignof__(unsigned long) - 1));

Perhaps in asm-generic that has a place.  It would simplify the asm if
the alignment is 1 on some machine.

But I don't think it'd happen anyway.  There are machines which don't
require full alignment of long, but insufficiently aligned *atomic*
accesses like cmpxchg are *not atomic*.  x86 is one such machine.
Fortunately GCC aligns the types sufficiently well - and we rely on
that all over the kernel.

I'm not sure about ARM, when doing a 64-bit cmpxchg, if the doubleword
must be 64-bit aligned to get atomicity.

Note that the posted code doesn't work as is for 64-bit longs: the
"mask" calculation overflows if called with size >= 4.

But seeing as this is for ARM only at present, I'd just change the
types to u32 and be done with it.  It does seem like a good thing to
go to asm-generic eventually though.
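
For illustration, a u32-typed sketch of that idea could look as follows. The
helper name is invented here, it reuses the 32-bit __cmpxchg() declared in the
patch above, and like the posted code it assumes little-endian byte numbering
within the word; this is a sketch of the suggested type change, not the posted
patch:

	static inline u32 __xchg_small_u32(u32 x, volatile void *ptr, int size)
	{
		/* word-aligned container of the 1- or 2-byte target */
		u32 *ptr32 = (u32 *)((unsigned long)ptr & ~3UL);
		/* bit offset of the target within that word (little-endian) */
		int shift = ((unsigned long)ptr - (unsigned long)ptr32) * 8;
		/* size is 1 or 2 only, so the shift count stays below 32 */
		u32 mask = ((1U << (size * 8)) - 1) << shift;
		u32 prev = *ptr32;

		for (;;) {
			u32 next = (prev & ~mask) | ((x << shift) & mask);
			u32 cur = __cmpxchg(ptr32, prev, next, 4);
			if (cur == prev)
				break;
			prev = cur;
		}
		/* old value of the narrow field, shifted back down */
		return (prev & mask) >> shift;
	}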

-- Jamie

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/1] [RFCv2] arm: add half-word __xchg
  2010-03-28  0:14               ` Jamie Lokier
@ 2010-03-28  0:18                 ` Russell King - ARM Linux
  2010-03-28  1:00                 ` Mathieu Desnoyers
  1 sibling, 0 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2010-03-28  0:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Mar 28, 2010 at 12:14:29AM +0000, Jamie Lokier wrote:
> But seeing as this is for ARM only at present, I'd just change the
> types to u32 and be done with it.  It does seem like a good thing to
> go to asm-generic eventually though.

Which is why having it use __alignof__ now would be a good idea - it
avoids someone blindly copying the code and then there being a subtle
bug in it.

I was wondering if a generic PTR_ALIGN(type, ptr) macro would be a
good idea too.
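
A purely illustrative sketch of such a helper (the macro name is invented here
and, unlike the existing PTR_ALIGN(p, a), it rounds down to the natural
alignment of a type rather than up to a boundary value):

	/* round a pointer down to the natural alignment of the given type */
	#define PTR_TYPE_ALIGN_DOWN(type, ptr)				\
		((type *)((unsigned long)(ptr) &			\
			  ~((unsigned long)__alignof__(type) - 1)))

	/* usage sketch for the case discussed above */
	unsigned long *ptrbig = PTR_TYPE_ALIGN_DOWN(unsigned long, ptr);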

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/1] [RFCv2] arm: add half-word __xchg
  2010-03-28  0:14               ` Jamie Lokier
  2010-03-28  0:18                 ` Russell King - ARM Linux
@ 2010-03-28  1:00                 ` Mathieu Desnoyers
  2010-03-28 14:39                   ` Jamie Lokier
  1 sibling, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-03-28  1:00 UTC (permalink / raw)
  To: linux-arm-kernel

* Jamie Lokier (jamie at shareable.org) wrote:
> Russell King - ARM Linux wrote:
> > I wonder if we should be using __alignof__ here.
> > 
> > 	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr &
> > 		(__alignof__(unsigned long) - 1));
> 
> Are there ARM targets with a smaller value from __alignof__()?
> I think you meant:
> 
> 	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr &
> 		~(unsigned long)(__alignof__(unsigned long) - 1));
> 
> Perhaps in asm-generic that has a place.  It would simplify the asm if
> the alignment is 1 on some machine.
> 
> But I don't think it'd happen anyway.  There are machines which don't
> require full alignment of long, but insufficiently aligned *atomic*
> accesses like cmpxchg are *not atomic*.  x86 is one such machine.

I think you mean CMPXCHG8B and CMPXCHG16B ?

If my memory serves me well, cmpxchg is fine with unaligned accesses on
x86. But you are right in that other architectures will have a hard time
with non-aligned cmpxchg. I'm just not sure about x86 specifically.

Thanks,

Mathieu

> Fortunately GCC aligns the types sufficiently well - and we rely on
> that all over the kernel.
> 
> I'm not sure about ARM, when doing a 64-bit cmpxchg, if the doubleword
> must be 64-bit aligned to get atomicity.
> 
> Note that the posted code doesn't work as is for 64-bit longs: the
> "mask" calculation overflows if called with size >= 4.
> 
> But seeing as this is for ARM only at present, I'd just change the
> types to u32 and be done with it.  It does seem like a good thing to
> go to asm-generic eventually though.
> 
> -- Jamie

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/1] [RFCv2] arm: add half-word __xchg
  2010-03-28  1:00                 ` Mathieu Desnoyers
@ 2010-03-28 14:39                   ` Jamie Lokier
  0 siblings, 0 replies; 25+ messages in thread
From: Jamie Lokier @ 2010-03-28 14:39 UTC (permalink / raw)
  To: linux-arm-kernel

Mathieu Desnoyers wrote:
> * Jamie Lokier (jamie at shareable.org) wrote:
> > Russell King - ARM Linux wrote:
> > > I wonder if we should be using __alignof__ here.
> > > 
> > > 	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr &
> > > 		(__alignof__(unsigned long) - 1));
> > 
> > Are there ARM targets with a smaller value from __alignof__()?
> > I think you meant:
> > 
> > 	unsigned long *ptrbig = (unsigned long *)((unsigned long)ptr &
> > 		~(unsigned long)(__alignof__(unsigned long) - 1));
> > 
> > Perhaps in asm-generic that has a place.  It would simplify the asm if
> > the alignment is 1 on some machine.
> > 
> > But I don't think it'd happen anyway.  There are machines which don't
> > require full alignment of long, but insufficiently aligned *atomic*
> > accesses like cmpxchg are *not atomic*.  x86 is one such machine.
> 
> I think you mean CMPXCHG8B and CMPXCHG16B ?
> 
> If my memory serves me well, cmpxchg is fine with unaligned accesses on
> x86. But you are right in that other architectures will have a hard time
> with non-aligned cmpxchg. I'm just not sure about x86 specifically.

I'm surprised, I thought I'd read it somewhere, but you're right,
except that CMPXCHG8B also permits arbitrary alignment, as an old
Intel manual states:

    The integrity of the LOCK prefix is not affected by the alignment
    of the memory field. Memory locking is observed for arbitrarily
    misaligned fields.

I didn't check anything about CMPXCHG16B.

What I'd misremembered was this:

    To improve performance of applications, AMD64 processors can
    speculatively execute instructions out of program order and
    temporarily hold out-of-order results. However, certain rules are
    followed with regard to normal cacheable accesses on naturally
    aligned boundaries to WB memory.

In other words, the x86 implicit SMP barriers are only guaranteed for
naturally aligned accesses.

ARM does require naturally aligned atomics, and that's an issue if a
kernel is built for OABI, which has 4-byte alignment for "long long".
Fortunately it throws a noisy alignment fault, rather than being
quietly non-atomic :-)

-- Jamie

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-03-28 14:39 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-10 16:22 __xchg for sizes other than 32bit Imre Deak
2010-03-10 17:35 ` Russell King - ARM Linux
2010-03-10 20:02   ` [PATCH] ARM support single byte cmpxchg and cmpxchg_local on ARMv6 Mathieu Desnoyers
2010-03-10 20:30     ` Russell King - ARM Linux
2010-03-10 21:15       ` Mathieu Desnoyers
2010-03-10 23:16   ` __xchg for sizes other than 32bit Jamie Lokier
2010-03-18  9:29   ` [PATCH 1/1] [RFC] arm: add half-word __xchg Alexander Shishkin
2010-03-18 12:32     ` Mathieu Desnoyers
2010-03-18 12:37       ` Alexander Shishkin
2010-03-18 13:33       ` Alexander Shishkin
2010-03-18 13:50         ` Mathieu Desnoyers
2010-03-18 16:33           ` Imre Deak
2010-03-18 17:21             ` Mathieu Desnoyers
2010-03-18 19:00               ` Imre Deak
2010-03-18 19:30                 ` Mathieu Desnoyers
2010-03-19  1:49           ` Jamie Lokier
2010-03-19  2:12             ` Mathieu Desnoyers
2010-03-19  3:36               ` Jamie Lokier
2010-03-25 15:52           ` [PATCH 1/1] [RFCv2] " Alexander Shishkin
2010-03-25 16:42           ` Alexander Shishkin
2010-03-27 22:52             ` Russell King - ARM Linux
2010-03-28  0:14               ` Jamie Lokier
2010-03-28  0:18                 ` Russell King - ARM Linux
2010-03-28  1:00                 ` Mathieu Desnoyers
2010-03-28 14:39                   ` Jamie Lokier
