public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed
* Fix ia64 bit ops: Full barriers for bit operations returning a value
@ 2006-04-03 18:32 Christoph Lameter
  2006-04-03 23:11 ` Chen, Kenneth W
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Christoph Lameter @ 2006-04-03 18:32 UTC (permalink / raw)
  To: linux-ia64

On Sat, 1 Apr 2006, Nick Piggin wrote:

> All bitop and atomic test_and_set, inc_return, etc etc (ie. everything
> that modifies the operand and returns something) needs to be a full
> barrier before and after too.

Fix ia64 bitops: full barriers in bitops returning a value

This fixes up the bitops so that they provide the full barriers that are 
required according to Documentation/atomic_ops.txt. Bit operations use a 
cmpxchg with a prior load from a volatile pointer. This load is an acquire 
operation. I think we can simply make the cmpxchg have release semantics 
in order to produce a full acquire / release cycle.

Note that this only fixes up the bit operations if used together with the
earlier fix for the clear_bit barriers. We still need a fix for general 
atomic operations.
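For readers less familiar with ia64 ordering, the loop structure the patch modifies can be sketched portably with GCC's __atomic builtins. This is only an illustration of the acquire-load / release-cmpxchg pairing the patch aims for, not the actual ia64 code (which uses a volatile load with acquire semantics and cmpxchg4.rel):

```c
#include <assert.h>
#include <stdint.h>

/* Portable sketch of the patched test_and_set_bit() loop: an acquire
 * load followed by a compare-and-swap with release semantics, so the
 * pair fences the read-modify-write on both sides. */
static int test_and_set_bit_sketch(int nr, uint32_t *addr)
{
	uint32_t *m = addr + (nr >> 5);
	uint32_t bit = 1u << (nr & 31);
	uint32_t old, new;

	do {
		old = __atomic_load_n(m, __ATOMIC_ACQUIRE);	/* acquire */
		new = old | bit;
	} while (!__atomic_compare_exchange_n(m, &old, new, 0,
					      __ATOMIC_RELEASE,	/* release */
					      __ATOMIC_RELAXED));
	return (old & bit) != 0;
}
```

The function returns the previous state of the bit, as the kernel primitive does.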

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16-mm2/include/asm-ia64/bitops.h
===================================================================
--- linux-2.6.16-mm2.orig/include/asm-ia64/bitops.h	2006-03-31 11:17:53.000000000 -0800
+++ linux-2.6.16-mm2/include/asm-ia64/bitops.h	2006-04-03 11:19:45.000000000 -0700
@@ -163,13 +163,14 @@ test_and_set_bit (int nr, volatile void 
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	/* Volatile load = acquire */
 	m = (volatile __u32 *) addr + (nr >> 5);
 	bit = 1 << (nr & 31);
 	do {
 		CMPXCHG_BUGCHECK(m);
 		old = *m;
 		new = old | bit;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (cmpxchg_rel(m, old, new) != old);
 	return (old & bit) != 0;
 }
 
@@ -208,13 +209,14 @@ test_and_clear_bit (int nr, volatile voi
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	/* Volatile load = acquire */
 	m = (volatile __u32 *) addr + (nr >> 5);
 	mask = ~(1 << (nr & 31));
 	do {
 		CMPXCHG_BUGCHECK(m);
 		old = *m;
 		new = old & mask;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (cmpxchg_rel(m, old, new) != old);
 	return (old & ~mask) != 0;
 }
 
@@ -253,13 +255,14 @@ test_and_change_bit (int nr, volatile vo
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	/* Volatile load = acquire */
 	m = (volatile __u32 *) addr + (nr >> 5);
 	bit = (1 << (nr & 31));
 	do {
 		CMPXCHG_BUGCHECK(m);
 		old = *m;
 		new = old ^ bit;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (cmpxchg_rel(m, old, new) != old);
 	return (old & bit) != 0;
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Fix ia64 bit ops: Full barriers for bit operations returning a value
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
@ 2006-04-03 23:11 ` Chen, Kenneth W
  2006-04-04  1:05 ` Fix ia64 bit ops: Full barriers for bit operations returning Nick Piggin
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Chen, Kenneth W @ 2006-04-03 23:11 UTC (permalink / raw)
  To: linux-ia64

Christoph Lameter wrote on Monday, April 03, 2006 11:33 AM
> On Sat, 1 Apr 2006, Nick Piggin wrote:
> 
> > All bitop and atomic test_and_set, inc_return, etc etc (ie. everything
> > that modifies the operand and returns something) needs to be a full
> > barrier before and after too.
> 
> Fix ia64 bitops: full barriers in bitops returning a value
> 
> This fixes up the bitops so that they provide the full barriers that are 
> required according to Documentation/atomic_ops.txt. Bit operations use a 
> cmpxchg with a prior load from a volatile pointer. This load is an acquire 
> operation. I think we can simply make the cmpxchg have release semantics 
> in order to produce a full acquire / release cycle.
> 
> Note that this only fixes up the bit operations if used together with the
> earlier fix for the clear_bit barriers. We still need a fix for general 
> atomic operations.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>


Looks good to me.
Acked-by: Ken Chen <kenneth.w.chen@intel.com>



> Index: linux-2.6.16-mm2/include/asm-ia64/bitops.h
> ===================================================================
> --- linux-2.6.16-mm2.orig/include/asm-ia64/bitops.h	2006-03-31 11:17:53.000000000 -0800
> +++ linux-2.6.16-mm2/include/asm-ia64/bitops.h	2006-04-03 11:19:45.000000000 -0700
> @@ -163,13 +163,14 @@ test_and_set_bit (int nr, volatile void 
>  	volatile __u32 *m;
>  	CMPXCHG_BUGCHECK_DECL
>  
> +	/* Volatile load = acquire */
>  	m = (volatile __u32 *) addr + (nr >> 5);
>  	bit = 1 << (nr & 31);
>  	do {
>  		CMPXCHG_BUGCHECK(m);
>  		old = *m;
>  		new = old | bit;
> -	} while (cmpxchg_acq(m, old, new) != old);
> +	} while (cmpxchg_rel(m, old, new) != old);
>  	return (old & bit) != 0;
>  }
>  
> @@ -208,13 +209,14 @@ test_and_clear_bit (int nr, volatile voi
>  	volatile __u32 *m;
>  	CMPXCHG_BUGCHECK_DECL
>  
> +	/* Volatile load = acquire */
>  	m = (volatile __u32 *) addr + (nr >> 5);
>  	mask = ~(1 << (nr & 31));
>  	do {
>  		CMPXCHG_BUGCHECK(m);
>  		old = *m;
>  		new = old & mask;
> -	} while (cmpxchg_acq(m, old, new) != old);
> +	} while (cmpxchg_rel(m, old, new) != old);
>  	return (old & ~mask) != 0;
>  }
>  
> @@ -253,13 +255,14 @@ test_and_change_bit (int nr, volatile vo
>  	volatile __u32 *m;
>  	CMPXCHG_BUGCHECK_DECL
>  
> +	/* Volatile load = acquire */
>  	m = (volatile __u32 *) addr + (nr >> 5);
>  	bit = (1 << (nr & 31));
>  	do {
>  		CMPXCHG_BUGCHECK(m);
>  		old = *m;
>  		new = old ^ bit;
> -	} while (cmpxchg_acq(m, old, new) != old);
> +	} while (cmpxchg_rel(m, old, new) != old);
>  	return (old & bit) != 0;
>  }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix ia64 bit ops: Full barriers for bit operations returning
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
  2006-04-03 23:11 ` Chen, Kenneth W
@ 2006-04-04  1:05 ` Nick Piggin
  2006-04-04  2:11 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Nick Piggin @ 2006-04-04  1:05 UTC (permalink / raw)
  To: linux-ia64

Chen, Kenneth W wrote:

>Christoph Lameter wrote on Monday, April 03, 2006 11:33 AM
>
>>
>>Note that this only fixes up the bit operations if used together with the
>>earlier fix for the clear_bit barriers. We still need a fix for general 
>>atomic operations.
>>
>>Signed-off-by: Christoph Lameter <clameter@sgi.com>
>>
>
>
>Looks good to me.
>Acked-by: Ken Chen <kenneth.w.chen@intel.com>
>

I don't think my concern about this approach was ever answered: an
acquire + release combination will always allow memory operations to
move past the atomic operation, won't it?

I would have thought you'd need release + acquire.
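In C11/GCC terms, the concern can be phrased as follows (a hedged sketch, not kernel code): an acquire operation only keeps later accesses from moving up before it, and a release only keeps earlier accesses from moving down after it, so an acquire-load / release-CAS pair leaves both outward motions legal.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t flag, data;

/* Sketch of the window described above.  With only acquire on the
 * load and release on the CAS:
 *   - the store to 'data' may sink below the acquire load, because
 *     acquire only blocks LATER accesses from moving up, and
 *   - the final load of 'data' may rise above the release CAS, because
 *     release only blocks EARLIER accesses from moving down.
 * A full barrier would forbid both movements. */
static uint32_t window_sketch(void)
{
	uint32_t old = 0;

	data = 1;				/* may sink past the acquire */
	(void)__atomic_load_n(&flag, __ATOMIC_ACQUIRE);
	(void)__atomic_compare_exchange_n(&flag, &old, 1u, 0,
					  __ATOMIC_RELEASE, __ATOMIC_RELAXED);
	return data;				/* may rise above the release */
}
```

Single-threaded the result is of course deterministic; the comments mark which cross-thread reorderings the chosen orders still permit.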


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Fix ia64 bit ops: Full barriers for bit operations returning a
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
  2006-04-03 23:11 ` Chen, Kenneth W
  2006-04-04  1:05 ` Fix ia64 bit ops: Full barriers for bit operations returning Nick Piggin
@ 2006-04-04  2:11 ` Christoph Lameter
  2006-04-04 13:30 ` Fix ia64 bit ops: Full barriers for bit operations returning a value Chen, Kenneth W
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2006-04-04  2:11 UTC (permalink / raw)
  To: linux-ia64

On Mon, 3 Apr 2006, Chen, Kenneth W wrote:

> Looks good to me.
> Acked-by: Ken Chen <kenneth.w.chen@intel.com>

Ummm.. Are you sure? The release should come before the acquire: the 
final store of the cmpxchg may be delayed until after stores that follow the cmpxchg.

I am not sure how to fix this in an elegant way without adding an explicit
mf(). Any ideas?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Fix ia64 bit ops: Full barriers for bit operations returning a value
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
                   ` (2 preceding siblings ...)
  2006-04-04  2:11 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
@ 2006-04-04 13:30 ` Chen, Kenneth W
  2006-04-04 14:40 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Chen, Kenneth W @ 2006-04-04 13:30 UTC (permalink / raw)
  To: linux-ia64

Christoph Lameter wrote on Monday, April 03, 2006 7:12 PM
> On Mon, 3 Apr 2006, Chen, Kenneth W wrote:
> 
> > Looks good to me.
> > Acked-by: Ken Chen <kenneth.w.chen@intel.com>
> 
> Ummm.. Are you sure? The release should come before the acquire. The 
> final store of the cmpxchg may be delayed until after stores following
> the cmpxchg?

I realized that.  However, this patch is one step forward in fixing the
current deficiency even if it is not a full solution.  Hence the ack.  I also
took that to be your intention in posting this patch, no?

- Ken

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix ia64 bit ops: Full barriers for bit operations returning a
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
                   ` (3 preceding siblings ...)
  2006-04-04 13:30 ` Fix ia64 bit ops: Full barriers for bit operations returning a value Chen, Kenneth W
@ 2006-04-04 14:40 ` Christoph Lameter
  2006-04-04 16:48 ` Fix ia64 bit ops: Full barriers for bit operations returning Zoltan Menyhart
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2006-04-04 14:40 UTC (permalink / raw)
  To: linux-ia64

On Tue, 4 Apr 2006, Zoltan Menyhart wrote:

> Could you consider using some cache hints, like "ld8.bias.nta"?
> "bias" is a hint to acquire exclusive ownership.
> "nta" is a hint to allocate the cache line only in L2
> (and side effect: to bias it to be replaced).
> All of the Itanium 2 processor's atomic instructions are handled
> exclusively by the L2 cache.

Could you come up with a patch? Currently, I do not seem to be able to 
spend enough time on it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix ia64 bit ops: Full barriers for bit operations returning
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
                   ` (4 preceding siblings ...)
  2006-04-04 14:40 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
@ 2006-04-04 16:48 ` Zoltan Menyhart
  2006-04-05 15:30 ` Zoltan Menyhart
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Zoltan Menyhart @ 2006-04-04 16:48 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 1345 bytes --]

Christoph Lameter wrote:

> Could you come up with a patch? Currently, I do not seem to be able to 
> spend enough time on it.

Please have a look at this sample.

Temporary solution while we are waiting for:

	test_and_set_bit (int nr, volatile void *addr, MODE_BARRIER)

& co.
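The wished-for API above, where the caller states the ordering it needs, could look something like this (names and modes are hypothetical, and the body is a portable GCC-builtin sketch rather than ia64 code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode-parameterized bit op: MODE_BARRIER requests full
 * fencing, MODE_RELAXED requests only atomicity. */
enum bit_mode { MODE_RELAXED, MODE_BARRIER };

static int test_and_set_bit_mode(int nr, uint32_t *addr, enum bit_mode mode)
{
	uint32_t *m = addr + (nr >> 5);
	uint32_t bit = 1u << (nr & 31);
	uint32_t old, new;

	if (mode == MODE_BARRIER)
		__atomic_thread_fence(__ATOMIC_SEQ_CST);	/* the "mf" */
	do {
		old = __atomic_load_n(m, __ATOMIC_RELAXED);
		new = old | bit;
	} while (!__atomic_compare_exchange_n(m, &old, new, 0,
					      __ATOMIC_ACQUIRE,
					      __ATOMIC_RELAXED));
	return (old & bit) != 0;
}
```

Callers that do not need the barrier (boot-time code, for instance) could then skip the fence explicitly instead of paying for it everywhere.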

I changed the temp. variables to be 64 bit wide in order to eliminate a "zxt4".

Here is what I get (NOP-s removed):

<reserve_bootmem_core+240>:  [MMI]       mf;;
<reserve_bootmem_core+241>:              and r10=31,r18
<reserve_bootmem_core+257>:              extr r11=r18,5,27;;
<reserve_bootmem_core+272>:  [MFI]       shladd r16=r11,2,r16
<reserve_bootmem_core+274>:              shl r17=r19,r10;;
<reserve_bootmem_core+288>:  [MMI]       ld4.bias.nta r20=[r16];;
<reserve_bootmem_core+289>:              or r22=r17,r20
<reserve_bootmem_core+305>:              mov.m ar.ccv=r20;;
<reserve_bootmem_core+320>:  [MMI]       cmpxchg4.acq.nta r21=[r16],r22,ar.ccv;;
<reserve_bootmem_core+322>:              cmp.eq p14,p15=r20,r21
<reserve_bootmem_core+336>:  [BBB] (p15) br.cond.dptk.few <reserve_bootmem_core+288>

Notes:

"reserve_bootmem_core()" is a typical example of unnecessary fencing.

What a nasty business reversing the "old" vs. "new" parameters:

#define ia64_cmpxchg(sem,ptr,old,new,size)
...
	      	_r_ = ia64_cmpxchg1_##sem((__u8 *) ptr, new, _o_);
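The reversal is easy to get wrong: the intrinsic layer takes (ptr, new, old) while the cmpxchg_*() wrappers take (ptr, old, new). A minimal model of the two layers (hypothetical names, portable code, no atomicity) shows why the wrapper must swap its arguments on the way down:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the intrinsic layer: argument order is (ptr, NEW, OLD),
 * mirroring ia64_cmpxchg4_acq(ptr, new, old).  Returns the previous
 * value, as the real intrinsic does.  Not atomic; for illustration. */
static uint32_t model_intrin_cmpxchg4(uint32_t *ptr, uint32_t newv, uint32_t oldv)
{
	uint32_t prev = *ptr;
	if (prev == oldv)
		*ptr = newv;
	return prev;
}

/* Model of the wrapper layer: callers write cmpxchg_acq(m, old, new),
 * so the wrapper must swap before calling the intrinsic. */
#define model_cmpxchg_acq(m, o, n)	model_intrin_cmpxchg4((m), (n), (o))
```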

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 2378 bytes --]

--- old/include/asm-ia64/bitops.h	2006-04-04 18:19:50.000000000 +0200
+++ linux-2.6.16-ccb/include/asm-ia64/bitops.h	2006-04-04 18:17:26.000000000 +0200
@@ -154,19 +154,32 @@ __change_bit (int nr, volatile void *add
  * It also implies a memory barrier.
  */
 static __inline__ int
-test_and_set_bit (int nr, volatile void *addr)
+test_and_set_bit (int nr, void *addr)
 {
-	__u32 bit, old, new;
-	volatile __u32 *m;
+	__u64 bit, old, new;
+	__u64 *m;
 	CMPXCHG_BUGCHECK_DECL
 
-	m = (volatile __u32 *) addr + (nr >> 5);
-	bit = 1 << (nr & 31);
+	ia64_mf();
+	m = (__u64 *)((__u32 *) addr + (nr >> 5));
+	bit = 1UL << (nr & 31);
 	do {
 		CMPXCHG_BUGCHECK(m);
-		old = *m;
+		/*
+		 * "bias" is a hint to acquire exclusive ownership.
+		 * "nta" is a hint to allocate the cache line only in L2
+		 * and to bias it to be replaced.
+		 */
+		old = ia64_ld4_bias_nta(m);
 		new = old | bit;
-	} while (cmpxchg_acq(m, old, new) != old);
+		/*
+		 * All of the Itanium 2 processor's atomic instructions
+		 * are handled exclusively by L2.
+		 * "nta" is a hint not to allocate the cache line else
+		 * than in L2, to bias it to be replaced and not to write
+		 * it back into L3.
+		 */
+	} while (ia64_cmpxchg4_acq_nta(m, new, old) != old);
 	return (old & bit) != 0;
 }
 
--- old/include/asm-ia64/gcc_intrin.h	2006-04-04 18:19:50.000000000 +0200
+++ linux-2.6.16-ccb/include/asm-ia64/gcc_intrin.h	2006-04-04 18:17:49.000000000 +0200
@@ -221,6 +221,14 @@ register unsigned long ia64_r13 asm ("r1
 	asm volatile ("stf.spill [%0]=%1" :: "r"(x), "f"(__f__) : "memory");	\
 })
 
+#define ia64_ld4_bias_nta(ptr)							\
+({										\
+	__u64 ia64_intri_res;							\
+	asm volatile ("ld4.bias.nta %0=[%1]":					\
+			      "=r"(ia64_intri_res) : "r"(ptr) : "memory");	\
+	ia64_intri_res;								\
+})
+
 #define ia64_fetchadd4_acq(p, inc)						\
 ({										\
 										\
@@ -350,6 +358,15 @@ register unsigned long ia64_r13 asm ("r1
 	ia64_intri_res;									\
 })
 
+#define ia64_cmpxchg4_acq_nta(ptr, new, old)						\
+({											\
+	__u64 ia64_intri_res;								\
+	asm volatile ("mov ar.ccv=%0;;" :: "rO"(old));					\
+	asm volatile ("cmpxchg4.acq.nta %0=[%1],%2,ar.ccv":				\
+			      "=r"(ia64_intri_res) : "r"(ptr), "r"(new) : "memory");	\
+	ia64_intri_res;									\
+})
+
 #define ia64_cmpxchg8_acq(ptr, new, old)						\
 ({											\
 	__u64 ia64_intri_res;								\

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix ia64 bit ops: Full barriers for bit operations returning
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
                   ` (5 preceding siblings ...)
  2006-04-04 16:48 ` Fix ia64 bit ops: Full barriers for bit operations returning Zoltan Menyhart
@ 2006-04-05 15:30 ` Zoltan Menyhart
  2006-04-05 16:17 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Zoltan Menyhart @ 2006-04-05 15:30 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 1458 bytes --]

Christoph Lameter wrote:

> Could you come up with a patch? Currently, I do not seem to be able to 
> spend enough time on it.

Please have a look at this patch.

Temporary solution while we are waiting for:

	test_and_set_bit (int nr, volatile void *addr, MODE_BARRIER)

& co.

Changing the temp. variables to be 64 bits wide was not a good idea => alignment faults.
In order to eliminate the extra "zxt4", I changed the type of the return values of my
intrinsic macros to be 32 bits wide. Here is what I get (NOP-s removed):

reserve_bootmem_core+240:  [MMI]       mf;;
reserve_bootmem_core+241:              and r10=31,r18
reserve_bootmem_core+257:              extr r11=r18,5,27;;
reserve_bootmem_core+272:  [MFI]       shladd r16=r11,2,r16
reserve_bootmem_core+274:              shl r17=r19,r10;;
reserve_bootmem_core+288:  [MMI]       ld4.bias.nta r20=[r16];;
reserve_bootmem_core+289:              or r22=r17,r20
reserve_bootmem_core+305:              mov.m ar.ccv=r20;;
reserve_bootmem_core+320:  [MMI]       cmpxchg4.acq.nta r21=[r16],r22,ar.ccv;;
reserve_bootmem_core+322:              cmp4.eq p14,p15=r20,r21
reserve_bootmem_core+336:  [BBB] (p15) br.cond.dptk.few reserve_bootmem_core+288

BTW, why do all the intrinsic macros return 64-bit-wide values, independent of
their actual operand width? E.g.:

#define ia64_cmpxchg4_acq(ptr, new, old)
...
	__u64 ia64_intri_res;

Thanks,

Zoltan

Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 5780 bytes --]

--- old/include/asm-ia64/bitops.h	2006-04-04 18:19:50.000000000 +0200
+++ linux-2.6.16/include/asm-ia64/bitops.h	2006-04-05 16:49:12.000000000 +0200
@@ -7,6 +7,19 @@
  *
  * 02/06/02 find_next_bit() and find_first_bit() added from Erich Focht's ia64 O(1)
  *	    scheduler patch
+ * 06/04/05 Cache hints added:
+ *	    For loads before the atomic operations:
+ *		"bias" is a hint to acquire exclusive ownership.
+ *		"nta" is a hint to allocate the cache line only in L2
+ *		and to bias it to be replaced.
+ *	    For the atomic operations (as they are handled exclusively by L2):
+ *		"nta" is a hint not to allocate the cache line else than in L2,
+ *		to bias it to be replaced and not to write it back into L3.
+ *	    Added full fencing semantics to the atomic bit operations returning
+ *	    values.
+ *	    Note that it is a temporary solution while we are waiting for explicitly
+ *	    indicated fencing behavior, e.g.:
+ *			test_and_set_bit (int nr, void *addr, MODE_BARRIER)
  */
 
 #include <linux/compiler.h>
@@ -42,9 +55,9 @@ set_bit (int nr, volatile void *addr)
 	bit = 1 << (nr & 31);
 	do {
 		CMPXCHG_BUGCHECK(m);
-		old = *m;
+		old = ia64_ld4_bias_nta(m);
 		new = old | bit;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (ia64_cmpxchg4_acq_nta(m, new, old) != old);
 }
 
 /**
@@ -89,9 +102,9 @@ clear_bit (int nr, volatile void *addr)
 	mask = ~(1 << (nr & 31));
 	do {
 		CMPXCHG_BUGCHECK(m);
-		old = *m;
+		old = ia64_ld4_bias_nta(m);
 		new = old & mask;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (ia64_cmpxchg4_acq_nta(m, new, old) != old);
 }
 
 /**
@@ -100,14 +113,12 @@ clear_bit (int nr, volatile void *addr)
 static __inline__ void
 __clear_bit (int nr, volatile void *addr)
 {
-	volatile __u32 *p = (__u32 *) addr + (nr >> 5);
-	__u32 m = 1 << (nr & 31);
-	*p &= ~m;
+	*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
 }
 
 /**
  * change_bit - Toggle a bit in memory
- * @nr: Bit to clear
+ * @nr: Bit to change
  * @addr: Address to start counting from
  *
  * change_bit() is atomic and may not be reordered.
@@ -122,17 +133,17 @@ change_bit (int nr, volatile void *addr)
 	CMPXCHG_BUGCHECK_DECL
 
 	m = (volatile __u32 *) addr + (nr >> 5);
-	bit = (1 << (nr & 31));
+	bit = 1 << (nr & 31);
 	do {
 		CMPXCHG_BUGCHECK(m);
-		old = *m;
+		old = ia64_ld4_bias_nta(m);
 		new = old ^ bit;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (ia64_cmpxchg4_acq_nta(m, new, old) != old);
 }
 
 /**
  * __change_bit - Toggle a bit in memory
- * @nr: the bit to set
+ * @nr: the bit to change
  * @addr: the address to start counting from
  *
  * Unlike change_bit(), this function is non-atomic and may be reordered.
@@ -160,13 +171,14 @@ test_and_set_bit (int nr, volatile void 
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	ia64_mf();
 	m = (volatile __u32 *) addr + (nr >> 5);
 	bit = 1 << (nr & 31);
 	do {
 		CMPXCHG_BUGCHECK(m);
-		old = *m;
+		old = ia64_ld4_bias_nta(m);
 		new = old | bit;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (ia64_cmpxchg4_acq_nta(m, new, old) != old);
 	return (old & bit) != 0;
 }
 
@@ -192,7 +204,7 @@ __test_and_set_bit (int nr, volatile voi
 
 /**
  * test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to set
+ * @nr: Bit to clear
  * @addr: Address to count from
  *
  * This operation is atomic and cannot be reordered.  
@@ -205,19 +217,20 @@ test_and_clear_bit (int nr, volatile voi
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	ia64_mf();
 	m = (volatile __u32 *) addr + (nr >> 5);
 	mask = ~(1 << (nr & 31));
 	do {
 		CMPXCHG_BUGCHECK(m);
-		old = *m;
+		old = ia64_ld4_bias_nta(m);
 		new = old & mask;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (ia64_cmpxchg4_acq_nta(m, new, old) != old);
 	return (old & ~mask) != 0;
 }
 
 /**
  * __test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to set
+ * @nr: Bit to clear
  * @addr: Address to count from
  *
  * This operation is non-atomic and can be reordered.  
@@ -237,7 +250,7 @@ __test_and_clear_bit(int nr, volatile vo
 
 /**
  * test_and_change_bit - Change a bit and return its old value
- * @nr: Bit to set
+ * @nr: Bit to change
  * @addr: Address to count from
  *
  * This operation is atomic and cannot be reordered.  
@@ -250,13 +263,14 @@ test_and_change_bit (int nr, volatile vo
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	ia64_mf();
 	m = (volatile __u32 *) addr + (nr >> 5);
 	bit = (1 << (nr & 31));
 	do {
 		CMPXCHG_BUGCHECK(m);
-		old = *m;
+		old = ia64_ld4_bias_nta(m);
 		new = old ^ bit;
-	} while (cmpxchg_acq(m, old, new) != old);
+	} while (ia64_cmpxchg4_acq_nta(m, new, old) != old);
 	return (old & bit) != 0;
 }
 
--- old/include/asm-ia64/gcc_intrin.h	2006-04-04 18:19:50.000000000 +0200
+++ linux-2.6.16/include/asm-ia64/gcc_intrin.h	2006-04-05 17:07:29.000000000 +0200
@@ -221,6 +221,14 @@ register unsigned long ia64_r13 asm ("r1
 	asm volatile ("stf.spill [%0]=%1" :: "r"(x), "f"(__f__) : "memory");	\
 })
 
+#define ia64_ld4_bias_nta(ptr)							\
+({										\
+	__u32 ia64_intri_res;							\
+	asm volatile ("ld4.bias.nta %0=[%1]":					\
+			      "=r"(ia64_intri_res) : "r"(ptr) : "memory");	\
+	ia64_intri_res;								\
+})
+
 #define ia64_fetchadd4_acq(p, inc)						\
 ({										\
 										\
@@ -350,6 +358,15 @@ register unsigned long ia64_r13 asm ("r1
 	ia64_intri_res;									\
 })
 
+#define ia64_cmpxchg4_acq_nta(ptr, new, old)						\
+({											\
+	__u32 ia64_intri_res;								\
+	asm volatile ("mov ar.ccv=%0;;" :: "rO"(old));					\
+	asm volatile ("cmpxchg4.acq.nta %0=[%1],%2,ar.ccv":				\
+			      "=r"(ia64_intri_res) : "r"(ptr), "r"(new) : "memory");	\
+	ia64_intri_res;									\
+})
+
 #define ia64_cmpxchg8_acq(ptr, new, old)						\
 ({											\
 	__u64 ia64_intri_res;								\

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix ia64 bit ops: Full barriers for bit operations returning a
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
                   ` (6 preceding siblings ...)
  2006-04-05 15:30 ` Zoltan Menyhart
@ 2006-04-05 16:17 ` Christoph Lameter
  2006-04-05 16:44 ` Fix ia64 bit ops: Full barriers for bit operations returning Zoltan Menyhart
  2006-04-05 17:31 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
  9 siblings, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2006-04-05 16:17 UTC (permalink / raw)
  To: linux-ia64

On Tue, 4 Apr 2006, Zoltan Menyhart wrote:

> Christoph Lameter wrote:
> 
> > Could you come up with a patch? Currently, I do not seem to be able to spend
> > enough time on it.
> 
> Please have a look at this sample.

Could you decomplicate this? Just use acquire / release and avoid the 
additional intrinsics. The purpose is first of all correctness. Then we 
can add some whiz bang on top. Also, please send patches inline, not as 
attachments.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix ia64 bit ops: Full barriers for bit operations returning
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
                   ` (7 preceding siblings ...)
  2006-04-05 16:17 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
@ 2006-04-05 16:44 ` Zoltan Menyhart
  2006-04-05 17:31 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
  9 siblings, 0 replies; 11+ messages in thread
From: Zoltan Menyhart @ 2006-04-05 16:44 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 829 bytes --]

Christoph Lameter wrote:

> Could decomplicate this? Just use acquire / release and avoid the 
> additional intrinsics. The purpose is first of all correctness. Then we 
> can add some whiz bang on top. Also please sent patches inline not as 
> attachments.

I could have misunderstood what you wrote yesterday:

>>Could you consider using some cache hints, like "ld8.bias.nta"?
>>"bias" is a hint to acquire exclusive ownership.
>>"nta" is a hint to allocate the cache line only in L2
>>(and side effect: to bias it to be replaced).
>>All of the Itanium 2 processor's atomic instructions are handled
>>exclusively by the L2 cache.
> 
> Could you come up with a patch? Currently, I do not seem to be able to 
> spend enough time on it.

I thought you had asked for both the correct fencing and the cache hints.

Thanks,

Zoltan


[-- Attachment #2: bitops.diff --]
[-- Type: text/plain, Size: 1717 bytes --]

--- old/include/asm-ia64/bitops.h	2006-04-04 18:19:50.000000000 +0200
+++ linux-2.6.16/include/asm-ia64/bitops.h	2006-04-05 18:38:44.000000000 +0200
@@ -7,6 +7,11 @@
  *
  * 02/06/02 find_next_bit() and find_first_bit() added from Erich Focht's ia64 O(1)
  *	    scheduler patch
+ * 06/04/05 Added full fencing semantics to the atomic bit operations returning
+ *	    values.
+ *	    Note that it is a temporary solution while we are waiting for explicitly
+ *	    indicated fencing behavior, e.g.:
+ *			test_and_set_bit (int nr, void *addr, MODE_BARRIER)
  */
 
 #include <linux/compiler.h>
@@ -160,6 +165,7 @@ test_and_set_bit (int nr, volatile void 
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	ia64_mf();
 	m = (volatile __u32 *) addr + (nr >> 5);
 	bit = 1 << (nr & 31);
 	do {
@@ -192,7 +198,7 @@ __test_and_set_bit (int nr, volatile voi
 
 /**
  * test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to set
+ * @nr: Bit to clear
  * @addr: Address to count from
  *
  * This operation is atomic and cannot be reordered.  
@@ -205,6 +211,7 @@ test_and_clear_bit (int nr, volatile voi
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	ia64_mf();
 	m = (volatile __u32 *) addr + (nr >> 5);
 	mask = ~(1 << (nr & 31));
 	do {
@@ -237,7 +244,7 @@ __test_and_clear_bit(int nr, volatile vo
 
 /**
  * test_and_change_bit - Change a bit and return its old value
- * @nr: Bit to set
+ * @nr: Bit to change
  * @addr: Address to count from
  *
  * This operation is atomic and cannot be reordered.  
@@ -250,6 +257,7 @@ test_and_change_bit (int nr, volatile vo
 	volatile __u32 *m;
 	CMPXCHG_BUGCHECK_DECL
 
+	ia64_mf();
 	m = (volatile __u32 *) addr + (nr >> 5);
 	bit = (1 << (nr & 31));
 	do {

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fix ia64 bit ops: Full barriers for bit operations returning a
  2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
                   ` (8 preceding siblings ...)
  2006-04-05 16:44 ` Fix ia64 bit ops: Full barriers for bit operations returning Zoltan Menyhart
@ 2006-04-05 17:31 ` Christoph Lameter
  9 siblings, 0 replies; 11+ messages in thread
From: Christoph Lameter @ 2006-04-05 17:31 UTC (permalink / raw)
  To: linux-ia64

On Wed, 5 Apr 2006, Zoltan Menyhart wrote:

> Christoph Lameter wrote:
> 
> > Could decomplicate this? Just use acquire / release and avoid the additional
> > intrinsics. The purpose is first of all correctness. Then we can add some
> > whiz bang on top. Also, please send patches inline, not as attachments.
> 
> I could have misunderstood what you wrote yesterday:
> 
> > > Could you consider using some cache hints, like "ld8.bias.nta"?
> > > "bias" is a hint to acquire exclusive ownership.
> > > "nta" is a hint to allocate the cache line only in L2
> > > (and side effect: to bias it to be replaced).
> > > All of the Itanium 2 processor's atomic instructions are handled
> > > exclusively by the L2 cache.
> > 
> > Could you come up with a patch? Currently, I do not seem to be able to spend
> > enough time on it.
> 
> I thought you had asked for both the correct fencing and the cache hints.

Oh. Sorry.

The new patch seems to be okay, but it still was not sent inline and is
therefore difficult for most of us to review.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-04-05 17:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-03 18:32 Fix ia64 bit ops: Full barriers for bit operations returning a value Christoph Lameter
2006-04-03 23:11 ` Chen, Kenneth W
2006-04-04  1:05 ` Fix ia64 bit ops: Full barriers for bit operations returning Nick Piggin
2006-04-04  2:11 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
2006-04-04 13:30 ` Fix ia64 bit ops: Full barriers for bit operations returning a value Chen, Kenneth W
2006-04-04 14:40 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
2006-04-04 16:48 ` Fix ia64 bit ops: Full barriers for bit operations returning Zoltan Menyhart
2006-04-05 15:30 ` Zoltan Menyhart
2006-04-05 16:17 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter
2006-04-05 16:44 ` Fix ia64 bit ops: Full barriers for bit operations returning Zoltan Menyhart
2006-04-05 17:31 ` Fix ia64 bit ops: Full barriers for bit operations returning a Christoph Lameter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox