[PATCH v8 4/4] qrwlock: Use smp_store_release() in write

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-08 16:59 [PATCH v8 0/4] qrwlock: Introducing a queue read/write lock implementation Waiman Long
@ 2014-01-08 16:59 ` Waiman Long
  0 siblings, 0 replies; 31+ messages in thread
From: Waiman Long @ 2014-01-08 16:59 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Arnd Bergmann
  Cc: linux-arch, x86, linux-kernel, Peter Zijlstra, Steven Rostedt,
	Andrew Morton, Michel Lespinasse, Andi Kleen, Rik van Riel,
	Paul E. McKenney, Linus Torvalds, Raghavendra K T, George Spelvin,
	Tim Chen, Aswin Chandramouleeswaran", Scott J Norton,
	Waiman Long

This patch modifies the queue_write_unlock() function to use the
new smp_store_release() function in another pending patch. It also
removes the temporary implementation of smp_load_acquire() and
smp_store_release() function in qrwlock.c.

This patch should only be merged if PeterZ's linux-arch patch patch
was merged.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/asm-generic/qrwlock.h |    4 +---
 kernel/locking/qrwlock.c      |   34 ----------------------------------
 2 files changed, 1 insertions(+), 37 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 2b9a7b4..4d4bd04 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct qrwlock *lock)
 	/*
 	 * Make sure that none of the critical section will be leaked out.
 	 */
-	smp_mb__before_clear_bit();
-	ACCESS_ONCE(lock->cnts.writer) = 0;
-	smp_mb__after_clear_bit();
+	smp_store_release(&lock->cnts.writer, 0)
 }
 
 /*
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 1b3ffb2..3d3ba2b 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -48,40 +48,6 @@
 # define arch_mutex_cpu_relax() cpu_relax()
 #endif
 
-#ifndef smp_load_acquire
-# ifdef CONFIG_X86
-#   define smp_load_acquire(p)				\
-	({						\
-		typeof(*p) ___p1 = ACCESS_ONCE(*p);	\
-		barrier();				\
-		___p1;					\
-	})
-# else
-#   define smp_load_acquire(p)				\
-	({						\
-		typeof(*p) ___p1 = ACCESS_ONCE(*p);	\
-		smp_mb();				\
-		___p1;					\
-	})
-# endif
-#endif
-
-#ifndef smp_store_release
-# ifdef CONFIG_X86
-#   define smp_store_release(p, v)			\
-	do {						\
-		barrier();				\
-		ACCESS_ONCE(*p) = v;			\
-	} while (0)
-# else
-#   define smp_store_release(p, v)			\
-	do {						\
-		smp_mb();				\
-		ACCESS_ONCE(*p) = v;			\
-	} while (0)
-# endif
-#endif
-
 /*
  * If an xadd (exchange-add) macro isn't available, simulate one with
  * the atomic_add_return() function.
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
@ 2014-01-13  2:47 Daniel J Blueman
  2014-01-13  3:49 ` Paul E. McKenney
  2014-01-13 16:41 ` Waiman Long
  0 siblings, 2 replies; 31+ messages in thread
From: Daniel J Blueman @ 2014-01-13  2:47 UTC (permalink / raw)
  To: Waiman Long; +Cc: Paul E. McKenney, Linux Kernel

On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long  wrote:
 > This patch modifies the queue_write_unlock() function to use the
 > new smp_store_release() function in another pending patch. It also
 > removes the temporary implementation of smp_load_acquire() and
 > smp_store_release() function in qrwlock.c.
 >
 > This patch should only be merged if PeterZ's linux-arch patch patch
 > was merged.
 >
 > Signed-off-by: Waiman Long <Waiman.Long@hp.com>
 > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
 > ---
 >  include/asm-generic/qrwlock.h |    4 +---
 >  kernel/locking/qrwlock.c      |   34 ----------------------------------
 >  2 files changed, 1 insertions(+), 37 deletions(-)
 >
 > diff --git a/include/asm-generic/qrwlock.h 
b/include/asm-generic/qrwlock.h
 > index 2b9a7b4..4d4bd04 100644
 > --- a/include/asm-generic/qrwlock.h
 > +++ b/include/asm-generic/qrwlock.h
 > @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct 
qrwlock *lock)
 >  	/*
 >  	 * Make sure that none of the critical section will be leaked out.
 >  	 */
 > -	smp_mb__before_clear_bit();
 > -	ACCESS_ONCE(lock->cnts.writer) = 0;
 > -	smp_mb__after_clear_bit();
 > +	smp_store_release(&lock->cnts.writer, 0)

This will fail compilation, so probably needs further testing with 
Peter's load_acquire/store_release barrier patches.

Thanks,
   Daniel
-- 
Daniel J Blueman
Principal Software Engineer, Numascale

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-13  2:47 [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Daniel J Blueman
@ 2014-01-13  3:49 ` Paul E. McKenney
  2014-01-13 16:41 ` Waiman Long
  1 sibling, 0 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-13  3:49 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Waiman Long, Linux Kernel

On Mon, Jan 13, 2014 at 10:47:36AM +0800, Daniel J Blueman wrote:
> On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long  wrote:
> > This patch modifies the queue_write_unlock() function to use the
> > new smp_store_release() function in another pending patch. It also
> > removes the temporary implementation of smp_load_acquire() and
> > smp_store_release() function in qrwlock.c.
> >
> > This patch should only be merged if PeterZ's linux-arch patch patch
> > was merged.
> >
> > Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/asm-generic/qrwlock.h |    4 +---
> >  kernel/locking/qrwlock.c      |   34 ----------------------------------
> >  2 files changed, 1 insertions(+), 37 deletions(-)
> >
> > diff --git a/include/asm-generic/qrwlock.h
> b/include/asm-generic/qrwlock.h
> > index 2b9a7b4..4d4bd04 100644
> > --- a/include/asm-generic/qrwlock.h
> > +++ b/include/asm-generic/qrwlock.h
> > @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct
> qrwlock *lock)
> >  	/*
> >  	 * Make sure that none of the critical section will be leaked out.
> >  	 */
> > -	smp_mb__before_clear_bit();
> > -	ACCESS_ONCE(lock->cnts.writer) = 0;
> > -	smp_mb__after_clear_bit();
> > +	smp_store_release(&lock->cnts.writer, 0)
> 
> This will fail compilation, so probably needs further testing with
> Peter's load_acquire/store_release barrier patches.

Peter's patch just hit -tip, so this should be esay to do.  In Waiman's
defense, he does call attention to this in the commit log.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-13  2:47 [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Daniel J Blueman
  2014-01-13  3:49 ` Paul E. McKenney
@ 2014-01-13 16:41 ` Waiman Long
  2014-01-14  2:28   ` Daniel J Blueman
  1 sibling, 1 reply; 31+ messages in thread
From: Waiman Long @ 2014-01-13 16:41 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Daniel J Blueman, Paul E. McKenney, Linux Kernel

On 01/12/2014 09:47 PM, Daniel J Blueman wrote:
> On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long  wrote:
> > This patch modifies the queue_write_unlock() function to use the
> > new smp_store_release() function in another pending patch. It also
> > removes the temporary implementation of smp_load_acquire() and
> > smp_store_release() function in qrwlock.c.
> >
> > This patch should only be merged if PeterZ's linux-arch patch patch
> > was merged.
> >
> > Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/asm-generic/qrwlock.h |    4 +---
> >  kernel/locking/qrwlock.c      |   34 
> ----------------------------------
> >  2 files changed, 1 insertions(+), 37 deletions(-)
> >
> > diff --git a/include/asm-generic/qrwlock.h 
> b/include/asm-generic/qrwlock.h
> > index 2b9a7b4..4d4bd04 100644
> > --- a/include/asm-generic/qrwlock.h
> > +++ b/include/asm-generic/qrwlock.h
> > @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct 
> qrwlock *lock)
> >      /*
> >       * Make sure that none of the critical section will be leaked out.
> >       */
> > -    smp_mb__before_clear_bit();
> > -    ACCESS_ONCE(lock->cnts.writer) = 0;
> > -    smp_mb__after_clear_bit();
> > +    smp_store_release(&lock->cnts.writer, 0)
>
> This will fail compilation, so probably needs further testing with 
> Peter's load_acquire/store_release barrier patches.
>

Peter,

I found out that the build failure was caused by the fact that the 
__native_word() macro (used internally by compiletime_assert_atomic()) 
allows only a size of 4 or 8 for x86-64. The data type that I used is a 
byte. Is there a reason why byte and short are not considered native?

-Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-13 16:41 ` Waiman Long
@ 2014-01-14  2:28   ` Daniel J Blueman
  2014-01-14 11:03     ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel J Blueman @ 2014-01-14  2:28 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra; +Cc: Paul E. McKenney, Linux Kernel

On 14/01/2014 00:41, Waiman Long wrote:
> On 01/12/2014 09:47 PM, Daniel J Blueman wrote:
>> On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long  wrote:
>> > This patch modifies the queue_write_unlock() function to use the
>> > new smp_store_release() function in another pending patch. It also
>> > removes the temporary implementation of smp_load_acquire() and
>> > smp_store_release() function in qrwlock.c.
>> >
>> > This patch should only be merged if PeterZ's linux-arch patch patch
>> > was merged.
>> >
>> > Signed-off-by: Waiman Long <Waiman.Long@hp.com>
>> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> > ---
>> >  include/asm-generic/qrwlock.h |    4 +---
>> >  kernel/locking/qrwlock.c      |   34
>> ----------------------------------
>> >  2 files changed, 1 insertions(+), 37 deletions(-)
>> >
>> > diff --git a/include/asm-generic/qrwlock.h
>> b/include/asm-generic/qrwlock.h
>> > index 2b9a7b4..4d4bd04 100644
>> > --- a/include/asm-generic/qrwlock.h
>> > +++ b/include/asm-generic/qrwlock.h
>> > @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct
>> qrwlock *lock)
>> >      /*
>> >       * Make sure that none of the critical section will be leaked out.
>> >       */
>> > -    smp_mb__before_clear_bit();
>> > -    ACCESS_ONCE(lock->cnts.writer) = 0;
>> > -    smp_mb__after_clear_bit();
>> > +    smp_store_release(&lock->cnts.writer, 0)
>>
>> This will fail compilation, so probably needs further testing with
>> Peter's load_acquire/store_release barrier patches.
>>
>
> Peter,
>
> I found out that the build failure was caused by the fact that the
> __native_word() macro (used internally by compiletime_assert_atomic())
> allows only a size of 4 or 8 for x86-64. The data type that I used is a
> byte. Is there a reason why byte and short are not considered native?

It seems likely it was implemented like that since there was no existing 
need; long can be relied on as the largest native type, so this should 
suffice and works here:

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index fe7a686..dac91d7 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -299,8 +299,8 @@ void ftrace_likely_update(struct ftrace_branch_data 
*f, int val, int expect);
  #endif

  /* Is this type a native word size -- useful for atomic operations */
-#ifndef __native_word
-# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == 
sizeof(long))
+#ifndef __native_type
+# define __native_type(t) (sizeof(t) <= sizeof(long))
  #endif

  /* Compile time object size, -1 for unknown */
@@ -343,8 +343,8 @@ void ftrace_likely_update(struct ftrace_branch_data 
*f, int val, int expect);
  	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)

  #define compiletime_assert_atomic_type(t)				\
-	compiletime_assert(__native_word(t),				\
-		"Need native word sized stores/loads for atomicity.")
+	compiletime_assert(__native_type(t),				\
+		"Need native sized stores/loads for atomicity.")

  /*
   * Prevent the compiler from merging or refetching accesses.  The compiler

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
-- 
Daniel J Blueman
Principal Software Engineer, Numascale

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14  2:28   ` Daniel J Blueman
@ 2014-01-14 11:03     ` Peter Zijlstra
  2014-01-14 15:25       ` Waiman Long
  2014-01-14 17:08       ` Matt Turner
  0 siblings, 2 replies; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-14 11:03 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Waiman Long, Paul E. McKenney, Linux Kernel, rth, ink, mattst88,
	Linus Torvalds

On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
> >Peter,
> >
> >I found out that the build failure was caused by the fact that the
> >__native_word() macro (used internally by compiletime_assert_atomic())
> >allows only a size of 4 or 8 for x86-64. The data type that I used is a
> >byte. Is there a reason why byte and short are not considered native?
> 
> It seems likely it was implemented like that since there was no existing
> need; long can be relied on as the largest native type, so this should
> suffice and works here:

There's Alphas that cannot actually atomically adres a byte; I do not
konw if Linux cares about them, but if it does, we cannot in fact rely
on this in generic primitives like this.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14 11:03     ` Peter Zijlstra
@ 2014-01-14 15:25       ` Waiman Long
  2014-01-14 17:08       ` Matt Turner
  1 sibling, 0 replies; 31+ messages in thread
From: Waiman Long @ 2014-01-14 15:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel J Blueman, Paul E. McKenney, Linux Kernel, rth, ink,
	mattst88, Linus Torvalds

On 01/14/2014 06:03 AM, Peter Zijlstra wrote:
> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>> Peter,
>>>
>>> I found out that the build failure was caused by the fact that the
>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>> byte. Is there a reason why byte and short are not considered native?
>> It seems likely it was implemented like that since there was no existing
>> need; long can be relied on as the largest native type, so this should
>> suffice and works here:
> There's Alphas that cannot actually atomically adres a byte; I do not
> konw if Linux cares about them, but if it does, we cannot in fact rely
> on this in generic primitives like this.

Thank for the explanation.

Can we allow architectural override of __native_word() macro? Like

#ifdef __arch_native_word
#define __native_word(t)    __arch_native_word(t)
#else
#define __native_word(t)    (sizeof(t) == sizeof(int) || sizeof(t) == 
sizeof(long))
#endif

In this way, we can allow x86 to support byte-based atomic type while 
restricting the generic macro to int and long only. I will also modify 
the code to use cmpxchg() when byte is not an atomic type.

-Longman



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14 11:03     ` Peter Zijlstra
  2014-01-14 15:25       ` Waiman Long
@ 2014-01-14 17:08       ` Matt Turner
  2014-01-14 18:01         ` Richard Henderson
  1 sibling, 1 reply; 31+ messages in thread
From: Matt Turner @ 2014-01-14 17:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel J Blueman, Waiman Long, Paul E. McKenney, Linux Kernel,
	Richard Henderson, Ivan Kokshaysky, Linus Torvalds

On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>> >Peter,
>> >
>> >I found out that the build failure was caused by the fact that the
>> >__native_word() macro (used internally by compiletime_assert_atomic())
>> >allows only a size of 4 or 8 for x86-64. The data type that I used is a
>> >byte. Is there a reason why byte and short are not considered native?
>>
>> It seems likely it was implemented like that since there was no existing
>> need; long can be relied on as the largest native type, so this should
>> suffice and works here:
>
> There's Alphas that cannot actually atomically adres a byte; I do not
> konw if Linux cares about them, but if it does, we cannot in fact rely
> on this in generic primitives like this.

That's right, and thanks for the heads-up. Alpha can only address 4
and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).

The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
Alphas can address < 4 bytes atomically.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14 17:08       ` Matt Turner
@ 2014-01-14 18:01         ` Richard Henderson
  2014-01-14 19:09           ` Waiman Long
  2014-01-14 23:44           ` Paul E. McKenney
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Henderson @ 2014-01-14 18:01 UTC (permalink / raw)
  To: Matt Turner, Peter Zijlstra
  Cc: Daniel J Blueman, Waiman Long, Paul E. McKenney, Linux Kernel,
	Ivan Kokshaysky, Linus Torvalds

On 01/14/2014 09:08 AM, Matt Turner wrote:
> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>> Peter,
>>>>
>>>> I found out that the build failure was caused by the fact that the
>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>> byte. Is there a reason why byte and short are not considered native?
>>>
>>> It seems likely it was implemented like that since there was no existing
>>> need; long can be relied on as the largest native type, so this should
>>> suffice and works here:
>>
>> There's Alphas that cannot actually atomically adres a byte; I do not
>> konw if Linux cares about them, but if it does, we cannot in fact rely
>> on this in generic primitives like this.
> 
> That's right, and thanks for the heads-up. Alpha can only address 4
> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
> 
> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
> Alphas can address < 4 bytes atomically.
> 

Emulated with aligned 4 byte atomics, and masking.  The same is true for arm,
ppc, mips which, depending on cpu, also lack < 4 byte atomics.


r~

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14 18:01         ` Richard Henderson
@ 2014-01-14 19:09           ` Waiman Long
  2014-01-14 20:20             ` Peter Zijlstra
  2014-01-14 23:44           ` Paul E. McKenney
  1 sibling, 1 reply; 31+ messages in thread
From: Waiman Long @ 2014-01-14 19:09 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Matt Turner, Peter Zijlstra, Daniel J Blueman, Paul E. McKenney,
	Linux Kernel, Ivan Kokshaysky, Linus Torvalds

On 01/14/2014 01:01 PM, Richard Henderson wrote:
> On 01/14/2014 09:08 AM, Matt Turner wrote:
>> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra<peterz@infradead.org>  wrote:
>>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>>> Peter,
>>>>>
>>>>> I found out that the build failure was caused by the fact that the
>>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>>> byte. Is there a reason why byte and short are not considered native?
>>>> It seems likely it was implemented like that since there was no existing
>>>> need; long can be relied on as the largest native type, so this should
>>>> suffice and works here:
>>> There's Alphas that cannot actually atomically adres a byte; I do not
>>> konw if Linux cares about them, but if it does, we cannot in fact rely
>>> on this in generic primitives like this.
>> That's right, and thanks for the heads-up. Alpha can only address 4
>> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
>>
>> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
>> Alphas can address<  4 bytes atomically.
>>
> Emulated with aligned 4 byte atomics, and masking.  The same is true for arm,
> ppc, mips which, depending on cpu, also lack<  4 byte atomics.
>

I would like to know if the action of writing out a byte (e.g. *byte = 
0) is atomic in those architectures or is emulated by a 
compiler-generated software read-modify-write.

-Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14 19:09           ` Waiman Long
@ 2014-01-14 20:20             ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-14 20:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Richard Henderson, Matt Turner, Daniel J Blueman,
	Paul E. McKenney, Linux Kernel, Ivan Kokshaysky, Linus Torvalds

On Tue, Jan 14, 2014 at 02:09:30PM -0500, Waiman Long wrote:
> I would like to know if the action of writing out a byte (e.g. *byte = 0) is
> atomic in those architectures or is emulated by a compiler-generated
> software read-modify-write.

So on Alpha pre ev56 something like:

  *(volatile u8 *)foo = 0;

_Should_ cause a compile error as the hardware has to do a rmw which is
not compatible with the requirements for volatile -- that said I do not
know if a compiler will actually generate this error.

I can well imagine other load-store archs suffering similar problems,
although I'm not aware of any.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14 18:01         ` Richard Henderson
  2014-01-14 19:09           ` Waiman Long
@ 2014-01-14 23:44           ` Paul E. McKenney
  2014-01-15  0:25             ` Linus Torvalds
  2014-01-15  3:08             ` Daniel J Blueman
  1 sibling, 2 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-14 23:44 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Matt Turner, Peter Zijlstra, Daniel J Blueman, Waiman Long,
	Linux Kernel, Ivan Kokshaysky, Linus Torvalds

On Tue, Jan 14, 2014 at 10:01:04AM -0800, Richard Henderson wrote:
> On 01/14/2014 09:08 AM, Matt Turner wrote:
> > On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
> >>>> Peter,
> >>>>
> >>>> I found out that the build failure was caused by the fact that the
> >>>> __native_word() macro (used internally by compiletime_assert_atomic())
> >>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
> >>>> byte. Is there a reason why byte and short are not considered native?
> >>>
> >>> It seems likely it was implemented like that since there was no existing
> >>> need; long can be relied on as the largest native type, so this should
> >>> suffice and works here:
> >>
> >> There's Alphas that cannot actually atomically adres a byte; I do not
> >> konw if Linux cares about them, but if it does, we cannot in fact rely
> >> on this in generic primitives like this.
> > 
> > That's right, and thanks for the heads-up. Alpha can only address 4
> > and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
> > 
> > The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
> > Alphas can address < 4 bytes atomically.
> 
> Emulated with aligned 4 byte atomics, and masking.  The same is true for arm,
> ppc, mips which, depending on cpu, also lack < 4 byte atomics.

Which means that Alpha should be able to similarly emulate 1-byte and
2-byte atomics, correct?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14 23:44           ` Paul E. McKenney
@ 2014-01-15  0:25             ` Linus Torvalds
  2014-01-15  2:39               ` Paul E. McKenney
  2014-01-15  3:08             ` Daniel J Blueman
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2014-01-15  0:25 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Richard Henderson, Matt Turner, Peter Zijlstra, Daniel J Blueman,
	Waiman Long, Linux Kernel, Ivan Kokshaysky

On Wed, Jan 15, 2014 at 6:44 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> Which means that Alpha should be able to similarly emulate 1-byte and
> 2-byte atomics, correct?

Not reasonably, no.

The ldl/stc implementation on early alpha was so broken as to be
unusable. It's not actually done in the cache, it WENT OUT ON THE BUS.
We're talking 70's style "external lock signal" kind of things like
the 8086 did for locked cycles before the advent of caches, the kind
that nobody sane has done for a long long time.

So realistically, you absolutely do not want to use those things to
emulate atomic byte/word accesses. The whole point of "load_acquire()"
and "store_release()" is that it's supposed to be cheaper than a
locked access, and can be done with just a barrier instruction or a
special instruction flag.

If you just want to do a store release, on alpha you'd want to
implement that as a full memory barrier followed by a store. It
doesn't get the advantage of a real release consistency model, but at
least it's not doing an external bus access. But you can only do that
store as a 4-byte or 8-byte store.on the older alphas (byte and word
stores work on newer ones).

Of course, it's entirely possible that nobody cares..

             Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-15  0:25             ` Linus Torvalds
@ 2014-01-15  2:39               ` Paul E. McKenney
  2014-01-15  8:07                 ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-15  2:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Richard Henderson, Matt Turner, Peter Zijlstra, Daniel J Blueman,
	Waiman Long, Linux Kernel, Ivan Kokshaysky

On Wed, Jan 15, 2014 at 07:25:04AM +0700, Linus Torvalds wrote:
> On Wed, Jan 15, 2014 at 6:44 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > Which means that Alpha should be able to similarly emulate 1-byte and
> > 2-byte atomics, correct?
> 
> Not reasonably, no.
> 
> The ldl/stc implementation on early alpha was so broken as to be
> unusable. It's not actually done in the cache, it WENT OUT ON THE BUS.
> We're talking 70's style "external lock signal" kind of things like
> the 8086 did for locked cycles before the advent of caches, the kind
> that nobody sane has done for a long long time.
> 
> So realistically, you absolutely do not want to use those things to
> emulate atomic byte/word accesses. The whole point of "load_acquire()"
> and "store_release()" is that it's supposed to be cheaper than a
> locked access, and can be done with just a barrier instruction or a
> special instruction flag.
> 
> If you just want to do a store release, on alpha you'd want to
> implement that as a full memory barrier followed by a store. It
> doesn't get the advantage of a real release consistency model, but at
> least it's not doing an external bus access. But you can only do that
> store as a 4-byte or 8-byte store.on the older alphas (byte and word
> stores work on newer ones).
> 
> Of course, it's entirely possible that nobody cares..

That would be my hope.  ;-)

If nobody cares about Alpha period, it is easy.  However, the last time
that I tried that approach, they sent me a URL of a wiki showing Alpha
systems still running mainline.  But a slow-but-working approach for
Alpha does seem reasonable, even for those still running Linux on Alpha.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-14 23:44           ` Paul E. McKenney
  2014-01-15  0:25             ` Linus Torvalds
@ 2014-01-15  3:08             ` Daniel J Blueman
  1 sibling, 0 replies; 31+ messages in thread
From: Daniel J Blueman @ 2014-01-15  3:08 UTC (permalink / raw)
  To: paulmck, Richard Henderson
  Cc: Matt Turner, Peter Zijlstra, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Linus Torvalds

On 01/15/2014 07:44 AM, Paul E. McKenney wrote:
> On Tue, Jan 14, 2014 at 10:01:04AM -0800, Richard Henderson wrote:
>> On 01/14/2014 09:08 AM, Matt Turner wrote:
>>> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>>>> Peter,
>>>>>>
>>>>>> I found out that the build failure was caused by the fact that the
>>>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>>>> byte. Is there a reason why byte and short are not considered native?
>>>>>
>>>>> It seems likely it was implemented like that since there was no existing
>>>>> need; long can be relied on as the largest native type, so this should
>>>>> suffice and works here:
>>>>
>>>> There's Alphas that cannot actually atomically adres a byte; I do not
>>>> konw if Linux cares about them, but if it does, we cannot in fact rely
>>>> on this in generic primitives like this.
>>>
>>> That's right, and thanks for the heads-up. Alpha can only address 4
>>> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
>>>
>>> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
>>> Alphas can address < 4 bytes atomically.
>>
>> Emulated with aligned 4 byte atomics, and masking.  The same is true for arm,
>> ppc, mips which, depending on cpu, also lack < 4 byte atomics.
>
> Which means that Alpha should be able to similarly emulate 1-byte and
> 2-byte atomics, correct?

If it's not possible to guarantee that GCC emits the 4-byte atomics by 
using a union, then we could emit the instructions via assembly. We'd 
introduce a macro to ensure lock word alignment, and this would be safe 
for unsigned counting up to the packed type limit. Maybe that's just too 
over-constrained though.

Thanks,
   Daniel
-- 
Daniel J Blueman
Principal Software Engineer, Numascale

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-15  2:39               ` Paul E. McKenney
@ 2014-01-15  8:07                 ` Peter Zijlstra
  2014-01-15 20:53                   ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-15  8:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Richard Henderson, Matt Turner, Daniel J Blueman,
	Waiman Long, Linux Kernel, Ivan Kokshaysky

On Tue, Jan 14, 2014 at 06:39:58PM -0800, Paul E. McKenney wrote:
> > If you just want to do a store release, on alpha you'd want to
> > implement that as a full memory barrier followed by a store. It
> > doesn't get the advantage of a real release consistency model, but at
> > least it's not doing an external bus access. But you can only do that
> > store as a 4-byte or 8-byte store.on the older alphas (byte and word
> > stores work on newer ones).
> > 
> > Of course, it's entirely possible that nobody cares..
> 
> That would be my hope.  ;-)
> 
> If nobody cares about Alpha period, it is easy.  However, the last time
> that I tried that approach, they sent me a URL of a wiki showing Alpha
> systems still running mainline.  But a slow-but-working approach for
> Alpha does seem reasonable, even for those still running Linux on Alpha.

Well, if they're all EV56 or later we're still good as they can actually
do what we need.

But I don't think a ll/sc implementation of the store_release can even
work, because in that case all users of the other bytes also need a
ll/sc around them but how are we to know about them?

So the only real way to allow store_release on 8/16 bit values is by
removing all Alpha support _pre_ EV56 :/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-15  8:07                 ` Peter Zijlstra
@ 2014-01-15 20:53                   ` Paul E. McKenney
  2014-01-15 23:21                     ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-15 20:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Richard Henderson, Matt Turner, Daniel J Blueman,
	Waiman Long, Linux Kernel, Ivan Kokshaysky

On Wed, Jan 15, 2014 at 09:07:53AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 14, 2014 at 06:39:58PM -0800, Paul E. McKenney wrote:
> > > If you just want to do a store release, on alpha you'd want to
> > > implement that as a full memory barrier followed by a store. It
> > > doesn't get the advantage of a real release consistency model, but at
> > > least it's not doing an external bus access. But you can only do that
> > > store as a 4-byte or 8-byte store.on the older alphas (byte and word
> > > stores work on newer ones).
> > > 
> > > Of course, it's entirely possible that nobody cares..
> > 
> > That would be my hope.  ;-)
> > 
> > If nobody cares about Alpha period, it is easy.  However, the last time
> > that I tried that approach, they sent me a URL of a wiki showing Alpha
> > systems still running mainline.  But a slow-but-working approach for
> > Alpha does seem reasonable, even for those still running Linux on Alpha.
> 
> Well, if they're all EV56 or later we're still good as they can actually
> do what we need.
> 
> But I don't think a ll/sc implementation of the store_release can even
> work, because in that case all users of the other bytes also need a
> ll/sc around them but how are we to know about them?
> 
> So the only real way to allow store_release on 8/16 bit values is by
> removing all Alpha support _pre_ EV56 :/

Fair point...  We could demand Alpha-specific alignment, but that would
get really ugly really quickly.

But we did drop support for SMP i386 quite some time ago, so perhaps
it is time to drop support for SMP Alpha pre-EV56.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-15 20:53                   ` Paul E. McKenney
@ 2014-01-15 23:21                     ` Peter Zijlstra
       [not found]                       ` <CA+55aFydYLQeBq=4AQQp_4dAnq09ocLmde1LFaXiNAJ=wJzfFA@mail.gmail.com>
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-15 23:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Richard Henderson, Matt Turner, Daniel J Blueman,
	Waiman Long, Linux Kernel, Ivan Kokshaysky

On Wed, Jan 15, 2014 at 12:53:46PM -0800, Paul E. McKenney wrote:
> But we did drop support for SMP i386 quite some time ago, so perhaps
> it is time to drop support for SMP Alpha pre-EV56.

So while the primitive is called smp_store_release() the !SMP variant
still does:
 
  *(volatile __type *) = ptr;

which should not compile on any Alpha pre EV56, SMP or no for __type ==
u8.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
       [not found]                       ` <CA+55aFydYLQeBq=4AQQp_4dAnq09ocLmde1LFaXiNAJ=wJzfFA@mail.gmail.com>
@ 2014-01-16 10:36                         ` Peter Zijlstra
  2014-01-18 10:01                           ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-16 10:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Turner, Waiman Long, Paul E. McKenney, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Thu, Jan 16, 2014 at 06:39:23AM +0700, Linus Torvalds wrote:
> On Jan 16, 2014 6:22 AM, "Peter Zijlstra" <peterz@infradead.org> wrote:
> >
> > So while the primitive is called smp_store_release() the !SMP variant
> > still does:
> >
> >   *(volatile __type *) = ptr;
> >
> > which should not compile on any Alpha pre EV56, SMP or no for __type ==
> > u8.
> 
> I'm not sure where you get that "should not compile" theory from.
> 
> I'm pretty sure it will compile just fine. It will just generate the same
> standard read-modify-write sequence (and not using the ldl/stc sequence
> either). Do you have any actual reason to believe it won't, apart from your
> theoretical wishes of how the world should work?

No, I earlier even said it probably would compile. My usage of 'should'
comes from how we've 'defined' volatile/ACCESS_ONCE in
Documentation/memory-barriers.txt. According to those constraints the
rmw cycle is not proper code.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-16 10:36                         ` Peter Zijlstra
@ 2014-01-18 10:01                           ` Paul E. McKenney
  2014-01-18 11:34                             ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-18 10:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Thu, Jan 16, 2014 at 11:36:59AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 06:39:23AM +0700, Linus Torvalds wrote:
> > On Jan 16, 2014 6:22 AM, "Peter Zijlstra" <peterz@infradead.org> wrote:
> > >
> > > So while the primitive is called smp_store_release() the !SMP variant
> > > still does:
> > >
> > >   *(volatile __type *) = ptr;
> > >
> > > which should not compile on any Alpha pre EV56, SMP or no for __type ==
> > > u8.
> > 
> > I'm not sure where you get that "should not compile" theory from.
> > 
> > I'm pretty sure it will compile just fine. It will just generate the same
> > standard read-modify-write sequence (and not using the ldl/stc sequence
> > either). Do you have any actual reason to believe it won't, apart from your
> > theoretical wishes of how the world should work?
> 
> No, I earlier even said it probably would compile. My usage of 'should'
> comes from how we've 'defined' volatile/ACCESS_ONCE in
> Documentation/memory-barriers.txt. According to those constraints the
> rmw cycle is not proper code.

OK, I will bite...  Aside from fine-grained code timing, what code could
you write to tell the difference between a real one-byte store and an
RMW emulating that store?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-18 10:01                           ` Paul E. McKenney
@ 2014-01-18 11:34                             ` Peter Zijlstra
  2014-01-18 12:25                               ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-18 11:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Sat, Jan 18, 2014 at 02:01:05AM -0800, Paul E. McKenney wrote:
> OK, I will bite...  Aside from fine-grained code timing, what code could
> you write to tell the difference between a real one-byte store and an
> RMW emulating that store?

Why isn't fine-grained code timing an issue? I'm sure Alpha people will
love it when their machine magically keels over every so often.

Suppose we have two bytes in a word that get concurrent updates:

union {
	struct {
		u8 a;
		u8 b;
	};
	int word;
} ponies = { .word = 0, };

then two threads concurrently do:

CPU0:		CPU1:

ponies.a = 5	ponies.b = 10

At which point you'd expect: a == 5 && b == 10

However, with a rmw you could end up like:

			load r, ponies.word
load r, ponies.word
and  r, ~0xFF
or   r, 5
store ponies.word, r
			and r, ~0xFF00
			or r, 10 << 8
			store ponies.word, r

which gives: a == 0 && b == 10

The same can be had on a single CPU if you make the second RMW an
interrupt.

In fact, we recently had such a RMW issue on PPC64 although from a
slightly different angle, but we managed to hit it quite consistently.
See commit ba1f14fbe7096.

The thing is, if we allow the above RMW 'atomic' store, we have to be
_very_ careful that there cannot be such overlapping stores, otherwise
things will go BOOM!

However, if we already have to make sure there's no overlapping stores,
we might as well write a wide store and not allow the narrow stores to
begin with, to force people to think about the issue.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-18 11:34                             ` Peter Zijlstra
@ 2014-01-18 12:25                               ` Paul E. McKenney
  2014-01-18 12:41                                 ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-18 12:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Sat, Jan 18, 2014 at 12:34:06PM +0100, Peter Zijlstra wrote:
> On Sat, Jan 18, 2014 at 02:01:05AM -0800, Paul E. McKenney wrote:
> > OK, I will bite...  Aside from fine-grained code timing, what code could
> > you write to tell the difference between a real one-byte store and an
> > RMW emulating that store?
> 
> Why isn't fine-grained code timing an issue? I'm sure Alpha people will
> love it when their machine magically keels over every so often.
> 
> Suppose we have two bytes in a word that get concurrent updates:
> 
> union {
> 	struct {
> 		u8 a;
> 		u8 b;
> 	};
> 	int word;
> } ponies = { .word = 0, };
> 
> then two threads concurrently do:
> 
> CPU0:		CPU1:
> 
> ponies.a = 5	ponies.b = 10
> 
> 
> At which point you'd expect: a == 5 && b == 10
> 
> However, with a rmw you could end up like:
> 
> 
> 			load r, ponies.word
> load r, ponies.word
> and  r, ~0xFF
> or   r, 5
> store ponies.word, r
> 			and r, ~0xFF00
> 			or r, 10 << 8
> 			store ponies.word, r
> 
> which gives: a == 0 && b == 10
> 
> The same can be had on a single CPU if you make the second RMW an
> interrupt.
> 
> 
> In fact, we recently had such a RMW issue on PPC64 although from a
> slightly different angle, but we managed to hit it quite consistently.
> See commit ba1f14fbe7096.
> 
> The thing is, if we allow the above RMW 'atomic' store, we have to be
> _very_ careful that there cannot be such overlapping stores, otherwise
> things will go BOOM!
> 
> However, if we already have to make sure there's no overlapping stores,
> we might as well write a wide store and not allow the narrow stores to
> begin with, to force people to think about the issue.

Ah, I was assuming atomic rmw, which for Alpha would be implemented using
the LL and SC instructions.  Yes, lots of overhead, but if the CPU
designers chose not to provide a load/store byte...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-18 12:25                               ` Paul E. McKenney
@ 2014-01-18 12:41                                 ` Peter Zijlstra
  2014-01-18 21:22                                   ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-18 12:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Sat, Jan 18, 2014 at 04:25:48AM -0800, Paul E. McKenney wrote:
> On Sat, Jan 18, 2014 at 12:34:06PM +0100, Peter Zijlstra wrote:
> > On Sat, Jan 18, 2014 at 02:01:05AM -0800, Paul E. McKenney wrote:
> > > OK, I will bite...  Aside from fine-grained code timing, what code could
> > > you write to tell the difference between a real one-byte store and an
> > > RMW emulating that store?
> > 
> > Why isn't fine-grained code timing an issue? I'm sure Alpha people will
> > love it when their machine magically keels over every so often.
> > 
> > Suppose we have two bytes in a word that get concurrent updates:
> > 
> > union {
> > 	struct {
> > 		u8 a;
> > 		u8 b;
> > 	};
> > 	int word;
> > } ponies = { .word = 0, };
> > 
> > then two threads concurrently do:
> > 
> > CPU0:		CPU1:
> > 
> > ponies.a = 5	ponies.b = 10
> > 
> > 
> > At which point you'd expect: a == 5 && b == 10
> > 
> > However, with a rmw you could end up like:
> > 
> > 
> > 			load r, ponies.word
> > load r, ponies.word
> > and  r, ~0xFF
> > or   r, 5
> > store ponies.word, r
> > 			and r, ~0xFF00
> > 			or r, 10 << 8
> > 			store ponies.word, r
> > 
> > which gives: a == 0 && b == 10
> > 
> > The same can be had on a single CPU if you make the second RMW an
> > interrupt.
> > 
> > 
> > In fact, we recently had such a RMW issue on PPC64 although from a
> > slightly different angle, but we managed to hit it quite consistently.
> > See commit ba1f14fbe7096.
> > 
> > The thing is, if we allow the above RMW 'atomic' store, we have to be
> > _very_ careful that there cannot be such overlapping stores, otherwise
> > things will go BOOM!
> > 
> > However, if we already have to make sure there's no overlapping stores,
> > we might as well write a wide store and not allow the narrow stores to
> > begin with, to force people to think about the issue.
> 
> Ah, I was assuming atomic rmw, which for Alpha would be implemented using
> the LL and SC instructions.  Yes, lots of overhead, but if the CPU
> designers chose not to provide a load/store byte...

I don't see how ll/sc will help any. Suppose we do the a store as
smp_store_release() using ll/sc but the b store is unaware and doesn't
do an ll/sc.

Then we're still up shit creek without no paddle.

Whatever you're going to do, you need to be intimately aware of what the
other bits in your word are doing.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-18 12:41                                 ` Peter Zijlstra
@ 2014-01-18 21:22                                   ` Paul E. McKenney
  2014-01-19  0:57                                     ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-18 21:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Sat, Jan 18, 2014 at 01:41:36PM +0100, Peter Zijlstra wrote:
> On Sat, Jan 18, 2014 at 04:25:48AM -0800, Paul E. McKenney wrote:
> > On Sat, Jan 18, 2014 at 12:34:06PM +0100, Peter Zijlstra wrote:
> > > On Sat, Jan 18, 2014 at 02:01:05AM -0800, Paul E. McKenney wrote:
> > > > OK, I will bite...  Aside from fine-grained code timing, what code could
> > > > you write to tell the difference between a real one-byte store and an
> > > > RMW emulating that store?
> > > 
> > > Why isn't fine-grained code timing an issue? I'm sure Alpha people will
> > > love it when their machine magically keels over every so often.
> > > 
> > > Suppose we have two bytes in a word that get concurrent updates:
> > > 
> > > union {
> > > 	struct {
> > > 		u8 a;
> > > 		u8 b;
> > > 	};
> > > 	int word;
> > > } ponies = { .word = 0, };
> > > 
> > > then two threads concurrently do:
> > > 
> > > CPU0:		CPU1:
> > > 
> > > ponies.a = 5	ponies.b = 10
> > > 
> > > 
> > > At which point you'd expect: a == 5 && b == 10
> > > 
> > > However, with a rmw you could end up like:
> > > 
> > > 
> > > 			load r, ponies.word
> > > load r, ponies.word
> > > and  r, ~0xFF
> > > or   r, 5
> > > store ponies.word, r
> > > 			and r, ~0xFF00
> > > 			or r, 10 << 8
> > > 			store ponies.word, r
> > > 
> > > which gives: a == 0 && b == 10
> > > 
> > > The same can be had on a single CPU if you make the second RMW an
> > > interrupt.
> > > 
> > > 
> > > In fact, we recently had such a RMW issue on PPC64 although from a
> > > slightly different angle, but we managed to hit it quite consistently.
> > > See commit ba1f14fbe7096.
> > > 
> > > The thing is, if we allow the above RMW 'atomic' store, we have to be
> > > _very_ careful that there cannot be such overlapping stores, otherwise
> > > things will go BOOM!
> > > 
> > > However, if we already have to make sure there's no overlapping stores,
> > > we might as well write a wide store and not allow the narrow stores to
> > > begin with, to force people to think about the issue.
> > 
> > Ah, I was assuming atomic rmw, which for Alpha would be implemented using
> > the LL and SC instructions.  Yes, lots of overhead, but if the CPU
> > designers chose not to provide a load/store byte...
> 
> I don't see how ll/sc will help any. Suppose we do the a store as
> smp_store_release() using ll/sc but the b store is unaware and doesn't
> do an ll/sc.
> 
> Then we're still up shit creek without no paddle.
> 
> Whatever you're going to do, you need to be intimately aware of what the
> other bits in your word are doing.

Yes, this requires that -all- updates to the fields in the machine word
in question use atomic rmw.  Which would not be pretty from a core-code
perspective.  Hence my suggestion of ceasing Linux-kernel support for
DEC Alpha CPUs that don't support byte operations.  Also need 16-bit
operations as well, of course...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-18 21:22                                   ` Paul E. McKenney
@ 2014-01-19  0:57                                     ` Linus Torvalds
  2014-01-19  8:04                                       ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2014-01-19  0:57 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Zijlstra, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Sat, Jan 18, 2014 at 1:22 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> Yes, this requires that -all- updates to the fields in the machine word
> in question use atomic rmw.  Which would not be pretty from a core-code
> perspective.  Hence my suggestion of ceasing Linux-kernel support for
> DEC Alpha CPUs that don't support byte operations.  Also need 16-bit
> operations as well, of course...

I'm not seeing this.

Why the hell would you have byte- or halfword-sized versions of the
store_release or load_acquire things on alpha anyway?

What it means is that data structures that do locking or atomics need
to be "int" or "long" on alpha.  That has always been true. What do
you claim has changed?

                  Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-19  0:57                                     ` Linus Torvalds
@ 2014-01-19  8:04                                       ` Paul E. McKenney
  2014-01-19 19:56                                         ` Linus Torvalds
  2014-01-21 15:02                                         ` Waiman Long
  0 siblings, 2 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-19  8:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Sat, Jan 18, 2014 at 04:57:05PM -0800, Linus Torvalds wrote:
> On Sat, Jan 18, 2014 at 1:22 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > Yes, this requires that -all- updates to the fields in the machine word
> > in question use atomic rmw.  Which would not be pretty from a core-code
> > perspective.  Hence my suggestion of ceasing Linux-kernel support for
> > DEC Alpha CPUs that don't support byte operations.  Also need 16-bit
> > operations as well, of course...
> 
> I'm not seeing this.
> 
> Why the hell would you have byte- or halfword-sized versions of the
> store_release or load_acquire things on alpha anyway?
> 
> What it means is that data structures that do locking or atomics need
> to be "int" or "long" on alpha.  That has always been true. What do
> you claim has changed?

OK, another approach would be to never add "select ARCH_USE_QUEUE_RWLOCK"
on Alpha, at least if the queued rwlocks really do want to atomically
manipulate bytes.  After all, the Alpha systems that I know about don't
have enough CPUs to make queued rwlocks necessary anyway.

Much simpler solution!

Is this what you were getting at, or am I missing your point?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-19  8:04                                       ` Paul E. McKenney
@ 2014-01-19 19:56                                         ` Linus Torvalds
  2014-01-20  0:52                                           ` Paul E. McKenney
  2014-01-21 15:02                                         ` Waiman Long
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2014-01-19 19:56 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Zijlstra, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Sun, Jan 19, 2014 at 12:04 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> OK, another approach would be to never add "select ARCH_USE_QUEUE_RWLOCK"
> on Alpha, at least if the queued rwlocks really do want to atomically
> manipulate bytes.  After all, the Alpha systems that I know about don't
> have enough CPUs to make queued rwlocks necessary anyway.
>
> Much simpler solution!
>
> Is this what you were getting at, or am I missing your point?

You're missing something.

Just make the "writer" field be an "int" on little-endian archiectures
(like alpha).

There is no reason for that field to be a "char" to begin with, as far
as I can tell, since the padding of the structure means that it
doesn't save any space. But even if that wasn't true, we could make an
arch-specific type for "minimum type for locking".

So my *point* was that it should be easy enough to just make sure that
any data structures used for locking have types that are appropriate
for that locking.

                      Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-19 19:56                                         ` Linus Torvalds
@ 2014-01-20  0:52                                           ` Paul E. McKenney
  0 siblings, 0 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-20  0:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Matt Turner, Waiman Long, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Sun, Jan 19, 2014 at 11:56:02AM -0800, Linus Torvalds wrote:
> On Sun, Jan 19, 2014 at 12:04 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > OK, another approach would be to never add "select ARCH_USE_QUEUE_RWLOCK"
> > on Alpha, at least if the queued rwlocks really do want to atomically
> > manipulate bytes.  After all, the Alpha systems that I know about don't
> > have enough CPUs to make queued rwlocks necessary anyway.
> >
> > Much simpler solution!
> >
> > Is this what you were getting at, or am I missing your point?
> 
> You're missing something.
> 
> Just make the "writer" field be an "int" on little-endian archiectures
> (like alpha).
> 
> There is no reason for that field to be a "char" to begin with, as far
> as I can tell, since the padding of the structure means that it
> doesn't save any space. But even if that wasn't true, we could make an
> arch-specific type for "minimum type for locking".

On 64-bit systems (which includes Alpha), agreed, the field can be a
32-bit portion of a 64-bit structure that is then manipulated atomically.
Many 32-bit systems need the reader and writer counts to fix in 32 bits
in order to allow things like queue_read_trylock() to correctly account
for both readers and writers.

If there was a 32-bit system running Linux that did not support byte
accesses, there would be a problem, but I don't know of any such system.

> So my *point* was that it should be easy enough to just make sure that
> any data structures used for locking have types that are appropriate
> for that locking.

So something like the following for the qrwlock definition, with
appropriate C-preprocessor wrappers for the atomic-add accesses?

							Thanx, Paul

------------------------------------------------------------------------

typedef struct qrwlock {
	union qrwcnts {
#ifdef CONFIG_64B
		struct (
			int writer;
			int reader;
		};
		atomic_long_t rwa;
		u64	 rwc;
#else
		struct {
#ifdef __LITTLE_ENDIAN
			u8  writer;	/* Writer state		*/
#else
			u16 r16;	/* Reader count - msb	*/
			u8  r8;		/* Reader count - lsb	*/
			u8  writer;	/* Writer state		*/
#endif
		};
		atomic_t    rwa;	/* Reader/writer atomic	*/
		u32	    rwc;	/* Reader/writer counts */
	} cnts;
#endif
	struct mcs_spinlock *waitq;	/* Tail of waiting queue */
} arch_rwlock_t;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-19  8:04                                       ` Paul E. McKenney
  2014-01-19 19:56                                         ` Linus Torvalds
@ 2014-01-21 15:02                                         ` Waiman Long
  2014-01-21 15:41                                           ` Peter Zijlstra
  1 sibling, 1 reply; 31+ messages in thread
From: Waiman Long @ 2014-01-21 15:02 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, Peter Zijlstra, Matt Turner, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On 01/19/2014 03:04 AM, Paul E. McKenney wrote:
> On Sat, Jan 18, 2014 at 04:57:05PM -0800, Linus Torvalds wrote:
>> On Sat, Jan 18, 2014 at 1:22 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com>  wrote:
>>> Yes, this requires that -all- updates to the fields in the machine word
>>> in question use atomic rmw.  Which would not be pretty from a core-code
>>> perspective.  Hence my suggestion of ceasing Linux-kernel support for
>>> DEC Alpha CPUs that don't support byte operations.  Also need 16-bit
>>> operations as well, of course...
>> I'm not seeing this.
>>
>> Why the hell would you have byte- or halfword-sized versions of the
>> store_release or load_acquire things on alpha anyway?
>>
>> What it means is that data structures that do locking or atomics need
>> to be "int" or "long" on alpha.  That has always been true. What do
>> you claim has changed?
> OK, another approach would be to never add "select ARCH_USE_QUEUE_RWLOCK"
> on Alpha, at least if the queued rwlocks really do want to atomically
> manipulate bytes.  After all, the Alpha systems that I know about don't
> have enough CPUs to make queued rwlocks necessary anyway.
>
> Much simpler solution!
>
> Is this what you were getting at, or am I missing your point?
>
> 							Thanx, Paul
>

My latest v9 series of qrwlock patch will automatically adapt to the 
lack of atomic byte access by using an atomic integer instruction 
instead. So the new series should work for pre-EV56 Alpha, it is just a 
bit less efficient in this case.

-Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-21 15:02                                         ` Waiman Long
@ 2014-01-21 15:41                                           ` Peter Zijlstra
  2014-01-21 16:16                                             ` Waiman Long
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-21 15:41 UTC (permalink / raw)
  To: Waiman Long
  Cc: paulmck, Linus Torvalds, Matt Turner, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On Tue, Jan 21, 2014 at 10:02:06AM -0500, Waiman Long wrote:
> My latest v9 series of qrwlock patch will automatically adapt to the lack of
> atomic byte access by using an atomic integer instruction instead. So the
> new series should work for pre-EV56 Alpha, it is just a bit less efficient
> in this case.

See my other email; I don't think you can do that without also changing
the implementation of the queue_read_{try}lock() functions.

Without those changes you can have transient values in your 'read-count'
part of the word and a full word write will wreck things.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
  2014-01-21 15:41                                           ` Peter Zijlstra
@ 2014-01-21 16:16                                             ` Waiman Long
  0 siblings, 0 replies; 31+ messages in thread
From: Waiman Long @ 2014-01-21 16:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, Linus Torvalds, Matt Turner, Linux Kernel,
	Ivan Kokshaysky, Daniel J Blueman, Richard Henderson

On 01/21/2014 10:41 AM, Peter Zijlstra wrote:
> On Tue, Jan 21, 2014 at 10:02:06AM -0500, Waiman Long wrote:
>> My latest v9 series of qrwlock patch will automatically adapt to the lack of
>> atomic byte access by using an atomic integer instruction instead. So the
>> new series should work for pre-EV56 Alpha, it is just a bit less efficient
>> in this case.
> See my other email; I don't think you can do that without also changing
> the implementation of the queue_read_{try}lock() functions.
>
> Without those changes you can have transient values in your 'read-count'
> part of the word and a full word write will wreck things.

I don't see any problem with my current logic. If a writer has the write 
lock, the writer byte has to have a value of 0xff. So atomically 
subtracting 0xff from it will guarantee that the writer byte will become 
zero, which is the same as assigning a zero value to that byte. The only 
difference is that an atomic subtract instruction will need to be used 
instead of a simple byte assignment.

Please let me know if there is any flaw in my logic.

-Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2014-01-21 16:16 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-13  2:47 [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Daniel J Blueman
2014-01-13  3:49 ` Paul E. McKenney
2014-01-13 16:41 ` Waiman Long
2014-01-14  2:28   ` Daniel J Blueman
2014-01-14 11:03     ` Peter Zijlstra
2014-01-14 15:25       ` Waiman Long
2014-01-14 17:08       ` Matt Turner
2014-01-14 18:01         ` Richard Henderson
2014-01-14 19:09           ` Waiman Long
2014-01-14 20:20             ` Peter Zijlstra
2014-01-14 23:44           ` Paul E. McKenney
2014-01-15  0:25             ` Linus Torvalds
2014-01-15  2:39               ` Paul E. McKenney
2014-01-15  8:07                 ` Peter Zijlstra
2014-01-15 20:53                   ` Paul E. McKenney
2014-01-15 23:21                     ` Peter Zijlstra
     [not found]                       ` <CA+55aFydYLQeBq=4AQQp_4dAnq09ocLmde1LFaXiNAJ=wJzfFA@mail.gmail.com>
2014-01-16 10:36                         ` Peter Zijlstra
2014-01-18 10:01                           ` Paul E. McKenney
2014-01-18 11:34                             ` Peter Zijlstra
2014-01-18 12:25                               ` Paul E. McKenney
2014-01-18 12:41                                 ` Peter Zijlstra
2014-01-18 21:22                                   ` Paul E. McKenney
2014-01-19  0:57                                     ` Linus Torvalds
2014-01-19  8:04                                       ` Paul E. McKenney
2014-01-19 19:56                                         ` Linus Torvalds
2014-01-20  0:52                                           ` Paul E. McKenney
2014-01-21 15:02                                         ` Waiman Long
2014-01-21 15:41                                           ` Peter Zijlstra
2014-01-21 16:16                                             ` Waiman Long
2014-01-15  3:08             ` Daniel J Blueman
  -- strict thread matches above, loose matches on Subject: below --
2014-01-08 16:59 [PATCH v8 0/4] qrwlock: Introducing a queue read/write lock implementation Waiman Long
2014-01-08 16:59 ` [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).