* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
@ 2014-01-13 2:47 Daniel J Blueman
2014-01-13 3:49 ` Paul E. McKenney
2014-01-13 16:41 ` Waiman Long
0 siblings, 2 replies; 31+ messages in thread
From: Daniel J Blueman @ 2014-01-13 2:47 UTC (permalink / raw)
To: Waiman Long; +Cc: Paul E. McKenney, Linux Kernel
On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long wrote:
> This patch modifies the queue_write_unlock() function to use the
> new smp_store_release() function in another pending patch. It also
> removes the temporary implementation of smp_load_acquire() and
> smp_store_release() function in qrwlock.c.
>
> This patch should only be merged if PeterZ's linux-arch patch patch
> was merged.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
> include/asm-generic/qrwlock.h | 4 +---
> kernel/locking/qrwlock.c | 34 ----------------------------------
> 2 files changed, 1 insertions(+), 37 deletions(-)
>
> diff --git a/include/asm-generic/qrwlock.h
b/include/asm-generic/qrwlock.h
> index 2b9a7b4..4d4bd04 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct
qrwlock *lock)
> /*
> * Make sure that none of the critical section will be leaked out.
> */
> - smp_mb__before_clear_bit();
> - ACCESS_ONCE(lock->cnts.writer) = 0;
> - smp_mb__after_clear_bit();
> + smp_store_release(&lock->cnts.writer, 0)
This will fail compilation, so probably needs further testing with
Peter's load_acquire/store_release barrier patches.
Thanks,
Daniel
--
Daniel J Blueman
Principal Software Engineer, Numascale
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-13 2:47 [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Daniel J Blueman
@ 2014-01-13 3:49 ` Paul E. McKenney
2014-01-13 16:41 ` Waiman Long
1 sibling, 0 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-13 3:49 UTC (permalink / raw)
To: Daniel J Blueman; +Cc: Waiman Long, Linux Kernel
On Mon, Jan 13, 2014 at 10:47:36AM +0800, Daniel J Blueman wrote:
> On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long wrote:
> > This patch modifies the queue_write_unlock() function to use the
> > new smp_store_release() function in another pending patch. It also
> > removes the temporary implementation of smp_load_acquire() and
> > smp_store_release() function in qrwlock.c.
> >
> > This patch should only be merged if PeterZ's linux-arch patch patch
> > was merged.
> >
> > Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> > include/asm-generic/qrwlock.h | 4 +---
> > kernel/locking/qrwlock.c | 34 ----------------------------------
> > 2 files changed, 1 insertions(+), 37 deletions(-)
> >
> > diff --git a/include/asm-generic/qrwlock.h
> b/include/asm-generic/qrwlock.h
> > index 2b9a7b4..4d4bd04 100644
> > --- a/include/asm-generic/qrwlock.h
> > +++ b/include/asm-generic/qrwlock.h
> > @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct
> qrwlock *lock)
> > /*
> > * Make sure that none of the critical section will be leaked out.
> > */
> > - smp_mb__before_clear_bit();
> > - ACCESS_ONCE(lock->cnts.writer) = 0;
> > - smp_mb__after_clear_bit();
> > + smp_store_release(&lock->cnts.writer, 0)
>
> This will fail compilation, so probably needs further testing with
> Peter's load_acquire/store_release barrier patches.
Peter's patch just hit -tip, so this should be esay to do. In Waiman's
defense, he does call attention to this in the commit log.
Thanx, Paul
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-13 2:47 [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Daniel J Blueman
2014-01-13 3:49 ` Paul E. McKenney
@ 2014-01-13 16:41 ` Waiman Long
2014-01-14 2:28 ` Daniel J Blueman
1 sibling, 1 reply; 31+ messages in thread
From: Waiman Long @ 2014-01-13 16:41 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Daniel J Blueman, Paul E. McKenney, Linux Kernel
On 01/12/2014 09:47 PM, Daniel J Blueman wrote:
> On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long wrote:
> > This patch modifies the queue_write_unlock() function to use the
> > new smp_store_release() function in another pending patch. It also
> > removes the temporary implementation of smp_load_acquire() and
> > smp_store_release() function in qrwlock.c.
> >
> > This patch should only be merged if PeterZ's linux-arch patch patch
> > was merged.
> >
> > Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> > include/asm-generic/qrwlock.h | 4 +---
> > kernel/locking/qrwlock.c | 34
> ----------------------------------
> > 2 files changed, 1 insertions(+), 37 deletions(-)
> >
> > diff --git a/include/asm-generic/qrwlock.h
> b/include/asm-generic/qrwlock.h
> > index 2b9a7b4..4d4bd04 100644
> > --- a/include/asm-generic/qrwlock.h
> > +++ b/include/asm-generic/qrwlock.h
> > @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct
> qrwlock *lock)
> > /*
> > * Make sure that none of the critical section will be leaked out.
> > */
> > - smp_mb__before_clear_bit();
> > - ACCESS_ONCE(lock->cnts.writer) = 0;
> > - smp_mb__after_clear_bit();
> > + smp_store_release(&lock->cnts.writer, 0)
>
> This will fail compilation, so probably needs further testing with
> Peter's load_acquire/store_release barrier patches.
>
Peter,
I found out that the build failure was caused by the fact that the
__native_word() macro (used internally by compiletime_assert_atomic())
allows only a size of 4 or 8 for x86-64. The data type that I used is a
byte. Is there a reason why byte and short are not considered native?
-Longman
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-13 16:41 ` Waiman Long
@ 2014-01-14 2:28 ` Daniel J Blueman
2014-01-14 11:03 ` Peter Zijlstra
0 siblings, 1 reply; 31+ messages in thread
From: Daniel J Blueman @ 2014-01-14 2:28 UTC (permalink / raw)
To: Waiman Long, Peter Zijlstra; +Cc: Paul E. McKenney, Linux Kernel
On 14/01/2014 00:41, Waiman Long wrote:
> On 01/12/2014 09:47 PM, Daniel J Blueman wrote:
>> On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long wrote:
>> > This patch modifies the queue_write_unlock() function to use the
>> > new smp_store_release() function in another pending patch. It also
>> > removes the temporary implementation of smp_load_acquire() and
>> > smp_store_release() function in qrwlock.c.
>> >
>> > This patch should only be merged if PeterZ's linux-arch patch patch
>> > was merged.
>> >
>> > Signed-off-by: Waiman Long <Waiman.Long@hp.com>
>> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> > ---
>> > include/asm-generic/qrwlock.h | 4 +---
>> > kernel/locking/qrwlock.c | 34
>> ----------------------------------
>> > 2 files changed, 1 insertions(+), 37 deletions(-)
>> >
>> > diff --git a/include/asm-generic/qrwlock.h
>> b/include/asm-generic/qrwlock.h
>> > index 2b9a7b4..4d4bd04 100644
>> > --- a/include/asm-generic/qrwlock.h
>> > +++ b/include/asm-generic/qrwlock.h
>> > @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct
>> qrwlock *lock)
>> > /*
>> > * Make sure that none of the critical section will be leaked out.
>> > */
>> > - smp_mb__before_clear_bit();
>> > - ACCESS_ONCE(lock->cnts.writer) = 0;
>> > - smp_mb__after_clear_bit();
>> > + smp_store_release(&lock->cnts.writer, 0)
>>
>> This will fail compilation, so probably needs further testing with
>> Peter's load_acquire/store_release barrier patches.
>>
>
> Peter,
>
> I found out that the build failure was caused by the fact that the
> __native_word() macro (used internally by compiletime_assert_atomic())
> allows only a size of 4 or 8 for x86-64. The data type that I used is a
> byte. Is there a reason why byte and short are not considered native?
It seems likely it was implemented like that since there was no existing
need; long can be relied on as the largest native type, so this should
suffice and works here:
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index fe7a686..dac91d7 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -299,8 +299,8 @@ void ftrace_likely_update(struct ftrace_branch_data
*f, int val, int expect);
#endif
/* Is this type a native word size -- useful for atomic operations */
-#ifndef __native_word
-# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) ==
sizeof(long))
+#ifndef __native_type
+# define __native_type(t) (sizeof(t) <= sizeof(long))
#endif
/* Compile time object size, -1 for unknown */
@@ -343,8 +343,8 @@ void ftrace_likely_update(struct ftrace_branch_data
*f, int val, int expect);
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
#define compiletime_assert_atomic_type(t) \
- compiletime_assert(__native_word(t), \
- "Need native word sized stores/loads for atomicity.")
+ compiletime_assert(__native_type(t), \
+ "Need native sized stores/loads for atomicity.")
/*
* Prevent the compiler from merging or refetching accesses. The compiler
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
--
Daniel J Blueman
Principal Software Engineer, Numascale
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 2:28 ` Daniel J Blueman
@ 2014-01-14 11:03 ` Peter Zijlstra
2014-01-14 15:25 ` Waiman Long
2014-01-14 17:08 ` Matt Turner
0 siblings, 2 replies; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-14 11:03 UTC (permalink / raw)
To: Daniel J Blueman
Cc: Waiman Long, Paul E. McKenney, Linux Kernel, rth, ink, mattst88,
Linus Torvalds
On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
> >Peter,
> >
> >I found out that the build failure was caused by the fact that the
> >__native_word() macro (used internally by compiletime_assert_atomic())
> >allows only a size of 4 or 8 for x86-64. The data type that I used is a
> >byte. Is there a reason why byte and short are not considered native?
>
> It seems likely it was implemented like that since there was no existing
> need; long can be relied on as the largest native type, so this should
> suffice and works here:
There's Alphas that cannot actually atomically adres a byte; I do not
konw if Linux cares about them, but if it does, we cannot in fact rely
on this in generic primitives like this.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 11:03 ` Peter Zijlstra
@ 2014-01-14 15:25 ` Waiman Long
2014-01-14 17:08 ` Matt Turner
1 sibling, 0 replies; 31+ messages in thread
From: Waiman Long @ 2014-01-14 15:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Daniel J Blueman, Paul E. McKenney, Linux Kernel, rth, ink,
mattst88, Linus Torvalds
On 01/14/2014 06:03 AM, Peter Zijlstra wrote:
> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>> Peter,
>>>
>>> I found out that the build failure was caused by the fact that the
>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>> byte. Is there a reason why byte and short are not considered native?
>> It seems likely it was implemented like that since there was no existing
>> need; long can be relied on as the largest native type, so this should
>> suffice and works here:
> There's Alphas that cannot actually atomically adres a byte; I do not
> konw if Linux cares about them, but if it does, we cannot in fact rely
> on this in generic primitives like this.
Thank for the explanation.
Can we allow architectural override of __native_word() macro? Like
#ifdef __arch_native_word
#define __native_word(t) __arch_native_word(t)
#else
#define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) ==
sizeof(long))
#endif
In this way, we can allow x86 to support byte-based atomic type while
restricting the generic macro to int and long only. I will also modify
the code to use cmpxchg() when byte is not an atomic type.
-Longman
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 11:03 ` Peter Zijlstra
2014-01-14 15:25 ` Waiman Long
@ 2014-01-14 17:08 ` Matt Turner
2014-01-14 18:01 ` Richard Henderson
1 sibling, 1 reply; 31+ messages in thread
From: Matt Turner @ 2014-01-14 17:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Daniel J Blueman, Waiman Long, Paul E. McKenney, Linux Kernel,
Richard Henderson, Ivan Kokshaysky, Linus Torvalds
On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>> >Peter,
>> >
>> >I found out that the build failure was caused by the fact that the
>> >__native_word() macro (used internally by compiletime_assert_atomic())
>> >allows only a size of 4 or 8 for x86-64. The data type that I used is a
>> >byte. Is there a reason why byte and short are not considered native?
>>
>> It seems likely it was implemented like that since there was no existing
>> need; long can be relied on as the largest native type, so this should
>> suffice and works here:
>
> There's Alphas that cannot actually atomically adres a byte; I do not
> konw if Linux cares about them, but if it does, we cannot in fact rely
> on this in generic primitives like this.
That's right, and thanks for the heads-up. Alpha can only address 4
and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
Alphas can address < 4 bytes atomically.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 17:08 ` Matt Turner
@ 2014-01-14 18:01 ` Richard Henderson
2014-01-14 19:09 ` Waiman Long
2014-01-14 23:44 ` Paul E. McKenney
0 siblings, 2 replies; 31+ messages in thread
From: Richard Henderson @ 2014-01-14 18:01 UTC (permalink / raw)
To: Matt Turner, Peter Zijlstra
Cc: Daniel J Blueman, Waiman Long, Paul E. McKenney, Linux Kernel,
Ivan Kokshaysky, Linus Torvalds
On 01/14/2014 09:08 AM, Matt Turner wrote:
> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>> Peter,
>>>>
>>>> I found out that the build failure was caused by the fact that the
>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>> byte. Is there a reason why byte and short are not considered native?
>>>
>>> It seems likely it was implemented like that since there was no existing
>>> need; long can be relied on as the largest native type, so this should
>>> suffice and works here:
>>
>> There's Alphas that cannot actually atomically adres a byte; I do not
>> konw if Linux cares about them, but if it does, we cannot in fact rely
>> on this in generic primitives like this.
>
> That's right, and thanks for the heads-up. Alpha can only address 4
> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
>
> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
> Alphas can address < 4 bytes atomically.
>
Emulated with aligned 4 byte atomics, and masking. The same is true for arm,
ppc, mips which, depending on cpu, also lack < 4 byte atomics.
r~
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 18:01 ` Richard Henderson
@ 2014-01-14 19:09 ` Waiman Long
2014-01-14 20:20 ` Peter Zijlstra
2014-01-14 23:44 ` Paul E. McKenney
1 sibling, 1 reply; 31+ messages in thread
From: Waiman Long @ 2014-01-14 19:09 UTC (permalink / raw)
To: Richard Henderson
Cc: Matt Turner, Peter Zijlstra, Daniel J Blueman, Paul E. McKenney,
Linux Kernel, Ivan Kokshaysky, Linus Torvalds
On 01/14/2014 01:01 PM, Richard Henderson wrote:
> On 01/14/2014 09:08 AM, Matt Turner wrote:
>> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra<peterz@infradead.org> wrote:
>>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>>> Peter,
>>>>>
>>>>> I found out that the build failure was caused by the fact that the
>>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>>> byte. Is there a reason why byte and short are not considered native?
>>>> It seems likely it was implemented like that since there was no existing
>>>> need; long can be relied on as the largest native type, so this should
>>>> suffice and works here:
>>> There's Alphas that cannot actually atomically adres a byte; I do not
>>> konw if Linux cares about them, but if it does, we cannot in fact rely
>>> on this in generic primitives like this.
>> That's right, and thanks for the heads-up. Alpha can only address 4
>> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
>>
>> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
>> Alphas can address< 4 bytes atomically.
>>
> Emulated with aligned 4 byte atomics, and masking. The same is true for arm,
> ppc, mips which, depending on cpu, also lack< 4 byte atomics.
>
I would like to know if the action of writing out a byte (e.g. *byte =
0) is atomic in those architectures or is emulated by a
compiler-generated software read-modify-write.
-Longman
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 19:09 ` Waiman Long
@ 2014-01-14 20:20 ` Peter Zijlstra
0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-14 20:20 UTC (permalink / raw)
To: Waiman Long
Cc: Richard Henderson, Matt Turner, Daniel J Blueman,
Paul E. McKenney, Linux Kernel, Ivan Kokshaysky, Linus Torvalds
On Tue, Jan 14, 2014 at 02:09:30PM -0500, Waiman Long wrote:
> I would like to know if the action of writing out a byte (e.g. *byte = 0) is
> atomic in those architectures or is emulated by a compiler-generated
> software read-modify-write.
So on Alpha pre ev56 something like:
*(volatile u8 *)foo = 0;
_Should_ cause a compile error as the hardware has to do a rmw which is
not compatible with the requirements for volatile -- that said I do not
know if a compiler will actually generate this error.
I can well imagine other load-store archs suffering similar problems,
although I'm not aware of any.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 18:01 ` Richard Henderson
2014-01-14 19:09 ` Waiman Long
@ 2014-01-14 23:44 ` Paul E. McKenney
2014-01-15 0:25 ` Linus Torvalds
2014-01-15 3:08 ` Daniel J Blueman
1 sibling, 2 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-14 23:44 UTC (permalink / raw)
To: Richard Henderson
Cc: Matt Turner, Peter Zijlstra, Daniel J Blueman, Waiman Long,
Linux Kernel, Ivan Kokshaysky, Linus Torvalds
On Tue, Jan 14, 2014 at 10:01:04AM -0800, Richard Henderson wrote:
> On 01/14/2014 09:08 AM, Matt Turner wrote:
> > On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
> >>>> Peter,
> >>>>
> >>>> I found out that the build failure was caused by the fact that the
> >>>> __native_word() macro (used internally by compiletime_assert_atomic())
> >>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
> >>>> byte. Is there a reason why byte and short are not considered native?
> >>>
> >>> It seems likely it was implemented like that since there was no existing
> >>> need; long can be relied on as the largest native type, so this should
> >>> suffice and works here:
> >>
> >> There's Alphas that cannot actually atomically adres a byte; I do not
> >> konw if Linux cares about them, but if it does, we cannot in fact rely
> >> on this in generic primitives like this.
> >
> > That's right, and thanks for the heads-up. Alpha can only address 4
> > and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
> >
> > The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
> > Alphas can address < 4 bytes atomically.
>
> Emulated with aligned 4 byte atomics, and masking. The same is true for arm,
> ppc, mips which, depending on cpu, also lack < 4 byte atomics.
Which means that Alpha should be able to similarly emulate 1-byte and
2-byte atomics, correct?
Thanx, Paul
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 23:44 ` Paul E. McKenney
@ 2014-01-15 0:25 ` Linus Torvalds
2014-01-15 2:39 ` Paul E. McKenney
2014-01-15 3:08 ` Daniel J Blueman
1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2014-01-15 0:25 UTC (permalink / raw)
To: Paul McKenney
Cc: Richard Henderson, Matt Turner, Peter Zijlstra, Daniel J Blueman,
Waiman Long, Linux Kernel, Ivan Kokshaysky
On Wed, Jan 15, 2014 at 6:44 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> Which means that Alpha should be able to similarly emulate 1-byte and
> 2-byte atomics, correct?
Not reasonably, no.
The ldl/stc implementation on early alpha was so broken as to be
unusable. It's not actually done in the cache, it WENT OUT ON THE BUS.
We're talking 70's style "external lock signal" kind of things like
the 8086 did for locked cycles before the advent of caches, the kind
that nobody sane has done for a long long time.
So realistically, you absolutely do not want to use those things to
emulate atomic byte/word accesses. The whole point of "load_acquire()"
and "store_release()" is that it's supposed to be cheaper than a
locked access, and can be done with just a barrier instruction or a
special instruction flag.
If you just want to do a store release, on alpha you'd want to
implement that as a full memory barrier followed by a store. It
doesn't get the advantage of a real release consistency model, but at
least it's not doing an external bus access. But you can only do that
store as a 4-byte or 8-byte store.on the older alphas (byte and word
stores work on newer ones).
Of course, it's entirely possible that nobody cares..
Linus
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-15 0:25 ` Linus Torvalds
@ 2014-01-15 2:39 ` Paul E. McKenney
2014-01-15 8:07 ` Peter Zijlstra
0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-15 2:39 UTC (permalink / raw)
To: Linus Torvalds
Cc: Richard Henderson, Matt Turner, Peter Zijlstra, Daniel J Blueman,
Waiman Long, Linux Kernel, Ivan Kokshaysky
On Wed, Jan 15, 2014 at 07:25:04AM +0700, Linus Torvalds wrote:
> On Wed, Jan 15, 2014 at 6:44 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > Which means that Alpha should be able to similarly emulate 1-byte and
> > 2-byte atomics, correct?
>
> Not reasonably, no.
>
> The ldl/stc implementation on early alpha was so broken as to be
> unusable. It's not actually done in the cache, it WENT OUT ON THE BUS.
> We're talking 70's style "external lock signal" kind of things like
> the 8086 did for locked cycles before the advent of caches, the kind
> that nobody sane has done for a long long time.
>
> So realistically, you absolutely do not want to use those things to
> emulate atomic byte/word accesses. The whole point of "load_acquire()"
> and "store_release()" is that it's supposed to be cheaper than a
> locked access, and can be done with just a barrier instruction or a
> special instruction flag.
>
> If you just want to do a store release, on alpha you'd want to
> implement that as a full memory barrier followed by a store. It
> doesn't get the advantage of a real release consistency model, but at
> least it's not doing an external bus access. But you can only do that
> store as a 4-byte or 8-byte store.on the older alphas (byte and word
> stores work on newer ones).
>
> Of course, it's entirely possible that nobody cares..
That would be my hope. ;-)
If nobody cares about Alpha period, it is easy. However, the last time
that I tried that approach, they sent me a URL of a wiki showing Alpha
systems still running mainline. But a slow-but-working approach for
Alpha does seem reasonable, even for those still running Linux on Alpha.
Thanx, Paul
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-15 2:39 ` Paul E. McKenney
@ 2014-01-15 8:07 ` Peter Zijlstra
2014-01-15 20:53 ` Paul E. McKenney
0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-15 8:07 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Linus Torvalds, Richard Henderson, Matt Turner, Daniel J Blueman,
Waiman Long, Linux Kernel, Ivan Kokshaysky
On Tue, Jan 14, 2014 at 06:39:58PM -0800, Paul E. McKenney wrote:
> > If you just want to do a store release, on alpha you'd want to
> > implement that as a full memory barrier followed by a store. It
> > doesn't get the advantage of a real release consistency model, but at
> > least it's not doing an external bus access. But you can only do that
> > store as a 4-byte or 8-byte store.on the older alphas (byte and word
> > stores work on newer ones).
> >
> > Of course, it's entirely possible that nobody cares..
>
> That would be my hope. ;-)
>
> If nobody cares about Alpha period, it is easy. However, the last time
> that I tried that approach, they sent me a URL of a wiki showing Alpha
> systems still running mainline. But a slow-but-working approach for
> Alpha does seem reasonable, even for those still running Linux on Alpha.
Well, if they're all EV56 or later we're still good as they can actually
do what we need.
But I don't think a ll/sc implementation of the store_release can even
work, because in that case all users of the other bytes also need a
ll/sc around them but how are we to know about them?
So the only real way to allow store_release on 8/16 bit values is by
removing all Alpha support _pre_ EV56 :/
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-15 8:07 ` Peter Zijlstra
@ 2014-01-15 20:53 ` Paul E. McKenney
2014-01-15 23:21 ` Peter Zijlstra
0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-01-15 20:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Richard Henderson, Matt Turner, Daniel J Blueman,
Waiman Long, Linux Kernel, Ivan Kokshaysky
On Wed, Jan 15, 2014 at 09:07:53AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 14, 2014 at 06:39:58PM -0800, Paul E. McKenney wrote:
> > > If you just want to do a store release, on alpha you'd want to
> > > implement that as a full memory barrier followed by a store. It
> > > doesn't get the advantage of a real release consistency model, but at
> > > least it's not doing an external bus access. But you can only do that
> > > store as a 4-byte or 8-byte store.on the older alphas (byte and word
> > > stores work on newer ones).
> > >
> > > Of course, it's entirely possible that nobody cares..
> >
> > That would be my hope. ;-)
> >
> > If nobody cares about Alpha period, it is easy. However, the last time
> > that I tried that approach, they sent me a URL of a wiki showing Alpha
> > systems still running mainline. But a slow-but-working approach for
> > Alpha does seem reasonable, even for those still running Linux on Alpha.
>
> Well, if they're all EV56 or later we're still good as they can actually
> do what we need.
>
> But I don't think a ll/sc implementation of the store_release can even
> work, because in that case all users of the other bytes also need a
> ll/sc around them but how are we to know about them?
>
> So the only real way to allow store_release on 8/16 bit values is by
> removing all Alpha support _pre_ EV56 :/
Fair point... We could demand Alpha-specific alignment, but that would
get really ugly really quickly.
But we did drop support for SMP i386 quite some time ago, so perhaps
it is time to drop support for SMP Alpha pre-EV56.
Thanx, Paul
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-15 20:53 ` Paul E. McKenney
@ 2014-01-15 23:21 ` Peter Zijlstra
[not found] ` <CA+55aFydYLQeBq=4AQQp_4dAnq09ocLmde1LFaXiNAJ=wJzfFA@mail.gmail.com>
0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-01-15 23:21 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Linus Torvalds, Richard Henderson, Matt Turner, Daniel J Blueman,
Waiman Long, Linux Kernel, Ivan Kokshaysky
On Wed, Jan 15, 2014 at 12:53:46PM -0800, Paul E. McKenney wrote:
> But we did drop support for SMP i386 quite some time ago, so perhaps
> it is time to drop support for SMP Alpha pre-EV56.
So while the primitive is called smp_store_release() the !SMP variant
still does:
*(volatile __type *) = ptr;
which should not compile on any Alpha pre EV56, SMP or no for __type ==
u8.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-14 23:44 ` Paul E. McKenney
2014-01-15 0:25 ` Linus Torvalds
@ 2014-01-15 3:08 ` Daniel J Blueman
1 sibling, 0 replies; 31+ messages in thread
From: Daniel J Blueman @ 2014-01-15 3:08 UTC (permalink / raw)
To: paulmck, Richard Henderson
Cc: Matt Turner, Peter Zijlstra, Waiman Long, Linux Kernel,
Ivan Kokshaysky, Linus Torvalds
On 01/15/2014 07:44 AM, Paul E. McKenney wrote:
> On Tue, Jan 14, 2014 at 10:01:04AM -0800, Richard Henderson wrote:
>> On 01/14/2014 09:08 AM, Matt Turner wrote:
>>> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>>>> Peter,
>>>>>>
>>>>>> I found out that the build failure was caused by the fact that the
>>>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>>>> byte. Is there a reason why byte and short are not considered native?
>>>>>
>>>>> It seems likely it was implemented like that since there was no existing
>>>>> need; long can be relied on as the largest native type, so this should
>>>>> suffice and works here:
>>>>
>>>> There's Alphas that cannot actually atomically adres a byte; I do not
>>>> konw if Linux cares about them, but if it does, we cannot in fact rely
>>>> on this in generic primitives like this.
>>>
>>> That's right, and thanks for the heads-up. Alpha can only address 4
>>> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
>>>
>>> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
>>> Alphas can address < 4 bytes atomically.
>>
>> Emulated with aligned 4 byte atomics, and masking. The same is true for arm,
>> ppc, mips which, depending on cpu, also lack < 4 byte atomics.
>
> Which means that Alpha should be able to similarly emulate 1-byte and
> 2-byte atomics, correct?
If it's not possible to guarantee that GCC emits the 4-byte atomics by
using a union, then we could emit the instructions via assembly. We'd
introduce a macro to ensure lock word alignment, and this would be safe
for unsigned counting up to the packed type limit. Maybe that's just too
over-constrained though.
Thanks,
Daniel
--
Daniel J Blueman
Principal Software Engineer, Numascale
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v8 0/4] qrwlock: Introducing a queue read/write lock implementation
@ 2014-01-08 16:59 Waiman Long
2014-01-08 16:59 ` [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Waiman Long
0 siblings, 1 reply; 31+ messages in thread
From: Waiman Long @ 2014-01-08 16:59 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Arnd Bergmann
Cc: linux-arch, x86, linux-kernel, Peter Zijlstra, Steven Rostedt,
Andrew Morton, Michel Lespinasse, Andi Kleen, Rik van Riel,
Paul E. McKenney, Linus Torvalds, Raghavendra K T, George Spelvin,
Tim Chen, Aswin Chandramouleeswaran", Scott J Norton,
Waiman Long
v7->v8:
- Use atomic_t functions (which are implemented in all arch's) to
modify reader counts.
- Use smp_load_acquire() & smp_store_release() for barriers.
- Further tuning in slowpath performance.
v6->v7:
- Remove support for unfair lock, so only fair qrwlock will be provided.
- Move qrwlock.c to the kernel/locking directory.
v5->v6:
- Modify queue_read_can_lock() to avoid false positive result.
- Move the two slowpath functions' performance tuning change from
patch 4 to patch 1.
- Add a new optional patch to use the new smp_store_release() function
if that is merged.
v4->v5:
- Fix wrong definitions for QW_MASK_FAIR & QW_MASK_UNFAIR macros.
- Add an optional patch 4 which should only be applied after the
mcs_spinlock.h header file is merged.
v3->v4:
- Optimize the fast path with better cold cache behavior and
performance.
- Removing some testing code.
- Make x86 use queue rwlock with no user configuration.
v2->v3:
- Make read lock stealing the default and fair rwlock an option with
a different initializer.
- In queue_read_lock_slowpath(), check irq_count() and force spinning
and lock stealing in interrupt context.
- Unify the fair and classic read-side code path, and make write-side
to use cmpxchg with 2 different writer states. This slows down the
write lock fastpath to make the read side more efficient, but is
still slightly faster than a spinlock.
v1->v2:
- Improve lock fastpath performance.
- Optionally provide classic read/write lock behavior for backward
compatibility.
- Use xadd instead of cmpxchg for fair reader code path to make it
immute to reader contention.
- Run more performance testing.
As mentioned in the LWN article http://lwn.net/Articles/364583/,
the read/write lock suffer from an unfairness problem that it is
possible for a stream of incoming readers to block a waiting writer
from getting the lock for a long time. Also, a waiting reader/writer
contending a rwlock in local memory will have a higher chance of
acquiring the lock than a reader/writer in remote node.
This patch set introduces a queue-based read/write lock implementation
that can largely solve this unfairness problem.
The read lock slowpath will check if the reader is in an interrupt
context. If so, it will force lock spinning and stealing without
waiting in a queue. This is to ensure the read lock will be granted
as soon as possible.
The queue write lock can also be used as a replacement for ticket
spinlocks that are highly contended if lock size increase is not
an issue.
This patch set currently provides queue read/write lock support on
x86 architecture only. Support for other architectures can be added
later on once architecture specific support infrastructure is added
and proper testing is done.
The optional patch 3 has a dependency on the the mcs_spinlock.h
header file which has not been merged yet. So this patch should only
be applied after the mcs_spinlock.h header file is merged.
The optional patch 4 use the new smp_store_release() and
smp_load_acquire() functions which are being reviewed & not merged yet.
Waiman Long (4):
qrwlock: A queue read/write lock implementation
qrwlock x86: Enable x86 to use queue read/write lock
qrwlock: Use the mcs_spinlock helper functions for MCS queuing
qrwlock: Use smp_store_release() in write_unlock()
arch/x86/Kconfig | 1 +
arch/x86/include/asm/spinlock.h | 2 +
arch/x86/include/asm/spinlock_types.h | 4 +
include/asm-generic/qrwlock.h | 203 +++++++++++++++++++++++++++++++++
kernel/Kconfig.locks | 7 +
kernel/locking/Makefile | 1 +
kernel/locking/qrwlock.c | 191 +++++++++++++++++++++++++++++++
7 files changed, 409 insertions(+), 0 deletions(-)
create mode 100644 include/asm-generic/qrwlock.h
create mode 100644 kernel/locking/qrwlock.c
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()
2014-01-08 16:59 [PATCH v8 0/4] qrwlock: Introducing a queue read/write lock implementation Waiman Long
@ 2014-01-08 16:59 ` Waiman Long
0 siblings, 0 replies; 31+ messages in thread
From: Waiman Long @ 2014-01-08 16:59 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Arnd Bergmann
Cc: linux-arch, x86, linux-kernel, Peter Zijlstra, Steven Rostedt,
Andrew Morton, Michel Lespinasse, Andi Kleen, Rik van Riel,
Paul E. McKenney, Linus Torvalds, Raghavendra K T, George Spelvin,
Tim Chen, Aswin Chandramouleeswaran", Scott J Norton,
Waiman Long
This patch modifies the queue_write_unlock() function to use the
new smp_store_release() function in another pending patch. It also
removes the temporary implementation of smp_load_acquire() and
smp_store_release() function in qrwlock.c.
This patch should only be merged if PeterZ's linux-arch patch patch
was merged.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/asm-generic/qrwlock.h | 4 +---
kernel/locking/qrwlock.c | 34 ----------------------------------
2 files changed, 1 insertions(+), 37 deletions(-)
diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 2b9a7b4..4d4bd04 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct qrwlock *lock)
/*
* Make sure that none of the critical section will be leaked out.
*/
- smp_mb__before_clear_bit();
- ACCESS_ONCE(lock->cnts.writer) = 0;
- smp_mb__after_clear_bit();
+ smp_store_release(&lock->cnts.writer, 0)
}
/*
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 1b3ffb2..3d3ba2b 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -48,40 +48,6 @@
# define arch_mutex_cpu_relax() cpu_relax()
#endif
-#ifndef smp_load_acquire
-# ifdef CONFIG_X86
-# define smp_load_acquire(p) \
- ({ \
- typeof(*p) ___p1 = ACCESS_ONCE(*p); \
- barrier(); \
- ___p1; \
- })
-# else
-# define smp_load_acquire(p) \
- ({ \
- typeof(*p) ___p1 = ACCESS_ONCE(*p); \
- smp_mb(); \
- ___p1; \
- })
-# endif
-#endif
-
-#ifndef smp_store_release
-# ifdef CONFIG_X86
-# define smp_store_release(p, v) \
- do { \
- barrier(); \
- ACCESS_ONCE(*p) = v; \
- } while (0)
-# else
-# define smp_store_release(p, v) \
- do { \
- smp_mb(); \
- ACCESS_ONCE(*p) = v; \
- } while (0)
-# endif
-#endif
-
/*
* If an xadd (exchange-add) macro isn't available, simulate one with
* the atomic_add_return() function.
--
1.7.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
end of thread, other threads:[~2014-01-21 16:16 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-13 2:47 [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Daniel J Blueman
2014-01-13 3:49 ` Paul E. McKenney
2014-01-13 16:41 ` Waiman Long
2014-01-14 2:28 ` Daniel J Blueman
2014-01-14 11:03 ` Peter Zijlstra
2014-01-14 15:25 ` Waiman Long
2014-01-14 17:08 ` Matt Turner
2014-01-14 18:01 ` Richard Henderson
2014-01-14 19:09 ` Waiman Long
2014-01-14 20:20 ` Peter Zijlstra
2014-01-14 23:44 ` Paul E. McKenney
2014-01-15 0:25 ` Linus Torvalds
2014-01-15 2:39 ` Paul E. McKenney
2014-01-15 8:07 ` Peter Zijlstra
2014-01-15 20:53 ` Paul E. McKenney
2014-01-15 23:21 ` Peter Zijlstra
[not found] ` <CA+55aFydYLQeBq=4AQQp_4dAnq09ocLmde1LFaXiNAJ=wJzfFA@mail.gmail.com>
2014-01-16 10:36 ` Peter Zijlstra
2014-01-18 10:01 ` Paul E. McKenney
2014-01-18 11:34 ` Peter Zijlstra
2014-01-18 12:25 ` Paul E. McKenney
2014-01-18 12:41 ` Peter Zijlstra
2014-01-18 21:22 ` Paul E. McKenney
2014-01-19 0:57 ` Linus Torvalds
2014-01-19 8:04 ` Paul E. McKenney
2014-01-19 19:56 ` Linus Torvalds
2014-01-20 0:52 ` Paul E. McKenney
2014-01-21 15:02 ` Waiman Long
2014-01-21 15:41 ` Peter Zijlstra
2014-01-21 16:16 ` Waiman Long
2014-01-15 3:08 ` Daniel J Blueman
-- strict thread matches above, loose matches on Subject: below --
2014-01-08 16:59 [PATCH v8 0/4] qrwlock: Introducing a queue read/write lock implementation Waiman Long
2014-01-08 16:59 ` [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock() Waiman Long
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.