* [PATCH 0/2] s390: don't use 128-bit cmpxchg for READ_ONCE() purposes
@ 2023-02-24 10:02 Heiko Carstens
2023-02-24 10:02 ` [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro Heiko Carstens
2023-02-24 10:02 ` [PATCH 2/2] s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg Heiko Carstens
0 siblings, 2 replies; 6+ messages in thread
From: Heiko Carstens @ 2023-02-24 10:02 UTC
To: Vasily Gorbik, Alexander Gordeev, Thomas Richter
Cc: Sven Schnelle, Christian Borntraeger, Peter Zijlstra, linux-s390,
linux-kernel
Introduce and use an s390 specific READ_ONCE_ALIGNED_128() macro in order
to get rid of the odd 128-bit cmpxchg READ_ONCE() usage in cpum_sf, which
was introduced with commit 82d3edb50a11 ("s390/cpum_sf: add READ_ONCE()
semantics to compare and swap loops").
Heiko Carstens (2):
s390/rwonce: add READ_ONCE_ALIGNED_128() macro
s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg
arch/s390/include/asm/rwonce.h | 31 +++++++++++++++++++++++++++++++
arch/s390/kernel/perf_cpum_sf.c | 9 +++------
2 files changed, 34 insertions(+), 6 deletions(-)
create mode 100644 arch/s390/include/asm/rwonce.h
--
2.37.2
* [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro
2023-02-24 10:02 [PATCH 0/2] s390: don't use 128-bit cmpxchg for READ_ONCE() purposes Heiko Carstens
@ 2023-02-24 10:02 ` Heiko Carstens
2023-02-25 16:50 ` Peter Zijlstra
2023-02-24 10:02 ` [PATCH 2/2] s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg Heiko Carstens
1 sibling, 1 reply; 6+ messages in thread
From: Heiko Carstens @ 2023-02-24 10:02 UTC
To: Vasily Gorbik, Alexander Gordeev, Thomas Richter
Cc: Sven Schnelle, Christian Borntraeger, Peter Zijlstra, linux-s390,
linux-kernel
Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
fast block concurrent (atomic) 128-bit accesses.
The used lpq instruction requires 128-bit alignment. This is also the
reason why the compiler doesn't emit this instruction if __READ_ONCE() is
used for 128-bit accesses.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
---
arch/s390/include/asm/rwonce.h | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
create mode 100644 arch/s390/include/asm/rwonce.h
diff --git a/arch/s390/include/asm/rwonce.h b/arch/s390/include/asm/rwonce.h
new file mode 100644
index 000000000000..91fc24520e82
--- /dev/null
+++ b/arch/s390/include/asm/rwonce.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_S390_RWONCE_H
+#define __ASM_S390_RWONCE_H
+
+#include <linux/compiler_types.h>
+
+/*
+ * Use READ_ONCE_ALIGNED_128() for 128-bit block concurrent (atomic) read
+ * accesses. Note that x must be 128-bit aligned, otherwise a specification
+ * exception is generated.
+ */
+#define READ_ONCE_ALIGNED_128(x) \
+({ \
+ union { \
+ typeof(x) __x; \
+ __uint128_t val; \
+ } __u; \
+ \
+ BUILD_BUG_ON(sizeof(x) != 16); \
+ asm volatile( \
+ " lpq %[val],%[_x]\n" \
+ : [val] "=d" (__u.val) \
+ : [_x] "QS" (x) \
+ : "memory"); \
+ __u.__x; \
+})
+
+#include <asm-generic/rwonce.h>
+
+#endif /* __ASM_S390_RWONCE_H */
--
2.37.2
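The shape of the macro above (a union for type conversion plus a compile-time size check) can be tried out on any GCC or Clang target with the following sketch. It is an illustration only and rests on loud assumptions: `READ_128_SKETCH()` and `struct header_sketch` are hypothetical names, and the `memcpy()` is merely a stand-in for the `lpq` instruction, so this version is NOT atomic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte structure standing in for a trailer header. */
struct header_sketch {
    uint64_t flags;
    uint64_t overflow;
};

/*
 * Same structure as the macro in the patch: a union converts the raw
 * 16 bytes back to the operand's type, and the size is checked at
 * compile time. The memcpy() replaces lpq and gives no atomicity.
 */
#define READ_128_SKETCH(x)                                          \
({                                                                  \
    union {                                                         \
        __typeof__(x) __x;                                          \
        unsigned char raw[16];                                      \
    } __u;                                                          \
                                                                    \
    static_assert(sizeof(x) == 16, "operand must be 16 bytes");     \
    memcpy(__u.raw, &(x), sizeof(__u.raw));                         \
    __u.__x;                                                        \
})
```

The result has the type of the operand, so `struct header_sketch copy = READ_128_SKETCH(hdr);` works without a cast, and an operand of any other size fails to build.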
* [PATCH 2/2] s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg
2023-02-24 10:02 [PATCH 0/2] s390: don't use 128-bit cmpxchg for READ_ONCE() purposes Heiko Carstens
2023-02-24 10:02 ` [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro Heiko Carstens
@ 2023-02-24 10:02 ` Heiko Carstens
1 sibling, 0 replies; 6+ messages in thread
From: Heiko Carstens @ 2023-02-24 10:02 UTC
To: Vasily Gorbik, Alexander Gordeev, Thomas Richter
Cc: Sven Schnelle, Christian Borntraeger, Peter Zijlstra, linux-s390,
linux-kernel
Use READ_ONCE_ALIGNED_128() to read the previous value in front of a
128-bit cmpxchg loop, instead of (mis-)using a 128-bit cmpxchg operation to
do the same.
This makes the code more readable and is faster.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
---
arch/s390/kernel/perf_cpum_sf.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 79904a839fb9..e7b867e2f73f 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -1355,8 +1355,7 @@ static void hw_perf_event_update(struct perf_event *event, int flush_all)
num_sdb++;
/* Reset trailer (using compare-double-and-swap) */
- /* READ_ONCE() 16 byte header */
- prev.val = __cdsg(&te->header.val, 0, 0);
+ prev.val = READ_ONCE_ALIGNED_128(te->header.val);
do {
old.val = prev.val;
new.val = prev.val;
@@ -1558,8 +1557,7 @@ static bool aux_set_alert(struct aux_buffer *aux, unsigned long alert_index,
struct hws_trailer_entry *te;
te = aux_sdb_trailer(aux, alert_index);
- /* READ_ONCE() 16 byte header */
- prev.val = __cdsg(&te->header.val, 0, 0);
+ prev.val = READ_ONCE_ALIGNED_128(te->header.val);
do {
old.val = prev.val;
new.val = prev.val;
@@ -1637,8 +1635,7 @@ static bool aux_reset_buffer(struct aux_buffer *aux, unsigned long range,
idx_old = idx = aux->empty_mark + 1;
for (i = 0; i < range_scan; i++, idx++) {
te = aux_sdb_trailer(aux, idx);
- /* READ_ONCE() 16 byte header */
- prev.val = __cdsg(&te->header.val, 0, 0);
+ prev.val = READ_ONCE_ALIGNED_128(te->header.val);
do {
old.val = prev.val;
new.val = prev.val;
--
2.37.2
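For readers less familiar with the pattern this patch touches, a minimal portable sketch of it follows. It is an illustration under stated assumptions, not the kernel code: a 64-bit value and C11 atomics stand in for the 128-bit trailer header and the s390 `__cdsg()` and `lpq` primitives, and the names `read_via_cmpxchg()`, `read_via_load()`, `set_flag()` and `demo()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Old style: obtain the current value with a compare-and-swap where
 * old == new == 0. If the location holds 0 it is rewritten with 0;
 * otherwise the exchange fails and `expected` receives the current
 * value. Either way `expected` ends up holding the current value.
 */
static uint64_t read_via_cmpxchg(_Atomic uint64_t *p)
{
    uint64_t expected = 0;

    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}

/* New style: a plain atomic load, which never writes to the location. */
static uint64_t read_via_load(_Atomic uint64_t *p)
{
    return atomic_load(p);
}

/* Read once, then retry the compare-and-swap until the update lands. */
static uint64_t set_flag(_Atomic uint64_t *p, uint64_t flag)
{
    uint64_t prev = read_via_load(p);

    /* On failure, prev is refreshed with the value currently stored. */
    while (!atomic_compare_exchange_weak(p, &prev, prev | flag))
        ;
    return prev | flag;
}

/* Small self-check: both read styles agree, and the update is applied. */
static void demo(void)
{
    _Atomic uint64_t word = 0x10;

    assert(read_via_cmpxchg(&word) == read_via_load(&word));
    assert(set_flag(&word, 0x1) == 0x11);
}
```

Both read helpers return the same value; the difference is that the compare-and-swap variant is a read-modify-write operation even when nothing changes, which is the cost the patch avoids.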
* Re: [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro
2023-02-24 10:02 ` [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro Heiko Carstens
@ 2023-02-25 16:50 ` Peter Zijlstra
2023-02-26 20:56 ` Heiko Carstens
0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2023-02-25 16:50 UTC
To: Heiko Carstens
Cc: Vasily Gorbik, Alexander Gordeev, Thomas Richter, Sven Schnelle,
Christian Borntraeger, linux-s390, linux-kernel
On Fri, Feb 24, 2023 at 11:02:36AM +0100, Heiko Carstens wrote:
> Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
> fast block concurrent (atomic) 128-bit accesses.
>
> The used lpq instruction requires 128-bit alignment. This is also the
> reason why the compiler doesn't emit this instruction if __READ_ONCE() is
> used for 128-bit accesses.
Does your u128 not have natural alignment? Does it help if you force
align the u128 type?
* Re: [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro
2023-02-25 16:50 ` Peter Zijlstra
@ 2023-02-26 20:56 ` Heiko Carstens
2023-02-27 11:51 ` Peter Zijlstra
0 siblings, 1 reply; 6+ messages in thread
From: Heiko Carstens @ 2023-02-26 20:56 UTC
To: Peter Zijlstra
Cc: Vasily Gorbik, Alexander Gordeev, Thomas Richter, Sven Schnelle,
Christian Borntraeger, linux-s390, linux-kernel
On Sat, Feb 25, 2023 at 05:50:58PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 24, 2023 at 11:02:36AM +0100, Heiko Carstens wrote:
> > Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
> > fast block concurrent (atomic) 128-bit accesses.
> >
> > The used lpq instruction requires 128-bit alignment. This is also the
> > reason why the compiler doesn't emit this instruction if __READ_ONCE() is
> > used for 128-bit accesses.
>
> Does your u128 not have natural alignment? Does it help if you force
> align the u128 type?
s390 seems to be the only architecture which has a 64 bit alignment for
__uint128_t. But making it explicitly naturally aligned doesn't help.
I guess that's because the lpq instruction requires an even-odd register
pair where it reads to, while the now used lmg instruction can use any
register pair; but lmg doesn't come with atomic semantics.
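To make the alignment point concrete, a small sketch follows. It assumes a GCC or Clang target that provides `__uint128_t`; the typedef `aligned_u128` and the helper functions are hypothetical names used only here. The alignment of the plain type is ABI dependent (64 bit on s390 as noted above, 128 bit on x86-64, for example), so the sketch reports it rather than hard-coding it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit type with its alignment forced to 16 bytes. */
typedef __uint128_t aligned_u128 __attribute__((aligned(16)));

/* The forced alignment holds on every ABI, so check it at build time. */
static_assert(_Alignof(aligned_u128) == 16, "forced alignment must be 16");

/* ABI-dependent alignment of the plain type, in bytes. */
static unsigned long plain_u128_alignment(void)
{
    return _Alignof(__uint128_t);
}

/* Alignment of the forced type, in bytes: always 16. */
static unsigned long forced_u128_alignment(void)
{
    return _Alignof(aligned_u128);
}

/* Runtime check for the 16-byte alignment that lpq requires. */
static bool is_aligned_128(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}
```

Forcing the alignment only guarantees where the object lives in memory. As described above, it does not by itself make the compiler pick lpq for the load.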
* Re: [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro
2023-02-26 20:56 ` Heiko Carstens
@ 2023-02-27 11:51 ` Peter Zijlstra
0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2023-02-27 11:51 UTC
To: Heiko Carstens
Cc: Vasily Gorbik, Alexander Gordeev, Thomas Richter, Sven Schnelle,
Christian Borntraeger, linux-s390, linux-kernel
On Sun, Feb 26, 2023 at 09:56:44PM +0100, Heiko Carstens wrote:
> On Sat, Feb 25, 2023 at 05:50:58PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 24, 2023 at 11:02:36AM +0100, Heiko Carstens wrote:
> > > Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
> > > fast block concurrent (atomic) 128-bit accesses.
> > >
> > > The used lpq instruction requires 128-bit alignment. This is also the
> > > reason why the compiler doesn't emit this instruction if __READ_ONCE() is
> > > used for 128-bit accesses.
> >
> > Does your u128 not have natural alignment? Does it help if you force
> > align the u128 type?
>
> s390 seems to be the only architecture which has a 64 bit alignment for
> __uint128_t. But making it explicitly naturally aligned doesn't help.
> I guess that's because the lpq instruction requires an even-odd register
> pair where it reads to, while the now used lmg instruction can use any
> register pair; but lmg doesn't come with atomic semantics.
One thing you could do is talk with your compiler folks about allowing
lpq to be used for volatile loads. That won't help you now, so you'll
still need these patches, but changing the toolchains makes sense to me.