public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v2)
From: George Guo @ 2025-11-24  9:26 UTC (permalink / raw)
  To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo

This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture. This fixes BPF scheduler test failures
caused by the missing 128-bit atomics support.

The series consists of two patches:

1. "LoongArch: Add 128-bit atomic cmpxchg support"
   - Implements 128-bit atomic compare-and-exchange using LoongArch's
     LL.D/SC.Q instructions
   - Fixes BPF scheduler test failures (scx_central, scx_qmap) where
     kmalloc_nolock_noprof() returns NULL due to the missing 128-bit
     atomics, leading to -ENOMEM errors during scheduler initialization

2. "LoongArch: Enable 128-bit atomics cmpxchg support"
   - Selects HAVE_CMPXCHG_DOUBLE and HAVE_ALIGNED_STRUCT_PAGE in
     Kconfig to enable 128-bit atomic cmpxchg support
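
The contract behind the new primitive can be illustrated with a minimal,
non-atomic C model. The sketch assumes a u128 typedef built on the
GCC/Clang unsigned __int128 extension, and the helper names
cmpxchg128_model() and make_u128() are hypothetical. It only shows the
semantics (compare all 128 bits, store only on a full match, always
return the previous value); the real implementation needs LL.D/SC.Q to
make this atomic.

```c
#include <stdint.h>

/* Stand-in for the kernel's u128; unsigned __int128 is a GCC/Clang extension. */
typedef unsigned __int128 u128;

/*
 * Non-atomic, single-threaded model of the cmpxchg128() contract:
 * compare all 128 bits against @old, store @new only on a full match,
 * and return the previous contents either way.
 */
static u128 cmpxchg128_model(u128 *ptr, u128 old, u128 new)
{
	u128 cur = *ptr;

	if (cur == old)
		*ptr = new;
	return cur;
}

/* Build a u128 from its two 64-bit halves. */
static u128 make_u128(uint64_t high, uint64_t low)
{
	return ((u128)high << 64) | low;
}
```

A mismatch in either half, even when the other half matches, must leave
memory untouched; that is why both bne checks in the patch branch to
the exit label.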

The issue was identified through BPF scheduler test failures in which
the scx_central and scx_qmap schedulers failed to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.

Signed-off-by: George Guo <dongtai.guo@linux.dev>
---
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid a race
  condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix the address operand constraints after testing different approaches:
  * ld.d with "m"
  * ll.d with "ZC"
  * sc.q with "ZB"; alternative constraints caused issues:
    - "r" caused a system hang
    - "ZC" caused an assembler error:
      {standard input}: Assembler messages:
      {standard input}:10037: Fatal error: Immediate overflow.
      format: u0:0 )
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev

---
George Guo (2):
      LoongArch: Add 128-bit atomic cmpxchg support
      LoongArch: Enable 128-bit atomics cmpxchg support

 arch/loongarch/Kconfig               |  2 ++
 arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)
---
base-commit: d5ae5ac32615e4af729f0610fdc11ff4f4798aef
change-id: 20251120-2-d03862b2cf6d

Best regards,
-- 
George Guo <dongtai.guo@linux.dev>



* [PATCH v2 1/2] LoongArch: Add 128-bit atomic cmpxchg support
From: George Guo @ 2025-11-24  9:26 UTC (permalink / raw)
  To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo

From: George Guo <guodongtai@kylinos.cn>

Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.

This also fixes BPF scheduler test failures (scx_central, scx_qmap)
caused by kmalloc_nolock_noprof() returning NULL due to the missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing the test cases to fail.

Verified by building the scx_qmap scheduler (located in
tools/sched_ext/) with `make` and running
./tools/sched_ext/build/bin/scx_qmap.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..757f6e82b9880d04f4883dc9a802312111aa4588 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -111,6 +111,44 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
 	__ret;								\
 })
 
+union __u128_halves {
+	u128 full;
+	struct {
+		u64 low;
+		u64 high;
+	};
+};
+
+#define __cmpxchg128_asm(ptr, old, new)					\
+({									\
+	union __u128_halves __old, __new, __ret;			\
+	volatile u64 *__ptr = (volatile u64 *)(ptr);			\
+									\
+	__old.full = (old);                                             \
+	__new.full = (new);						\
+									\
+	__asm__ __volatile__(						\
+	"1:   ll.d    %0, %3		# 128-bit cmpxchg low	\n"	\
+	"     dbar    0			# memory barrier	\n"	\
+	"     ld.d    %1, %4		# 128-bit cmpxchg high	\n"	\
+	"     bne     %0, %z5, 2f				\n"	\
+	"     bne     %1, %z6, 2f				\n"	\
+	"     move    $t0, %z7					\n"	\
+	"     move    $t1, %z8					\n"	\
+	"     sc.q    $t0, $t1, %2				\n"	\
+	"     beqz    $t0, 1b					\n"	\
+	"2:							\n"	\
+	__WEAK_LLSC_MB							\
+	: "=&r" (__ret.low), "=&r" (__ret.high),			\
+	  "=ZB" (__ptr[0])						\
+	: "ZC" (__ptr[0]), "m" (__ptr[1]),				\
+	  "Jr" (__old.low), "Jr" (__old.high),				\
+	  "Jr" (__new.low), "Jr" (__new.high)				\
+	: "t0", "t1", "memory");					\
+									\
+	__ret.full;							\
+})
+
 static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
 					   unsigned int new, unsigned int size)
 {
@@ -198,6 +236,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
 	__res;								\
 })
 
+/* cmpxchg128 */
+#define system_has_cmpxchg128()		1
+
+#define arch_cmpxchg128(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 16);				\
+	__cmpxchg128_asm(ptr, o, n);			\
+})
+
 #ifdef CONFIG_64BIT
 #define arch_cmpxchg64_local(ptr, o, n)					\
   ({									\

-- 
2.48.1
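
For reference, a small host-side sketch of how union __u128_halves maps
onto memory. It assumes a little-endian target (LoongArch is
little-endian, as is x86-64, where this also builds), so .low overlays
the lower-addressed 64-bit word: the ll.d on __ptr[0] in the patch
therefore loads the low half and the ld.d on __ptr[1] the high half.
word_at() is a hypothetical helper added only for this illustration.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;	/* stand-in for the kernel type */
typedef uint64_t u64;

/* Same shape as the union added by the patch. */
union __u128_halves {
	u128 full;
	struct {
		u64 low;
		u64 high;
	};
};

/*
 * Return the 64-bit word at index @i (0 or 1) of a 16-byte object,
 * mirroring how the asm addresses __ptr[0] and __ptr[1].
 */
static u64 word_at(const u128 *obj, int i)
{
	u64 w[2];

	memcpy(w, obj, sizeof(w));
	return w[i];
}
```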



* [PATCH v2 2/2] LoongArch: Enable 128-bit atomics cmpxchg support
From: George Guo @ 2025-11-24  9:26 UTC (permalink / raw)
  To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo

From: George Guo <guodongtai@kylinos.cn>

Select HAVE_CMPXCHG_DOUBLE and HAVE_ALIGNED_STRUCT_PAGE in Kconfig to
enable 128-bit atomic cmpxchg support on LoongArch.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 5b1116733d881bc2b1b43fb93f20367add4dbc54..6fb2c253969f9ddece5478920423d7326c3ec046 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
 	select GENERIC_TIME_VSYSCALL
 	select GPIOLIB
 	select HAS_IOPORT
+	select HAVE_ALIGNED_STRUCT_PAGE
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
@@ -140,6 +141,7 @@ config LOONGARCH
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EBPF_JIT
+	select HAVE_CMPXCHG_DOUBLE
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
 	select HAVE_EXIT_THREAD
 	select HAVE_GENERIC_TIF_BITS

-- 
2.48.1
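
As background for HAVE_CMPXCHG_DOUBLE: one common reason code wants a
double-width compare-and-exchange is to update a pointer together with a
counter in a single step, which defeats the ABA problem in lock-free
lists. The following is a generic, single-threaded illustration of that
idea, not the kernel's code; pack(), cas128() and replace_head() are
hypothetical names, cas128() is a non-atomic stand-in, and 64-bit
pointers are assumed.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Pack a pointer (low 64 bits) and a counter (high 64 bits); assumes 64-bit pointers. */
static u128 pack(void *p, uint64_t counter)
{
	return ((u128)counter << 64) | (uintptr_t)p;
}

/* Non-atomic stand-in for a 128-bit cmpxchg: returns the previous value. */
static u128 cas128(u128 *slot, u128 old, u128 new)
{
	u128 cur = *slot;

	if (cur == old)
		*slot = new;
	return cur;
}

/*
 * Install @new_head if the slot still holds @seen, bumping the counter so
 * that an A -> B -> A change of the pointer is still detected.
 * Returns 1 on success, 0 if the slot changed since @seen was read.
 */
static int replace_head(u128 *slot, u128 seen, void *new_head)
{
	uint64_t counter = (uint64_t)(seen >> 64);

	return cas128(slot, seen, pack(new_head, counter + 1)) == seen;
}
```

If the head goes from A to B and back to A between a reader's snapshot
and its CAS, a pointer-only compare would wrongly succeed; with the
counter in the upper 64 bits, the snapshot no longer matches and the
CAS fails.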



* Re: [PATCH v2 1/2] LoongArch: Add 128-bit atomic cmpxchg support
From: hev @ 2025-11-24 11:37 UTC (permalink / raw)
  To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Mon, Nov 24, 2025 at 5:28 PM George Guo <dongtai.guo@linux.dev> wrote:
> [...]
> +       __asm__ __volatile__(                                           \
> +       "1:   ll.d    %0, %3            # 128-bit cmpxchg low   \n"     \
> +       "     dbar    0                 # memory barrier        \n"     \
> +       "     ld.d    %1, %4            # 128-bit cmpxchg high  \n"     \
> +       "     bne     %0, %z5, 2f                               \n"     \
> +       "     bne     %1, %z6, 2f                               \n"     \
> +       "     move    $t0, %z7                                  \n"     \
> +       "     move    $t1, %z8                                  \n"     \
> +       "     sc.q    $t0, $t1, %2                              \n"     \
> +       "     beqz    $t0, 1b                                   \n"     \
> +       "2:                                                     \n"     \
> +       __WEAK_LLSC_MB                                                  \
> +       : "=&r" (__ret.low), "=&r" (__ret.high),                        \
> +         "=ZB" (__ptr[0])                                              \

"ZB" isn't a legal constraint for the address operand in sc.q. When
assembled, it turns into something like sc.q $r,$r,$r,0, which clearly
doesn't match the instruction format, yet gas happily accepts it while
clang rightfully rejects it. Classic GNU-as leniency biting again. :)

> [...]


* Re: [PATCH v2 1/2] LoongArch: Add 128-bit atomic cmpxchg support
From: George Guo @ 2025-11-25  2:43 UTC (permalink / raw)
  To: hev; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo


On Mon, 24 Nov 2025 19:37:40 +0800
hev <r@hev.cc> wrote:

> On Mon, Nov 24, 2025 at 5:28 PM George Guo <dongtai.guo@linux.dev> wrote:
> > [...]
> > +       : "=&r" (__ret.low), "=&r" (__ret.high),
> > +         "=ZB" (__ptr[0])
> 
> "ZB" isn't a legal constraint for the address operand in sc.q. When
> assembled, it turns into something like sc.q $r,$r,$r,0, which clearly
> doesn't match the instruction format, yet gas happily accepts it while
> clang rightfully rejects it. Classic GNU-as leniency biting again. :)
> 
Hi Hev,

Thanks for your advice. I tried sc.q with "r" and with "ZC"; the results
are below (with GCC 14.2.1 on Fedora 42):
   - sc.q with "r" caused a system hang
   - sc.q with "ZC" caused an assembler error:
     {standard input}: Assembler messages:
     {standard input}:10037: Fatal error: Immediate overflow.
     format: u0:0 )



* Re: [PATCH v2 1/2] LoongArch: Add 128-bit atomic cmpxchg support
From: Xi Ruoyao @ 2025-11-25  3:04 UTC (permalink / raw)
  To: George Guo, hev
  Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Tue, 2025-11-25 at 10:43 +0800, George Guo wrote:
> > > +         "=ZB" (__ptr[0])
> > 
> > "ZB" isn't a legal constraint for the address operand in sc.q. When
> > assembled, it turns into something like sc.q $r,$r,$r,0, which clearly
> > doesn't match the instruction format, yet gas happily accepts it while
> > clang rightfully rejects it. Classic GNU-as leniency biting again. :)

I clearly remember that when Jiajie submitted sc.q support to GAS,
Qinggang was really insistent on supporting the additional ",0" here.
But I don't really understand why we must support it...
> 
> Thanks for your advice. I tried sc.q with "r" and with "ZC"; the results
> are below (with GCC 14.2.1 on Fedora 42):
>    - sc.q with "r" caused a system hang

It won't work because it passes the value (not the address) of __ptr[0].

>    - sc.q with "ZC" caused an assembler error:
>      {standard input}: Assembler messages:
>      {standard input}:10037: Fatal error: Immediate overflow.

It won't work because the only immediate sc.q accepts is 0, while ZC
allows any multiple of 4 in [-32768, 32768).  That is, ZC is meant for
{ldptr,stptr,ll,sc}.{w,d}.
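
The two immediate rules above can be written down as a pair of
predicates. This is only a restatement of the rules as stated in this
message, not GCC's or GAS's actual code, and the function names are
hypothetical. Any non-zero offset that "ZC" is allowed to produce is
still out of range for sc.q, which is consistent with the "Immediate
overflow" error reported earlier.

```c
#include <stdbool.h>

/* "ZC" as described above: a multiple of 4 in [-32768, 32768). */
static bool zc_offset_ok(long off)
{
	return off >= -32768 && off < 32768 && off % 4 == 0;
}

/* sc.q: the only immediate tolerated is 0 (the trailing ",0"). */
static bool scq_offset_ok(long off)
{
	return off == 0;
}
```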

As ZB is (so far) only used for sc.q in the GCC backend, maybe we can
change ZB to print simply $rX instead of $rX,0 and make LLVM do the
same.  Would someone submit a GCC patch for that?  Or is there already
such a constraint that I don't know of?

BTW, for the barrier between ll.d and ld.d, "dbar 0x700" is enough to
order two loads from the same address, and a Loongson hardware engineer
just confirmed to me privately that "same address" can be read as "in
the same cacheline" here.  Thus it's enough in our case, and it has a
lower overhead than "dbar 0".
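
Whether both halves are "in the same cacheline" follows from alignment.
A minimal check, assuming the 16-byte operand is naturally (16-byte)
aligned and the cache line size is a power of two of at least 16 bytes
(64 is used below only as an example value); same_cacheline() is a
hypothetical helper:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if the byte range [addr, addr + len) lies within a single cache
 * line of @line bytes. @line must be a power of two and @len >= 1.
 */
static bool same_cacheline(uintptr_t addr, size_t len, uintptr_t line)
{
	uintptr_t mask = ~(line - 1);

	return (addr & mask) == ((addr + len - 1) & mask);
}
```

A 16-byte-aligned object can never straddle such a line boundary,
whereas the same 16 bytes starting at offset 56 would span two 64-byte
lines.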

-- 
Xi Ruoyao <xry111@xry111.site>


* Re: [PATCH v2 1/2] LoongArch: Add 128-bit atomic cmpxchg support
From: hev @ 2025-11-25  3:32 UTC (permalink / raw)
  To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Tue, Nov 25, 2025 at 10:43 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> On Mon, 24 Nov 2025 19:37:40 +0800
> hev <r@hev.cc> wrote:
>
> > On Mon, Nov 24, 2025 at 5:28 PM George Guo <dongtai.guo@linux.dev>
> > wrote:
> > >
> > > From: George Guo <guodongtai@kylinos.cn>
> > >
> > > Implement 128-bit atomic compare-and-exchange using LoongArch's
> > > LL.D/SC.Q instructions.
> > >
> > > At the same time, fix BPF scheduler test failures (scx_central
> > > scx_qmap) caused by kmalloc_nolock_noprof returning NULL due to
> > > missing 128-bit atomics. The NULL returns led to -ENOMEM errors
> > > during scheduler initialization, causing test cases to fail.
> > >
> > > Verified by testing with the scx_qmap scheduler (located in
> > > tools/sched_ext/). Building with `make` and running
> > > ./tools/sched_ext/build/bin/scx_qmap.
> > >
> > > Signed-off-by: George Guo <guodongtai@kylinos.cn>
> > > ---
> > >  arch/loongarch/include/asm/cmpxchg.h | 47
> > > ++++++++++++++++++++++++++++++++++++ 1 file changed, 47
> > > insertions(+)
> > >
> > > diff --git a/arch/loongarch/include/asm/cmpxchg.h
> > > b/arch/loongarch/include/asm/cmpxchg.h index
> > > 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..757f6e82b9880d04f4883dc9a802312111aa4588
> > > 100644 --- a/arch/loongarch/include/asm/cmpxchg.h +++
> > > b/arch/loongarch/include/asm/cmpxchg.h @@ -111,6 +111,44 @@
> > > __arch_xchg(volatile void *ptr, unsigned long x, int size) __ret;
> > >                                                        \ })
> > >
> > > +union __u128_halves {
> > > +       u128 full;
> > > +       struct {
> > > +               u64 low;
> > > +               u64 high;
> > > +       };
> > > +};
> > > +
> > > +#define __cmpxchg128_asm(ptr, old, new)
> > >             \ +({
> > >                   \
> > > +       union __u128_halves __old, __new, __ret;
> > >     \
> > > +       volatile u64 *__ptr = (volatile u64 *)(ptr);
> > >     \
> > > +
> > >     \
> > > +       __old.full = (old);
> > >     \
> > > +       __new.full = (new);
> > >     \
> > > +
> > >     \
> > > +       __asm__ __volatile__(
> > >     \
> > > +       "1:   ll.d    %0, %3            # 128-bit cmpxchg low   \n"
> > >     \
> > > +       "     dbar    0                 # memory barrier        \n"
> > >     \
> > > +       "     ld.d    %1, %4            # 128-bit cmpxchg high  \n"
> > >     \
> > > +       "     bne     %0, %z5, 2f                               \n"
> > >     \
> > > +       "     bne     %1, %z6, 2f                               \n"
> > >     \
> > > +       "     move    $t0, %z7                                  \n"
> > >     \
> > > +       "     move    $t1, %z8                                  \n"
> > >     \
> > > +       "     sc.q    $t0, $t1, %2                              \n"
> > >     \
> > > +       "     beqz    $t0, 1b                                   \n"
> > >     \
> > > +       "2:                                                     \n"
> > >     \
> > > +       __WEAK_LLSC_MB
> > >     \
> > > +       : "=&r" (__ret.low), "=&r" (__ret.high),
> > >     \
> > > +         "=ZB" (__ptr[0])
> > >     \
> >
> > "ZB" isn't a legal constraint for the address operand in sc.q. When
> > assembled, it turns into something like sc.q $r,$r,$r,0, which clearly
> > doesn't match the instruction format, yet gas happily accepts it while
> > clang rightfully rejects it. Classic GNU-as leniency biting again. :)
> >
> Hi Hev,
>
> Thanks for your advice. I tried sc.q with "r" and with "ZC"; the results
> are below (with GCC 14.2.1 on Fedora 42):
>    - sc.q with "r" caused a system hang

Pass the pointer itself (not __ptr[0]) in a register, in the input operands:
 : "r" (__ptr), ...
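
A sketch of how this suggestion could look, combined with the
"dbar 0x700" suggested elsewhere in this thread: the address of the
16-byte object is passed in a general-purpose register ("r") and printed
bare as sc.q's third operand, while the "ZC"/"m" inputs keep describing
the two words for ll.d and ld.d. It is untested on LoongArch hardware,
omits the kernel's __WEAK_LLSC_MB, and is not a drop-in replacement for
the macro in the patch. The function name is hypothetical, and the
non-LoongArch branch is only a non-atomic stand-in so the file builds
elsewhere.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical helper; returns the previous 128-bit value. */
static u128 cmpxchg128_sketch(volatile uint64_t *ptr, u128 old, u128 new)
{
	uint64_t lo, hi;
#if defined(__loongarch64)
	uint64_t tmp;

	__asm__ __volatile__(
	"1:  ll.d   %0, %3\n"		/* low word, starts the LL/SC sequence */
	"    dbar   0x700\n"		/* order the two same-cacheline loads */
	"    ld.d   %1, %4\n"		/* high word, plain load */
	"    bne    %0, %z5, 2f\n"
	"    bne    %1, %z6, 2f\n"
	"    move   %2, %z7\n"
	"    sc.q   %2, %z8, %9\n"	/* rd = new low (then status), rk = new high, rj = address */
	"    beqz   %2, 1b\n"		/* retry if the store-conditional failed */
	"2:\n"
	: "=&r" (lo), "=&r" (hi), "=&r" (tmp)
	: "ZC" (ptr[0]), "m" (ptr[1]),
	  "Jr" ((uint64_t)old), "Jr" ((uint64_t)(old >> 64)),
	  "Jr" ((uint64_t)new), "Jr" ((uint64_t)(new >> 64)),
	  "r" (ptr)
	: "memory");
#else
	/* Non-atomic stand-in so the sketch compiles on other hosts. */
	lo = ptr[0];
	hi = ptr[1];
	if (lo == (uint64_t)old && hi == (uint64_t)(old >> 64)) {
		ptr[0] = (uint64_t)new;
		ptr[1] = (uint64_t)(new >> 64);
	}
#endif
	return ((u128)hi << 64) | lo;
}
```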

>    - sc.q with "ZC" caused an assembler error:
>      {standard input}: Assembler messages:
>      {standard input}:10037: Fatal error: Immediate overflow.
>      format: u0:0 )


* Re: [PATCH v2 1/2] LoongArch: Add 128-bit atomic cmpxchg support
From: Xi Ruoyao @ 2025-11-25  8:01 UTC (permalink / raw)
  To: mengqinggang, George Guo, hev
  Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Tue, 2025-11-25 at 15:34 +0800, mengqinggang wrote:
> > I clearly remember that when Jiajie submitted sc.q support to GAS,
> > Qinggang was really insistent on supporting the additional ",0" here.
> > But I don't really understand why we must support it...

> Because gcc outputs sc.q $r,$r,$r,0.

At that time GCC had not started to use sc.q yet.  Some unfortunate
communication error, I guess.


-- 
Xi Ruoyao <xry111@xry111.site>

