* [PATCH 0/2] LoongArch: Add 128-bit atomic cmpxchg support
@ 2025-11-20 7:45 George Guo
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
2025-11-20 7:45 ` [PATCH 2/2] LoongArch: Enable 128-bit atomics " George Guo
0 siblings, 2 replies; 9+ messages in thread
From: George Guo @ 2025-11-20 7:45 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo
This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture, which fixes BPF scheduler test failures caused
by the missing 128-bit atomics support.
The series consists of two patches:
1. "LoongArch: Add 128-bit atomic cmpxchg support"
   - Implements 128-bit atomic compare-and-exchange using LoongArch's
     LL.D/SC.Q instructions
   - Fixes BPF scheduler test failures (scx_central and scx_qmap) where
     kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
     leading to -ENOMEM errors during scheduler initialization
2. "LoongArch: Enable 128-bit atomics cmpxchg support"
   - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
     in Kconfig to enable 128-bit atomic cmpxchg support
The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
Signed-off-by: George Guo <dongtai.guo@linux.dev>
---
George Guo (2):
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 ++
arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+)
---
base-commit: 8b690556d8fe074b4f9835075050fba3fb180e93
change-id: 20251120-2-d03862b2cf6d
Best regards,
--
George Guo <dongtai.guo@linux.dev>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 7:45 [PATCH 0/2] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-11-20 7:45 ` George Guo
2025-11-20 8:07 ` Xi Ruoyao
` (2 more replies)
2025-11-20 7:45 ` [PATCH 2/2] LoongArch: Enable 128-bit atomics " George Guo
1 sibling, 3 replies; 9+ messages in thread
From: George Guo @ 2025-11-20 7:45 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo

From: George Guo <guodongtai@kylinos.cn>

Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.

At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
caused by kmalloc_nolock_noprof returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing test cases to fail.

Verified by testing with the scx_qmap scheduler (located in
tools/sched_ext/). Building with `make` and running
./tools/sched_ext/build/bin/scx_qmap.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
 	__ret; \
 })
 
+union __u128_halves {
+	u128 full;
+	struct {
+		u64 low;
+		u64 high;
+	};
+};
+
+#define __cmpxchg128_asm(ld, st, ptr, old, new) \
+({ \
+	union __u128_halves __old, __new, __ret; \
+	volatile u64 *__ptr = (volatile u64 *)(ptr); \
+ \
+	__old.full = (old); \
+	__new.full = (new); \
+ \
+	__asm__ __volatile__( \
+	"1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
+	" " ld " %1, %5 # 128-bit cmpxchg high \n" \
+	" bne %0, %z6, 2f \n" \
+	" bne %1, %z7, 2f \n" \
+	" move $t0, %z8 \n" \
+	" move $t1, %z9 \n" \
+	" " st " $t0, $t1, %2 \n" \
+	" beqz $t0, 1b \n" \
+	"2: \n" \
+	__WEAK_LLSC_MB \
+	: "=&r" (__ret.low), "=&r" (__ret.high), \
+	  "=ZB" (__ptr[0]), "=ZB" (__ptr[1]) \
+	: "ZB" (__ptr[0]), "ZB" (__ptr[1]), \
+	  "Jr" (__old.low), "Jr" (__old.high), \
+	  "Jr" (__new.low), "Jr" (__new.high) \
+	: "t0", "t1", "memory"); \
+ \
+	__ret.full; \
+})
+
 static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
 					   unsigned int new, unsigned int size)
 {
@@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
 	__res; \
 })
 
+/* cmpxchg128 */
+#define system_has_cmpxchg128() 1
+
+#define arch_cmpxchg128(ptr, o, n) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
+	__cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
+})
+
 #ifdef CONFIG_64BIT
 #define arch_cmpxchg64_local(ptr, o, n) \
 ({ \

--
2.48.1

^ permalink raw reply related [flat|nested] 9+ messages in thread
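For readers following the review, the contract that arch_cmpxchg128() is
expected to honour can be written as a plain C reference model. This is only
a sketch under stated assumptions: it is NOT atomic, the names
u128_halves, cmpxchg128_model and try_cmpxchg128_model are hypothetical and
do not exist in the kernel, and it assumes a compiler that provides
unsigned __int128 (GCC or Clang on a 64-bit target). The second helper
mirrors the shape of the generic raw_try_cmpxchg128() fallback visible in
the build log later in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Same shape as the union in the patch (hypothetical user-space copy). */
union u128_halves {
	u128 full;
	struct {
		uint64_t low;
		uint64_t high;
	};
};

/*
 * Non-atomic reference model: if *ptr equals old, store new_val.
 * The value observed in *ptr is returned in either case, so the
 * caller detects success by comparing the result with old.
 */
static u128 cmpxchg128_model(u128 *ptr, u128 old, u128 new_val)
{
	union u128_halves cur, expect;

	cur.full = *ptr;
	expect.full = old;

	/* Both halves must match, as in the two bne checks of the patch. */
	if (cur.low == expect.low && cur.high == expect.high)
		*ptr = new_val;

	return cur.full;
}

/*
 * try_ variant: returns true on success; on failure the observed
 * value is written back through oldp so the caller can retry.
 */
static bool try_cmpxchg128_model(u128 *ptr, u128 *oldp, u128 new_val)
{
	u128 seen = cmpxchg128_model(ptr, *oldp, new_val);

	if (seen == *oldp)
		return true;

	*oldp = seen;
	return false;
}
```

A real implementation must make the load, the comparison and the store a
single atomic step; the model only pins down the return-value convention
that callers rely on.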
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
@ 2025-11-20 8:07 ` Xi Ruoyao
2025-11-20 9:25 ` hev
2025-11-20 11:14 ` david laight
2 siblings, 0 replies; 9+ messages in thread
From: Xi Ruoyao @ 2025-11-20 8:07 UTC (permalink / raw)
To: George Guo, Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo

On Thu, 2025-11-20 at 15:45 +0800, George Guo wrote:
> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing test cases to fail.
>
> Verified by testing with the scx_qmap scheduler (located in
> tools/sched_ext/). Building with `make` and running
> ./tools/sched_ext/build/bin/scx_qmap.
>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
>  arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
>  	__ret; \
>  })
>
> +union __u128_halves {
> +	u128 full;
> +	struct {
> +		u64 low;
> +		u64 high;
> +	};
> +};
> +
> +#define __cmpxchg128_asm(ld, st, ptr, old, new) \
> +({ \
> +	union __u128_halves __old, __new, __ret; \
> +	volatile u64 *__ptr = (volatile u64 *)(ptr); \
> + \
> +	__old.full = (old); \
> +	__new.full = (new); \
> + \
> +	__asm__ __volatile__( \
> +	"1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
> +	" " ld " %1, %5 # 128-bit cmpxchg high \n" \

This is incorrect. It may happen that:

SMP 1              | SMP 2
ll.d $r4, mem      |
                   | sc.q $t0, $t1, mem
ll.d $r5, mem + 4  |

As the second ll.d instruction raises the LL bit, you lose the info
if the first ll.d instruction has succeeded. Thus you cannot figure
out if someone has modified the mem during your "critical section."

You should use a normal ld.d for the high word instead. And you need to
insert a dbar between ll.d and ld.d to prevent reordering.

--
Xi Ruoyao <xry111@xry111.site>

^ permalink raw reply [flat|nested] 9+ messages in thread
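The interleaving described in the review above can be replayed with a small
user-space toy model. This is a sketch under loudly stated assumptions, not
a description of real LoongArch hardware: each CPU has a single boolean link
bit, a load-linked always sets it, a successful store-conditional by another
CPU clears it, and memory ordering (the dbar mentioned in the review) is not
modelled at all. Every name below (toy_mem, toy_cpu, toy_ll, toy_ld,
toy_sc_q, torn_update_possible) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 128-bit location, stored as two 64-bit halves. */
struct toy_mem {
	uint64_t low;
	uint64_t high;
};

/* Each toy CPU carries exactly one link bit. */
struct toy_cpu {
	bool llbit;
};

/* Models ll.d: load one half and (re)set this CPU's link bit. */
static uint64_t toy_ll(struct toy_cpu *cpu, const uint64_t *half)
{
	cpu->llbit = true;
	return *half;
}

/* Models a plain ld.d: load one half, link bit untouched. */
static uint64_t toy_ld(const uint64_t *half)
{
	return *half;
}

/*
 * Models sc.q: store both halves only if our link bit is still set.
 * A successful store clears the link bit of the other CPU as well.
 */
static bool toy_sc_q(struct toy_cpu *self, struct toy_cpu *other,
		     struct toy_mem *m, uint64_t low, uint64_t high)
{
	if (!self->llbit)
		return false;

	m->low = low;
	m->high = high;
	self->llbit = false;
	other->llbit = false;
	return true;
}

/*
 * Replays the schedule from the review. Returns true if CPU 1's sc.q
 * succeeds although CPU 2 rewrote the location between CPU 1's loads,
 * i.e. although CPU 1 holds a low/high pair that never existed in memory.
 */
static bool torn_update_possible(bool second_load_is_ll)
{
	struct toy_mem m = { .low = 1, .high = 1 };
	struct toy_cpu cpu1 = { false }, cpu2 = { false };
	uint64_t low, high;

	low = toy_ll(&cpu1, &m.low);		/* CPU 1 reads low == 1 */

	toy_ll(&cpu2, &m.low);			/* CPU 2 does a full update ... */
	toy_sc_q(&cpu2, &cpu1, &m, 2, 2);	/* ... and clears CPU 1's link bit */

	high = second_load_is_ll ? toy_ll(&cpu1, &m.high)	/* re-arms the link bit */
				 : toy_ld(&m.high);		/* leaves it cleared */

	/* CPU 1 now holds low == 1 (stale) and high == 2 (new). */
	return toy_sc_q(&cpu1, &cpu2, &m, low + 10, high + 10);
}
```

In this model torn_update_possible(true) reports that the stale update goes
through, while torn_update_possible(false), which uses a plain load for the
second half, makes the store-conditional fail so the loop would retry.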
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
2025-11-20 8:07 ` Xi Ruoyao
@ 2025-11-20 9:25 ` hev
2025-11-21 9:51 ` George Guo
2025-11-20 11:14 ` david laight
2 siblings, 1 reply; 9+ messages in thread
From: hev @ 2025-11-20 9:25 UTC (permalink / raw)
To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Thu, Nov 20, 2025 at 3:46 PM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing test cases to fail.
>
> Verified by testing with the scx_qmap scheduler (located in
> tools/sched_ext/). Building with `make` and running
> ./tools/sched_ext/build/bin/scx_qmap.
>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
>  arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
>  	__ret; \
>  })
>
> +union __u128_halves {
> +	u128 full;
> +	struct {
> +		u64 low;
> +		u64 high;
> +	};
> +};
> +
> +#define __cmpxchg128_asm(ld, st, ptr, old, new) \
> +({ \
> +	union __u128_halves __old, __new, __ret; \
> +	volatile u64 *__ptr = (volatile u64 *)(ptr); \
> + \
> +	__old.full = (old); \
> +	__new.full = (new); \
> + \
> +	__asm__ __volatile__( \
> +	"1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
> +	" " ld " %1, %5 # 128-bit cmpxchg high \n" \
> +	" bne %0, %z6, 2f \n" \
> +	" bne %1, %z7, 2f \n" \
> +	" move $t0, %z8 \n" \
> +	" move $t1, %z9 \n" \
> +	" " st " $t0, $t1, %2 \n" \
> +	" beqz $t0, 1b \n" \
> +	"2: \n" \
> +	__WEAK_LLSC_MB \
> +	: "=&r" (__ret.low), "=&r" (__ret.high), \
> +	  "=ZB" (__ptr[0]), "=ZB" (__ptr[1]) \
> +	: "ZB" (__ptr[0]), "ZB" (__ptr[1]), \

Address operand constraints:
- ld.d: "m"
- ll.d: "ZC"
- sc.q: "r"

> +	  "Jr" (__old.low), "Jr" (__old.high), \
> +	  "Jr" (__new.low), "Jr" (__new.high) \
> +	: "t0", "t1", "memory"); \
> + \
> +	__ret.full; \
> +})
> +
>  static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
>  					   unsigned int new, unsigned int size)
>  {
> @@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
>  	__res; \
>  })
>
> +/* cmpxchg128 */
> +#define system_has_cmpxchg128() 1
> +
> +#define arch_cmpxchg128(ptr, o, n) \
> +({ \
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
> +	__cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \

"sc.d" -> "sc.q"

__cmpxchg128_asm doesn’t have multiple variants, so no need to
genericize it?

> +})
> +
>  #ifdef CONFIG_64BIT
>  #define arch_cmpxchg64_local(ptr, o, n) \
>  ({ \
>
> --
> 2.48.1
>
>

--
Rui

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 9:25 ` hev
@ 2025-11-21 9:51 ` George Guo
2025-11-21 11:38 ` hev
0 siblings, 1 reply; 9+ messages in thread
From: George Guo @ 2025-11-21 9:51 UTC (permalink / raw)
To: hev; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Thu, 20 Nov 2025 17:25:34 +0800
hev <r@hev.cc> wrote:

> On Thu, Nov 20, 2025 at 3:46 PM George Guo <dongtai.guo@linux.dev>
> wrote:
> >
> > From: George Guo <guodongtai@kylinos.cn>
> >
> > Implement 128-bit atomic compare-and-exchange using LoongArch's
> > LL.D/SC.Q instructions.
> >
> > At the same time, fix BPF scheduler test failures (scx_central
> > scx_qmap) caused by kmalloc_nolock_noprof returning NULL due to
> > missing 128-bit atomics. The NULL returns led to -ENOMEM errors
> > during scheduler initialization, causing test cases to fail.
> >
> > Verified by testing with the scx_qmap scheduler (located in
> > tools/sched_ext/). Building with `make` and running
> > ./tools/sched_ext/build/bin/scx_qmap.
> >
> > Signed-off-by: George Guo <guodongtai@kylinos.cn>
> > ---
> >  arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 46 insertions(+)
> >
> > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
> > --- a/arch/loongarch/include/asm/cmpxchg.h
> > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > @@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> >  	__ret; \
> >  })
> >
> > +union __u128_halves {
> > +	u128 full;
> > +	struct {
> > +		u64 low;
> > +		u64 high;
> > +	};
> > +};
> > +
> > +#define __cmpxchg128_asm(ld, st, ptr, old, new) \
> > +({ \
> > +	union __u128_halves __old, __new, __ret; \
> > +	volatile u64 *__ptr = (volatile u64 *)(ptr); \
> > + \
> > +	__old.full = (old); \
> > +	__new.full = (new); \
> > + \
> > +	__asm__ __volatile__( \
> > +	"1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
> > +	" " ld " %1, %5 # 128-bit cmpxchg high \n" \
> > +	" bne %0, %z6, 2f \n" \
> > +	" bne %1, %z7, 2f \n" \
> > +	" move $t0, %z8 \n" \
> > +	" move $t1, %z9 \n" \
> > +	" " st " $t0, $t1, %2 \n" \
> > +	" beqz $t0, 1b \n" \
> > +	"2: \n" \
> > +	__WEAK_LLSC_MB \
> > +	: "=&r" (__ret.low), "=&r" (__ret.high), \
> > +	  "=ZB" (__ptr[0]), "=ZB" (__ptr[1]) \
> > +	: "ZB" (__ptr[0]), "ZB" (__ptr[1]), \
>
> Address operand constraints:
> - ld.d: "m"
> - ll.d: "ZC"
> - sc.q: "r"
>
Thanks for your advice.
Could you tell me how to find these constraints?

> > +	  "Jr" (__old.low), "Jr" (__old.high), \
> > +	  "Jr" (__new.low), "Jr" (__new.high) \
> > +	: "t0", "t1", "memory"); \
> > + \
> > +	__ret.full; \
> > +})
> > +
> >  static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
> >  					   unsigned int new, unsigned int size)
> >  {
> > @@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
> >  	__res; \
> >  })
> >
> > +/* cmpxchg128 */
> > +#define system_has_cmpxchg128() 1
> > +
> > +#define arch_cmpxchg128(ptr, o, n) \
> > +({ \
> > +	BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
> > +	__cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
>
> "sc.d" -> "sc.q"
>
> __cmpxchg128_asm doesn’t have multiple variants, so no need to
> genericize it?
>
> > +})
> > +
> >  #ifdef CONFIG_64BIT
> >  #define arch_cmpxchg64_local(ptr, o, n) \
> >  ({ \
> >
> > --
> > 2.48.1
> >
> >
>
> --
> Rui

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support 2025-11-21 9:51 ` George Guo @ 2025-11-21 11:38 ` hev 0 siblings, 0 replies; 9+ messages in thread From: hev @ 2025-11-21 11:38 UTC (permalink / raw) To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo On Fri, Nov 21, 2025 at 5:52 PM George Guo <dongtai.guo@linux.dev> wrote: > > On Thu, 20 Nov 2025 17:25:34 +0800 > hev <r@hev.cc> wrote: > > > On Thu, Nov 20, 2025 at 3:46 PM George Guo <dongtai.guo@linux.dev> > > wrote: > > > > > > From: George Guo <guodongtai@kylinos.cn> > > > > > > Implement 128-bit atomic compare-and-exchange using LoongArch's > > > LL.D/SC.Q instructions. > > > > > > At the same time, fix BPF scheduler test failures (scx_central > > > scx_qmap) caused by kmalloc_nolock_noprof returning NULL due to > > > missing 128-bit atomics. The NULL returns led to -ENOMEM errors > > > during scheduler initialization, causing test cases to fail. > > > > > > Verified by testing with the scx_qmap scheduler (located in > > > tools/sched_ext/). Building with `make` and running > > > ./tools/sched_ext/build/bin/scx_qmap. 
> > > > > > Signed-off-by: George Guo <guodongtai@kylinos.cn> > > > --- > > > arch/loongarch/include/asm/cmpxchg.h | 46 > > > ++++++++++++++++++++++++++++++++++++ 1 file changed, 46 > > > insertions(+) > > > > > > diff --git a/arch/loongarch/include/asm/cmpxchg.h > > > b/arch/loongarch/include/asm/cmpxchg.h index > > > 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 > > > 100644 --- a/arch/loongarch/include/asm/cmpxchg.h +++ > > > b/arch/loongarch/include/asm/cmpxchg.h @@ -111,6 +111,43 @@ > > > __arch_xchg(volatile void *ptr, unsigned long x, int size) __ret; > > > \ }) > > > > > > +union __u128_halves { > > > + u128 full; > > > + struct { > > > + u64 low; > > > + u64 high; > > > + }; > > > +}; > > > + > > > +#define __cmpxchg128_asm(ld, st, ptr, old, new) > > > \ +({ > > > \ > > > + union __u128_halves __old, __new, __ret; > > > \ > > > + volatile u64 *__ptr = (volatile u64 *)(ptr); > > > \ > > > + > > > \ > > > + __old.full = (old); > > > \ > > > + __new.full = (new); > > > \ > > > + > > > \ > > > + __asm__ __volatile__( > > > \ > > > + "1: " ld " %0, %4 # 128-bit cmpxchg low \n" > > > \ > > > + " " ld " %1, %5 # 128-bit cmpxchg high \n" > > > \ > > > + " bne %0, %z6, 2f \n" > > > \ > > > + " bne %1, %z7, 2f \n" > > > \ > > > + " move $t0, %z8 \n" > > > \ > > > + " move $t1, %z9 \n" > > > \ > > > + " " st " $t0, $t1, %2 \n" > > > \ > > > + " beqz $t0, 1b \n" > > > \ > > > + "2: \n" > > > \ > > > + __WEAK_LLSC_MB > > > \ > > > + : "=&r" (__ret.low), "=&r" (__ret.high), > > > \ > > > + "=ZB" (__ptr[0]), "=ZB" (__ptr[1]) > > > \ > > > + : "ZB" (__ptr[0]), "ZB" (__ptr[1]), > > > \ > > > > Address operand constraints: > > - ld.d: "m" > > - ll.d: "ZC" > > - sc.q: "r" > > > Thanks for your advice. > Could you tell me how to find these constraints? 
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html > > > + "Jr" (__old.low), "Jr" (__old.high), > > > \ > > > + "Jr" (__new.low), "Jr" (__new.high) > > > \ > > > + : "t0", "t1", "memory"); > > > \ > > > + > > > \ > > > + __ret.full; > > > \ +}) > > > + > > > static inline unsigned int __cmpxchg_small(volatile void *ptr, > > > unsigned int old, unsigned int new, unsigned int size) > > > { > > > @@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long > > > old, unsigned long new, unsigned int __res; > > > \ }) > > > > > > +/* cmpxchg128 */ > > > +#define system_has_cmpxchg128() 1 > > > + > > > +#define arch_cmpxchg128(ptr, o, n) > > > \ +({ > > > \ > > > + BUILD_BUG_ON(sizeof(*(ptr)) != 16); > > > \ > > > + __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); > > > \ > > > > "sc.d" -> "sc.q" > > > > __cmpxchg128_asm doesn’t have multiple variants, so no need to > > genericize it? > > > > > +}) > > > + > > > #ifdef CONFIG_64BIT > > > #define arch_cmpxchg64_local(ptr, o, n) > > > \ ({ > > > \ > > > > > > -- > > > 2.48.1 > > > > > > > > > > -- > > Rui > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
2025-11-20 8:07 ` Xi Ruoyao
2025-11-20 9:25 ` hev
@ 2025-11-20 11:14 ` david laight
2 siblings, 0 replies; 9+ messages in thread
From: david laight @ 2025-11-20 11:14 UTC (permalink / raw)
To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Thu, 20 Nov 2025 15:45:44 +0800
George Guo <dongtai.guo@linux.dev> wrote:

> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing test cases to fail.
>
> Verified by testing with the scx_qmap scheduler (located in
> tools/sched_ext/). Building with `make` and running
> ./tools/sched_ext/build/bin/scx_qmap.
>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
>  arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
>  	__ret; \
>  })
>
> +union __u128_halves {
> +	u128 full;
> +	struct {
> +		u64 low;
> +		u64 high;
> +	};
> +};
> +
> +#define __cmpxchg128_asm(ld, st, ptr, old, new) \
> +({ \
> +	union __u128_halves __old, __new, __ret; \
> +	volatile u64 *__ptr = (volatile u64 *)(ptr); \
> + \
> +	__old.full = (old); \
> +	__new.full = (new); \
> + \
> +	__asm__ __volatile__( \
> +	"1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
> +	" " ld " %1, %5 # 128-bit cmpxchg high \n" \
> +	" bne %0, %z6, 2f \n" \
> +	" bne %1, %z7, 2f \n" \
> +	" move $t0, %z8 \n" \
> +	" move $t1, %z9 \n" \
> +	" " st " $t0, $t1, %2 \n" \
> +	" beqz $t0, 1b \n" \
> +	"2: \n" \
> +	__WEAK_LLSC_MB \
> +	: "=&r" (__ret.low), "=&r" (__ret.high), \
> +	  "=ZB" (__ptr[0]), "=ZB" (__ptr[1]) \
> +	: "ZB" (__ptr[0]), "ZB" (__ptr[1]), \
> +	  "Jr" (__old.low), "Jr" (__old.high), \
> +	  "Jr" (__new.low), "Jr" (__new.high) \
> +	: "t0", "t1", "memory"); \

I'd add symbolic names for the asm registers to make it easier to read.
eg: [ret_low] "=&r" (__ret.low)
and replace %0 with %[ret_low]

	David

> + \
> +	__ret.full; \
> +})
> +
>  static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
>  					   unsigned int new, unsigned int size)
>  {
> @@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
>  	__res; \
>  })
>
> +/* cmpxchg128 */
> +#define system_has_cmpxchg128() 1
> +
> +#define arch_cmpxchg128(ptr, o, n) \
> +({ \
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
> +	__cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
> +})
> +
>  #ifdef CONFIG_64BIT
>  #define arch_cmpxchg64_local(ptr, o, n) \
>  ({ \
>

^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] LoongArch: Enable 128-bit atomics cmpxchg support
2025-11-20 7:45 [PATCH 0/2] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
@ 2025-11-20 7:45 ` George Guo
2025-11-20 10:37 ` kernel test robot
1 sibling, 1 reply; 9+ messages in thread
From: George Guo @ 2025-11-20 7:45 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo

From: George Guo <guodongtai@kylinos.cn>

Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support on LoongArch.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 5b1116733d881bc2b1b43fb93f20367add4dbc54..6fb2c253969f9ddece5478920423d7326c3ec046 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
 	select GENERIC_TIME_VSYSCALL
 	select GPIOLIB
 	select HAS_IOPORT
+	select HAVE_ALIGNED_STRUCT_PAGE
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
@@ -140,6 +141,7 @@ config LOONGARCH
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EBPF_JIT
+	select HAVE_CMPXCHG_DOUBLE
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
 	select HAVE_EXIT_THREAD
 	select HAVE_GENERIC_TIF_BITS

--
2.48.1

^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] LoongArch: Enable 128-bit atomics cmpxchg support 2025-11-20 7:45 ` [PATCH 2/2] LoongArch: Enable 128-bit atomics " George Guo @ 2025-11-20 10:37 ` kernel test robot 0 siblings, 0 replies; 9+ messages in thread From: kernel test robot @ 2025-11-20 10:37 UTC (permalink / raw) To: George Guo, Huacai Chen, WANG Xuerui Cc: llvm, oe-kbuild-all, loongarch, linux-kernel, George Guo Hi George, kernel test robot noticed the following build errors: [auto build test ERROR on 8b690556d8fe074b4f9835075050fba3fb180e93] url: https://github.com/intel-lab-lkp/linux/commits/George-Guo/LoongArch-Add-128-bit-atomic-cmpxchg-support/20251120-160152 base: 8b690556d8fe074b4f9835075050fba3fb180e93 patch link: https://lore.kernel.org/r/20251120-2-v1-2-705bdc440550%40linux.dev patch subject: [PATCH 2/2] LoongArch: Enable 128-bit atomics cmpxchg support config: loongarch-allnoconfig (https://download.01.org/0day-ci/archive/20251120/202511201828.xfphUVkJ-lkp@intel.com/config) compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 9e9fe08b16ea2c4d9867fb4974edf2a3776d6ece) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251120/202511201828.xfphUVkJ-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202511201828.xfphUVkJ-lkp@intel.com/ All errors (new ones prefixed by >>): >> mm/slub.c:766:9: error: immediate must be a multiple of 4 in the range [-32768, 32764] 766 | return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full); | ^ mm/slab.h:22:32: note: expanded from macro 'try_cmpxchg_freelist' 22 | # define try_cmpxchg_freelist try_cmpxchg128 | ^ include/linux/atomic/atomic-instrumented.h:4956:2: note: expanded from macro 'try_cmpxchg128' 4956 | raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:326:9: note: expanded from macro 'raw_try_cmpxchg128' 326 | ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \ | ^ include/linux/atomic/atomic-arch-fallback.h:141:24: note: expanded from macro 'raw_cmpxchg128' 141 | #define raw_cmpxchg128 arch_cmpxchg128 | ^ arch/loongarch/include/asm/cmpxchg.h:244:2: note: expanded from macro 'arch_cmpxchg128' 244 | __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \ | ^ arch/loongarch/include/asm/cmpxchg.h:137:3: note: expanded from macro '__cmpxchg128_asm' 137 | " " st " $t0, $t1, %2 \n" \ | ^ <inline asm>:7:23: note: instantiated into assembly here 7 | sc.d $t0, $t1, $a2, 0 | ^ >> mm/slub.c:766:9: error: immediate must be a multiple of 4 in the range [-32768, 32764] 766 | return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full); | ^ mm/slab.h:22:32: note: expanded from macro 'try_cmpxchg_freelist' 22 | # define try_cmpxchg_freelist try_cmpxchg128 | ^ include/linux/atomic/atomic-instrumented.h:4956:2: note: expanded from macro 'try_cmpxchg128' 4956 | raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:326:9: note: expanded from macro 'raw_try_cmpxchg128' 326 | ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \ | ^ 
include/linux/atomic/atomic-arch-fallback.h:141:24: note: expanded from macro 'raw_cmpxchg128' 141 | #define raw_cmpxchg128 arch_cmpxchg128 | ^ arch/loongarch/include/asm/cmpxchg.h:244:2: note: expanded from macro 'arch_cmpxchg128' 244 | __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \ | ^ arch/loongarch/include/asm/cmpxchg.h:137:3: note: expanded from macro '__cmpxchg128_asm' 137 | " " st " $t0, $t1, %2 \n" \ | ^ <inline asm>:7:23: note: instantiated into assembly here 7 | sc.d $t0, $t1, $a3, 0 | ^ >> mm/slub.c:766:9: error: immediate must be a multiple of 4 in the range [-32768, 32764] 766 | return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full); | ^ mm/slab.h:22:32: note: expanded from macro 'try_cmpxchg_freelist' 22 | # define try_cmpxchg_freelist try_cmpxchg128 | ^ include/linux/atomic/atomic-instrumented.h:4956:2: note: expanded from macro 'try_cmpxchg128' 4956 | raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:326:9: note: expanded from macro 'raw_try_cmpxchg128' 326 | ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \ | ^ include/linux/atomic/atomic-arch-fallback.h:141:24: note: expanded from macro 'raw_cmpxchg128' 141 | #define raw_cmpxchg128 arch_cmpxchg128 | ^ arch/loongarch/include/asm/cmpxchg.h:244:2: note: expanded from macro 'arch_cmpxchg128' 244 | __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \ | ^ arch/loongarch/include/asm/cmpxchg.h:137:3: note: expanded from macro '__cmpxchg128_asm' 137 | " " st " $t0, $t1, %2 \n" \ | ^ <inline asm>:7:23: note: instantiated into assembly here 7 | sc.d $t0, $t1, $a5, 0 | ^ >> mm/slub.c:766:9: error: immediate must be a multiple of 4 in the range [-32768, 32764] 766 | return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full); | ^ mm/slab.h:22:32: note: expanded from macro 'try_cmpxchg_freelist' 22 | # define try_cmpxchg_freelist try_cmpxchg128 | ^ include/linux/atomic/atomic-instrumented.h:4956:2: note: expanded from macro 
'try_cmpxchg128' 4956 | raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:326:9: note: expanded from macro 'raw_try_cmpxchg128' 326 | ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \ | ^ include/linux/atomic/atomic-arch-fallback.h:141:24: note: expanded from macro 'raw_cmpxchg128' 141 | #define raw_cmpxchg128 arch_cmpxchg128 | ^ arch/loongarch/include/asm/cmpxchg.h:244:2: note: expanded from macro 'arch_cmpxchg128' 244 | __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \ | ^ arch/loongarch/include/asm/cmpxchg.h:137:3: note: expanded from macro '__cmpxchg128_asm' 137 | " " st " $t0, $t1, %2 \n" \ | ^ <inline asm>:7:23: note: instantiated into assembly here 7 | sc.d $t0, $t1, $t3, 0 | ^ 4 errors generated. vim +766 mm/slub.c 881db7fb03a77a Christoph Lameter 2011-06-01 756 6801be4f2653e5 Peter Zijlstra 2023-05-31 757 static inline bool 6801be4f2653e5 Peter Zijlstra 2023-05-31 758 __update_freelist_fast(struct slab *slab, 6801be4f2653e5 Peter Zijlstra 2023-05-31 759 void *freelist_old, unsigned long counters_old, 6801be4f2653e5 Peter Zijlstra 2023-05-31 760 void *freelist_new, unsigned long counters_new) 6801be4f2653e5 Peter Zijlstra 2023-05-31 761 { 6801be4f2653e5 Peter Zijlstra 2023-05-31 762 #ifdef system_has_freelist_aba 6801be4f2653e5 Peter Zijlstra 2023-05-31 763 freelist_aba_t old = { .freelist = freelist_old, .counter = counters_old }; 6801be4f2653e5 Peter Zijlstra 2023-05-31 764 freelist_aba_t new = { .freelist = freelist_new, .counter = counters_new }; 6801be4f2653e5 Peter Zijlstra 2023-05-31 765 6801be4f2653e5 Peter Zijlstra 2023-05-31 @766 return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full); 6801be4f2653e5 Peter Zijlstra 2023-05-31 767 #else 6801be4f2653e5 Peter Zijlstra 2023-05-31 768 return false; 6801be4f2653e5 Peter Zijlstra 2023-05-31 769 #endif 6801be4f2653e5 Peter Zijlstra 2023-05-31 770 } 6801be4f2653e5 Peter Zijlstra 2023-05-31 771 -- 0-DAY CI Kernel Test Service 
https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 9+ messages in thread
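The build log above shows that the first in-tree user reached by the new
macro is the slab allocator, which compares and replaces a freelist pointer
and a counter as one 128-bit unit. The sketch below illustrates, in
user-space C, why updating such a pair together can matter. It rests on
assumptions that are not confirmed by the thread: that the pairing serves as
ABA protection (suggested only by the name freelist_aba_t), and that the
target has 64-bit pointers. aba_pair_t, try_cmpxchg_pair and
stale_update_rejected are hypothetical names, and the compare-and-swap here
is a non-atomic stand-in.

```c
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical stand-in for a pointer/counter pair viewed as one u128. */
typedef union {
	struct {
		void *freelist;
		uint64_t counter;
	};
	u128 full;
} aba_pair_t;

/* Same idea as the BUILD_BUG_ON(sizeof(*(ptr)) != 16) check in the patch. */
_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");
_Static_assert(sizeof(aba_pair_t) == 16, "pair must be exactly 128 bits");

/* Non-atomic stand-in for a 128-bit try_cmpxchg on the pair. */
static bool try_cmpxchg_pair(aba_pair_t *slot, aba_pair_t *old,
			     aba_pair_t new_val)
{
	if (slot->full == old->full) {
		slot->full = new_val.full;
		return true;
	}
	old->full = slot->full;	/* report what was actually there */
	return false;
}

/*
 * The pointer goes A -> B -> A while a slow updater still holds an old
 * snapshot. Returns true if the pointer matches the snapshot again but
 * the pair-wide compare still rejects the stale update.
 */
static bool stale_update_rejected(void)
{
	int obj_a, obj_b;
	aba_pair_t slot = { .freelist = &obj_a, .counter = 1 };
	aba_pair_t snapshot = slot;
	aba_pair_t cur, next, want;
	bool same_ptr, swapped;

	cur = slot;
	next = (aba_pair_t){ .freelist = &obj_b, .counter = cur.counter + 1 };
	try_cmpxchg_pair(&slot, &cur, next);	/* A -> B, counter 2 */

	cur = slot;
	next = (aba_pair_t){ .freelist = &obj_a, .counter = cur.counter + 1 };
	try_cmpxchg_pair(&slot, &cur, next);	/* B -> A, counter 3 */

	same_ptr = (slot.freelist == snapshot.freelist);

	want = (aba_pair_t){ .freelist = &obj_b,
			     .counter = snapshot.counter + 1 };
	swapped = try_cmpxchg_pair(&slot, &snapshot, want);

	return same_ptr && !swapped;
}
```

A compare on the pointer alone would have accepted the stale snapshot; the
128-bit compare does not, which is the property a correct
arch_cmpxchg128() has to preserve atomically.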