* [PATCH 0/2] LoongArch: Add 128-bit atomic cmpxchg support
@ 2025-11-20 7:45 George Guo
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
2025-11-20 7:45 ` [PATCH 2/2] LoongArch: Enable 128-bit atomics " George Guo
0 siblings, 2 replies; 9+ messages in thread
From: George Guo @ 2025-11-20 7:45 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo
This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture, which fixes BPF scheduler test failures
caused by the missing 128-bit atomics support.
The series consists of two patches:
1. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- Fixes BPF scheduler test failures (scx_central, scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization
2. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support
The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
Signed-off-by: George Guo <dongtai.guo@linux.dev>
---
George Guo (2):
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 ++
arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+)
---
base-commit: 8b690556d8fe074b4f9835075050fba3fb180e93
change-id: 20251120-2-d03862b2cf6d
Best regards,
--
George Guo <dongtai.guo@linux.dev>
* [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 7:45 [PATCH 0/2] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-11-20 7:45 ` George Guo
2025-11-20 8:07 ` Xi Ruoyao
` (2 more replies)
2025-11-20 7:45 ` [PATCH 2/2] LoongArch: Enable 128-bit atomics " George Guo
1 sibling, 3 replies; 9+ messages in thread
From: George Guo @ 2025-11-20 7:45 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo
From: George Guo <guodongtai@kylinos.cn>
Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.
At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
caused by kmalloc_nolock_noprof returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing test cases to fail.
Verified with the scx_qmap scheduler (located in tools/sched_ext/),
built with `make` and run as
./tools/sched_ext/build/bin/scx_qmap.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
__ret; \
})
+union __u128_halves {
+ u128 full;
+ struct {
+ u64 low;
+ u64 high;
+ };
+};
+
+#define __cmpxchg128_asm(ld, st, ptr, old, new) \
+({ \
+ union __u128_halves __old, __new, __ret; \
+ volatile u64 *__ptr = (volatile u64 *)(ptr); \
+ \
+ __old.full = (old); \
+ __new.full = (new); \
+ \
+ __asm__ __volatile__( \
+ "1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
+ " " ld " %1, %5 # 128-bit cmpxchg high \n" \
+ " bne %0, %z6, 2f \n" \
+ " bne %1, %z7, 2f \n" \
+ " move $t0, %z8 \n" \
+ " move $t1, %z9 \n" \
+ " " st " $t0, $t1, %2 \n" \
+ " beqz $t0, 1b \n" \
+ "2: \n" \
+ __WEAK_LLSC_MB \
+ : "=&r" (__ret.low), "=&r" (__ret.high), \
+ "=ZB" (__ptr[0]), "=ZB" (__ptr[1]) \
+ : "ZB" (__ptr[0]), "ZB" (__ptr[1]), \
+ "Jr" (__old.low), "Jr" (__old.high), \
+ "Jr" (__new.low), "Jr" (__new.high) \
+ : "t0", "t1", "memory"); \
+ \
+ __ret.full; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
__res; \
})
+/* cmpxchg128 */
+#define system_has_cmpxchg128() 1
+
+#define arch_cmpxchg128(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
+ __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
+})
+
#ifdef CONFIG_64BIT
#define arch_cmpxchg64_local(ptr, o, n) \
({ \
--
2.48.1
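
For readers following the thread, the contract that arch_cmpxchg128() is expected to provide can be sketched in plain C. This is only a reference model, not the patch's implementation: it is not atomic, and the names cmpxchg128_model, make_u128 and union u128_halves are made up for illustration. It assumes a 64-bit GCC or Clang target where unsigned __int128 is available.

```c
#include <stdint.h>

typedef unsigned __int128 u128;	/* GCC/Clang extension on 64-bit targets */
typedef uint64_t u64;

/* Same idea as the union in the patch: view one 128-bit value as two halves. */
union u128_halves {
	u128 full;
	struct {
		u64 low;
		u64 high;
	};
};

/* Hypothetical helper: build a 128-bit value from its two halves. */
static u128 make_u128(u64 high, u64 low)
{
	return ((u128)high << 64) | low;
}

/*
 * Reference semantics only, NOT atomic: return the value found at *ptr,
 * and store `repl` only when that value equals `expected`.
 */
static u128 cmpxchg128_model(u128 *ptr, u128 expected, u128 repl)
{
	union u128_halves cur, old;

	cur.full = *ptr;
	old.full = expected;

	/* Both halves must match before the store may happen. */
	if (cur.low == old.low && cur.high == old.high)
		*ptr = repl;

	return cur.full;
}
```

A caller detects success by comparing the return value with the expected value it passed in; the real primitive has to give the same result while excluding concurrent writers.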
* [PATCH 2/2] LoongArch: Enable 128-bit atomics cmpxchg support
2025-11-20 7:45 [PATCH 0/2] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
@ 2025-11-20 7:45 ` George Guo
2025-11-20 10:37 ` kernel test robot
1 sibling, 1 reply; 9+ messages in thread
From: George Guo @ 2025-11-20 7:45 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo
From: George Guo <guodongtai@kylinos.cn>
Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
to enable 128-bit atomic cmpxchg support on LoongArch.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 5b1116733d881bc2b1b43fb93f20367add4dbc54..6fb2c253969f9ddece5478920423d7326c3ec046 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
select GENERIC_TIME_VSYSCALL
select GPIOLIB
select HAS_IOPORT
+ select HAVE_ALIGNED_STRUCT_PAGE
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
@@ -140,6 +141,7 @@ config LOONGARCH
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT
+ select HAVE_CMPXCHG_DOUBLE
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
select HAVE_EXIT_THREAD
select HAVE_GENERIC_TIF_BITS
--
2.48.1
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
@ 2025-11-20 8:07 ` Xi Ruoyao
2025-11-20 9:25 ` hev
2025-11-20 11:14 ` david laight
2 siblings, 0 replies; 9+ messages in thread
From: Xi Ruoyao @ 2025-11-20 8:07 UTC (permalink / raw)
To: George Guo, Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo
On Thu, 2025-11-20 at 15:45 +0800, George Guo wrote:
> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing test cases to fail.
>
> Verified by testing with the scx_qmap scheduler (located in
> tools/sched_ext/). Building with `make` and running
> ./tools/sched_ext/build/bin/scx_qmap.
>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
> arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> __ret; \
> })
>
> +union __u128_halves {
> + u128 full;
> + struct {
> + u64 low;
> + u64 high;
> + };
> +};
> +
> +#define __cmpxchg128_asm(ld, st, ptr, old, new) \
> +({ \
> + union __u128_halves __old, __new, __ret; \
> + volatile u64 *__ptr = (volatile u64 *)(ptr); \
> + \
> + __old.full = (old); \
> + __new.full = (new); \
> + \
> + __asm__ __volatile__( \
> + "1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
> + " " ld " %1, %5 # 128-bit cmpxchg high \n" \
This is incorrect. It may happen that:
SMP 1                 | SMP 2
ll.d $r4, mem         |
                      | sc.q $t0, $t1, mem
ll.d $r5, mem + 8     |
As the second ll.d instruction raises the LL bit, you lose the info if
the first ll.d instruction has succeeded. Thus you cannot figure out if
someone has modified the mem during your "critical section."
You should use a normal ld.d for the high word instead. And you need to
insert a dbar between ll.d and ld.d to prevent reordering.
--
Xi Ruoyao <xry111@xry111.site>
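
The interleaving described in this review can be replayed with a small C model. It is a deliberately simplified sketch, not a description of the hardware: it assumes a single link flag per CPU that any ll raises and that a store from another CPU clears. All names (struct llsc_model, model_ll, lost_update and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: two 64-bit words plus the local CPU's link flag. */
struct llsc_model {
	uint64_t mem[2];	/* mem[0] = low word, mem[1] = high word */
	bool llbit;		/* assumed: one link flag per CPU */
};

/* Load-linked: returns the word and raises the link flag. */
static uint64_t model_ll(struct llsc_model *m, int idx)
{
	m->llbit = true;
	return m->mem[idx];
}

/* Plain load: leaves the link flag untouched. */
static uint64_t model_ld(const struct llsc_model *m, int idx)
{
	return m->mem[idx];
}

/* A store from another CPU: changes memory and clears our link flag. */
static void model_remote_store(struct llsc_model *m, uint64_t lo, uint64_t hi)
{
	m->mem[0] = lo;
	m->mem[1] = hi;
	m->llbit = false;
}

/* Store-conditional of both words: succeeds only if the flag is still set. */
static bool model_sc_pair(struct llsc_model *m, uint64_t lo, uint64_t hi)
{
	if (!m->llbit)
		return false;
	m->mem[0] = lo;
	m->mem[1] = hi;
	m->llbit = false;
	return true;
}

/*
 * Replays the interleaving from the review. Returns true when the local
 * store-conditional succeeds although another CPU wrote in between,
 * i.e. when that CPU's update is silently overwritten.
 */
static bool lost_update(bool second_load_is_ll)
{
	struct llsc_model m = { .mem = { 10, 20 }, .llbit = false };
	uint64_t lo, hi;

	lo = model_ll(&m, 0);			/* first ll.d */
	model_remote_store(&m, 11, 20);		/* other CPU's sc.q lands here */
	hi = second_load_is_ll ? model_ll(&m, 1) : model_ld(&m, 1);

	if (lo != 10 || hi != 20)		/* compare against expected old */
		return false;
	return model_sc_pair(&m, 30, 40);
}
```

Under these assumptions lost_update(true) reports the overwrite, while lost_update(false), which uses a plain load for the high word, makes the store-conditional fail so the loop would retry. The model says nothing about ordering, so it does not capture the dbar that the review also asks for.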
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
2025-11-20 8:07 ` Xi Ruoyao
@ 2025-11-20 9:25 ` hev
2025-11-21 9:51 ` George Guo
2025-11-20 11:14 ` david laight
2 siblings, 1 reply; 9+ messages in thread
From: hev @ 2025-11-20 9:25 UTC (permalink / raw)
To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo
On Thu, Nov 20, 2025 at 3:46 PM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing test cases to fail.
>
> Verified by testing with the scx_qmap scheduler (located in
> tools/sched_ext/). Building with `make` and running
> ./tools/sched_ext/build/bin/scx_qmap.
>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
> arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> __ret; \
> })
>
> +union __u128_halves {
> + u128 full;
> + struct {
> + u64 low;
> + u64 high;
> + };
> +};
> +
> +#define __cmpxchg128_asm(ld, st, ptr, old, new) \
> +({ \
> + union __u128_halves __old, __new, __ret; \
> + volatile u64 *__ptr = (volatile u64 *)(ptr); \
> + \
> + __old.full = (old); \
> + __new.full = (new); \
> + \
> + __asm__ __volatile__( \
> + "1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
> + " " ld " %1, %5 # 128-bit cmpxchg high \n" \
> + " bne %0, %z6, 2f \n" \
> + " bne %1, %z7, 2f \n" \
> + " move $t0, %z8 \n" \
> + " move $t1, %z9 \n" \
> + " " st " $t0, $t1, %2 \n" \
> + " beqz $t0, 1b \n" \
> + "2: \n" \
> + __WEAK_LLSC_MB \
> + : "=&r" (__ret.low), "=&r" (__ret.high), \
> + "=ZB" (__ptr[0]), "=ZB" (__ptr[1]) \
> + : "ZB" (__ptr[0]), "ZB" (__ptr[1]), \
Address operand constraints:
- ld.d: "m"
- ll.d: "ZC"
- sc.q: "r"
> + "Jr" (__old.low), "Jr" (__old.high), \
> + "Jr" (__new.low), "Jr" (__new.high) \
> + : "t0", "t1", "memory"); \
> + \
> + __ret.full; \
> +})
> +
> static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
> unsigned int new, unsigned int size)
> {
> @@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
> __res; \
> })
>
> +/* cmpxchg128 */
> +#define system_has_cmpxchg128() 1
> +
> +#define arch_cmpxchg128(ptr, o, n) \
> +({ \
> + BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
> + __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
"sc.d" -> "sc.q"
__cmpxchg128_asm doesn’t have multiple variants, so no need to genericize it?
> +})
> +
> #ifdef CONFIG_64BIT
> #define arch_cmpxchg64_local(ptr, o, n) \
> ({ \
>
> --
> 2.48.1
>
>
--
Rui
* Re: [PATCH 2/2] LoongArch: Enable 128-bit atomics cmpxchg support
2025-11-20 7:45 ` [PATCH 2/2] LoongArch: Enable 128-bit atomics " George Guo
@ 2025-11-20 10:37 ` kernel test robot
0 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-11-20 10:37 UTC (permalink / raw)
To: George Guo, Huacai Chen, WANG Xuerui
Cc: llvm, oe-kbuild-all, loongarch, linux-kernel, George Guo
Hi George,
kernel test robot noticed the following build errors:
[auto build test ERROR on 8b690556d8fe074b4f9835075050fba3fb180e93]
url: https://github.com/intel-lab-lkp/linux/commits/George-Guo/LoongArch-Add-128-bit-atomic-cmpxchg-support/20251120-160152
base: 8b690556d8fe074b4f9835075050fba3fb180e93
patch link: https://lore.kernel.org/r/20251120-2-v1-2-705bdc440550%40linux.dev
patch subject: [PATCH 2/2] LoongArch: Enable 128-bit atomics cmpxchg support
config: loongarch-allnoconfig (https://download.01.org/0day-ci/archive/20251120/202511201828.xfphUVkJ-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 9e9fe08b16ea2c4d9867fb4974edf2a3776d6ece)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251120/202511201828.xfphUVkJ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511201828.xfphUVkJ-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/slub.c:766:9: error: immediate must be a multiple of 4 in the range [-32768, 32764]
766 | return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full);
| ^
mm/slab.h:22:32: note: expanded from macro 'try_cmpxchg_freelist'
22 | # define try_cmpxchg_freelist try_cmpxchg128
| ^
include/linux/atomic/atomic-instrumented.h:4956:2: note: expanded from macro 'try_cmpxchg128'
4956 | raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \
| ^
include/linux/atomic/atomic-arch-fallback.h:326:9: note: expanded from macro 'raw_try_cmpxchg128'
326 | ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \
| ^
include/linux/atomic/atomic-arch-fallback.h:141:24: note: expanded from macro 'raw_cmpxchg128'
141 | #define raw_cmpxchg128 arch_cmpxchg128
| ^
arch/loongarch/include/asm/cmpxchg.h:244:2: note: expanded from macro 'arch_cmpxchg128'
244 | __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
| ^
arch/loongarch/include/asm/cmpxchg.h:137:3: note: expanded from macro '__cmpxchg128_asm'
137 | " " st " $t0, $t1, %2 \n" \
| ^
<inline asm>:7:23: note: instantiated into assembly here
7 | sc.d $t0, $t1, $a2, 0
| ^
>> mm/slub.c:766:9: error: immediate must be a multiple of 4 in the range [-32768, 32764]
766 | return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full);
| ^
mm/slab.h:22:32: note: expanded from macro 'try_cmpxchg_freelist'
22 | # define try_cmpxchg_freelist try_cmpxchg128
| ^
include/linux/atomic/atomic-instrumented.h:4956:2: note: expanded from macro 'try_cmpxchg128'
4956 | raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \
| ^
include/linux/atomic/atomic-arch-fallback.h:326:9: note: expanded from macro 'raw_try_cmpxchg128'
326 | ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \
| ^
include/linux/atomic/atomic-arch-fallback.h:141:24: note: expanded from macro 'raw_cmpxchg128'
141 | #define raw_cmpxchg128 arch_cmpxchg128
| ^
arch/loongarch/include/asm/cmpxchg.h:244:2: note: expanded from macro 'arch_cmpxchg128'
244 | __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
| ^
arch/loongarch/include/asm/cmpxchg.h:137:3: note: expanded from macro '__cmpxchg128_asm'
137 | " " st " $t0, $t1, %2 \n" \
| ^
<inline asm>:7:23: note: instantiated into assembly here
7 | sc.d $t0, $t1, $a3, 0
| ^
>> mm/slub.c:766:9: error: immediate must be a multiple of 4 in the range [-32768, 32764]
766 | return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full);
| ^
mm/slab.h:22:32: note: expanded from macro 'try_cmpxchg_freelist'
22 | # define try_cmpxchg_freelist try_cmpxchg128
| ^
include/linux/atomic/atomic-instrumented.h:4956:2: note: expanded from macro 'try_cmpxchg128'
4956 | raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \
| ^
include/linux/atomic/atomic-arch-fallback.h:326:9: note: expanded from macro 'raw_try_cmpxchg128'
326 | ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \
| ^
include/linux/atomic/atomic-arch-fallback.h:141:24: note: expanded from macro 'raw_cmpxchg128'
141 | #define raw_cmpxchg128 arch_cmpxchg128
| ^
arch/loongarch/include/asm/cmpxchg.h:244:2: note: expanded from macro 'arch_cmpxchg128'
244 | __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
| ^
arch/loongarch/include/asm/cmpxchg.h:137:3: note: expanded from macro '__cmpxchg128_asm'
137 | " " st " $t0, $t1, %2 \n" \
| ^
<inline asm>:7:23: note: instantiated into assembly here
7 | sc.d $t0, $t1, $a5, 0
| ^
>> mm/slub.c:766:9: error: immediate must be a multiple of 4 in the range [-32768, 32764]
766 | return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full);
| ^
mm/slab.h:22:32: note: expanded from macro 'try_cmpxchg_freelist'
22 | # define try_cmpxchg_freelist try_cmpxchg128
| ^
include/linux/atomic/atomic-instrumented.h:4956:2: note: expanded from macro 'try_cmpxchg128'
4956 | raw_try_cmpxchg128(__ai_ptr, __ai_oldp, __VA_ARGS__); \
| ^
include/linux/atomic/atomic-arch-fallback.h:326:9: note: expanded from macro 'raw_try_cmpxchg128'
326 | ___r = raw_cmpxchg128((_ptr), ___o, (_new)); \
| ^
include/linux/atomic/atomic-arch-fallback.h:141:24: note: expanded from macro 'raw_cmpxchg128'
141 | #define raw_cmpxchg128 arch_cmpxchg128
| ^
arch/loongarch/include/asm/cmpxchg.h:244:2: note: expanded from macro 'arch_cmpxchg128'
244 | __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
| ^
arch/loongarch/include/asm/cmpxchg.h:137:3: note: expanded from macro '__cmpxchg128_asm'
137 | " " st " $t0, $t1, %2 \n" \
| ^
<inline asm>:7:23: note: instantiated into assembly here
7 | sc.d $t0, $t1, $t3, 0
| ^
4 errors generated.
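
The diagnostic above spells out the rule clang applies to the offset operand: a multiple of 4 in the range [-32768, 32764]. In the instantiated line the position where "sc.d" appears to expect that offset holds a register instead, which is consistent with the macro being invoked with "sc.d" while using the three-register operand list of sc.q. A minimal sketch of the stated rule, with llsc_offset_ok as a hypothetical name:

```c
#include <stdbool.h>

/* Range and alignment exactly as quoted in the clang diagnostic. */
static bool llsc_offset_ok(long offset)
{
	return offset >= -32768 && offset <= 32764 && offset % 4 == 0;
}
```

This only restates the error message as a predicate; it is not taken from the assembler's source.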
vim +766 mm/slub.c
881db7fb03a77a Christoph Lameter 2011-06-01 756
6801be4f2653e5 Peter Zijlstra 2023-05-31 757 static inline bool
6801be4f2653e5 Peter Zijlstra 2023-05-31 758 __update_freelist_fast(struct slab *slab,
6801be4f2653e5 Peter Zijlstra 2023-05-31 759 void *freelist_old, unsigned long counters_old,
6801be4f2653e5 Peter Zijlstra 2023-05-31 760 void *freelist_new, unsigned long counters_new)
6801be4f2653e5 Peter Zijlstra 2023-05-31 761 {
6801be4f2653e5 Peter Zijlstra 2023-05-31 762 #ifdef system_has_freelist_aba
6801be4f2653e5 Peter Zijlstra 2023-05-31 763 freelist_aba_t old = { .freelist = freelist_old, .counter = counters_old };
6801be4f2653e5 Peter Zijlstra 2023-05-31 764 freelist_aba_t new = { .freelist = freelist_new, .counter = counters_new };
6801be4f2653e5 Peter Zijlstra 2023-05-31 765
6801be4f2653e5 Peter Zijlstra 2023-05-31 @766 return try_cmpxchg_freelist(&slab->freelist_counter.full, &old.full, new.full);
6801be4f2653e5 Peter Zijlstra 2023-05-31 767 #else
6801be4f2653e5 Peter Zijlstra 2023-05-31 768 return false;
6801be4f2653e5 Peter Zijlstra 2023-05-31 769 #endif
6801be4f2653e5 Peter Zijlstra 2023-05-31 770 }
6801be4f2653e5 Peter Zijlstra 2023-05-31 771
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
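
The macro chain in this report ends in try_cmpxchg128(), which wraps the value-returning cmpxchg. A plain C model of that wrapper and of the freelist/counter pair used at mm/slub.c:766 may help when reading the expansion. It is a non-atomic sketch with hypothetical names (freelist_aba_model, model_cmpxchg128, model_try_cmpxchg128, model_update_freelist) and assumes an LP64 target with GCC or Clang, so that a pointer plus an unsigned long fill exactly 16 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 u128;	/* GCC/Clang extension on 64-bit targets */

/* Hypothetical mirror of the pointer/counter pair seen in the report. */
union freelist_aba_model {
	struct {
		void *freelist;
		unsigned long counter;
	};
	u128 full;
};

/* Non-atomic stand-in for the architecture's 128-bit cmpxchg. */
static u128 model_cmpxchg128(u128 *ptr, u128 expected, u128 repl)
{
	u128 seen = *ptr;

	if (seen == expected)
		*ptr = repl;
	return seen;
}

/*
 * try_ variant: returns true on success; on failure it writes the value
 * actually seen back through oldp so the caller can retry with it.
 */
static bool model_try_cmpxchg128(u128 *ptr, u128 *oldp, u128 repl)
{
	u128 expected = *oldp;
	u128 seen = model_cmpxchg128(ptr, expected, repl);

	if (seen != expected)
		*oldp = seen;
	return seen == expected;
}

/* Shape of the caller in the report: swap pointer and counter together. */
static bool model_update_freelist(union freelist_aba_model *slot,
				  void *fl_old, unsigned long cnt_old,
				  void *fl_new, unsigned long cnt_new)
{
	union freelist_aba_model old = { .freelist = fl_old, .counter = cnt_old };
	union freelist_aba_model repl = { .freelist = fl_new, .counter = cnt_new };

	return model_try_cmpxchg128(&slot->full, &old.full, repl.full);
}
```

Updating the counter in the same 128-bit operation as the pointer is what lets the caller notice that the list changed even if the pointer value happens to be the same again.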
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 7:45 ` [PATCH 1/2] " George Guo
2025-11-20 8:07 ` Xi Ruoyao
2025-11-20 9:25 ` hev
@ 2025-11-20 11:14 ` david laight
2 siblings, 0 replies; 9+ messages in thread
From: david laight @ 2025-11-20 11:14 UTC (permalink / raw)
To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo
On Thu, 20 Nov 2025 15:45:44 +0800
George Guo <dongtai.guo@linux.dev> wrote:
> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing test cases to fail.
>
> Verified by testing with the scx_qmap scheduler (located in
> tools/sched_ext/). Building with `make` and running
> ./tools/sched_ext/build/bin/scx_qmap.
>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
> arch/loongarch/include/asm/cmpxchg.h | 46 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -111,6 +111,43 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> __ret; \
> })
>
> +union __u128_halves {
> + u128 full;
> + struct {
> + u64 low;
> + u64 high;
> + };
> +};
> +
> +#define __cmpxchg128_asm(ld, st, ptr, old, new) \
> +({ \
> + union __u128_halves __old, __new, __ret; \
> + volatile u64 *__ptr = (volatile u64 *)(ptr); \
> + \
> + __old.full = (old); \
> + __new.full = (new); \
> + \
> + __asm__ __volatile__( \
> + "1: " ld " %0, %4 # 128-bit cmpxchg low \n" \
> + " " ld " %1, %5 # 128-bit cmpxchg high \n" \
> + " bne %0, %z6, 2f \n" \
> + " bne %1, %z7, 2f \n" \
> + " move $t0, %z8 \n" \
> + " move $t1, %z9 \n" \
> + " " st " $t0, $t1, %2 \n" \
> + " beqz $t0, 1b \n" \
> + "2: \n" \
> + __WEAK_LLSC_MB \
> + : "=&r" (__ret.low), "=&r" (__ret.high), \
> + "=ZB" (__ptr[0]), "=ZB" (__ptr[1]) \
> + : "ZB" (__ptr[0]), "ZB" (__ptr[1]), \
> + "Jr" (__old.low), "Jr" (__old.high), \
> + "Jr" (__new.low), "Jr" (__new.high) \
> + : "t0", "t1", "memory"); \
I'd add symbolic names for the asm registers to make it easier to read.
e.g. [ret_low] "=&r" (__ret.low), and replace %0 with %[ret_low]
David
> + \
> + __ret.full; \
> +})
> +
> static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
> unsigned int new, unsigned int size)
> {
> @@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
> __res; \
> })
>
> +/* cmpxchg128 */
> +#define system_has_cmpxchg128() 1
> +
> +#define arch_cmpxchg128(ptr, o, n) \
> +({ \
> + BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
> + __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n); \
> +})
> +
> #ifdef CONFIG_64BIT
> #define arch_cmpxchg64_local(ptr, o, n) \
> ({ \
>
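
The symbolic-name suggestion in this message can be tried in isolation with an architecture-neutral snippet. This is a sketch assuming GCC or Clang extended asm; pass_through is a hypothetical name, and the template contains only an assembler comment, so no instruction is emitted.

```c
/*
 * Named operands: [result] and [value] replace the positional %0 and %1.
 * The "0" constraint ties the input to the same register as operand 0,
 * so the value simply passes through.
 */
static unsigned long pass_through(unsigned long value)
{
	unsigned long result;

	__asm__ __volatile__(
		"/* %[result] is tied to %[value] */"
		: [result] "=r" (result)
		: [value] "0" (value));

	return result;
}
```

With names in place, adding or removing an operand no longer shifts the numbering of every later %N in the template, which is the readability gain being asked for.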
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-20 9:25 ` hev
@ 2025-11-21 9:51 ` George Guo
2025-11-21 11:38 ` hev
0 siblings, 1 reply; 9+ messages in thread
From: George Guo @ 2025-11-21 9:51 UTC (permalink / raw)
To: hev; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo
On Thu, 20 Nov 2025 17:25:34 +0800
hev <r@hev.cc> wrote:
> On Thu, Nov 20, 2025 at 3:46 PM George Guo <dongtai.guo@linux.dev>
> wrote:
> >
> > From: George Guo <guodongtai@kylinos.cn>
> >
> > Implement 128-bit atomic compare-and-exchange using LoongArch's
> > LL.D/SC.Q instructions.
> >
> > At the same time, fix BPF scheduler test failures (scx_central
> > scx_qmap) caused by kmalloc_nolock_noprof returning NULL due to
> > missing 128-bit atomics. The NULL returns led to -ENOMEM errors
> > during scheduler initialization, causing test cases to fail.
> >
> > Verified by testing with the scx_qmap scheduler (located in
> > tools/sched_ext/). Building with `make` and running
> > ./tools/sched_ext/build/bin/scx_qmap.
> >
> > Signed-off-by: George Guo <guodongtai@kylinos.cn>
> > ---
> > arch/loongarch/include/asm/cmpxchg.h | 46
> > ++++++++++++++++++++++++++++++++++++ 1 file changed, 46
> > insertions(+)
> >
> > diff --git a/arch/loongarch/include/asm/cmpxchg.h
> > b/arch/loongarch/include/asm/cmpxchg.h index
> > 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883
> > 100644 --- a/arch/loongarch/include/asm/cmpxchg.h +++
> > b/arch/loongarch/include/asm/cmpxchg.h @@ -111,6 +111,43 @@
> > __arch_xchg(volatile void *ptr, unsigned long x, int size) __ret;
> > \ })
> >
> > +union __u128_halves {
> > + u128 full;
> > + struct {
> > + u64 low;
> > + u64 high;
> > + };
> > +};
> > +
> > +#define __cmpxchg128_asm(ld, st, ptr, old, new)
> > \ +({
> > \
> > + union __u128_halves __old, __new, __ret;
> > \
> > + volatile u64 *__ptr = (volatile u64 *)(ptr);
> > \
> > +
> > \
> > + __old.full = (old);
> > \
> > + __new.full = (new);
> > \
> > +
> > \
> > + __asm__ __volatile__(
> > \
> > + "1: " ld " %0, %4 # 128-bit cmpxchg low \n"
> > \
> > + " " ld " %1, %5 # 128-bit cmpxchg high \n"
> > \
> > + " bne %0, %z6, 2f \n"
> > \
> > + " bne %1, %z7, 2f \n"
> > \
> > + " move $t0, %z8 \n"
> > \
> > + " move $t1, %z9 \n"
> > \
> > + " " st " $t0, $t1, %2 \n"
> > \
> > + " beqz $t0, 1b \n"
> > \
> > + "2: \n"
> > \
> > + __WEAK_LLSC_MB
> > \
> > + : "=&r" (__ret.low), "=&r" (__ret.high),
> > \
> > + "=ZB" (__ptr[0]), "=ZB" (__ptr[1])
> > \
> > + : "ZB" (__ptr[0]), "ZB" (__ptr[1]),
> > \
>
> Address operand constraints:
> - ld.d: "m"
> - ll.d: "ZC"
> - sc.q: "r"
>
Thanks for your advice.
Could you tell me how to find these constraints?
> > + "Jr" (__old.low), "Jr" (__old.high),
> > \
> > + "Jr" (__new.low), "Jr" (__new.high)
> > \
> > + : "t0", "t1", "memory");
> > \
> > +
> > \
> > + __ret.full;
> > \ +})
> > +
> > static inline unsigned int __cmpxchg_small(volatile void *ptr,
> > unsigned int old, unsigned int new, unsigned int size)
> > {
> > @@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long
> > old, unsigned long new, unsigned int __res;
> > \ })
> >
> > +/* cmpxchg128 */
> > +#define system_has_cmpxchg128() 1
> > +
> > +#define arch_cmpxchg128(ptr, o, n)
> > \ +({
> > \
> > + BUILD_BUG_ON(sizeof(*(ptr)) != 16);
> > \
> > + __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n);
> > \
>
> "sc.d" -> "sc.q"
>
> __cmpxchg128_asm doesn’t have multiple variants, so no need to
> genericize it?
>
> > +})
> > +
> > #ifdef CONFIG_64BIT
> > #define arch_cmpxchg64_local(ptr, o, n)
> > \ ({
> > \
> >
> > --
> > 2.48.1
> >
> >
>
> --
> Rui
* Re: [PATCH 1/2] LoongArch: Add 128-bit atomic cmpxchg support
2025-11-21 9:51 ` George Guo
@ 2025-11-21 11:38 ` hev
0 siblings, 0 replies; 9+ messages in thread
From: hev @ 2025-11-21 11:38 UTC (permalink / raw)
To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo
On Fri, Nov 21, 2025 at 5:52 PM George Guo <dongtai.guo@linux.dev> wrote:
>
> On Thu, 20 Nov 2025 17:25:34 +0800
> hev <r@hev.cc> wrote:
>
> > On Thu, Nov 20, 2025 at 3:46 PM George Guo <dongtai.guo@linux.dev>
> > wrote:
> > >
> > > From: George Guo <guodongtai@kylinos.cn>
> > >
> > > Implement 128-bit atomic compare-and-exchange using LoongArch's
> > > LL.D/SC.Q instructions.
> > >
> > > At the same time, fix BPF scheduler test failures (scx_central
> > > scx_qmap) caused by kmalloc_nolock_noprof returning NULL due to
> > > missing 128-bit atomics. The NULL returns led to -ENOMEM errors
> > > during scheduler initialization, causing test cases to fail.
> > >
> > > Verified by testing with the scx_qmap scheduler (located in
> > > tools/sched_ext/). Building with `make` and running
> > > ./tools/sched_ext/build/bin/scx_qmap.
> > >
> > > Signed-off-by: George Guo <guodongtai@kylinos.cn>
> > > ---
> > > arch/loongarch/include/asm/cmpxchg.h | 46
> > > ++++++++++++++++++++++++++++++++++++ 1 file changed, 46
> > > insertions(+)
> > >
> > > diff --git a/arch/loongarch/include/asm/cmpxchg.h
> > > b/arch/loongarch/include/asm/cmpxchg.h index
> > > 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..5f8d418595cf62ec3153dd3825d80ac1fb31e883
> > > 100644 --- a/arch/loongarch/include/asm/cmpxchg.h +++
> > > b/arch/loongarch/include/asm/cmpxchg.h @@ -111,6 +111,43 @@
> > > __arch_xchg(volatile void *ptr, unsigned long x, int size) __ret;
> > > \ })
> > >
> > > +union __u128_halves {
> > > + u128 full;
> > > + struct {
> > > + u64 low;
> > > + u64 high;
> > > + };
> > > +};
> > > +
> > > +#define __cmpxchg128_asm(ld, st, ptr, old, new)
> > > \ +({
> > > \
> > > + union __u128_halves __old, __new, __ret;
> > > \
> > > + volatile u64 *__ptr = (volatile u64 *)(ptr);
> > > \
> > > +
> > > \
> > > + __old.full = (old);
> > > \
> > > + __new.full = (new);
> > > \
> > > +
> > > \
> > > + __asm__ __volatile__(
> > > \
> > > + "1: " ld " %0, %4 # 128-bit cmpxchg low \n"
> > > \
> > > + " " ld " %1, %5 # 128-bit cmpxchg high \n"
> > > \
> > > + " bne %0, %z6, 2f \n"
> > > \
> > > + " bne %1, %z7, 2f \n"
> > > \
> > > + " move $t0, %z8 \n"
> > > \
> > > + " move $t1, %z9 \n"
> > > \
> > > + " " st " $t0, $t1, %2 \n"
> > > \
> > > + " beqz $t0, 1b \n"
> > > \
> > > + "2: \n"
> > > \
> > > + __WEAK_LLSC_MB
> > > \
> > > + : "=&r" (__ret.low), "=&r" (__ret.high),
> > > \
> > > + "=ZB" (__ptr[0]), "=ZB" (__ptr[1])
> > > \
> > > + : "ZB" (__ptr[0]), "ZB" (__ptr[1]),
> > > \
> >
> > Address operand constraints:
> > - ld.d: "m"
> > - ll.d: "ZC"
> > - sc.q: "r"
> >
> Thanks for your advice.
> Could you tell me how to find these constraints?
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html
> > > + "Jr" (__old.low), "Jr" (__old.high),
> > > \
> > > + "Jr" (__new.low), "Jr" (__new.high)
> > > \
> > > + : "t0", "t1", "memory");
> > > \
> > > +
> > > \
> > > + __ret.full;
> > > \ +})
> > > +
> > > static inline unsigned int __cmpxchg_small(volatile void *ptr,
> > > unsigned int old, unsigned int new, unsigned int size)
> > > {
> > > @@ -198,6 +235,15 @@ __cmpxchg(volatile void *ptr, unsigned long
> > > old, unsigned long new, unsigned int __res;
> > > \ })
> > >
> > > +/* cmpxchg128 */
> > > +#define system_has_cmpxchg128() 1
> > > +
> > > +#define arch_cmpxchg128(ptr, o, n)
> > > \ +({
> > > \
> > > + BUILD_BUG_ON(sizeof(*(ptr)) != 16);
> > > \
> > > + __cmpxchg128_asm("ll.d", "sc.d", ptr, o, n);
> > > \
> >
> > "sc.d" -> "sc.q"
> >
> > __cmpxchg128_asm doesn’t have multiple variants, so no need to
> > genericize it?
> >
> > > +})
> > > +
> > > #ifdef CONFIG_64BIT
> > > #define arch_cmpxchg64_local(ptr, o, n)
> > > \ ({
> > > \
> > >
> > > --
> > > 2.48.1
> > >
> > >
> >
> > --
> > Rui
>