* [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5)
@ 2025-12-15 8:11 George Guo
2025-12-15 8:11 ` [PATCH v6 1/4] LoongArch: Add SCQ support detection George Guo
` (4 more replies)
0 siblings, 5 replies; 31+ messages in thread
From: George Guo @ 2025-12-15 8:11 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo,
Yangyang Lian
This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture, which fixes BPF scheduler test failures
caused by missing 128-bit atomics support.
The series consists of four patches:
1. "LoongArch: Add SCQ support detection"
- Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
2. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- Fixes BPF scheduler test failures (scx_central, scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization
3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
- For LoongArch CPUs that lack the 128-bit atomic instruction (e.g.,
the 3A5000, which has no SC.Q), provide a fallback implementation
of __cmpxchg128 using a spinlock to emulate the atomic operation.
4. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support
The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
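As an illustration of the detection step in patch 1, the following
userspace sketch mirrors how the CPUCFG2 bit is translated into an
option flag. The bit positions (30 and 32) are taken from the patch;
the demo_* names and the standalone functions are hypothetical
stand-ins, not the kernel's actual helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the patch; all DEMO_/demo_ names are hypothetical. */
#define DEMO_CPUCFG2_SCQ	(UINT32_C(1) << 30)	/* CPUCFG2 bit 30 */
#define DEMO_CPU_FEATURE_SCQ	32			/* index in the options mask */
#define DEMO_LOONGARCH_CPU_SCQ	(UINT64_C(1) << DEMO_CPU_FEATURE_SCQ)

/* Translate a raw CPUCFG2 word into a 64-bit options mask. */
static uint64_t demo_probe_options(uint32_t cpucfg2)
{
	uint64_t options = 0;

	if (cpucfg2 & DEMO_CPUCFG2_SCQ)
		options |= DEMO_LOONGARCH_CPU_SCQ;

	return options;
}

/* Userspace analogue of the cpu_has_scq test. */
static bool demo_cpu_has_scq(uint64_t options)
{
	return (options & DEMO_LOONGARCH_CPU_SCQ) != 0;
}
```

Because the feature index is 32, the option flag does not fit in a
32-bit mask, which is why the patch defines it with BIT_ULL().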
Signed-off-by: George Guo <dongtai.guo@linux.dev>
---
Changes in v6:
- Put SCQ information in hwcap
- Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
Changes in v5:
- Reordered the patches
- Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
Changes in v4:
- Add SCQ support detection
- Add spinlock to emulate 128-bit cmpxchg
- Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- "=ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid race
condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
* ld.d with "m"
* ll.d with "ZC"
* sc.q with "ZB" (alternative constraints caused issues:
- "r" caused a system hang
- "ZC" caused a compiler error:
{standard input}: Assembler messages:
{standard input}:10037: Fatal error: Immediate overflow.
format: u0:0)
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
---
George Guo (4):
LoongArch: Add SCQ support detection
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Use spinlock to emulate 128-bit cmpxchg
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 +
arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++++++++++
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 +
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 +
arch/loongarch/kernel/proc.c | 1 +
7 files changed, 75 insertions(+)
---
base-commit: 612df905d7404450696e979c806ba4cdef8684f4
change-id: 20251120-2-d03862b2cf6d
Best regards,
--
George Guo <dongtai.guo@linux.dev>
* [PATCH v6 1/4] LoongArch: Add SCQ support detection
2025-12-15 8:11 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
@ 2025-12-15 8:11 ` George Guo
2025-12-15 8:11 ` [PATCH v6 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
` (3 subsequent siblings)
4 siblings, 0 replies; 31+ messages in thread
From: George Guo @ 2025-12-15 8:11 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo,
Yangyang Lian
From: George Guo <guodongtai@kylinos.cn>
Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 ++
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 ++
arch/loongarch/kernel/proc.c | 1 +
5 files changed, 7 insertions(+)
diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
index bd5f0457ad21d89ab902fb1971cc8b41b1d340ad..860cb58a92ba0c0316a8009d97441043374e7f10 100644
--- a/arch/loongarch/include/asm/cpu-features.h
+++ b/arch/loongarch/include/asm/cpu-features.h
@@ -70,5 +70,6 @@
#define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
#define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
#define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
+#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)
#endif /* __ASM_CPU_FEATURES_H */
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
index f3efb00b61414a9b111ade9fe9beb410b927d937..5531039027ec763f21c7a6a88685ec81fa61d3cc 100644
--- a/arch/loongarch/include/asm/cpu.h
+++ b/arch/loongarch/include/asm/cpu.h
@@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
#define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
#define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
#define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
+#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */
#define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
#define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
@@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
#define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
#define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
#define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
+#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)
#endif /* _ASM_CPU_H */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 3de03cb864b248cd0fb5de9ec5a86b1436ccbdef..be04b3e6f5b0cd6c5d561efcfd99502bc24e5eee 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -94,6 +94,7 @@
#define CPUCFG2_LSPW BIT(21)
#define CPUCFG2_LAM BIT(22)
#define CPUCFG2_PTW BIT(24)
+#define CPUCFG2_SCQ BIT(30)
#define LOONGARCH_CPUCFG3 0x3
#define CPUCFG3_CCDMA BIT(0)
diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
index a2060a24b39fd78fa255816fa5518e0ee99b8a8e..5c5ead3eb0895c1a20abba1e19f02226a2657b1f 100644
--- a/arch/loongarch/kernel/cpu-probe.c
+++ b/arch/loongarch/kernel/cpu-probe.c
@@ -201,6 +201,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
c->options |= LOONGARCH_CPU_PTW;
elf_hwcap |= HWCAP_LOONGARCH_PTW;
}
+ if (config & CPUCFG2_SCQ)
+ c->options |= LOONGARCH_CPU_SCQ;
if (config & CPUCFG2_LSPW) {
c->options |= LOONGARCH_CPU_LSPW;
elf_hwcap |= HWCAP_LOONGARCH_LSPW;
diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
index 63d2b7e7e844b0647a3e0d988ec2adb6c77b9b14..adfe8a1e3c9dad047bad197bab99fe87ca58b098 100644
--- a/arch/loongarch/kernel/proc.c
+++ b/arch/loongarch/kernel/proc.c
@@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
+ if (cpu_has_scq) seq_printf(m, " scq");
seq_printf(m, "\n");
seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
--
2.49.0
* [PATCH v6 2/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-15 8:11 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
2025-12-15 8:11 ` [PATCH v6 1/4] LoongArch: Add SCQ support detection George Guo
@ 2025-12-15 8:11 ` George Guo
2025-12-15 8:11 ` [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
` (2 subsequent siblings)
4 siblings, 0 replies; 31+ messages in thread
From: George Guo @ 2025-12-15 8:11 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo
From: George Guo <guodongtai@kylinos.cn>
Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.
This also fixes BPF scheduler test failures (scx_central, scx_qmap)
caused by kmalloc_nolock_noprof returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing the test cases to fail.
Verified with the scx_qmap scheduler (located in tools/sched_ext/)
by building with `make` and running
./tools/sched_ext/build/bin/scx_qmap.
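For reference, the semantics that the LL.D/SC.Q sequence is meant to
provide can be modeled in plain C. This is a non-atomic reference model
only, assuming a little-endian layout and the compiler's
unsigned __int128 extension; the demo_* names are hypothetical and do
not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 demo_u128;	/* GCC/Clang extension on 64-bit targets */

/* Same shape as the union in the patch: one 128-bit value, two 64-bit halves. */
union demo_u128_halves {
	demo_u128 full;
	struct {
		uint64_t low;	/* lower address on little-endian */
		uint64_t high;
	};
};

/*
 * Non-atomic model: compare both halves, store the new value only if
 * both match, and always return the value that was observed.
 */
static demo_u128 demo_cmpxchg128_model(demo_u128 *ptr, demo_u128 old,
				       demo_u128 new_val)
{
	union demo_u128_halves cur, want;

	cur.full = *ptr;
	want.full = old;

	if (cur.low == want.low && cur.high == want.high)
		*ptr = new_val;

	return cur.full;
}
```

A caller treats the operation as successful when the returned value
equals the expected old value, which matches how cmpxchg-style helpers
are normally used.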
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..f7a0a9a032c513196ef186a5493b500787e0e9b6 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -111,6 +111,44 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
__ret; \
})
+union __u128_halves {
+ u128 full;
+ struct {
+ u64 low;
+ u64 high;
+ };
+};
+
+#define __cmpxchg128_asm(ptr, old, new) \
+({ \
+ union __u128_halves __old, __new, __ret; \
+ volatile u64 *__ptr = (volatile u64 *)(ptr); \
+ \
+ __old.full = (old); \
+ __new.full = (new); \
+ \
+ __asm__ __volatile__( \
+ "1: ll.d %0, %3 # 128-bit cmpxchg low \n" \
+ __WEAK_LLSC_MB \
+ " ld.d %1, %4 # 128-bit cmpxchg high \n" \
+ " bne %0, %z5, 2f \n" \
+ " bne %1, %z6, 2f \n" \
+ " move $t0, %z7 \n" \
+ " move $t1, %z8 \n" \
+ " sc.q $t0, $t1, %2 \n" \
+ " beqz $t0, 1b \n" \
+ "2: \n" \
+ __WEAK_LLSC_MB \
+ : "=&r" (__ret.low), "=&r" (__ret.high) \
+ : "r" (__ptr), \
+ "ZC" (__ptr[0]), "m" (__ptr[1]), \
+ "Jr" (__old.low), "Jr" (__old.high), \
+ "Jr" (__new.low), "Jr" (__new.high) \
+ : "t0", "t1", "memory"); \
+ \
+ __ret.full; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -198,6 +236,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
__res; \
})
+/* cmpxchg128 */
+#define system_has_cmpxchg128() 1
+
+#define arch_cmpxchg128(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
+ __cmpxchg128_asm(ptr, o, n); \
+})
+
#ifdef CONFIG_64BIT
#define arch_cmpxchg64_local(ptr, o, n) \
({ \
--
2.49.0
* [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg
2025-12-15 8:11 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
2025-12-15 8:11 ` [PATCH v6 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-15 8:11 ` [PATCH v6 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-12-15 8:11 ` George Guo
2025-12-20 13:41 ` [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) Hengqi Chen
2025-12-20 13:55 ` [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) Hengqi Chen
4 siblings, 0 replies; 31+ messages in thread
From: George Guo @ 2025-12-15 8:11 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo
From: George Guo <guodongtai@kylinos.cn>
For LoongArch CPUs that lack the 128-bit atomic instruction (e.g.,
the 3A5000, which has no SC.Q), provide a fallback implementation
of __cmpxchg128 using a spinlock to emulate the atomic operation.
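The fallback idea can be sketched in userspace as follows. This is a
minimal illustration, assuming C11 <stdatomic.h> and the compiler's
unsigned __int128 extension; an atomic_flag stands in for the kernel
spinlock, interrupt masking is not modeled, and the demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef unsigned __int128 demo_u128;	/* GCC/Clang extension on 64-bit targets */

/* Hypothetical stand-in for the kernel spinlock; shared by all callers. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/*
 * Emulated 128-bit compare-and-exchange: take the lock, read the
 * current value, store the new one only on a match, release the lock,
 * and return the value that was read.
 */
static demo_u128 demo_cmpxchg128_locked(volatile demo_u128 *ptr,
					demo_u128 old, demo_u128 new_val)
{
	demo_u128 ret;

	while (atomic_flag_test_and_set_explicit(&demo_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */

	ret = *ptr;
	if (ret == old)
		*ptr = new_val;

	atomic_flag_clear_explicit(&demo_lock, memory_order_release);

	return ret;
}
```

The sketch uses a single file-scope lock so that every caller
serializes on the same lock; the emulation is only atomic with respect
to other accesses that take that lock.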
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index f7a0a9a032c513196ef186a5493b500787e0e9b6..814097bfc334184018747e47fb90fd2d2fb27ee2 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -8,6 +8,7 @@
#include <linux/bits.h>
#include <linux/build_bug.h>
#include <asm/barrier.h>
+#include <asm/cpu-features.h>
#define __xchg_asm(amswap_db, m, val) \
({ \
@@ -149,6 +150,23 @@ union __u128_halves {
__ret.full; \
})
+#define __cmpxchg128_locked(ptr, old, new) \
+({ \
+ u128 __ret; \
+ static DEFINE_SPINLOCK(lock); \
+ unsigned long flags; \
+ \
+ spin_lock_irqsave(&lock, flags); \
+ \
+ __ret = *(volatile u128 *)(ptr); \
+ if (__ret == (old)) \
+ *(volatile u128 *)(ptr) = (new); \
+ \
+ spin_unlock_irqrestore(&lock, flags); \
+ \
+ __ret; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -242,7 +260,8 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
#define arch_cmpxchg128(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
- __cmpxchg128_asm(ptr, o, n); \
+ cpu_has_scq ? __cmpxchg128_asm(ptr, o, n) : \
+ __cmpxchg128_locked(ptr, o, n); \
})
#ifdef CONFIG_64BIT
--
2.49.0
* Re: [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5)
2025-12-15 8:11 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
` (2 preceding siblings ...)
2025-12-15 8:11 ` [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
@ 2025-12-20 13:41 ` Hengqi Chen
2025-12-29 6:34 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-20 13:55 ` [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) Hengqi Chen
4 siblings, 1 reply; 31+ messages in thread
From: Hengqi Chen @ 2025-12-20 13:41 UTC (permalink / raw)
To: George Guo
Cc: Huacai Chen, WANG Xuerui, r, xry111, loongarch, linux-kernel,
George Guo, Yangyang Lian
On Mon, Dec 15, 2025 at 4:11 PM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of four patches:
>
This series does not apply cleanly on top of the loongarch-next branch,
so I haven't tested it.
> 1. "LoongArch: Add SCQ support detection"
> - Check the CPUCFG2_SCQ bit to determine whether the CPU supports
> the SC.Q instruction.
>
> 2. "LoongArch: Add 128-bit atomic cmpxchg support"
> - Implements 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions
> - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
> leading to -ENOMEM errors during scheduler initialization
>
> 3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
> - For LoongArch CPUs lacking 128-bit atomic instruction(e.g.,
> the SCQ instruction on 3A5000), provide a fallback implementation
> of __cmpxchg128 using a spinlock to emulate the atomic operation.
>
> 4. "LoongArch: Enable 128-bit atomics cmpxchg support"
> - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
> in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
> Signed-off-by: George Guo <dongtai.guo@linux.dev>
> ---
> Changes in v6:
> - Put SCQ information in hwcap
> - Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
>
> Changes in v5:
> - Reordered the patches
> - Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
>
> Changes in v4:
> - Add SCQ support detection
> - Add spinlock to emulate 128-bit cmpxchg
> - Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
>
> Changes in v3:
> - dbar 0 -> __WEAK_LLSC_MB
> - "=ZB" (__ptr[0]) -> "r" (__ptr)
> - Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
>
> Changes in v2:
> - Use a normal ld.d for the high word instead of ll.d to avoid race
> condition
> - Insert a dbar between ll.d and ld.d to prevent reordering
> - Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
> - Fix address operand constraints after testing different approaches:
> * ld.d with "m"
> * ll.d with "ZC",
> * sc.q with "ZB"(alternative constraints caused issues:
> - "r" caused system hang
> - "ZC" caused compiler error:
> {standard input}: Assembler messages:
> {standard input}:10037: Fatal error: Immediate overflow.
> format: u0:0 )
> - Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
>
> ---
> George Guo (4):
> LoongArch: Add SCQ support detection
> LoongArch: Add 128-bit atomic cmpxchg support
> LoongArch: Use spinlock to emulate 128-bit cmpxchg
> LoongArch: Enable 128-bit atomics cmpxchg support
>
> arch/loongarch/Kconfig | 2 +
> arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++++++++++
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 +
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 +
> arch/loongarch/kernel/proc.c | 1 +
> 7 files changed, 75 insertions(+)
> ---
> base-commit: 612df905d7404450696e979c806ba4cdef8684f4
> change-id: 20251120-2-d03862b2cf6d
>
> Best regards,
> --
> George Guo <dongtai.guo@linux.dev>
>
* Re: [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5)
2025-12-15 8:11 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
` (3 preceding siblings ...)
2025-12-20 13:41 ` [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) Hengqi Chen
@ 2025-12-20 13:55 ` Hengqi Chen
4 siblings, 0 replies; 31+ messages in thread
From: Hengqi Chen @ 2025-12-20 13:55 UTC (permalink / raw)
To: George Guo
Cc: Huacai Chen, WANG Xuerui, r, xry111, loongarch, linux-kernel,
George Guo, Yangyang Lian
On Mon, Dec 15, 2025 at 4:11 PM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of four patches:
>
> 1. "LoongArch: Add SCQ support detection"
> - Check the CPUCFG2_SCQ bit to determine whether the CPU supports
> the SC.Q instruction.
>
> 2. "LoongArch: Add 128-bit atomic cmpxchg support"
> - Implements 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions
> - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
> leading to -ENOMEM errors during scheduler initialization
>
> 3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
> - For LoongArch CPUs lacking 128-bit atomic instruction(e.g.,
> the SCQ instruction on 3A5000), provide a fallback implementation
> of __cmpxchg128 using a spinlock to emulate the atomic operation.
>
You could probably combine patches 2 and 3 into a single patch.
> 4. "LoongArch: Enable 128-bit atomics cmpxchg support"
> - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
> in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
> Signed-off-by: George Guo <dongtai.guo@linux.dev>
> ---
> Changes in v6:
> - Put SCQ information in hwcap
> - Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
>
> Changes in v5:
> - Reordered the patches
> - Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
>
> Changes in v4:
> - Add SCQ support detection
> - Add spinlock to emulate 128-bit cmpxchg
> - Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
>
> Changes in v3:
> - dbar 0 -> __WEAK_LLSC_MB
> - "=ZB" (__ptr[0]) -> "r" (__ptr)
> - Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
>
> Changes in v2:
> - Use a normal ld.d for the high word instead of ll.d to avoid race
> condition
> - Insert a dbar between ll.d and ld.d to prevent reordering
> - Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
> - Fix address operand constraints after testing different approaches:
> * ld.d with "m"
> * ll.d with "ZC",
> * sc.q with "ZB"(alternative constraints caused issues:
> - "r" caused system hang
> - "ZC" caused compiler error:
> {standard input}: Assembler messages:
> {standard input}:10037: Fatal error: Immediate overflow.
> format: u0:0 )
> - Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
>
> ---
> George Guo (4):
> LoongArch: Add SCQ support detection
> LoongArch: Add 128-bit atomic cmpxchg support
> LoongArch: Use spinlock to emulate 128-bit cmpxchg
> LoongArch: Enable 128-bit atomics cmpxchg support
>
> arch/loongarch/Kconfig | 2 +
> arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++++++++++
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 +
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 +
> arch/loongarch/kernel/proc.c | 1 +
> 7 files changed, 75 insertions(+)
> ---
> base-commit: 612df905d7404450696e979c806ba4cdef8684f4
> change-id: 20251120-2-d03862b2cf6d
>
> Best regards,
> --
> George Guo <dongtai.guo@linux.dev>
>
* [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-20 13:41 ` [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) Hengqi Chen
@ 2025-12-29 6:34 ` George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
` (4 more replies)
0 siblings, 5 replies; 31+ messages in thread
From: George Guo @ 2025-12-29 6:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture, which fixes BPF scheduler test failures
caused by missing 128-bit atomics support.
The series consists of four patches:
1. "LoongArch: Add SCQ support detection"
- Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
2. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- Fixes BPF scheduler test failures (scx_central, scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization
3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
- For LoongArch CPUs that lack the 128-bit atomic instruction (e.g.,
the 3A5000, which has no SC.Q), provide a fallback implementation
of __cmpxchg128 using a spinlock to emulate the atomic operation.
4. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support
The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
---
Changes in v7:
- Create patches based on the loongarch-next branch (previously based on master)
- Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
Changes in v6:
- Put SCQ information in hwcap
- Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
Changes in v5:
- Reordered the patches
- Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
Changes in v4:
- Add SCQ support detection
- Add spinlock to emulate 128-bit cmpxchg
- Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- "=ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid race
condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
* ld.d with "m"
* ll.d with "ZC"
* sc.q with "ZB" (alternative constraints caused issues:
- "r" caused a system hang
- "ZC" caused a compiler error:
{standard input}: Assembler messages:
{standard input}:10037: Fatal error: Immediate overflow.
format: u0:0)
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
George Guo (4):
LoongArch: Add SCQ support detection
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Use spinlock to emulate 128-bit cmpxchg
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 +
arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 +
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 +
arch/loongarch/kernel/proc.c | 1 +
7 files changed, 75 insertions(+)
--
2.49.0
* [PATCH loongarch-next 1/4] LoongArch: Add SCQ support detection
2025-12-29 6:34 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-12-29 6:34 ` George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
` (3 subsequent siblings)
4 siblings, 0 replies; 31+ messages in thread
From: George Guo @ 2025-12-29 6:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 ++
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 ++
arch/loongarch/kernel/proc.c | 1 +
5 files changed, 7 insertions(+)
diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
index 3745d991a99a..39c7fe64c3ef 100644
--- a/arch/loongarch/include/asm/cpu-features.h
+++ b/arch/loongarch/include/asm/cpu-features.h
@@ -67,5 +67,6 @@
#define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
#define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
#define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
+#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)
#endif /* __ASM_CPU_FEATURES_H */
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
index f3efb00b6141..5531039027ec 100644
--- a/arch/loongarch/include/asm/cpu.h
+++ b/arch/loongarch/include/asm/cpu.h
@@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
#define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
#define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
#define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
+#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */
#define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
#define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
@@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
#define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
#define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
#define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
+#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)
#endif /* _ASM_CPU_H */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index e6b8ff61c8cc..817cd90941d9 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -94,6 +94,7 @@
#define CPUCFG2_LSPW BIT(21)
#define CPUCFG2_LAM BIT(22)
#define CPUCFG2_PTW BIT(24)
+#define CPUCFG2_SCQ BIT(30)
#define LOONGARCH_CPUCFG3 0x3
#define CPUCFG3_CCDMA BIT(0)
diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
index 08a227034042..382c472c6bfe 100644
--- a/arch/loongarch/kernel/cpu-probe.c
+++ b/arch/loongarch/kernel/cpu-probe.c
@@ -205,6 +205,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
c->options |= LOONGARCH_CPU_PTW;
elf_hwcap |= HWCAP_LOONGARCH_PTW;
}
+ if (config & CPUCFG2_SCQ)
+ c->options |= LOONGARCH_CPU_SCQ;
if (config & CPUCFG2_LSPW) {
c->options |= LOONGARCH_CPU_LSPW;
elf_hwcap |= HWCAP_LOONGARCH_LSPW;
diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
index a8800d20e11b..252fa1d03b85 100644
--- a/arch/loongarch/kernel/proc.c
+++ b/arch/loongarch/kernel/proc.c
@@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
+ if (cpu_has_scq) seq_printf(m, " scq");
seq_printf(m, "\n");
seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-29 6:34 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
@ 2025-12-29 6:34 ` George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
` (2 subsequent siblings)
4 siblings, 0 replies; 31+ messages in thread
From: George Guo @ 2025-12-29 6:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.
At the same time, fix BPF scheduler test failures (scx_central, scx_qmap)
caused by kmalloc_nolock_noprof returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing test cases to fail.
Verified with the scx_qmap scheduler (located in tools/sched_ext/)
by building with `make` and running
./tools/sched_ext/build/bin/scx_qmap.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 0494c2ab553e..61ce6a0889f0 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -137,6 +137,44 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
__ret; \
})
+union __u128_halves {
+ u128 full;
+ struct {
+ u64 low;
+ u64 high;
+ };
+};
+
+#define __cmpxchg128_asm(ptr, old, new) \
+({ \
+ union __u128_halves __old, __new, __ret; \
+ volatile u64 *__ptr = (volatile u64 *)(ptr); \
+ \
+ __old.full = (old); \
+ __new.full = (new); \
+ \
+ __asm__ __volatile__( \
+ "1: ll.d %0, %3 # 128-bit cmpxchg low \n" \
+ __WEAK_LLSC_MB \
+ " ld.d %1, %4 # 128-bit cmpxchg high \n" \
+ " bne %0, %z5, 2f \n" \
+ " bne %1, %z6, 2f \n" \
+ " move $t0, %z7 \n" \
+ " move $t1, %z8 \n" \
+ " sc.q $t0, $t1, %2 \n" \
+ " beqz $t0, 1b \n" \
+ "2: \n" \
+ __WEAK_LLSC_MB \
+ : "=&r" (__ret.low), "=&r" (__ret.high) \
+ : "r" (__ptr), \
+ "ZC" (__ptr[0]), "m" (__ptr[1]), \
+ "Jr" (__old.low), "Jr" (__old.high), \
+ "Jr" (__new.low), "Jr" (__new.high) \
+ : "t0", "t1", "memory"); \
+ \
+ __ret.full; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -224,6 +262,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
__res; \
})
+/* cmpxchg128 */
+#define system_has_cmpxchg128() 1
+
+#define arch_cmpxchg128(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
+ __cmpxchg128_asm(ptr, o, n); \
+})
+
#ifdef CONFIG_64BIT
#define arch_cmpxchg64_local(ptr, o, n) \
({ \
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH loongarch-next 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg
2025-12-29 6:34 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-12-29 6:34 ` George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
2025-12-29 14:21 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic " Hengqi Chen
4 siblings, 0 replies; 31+ messages in thread
From: George Guo @ 2025-12-29 6:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
For LoongArch CPUs that lack a 128-bit atomic instruction (e.g.,
the 3A5000, which has no SC.Q), provide a fallback implementation
of __cmpxchg128 using a spinlock to emulate the atomic operation.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 61ce6a0889f0..ef793bcb7b25 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -8,6 +8,7 @@
#include <linux/bits.h>
#include <linux/build_bug.h>
#include <asm/barrier.h>
+#include <asm/cpu-features.h>
#define __xchg_amo_asm(amswap_db, m, val) \
({ \
@@ -175,6 +176,23 @@ union __u128_halves {
__ret.full; \
})
+#define __cmpxchg128_locked(ptr, old, new) \
+({ \
+ u128 __ret; \
+ static DEFINE_SPINLOCK(lock); \
+ unsigned long flags; \
+ \
+ spin_lock_irqsave(&lock, flags); \
+ \
+ __ret = *(volatile u128 *)(ptr); \
+ if (__ret == (old)) \
+ *(volatile u128 *)(ptr) = (new); \
+ \
+ spin_unlock_irqrestore(&lock, flags); \
+ \
+ __ret; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -268,7 +286,8 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
#define arch_cmpxchg128(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
- __cmpxchg128_asm(ptr, o, n); \
+ cpu_has_scq ? __cmpxchg128_asm(ptr, o, n) : \
+ __cmpxchg128_locked(ptr, o, n); \
})
#ifdef CONFIG_64BIT
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support
2025-12-29 6:34 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
` (2 preceding siblings ...)
2025-12-29 6:34 ` [PATCH loongarch-next 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
@ 2025-12-29 6:34 ` George Guo
2025-12-29 14:21 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic " Hengqi Chen
4 siblings, 0 replies; 31+ messages in thread
From: George Guo @ 2025-12-29 6:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
to enable 128-bit atomic cmpxchg support on LoongArch.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 730f34214519..d4de823276d1 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
select GENERIC_TIME_VSYSCALL
select GPIOLIB
select HAS_IOPORT
+ select HAVE_ALIGNED_STRUCT_PAGE
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_BITREVERSE
select HAVE_ARCH_JUMP_LABEL
@@ -141,6 +142,7 @@ config LOONGARCH
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT
+ select HAVE_CMPXCHG_DOUBLE
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
select HAVE_EXIT_THREAD
select HAVE_GENERIC_TIF_BITS
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-29 6:34 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
` (3 preceding siblings ...)
2025-12-29 6:34 ` [PATCH loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
@ 2025-12-29 14:21 ` Hengqi Chen
2025-12-30 1:34 ` [PATCH v7 " George Guo
4 siblings, 1 reply; 31+ messages in thread
From: Hengqi Chen @ 2025-12-29 14:21 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Mon, Dec 29, 2025 at 2:34 PM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of four patches:
>
> 1. "LoongArch: Add SCQ support detection"
> - Check CPUCFG2_SCQ bit to determin if the CPU supports
> SCQ instrction.
>
> 2. "LoongArch: Add 128-bit atomic cmpxchg support"
> - Implements 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions
> - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
> leading to -ENOMEM errors during scheduler initialization
>
> 3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
> - For LoongArch CPUs lacking 128-bit atomic instruction(e.g.,
> the SCQ instruction on 3A5000), provide a fallback implementation
> of __cmpxchg128 using a spinlock to emulate the atomic operation.
>
> 4. "LoongArch: Enable 128-bit atomics cmpxchg support"
> - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
> in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
> ---
> Changes in v7:
> - Create patches based on loongarch-next branch(previously used master)
> - Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
>
Please tag the subject line with v7 and resend, otherwise this
confuses b4. Thanks.
> Changes in v6:
> - Put SCQ information in hwcap
> - Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
>
> Changes in v5:
> - Reordered the patches
> - Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
>
> Changes in v4:
> - Add SCQ support detection
> - Add spinlock to emulate 128-bit cmpxchg
> - Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
>
> Changes in v3:
> - dbar 0 -> __WEAK_LLSC_MB
> - =ZB" (__ptr[0]) -> "r" (__ptr)
> - Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
>
> Changes in v2:
> - Use a normal ld.d for the high word instead of ll.d to avoid race
> condition
> - Insert a dbar between ll.d and ld.d to prevent reordering
> - Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
> - Fix address operand constraints after testing different approaches:
> * ld.d with "m"
> * ll.d with "ZC",
> * sc.q with "ZB"(alternative constraints caused issues:
> - "r" caused system hang
> - "ZC" caused compiler error:
> {standard input}: Assembler messages:
> {standard input}:10037: Fatal error: Immediate overflow.
> format: u0:0 )
> - Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
>
>
> George Guo (4):
> LoongArch: Add SCQ support detection
> LoongArch: Add 128-bit atomic cmpxchg support
> LoongArch: Use spinlock to emulate 128-bit cmpxchg
> LoongArch: Enable 128-bit atomics cmpxchg support
>
> arch/loongarch/Kconfig | 2 +
> arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 +
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 +
> arch/loongarch/kernel/proc.c | 1 +
> 7 files changed, 75 insertions(+)
>
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v7 loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-29 14:21 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic " Hengqi Chen
@ 2025-12-30 1:34 ` George Guo
2025-12-30 1:34 ` [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
` (4 more replies)
0 siblings, 5 replies; 31+ messages in thread
From: George Guo @ 2025-12-30 1:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
This patch series adds 128-bit atomic compare-and-exchange support for
LoongArch architecture, which fixes BPF scheduler test failures caused
by missing 128-bit atomics support.
The series consists of four patches:
1. "LoongArch: Add SCQ support detection"
- Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
2. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- Fixes BPF scheduler test failures (scx_central, scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization
3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
- For LoongArch CPUs that lack a 128-bit atomic instruction (e.g.,
the 3A5000, which has no SC.Q), provide a fallback implementation
of __cmpxchg128 using a spinlock to emulate the atomic operation.
4. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support
The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
---
Changes in v7:
- Create patches based on the loongarch-next branch (previously based on master)
- Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
Changes in v6:
- Put SCQ information in hwcap
- Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
Changes in v5:
- Reordered the patches
- Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
Changes in v4:
- Add SCQ support detection
- Add spinlock to emulate 128-bit cmpxchg
- Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- "=ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid race
condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
* ld.d with "m"
* ll.d with "ZC"
* sc.q with "ZB" (alternative constraints caused issues:
- "r" caused system hang
- "ZC" caused compiler error:
{standard input}: Assembler messages:
{standard input}:10037: Fatal error: Immediate overflow.
format: u0:0 )
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
George Guo (4):
LoongArch: Add SCQ support detection
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Use spinlock to emulate 128-bit cmpxchg
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 +
arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 +
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 +
arch/loongarch/kernel/proc.c | 1 +
7 files changed, 75 insertions(+)
--
2.49.0
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection
2025-12-30 1:34 ` [PATCH v7 " George Guo
@ 2025-12-30 1:34 ` George Guo
2025-12-30 12:05 ` Hengqi Chen
2025-12-30 12:07 ` Hengqi Chen
2025-12-30 1:34 ` [PATCH v7 loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
` (3 subsequent siblings)
4 siblings, 2 replies; 31+ messages in thread
From: George Guo @ 2025-12-30 1:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
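Note for readers: the detection in this patch amounts to testing one bit
of CPUCFG word 2 and recording the result in a 64-bit options mask. The
following is a minimal user-space sketch of that logic, assuming only the
bit positions shown in this patch (BIT(30) for CPUCFG2_SCQ, feature
number 32 for CPU_FEATURE_SCQ). All sketch_* names are hypothetical and
are not part of the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position taken from the patch: CPUCFG2_SCQ is BIT(30). */
#define SKETCH_CPUCFG2_SCQ	(UINT32_C(1) << 30)

/* Feature number taken from the patch: CPU_FEATURE_SCQ is 32. */
#define SKETCH_CPU_FEATURE_SCQ	32
#define SKETCH_CPU_SCQ		(UINT64_C(1) << SKETCH_CPU_FEATURE_SCQ)

/*
 * Hypothetical helper: translate a raw CPUCFG word 2 value into an
 * options mask, mirroring the check added to cpu_probe_common().
 */
static uint64_t sketch_probe_options(uint32_t cpucfg2)
{
	uint64_t options = 0;

	if (cpucfg2 & SKETCH_CPUCFG2_SCQ)
		options |= SKETCH_CPU_SCQ;

	return options;
}

/* Hypothetical counterpart of the cpu_has_scq test. */
static bool sketch_has_scq(uint64_t options)
{
	return (options & SKETCH_CPU_SCQ) != 0;
}
```

Because the feature number is 32, the mask has to be built with a 64-bit
shift, which is what BIT_ULL() does in the patch; a 32-bit shift by 32
would be undefined.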
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 ++
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 ++
arch/loongarch/kernel/proc.c | 1 +
5 files changed, 7 insertions(+)
diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
index 3745d991a99a..39c7fe64c3ef 100644
--- a/arch/loongarch/include/asm/cpu-features.h
+++ b/arch/loongarch/include/asm/cpu-features.h
@@ -67,5 +67,6 @@
#define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
#define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
#define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
+#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)
#endif /* __ASM_CPU_FEATURES_H */
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
index f3efb00b6141..5531039027ec 100644
--- a/arch/loongarch/include/asm/cpu.h
+++ b/arch/loongarch/include/asm/cpu.h
@@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
#define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
#define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
#define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
+#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */
#define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
#define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
@@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
#define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
#define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
#define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
+#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)
#endif /* _ASM_CPU_H */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index e6b8ff61c8cc..817cd90941d9 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -94,6 +94,7 @@
#define CPUCFG2_LSPW BIT(21)
#define CPUCFG2_LAM BIT(22)
#define CPUCFG2_PTW BIT(24)
+#define CPUCFG2_SCQ BIT(30)
#define LOONGARCH_CPUCFG3 0x3
#define CPUCFG3_CCDMA BIT(0)
diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
index 08a227034042..382c472c6bfe 100644
--- a/arch/loongarch/kernel/cpu-probe.c
+++ b/arch/loongarch/kernel/cpu-probe.c
@@ -205,6 +205,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
c->options |= LOONGARCH_CPU_PTW;
elf_hwcap |= HWCAP_LOONGARCH_PTW;
}
+ if (config & CPUCFG2_SCQ)
+ c->options |= LOONGARCH_CPU_SCQ;
if (config & CPUCFG2_LSPW) {
c->options |= LOONGARCH_CPU_LSPW;
elf_hwcap |= HWCAP_LOONGARCH_LSPW;
diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
index a8800d20e11b..252fa1d03b85 100644
--- a/arch/loongarch/kernel/proc.c
+++ b/arch/loongarch/kernel/proc.c
@@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
+ if (cpu_has_scq) seq_printf(m, " scq");
seq_printf(m, "\n");
seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v7 loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-30 1:34 ` [PATCH v7 " George Guo
2025-12-30 1:34 ` [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
@ 2025-12-30 1:34 ` George Guo
2025-12-30 12:17 ` Hengqi Chen
2025-12-30 1:34 ` [PATCH v7 loongarch-next 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
` (2 subsequent siblings)
4 siblings, 1 reply; 31+ messages in thread
From: George Guo @ 2025-12-30 1:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.
At the same time, fix BPF scheduler test failures (scx_central, scx_qmap)
caused by kmalloc_nolock_noprof returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing test cases to fail.
Verified with the scx_qmap scheduler (located in tools/sched_ext/)
by building with `make` and running
./tools/sched_ext/build/bin/scx_qmap.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
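Note for readers: the value-level behaviour that the ll.d/sc.q loop is
meant to provide can be modelled in plain C. This is only a sketch of the
semantics and is NOT atomic. It assumes a little-endian 64-bit target and
a compiler that supports unsigned __int128 (GCC and Clang do). All
sketch_* names are hypothetical.

```c
#include <stdint.h>

typedef unsigned __int128 sketch_u128;

/* Same layout idea as union __u128_halves in the patch. */
union sketch_u128_halves {
	sketch_u128 full;
	struct {
		uint64_t low;	/* lower address on a little-endian target */
		uint64_t high;
	};
};

/*
 * Non-atomic model of a 128-bit compare-and-exchange:
 *  - read the current value as two 64-bit halves,
 *  - store new_val only if both halves match the expected value,
 *  - always return the value that was read.
 * The real implementation performs these steps inside an LL/SC retry
 * loop, so that the compare and the store appear as one atomic step.
 */
static sketch_u128 sketch_cmpxchg128_model(sketch_u128 *ptr,
					   sketch_u128 old,
					   sketch_u128 new_val)
{
	union sketch_u128_halves cur, expect;

	cur.full = *ptr;
	expect.full = old;

	if (cur.low == expect.low && cur.high == expect.high)
		*ptr = new_val;

	return cur.full;
}
```

As with other cmpxchg variants, a caller can tell whether the exchange
happened by comparing the returned value with the expected one.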
arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 0494c2ab553e..61ce6a0889f0 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -137,6 +137,44 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
__ret; \
})
+union __u128_halves {
+ u128 full;
+ struct {
+ u64 low;
+ u64 high;
+ };
+};
+
+#define __cmpxchg128_asm(ptr, old, new) \
+({ \
+ union __u128_halves __old, __new, __ret; \
+ volatile u64 *__ptr = (volatile u64 *)(ptr); \
+ \
+ __old.full = (old); \
+ __new.full = (new); \
+ \
+ __asm__ __volatile__( \
+ "1: ll.d %0, %3 # 128-bit cmpxchg low \n" \
+ __WEAK_LLSC_MB \
+ " ld.d %1, %4 # 128-bit cmpxchg high \n" \
+ " bne %0, %z5, 2f \n" \
+ " bne %1, %z6, 2f \n" \
+ " move $t0, %z7 \n" \
+ " move $t1, %z8 \n" \
+ " sc.q $t0, $t1, %2 \n" \
+ " beqz $t0, 1b \n" \
+ "2: \n" \
+ __WEAK_LLSC_MB \
+ : "=&r" (__ret.low), "=&r" (__ret.high) \
+ : "r" (__ptr), \
+ "ZC" (__ptr[0]), "m" (__ptr[1]), \
+ "Jr" (__old.low), "Jr" (__old.high), \
+ "Jr" (__new.low), "Jr" (__new.high) \
+ : "t0", "t1", "memory"); \
+ \
+ __ret.full; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -224,6 +262,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
__res; \
})
+/* cmpxchg128 */
+#define system_has_cmpxchg128() 1
+
+#define arch_cmpxchg128(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
+ __cmpxchg128_asm(ptr, o, n); \
+})
+
#ifdef CONFIG_64BIT
#define arch_cmpxchg64_local(ptr, o, n) \
({ \
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v7 loongarch-next 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg
2025-12-30 1:34 ` [PATCH v7 " George Guo
2025-12-30 1:34 ` [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-30 1:34 ` [PATCH v7 loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-12-30 1:34 ` George Guo
2025-12-30 1:34 ` [PATCH v7 loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
2025-12-30 12:04 ` [PATCH v7 loongarch-next 0/4] LoongArch: Add 128-bit atomic " Hengqi Chen
4 siblings, 0 replies; 31+ messages in thread
From: George Guo @ 2025-12-30 1:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
For LoongArch CPUs that lack a 128-bit atomic instruction (e.g.,
the 3A5000, which has no SC.Q), provide a fallback implementation
of __cmpxchg128 using a spinlock to emulate the atomic operation.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
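Note for readers: the lock-based fallback can be illustrated in user
space with a C11 atomic_flag used as a spinlock. This is a sketch under
stated assumptions, not the kernel code. It has no interrupt masking
(the patch uses spin_lock_irqsave), it assumes compiler support for
unsigned __int128, and all sketch_* names are hypothetical.

```c
#include <stdatomic.h>

typedef unsigned __int128 sketch_u128;

/* One file-scope lock shared by every caller of the helper below. */
static atomic_flag sketch_cmpxchg128_lock = ATOMIC_FLAG_INIT;

/*
 * Emulated 128-bit compare-and-exchange: take the lock, compare, store
 * conditionally, release the lock, and return the value that was read.
 */
static sketch_u128 sketch_cmpxchg128_locked(volatile sketch_u128 *ptr,
					    sketch_u128 old,
					    sketch_u128 new_val)
{
	sketch_u128 ret;

	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&sketch_cmpxchg128_lock,
						 memory_order_acquire))
		;

	ret = *ptr;
	if (ret == old)
		*ptr = new_val;

	atomic_flag_clear_explicit(&sketch_cmpxchg128_lock,
				   memory_order_release);

	return ret;
}
```

The sketch uses a single file-scope lock so that all callers serialize on
the same object. A static lock declared inside a macro body is instead
instantiated once per expansion site. In either form, the emulation only
excludes other users of the same helper, not plain loads and stores to
the same location.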
arch/loongarch/include/asm/cmpxchg.h | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 61ce6a0889f0..ef793bcb7b25 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -8,6 +8,7 @@
#include <linux/bits.h>
#include <linux/build_bug.h>
#include <asm/barrier.h>
+#include <asm/cpu-features.h>
#define __xchg_amo_asm(amswap_db, m, val) \
({ \
@@ -175,6 +176,23 @@ union __u128_halves {
__ret.full; \
})
+#define __cmpxchg128_locked(ptr, old, new) \
+({ \
+ u128 __ret; \
+ static DEFINE_SPINLOCK(lock); \
+ unsigned long flags; \
+ \
+ spin_lock_irqsave(&lock, flags); \
+ \
+ __ret = *(volatile u128 *)(ptr); \
+ if (__ret == (old)) \
+ *(volatile u128 *)(ptr) = (new); \
+ \
+ spin_unlock_irqrestore(&lock, flags); \
+ \
+ __ret; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -268,7 +286,8 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
#define arch_cmpxchg128(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
- __cmpxchg128_asm(ptr, o, n); \
+ cpu_has_scq ? __cmpxchg128_asm(ptr, o, n) : \
+ __cmpxchg128_locked(ptr, o, n); \
})
#ifdef CONFIG_64BIT
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v7 loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support
2025-12-30 1:34 ` [PATCH v7 " George Guo
` (2 preceding siblings ...)
2025-12-30 1:34 ` [PATCH v7 loongarch-next 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
@ 2025-12-30 1:34 ` George Guo
2025-12-30 12:19 ` Hengqi Chen
2025-12-30 12:04 ` [PATCH v7 loongarch-next 0/4] LoongArch: Add 128-bit atomic " Hengqi Chen
4 siblings, 1 reply; 31+ messages in thread
From: George Guo @ 2025-12-30 1:34 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
to enable 128-bit atomic cmpxchg support on LoongArch.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 730f34214519..d4de823276d1 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
select GENERIC_TIME_VSYSCALL
select GPIOLIB
select HAS_IOPORT
+ select HAVE_ALIGNED_STRUCT_PAGE
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_BITREVERSE
select HAVE_ARCH_JUMP_LABEL
@@ -141,6 +142,7 @@ config LOONGARCH
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT
+ select HAVE_CMPXCHG_DOUBLE
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
select HAVE_EXIT_THREAD
select HAVE_GENERIC_TIF_BITS
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v7 loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-30 1:34 ` [PATCH v7 " George Guo
` (3 preceding siblings ...)
2025-12-30 1:34 ` [PATCH v7 loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
@ 2025-12-30 12:04 ` Hengqi Chen
2025-12-31 3:45 ` [PATCH v8 loongarch-next 0/3] " George Guo
4 siblings, 1 reply; 31+ messages in thread
From: Hengqi Chen @ 2025-12-30 12:04 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Tue, Dec 30, 2025 at 9:34 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of four patches:
>
> 1. "LoongArch: Add SCQ support detection"
> - Check CPUCFG2_SCQ bit to determin if the CPU supports
> SCQ instrction.
>
> 2. "LoongArch: Add 128-bit atomic cmpxchg support"
> - Implements 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions
> - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
> leading to -ENOMEM errors during scheduler initialization
>
> 3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
> - For LoongArch CPUs lacking 128-bit atomic instruction(e.g.,
> the SCQ instruction on 3A5000), provide a fallback implementation
> of __cmpxchg128 using a spinlock to emulate the atomic operation.
>
> 4. "LoongArch: Enable 128-bit atomics cmpxchg support"
> - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
> in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
Testing looks good; this series fixes the BPF timer issues.
For the series:
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
> ---
> Changes in v7:
> - Create patches based on loongarch-next branch(previously used master)
> - Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
>
> Changes in v6:
> - Put SCQ information in hwcap
> - Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
>
> Changes in v5:
> - Reordered the patches
> - Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
>
> Changes in v4:
> - Add SCQ support detection
> - Add spinlock to emulate 128-bit cmpxchg
> - Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
>
> Changes in v3:
> - dbar 0 -> __WEAK_LLSC_MB
> - =ZB" (__ptr[0]) -> "r" (__ptr)
> - Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
>
> Changes in v2:
> - Use a normal ld.d for the high word instead of ll.d to avoid race
> condition
> - Insert a dbar between ll.d and ld.d to prevent reordering
> - Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
> - Fix address operand constraints after testing different approaches:
> * ld.d with "m"
> * ll.d with "ZC",
> * sc.q with "ZB"(alternative constraints caused issues:
> - "r" caused system hang
> - "ZC" caused compiler error:
> {standard input}: Assembler messages:
> {standard input}:10037: Fatal error: Immediate overflow.
> format: u0:0 )
> - Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
>
>
> George Guo (4):
> LoongArch: Add SCQ support detection
> LoongArch: Add 128-bit atomic cmpxchg support
> LoongArch: Use spinlock to emulate 128-bit cmpxchg
> LoongArch: Enable 128-bit atomics cmpxchg support
>
> arch/loongarch/Kconfig | 2 +
> arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 +
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 +
> arch/loongarch/kernel/proc.c | 1 +
> 7 files changed, 75 insertions(+)
>
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection
2025-12-30 1:34 ` [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
@ 2025-12-30 12:05 ` Hengqi Chen
2025-12-30 12:07 ` Hengqi Chen
1 sibling, 0 replies; 31+ messages in thread
From: Hengqi Chen @ 2025-12-30 12:05 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Tue, Dec 30, 2025 at 9:34 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Check CPUCFG2_SCQ bit to determin if the CPU supports
> SCQ instrction.
>
> Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
> Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 ++
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 ++
> arch/loongarch/kernel/proc.c | 1 +
> 5 files changed, 7 insertions(+)
>
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
> diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
> index 3745d991a99a..39c7fe64c3ef 100644
> --- a/arch/loongarch/include/asm/cpu-features.h
> +++ b/arch/loongarch/include/asm/cpu-features.h
> @@ -67,5 +67,6 @@
> #define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
> #define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
> #define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
> +#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)
>
> #endif /* __ASM_CPU_FEATURES_H */
> diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
> index f3efb00b6141..5531039027ec 100644
> --- a/arch/loongarch/include/asm/cpu.h
> +++ b/arch/loongarch/include/asm/cpu.h
> @@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
> #define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
> #define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
> #define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
> +#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */
>
> #define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
> #define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
> @@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
> #define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
> #define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
> #define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
> +#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)
>
> #endif /* _ASM_CPU_H */
> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
> index e6b8ff61c8cc..817cd90941d9 100644
> --- a/arch/loongarch/include/asm/loongarch.h
> +++ b/arch/loongarch/include/asm/loongarch.h
> @@ -94,6 +94,7 @@
> #define CPUCFG2_LSPW BIT(21)
> #define CPUCFG2_LAM BIT(22)
> #define CPUCFG2_PTW BIT(24)
> +#define CPUCFG2_SCQ BIT(30)
>
> #define LOONGARCH_CPUCFG3 0x3
> #define CPUCFG3_CCDMA BIT(0)
> diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
> index 08a227034042..382c472c6bfe 100644
> --- a/arch/loongarch/kernel/cpu-probe.c
> +++ b/arch/loongarch/kernel/cpu-probe.c
> @@ -205,6 +205,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
> c->options |= LOONGARCH_CPU_PTW;
> elf_hwcap |= HWCAP_LOONGARCH_PTW;
> }
> + if (config & CPUCFG2_SCQ)
> + c->options |= LOONGARCH_CPU_SCQ;
> if (config & CPUCFG2_LSPW) {
> c->options |= LOONGARCH_CPU_LSPW;
> elf_hwcap |= HWCAP_LOONGARCH_LSPW;
> diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
> index a8800d20e11b..252fa1d03b85 100644
> --- a/arch/loongarch/kernel/proc.c
> +++ b/arch/loongarch/kernel/proc.c
> @@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
> if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
> if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
> + if (cpu_has_scq) seq_printf(m, " scq");
> seq_printf(m, "\n");
>
> seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection
2025-12-30 1:34 ` [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-30 12:05 ` Hengqi Chen
@ 2025-12-30 12:07 ` Hengqi Chen
1 sibling, 0 replies; 31+ messages in thread
From: Hengqi Chen @ 2025-12-30 12:07 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Tue, Dec 30, 2025 at 9:34 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Check CPUCFG2_SCQ bit to determin if the CPU supports
> SCQ instrction.
>
nit:
determin -> determine
instrction -> instruction
> Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
> Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 ++
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 ++
> arch/loongarch/kernel/proc.c | 1 +
> 5 files changed, 7 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
> index 3745d991a99a..39c7fe64c3ef 100644
> --- a/arch/loongarch/include/asm/cpu-features.h
> +++ b/arch/loongarch/include/asm/cpu-features.h
> @@ -67,5 +67,6 @@
> #define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
> #define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
> #define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
> +#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)
>
> #endif /* __ASM_CPU_FEATURES_H */
> diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
> index f3efb00b6141..5531039027ec 100644
> --- a/arch/loongarch/include/asm/cpu.h
> +++ b/arch/loongarch/include/asm/cpu.h
> @@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
> #define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
> #define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
> #define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
> +#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */
>
> #define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
> #define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
> @@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
> #define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
> #define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
> #define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
> +#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)
>
> #endif /* _ASM_CPU_H */
> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
> index e6b8ff61c8cc..817cd90941d9 100644
> --- a/arch/loongarch/include/asm/loongarch.h
> +++ b/arch/loongarch/include/asm/loongarch.h
> @@ -94,6 +94,7 @@
> #define CPUCFG2_LSPW BIT(21)
> #define CPUCFG2_LAM BIT(22)
> #define CPUCFG2_PTW BIT(24)
> +#define CPUCFG2_SCQ BIT(30)
>
> #define LOONGARCH_CPUCFG3 0x3
> #define CPUCFG3_CCDMA BIT(0)
> diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
> index 08a227034042..382c472c6bfe 100644
> --- a/arch/loongarch/kernel/cpu-probe.c
> +++ b/arch/loongarch/kernel/cpu-probe.c
> @@ -205,6 +205,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
> c->options |= LOONGARCH_CPU_PTW;
> elf_hwcap |= HWCAP_LOONGARCH_PTW;
> }
> + if (config & CPUCFG2_SCQ)
> + c->options |= LOONGARCH_CPU_SCQ;
> if (config & CPUCFG2_LSPW) {
> c->options |= LOONGARCH_CPU_LSPW;
> elf_hwcap |= HWCAP_LOONGARCH_LSPW;
> diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
> index a8800d20e11b..252fa1d03b85 100644
> --- a/arch/loongarch/kernel/proc.c
> +++ b/arch/loongarch/kernel/proc.c
> @@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
> if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
> if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
> + if (cpu_has_scq) seq_printf(m, " scq");
> seq_printf(m, "\n");
>
> seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v7 loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-30 1:34 ` [PATCH v7 loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-12-30 12:17 ` Hengqi Chen
0 siblings, 0 replies; 31+ messages in thread
From: Hengqi Chen @ 2025-12-30 12:17 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Tue, Dec 30, 2025 at 9:34 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing test cases to fail.
>
> Verified by testing with the scx_qmap scheduler (located in
> tools/sched_ext/). Building with `make` and running
> ./tools/sched_ext/build/bin/scx_qmap.
>
As I mentioned in the last cycle, patch 2 and patch 3 can be merged into one.
Please also add a link ([1]) to the upstream commit that broke these tests.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=5fb750e8a9ae
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
> arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 0494c2ab553e..61ce6a0889f0 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -137,6 +137,44 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> __ret; \
> })
>
> +union __u128_halves {
> + u128 full;
> + struct {
> + u64 low;
> + u64 high;
> + };
> +};
> +
> +#define __cmpxchg128_asm(ptr, old, new) \
> +({ \
> + union __u128_halves __old, __new, __ret; \
> + volatile u64 *__ptr = (volatile u64 *)(ptr); \
> + \
> + __old.full = (old); \
> + __new.full = (new); \
> + \
> + __asm__ __volatile__( \
> + "1: ll.d %0, %3 # 128-bit cmpxchg low \n" \
> + __WEAK_LLSC_MB \
> + " ld.d %1, %4 # 128-bit cmpxchg high \n" \
> + " bne %0, %z5, 2f \n" \
> + " bne %1, %z6, 2f \n" \
> + " move $t0, %z7 \n" \
> + " move $t1, %z8 \n" \
> + " sc.q $t0, $t1, %2 \n" \
> + " beqz $t0, 1b \n" \
> + "2: \n" \
> + __WEAK_LLSC_MB \
> + : "=&r" (__ret.low), "=&r" (__ret.high) \
> + : "r" (__ptr), \
> + "ZC" (__ptr[0]), "m" (__ptr[1]), \
> + "Jr" (__old.low), "Jr" (__old.high), \
> + "Jr" (__new.low), "Jr" (__new.high) \
> + : "t0", "t1", "memory"); \
> + \
> + __ret.full; \
> +})
> +
> static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
> unsigned int new, unsigned int size)
> {
> @@ -224,6 +262,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
> __res; \
> })
>
> +/* cmpxchg128 */
> +#define system_has_cmpxchg128() 1
> +
> +#define arch_cmpxchg128(ptr, o, n) \
> +({ \
> + BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
> + __cmpxchg128_asm(ptr, o, n); \
> +})
> +
> #ifdef CONFIG_64BIT
> #define arch_cmpxchg64_local(ptr, o, n) \
> ({ \
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v7 loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support
2025-12-30 1:34 ` [PATCH v7 loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
@ 2025-12-30 12:19 ` Hengqi Chen
0 siblings, 0 replies; 31+ messages in thread
From: Hengqi Chen @ 2025-12-30 12:19 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Tue, Dec 30, 2025 at 9:34 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
> to enable 128-bit atomic cmpxchg support on LoongArch.
>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
> arch/loongarch/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 730f34214519..d4de823276d1 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -114,6 +114,7 @@ config LOONGARCH
> select GENERIC_TIME_VSYSCALL
> select GPIOLIB
> select HAS_IOPORT
> + select HAVE_ALIGNED_STRUCT_PAGE
> select HAVE_ARCH_AUDITSYSCALL
> select HAVE_ARCH_BITREVERSE
> select HAVE_ARCH_JUMP_LABEL
> @@ -141,6 +142,7 @@ config LOONGARCH
> select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> select HAVE_DYNAMIC_FTRACE_WITH_REGS
> select HAVE_EBPF_JIT
> + select HAVE_CMPXCHG_DOUBLE
> select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
> select HAVE_EXIT_THREAD
> select HAVE_GENERIC_TIF_BITS
> --
Keep the list sorted?
> 2.49.0
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v8 loongarch-next 0/3] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-30 12:04 ` [PATCH v7 loongarch-next 0/4] LoongArch: Add 128-bit atomic " Hengqi Chen
@ 2025-12-31 3:45 ` George Guo
2025-12-31 3:45 ` [PATCH v8 loongarch-next 1/3] LoongArch: Add SCQ support detection George Guo
` (3 more replies)
0 siblings, 4 replies; 31+ messages in thread
From: George Guo @ 2025-12-31 3:45 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture, which fixes BPF scheduler test failures caused
by missing 128-bit atomics support.
The series consists of three patches:
1. "LoongArch: Add SCQ support detection"
- Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
2. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- For LoongArch CPUs lacking a 128-bit atomic instruction (e.g.,
SC.Q, which is absent on the 3A5000), use a spinlock to emulate
the atomic operation.
- Fixes BPF scheduler test failures (scx_central scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization
3. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support
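The fallback described in item 2 can be modelled in user space. The
following is only a sketch of the compare-and-exchange contract, not the
kernel code from this series: it assumes a compiler that provides
unsigned __int128 (GCC or Clang on a 64-bit target), uses a C11
atomic_flag as a stand-in for the kernel spinlock, and the names
emu_lock and cmpxchg128_emulated are hypothetical. A single lock shared
by all callers is assumed so that every emulated update serializes
against the others.

```c
#include <stdatomic.h>

typedef unsigned __int128 u128;

/* Hypothetical global lock: one instance shared by every caller. */
static atomic_flag emu_lock = ATOMIC_FLAG_INIT;

/*
 * Model of a lock-emulated 128-bit compare-and-exchange.
 * Returns the value found at *ptr; the store happened if and only if
 * the returned value equals 'old'.
 */
static u128 cmpxchg128_emulated(volatile u128 *ptr, u128 old, u128 new_val)
{
	u128 seen;

	while (atomic_flag_test_and_set_explicit(&emu_lock, memory_order_acquire))
		;	/* spin until the lock is free */

	seen = *ptr;
	if (seen == old)
		*ptr = new_val;

	atomic_flag_clear_explicit(&emu_lock, memory_order_release);
	return seen;
}
```

A caller treats the operation as successful when the return value equals
the expected old value, and otherwise retries with the value it got back.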
The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
---
Changes in v8:
- Merge patch 2 and patch 3 into one patch
- Put HAVE_CMPXCHG_DOUBLE in order
- Link to v7: https://lore.kernel.org/all/20251230013417.37393-1-dongtai.guo@linux.dev/
---
Changes in v7:
- Create patches based on loongarch-next branch (previously used master)
- Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
Changes in v6:
- Put SCQ information in hwcap
- Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
Changes in v5:
- Reordered the patches
- Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
Changes in v4:
- Add SCQ support detection
- Add spinlock to emulate 128-bit cmpxchg
- Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- "=ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid a race
condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
* ld.d with "m"
* ll.d with "ZC"
* sc.q with "ZB" (alternative constraints caused issues:
- "r" caused system hang
- "ZC" caused compiler error:
{standard input}: Assembler messages:
{standard input}:10037: Fatal error: Immediate overflow.
format: u0:0)
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
George Guo (3):
LoongArch: Add SCQ support detection
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 +
arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 +
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 +
arch/loongarch/kernel/proc.c | 1 +
7 files changed, 75 insertions(+)
--
2.49.0
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v8 loongarch-next 1/3] LoongArch: Add SCQ support detection
2025-12-31 3:45 ` [PATCH v8 loongarch-next 0/3] " George Guo
@ 2025-12-31 3:45 ` George Guo
2025-12-31 9:51 ` Hengqi Chen
2025-12-31 3:45 ` [PATCH v8 loongarch-next 2/3] LoongArch: Add 128-bit atomic cmpxchg support George Guo
` (2 subsequent siblings)
3 siblings, 1 reply; 31+ messages in thread
From: George Guo @ 2025-12-31 3:45 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Check CPUCFG2_SCQ bit to determine if the CPU supports
SCQ instruction.
Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 ++
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 ++
arch/loongarch/kernel/proc.c | 1 +
5 files changed, 7 insertions(+)
diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
index 3745d991a99a..39c7fe64c3ef 100644
--- a/arch/loongarch/include/asm/cpu-features.h
+++ b/arch/loongarch/include/asm/cpu-features.h
@@ -67,5 +67,6 @@
#define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
#define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
#define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
+#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)
#endif /* __ASM_CPU_FEATURES_H */
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
index f3efb00b6141..5531039027ec 100644
--- a/arch/loongarch/include/asm/cpu.h
+++ b/arch/loongarch/include/asm/cpu.h
@@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
#define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
#define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
#define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
+#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */
#define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
#define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
@@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
#define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
#define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
#define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
+#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)
#endif /* _ASM_CPU_H */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index e6b8ff61c8cc..817cd90941d9 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -94,6 +94,7 @@
#define CPUCFG2_LSPW BIT(21)
#define CPUCFG2_LAM BIT(22)
#define CPUCFG2_PTW BIT(24)
+#define CPUCFG2_SCQ BIT(30)
#define LOONGARCH_CPUCFG3 0x3
#define CPUCFG3_CCDMA BIT(0)
diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
index 08a227034042..382c472c6bfe 100644
--- a/arch/loongarch/kernel/cpu-probe.c
+++ b/arch/loongarch/kernel/cpu-probe.c
@@ -205,6 +205,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
c->options |= LOONGARCH_CPU_PTW;
elf_hwcap |= HWCAP_LOONGARCH_PTW;
}
+ if (config & CPUCFG2_SCQ)
+ c->options |= LOONGARCH_CPU_SCQ;
if (config & CPUCFG2_LSPW) {
c->options |= LOONGARCH_CPU_LSPW;
elf_hwcap |= HWCAP_LOONGARCH_LSPW;
diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
index a8800d20e11b..252fa1d03b85 100644
--- a/arch/loongarch/kernel/proc.c
+++ b/arch/loongarch/kernel/proc.c
@@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
+ if (cpu_has_scq) seq_printf(m, " scq");
seq_printf(m, "\n");
seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
--
2.49.0
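
For readers following the detection logic, here is a minimal stand-alone
sketch of the bit test that the cpu-probe.c hunk adds. The three macro
values are taken from the patch; the helper probe_options() is a
hypothetical stand-in for the relevant part of cpu_probe_common() and
takes the CPUCFG word 2 value as a plain argument instead of reading it
from hardware.

```c
#include <stdint.h>

/* Values as defined in the patch above. */
#define CPUCFG2_SCQ        (1u << 30)              /* CPUCFG word 2, bit 30 */
#define CPU_FEATURE_SCQ    32                      /* index in the options mask */
#define LOONGARCH_CPU_SCQ  (1ULL << CPU_FEATURE_SCQ)

/* Hypothetical helper: map a CPUCFG2 value to the options bitmask. */
static uint64_t probe_options(uint32_t cpucfg2)
{
	uint64_t options = 0;

	if (cpucfg2 & CPUCFG2_SCQ)
		options |= LOONGARCH_CPU_SCQ;

	return options;
}
```

Feature index 32 is the first one that does not fit in a 32-bit mask,
which is why the option bit has to be built with a 64-bit shift
(BIT_ULL in the patch).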
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v8 loongarch-next 2/3] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-31 3:45 ` [PATCH v8 loongarch-next 0/3] " George Guo
2025-12-31 3:45 ` [PATCH v8 loongarch-next 1/3] LoongArch: Add SCQ support detection George Guo
@ 2025-12-31 3:45 ` George Guo
2025-12-31 9:53 ` Hengqi Chen
2025-12-31 3:45 ` [PATCH v8 loongarch-next 3/3] LoongArch: Enable 128-bit atomics " George Guo
2025-12-31 9:56 ` [PATCH v8 loongarch-next 0/3] LoongArch: Add 128-bit atomic " Huacai Chen
3 siblings, 1 reply; 31+ messages in thread
From: George Guo @ 2025-12-31 3:45 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.
For LoongArch CPUs lacking 128-bit atomic instruction(e.g.,
the SCQ instruction on 3A5000), use a spinlock to emulate
the atomic operation.
At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
caused by kmalloc_nolock_noprof returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing test cases to fail.
Verified by testing with the scx_qmap scheduler (located in
tools/sched_ext/). Building with `make` and running
./tools/sched_ext/build/bin/scx_qmap.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=5fb750e8a9ae
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 66 ++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 0494c2ab553e..ef793bcb7b25 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -8,6 +8,7 @@
#include <linux/bits.h>
#include <linux/build_bug.h>
#include <asm/barrier.h>
+#include <asm/cpu-features.h>
#define __xchg_amo_asm(amswap_db, m, val) \
({ \
@@ -137,6 +138,61 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
__ret; \
})
+union __u128_halves {
+ u128 full;
+ struct {
+ u64 low;
+ u64 high;
+ };
+};
+
+#define __cmpxchg128_asm(ptr, old, new) \
+({ \
+ union __u128_halves __old, __new, __ret; \
+ volatile u64 *__ptr = (volatile u64 *)(ptr); \
+ \
+ __old.full = (old); \
+ __new.full = (new); \
+ \
+ __asm__ __volatile__( \
+ "1: ll.d %0, %3 # 128-bit cmpxchg low \n" \
+ __WEAK_LLSC_MB \
+ " ld.d %1, %4 # 128-bit cmpxchg high \n" \
+ " bne %0, %z5, 2f \n" \
+ " bne %1, %z6, 2f \n" \
+ " move $t0, %z7 \n" \
+ " move $t1, %z8 \n" \
+ " sc.q $t0, $t1, %2 \n" \
+ " beqz $t0, 1b \n" \
+ "2: \n" \
+ __WEAK_LLSC_MB \
+ : "=&r" (__ret.low), "=&r" (__ret.high) \
+ : "r" (__ptr), \
+ "ZC" (__ptr[0]), "m" (__ptr[1]), \
+ "Jr" (__old.low), "Jr" (__old.high), \
+ "Jr" (__new.low), "Jr" (__new.high) \
+ : "t0", "t1", "memory"); \
+ \
+ __ret.full; \
+})
+
+#define __cmpxchg128_locked(ptr, old, new) \
+({ \
+ u128 __ret; \
+ static DEFINE_SPINLOCK(lock); \
+ unsigned long flags; \
+ \
+ spin_lock_irqsave(&lock, flags); \
+ \
+ __ret = *(volatile u128 *)(ptr); \
+ if (__ret == (old)) \
+ *(volatile u128 *)(ptr) = (new); \
+ \
+ spin_unlock_irqrestore(&lock, flags); \
+ \
+ __ret; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -224,6 +280,16 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
__res; \
})
+/* cmpxchg128 */
+#define system_has_cmpxchg128() 1
+
+#define arch_cmpxchg128(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
+ cpu_has_scq ? __cmpxchg128_asm(ptr, o, n) : \
+ __cmpxchg128_locked(ptr, o, n); \
+})
+
#ifdef CONFIG_64BIT
#define arch_cmpxchg64_local(ptr, o, n) \
({ \
--
2.49.0
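
The union at the top of this patch exists so that the 128-bit operands
can be handed to the inline assembly as two 64-bit registers. The sketch
below shows the same splitting in plain C. It assumes a little-endian
target and a compiler with unsigned __int128; the type and helper names
(u128_halves, u128_low, u128_high, u128_join) are illustrative and do not
appear in the patch.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Same layout idea as __u128_halves in the patch. */
union u128_halves {
	u128 full;
	struct {
		uint64_t low;	/* bits 0..63 on a little-endian target */
		uint64_t high;	/* bits 64..127 */
	};
};

/* Extract the low 64 bits by reading through the union. */
static uint64_t u128_low(u128 v)
{
	union u128_halves h = { .full = v };

	return h.low;
}

/* Extract the high 64 bits by reading through the union. */
static uint64_t u128_high(u128 v)
{
	union u128_halves h = { .full = v };

	return h.high;
}

/* Rebuild the 128-bit value from its two halves. */
static u128 u128_join(uint64_t low, uint64_t high)
{
	union u128_halves h;

	h.low = low;
	h.high = high;
	return h.full;
}
```

On a big-endian target the two struct members would have to be swapped
for low and high to keep their meaning.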
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v8 loongarch-next 3/3] LoongArch: Enable 128-bit atomics cmpxchg support
2025-12-31 3:45 ` [PATCH v8 loongarch-next 0/3] " George Guo
2025-12-31 3:45 ` [PATCH v8 loongarch-next 1/3] LoongArch: Add SCQ support detection George Guo
2025-12-31 3:45 ` [PATCH v8 loongarch-next 2/3] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-12-31 3:45 ` George Guo
2025-12-31 9:52 ` Hengqi Chen
2025-12-31 9:56 ` [PATCH v8 loongarch-next 0/3] LoongArch: Add 128-bit atomic " Huacai Chen
3 siblings, 1 reply; 31+ messages in thread
From: George Guo @ 2025-12-31 3:45 UTC (permalink / raw)
To: hengqi.chen
Cc: chenhuacai, dongtai.guo, guodongtai, kernel, lianyangyang,
linux-kernel, loongarch, r, xry111
From: George Guo <guodongtai@kylinos.cn>
Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
to enable 128-bit atomic cmpxchg support on LoongArch.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 730f34214519..f9845ebec1a4 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
select GENERIC_TIME_VSYSCALL
select GPIOLIB
select HAS_IOPORT
+ select HAVE_ALIGNED_STRUCT_PAGE
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_BITREVERSE
select HAVE_ARCH_JUMP_LABEL
@@ -130,6 +131,7 @@ config LOONGARCH
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
select HAVE_ASM_MODVERSIONS
+ select HAVE_CMPXCHG_DOUBLE
select HAVE_CONTEXT_TRACKING_USER
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_KMEMLEAK
--
2.49.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v8 loongarch-next 1/3] LoongArch: Add SCQ support detection
2025-12-31 3:45 ` [PATCH v8 loongarch-next 1/3] LoongArch: Add SCQ support detection George Guo
@ 2025-12-31 9:51 ` Hengqi Chen
0 siblings, 0 replies; 31+ messages in thread
From: Hengqi Chen @ 2025-12-31 9:51 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Check CPUCFG2_SCQ bit to determine if the CPU supports
> SCQ instruction.
>
> Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
> Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
There is a conflict with the latest loongarch-next branch. Other than that:
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 ++
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 ++
> arch/loongarch/kernel/proc.c | 1 +
> 5 files changed, 7 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
> index 3745d991a99a..39c7fe64c3ef 100644
> --- a/arch/loongarch/include/asm/cpu-features.h
> +++ b/arch/loongarch/include/asm/cpu-features.h
> @@ -67,5 +67,6 @@
> #define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
> #define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
> #define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
> +#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)
>
> #endif /* __ASM_CPU_FEATURES_H */
> diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
> index f3efb00b6141..5531039027ec 100644
> --- a/arch/loongarch/include/asm/cpu.h
> +++ b/arch/loongarch/include/asm/cpu.h
> @@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
> #define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
> #define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
> #define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
> +#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */
>
> #define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
> #define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
> @@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
> #define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
> #define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
> #define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
> +#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)
>
> #endif /* _ASM_CPU_H */
> diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
> index e6b8ff61c8cc..817cd90941d9 100644
> --- a/arch/loongarch/include/asm/loongarch.h
> +++ b/arch/loongarch/include/asm/loongarch.h
> @@ -94,6 +94,7 @@
> #define CPUCFG2_LSPW BIT(21)
> #define CPUCFG2_LAM BIT(22)
> #define CPUCFG2_PTW BIT(24)
> +#define CPUCFG2_SCQ BIT(30)
>
> #define LOONGARCH_CPUCFG3 0x3
> #define CPUCFG3_CCDMA BIT(0)
> diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
> index 08a227034042..382c472c6bfe 100644
> --- a/arch/loongarch/kernel/cpu-probe.c
> +++ b/arch/loongarch/kernel/cpu-probe.c
> @@ -205,6 +205,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
> c->options |= LOONGARCH_CPU_PTW;
> elf_hwcap |= HWCAP_LOONGARCH_PTW;
> }
> + if (config & CPUCFG2_SCQ)
> + c->options |= LOONGARCH_CPU_SCQ;
> if (config & CPUCFG2_LSPW) {
> c->options |= LOONGARCH_CPU_LSPW;
> elf_hwcap |= HWCAP_LOONGARCH_LSPW;
> diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
> index a8800d20e11b..252fa1d03b85 100644
> --- a/arch/loongarch/kernel/proc.c
> +++ b/arch/loongarch/kernel/proc.c
> @@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
> if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
> if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
> + if (cpu_has_scq) seq_printf(m, " scq");
> seq_printf(m, "\n");
>
> seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v8 loongarch-next 3/3] LoongArch: Enable 128-bit atomics cmpxchg support
2025-12-31 3:45 ` [PATCH v8 loongarch-next 3/3] LoongArch: Enable 128-bit atomics " George Guo
@ 2025-12-31 9:52 ` Hengqi Chen
0 siblings, 0 replies; 31+ messages in thread
From: Hengqi Chen @ 2025-12-31 9:52 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
> to enable 128-bit atomic cmpxchg support on LoongArch.
>
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
> arch/loongarch/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 730f34214519..f9845ebec1a4 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -114,6 +114,7 @@ config LOONGARCH
> select GENERIC_TIME_VSYSCALL
> select GPIOLIB
> select HAS_IOPORT
> + select HAVE_ALIGNED_STRUCT_PAGE
> select HAVE_ARCH_AUDITSYSCALL
> select HAVE_ARCH_BITREVERSE
> select HAVE_ARCH_JUMP_LABEL
> @@ -130,6 +131,7 @@ config LOONGARCH
> select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
> select HAVE_ASM_MODVERSIONS
> + select HAVE_CMPXCHG_DOUBLE
> select HAVE_CONTEXT_TRACKING_USER
> select HAVE_C_RECORDMCOUNT
> select HAVE_DEBUG_KMEMLEAK
> --
> 2.49.0
>
* Re: [PATCH v8 loongarch-next 2/3] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-31 3:45 ` [PATCH v8 loongarch-next 2/3] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-12-31 9:53 ` Hengqi Chen
0 siblings, 0 replies; 31+ messages in thread
From: Hengqi Chen @ 2025-12-31 9:53 UTC (permalink / raw)
To: George Guo
Cc: chenhuacai, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> For LoongArch CPUs lacking a 128-bit atomic instruction (e.g.,
> the SCQ instruction on 3A5000), use a spinlock to emulate
> the atomic operation.
>
> At the same time, fix BPF scheduler test failures (scx_central, scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing test cases to fail.
>
> Verified by building the scx_qmap scheduler (located in
> tools/sched_ext/) with `make` and running
> ./tools/sched_ext/build/bin/scx_qmap.
>
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=5fb750e8a9ae
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
> arch/loongarch/include/asm/cmpxchg.h | 66 ++++++++++++++++++++++++++++
> 1 file changed, 66 insertions(+)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 0494c2ab553e..ef793bcb7b25 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -8,6 +8,7 @@
> #include <linux/bits.h>
> #include <linux/build_bug.h>
> #include <asm/barrier.h>
> +#include <asm/cpu-features.h>
>
> #define __xchg_amo_asm(amswap_db, m, val) \
> ({ \
> @@ -137,6 +138,61 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> __ret; \
> })
>
> +union __u128_halves {
> + u128 full;
> + struct {
> + u64 low;
> + u64 high;
> + };
> +};
> +
> +#define __cmpxchg128_asm(ptr, old, new) \
> +({ \
> + union __u128_halves __old, __new, __ret; \
> + volatile u64 *__ptr = (volatile u64 *)(ptr); \
> + \
> + __old.full = (old); \
> + __new.full = (new); \
> + \
> + __asm__ __volatile__( \
> + "1: ll.d %0, %3 # 128-bit cmpxchg low \n" \
> + __WEAK_LLSC_MB \
> + " ld.d %1, %4 # 128-bit cmpxchg high \n" \
> + " bne %0, %z5, 2f \n" \
> + " bne %1, %z6, 2f \n" \
> + " move $t0, %z7 \n" \
> + " move $t1, %z8 \n" \
> + " sc.q $t0, $t1, %2 \n" \
> + " beqz $t0, 1b \n" \
> + "2: \n" \
> + __WEAK_LLSC_MB \
> + : "=&r" (__ret.low), "=&r" (__ret.high) \
> + : "r" (__ptr), \
> + "ZC" (__ptr[0]), "m" (__ptr[1]), \
> + "Jr" (__old.low), "Jr" (__old.high), \
> + "Jr" (__new.low), "Jr" (__new.high) \
> + : "t0", "t1", "memory"); \
> + \
> + __ret.full; \
> +})
> +
> +#define __cmpxchg128_locked(ptr, old, new) \
> +({ \
> + u128 __ret; \
> + static DEFINE_SPINLOCK(lock); \
> + unsigned long flags; \
> + \
> + spin_lock_irqsave(&lock, flags); \
> + \
> + __ret = *(volatile u128 *)(ptr); \
> + if (__ret == (old)) \
> + *(volatile u128 *)(ptr) = (new); \
> + \
> + spin_unlock_irqrestore(&lock, flags); \
> + \
> + __ret; \
> +})
> +
> static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
> unsigned int new, unsigned int size)
> {
> @@ -224,6 +280,16 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
> __res; \
> })
>
> +/* cmpxchg128 */
> +#define system_has_cmpxchg128() 1
> +
> +#define arch_cmpxchg128(ptr, o, n) \
> +({ \
> + BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
> + cpu_has_scq ? __cmpxchg128_asm(ptr, o, n) : \
> + __cmpxchg128_locked(ptr, o, n); \
> +})
> +
> #ifdef CONFIG_64BIT
> #define arch_cmpxchg64_local(ptr, o, n) \
> ({ \
> --
> 2.49.0
>
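For readers following the fallback path in the quoted patch, the compare-and-exchange semantics it relies on can be modelled in userspace C. This is only a sketch under stated assumptions: `struct u128_halves`, `model_lock`, `model_cmpxchg128()` and `u128_equal()` are hypothetical names that do not appear in the patch, and a C11 `atomic_flag` stands in for the kernel spinlock. The sketch uses one file-scope lock so that all callers serialize against each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the patch's union __u128_halves. */
struct u128_halves {
	uint64_t low;
	uint64_t high;
};

/* Single lock shared by every caller (an assumption of this sketch). */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

/* Returns the value found at *ptr; stores new_val only if it matched old. */
static struct u128_halves model_cmpxchg128(struct u128_halves *ptr,
					   struct u128_halves old,
					   struct u128_halves new_val)
{
	struct u128_halves ret;

	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&model_lock,
						 memory_order_acquire))
		;

	ret = *ptr;
	if (ret.low == old.low && ret.high == old.high)
		*ptr = new_val;

	atomic_flag_clear_explicit(&model_lock, memory_order_release);
	return ret;
}

/* Helper for callers: the exchange succeeded iff the return equals old. */
static bool u128_equal(struct u128_halves a, struct u128_halves b)
{
	return a.low == b.low && a.high == b.high;
}
```

A caller would treat the operation as successful when `u128_equal(ret, old)` holds, which mirrors how a cmpxchg return value is normally interpreted.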
* Re: [PATCH v8 loongarch-next 0/3] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-31 3:45 ` [PATCH v8 loongarch-next 0/3] " George Guo
` (2 preceding siblings ...)
2025-12-31 3:45 ` [PATCH v8 loongarch-next 3/3] LoongArch: Enable 128-bit atomics " George Guo
@ 2025-12-31 9:56 ` Huacai Chen
3 siblings, 0 replies; 31+ messages in thread
From: Huacai Chen @ 2025-12-31 9:56 UTC (permalink / raw)
To: George Guo
Cc: hengqi.chen, guodongtai, kernel, lianyangyang, linux-kernel,
loongarch, r, xry111
Hi, George,
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of three patches:
>
> 1. "LoongArch: Add SCQ support detection"
> - Check the CPUCFG2_SCQ bit to determine if the CPU supports
> the SCQ instruction.
>
> 2. "LoongArch: Add 128-bit atomic cmpxchg support"
> - Implements 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions
> - For LoongArch CPUs lacking a 128-bit atomic instruction (e.g.,
> the SCQ instruction on 3A5000), use a spinlock to emulate
> the atomic operation.
> - Fixes BPF scheduler test failures (scx_central, scx_qmap) where
> kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
> leading to -ENOMEM errors during scheduler initialization
>
> 3. "LoongArch: Enable 128-bit atomics cmpxchg support"
> - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
> in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
> ---
> Changes in v8:
> - Merge patch 2 and patch 3 into one patch
> - Put HAVE_CMPXCHG_DOUBLE in order
> - Link to v7: https://lore.kernel.org/all/20251230013417.37393-1-dongtai.guo@linux.dev/
I don't know why you put all versions in a single thread, and the
version numbers in the cover letters are always wrong.
For the code itself:
1. You said you have set hwcaps, but you completely ignore
arch/loongarch/include/uapi/asm/hwcap.h; I don't know why.
2. You can simply do
#define system_has_cmpxchg128() (cpu_has_scq)
and then there is no need to define __cmpxchg128_locked(); this is the
same approach as x86 and RISC-V.
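A minimal sketch of how point 2 plays out for a caller, written as userspace C so it can be compiled on its own. Everything except the `system_has_cmpxchg128()` definition is an assumption made for illustration: here `cpu_has_scq` is a plain global rather than the kernel's feature test, and `enum update_path` and `choose_update_path()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's cpu_has_scq feature test (hypothetical global). */
static bool cpu_has_scq;

/* The definition suggested above: availability follows the CPU feature. */
#define system_has_cmpxchg128() (cpu_has_scq)

enum update_path { PATH_CMPXCHG128, PATH_FALLBACK };

/*
 * Hypothetical caller: take the lockless 128-bit path only when the CPU
 * provides it, otherwise use whatever slower path the caller already has.
 */
static enum update_path choose_update_path(void)
{
	if (system_has_cmpxchg128())
		return PATH_CMPXCHG128;
	return PATH_FALLBACK;
}
```

With this shape the architecture code does not need an emulated cmpxchg128; the decision to fall back is left to the caller.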
Huacai
>
> ---
> Changes in v7:
> - Create patches based on loongarch-next branch(previously used master)
> - Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
>
> Changes in v6:
> - Put SCQ information in hwcap
> - Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
>
> Changes in v5:
> - Reordered the patches
> - Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
>
> Changes in v4:
> - Add SCQ support detection
> - Add spinlock to emulate 128-bit cmpxchg
> - Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
>
> Changes in v3:
> - dbar 0 -> __WEAK_LLSC_MB
> - "=ZB" (__ptr[0]) -> "r" (__ptr)
> - Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
>
> Changes in v2:
> - Use a normal ld.d for the high word instead of ll.d to avoid race
> condition
> - Insert a dbar between ll.d and ld.d to prevent reordering
> - Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
> - Fix address operand constraints after testing different approaches:
> * ld.d with "m"
> * ll.d with "ZC",
> * sc.q with "ZB" (alternative constraints caused issues:
> - "r" caused system hang
> - "ZC" caused compiler error:
> {standard input}: Assembler messages:
> {standard input}:10037: Fatal error: Immediate overflow.
> format: u0:0 )
> - Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
>
> George Guo (3):
> LoongArch: Add SCQ support detection
> LoongArch: Add 128-bit atomic cmpxchg support
> LoongArch: Enable 128-bit atomics cmpxchg support
>
> arch/loongarch/Kconfig | 2 +
> arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 +
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 +
> arch/loongarch/kernel/proc.c | 1 +
> 7 files changed, 75 insertions(+)
>
> --
> 2.49.0
>
Thread overview: 31+ messages
2025-12-15 8:11 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
2025-12-15 8:11 ` [PATCH v6 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-15 8:11 ` [PATCH v6 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-15 8:11 ` [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
2025-12-20 13:41 ` [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) Hengqi Chen
2025-12-29 6:34 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
2025-12-29 6:34 ` [PATCH loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
2025-12-29 14:21 ` [PATCH loongarch-next 0/4] LoongArch: Add 128-bit atomic " Hengqi Chen
2025-12-30 1:34 ` [PATCH v7 " George Guo
2025-12-30 1:34 ` [PATCH v7 loongarch-next 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-30 12:05 ` Hengqi Chen
2025-12-30 12:07 ` Hengqi Chen
2025-12-30 1:34 ` [PATCH v7 loongarch-next 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-30 12:17 ` Hengqi Chen
2025-12-30 1:34 ` [PATCH v7 loongarch-next 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
2025-12-30 1:34 ` [PATCH v7 loongarch-next 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
2025-12-30 12:19 ` Hengqi Chen
2025-12-30 12:04 ` [PATCH v7 loongarch-next 0/4] LoongArch: Add 128-bit atomic " Hengqi Chen
2025-12-31 3:45 ` [PATCH v8 loongarch-next 0/3] " George Guo
2025-12-31 3:45 ` [PATCH v8 loongarch-next 1/3] LoongArch: Add SCQ support detection George Guo
2025-12-31 9:51 ` Hengqi Chen
2025-12-31 3:45 ` [PATCH v8 loongarch-next 2/3] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-31 9:53 ` Hengqi Chen
2025-12-31 3:45 ` [PATCH v8 loongarch-next 3/3] LoongArch: Enable 128-bit atomics " George Guo
2025-12-31 9:52 ` Hengqi Chen
2025-12-31 9:56 ` [PATCH v8 loongarch-next 0/3] LoongArch: Add 128-bit atomic " Huacai Chen
2025-12-20 13:55 ` [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) Hengqi Chen
-- strict thread matches above, loose matches on Subject: below --
2025-12-15 8:22 George Guo
2025-12-15 8:22 ` [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo