* [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5)
@ 2025-12-15 8:22 George Guo
2025-12-15 8:22 ` [PATCH v6 1/4] LoongArch: Add SCQ support detection George Guo
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: George Guo @ 2025-12-15 8:22 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo,
Yangyang Lian
This patch series adds 128-bit atomic compare-and-exchange support for
LoongArch architecture, which fixes BPF scheduler test failures caused
by missing 128-bit atomics support.
The series consists of four patches:
1. "LoongArch: Add SCQ support detection"
- Check CPUCFG2_SCQ bit to determin if the CPU supports
SCQ instrction.
2. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- Fixes BPF scheduler test failures (scx_central scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization
3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
- For LoongArch CPUs lacking 128-bit atomic instruction(e.g.,
the SCQ instruction on 3A5000), provide a fallback implementation
of __cmpxchg128 using a spinlock to emulate the atomic operation.
4. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support
The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
Signed-off-by: George Guo <dongtai.guo@linux.dev>
---
Changes in v6:
- Put SCQ information in hwcap
- Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
Changes in v5:
- Reordered the patches
- Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
Changes in v4:
- Add SCQ support detection
- Add spinlock to emulate 128-bit cmpxchg
- Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- =ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid race
condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
* ld.d with "m"
* ll.d with "ZC",
* sc.q with "ZB"(alternative constraints caused issues:
- "r" caused system hang
- "ZC" caused compiler error:
{standard input}: Assembler messages:
{standard input}:10037: Fatal error: Immediate overflow.
format: u0:0 )
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
---
George Guo (4):
LoongArch: Add SCQ support detection
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Use spinlock to emulate 128-bit cmpxchg
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 +
arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++++++++++
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 +
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 +
arch/loongarch/kernel/proc.c | 1 +
7 files changed, 75 insertions(+)
---
base-commit: 612df905d7404450696e979c806ba4cdef8684f4
change-id: 20251120-2-d03862b2cf6d
Best regards,
--
George Guo <dongtai.guo@linux.dev>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v6 1/4] LoongArch: Add SCQ support detection
2025-12-15 8:22 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
@ 2025-12-15 8:22 ` George Guo
2025-12-15 8:22 ` [PATCH v6 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: George Guo @ 2025-12-15 8:22 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo,
Yangyang Lian
From: George Guo <guodongtai@kylinos.cn>
Check CPUCFG2_SCQ bit to determin if the CPU supports
SCQ instrction.
Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 ++
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 ++
arch/loongarch/kernel/proc.c | 1 +
5 files changed, 7 insertions(+)
diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
index bd5f0457ad21d89ab902fb1971cc8b41b1d340ad..860cb58a92ba0c0316a8009d97441043374e7f10 100644
--- a/arch/loongarch/include/asm/cpu-features.h
+++ b/arch/loongarch/include/asm/cpu-features.h
@@ -70,5 +70,6 @@
#define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
#define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
#define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
+#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)
#endif /* __ASM_CPU_FEATURES_H */
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
index f3efb00b61414a9b111ade9fe9beb410b927d937..5531039027ec763f21c7a6a88685ec81fa61d3cc 100644
--- a/arch/loongarch/include/asm/cpu.h
+++ b/arch/loongarch/include/asm/cpu.h
@@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
#define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
#define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
#define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
+#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */
#define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
#define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
@@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
#define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
#define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
#define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
+#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)
#endif /* _ASM_CPU_H */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 3de03cb864b248cd0fb5de9ec5a86b1436ccbdef..be04b3e6f5b0cd6c5d561efcfd99502bc24e5eee 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -94,6 +94,7 @@
#define CPUCFG2_LSPW BIT(21)
#define CPUCFG2_LAM BIT(22)
#define CPUCFG2_PTW BIT(24)
+#define CPUCFG2_SCQ BIT(30)
#define LOONGARCH_CPUCFG3 0x3
#define CPUCFG3_CCDMA BIT(0)
diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
index a2060a24b39fd78fa255816fa5518e0ee99b8a8e..5c5ead3eb0895c1a20abba1e19f02226a2657b1f 100644
--- a/arch/loongarch/kernel/cpu-probe.c
+++ b/arch/loongarch/kernel/cpu-probe.c
@@ -201,6 +201,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
c->options |= LOONGARCH_CPU_PTW;
elf_hwcap |= HWCAP_LOONGARCH_PTW;
}
+ if (config & CPUCFG2_SCQ)
+ c->options |= LOONGARCH_CPU_SCQ;
if (config & CPUCFG2_LSPW) {
c->options |= LOONGARCH_CPU_LSPW;
elf_hwcap |= HWCAP_LOONGARCH_LSPW;
diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
index 63d2b7e7e844b0647a3e0d988ec2adb6c77b9b14..f1ad1773425eba5becfd31bc730eed0f2d19589d 100644
--- a/arch/loongarch/kernel/proc.c
+++ b/arch/loongarch/kernel/proc.c
@@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
+ if (cpu_has_scq) seq_printf(m, " scq");
seq_printf(m, "\n");
seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
--
2.49.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v6 2/4] LoongArch: Add 128-bit atomic cmpxchg support
2025-12-15 8:22 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
2025-12-15 8:22 ` [PATCH v6 1/4] LoongArch: Add SCQ support detection George Guo
@ 2025-12-15 8:22 ` George Guo
2025-12-15 8:22 ` [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
2025-12-15 8:22 ` [PATCH v6 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
3 siblings, 0 replies; 5+ messages in thread
From: George Guo @ 2025-12-15 8:22 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo
From: George Guo <guodongtai@kylinos.cn>
Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.
At the same time, fix BPF scheduler test failures (scx_central scx_qmap)
caused by kmalloc_nolock_noprof returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing test cases to fail.
Verified by testing with the scx_qmap scheduler (located in
tools/sched_ext/). Building with `make` and running
./tools/sched_ext/build/bin/scx_qmap.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..f7a0a9a032c513196ef186a5493b500787e0e9b6 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -111,6 +111,44 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
__ret; \
})
+union __u128_halves {
+ u128 full;
+ struct {
+ u64 low;
+ u64 high;
+ };
+};
+
+#define __cmpxchg128_asm(ptr, old, new) \
+({ \
+ union __u128_halves __old, __new, __ret; \
+ volatile u64 *__ptr = (volatile u64 *)(ptr); \
+ \
+ __old.full = (old); \
+ __new.full = (new); \
+ \
+ __asm__ __volatile__( \
+ "1: ll.d %0, %3 # 128-bit cmpxchg low \n" \
+ __WEAK_LLSC_MB \
+ " ld.d %1, %4 # 128-bit cmpxchg high \n" \
+ " bne %0, %z5, 2f \n" \
+ " bne %1, %z6, 2f \n" \
+ " move $t0, %z7 \n" \
+ " move $t1, %z8 \n" \
+ " sc.q $t0, $t1, %2 \n" \
+ " beqz $t0, 1b \n" \
+ "2: \n" \
+ __WEAK_LLSC_MB \
+ : "=&r" (__ret.low), "=&r" (__ret.high) \
+ : "r" (__ptr), \
+ "ZC" (__ptr[0]), "m" (__ptr[1]), \
+ "Jr" (__old.low), "Jr" (__old.high), \
+ "Jr" (__new.low), "Jr" (__new.high) \
+ : "t0", "t1", "memory"); \
+ \
+ __ret.full; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -198,6 +236,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
__res; \
})
+/* cmpxchg128 */
+#define system_has_cmpxchg128() 1
+
+#define arch_cmpxchg128(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
+ __cmpxchg128_asm(ptr, o, n); \
+})
+
#ifdef CONFIG_64BIT
#define arch_cmpxchg64_local(ptr, o, n) \
({ \
--
2.49.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg
2025-12-15 8:22 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
2025-12-15 8:22 ` [PATCH v6 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-15 8:22 ` [PATCH v6 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-12-15 8:22 ` George Guo
2025-12-15 8:22 ` [PATCH v6 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
3 siblings, 0 replies; 5+ messages in thread
From: George Guo @ 2025-12-15 8:22 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo
From: George Guo <guodongtai@kylinos.cn>
For LoongArch CPUs lacking 128-bit atomic instruction(e.g.,
the SCQ instruction on 3A5000), provide a fallback implementation
of __cmpxchg128 using a spinlock to emulate the atomic operation.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index f7a0a9a032c513196ef186a5493b500787e0e9b6..814097bfc334184018747e47fb90fd2d2fb27ee2 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -8,6 +8,7 @@
#include <linux/bits.h>
#include <linux/build_bug.h>
#include <asm/barrier.h>
+#include <asm/cpu-features.h>
#define __xchg_asm(amswap_db, m, val) \
({ \
@@ -149,6 +150,23 @@ union __u128_halves {
__ret.full; \
})
+#define __cmpxchg128_locked(ptr, old, new) \
+({ \
+ u128 __ret; \
+ static DEFINE_SPINLOCK(lock); \
+ unsigned long flags; \
+ \
+ spin_lock_irqsave(&lock, flags); \
+ \
+ __ret = *(volatile u128 *)(ptr); \
+ if (__ret == (old)) \
+ *(volatile u128 *)(ptr) = (new); \
+ \
+ spin_unlock_irqrestore(&lock, flags); \
+ \
+ __ret; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -242,7 +260,8 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
#define arch_cmpxchg128(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
- __cmpxchg128_asm(ptr, o, n); \
+ cpu_has_scq ? __cmpxchg128_asm(ptr, o, n) : \
+ __cmpxchg128_locked(ptr, o, n); \
})
#ifdef CONFIG_64BIT
--
2.49.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v6 4/4] LoongArch: Enable 128-bit atomics cmpxchg support
2025-12-15 8:22 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
` (2 preceding siblings ...)
2025-12-15 8:22 ` [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
@ 2025-12-15 8:22 ` George Guo
3 siblings, 0 replies; 5+ messages in thread
From: George Guo @ 2025-12-15 8:22 UTC (permalink / raw)
To: Huacai Chen, WANG Xuerui, hengqi.chen
Cc: r, xry111, loongarch, linux-kernel, George Guo, George Guo
From: George Guo <guodongtai@kylinos.cn>
Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
to enable 128-bit atomic cmpxchg support on LoongArch.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 5b1116733d881bc2b1b43fb93f20367add4dbc54..6fb2c253969f9ddece5478920423d7326c3ec046 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
select GENERIC_TIME_VSYSCALL
select GPIOLIB
select HAS_IOPORT
+ select HAVE_ALIGNED_STRUCT_PAGE
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
@@ -140,6 +141,7 @@ config LOONGARCH
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT
+ select HAVE_CMPXCHG_DOUBLE
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
select HAVE_EXIT_THREAD
select HAVE_GENERIC_TIF_BITS
--
2.49.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-12-15 8:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-15 8:22 [PATCH v6 0/4] LoongArch: Add 128-bit atomic cmpxchg support (v5) George Guo
2025-12-15 8:22 ` [PATCH v6 1/4] LoongArch: Add SCQ support detection George Guo
2025-12-15 8:22 ` [PATCH v6 2/4] LoongArch: Add 128-bit atomic cmpxchg support George Guo
2025-12-15 8:22 ` [PATCH v6 3/4] LoongArch: Use spinlock to emulate 128-bit cmpxchg George Guo
2025-12-15 8:22 ` [PATCH v6 4/4] LoongArch: Enable 128-bit atomics cmpxchg support George Guo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.