public inbox for linux-kernel@vger.kernel.org
* [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3)
@ 2025-11-26  2:05 George Guo
  2025-11-26  2:05 ` [PATCH v3 1/2] LoongArch: Add 128-bit atomic cmpxchg support George Guo
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: George Guo @ 2025-11-26  2:05 UTC (permalink / raw)
  To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo

This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture, which fixes BPF scheduler test failures
caused by missing 128-bit atomics support.

The series consists of two patches:

1. "LoongArch: Add 128-bit atomic cmpxchg support"
   - Implements 128-bit atomic compare-and-exchange using LoongArch's
     LL.D/SC.Q instructions
   - Fixes BPF scheduler test failures (scx_central, scx_qmap) where
     kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
     leading to -ENOMEM errors during scheduler initialization

2. "LoongArch: Enable 128-bit atomics cmpxchg support"
   - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
     in Kconfig to enable 128-bit atomic cmpxchg support

The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
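
As an illustration of the contract a 128-bit compare-and-exchange is
expected to honour, here is a minimal user-space sketch. It is a
reference model only: it is not atomic, it is not the code from this
series, and the names cmpxchg128_model() and make_u128() are
hypothetical. It assumes a 64-bit little-endian target with the
GCC/Clang unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;	/* GCC/Clang extension on 64-bit targets */

/* Two 64-bit halves overlaying one 128-bit value (little-endian layout). */
union u128_halves {
	u128 full;
	struct {
		uint64_t low;
		uint64_t high;
	};
};

/*
 * Reference model, NOT atomic: compare *ptr with 'old' and store 'new'
 * only on a match. The previous value is always returned, so the caller
 * detects success by comparing the result with 'old'.
 */
static u128 cmpxchg128_model(u128 *ptr, u128 old, u128 new)
{
	u128 cur;

	assert(((uintptr_t)ptr & 15) == 0);	/* operand must be 16-byte aligned */
	cur = *ptr;
	if (cur == old)
		*ptr = new;
	return cur;
}

/* Hypothetical helper: build a 128-bit value from its two halves. */
static u128 make_u128(uint64_t high, uint64_t low)
{
	union u128_halves v = { .low = low, .high = high };

	return v.full;
}
```

A real implementation has to perform the compare and both 64-bit stores
as one indivisible step, which is what the LL/SC sequence in patch 1 is
meant to provide.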

Signed-off-by: George Guo <dongtai.guo@linux.dev>
---
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- "=ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev

Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid race
  condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
  * ld.d with "m"
  * ll.d with "ZC"
  * sc.q with "ZB"
  Alternative constraints caused issues:
   - "r" caused a system hang
   - "ZC" caused an assembler error:
     {standard input}: Assembler messages:
     {standard input}:10037: Fatal error: Immediate overflow.
     format: u0:0
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev

---
George Guo (2):
      LoongArch: Add 128-bit atomic cmpxchg support
      LoongArch: Enable 128-bit atomics cmpxchg support

 arch/loongarch/Kconfig               |  2 ++
 arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)
---
base-commit: d5ae5ac32615e4af729f0610fdc11ff4f4798aef
change-id: 20251120-2-d03862b2cf6d

Best regards,
-- 
George Guo <dongtai.guo@linux.dev>



* [PATCH v3 1/2] LoongArch: Add 128-bit atomic cmpxchg support
  2025-11-26  2:05 [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3) George Guo
@ 2025-11-26  2:05 ` George Guo
  2025-11-26  2:05 ` [PATCH v3 2/2] LoongArch: Enable 128-bit atomics " George Guo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: George Guo @ 2025-11-26  2:05 UTC (permalink / raw)
  To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo

From: George Guo <guodongtai@kylinos.cn>

Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.

This also fixes BPF scheduler test failures (scx_central, scx_qmap)
caused by kmalloc_nolock_noprof() returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing the test cases to fail.

Verified with the scx_qmap scheduler (located in tools/sched_ext/) by
building with `make` and running ./tools/sched_ext/build/bin/scx_qmap.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8a42cb4f019f13ded2a3119d4aaf4..88553ae63e178ebff16ebbc2e01b0e23831b0918 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -111,6 +111,44 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
 	__ret;								\
 })
 
+union __u128_halves {
+	u128 full;
+	struct {
+		u64 low;
+		u64 high;
+	};
+};
+
+#define __cmpxchg128_asm(ptr, old, new)					\
+({									\
+	union __u128_halves __old, __new, __ret;			\
+	volatile u64 *__ptr = (volatile u64 *)(ptr);			\
+									\
+	__old.full = (old);						\
+	__new.full = (new);						\
+									\
+	__asm__ __volatile__(						\
+	"1:   ll.d    %0, %3		# 128-bit cmpxchg low	\n"	\
+	__WEAK_LLSC_MB							\
+	"     ld.d    %1, %4		# 128-bit cmpxchg high	\n"	\
+	"     bne     %0, %z5, 2f				\n"	\
+	"     bne     %1, %z6, 2f				\n"	\
+	"     move    $t0, %z7					\n"	\
+	"     move    $t1, %z8					\n"	\
+	"     sc.q    $t0, $t1, %2				\n"	\
+	"     beqz    $t0, 1b					\n"	\
+	"2:							\n"	\
+	__WEAK_LLSC_MB							\
+	: "=&r" (__ret.low), "=&r" (__ret.high)				\
+	: "r" (__ptr),							\
+	  "ZC" (__ptr[0]), "m" (__ptr[1]),				\
+	  "Jr" (__old.low), "Jr" (__old.high),				\
+	  "Jr" (__new.low), "Jr" (__new.high)				\
+	: "t0", "t1", "memory");					\
+									\
+	__ret.full;							\
+})
+
 static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
 					   unsigned int new, unsigned int size)
 {
@@ -198,6 +236,15 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
 	__res;								\
 })
 
+/* cmpxchg128 */
+#define system_has_cmpxchg128()		1
+
+#define arch_cmpxchg128(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 16);				\
+	__cmpxchg128_asm(ptr, o, n);					\
+})
+
 #ifdef CONFIG_64BIT
 #define arch_cmpxchg64_local(ptr, o, n)					\
   ({									\

-- 
2.48.1



* [PATCH v3 2/2] LoongArch: Enable 128-bit atomics cmpxchg support
  2025-11-26  2:05 [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3) George Guo
  2025-11-26  2:05 ` [PATCH v3 1/2] LoongArch: Add 128-bit atomic cmpxchg support George Guo
@ 2025-11-26  2:05 ` George Guo
  2025-11-26  4:44 ` [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3) Huacai Chen
  2025-11-26  5:23 ` Hengqi Chen
  3 siblings, 0 replies; 9+ messages in thread
From: George Guo @ 2025-11-26  2:05 UTC (permalink / raw)
  To: Huacai Chen, WANG Xuerui; +Cc: loongarch, linux-kernel, George Guo, George Guo

From: George Guo <guodongtai@kylinos.cn>

Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
to enable 128-bit atomic cmpxchg support on LoongArch.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 5b1116733d881bc2b1b43fb93f20367add4dbc54..6fb2c253969f9ddece5478920423d7326c3ec046 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
 	select GENERIC_TIME_VSYSCALL
 	select GPIOLIB
 	select HAS_IOPORT
+	select HAVE_ALIGNED_STRUCT_PAGE
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
@@ -140,6 +141,7 @@ config LOONGARCH
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EBPF_JIT
+	select HAVE_CMPXCHG_DOUBLE
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
 	select HAVE_EXIT_THREAD
 	select HAVE_GENERIC_TIF_BITS

-- 
2.48.1



* Re: [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3)
  2025-11-26  2:05 [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3) George Guo
  2025-11-26  2:05 ` [PATCH v3 1/2] LoongArch: Add 128-bit atomic cmpxchg support George Guo
  2025-11-26  2:05 ` [PATCH v3 2/2] LoongArch: Enable 128-bit atomics " George Guo
@ 2025-11-26  4:44 ` Huacai Chen
  2025-11-26  9:42   ` George Guo
  2025-11-26  5:23 ` Hengqi Chen
  3 siblings, 1 reply; 9+ messages in thread
From: Huacai Chen @ 2025-11-26  4:44 UTC (permalink / raw)
  To: George Guo; +Cc: WANG Xuerui, loongarch, linux-kernel, George Guo

Hi, George,

On Wed, Nov 26, 2025 at 10:06 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
Have you tested your code on Loongson-3A5000/3C5000?

Huacai

>
> The series consists of two patches:
>
> 1. "LoongArch: Add 128-bit atomic cmpxchg support"
>    - Implements 128-bit atomic compare-and-exchange using LoongArch's
>      LL.D/SC.Q instructions
>    - Fixes BPF scheduler test failures (scx_central scx_qmap) where
>      kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
>      leading to -ENOMEM errors during scheduler initialization
>
> 2. "LoongArch: Enable 128-bit atomics cmpxchg support"
>    - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
>      in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
> Signed-off-by: George Guo <dongtai.guo@linux.dev>
> ---
> Changes in v3:
> - dbar 0 -> __WEAK_LLSC_MB
> - =ZB" (__ptr[0]) -> "r" (__ptr)
> - Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
>
> Changes in v2:
> - Use a normal ld.d for the high word instead of ll.d to avoid race
>   condition
> - Insert a dbar between ll.d and ld.d to prevent reordering
> - Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
> - Fix address operand constraints after testing different approaches:
>   * ld.d with "m"
>   * ll.d with "ZC",
>   * sc.q with "ZB"(alternative constraints caused issues:
>    - "r"  caused system hang
>    - "ZC" caused compiler error:
>      {standard input}: Assembler messages:
>      {standard input}:10037: Fatal error: Immediate overflow.
>      format: u0:0 )
> - Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
>
> ---
> George Guo (2):
>       LoongArch: Add 128-bit atomic cmpxchg support
>       LoongArch: Enable 128-bit atomics cmpxchg support
>
>  arch/loongarch/Kconfig               |  2 ++
>  arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> ---
> base-commit: d5ae5ac32615e4af729f0610fdc11ff4f4798aef
> change-id: 20251120-2-d03862b2cf6d
>
> Best regards,
> --
> George Guo <dongtai.guo@linux.dev>
>


* Re: [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3)
  2025-11-26  2:05 [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3) George Guo
                   ` (2 preceding siblings ...)
  2025-11-26  4:44 ` [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3) Huacai Chen
@ 2025-11-26  5:23 ` Hengqi Chen
  2025-11-26  9:40   ` George Guo
  3 siblings, 1 reply; 9+ messages in thread
From: Hengqi Chen @ 2025-11-26  5:23 UTC (permalink / raw)
  To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Wed, Nov 26, 2025 at 10:06 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of two patches:
>
> 1. "LoongArch: Add 128-bit atomic cmpxchg support"
>    - Implements 128-bit atomic compare-and-exchange using LoongArch's
>      LL.D/SC.Q instructions
>    - Fixes BPF scheduler test failures (scx_central scx_qmap) where
>      kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
>      leading to -ENOMEM errors during scheduler initialization
>

kmalloc_nolock_noprof() was introduced in v6.18-rc1 and has no
caller for now.
Why is this related to the sched_ext failure?

> 2. "LoongArch: Enable 128-bit atomics cmpxchg support"
>    - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
>      in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
> Signed-off-by: George Guo <dongtai.guo@linux.dev>
> ---
> Changes in v3:
> - dbar 0 -> __WEAK_LLSC_MB
> - =ZB" (__ptr[0]) -> "r" (__ptr)
> - Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
>
> Changes in v2:
> - Use a normal ld.d for the high word instead of ll.d to avoid race
>   condition
> - Insert a dbar between ll.d and ld.d to prevent reordering
> - Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
> - Fix address operand constraints after testing different approaches:
>   * ld.d with "m"
>   * ll.d with "ZC",
>   * sc.q with "ZB"(alternative constraints caused issues:
>    - "r"  caused system hang
>    - "ZC" caused compiler error:
>      {standard input}: Assembler messages:
>      {standard input}:10037: Fatal error: Immediate overflow.
>      format: u0:0 )
> - Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
>
> ---
> George Guo (2):
>       LoongArch: Add 128-bit atomic cmpxchg support
>       LoongArch: Enable 128-bit atomics cmpxchg support
>
>  arch/loongarch/Kconfig               |  2 ++
>  arch/loongarch/include/asm/cmpxchg.h | 47 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> ---
> base-commit: d5ae5ac32615e4af729f0610fdc11ff4f4798aef
> change-id: 20251120-2-d03862b2cf6d
>
> Best regards,
> --
> George Guo <dongtai.guo@linux.dev>
>
>


* Re: [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3)
  2025-11-26  5:23 ` Hengqi Chen
@ 2025-11-26  9:40   ` George Guo
  2025-11-26 11:05     ` Hengqi Chen
  0 siblings, 1 reply; 9+ messages in thread
From: George Guo @ 2025-11-26  9:40 UTC (permalink / raw)
  To: Hengqi Chen; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo


On Wed, 26 Nov 2025 13:23:57 +0800
Hengqi Chen <hengqi.chen@gmail.com> wrote:

> On Wed, Nov 26, 2025 at 10:06 AM George Guo <dongtai.guo@linux.dev>
> wrote:
> >
> > This patch series adds 128-bit atomic compare-and-exchange support
> > for LoongArch architecture, which fixes BPF scheduler test failures
> > caused by missing 128-bit atomics support.
> >
> > The series consists of two patches:
> >
> > 1. "LoongArch: Add 128-bit atomic cmpxchg support"
> >    - Implements 128-bit atomic compare-and-exchange using
> > LoongArch's LL.D/SC.Q instructions
> >    - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> >      kmalloc_nolock_noprof returns NULL due to missing 128-bit
> > atomics, leading to -ENOMEM errors during scheduler initialization
> >  
> 
> This kmalloc_nolock_noprof() was introduced in v6.18-rc1 and has no
> caller for now.
> Why is this related to the sched_ext failure ?
> 
Hi Hengqi,

When running scx_central, the call chain is as follows:
central_init->bpf_timer_init->__bpf_async_init->bpf_map_kmalloc_nolock->kmalloc_nolock
->kmalloc_nolock_noprof

The function kmalloc_nolock_noprof returns NULL due to the following
condition:

if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
        /*
         * kmalloc_nolock() is not supported on architectures that
         * don't implement cmpxchg16b, but debug caches don't use
         * per-cpu slab and per-cpu partial slabs. They rely on
         * kmem_cache_node->list_lock, so kmalloc_nolock() can
         * attempt to allocate from debug caches by
         * spin_trylock_irqsave(&n->list_lock, ...)
         */
        return NULL;

The NULL return occurs because kmalloc_nolock() is not supported on
LoongArch, which does not yet implement a 128-bit cmpxchg (the
equivalent of cmpxchg16b). That is why I am sending this series.

I also tried debug caches (CONFIG_SLUB_DEBUG_ON=y). That works, but it
is not a good solution.
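
To make the condition above concrete, here is a small user-space model
of the gate. It is only a sketch: the MODEL_* flag values, struct
model_cache, model_cache_debug() and model_kmalloc_nolock() are
hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration; not the kernel's real values. */
#define MODEL_CMPXCHG_DOUBLE	(1u << 0)
#define MODEL_DEBUG_FLAGS	(1u << 1)

/* Stand-in for struct kmem_cache; only the flags matter here. */
struct model_cache {
	unsigned int flags;
};

static bool model_cache_debug(const struct model_cache *s)
{
	return (s->flags & MODEL_DEBUG_FLAGS) != 0;
}

/*
 * Mirrors the quoted check: without double-word cmpxchg, only debug
 * caches may proceed; every other cache gets NULL. 'obj' stands in for
 * whatever the real allocation path would return.
 */
static void *model_kmalloc_nolock(const struct model_cache *s, void *obj)
{
	assert(s != NULL);
	if (!(s->flags & MODEL_CMPXCHG_DOUBLE) && !model_cache_debug(s))
		return NULL;
	return obj;
}
```

In this model a cache with neither flag set always yields NULL, which
matches the -ENOMEM seen during scheduler initialization.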

> > 2. "LoongArch: Enable 128-bit atomics cmpxchg support"
> >    - Adds select HAVE_CMPXCHG_DOUBLE and select
> > HAVE_ALIGNED_STRUCT_PAGE in Kconfig to enable 128-bit atomic
> > cmpxchg support
> >
> > The issue was identified through BPF scheduler test failures where
> > scx_central and scx_qmap schedulers would fail to initialize.
> > Testing was performed using the scx_qmap scheduler from
> > tools/sched_ext/, confirming that the patches resolve the
> > initialization failures.
> >
> > Signed-off-by: George Guo <dongtai.guo@linux.dev>
> > ---
> > Changes in v3:
> > - dbar 0 -> __WEAK_LLSC_MB
> > - =ZB" (__ptr[0]) -> "r" (__ptr)
> > - Link to v2:
> > https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
> >
> > Changes in v2:
> > - Use a normal ld.d for the high word instead of ll.d to avoid race
> >   condition
> > - Insert a dbar between ll.d and ld.d to prevent reordering
> > - Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to
> > __cmpxchg128_asm(ptr, o, n)
> > - Fix address operand constraints after testing different
> > approaches:
> >   * ld.d with "m"
> >   * ll.d with "ZC",
> >   * sc.q with "ZB"(alternative constraints caused issues:
> >    - "r"  caused system hang
> >    - "ZC" caused compiler error:
> >      {standard input}: Assembler messages:
> >      {standard input}:10037: Fatal error: Immediate overflow.
> >      format: u0:0 )
> > - Link to v1:
> > https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
> >
> > ---
> > George Guo (2):
> >       LoongArch: Add 128-bit atomic cmpxchg support
> >       LoongArch: Enable 128-bit atomics cmpxchg support
> >
> >  arch/loongarch/Kconfig               |  2 ++
> >  arch/loongarch/include/asm/cmpxchg.h | 47
> > ++++++++++++++++++++++++++++++++++++ 2 files changed, 49
> > insertions(+) ---
> > base-commit: d5ae5ac32615e4af729f0610fdc11ff4f4798aef
> > change-id: 20251120-2-d03862b2cf6d
> >
> > Best regards,
> > --
> > George Guo <dongtai.guo@linux.dev>
> >
> >  



* Re: [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3)
  2025-11-26  4:44 ` [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3) Huacai Chen
@ 2025-11-26  9:42   ` George Guo
  0 siblings, 0 replies; 9+ messages in thread
From: George Guo @ 2025-11-26  9:42 UTC (permalink / raw)
  To: Huacai Chen; +Cc: WANG Xuerui, loongarch, linux-kernel, George Guo


On Wed, 26 Nov 2025 12:44:44 +0800
Huacai Chen <chenhuacai@kernel.org> wrote:

> Hi, George,
> 
> On Wed, Nov 26, 2025 at 10:06 AM George Guo <dongtai.guo@linux.dev>
> wrote:
> >
> > This patch series adds 128-bit atomic compare-and-exchange support
> > for LoongArch architecture, which fixes BPF scheduler test failures
> > caused by missing 128-bit atomics support.  
> Have you tested your code on Loongson-3A5000/3C5000?
> 
> Huacai
> 
Hi Huacai,

I have tested it on a virtual machine running Fedora 42.

> >
> > The series consists of two patches:
> >
> > 1. "LoongArch: Add 128-bit atomic cmpxchg support"
> >    - Implements 128-bit atomic compare-and-exchange using
> > LoongArch's LL.D/SC.Q instructions
> >    - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> >      kmalloc_nolock_noprof returns NULL due to missing 128-bit
> > atomics, leading to -ENOMEM errors during scheduler initialization
> >
> > 2. "LoongArch: Enable 128-bit atomics cmpxchg support"
> >    - Adds select HAVE_CMPXCHG_DOUBLE and select
> > HAVE_ALIGNED_STRUCT_PAGE in Kconfig to enable 128-bit atomic
> > cmpxchg support
> >
> > The issue was identified through BPF scheduler test failures where
> > scx_central and scx_qmap schedulers would fail to initialize.
> > Testing was performed using the scx_qmap scheduler from
> > tools/sched_ext/, confirming that the patches resolve the
> > initialization failures.
> >
> > Signed-off-by: George Guo <dongtai.guo@linux.dev>
> > ---
> > Changes in v3:
> > - dbar 0 -> __WEAK_LLSC_MB
> > - =ZB" (__ptr[0]) -> "r" (__ptr)
> > - Link to v2:
> > https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
> >
> > Changes in v2:
> > - Use a normal ld.d for the high word instead of ll.d to avoid race
> >   condition
> > - Insert a dbar between ll.d and ld.d to prevent reordering
> > - Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to
> > __cmpxchg128_asm(ptr, o, n)
> > - Fix address operand constraints after testing different
> > approaches:
> >   * ld.d with "m"
> >   * ll.d with "ZC",
> >   * sc.q with "ZB"(alternative constraints caused issues:
> >    - "r"  caused system hang
> >    - "ZC" caused compiler error:
> >      {standard input}: Assembler messages:
> >      {standard input}:10037: Fatal error: Immediate overflow.
> >      format: u0:0 )
> > - Link to v1:
> > https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
> >
> > ---
> > George Guo (2):
> >       LoongArch: Add 128-bit atomic cmpxchg support
> >       LoongArch: Enable 128-bit atomics cmpxchg support
> >
> >  arch/loongarch/Kconfig               |  2 ++
> >  arch/loongarch/include/asm/cmpxchg.h | 47
> > ++++++++++++++++++++++++++++++++++++ 2 files changed, 49
> > insertions(+) ---
> > base-commit: d5ae5ac32615e4af729f0610fdc11ff4f4798aef
> > change-id: 20251120-2-d03862b2cf6d
> >
> > Best regards,
> > --
> > George Guo <dongtai.guo@linux.dev>
> >  



* Re: [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3)
  2025-11-26  9:40   ` George Guo
@ 2025-11-26 11:05     ` Hengqi Chen
  2025-11-27  4:15       ` Hengqi Chen
  0 siblings, 1 reply; 9+ messages in thread
From: Hengqi Chen @ 2025-11-26 11:05 UTC (permalink / raw)
  To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Wed, Nov 26, 2025 at 5:40 PM George Guo <dongtai.guo@linux.dev> wrote:
>
> On Wed, 26 Nov 2025 13:23:57 +0800
> Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
> > On Wed, Nov 26, 2025 at 10:06 AM George Guo <dongtai.guo@linux.dev>
> > wrote:
> > >
> > > This patch series adds 128-bit atomic compare-and-exchange support
> > > for LoongArch architecture, which fixes BPF scheduler test failures
> > > caused by missing 128-bit atomics support.
> > >
> > > The series consists of two patches:
> > >
> > > 1. "LoongArch: Add 128-bit atomic cmpxchg support"
> > >    - Implements 128-bit atomic compare-and-exchange using
> > > LoongArch's LL.D/SC.Q instructions
> > >    - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> > >      kmalloc_nolock_noprof returns NULL due to missing 128-bit
> > > atomics, leading to -ENOMEM errors during scheduler initialization
> > >
> >
> > This kmalloc_nolock_noprof() was introduced in v6.18-rc1 and has no
> > caller for now.

OK, it does have a caller in rc2 [1].

  [1]: https://lore.kernel.org/bpf/20251015000700.28988-1-alexei.starovoitov@gmail.com/

> > Why is this related to the sched_ext failure ?
> >
> Hi Hengqi,
>
> When running scx_central, function call chain as below:
> central_init->bpf_timer_init->__bpf_async_init->bpf_map_kmalloc_nolock->kmalloc_nolock
> ->kmalloc_nolock_noprof
>

Thanks, will test this series.

> The function kmalloc_nolock_noprof returns NULL due to the following
> condition:
>
> if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
>         /*
>          * kmalloc_nolock() is not supported on architectures that
>          * don't implement cmpxchg16b, but debug caches don't use
>          * per-cpu slab and per-cpu partial slabs. They rely on
>          * kmem_cache_node->list_lock, so kmalloc_nolock() can
>          * attempt to allocate from debug caches by
>          * spin_trylock_irqsave(&n->list_lock, ...)
>          */
>         return NULL;
>
> The NULL return occurs because kmalloc_nolock is not supported on
> Loongarch, which don't implement cmpxchg16b. So I am giving the patch.
>
> Also I tried with debug caches(CONFIG_SLUB_DEBUG_ON=y), it works,
> but not a good idea.
>
> > > 2. "LoongArch: Enable 128-bit atomics cmpxchg support"
> > >    - Adds select HAVE_CMPXCHG_DOUBLE and select
> > > HAVE_ALIGNED_STRUCT_PAGE in Kconfig to enable 128-bit atomic
> > > cmpxchg support
> > >
> > > The issue was identified through BPF scheduler test failures where
> > > scx_central and scx_qmap schedulers would fail to initialize.
> > > Testing was performed using the scx_qmap scheduler from
> > > tools/sched_ext/, confirming that the patches resolve the
> > > initialization failures.
> > >
> > > Signed-off-by: George Guo <dongtai.guo@linux.dev>
> > > ---
> > > Changes in v3:
> > > - dbar 0 -> __WEAK_LLSC_MB
> > > - =ZB" (__ptr[0]) -> "r" (__ptr)
> > > - Link to v2:
> > > https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
> > >
> > > Changes in v2:
> > > - Use a normal ld.d for the high word instead of ll.d to avoid race
> > >   condition
> > > - Insert a dbar between ll.d and ld.d to prevent reordering
> > > - Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to
> > > __cmpxchg128_asm(ptr, o, n)
> > > - Fix address operand constraints after testing different
> > > approaches:
> > >   * ld.d with "m"
> > >   * ll.d with "ZC",
> > >   * sc.q with "ZB"(alternative constraints caused issues:
> > >    - "r"  caused system hang
> > >    - "ZC" caused compiler error:
> > >      {standard input}: Assembler messages:
> > >      {standard input}:10037: Fatal error: Immediate overflow.
> > >      format: u0:0 )
> > > - Link to v1:
> > > https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
> > >
> > > ---
> > > George Guo (2):
> > >       LoongArch: Add 128-bit atomic cmpxchg support
> > >       LoongArch: Enable 128-bit atomics cmpxchg support
> > >
> > >  arch/loongarch/Kconfig               |  2 ++
> > >  arch/loongarch/include/asm/cmpxchg.h | 47
> > > ++++++++++++++++++++++++++++++++++++ 2 files changed, 49
> > > insertions(+) ---
> > > base-commit: d5ae5ac32615e4af729f0610fdc11ff4f4798aef
> > > change-id: 20251120-2-d03862b2cf6d
> > >
> > > Best regards,
> > > --
> > > George Guo <dongtai.guo@linux.dev>
> > >
> > >
>


* Re: [PATCH v3 0/2] LoongArch: Add 128-bit atomic cmpxchg support (v3)
  2025-11-26 11:05     ` Hengqi Chen
@ 2025-11-27  4:15       ` Hengqi Chen
  0 siblings, 0 replies; 9+ messages in thread
From: Hengqi Chen @ 2025-11-27  4:15 UTC (permalink / raw)
  To: George Guo; +Cc: Huacai Chen, WANG Xuerui, loongarch, linux-kernel, George Guo

On Wed, Nov 26, 2025 at 7:05 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
> On Wed, Nov 26, 2025 at 5:40 PM George Guo <dongtai.guo@linux.dev> wrote:
> >
> > On Wed, 26 Nov 2025 13:23:57 +0800
> > Hengqi Chen <hengqi.chen@gmail.com> wrote:
> >
> > > On Wed, Nov 26, 2025 at 10:06 AM George Guo <dongtai.guo@linux.dev>
> > > wrote:
> > > >
> > > > This patch series adds 128-bit atomic compare-and-exchange support
> > > > for LoongArch architecture, which fixes BPF scheduler test failures
> > > > caused by missing 128-bit atomics support.
> > > >
> > > > The series consists of two patches:
> > > >
> > > > 1. "LoongArch: Add 128-bit atomic cmpxchg support"
> > > >    - Implements 128-bit atomic compare-and-exchange using
> > > > LoongArch's LL.D/SC.Q instructions
> > > >    - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> > > >      kmalloc_nolock_noprof returns NULL due to missing 128-bit
> > > > atomics, leading to -ENOMEM errors during scheduler initialization
> > > >
> > >
> > > This kmalloc_nolock_noprof() was introduced in v6.18-rc1 and has no
> > > caller for now.
>
> OK, it does have a caller in rc-2 [1].
>
>   [1]: https://lore.kernel.org/bpf/20251015000700.28988-1-alexei.starovoitov@gmail.com/
>
> > > Why is this related to the sched_ext failure ?
> > >
> > Hi Hengqi,
> >
> > When running scx_central, function call chain as below:
> > central_init->bpf_timer_init->__bpf_async_init->bpf_map_kmalloc_nolock->kmalloc_nolock
> > ->kmalloc_nolock_noprof
> >
>
> Thanks, will test this series.

I tried with QEMU, but the kernel does not even seem to boot.

>
> > The function kmalloc_nolock_noprof returns NULL due to the following
> > condition:
> >
> > if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
> >         /*
> >          * kmalloc_nolock() is not supported on architectures that
> >          * don't implement cmpxchg16b, but debug caches don't use
> >          * per-cpu slab and per-cpu partial slabs. They rely on
> >          * kmem_cache_node->list_lock, so kmalloc_nolock() can
> >          * attempt to allocate from debug caches by
> >          * spin_trylock_irqsave(&n->list_lock, ...)
> >          */
> >         return NULL;
> >
> > The NULL return occurs because kmalloc_nolock is not supported on
> > Loongarch, which don't implement cmpxchg16b. So I am giving the patch.
> >
> > Also I tried with debug caches(CONFIG_SLUB_DEBUG_ON=y), it works,
> > but not a good idea.
> >
> > > > 2. "LoongArch: Enable 128-bit atomics cmpxchg support"
> > > >    - Adds select HAVE_CMPXCHG_DOUBLE and select
> > > > HAVE_ALIGNED_STRUCT_PAGE in Kconfig to enable 128-bit atomic
> > > > cmpxchg support
> > > >
> > > > The issue was identified through BPF scheduler test failures where
> > > > scx_central and scx_qmap schedulers would fail to initialize.
> > > > Testing was performed using the scx_qmap scheduler from
> > > > tools/sched_ext/, confirming that the patches resolve the
> > > > initialization failures.
> > > >
> > > > Signed-off-by: George Guo <dongtai.guo@linux.dev>
> > > > ---
> > > > Changes in v3:
> > > > - dbar 0 -> __WEAK_LLSC_MB
> > > > - =ZB" (__ptr[0]) -> "r" (__ptr)
> > > > - Link to v2:
> > > > https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
> > > >
> > > > Changes in v2:
> > > > - Use a normal ld.d for the high word instead of ll.d to avoid race
> > > >   condition
> > > > - Insert a dbar between ll.d and ld.d to prevent reordering
> > > > - Simply __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to
> > > > __cmpxchg128_asm(ptr, o, n)
> > > > - Fix address operand constraints after testing different
> > > > approaches:
> > > >   * ld.d with "m"
> > > >   * ll.d with "ZC",
> > > >   * sc.q with "ZB"(alternative constraints caused issues:
> > > >    - "r"  caused system hang
> > > >    - "ZC" caused compiler error:
> > > >      {standard input}: Assembler messages:
> > > >      {standard input}:10037: Fatal error: Immediate overflow.
> > > >      format: u0:0 )
> > > > - Link to v1:
> > > > https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
> > > >
> > > > ---
> > > > George Guo (2):
> > > >       LoongArch: Add 128-bit atomic cmpxchg support
> > > >       LoongArch: Enable 128-bit atomics cmpxchg support
> > > >
> > > >  arch/loongarch/Kconfig               |  2 ++
> > > >  arch/loongarch/include/asm/cmpxchg.h | 47
> > > > ++++++++++++++++++++++++++++++++++++ 2 files changed, 49
> > > > insertions(+) ---
> > > > base-commit: d5ae5ac32615e4af729f0610fdc11ff4f4798aef
> > > > change-id: 20251120-2-d03862b2cf6d
> > > >
> > > > Best regards,
> > > > --
> > > > George Guo <dongtai.guo@linux.dev>
> > > >
> > > >
> >
