* [PATCH RFC 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h
@ 2025-06-18 3:43 Yunhui Cui
2025-06-18 3:43 ` [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
2025-06-21 14:33 ` [PATCH RFC 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yury Norov
0 siblings, 2 replies; 10+ messages in thread
From: Yunhui Cui @ 2025-06-18 3:43 UTC (permalink / raw)
To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Cc: Yunhui Cui
arch/riscv/include/asm/bitops.h does not functionally require
including linux/irqflags.h. Additionally, once arch/riscv/include/asm/percpu.h
is introduced (patch 2 of this series), keeping that include creates a
circular inclusion:
kernel/bounds.c
->include/linux/log2.h
->include/linux/bitops.h
->arch/riscv/include/asm/bitops.h
->include/linux/irqflags.h
->include/linux/find.h
->"return val ? __ffs(val) : size;" (__ffs() is not yet declared here)
->arch/riscv/include/asm/bitops.h
The compilation log is as follows:
CC kernel/bounds.s
In file included from ./include/linux/bitmap.h:11,
from ./include/linux/cpumask.h:12,
from ./arch/riscv/include/asm/processor.h:55,
from ./arch/riscv/include/asm/thread_info.h:42,
from ./include/linux/thread_info.h:60,
from ./include/asm-generic/preempt.h:5,
from ./arch/riscv/include/generated/asm/preempt.h:1,
from ./include/linux/preempt.h:79,
from ./arch/riscv/include/asm/percpu.h:8,
from ./include/linux/irqflags.h:19,
from ./arch/riscv/include/asm/bitops.h:14,
from ./include/linux/bitops.h:68,
from ./include/linux/log2.h:12,
from kernel/bounds.c:13:
./include/linux/find.h: In function 'find_next_bit':
./include/linux/find.h:66:30: error: implicit declaration of function '__ffs' [-Wimplicit-function-declaration]
66 | return val ? __ffs(val) : size;
| ^~~~~
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/bitops.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
index d59310f74c2ba..d9837b3cf05fe 100644
--- a/arch/riscv/include/asm/bitops.h
+++ b/arch/riscv/include/asm/bitops.h
@@ -11,7 +11,6 @@
#endif /* _LINUX_BITOPS_H */
#include <linux/compiler.h>
-#include <linux/irqflags.h>
#include <asm/barrier.h>
#include <asm/bitsperlong.h>
--
2.39.5
* [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
2025-06-18 3:43 [PATCH RFC 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
@ 2025-06-18 3:43 ` Yunhui Cui
2025-07-17 13:04 ` Alexandre Ghiti
2025-06-21 14:33 ` [PATCH RFC 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yury Norov
1 sibling, 1 reply; 10+ messages in thread
From: Yunhui Cui @ 2025-06-18 3:43 UTC (permalink / raw)
To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Cc: Yunhui Cui
The RISC-V percpu operations currently fall back to the generic
implementations, which wrap every access in raw_local_irq_save() /
raw_local_irq_restore() and thus carry substantial overhead. Implement
them with AMO instructions plus preemption disabling instead.
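For reference, the generic path being replaced looks roughly like this
(simplified from include/asm-generic/percpu.h, shown only to illustrate
the irq save/restore cost):

/* simplified sketch of the asm-generic fallback */
#define this_cpu_generic_to_op(pcp, val, op)                            \
do {                                                                     \
        unsigned long __flags;                                           \
        raw_local_irq_save(__flags);                                     \
        raw_cpu_generic_to_op(pcp, val, op);                             \
        raw_local_irq_restore(__flags);                                  \
} while (0)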
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
1 file changed, 138 insertions(+)
create mode 100644 arch/riscv/include/asm/percpu.h
diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 0000000000000..423c0d01f874c
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+#include <linux/preempt.h>
+
+#define PERCPU_RW_OPS(sz) \
+static inline unsigned long __percpu_read_##sz(void *ptr) \
+{ \
+ return READ_ONCE(*(u##sz *)ptr); \
+} \
+ \
+static inline void __percpu_write_##sz(void *ptr, unsigned long val) \
+{ \
+ WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
+}
+
+#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
+static inline void \
+__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
+{ \
+ asm volatile ( \
+ "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
+ : [ptr] "+A" (*(u##sz *)ptr) \
+ : [val] "r" ((u##sz)(val)) \
+ : "memory"); \
+}
+
+#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
+static inline u##sz \
+__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val) \
+{ \
+ register u##sz ret; \
+ \
+ asm volatile ( \
+ "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
+ : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
+ : [val] "r" ((u##sz)(val)) \
+ : "memory"); \
+ \
+ return ret + val; \
+}
+
+#define PERCPU_OP(name, amo_insn) \
+ __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
+ __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
+ __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
+ __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
+
+#define PERCPU_RET_OP(name, amo_insn) \
+ __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
+ __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
+ __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
+ __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
+
+PERCPU_RW_OPS(8)
+PERCPU_RW_OPS(16)
+PERCPU_RW_OPS(32)
+PERCPU_RW_OPS(64)
+
+PERCPU_OP(add, add)
+PERCPU_OP(andnot, and)
+PERCPU_OP(or, or)
+PERCPU_RET_OP(add, add)
+
+#undef PERCPU_RW_OPS
+#undef __PERCPU_AMO_OP_CASE
+#undef __PERCPU_AMO_RET_OP_CASE
+#undef PERCPU_OP
+#undef PERCPU_RET_OP
+
+#define _pcp_protect(op, pcp, ...) \
+({ \
+ preempt_disable_notrace(); \
+ op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
+ preempt_enable_notrace(); \
+})
+
+#define _pcp_protect_return(op, pcp, args...) \
+({ \
+ typeof(pcp) __retval; \
+ preempt_disable_notrace(); \
+ __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
+ preempt_enable_notrace(); \
+ __retval; \
+})
+
+#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
+#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
+#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
+#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
+
+#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
+#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
+#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
+#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
+
+#define this_cpu_add_1(pcp, val) _pcp_protect(__percpu_add_amo_case_8, pcp, val)
+#define this_cpu_add_2(pcp, val) _pcp_protect(__percpu_add_amo_case_16, pcp, val)
+#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
+#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
+
+#define this_cpu_add_return_1(pcp, val) \
+_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
+
+#define this_cpu_add_return_2(pcp, val) \
+_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
+
+#define this_cpu_add_return_4(pcp, val) \
+_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
+
+#define this_cpu_add_return_8(pcp, val) \
+_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
+
+#define this_cpu_and_1(pcp, val) _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
+#define this_cpu_and_2(pcp, val) _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
+#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
+#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
+
+#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8, pcp, val)
+#define this_cpu_or_2(pcp, val) _pcp_protect(__percpu_or_amo_case_16, pcp, val)
+#define this_cpu_or_4(pcp, val) _pcp_protect(__percpu_or_amo_case_32, pcp, val)
+#define this_cpu_or_8(pcp, val) _pcp_protect(__percpu_or_amo_case_64, pcp, val)
+
+#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+
+#define this_cpu_cmpxchg_1(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_2(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_4(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_8(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+
+#include <asm-generic/percpu.h>
+
+#endif /* __ASM_PERCPU_H */
--
2.39.5
* Re: [PATCH RFC 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h
2025-06-18 3:43 [PATCH RFC 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
2025-06-18 3:43 ` [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2025-06-21 14:33 ` Yury Norov
1 sibling, 0 replies; 10+ messages in thread
From: Yury Norov @ 2025-06-21 14:33 UTC (permalink / raw)
To: Yunhui Cui
Cc: linux, paul.walmsley, palmer, aou, alex, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
On Wed, Jun 18, 2025 at 11:43:27AM +0800, Yunhui Cui wrote:
> The arch/riscv/include/asm/bitops.h does not functionally require
> including /linux/irqflags.h. Additionally, adding
> arch/riscv/include/asm/percpu.h causes a circular inclusion:
> kernel/bounds.c
> ->include/linux/log2.h
> ->include/linux/bitops.h
> ->arch/riscv/include/asm/bitops.h
> ->include/linux/irqflags.h
> ->include/linux/find.h
> ->return val ? __ffs(val) : size;
> ->arch/riscv/include/asm/bitops.h
>
> The compilation log is as follows:
> CC kernel/bounds.s
> In file included from ./include/linux/bitmap.h:11,
> from ./include/linux/cpumask.h:12,
> from ./arch/riscv/include/asm/processor.h:55,
> from ./arch/riscv/include/asm/thread_info.h:42,
> from ./include/linux/thread_info.h:60,
> from ./include/asm-generic/preempt.h:5,
> from ./arch/riscv/include/generated/asm/preempt.h:1,
> from ./include/linux/preempt.h:79,
> from ./arch/riscv/include/asm/percpu.h:8,
> from ./include/linux/irqflags.h:19,
> from ./arch/riscv/include/asm/bitops.h:14,
> from ./include/linux/bitops.h:68,
> from ./include/linux/log2.h:12,
> from kernel/bounds.c:13:
> ./include/linux/find.h: In function 'find_next_bit':
> ./include/linux/find.h:66:30: error: implicit declaration of function '__ffs' [-Wimplicit-function-declaration]
> 66 | return val ? __ffs(val) : size;
> | ^~~~~
>
> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Is this patch a prerequisite for #2, or does it fix the build on master
for you?
If it fixes a broken build, I'll take it and send an -rc pull request.
Otherwise, please work with the RISC-V maintainers so the two patches move
together.
Thanks,
Yury
* Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
2025-06-18 3:43 ` [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2025-07-17 13:04 ` Alexandre Ghiti
2025-07-17 13:05 ` Alexandre Ghiti
0 siblings, 1 reply; 10+ messages in thread
From: Alexandre Ghiti @ 2025-07-17 13:04 UTC (permalink / raw)
To: Yunhui Cui, yury.norov, linux, paul.walmsley, palmer, aou,
linux-riscv, linux-kernel, dennis, tj, cl, linux-mm
Hi Yunhui,
On 6/18/25 05:43, Yunhui Cui wrote:
> Current percpu operations rely on generic implementations, where
> raw_local_irq_save() introduces substantial overhead. Optimization
> is achieved through atomic operations and preemption disabling.
>
> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> ---
> arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
> 1 file changed, 138 insertions(+)
> create mode 100644 arch/riscv/include/asm/percpu.h
>
> diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
> new file mode 100644
> index 0000000000000..423c0d01f874c
> --- /dev/null
> +++ b/arch/riscv/include/asm/percpu.h
> @@ -0,0 +1,138 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ASM_PERCPU_H
> +#define __ASM_PERCPU_H
> +
> +#include <linux/preempt.h>
> +
> +#define PERCPU_RW_OPS(sz) \
> +static inline unsigned long __percpu_read_##sz(void *ptr) \
> +{ \
> + return READ_ONCE(*(u##sz *)ptr); \
> +} \
> + \
> +static inline void __percpu_write_##sz(void *ptr, unsigned long val) \
> +{ \
> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
> +}
> +
> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> +static inline void \
> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> +{ \
> + asm volatile ( \
> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> + : [ptr] "+A" (*(u##sz *)ptr) \
> + : [val] "r" ((u##sz)(val)) \
> + : "memory"); \
> +}
> +
> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
> +static inline u##sz \
> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val) \
> +{ \
> + register u##sz ret; \
> + \
> + asm volatile ( \
> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
> + : [val] "r" ((u##sz)(val)) \
> + : "memory"); \
> + \
> + return ret + val; \
> +}
> +
> +#define PERCPU_OP(name, amo_insn) \
> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
> +
> +#define PERCPU_RET_OP(name, amo_insn) \
> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
> +
> +PERCPU_RW_OPS(8)
> +PERCPU_RW_OPS(16)
> +PERCPU_RW_OPS(32)
> +PERCPU_RW_OPS(64)
> +
> +PERCPU_OP(add, add)
> +PERCPU_OP(andnot, and)
> +PERCPU_OP(or, or)
> +PERCPU_RET_OP(add, add)
> +
> +#undef PERCPU_RW_OPS
> +#undef __PERCPU_AMO_OP_CASE
> +#undef __PERCPU_AMO_RET_OP_CASE
> +#undef PERCPU_OP
> +#undef PERCPU_RET_OP
> +
> +#define _pcp_protect(op, pcp, ...) \
> +({ \
> + preempt_disable_notrace(); \
> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> + preempt_enable_notrace(); \
> +})
> +
> +#define _pcp_protect_return(op, pcp, args...) \
> +({ \
> + typeof(pcp) __retval; \
> + preempt_disable_notrace(); \
> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> + preempt_enable_notrace(); \
> + __retval; \
> +})
> +
> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
> +
> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
> +
> +#define this_cpu_add_1(pcp, val) _pcp_protect(__percpu_add_amo_case_8, pcp, val)
> +#define this_cpu_add_2(pcp, val) _pcp_protect(__percpu_add_amo_case_16, pcp, val)
> +#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
> +#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
> +
> +#define this_cpu_add_return_1(pcp, val) \
> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
> +
> +#define this_cpu_add_return_2(pcp, val) \
> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
> +
> +#define this_cpu_add_return_4(pcp, val) \
> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
> +
> +#define this_cpu_add_return_8(pcp, val) \
> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
> +
> +#define this_cpu_and_1(pcp, val) _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
> +#define this_cpu_and_2(pcp, val) _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
> +#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
> +#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
Why do we define __percpu_andnot based on amoand, and use
__percpu_andnot with ~val here? Can't we just define __percpu_and?
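Something along these lines would do it, I think (untested sketch, just to
illustrate, reusing the macros already in this patch):

PERCPU_OP(and, and)

#define this_cpu_and_1(pcp, val) _pcp_protect(__percpu_and_amo_case_8, pcp, val)
#define this_cpu_and_2(pcp, val) _pcp_protect(__percpu_and_amo_case_16, pcp, val)
#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_and_amo_case_32, pcp, val)
#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_and_amo_case_64, pcp, val)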
> +
> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8, pcp, val)
> +#define this_cpu_or_2(pcp, val) _pcp_protect(__percpu_or_amo_case_16, pcp, val)
> +#define this_cpu_or_4(pcp, val) _pcp_protect(__percpu_or_amo_case_32, pcp, val)
> +#define this_cpu_or_8(pcp, val) _pcp_protect(__percpu_or_amo_case_64, pcp, val)
> +
> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> +
> +#define this_cpu_cmpxchg_1(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> +#define this_cpu_cmpxchg_2(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> +#define this_cpu_cmpxchg_4(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> +#define this_cpu_cmpxchg_8(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> +
> +#include <asm-generic/percpu.h>
> +
> +#endif /* __ASM_PERCPU_H */
It all looks good to me, just one thing, can you also implement
this_cpu_cmpxchg64/128()?
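For instance, mirroring arm64 they could be built on the same
_pcp_protect_return() helper (untested sketch; whether cmpxchg128_local()
is actually wired up on riscv is an assumption on my side):

#define this_cpu_cmpxchg64(pcp, o, n)   this_cpu_cmpxchg_8(pcp, o, n)
#define this_cpu_cmpxchg128(pcp, o, n)  _pcp_protect_return(cmpxchg128_local, pcp, o, n)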
And since this is almost a copy/paste from arm64, either mention it at
the top of the file or (better) merge both implementations somewhere to
avoid redefining existing code :) But up to you.
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
* Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
2025-07-17 13:04 ` Alexandre Ghiti
@ 2025-07-17 13:05 ` Alexandre Ghiti
2025-07-18 6:40 ` [External] " yunhui cui
0 siblings, 1 reply; 10+ messages in thread
From: Alexandre Ghiti @ 2025-07-17 13:05 UTC (permalink / raw)
To: Yunhui Cui, yury.norov, linux, paul.walmsley, palmer, aou,
linux-riscv, linux-kernel, dennis, tj, cl, linux-mm
On 7/17/25 15:04, Alexandre Ghiti wrote:
> Hi Yunhui,
>
> On 6/18/25 05:43, Yunhui Cui wrote:
>> Current percpu operations rely on generic implementations, where
>> raw_local_irq_save() introduces substantial overhead. Optimization
>> is achieved through atomic operations and preemption disabling.
>>
>> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
>> ---
>> arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
>> 1 file changed, 138 insertions(+)
>> create mode 100644 arch/riscv/include/asm/percpu.h
>>
>> diff --git a/arch/riscv/include/asm/percpu.h
>> b/arch/riscv/include/asm/percpu.h
>> new file mode 100644
>> index 0000000000000..423c0d01f874c
>> --- /dev/null
>> +++ b/arch/riscv/include/asm/percpu.h
>> @@ -0,0 +1,138 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +
>> +#ifndef __ASM_PERCPU_H
>> +#define __ASM_PERCPU_H
>> +
>> +#include <linux/preempt.h>
>> +
>> +#define PERCPU_RW_OPS(sz) \
>> +static inline unsigned long __percpu_read_##sz(void *ptr) \
>> +{ \
>> + return READ_ONCE(*(u##sz *)ptr); \
>> +} \
>> + \
>> +static inline void __percpu_write_##sz(void *ptr, unsigned long
>> val) \
>> +{ \
>> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
>> +}
>> +
>> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
>> +static inline void \
>> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
>> +{ \
>> + asm volatile ( \
>> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
>> + : [ptr] "+A" (*(u##sz *)ptr) \
>> + : [val] "r" ((u##sz)(val)) \
>> + : "memory"); \
>> +}
>> +
>> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
>> +static inline u##sz \
>> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long
>> val) \
>> +{ \
>> + register u##sz ret; \
>> + \
>> + asm volatile ( \
>> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
>> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
>> + : [val] "r" ((u##sz)(val)) \
>> + : "memory"); \
>> + \
>> + return ret + val; \
>> +}
>> +
>> +#define PERCPU_OP(name, amo_insn) \
>> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
>> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
>> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
>> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
>> +
>> +#define PERCPU_RET_OP(name, amo_insn) \
>> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
>> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
>> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
>> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
>> +
>> +PERCPU_RW_OPS(8)
>> +PERCPU_RW_OPS(16)
>> +PERCPU_RW_OPS(32)
>> +PERCPU_RW_OPS(64)
>> +
>> +PERCPU_OP(add, add)
>> +PERCPU_OP(andnot, and)
>> +PERCPU_OP(or, or)
>> +PERCPU_RET_OP(add, add)
>> +
>> +#undef PERCPU_RW_OPS
>> +#undef __PERCPU_AMO_OP_CASE
>> +#undef __PERCPU_AMO_RET_OP_CASE
>> +#undef PERCPU_OP
>> +#undef PERCPU_RET_OP
>> +
>> +#define _pcp_protect(op, pcp, ...) \
>> +({ \
>> + preempt_disable_notrace(); \
>> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
>> + preempt_enable_notrace(); \
>> +})
>> +
>> +#define _pcp_protect_return(op, pcp, args...) \
>> +({ \
>> + typeof(pcp) __retval; \
>> + preempt_disable_notrace(); \
>> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
>> + preempt_enable_notrace(); \
>> + __retval; \
>> +})
>> +
>> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
>> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
>> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
>> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
>> +
>> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8,
>> pcp, (unsigned long)val)
>> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16,
>> pcp, (unsigned long)val)
>> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32,
>> pcp, (unsigned long)val)
>> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64,
>> pcp, (unsigned long)val)
>> +
>> +#define this_cpu_add_1(pcp, val)
>> _pcp_protect(__percpu_add_amo_case_8, pcp, val)
>> +#define this_cpu_add_2(pcp, val)
>> _pcp_protect(__percpu_add_amo_case_16, pcp, val)
>> +#define this_cpu_add_4(pcp, val)
>> _pcp_protect(__percpu_add_amo_case_32, pcp, val)
>> +#define this_cpu_add_8(pcp, val)
>> _pcp_protect(__percpu_add_amo_case_64, pcp, val)
>> +
>> +#define this_cpu_add_return_1(pcp, val) \
>> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
>> +
>> +#define this_cpu_add_return_2(pcp, val) \
>> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
>> +
>> +#define this_cpu_add_return_4(pcp, val) \
>> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
>> +
>> +#define this_cpu_add_return_8(pcp, val) \
>> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
>> +
>> +#define this_cpu_and_1(pcp, val)
>> _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
>> +#define this_cpu_and_2(pcp, val)
>> _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
>> +#define this_cpu_and_4(pcp, val)
>> _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
>> +#define this_cpu_and_8(pcp, val)
>> _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
>
>
> Why do we define __percpu_andnot based on amoand, and use
> __percpu_andnot with ~val here? Can't we just define __percpu_and?
>
>
>> +
>> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8,
>> pcp, val)
>> +#define this_cpu_or_2(pcp, val)
>> _pcp_protect(__percpu_or_amo_case_16, pcp, val)
>> +#define this_cpu_or_4(pcp, val)
>> _pcp_protect(__percpu_or_amo_case_32, pcp, val)
>> +#define this_cpu_or_8(pcp, val)
>> _pcp_protect(__percpu_or_amo_case_64, pcp, val)
>> +
>> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed,
>> pcp, val)
>> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed,
>> pcp, val)
>> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed,
>> pcp, val)
>> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed,
>> pcp, val)
>> +
>> +#define this_cpu_cmpxchg_1(pcp, o, n)
>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>> +#define this_cpu_cmpxchg_2(pcp, o, n)
>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>> +#define this_cpu_cmpxchg_4(pcp, o, n)
>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>> +#define this_cpu_cmpxchg_8(pcp, o, n)
>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>> +
>> +#include <asm-generic/percpu.h>
>> +
>> +#endif /* __ASM_PERCPU_H */
>
>
> It all looks good to me, just one thing, can you also implement
> this_cpu_cmpxchg64/128()?
>
One last thing, sorry: can you add a cover letter too?
Thanks!
Alex
> And since this is almost a copy/paste from arm64, either mention it at
> the top of the file or (better) merge both implementations somewhere
> to avoid redefining existing code :) But up to you.
>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>
> Thanks,
>
> Alex
>
>
>
* Re: [External] Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
2025-07-17 13:05 ` Alexandre Ghiti
@ 2025-07-18 6:40 ` yunhui cui
2025-07-18 14:22 ` Alexandre Ghiti
0 siblings, 1 reply; 10+ messages in thread
From: yunhui cui @ 2025-07-18 6:40 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: yury.norov, linux, paul.walmsley, palmer, aou, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Hi Alex,
On Thu, Jul 17, 2025 at 9:06 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> On 7/17/25 15:04, Alexandre Ghiti wrote:
> > Hi Yunhui,
> >
> > On 6/18/25 05:43, Yunhui Cui wrote:
> >> Current percpu operations rely on generic implementations, where
> >> raw_local_irq_save() introduces substantial overhead. Optimization
> >> is achieved through atomic operations and preemption disabling.
> >>
> >> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> >> ---
> >> arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
> >> 1 file changed, 138 insertions(+)
> >> create mode 100644 arch/riscv/include/asm/percpu.h
> >>
> >> diff --git a/arch/riscv/include/asm/percpu.h
> >> b/arch/riscv/include/asm/percpu.h
> >> new file mode 100644
> >> index 0000000000000..423c0d01f874c
> >> --- /dev/null
> >> +++ b/arch/riscv/include/asm/percpu.h
> >> @@ -0,0 +1,138 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-only */
> >> +
> >> +#ifndef __ASM_PERCPU_H
> >> +#define __ASM_PERCPU_H
> >> +
> >> +#include <linux/preempt.h>
> >> +
> >> +#define PERCPU_RW_OPS(sz) \
> >> +static inline unsigned long __percpu_read_##sz(void *ptr) \
> >> +{ \
> >> + return READ_ONCE(*(u##sz *)ptr); \
> >> +} \
> >> + \
> >> +static inline void __percpu_write_##sz(void *ptr, unsigned long
> >> val) \
> >> +{ \
> >> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
> >> +}
> >> +
> >> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> >> +static inline void \
> >> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> >> +{ \
> >> + asm volatile ( \
> >> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> >> + : [ptr] "+A" (*(u##sz *)ptr) \
> >> + : [val] "r" ((u##sz)(val)) \
> >> + : "memory"); \
> >> +}
> >> +
> >> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
> >> +static inline u##sz \
> >> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long
> >> val) \
> >> +{ \
> >> + register u##sz ret; \
> >> + \
> >> + asm volatile ( \
> >> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
> >> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
> >> + : [val] "r" ((u##sz)(val)) \
> >> + : "memory"); \
> >> + \
> >> + return ret + val; \
> >> +}
> >> +
> >> +#define PERCPU_OP(name, amo_insn) \
> >> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
> >> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
> >> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
> >> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
> >> +
> >> +#define PERCPU_RET_OP(name, amo_insn) \
> >> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
> >> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
> >> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
> >> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
> >> +
> >> +PERCPU_RW_OPS(8)
> >> +PERCPU_RW_OPS(16)
> >> +PERCPU_RW_OPS(32)
> >> +PERCPU_RW_OPS(64)
> >> +
> >> +PERCPU_OP(add, add)
> >> +PERCPU_OP(andnot, and)
> >> +PERCPU_OP(or, or)
> >> +PERCPU_RET_OP(add, add)
> >> +
> >> +#undef PERCPU_RW_OPS
> >> +#undef __PERCPU_AMO_OP_CASE
> >> +#undef __PERCPU_AMO_RET_OP_CASE
> >> +#undef PERCPU_OP
> >> +#undef PERCPU_RET_OP
> >> +
> >> +#define _pcp_protect(op, pcp, ...) \
> >> +({ \
> >> + preempt_disable_notrace(); \
> >> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> >> + preempt_enable_notrace(); \
> >> +})
> >> +
> >> +#define _pcp_protect_return(op, pcp, args...) \
> >> +({ \
> >> + typeof(pcp) __retval; \
> >> + preempt_disable_notrace(); \
> >> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> >> + preempt_enable_notrace(); \
> >> + __retval; \
> >> +})
> >> +
> >> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
> >> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
> >> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
> >> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
> >> +
> >> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8,
> >> pcp, (unsigned long)val)
> >> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16,
> >> pcp, (unsigned long)val)
> >> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32,
> >> pcp, (unsigned long)val)
> >> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64,
> >> pcp, (unsigned long)val)
> >> +
> >> +#define this_cpu_add_1(pcp, val)
> >> _pcp_protect(__percpu_add_amo_case_8, pcp, val)
> >> +#define this_cpu_add_2(pcp, val)
> >> _pcp_protect(__percpu_add_amo_case_16, pcp, val)
> >> +#define this_cpu_add_4(pcp, val)
> >> _pcp_protect(__percpu_add_amo_case_32, pcp, val)
> >> +#define this_cpu_add_8(pcp, val)
> >> _pcp_protect(__percpu_add_amo_case_64, pcp, val)
> >> +
> >> +#define this_cpu_add_return_1(pcp, val) \
> >> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
> >> +
> >> +#define this_cpu_add_return_2(pcp, val) \
> >> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
> >> +
> >> +#define this_cpu_add_return_4(pcp, val) \
> >> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
> >> +
> >> +#define this_cpu_add_return_8(pcp, val) \
> >> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
> >> +
> >> +#define this_cpu_and_1(pcp, val)
> >> _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
> >> +#define this_cpu_and_2(pcp, val)
> >> _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
> >> +#define this_cpu_and_4(pcp, val)
> >> _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
> >> +#define this_cpu_and_8(pcp, val)
> >> _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
> >
> >
> > Why do we define __percpu_andnot based on amoand, and use
> > __percpu_andnot with ~val here? Can't we just define __percpu_and?
> >
> >
> >> +
> >> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8,
> >> pcp, val)
> >> +#define this_cpu_or_2(pcp, val)
> >> _pcp_protect(__percpu_or_amo_case_16, pcp, val)
> >> +#define this_cpu_or_4(pcp, val)
> >> _pcp_protect(__percpu_or_amo_case_32, pcp, val)
> >> +#define this_cpu_or_8(pcp, val)
> >> _pcp_protect(__percpu_or_amo_case_64, pcp, val)
> >> +
> >> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed,
> >> pcp, val)
> >> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed,
> >> pcp, val)
> >> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed,
> >> pcp, val)
> >> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed,
> >> pcp, val)
> >> +
> >> +#define this_cpu_cmpxchg_1(pcp, o, n)
> >> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> +#define this_cpu_cmpxchg_2(pcp, o, n)
> >> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> +#define this_cpu_cmpxchg_4(pcp, o, n)
> >> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> +#define this_cpu_cmpxchg_8(pcp, o, n)
> >> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >> +
> >> +#include <asm-generic/percpu.h>
> >> +
> >> +#endif /* __ASM_PERCPU_H */
> >
> >
> > It all looks good to me, just one thing, can you also implement
> > this_cpu_cmpxchg64/128()?
> >
>
> One last thing sorry, can you add a cover letter too?
Okay.
>
> Thanks!
>
> Alex
>
>
> > And since this is almost a copy/paste from arm64, either mention it at
> > the top of the file or (better) merge both implementations somewhere
> > to avoid redefining existing code :) But up to you.
Actually, there's a concern here: we should account for platforms where
Zabha isn't supported. Given that the xxx_8() and xxx_16() variants are
rarely used in practice, could we initially support only xxx_32() and
xxx_64(), and let xxx_8()/xxx_16() fall back to the generic
implementation?
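Roughly like this (sketch only, reusing the names from this patch; the
8/16-bit hooks would simply stay undefined so asm-generic/percpu.h
provides them):

#define PERCPU_OP(name, amo_insn)                               \
        __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn)            \
        __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn)

#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
/* no this_cpu_add_1/2: the irq-save generic fallback handles those */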
> >
> > Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >
> > Thanks,
> >
> > Alex
> >
> >
> >
Thanks,
Yunhui
* Re: [External] Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
2025-07-18 6:40 ` [External] " yunhui cui
@ 2025-07-18 14:22 ` Alexandre Ghiti
2025-07-18 14:33 ` yunhui cui
0 siblings, 1 reply; 10+ messages in thread
From: Alexandre Ghiti @ 2025-07-18 14:22 UTC (permalink / raw)
To: yunhui cui
Cc: yury.norov, linux, paul.walmsley, palmer, aou, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Hi Yunhui,
On 7/18/25 08:40, yunhui cui wrote:
> Hi Alex,
>
> On Thu, Jul 17, 2025 at 9:06 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>> On 7/17/25 15:04, Alexandre Ghiti wrote:
>>> Hi Yunhui,
>>>
>>> On 6/18/25 05:43, Yunhui Cui wrote:
>>>> Current percpu operations rely on generic implementations, where
>>>> raw_local_irq_save() introduces substantial overhead. Optimization
>>>> is achieved through atomic operations and preemption disabling.
>>>>
>>>> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
>>>> ---
>>>> arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
>>>> 1 file changed, 138 insertions(+)
>>>> create mode 100644 arch/riscv/include/asm/percpu.h
>>>>
>>>> diff --git a/arch/riscv/include/asm/percpu.h
>>>> b/arch/riscv/include/asm/percpu.h
>>>> new file mode 100644
>>>> index 0000000000000..423c0d01f874c
>>>> --- /dev/null
>>>> +++ b/arch/riscv/include/asm/percpu.h
>>>> @@ -0,0 +1,138 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>> +
>>>> +#ifndef __ASM_PERCPU_H
>>>> +#define __ASM_PERCPU_H
>>>> +
>>>> +#include <linux/preempt.h>
>>>> +
>>>> +#define PERCPU_RW_OPS(sz) \
>>>> +static inline unsigned long __percpu_read_##sz(void *ptr) \
>>>> +{ \
>>>> + return READ_ONCE(*(u##sz *)ptr); \
>>>> +} \
>>>> + \
>>>> +static inline void __percpu_write_##sz(void *ptr, unsigned long
>>>> val) \
>>>> +{ \
>>>> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
>>>> +}
>>>> +
>>>> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
>>>> +static inline void \
>>>> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
>>>> +{ \
>>>> + asm volatile ( \
>>>> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
>>>> + : [ptr] "+A" (*(u##sz *)ptr) \
>>>> + : [val] "r" ((u##sz)(val)) \
>>>> + : "memory"); \
>>>> +}
>>>> +
>>>> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
>>>> +static inline u##sz \
>>>> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long
>>>> val) \
>>>> +{ \
>>>> + register u##sz ret; \
>>>> + \
>>>> + asm volatile ( \
>>>> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
>>>> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
>>>> + : [val] "r" ((u##sz)(val)) \
>>>> + : "memory"); \
>>>> + \
>>>> + return ret + val; \
>>>> +}
>>>> +
>>>> +#define PERCPU_OP(name, amo_insn) \
>>>> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
>>>> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
>>>> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
>>>> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
>>>> +
>>>> +#define PERCPU_RET_OP(name, amo_insn) \
>>>> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
>>>> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
>>>> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
>>>> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
>>>> +
>>>> +PERCPU_RW_OPS(8)
>>>> +PERCPU_RW_OPS(16)
>>>> +PERCPU_RW_OPS(32)
>>>> +PERCPU_RW_OPS(64)
>>>> +
>>>> +PERCPU_OP(add, add)
>>>> +PERCPU_OP(andnot, and)
>>>> +PERCPU_OP(or, or)
>>>> +PERCPU_RET_OP(add, add)
>>>> +
>>>> +#undef PERCPU_RW_OPS
>>>> +#undef __PERCPU_AMO_OP_CASE
>>>> +#undef __PERCPU_AMO_RET_OP_CASE
>>>> +#undef PERCPU_OP
>>>> +#undef PERCPU_RET_OP
>>>> +
>>>> +#define _pcp_protect(op, pcp, ...) \
>>>> +({ \
>>>> + preempt_disable_notrace(); \
>>>> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
>>>> + preempt_enable_notrace(); \
>>>> +})
>>>> +
>>>> +#define _pcp_protect_return(op, pcp, args...) \
>>>> +({ \
>>>> + typeof(pcp) __retval; \
>>>> + preempt_disable_notrace(); \
>>>> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
>>>> + preempt_enable_notrace(); \
>>>> + __retval; \
>>>> +})
>>>> +
>>>> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
>>>> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
>>>> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
>>>> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
>>>> +
>>>> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8,
>>>> pcp, (unsigned long)val)
>>>> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16,
>>>> pcp, (unsigned long)val)
>>>> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32,
>>>> pcp, (unsigned long)val)
>>>> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64,
>>>> pcp, (unsigned long)val)
>>>> +
>>>> +#define this_cpu_add_1(pcp, val)
>>>> _pcp_protect(__percpu_add_amo_case_8, pcp, val)
>>>> +#define this_cpu_add_2(pcp, val)
>>>> _pcp_protect(__percpu_add_amo_case_16, pcp, val)
>>>> +#define this_cpu_add_4(pcp, val)
>>>> _pcp_protect(__percpu_add_amo_case_32, pcp, val)
>>>> +#define this_cpu_add_8(pcp, val)
>>>> _pcp_protect(__percpu_add_amo_case_64, pcp, val)
>>>> +
>>>> +#define this_cpu_add_return_1(pcp, val) \
>>>> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
>>>> +
>>>> +#define this_cpu_add_return_2(pcp, val) \
>>>> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
>>>> +
>>>> +#define this_cpu_add_return_4(pcp, val) \
>>>> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
>>>> +
>>>> +#define this_cpu_add_return_8(pcp, val) \
>>>> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
>>>> +
>>>> +#define this_cpu_and_1(pcp, val)
>>>> _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
>>>> +#define this_cpu_and_2(pcp, val)
>>>> _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
>>>> +#define this_cpu_and_4(pcp, val)
>>>> _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
>>>> +#define this_cpu_and_8(pcp, val)
>>>> _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
>>>
>>> Why do we define __percpu_andnot based on amoand, and use
>>> __percpu_andnot with ~val here? Can't we just define __percpu_and?
What about that ^?
>>>
>>>> +
>>>> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8,
>>>> pcp, val)
>>>> +#define this_cpu_or_2(pcp, val)
>>>> _pcp_protect(__percpu_or_amo_case_16, pcp, val)
>>>> +#define this_cpu_or_4(pcp, val)
>>>> _pcp_protect(__percpu_or_amo_case_32, pcp, val)
>>>> +#define this_cpu_or_8(pcp, val)
>>>> _pcp_protect(__percpu_or_amo_case_64, pcp, val)
>>>> +
>>>> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed,
>>>> pcp, val)
>>>> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed,
>>>> pcp, val)
>>>> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed,
>>>> pcp, val)
>>>> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed,
>>>> pcp, val)
>>>> +
>>>> +#define this_cpu_cmpxchg_1(pcp, o, n)
>>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>>>> +#define this_cpu_cmpxchg_2(pcp, o, n)
>>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>>>> +#define this_cpu_cmpxchg_4(pcp, o, n)
>>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>>>> +#define this_cpu_cmpxchg_8(pcp, o, n)
>>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>>>> +
>>>> +#include <asm-generic/percpu.h>
>>>> +
>>>> +#endif /* __ASM_PERCPU_H */
>>>
>>> It all looks good to me, just one thing, can you also implement
>>> this_cpu_cmpxchg64/128()?
>>>
>> One last thing sorry, can you add a cover letter too?
> Okay.
>
>> Thanks!
>>
>> Alex
>>
>>
>>> And since this is almost a copy/paste from arm64, either mention it at
>>> the top of the file or (better) merge both implementations somewhere
>>> to avoid redefining existing code :) But up to you.
> Actually, there's a concern here. We should account for scenarios
> where ZABHA isn't supported. Given that xxx_8() and xxx_16() are
> rarely used in practice, could we initially support only xxx_32() and
> xxx_64()? For xxx_8() and xxx_16(), we could default to the generic
> implementation.
Why isn't lr/sc enough?
>
>
>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>>
>>>
> Thanks,
> Yunhui
>
* Re: [External] Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
2025-07-18 14:22 ` Alexandre Ghiti
@ 2025-07-18 14:33 ` yunhui cui
2025-08-07 14:54 ` Alexandre Ghiti
0 siblings, 1 reply; 10+ messages in thread
From: yunhui cui @ 2025-07-18 14:33 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: yury.norov, linux, paul.walmsley, palmer, aou, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Hi Alex,
On Fri, Jul 18, 2025 at 10:23 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Yunhui,
>
> On 7/18/25 08:40, yunhui cui wrote:
> > Hi Alex,
> >
> > On Thu, Jul 17, 2025 at 9:06 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
> >> On 7/17/25 15:04, Alexandre Ghiti wrote:
> >>> Hi Yunhui,
> >>>
> >>> On 6/18/25 05:43, Yunhui Cui wrote:
> >>>> Current percpu operations rely on generic implementations, where
> >>>> raw_local_irq_save() introduces substantial overhead. Optimization
> >>>> is achieved through atomic operations and preemption disabling.
> >>>>
> >>>> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> >>>> ---
> >>>> arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
> >>>> 1 file changed, 138 insertions(+)
> >>>> create mode 100644 arch/riscv/include/asm/percpu.h
> >>>>
> >>>> diff --git a/arch/riscv/include/asm/percpu.h
> >>>> b/arch/riscv/include/asm/percpu.h
> >>>> new file mode 100644
> >>>> index 0000000000000..423c0d01f874c
> >>>> --- /dev/null
> >>>> +++ b/arch/riscv/include/asm/percpu.h
> >>>> @@ -0,0 +1,138 @@
> >>>> +/* SPDX-License-Identifier: GPL-2.0-only */
> >>>> +
> >>>> +#ifndef __ASM_PERCPU_H
> >>>> +#define __ASM_PERCPU_H
> >>>> +
> >>>> +#include <linux/preempt.h>
> >>>> +
> >>>> +#define PERCPU_RW_OPS(sz) \
> >>>> +static inline unsigned long __percpu_read_##sz(void *ptr) \
> >>>> +{ \
> >>>> + return READ_ONCE(*(u##sz *)ptr); \
> >>>> +} \
> >>>> + \
> >>>> +static inline void __percpu_write_##sz(void *ptr, unsigned long
> >>>> val) \
> >>>> +{ \
> >>>> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
> >>>> +}
> >>>> +
> >>>> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> >>>> +static inline void \
> >>>> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> >>>> +{ \
> >>>> + asm volatile ( \
> >>>> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> >>>> + : [ptr] "+A" (*(u##sz *)ptr) \
> >>>> + : [val] "r" ((u##sz)(val)) \
> >>>> + : "memory"); \
> >>>> +}
> >>>> +
> >>>> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
> >>>> +static inline u##sz \
> >>>> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long
> >>>> val) \
> >>>> +{ \
> >>>> + register u##sz ret; \
> >>>> + \
> >>>> + asm volatile ( \
> >>>> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
> >>>> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
> >>>> + : [val] "r" ((u##sz)(val)) \
> >>>> + : "memory"); \
> >>>> + \
> >>>> + return ret + val; \
> >>>> +}
> >>>> +
> >>>> +#define PERCPU_OP(name, amo_insn) \
> >>>> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
> >>>> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
> >>>> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
> >>>> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
> >>>> +
> >>>> +#define PERCPU_RET_OP(name, amo_insn) \
> >>>> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
> >>>> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
> >>>> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
> >>>> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
> >>>> +
> >>>> +PERCPU_RW_OPS(8)
> >>>> +PERCPU_RW_OPS(16)
> >>>> +PERCPU_RW_OPS(32)
> >>>> +PERCPU_RW_OPS(64)
> >>>> +
> >>>> +PERCPU_OP(add, add)
> >>>> +PERCPU_OP(andnot, and)
> >>>> +PERCPU_OP(or, or)
> >>>> +PERCPU_RET_OP(add, add)
> >>>> +
> >>>> +#undef PERCPU_RW_OPS
> >>>> +#undef __PERCPU_AMO_OP_CASE
> >>>> +#undef __PERCPU_AMO_RET_OP_CASE
> >>>> +#undef PERCPU_OP
> >>>> +#undef PERCPU_RET_OP
> >>>> +
> >>>> +#define _pcp_protect(op, pcp, ...) \
> >>>> +({ \
> >>>> + preempt_disable_notrace(); \
> >>>> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> >>>> + preempt_enable_notrace(); \
> >>>> +})
> >>>> +
> >>>> +#define _pcp_protect_return(op, pcp, args...) \
> >>>> +({ \
> >>>> + typeof(pcp) __retval; \
> >>>> + preempt_disable_notrace(); \
> >>>> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> >>>> + preempt_enable_notrace(); \
> >>>> + __retval; \
> >>>> +})
> >>>> +
> >>>> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
> >>>> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
> >>>> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
> >>>> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
> >>>> +
> >>>> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8,
> >>>> pcp, (unsigned long)val)
> >>>> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16,
> >>>> pcp, (unsigned long)val)
> >>>> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32,
> >>>> pcp, (unsigned long)val)
> >>>> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64,
> >>>> pcp, (unsigned long)val)
> >>>> +
> >>>> +#define this_cpu_add_1(pcp, val)
> >>>> _pcp_protect(__percpu_add_amo_case_8, pcp, val)
> >>>> +#define this_cpu_add_2(pcp, val)
> >>>> _pcp_protect(__percpu_add_amo_case_16, pcp, val)
> >>>> +#define this_cpu_add_4(pcp, val)
> >>>> _pcp_protect(__percpu_add_amo_case_32, pcp, val)
> >>>> +#define this_cpu_add_8(pcp, val)
> >>>> _pcp_protect(__percpu_add_amo_case_64, pcp, val)
> >>>> +
> >>>> +#define this_cpu_add_return_1(pcp, val) \
> >>>> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
> >>>> +
> >>>> +#define this_cpu_add_return_2(pcp, val) \
> >>>> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
> >>>> +
> >>>> +#define this_cpu_add_return_4(pcp, val) \
> >>>> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
> >>>> +
> >>>> +#define this_cpu_add_return_8(pcp, val) \
> >>>> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
> >>>> +
> >>>> +#define this_cpu_and_1(pcp, val)
> >>>> _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
> >>>> +#define this_cpu_and_2(pcp, val)
> >>>> _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
> >>>> +#define this_cpu_and_4(pcp, val)
> >>>> _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
> >>>> +#define this_cpu_and_8(pcp, val)
> >>>> _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
> >>>
> >>> Why do we define __percpu_andnot based on amoand, and use
> >>> __percpu_andnot with ~val here? Can't we just define __percpu_and?
>
>
> What about that ^?
>
>
> >>>
> >>>> +
> >>>> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8,
> >>>> pcp, val)
> >>>> +#define this_cpu_or_2(pcp, val)
> >>>> _pcp_protect(__percpu_or_amo_case_16, pcp, val)
> >>>> +#define this_cpu_or_4(pcp, val)
> >>>> _pcp_protect(__percpu_or_amo_case_32, pcp, val)
> >>>> +#define this_cpu_or_8(pcp, val)
> >>>> _pcp_protect(__percpu_or_amo_case_64, pcp, val)
> >>>> +
> >>>> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed,
> >>>> pcp, val)
> >>>> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed,
> >>>> pcp, val)
> >>>> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed,
> >>>> pcp, val)
> >>>> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed,
> >>>> pcp, val)
> >>>> +
> >>>> +#define this_cpu_cmpxchg_1(pcp, o, n)
> >>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >>>> +#define this_cpu_cmpxchg_2(pcp, o, n)
> >>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >>>> +#define this_cpu_cmpxchg_4(pcp, o, n)
> >>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >>>> +#define this_cpu_cmpxchg_8(pcp, o, n)
> >>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >>>> +
> >>>> +#include <asm-generic/percpu.h>
> >>>> +
> >>>> +#endif /* __ASM_PERCPU_H */
> >>>
> >>> It all looks good to me, just one thing, can you also implement
> >>> this_cpu_cmpxchg64/128()?
> >>>
> >> One last thing sorry, can you add a cover letter too?
> > Okay.
> >
> >> Thanks!
> >>
> >> Alex
> >>
> >>
> >>> And since this is almost a copy/paste from arm64, either mention it at
> >>> the top of the file or (better) merge both implementations somewhere
> >>> to avoid redefining existing code :) But up to you.
> > Actually, there's a concern here. We should account for scenarios
> > where ZABHA isn't supported. Given that xxx_8() and xxx_16() are
> > rarely used in practice, could we initially support only xxx_32() and
> > xxx_64()? For xxx_8() and xxx_16(), we could default to the generic
> > implementation.
>
>
> Why isn't lr/sc enough?
If I'm not mistaken, the base RISC-V A extension does not provide
byte/halfword lr/sc (lr.b/lr.h, sc.b/sc.h), is that right?
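For illustration only (not something I intend to add), without Zabha an
8-bit per-cpu RMW would have to be emulated with a word-wide cmpxchg loop
on the aligned containing word, roughly:

static inline void __percpu_add_emul_8(void *ptr, unsigned long val)
{
        u32 *word  = (u32 *)((unsigned long)ptr & ~3UL);
        int  shift = ((unsigned long)ptr & 3UL) * BITS_PER_BYTE;
        u32  old, new;

        do {
                old = READ_ONCE(*word);
                /* add within the byte lane, keep the neighbouring bytes intact */
                new = (old & ~(0xffU << shift)) |
                      ((((old >> shift) + (u32)(u8)val) & 0xffU) << shift);
        } while (cmpxchg_relaxed(word, old, new) != old);
}

which is noticeably heavier than a single amoadd.b.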
>
>
> >
> >
> >>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >>>
> >>> Thanks,
> >>>
> >>> Alex
> >>>
> >>>
> >>>
> > Thanks,
> > Yunhui
> >
Thanks,
Yunhui
* Re: [External] Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
2025-07-18 14:33 ` yunhui cui
@ 2025-08-07 14:54 ` Alexandre Ghiti
2025-08-19 12:45 ` yunhui cui
0 siblings, 1 reply; 10+ messages in thread
From: Alexandre Ghiti @ 2025-08-07 14:54 UTC (permalink / raw)
To: yunhui cui
Cc: yury.norov, linux, paul.walmsley, palmer, aou, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Hi Yunhui,
On 7/18/25 16:33, yunhui cui wrote:
> Hi Alex,
>
> On Fri, Jul 18, 2025 at 10:23 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>> Hi Yunhui,
>>
>> On 7/18/25 08:40, yunhui cui wrote:
>>> Hi Alex,
>>>
>>> On Thu, Jul 17, 2025 at 9:06 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>>>> On 7/17/25 15:04, Alexandre Ghiti wrote:
>>>>> Hi Yunhui,
>>>>>
>>>>> On 6/18/25 05:43, Yunhui Cui wrote:
>>>>>> Current percpu operations rely on generic implementations, where
>>>>>> raw_local_irq_save() introduces substantial overhead. Optimization
>>>>>> is achieved through atomic operations and preemption disabling.
>>>>>>
>>>>>> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
>>>>>> ---
>>>>>> arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
>>>>>> 1 file changed, 138 insertions(+)
>>>>>> create mode 100644 arch/riscv/include/asm/percpu.h
>>>>>>
>>>>>> diff --git a/arch/riscv/include/asm/percpu.h
>>>>>> b/arch/riscv/include/asm/percpu.h
>>>>>> new file mode 100644
>>>>>> index 0000000000000..423c0d01f874c
>>>>>> --- /dev/null
>>>>>> +++ b/arch/riscv/include/asm/percpu.h
>>>>>> @@ -0,0 +1,138 @@
>>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>>>> +
>>>>>> +#ifndef __ASM_PERCPU_H
>>>>>> +#define __ASM_PERCPU_H
>>>>>> +
>>>>>> +#include <linux/preempt.h>
>>>>>> +
>>>>>> +#define PERCPU_RW_OPS(sz) \
>>>>>> +static inline unsigned long __percpu_read_##sz(void *ptr) \
>>>>>> +{ \
>>>>>> + return READ_ONCE(*(u##sz *)ptr); \
>>>>>> +} \
>>>>>> + \
>>>>>> +static inline void __percpu_write_##sz(void *ptr, unsigned long
>>>>>> val) \
>>>>>> +{ \
>>>>>> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
>>>>>> +}
>>>>>> +
>>>>>> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
>>>>>> +static inline void \
>>>>>> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
>>>>>> +{ \
>>>>>> + asm volatile ( \
>>>>>> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
>>>>>> + : [ptr] "+A" (*(u##sz *)ptr) \
>>>>>> + : [val] "r" ((u##sz)(val)) \
>>>>>> + : "memory"); \
>>>>>> +}
>>>>>> +
>>>>>> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
>>>>>> +static inline u##sz \
>>>>>> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long
>>>>>> val) \
>>>>>> +{ \
>>>>>> + register u##sz ret; \
>>>>>> + \
>>>>>> + asm volatile ( \
>>>>>> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
>>>>>> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
>>>>>> + : [val] "r" ((u##sz)(val)) \
>>>>>> + : "memory"); \
>>>>>> + \
>>>>>> + return ret + val; \
>>>>>> +}
>>>>>> +
>>>>>> +#define PERCPU_OP(name, amo_insn) \
>>>>>> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
>>>>>> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
>>>>>> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
>>>>>> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
>>>>>> +
>>>>>> +#define PERCPU_RET_OP(name, amo_insn) \
>>>>>> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
>>>>>> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
>>>>>> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
>>>>>> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
>>>>>> +
>>>>>> +PERCPU_RW_OPS(8)
>>>>>> +PERCPU_RW_OPS(16)
>>>>>> +PERCPU_RW_OPS(32)
>>>>>> +PERCPU_RW_OPS(64)
>>>>>> +
>>>>>> +PERCPU_OP(add, add)
>>>>>> +PERCPU_OP(andnot, and)
>>>>>> +PERCPU_OP(or, or)
>>>>>> +PERCPU_RET_OP(add, add)
>>>>>> +
>>>>>> +#undef PERCPU_RW_OPS
>>>>>> +#undef __PERCPU_AMO_OP_CASE
>>>>>> +#undef __PERCPU_AMO_RET_OP_CASE
>>>>>> +#undef PERCPU_OP
>>>>>> +#undef PERCPU_RET_OP
>>>>>> +
>>>>>> +#define _pcp_protect(op, pcp, ...) \
>>>>>> +({ \
>>>>>> + preempt_disable_notrace(); \
>>>>>> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
>>>>>> + preempt_enable_notrace(); \
>>>>>> +})
>>>>>> +
>>>>>> +#define _pcp_protect_return(op, pcp, args...) \
>>>>>> +({ \
>>>>>> + typeof(pcp) __retval; \
>>>>>> + preempt_disable_notrace(); \
>>>>>> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
>>>>>> + preempt_enable_notrace(); \
>>>>>> + __retval; \
>>>>>> +})
>>>>>> +
>>>>>> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
>>>>>> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
>>>>>> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
>>>>>> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
>>>>>> +
>>>>>> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
>>>>>> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
>>>>>> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
>>>>>> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
>>>>>> +
>>>>>> +#define this_cpu_add_1(pcp, val) _pcp_protect(__percpu_add_amo_case_8, pcp, val)
>>>>>> +#define this_cpu_add_2(pcp, val) _pcp_protect(__percpu_add_amo_case_16, pcp, val)
>>>>>> +#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
>>>>>> +#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
>>>>>> +
>>>>>> +#define this_cpu_add_return_1(pcp, val) \
>>>>>> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
>>>>>> +
>>>>>> +#define this_cpu_add_return_2(pcp, val) \
>>>>>> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
>>>>>> +
>>>>>> +#define this_cpu_add_return_4(pcp, val) \
>>>>>> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
>>>>>> +
>>>>>> +#define this_cpu_add_return_8(pcp, val) \
>>>>>> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
>>>>>> +
>>>>>> +#define this_cpu_and_1(pcp, val) _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
>>>>>> +#define this_cpu_and_2(pcp, val) _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
>>>>>> +#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
>>>>>> +#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
>>>>> Why do we define __percpu_andnot based on amoand, and use
>>>>> __percpu_andnot with ~val here? Can't we just define __percpu_and?
>>
>> What about that ^?
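
For illustration, a minimal sketch of the direct approach, reusing the
PERCPU_OP()/_pcp_protect() macros introduced by this patch (names below follow
that pattern), so no ~val is needed at the call sites:

  PERCPU_OP(and, and)	/* would generate __percpu_and_amo_case_{8,16,32,64}() */

  #define this_cpu_and_1(pcp, val) _pcp_protect(__percpu_and_amo_case_8, pcp, val)
  #define this_cpu_and_2(pcp, val) _pcp_protect(__percpu_and_amo_case_16, pcp, val)
  #define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_and_amo_case_32, pcp, val)
  #define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_and_amo_case_64, pcp, val)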
>>
>>
>>>>>> +
>>>>>> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8, pcp, val)
>>>>>> +#define this_cpu_or_2(pcp, val) _pcp_protect(__percpu_or_amo_case_16, pcp, val)
>>>>>> +#define this_cpu_or_4(pcp, val) _pcp_protect(__percpu_or_amo_case_32, pcp, val)
>>>>>> +#define this_cpu_or_8(pcp, val) _pcp_protect(__percpu_or_amo_case_64, pcp, val)
>>>>>> +
>>>>>> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
>>>>>> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
>>>>>> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
>>>>>> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
>>>>>> +
>>>>>> +#define this_cpu_cmpxchg_1(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>>>>>> +#define this_cpu_cmpxchg_2(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>>>>>> +#define this_cpu_cmpxchg_4(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>>>>>> +#define this_cpu_cmpxchg_8(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>>>>>> +
>>>>>> +#include <asm-generic/percpu.h>
>>>>>> +
>>>>>> +#endif /* __ASM_PERCPU_H */
>>>>> It all looks good to me, just one thing, can you also implement
>>>>> this_cpu_cmpxchg64/128()?
>>>>>
>>>> One last thing sorry, can you add a cover letter too?
>>> Okay.
>>>
>>>> Thanks!
>>>>
>>>> Alex
>>>>
>>>>
>>>>> And since this is almost a copy/paste from arm64, either mention it at
>>>>> the top of the file or (better) merge both implementations somewhere
>>>>> to avoid redefining existing code :) But up to you.
>>> Actually, there's a concern here. We should account for scenarios
>>> where ZABHA isn't supported. Given that xxx_8() and xxx_16() are
>>> rarely used in practice, could we initially support only xxx_32() and
>>> xxx_64()? For xxx_8() and xxx_16(), we could default to the generic
>>> implementation.
>>
>> Why isn't lr/sc enough?
> If I'm not mistaken, the current RISC-V does not support lr.bh/sc.bh,
> is that right?
Yes, that's right, but we have an implementation of cmpxchg[8|16]() that
uses lr.w/sc.w which works (unless I missed something, I have just
checked again), so I think that's alright no?
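
For reference, the usual shape of that emulation, as a rough sketch of the
general masking trick (the helper name is made up here, this is not the exact
code in our cmpxchg headers): the 8-bit case is done with a word-sized CAS,
i.e. lr.w/sc.w underneath, on the enclosing aligned 32-bit word:

/*
 * Sketch only, little-endian assumed; relies on the kernel's u8/u32,
 * READ_ONCE() and cmpxchg_relaxed() (which compiles down to lr.w/sc.w,
 * or amocas.w with Zacas).
 */
static inline u8 __cmpxchg8_via_word(u8 *ptr, u8 old, u8 new)
{
	u32 *word = (u32 *)((unsigned long)ptr & ~0x3UL);
	unsigned int shift = ((unsigned long)ptr & 0x3UL) * 8;
	u32 mask = 0xffU << shift;
	u32 loaded, updated;

	do {
		loaded = READ_ONCE(*word);
		if (((loaded & mask) >> shift) != old)
			return (loaded & mask) >> shift;	/* mismatch: return current byte */
		updated = (loaded & ~mask) | ((u32)new << shift);
	} while (cmpxchg_relaxed(word, loaded, updated) != loaded);

	return old;
}

The 16-bit variant is the same with a 0xffff mask, so Zabha is not strictly
required for correctness, only for a shorter fast path.
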
Thanks,
Alex
>
>>
>>>
>>>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Alex
>>>>>
>>>>>
>>>>>
>>> Thanks,
>>> Yunhui
>>>
> Thanks,
> Yunhui
>
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm
2025-08-07 14:54 ` Alexandre Ghiti
@ 2025-08-19 12:45 ` yunhui cui
0 siblings, 0 replies; 10+ messages in thread
From: yunhui cui @ 2025-08-19 12:45 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: yury.norov, linux, paul.walmsley, palmer, aou, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Hi Alex,
On Thu, Aug 7, 2025 at 10:55 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Yunhui,
>
> On 7/18/25 16:33, yunhui cui wrote:
> > Hi Alex,
> >
> > On Fri, Jul 18, 2025 at 10:23 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
> >> Hi Yunhui,
> >>
> >> On 7/18/25 08:40, yunhui cui wrote:
> >>> Hi Alex,
> >>>
> >>> On Thu, Jul 17, 2025 at 9:06 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
> >>>> On 7/17/25 15:04, Alexandre Ghiti wrote:
> >>>>> Hi Yunhui,
> >>>>>
> >>>>> On 6/18/25 05:43, Yunhui Cui wrote:
> >>>>>> Current percpu operations rely on generic implementations, where
> >>>>>> raw_local_irq_save() introduces substantial overhead. Optimization
> >>>>>> is achieved through atomic operations and preemption disabling.
> >>>>>>
> >>>>>> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> >>>>>> ---
> >>>>>> arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
> >>>>>> 1 file changed, 138 insertions(+)
> >>>>>> create mode 100644 arch/riscv/include/asm/percpu.h
> >>>>>>
> >>>>>> diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
> >>>>>> new file mode 100644
> >>>>>> index 0000000000000..423c0d01f874c
> >>>>>> --- /dev/null
> >>>>>> +++ b/arch/riscv/include/asm/percpu.h
> >>>>>> @@ -0,0 +1,138 @@
> >>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
> >>>>>> +
> >>>>>> +#ifndef __ASM_PERCPU_H
> >>>>>> +#define __ASM_PERCPU_H
> >>>>>> +
> >>>>>> +#include <linux/preempt.h>
> >>>>>> +
> >>>>>> +#define PERCPU_RW_OPS(sz) \
> >>>>>> +static inline unsigned long __percpu_read_##sz(void *ptr) \
> >>>>>> +{ \
> >>>>>> + return READ_ONCE(*(u##sz *)ptr); \
> >>>>>> +} \
> >>>>>> + \
> >>>>>> +static inline void __percpu_write_##sz(void *ptr, unsigned long val) \
> >>>>>> +{ \
> >>>>>> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
> >>>>>> +}
> >>>>>> +
> >>>>>> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> >>>>>> +static inline void \
> >>>>>> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> >>>>>> +{ \
> >>>>>> + asm volatile ( \
> >>>>>> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> >>>>>> + : [ptr] "+A" (*(u##sz *)ptr) \
> >>>>>> + : [val] "r" ((u##sz)(val)) \
> >>>>>> + : "memory"); \
> >>>>>> +}
> >>>>>> +
> >>>>>> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
> >>>>>> +static inline u##sz \
> >>>>>> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val) \
> >>>>>> +{ \
> >>>>>> + register u##sz ret; \
> >>>>>> + \
> >>>>>> + asm volatile ( \
> >>>>>> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
> >>>>>> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
> >>>>>> + : [val] "r" ((u##sz)(val)) \
> >>>>>> + : "memory"); \
> >>>>>> + \
> >>>>>> + return ret + val; \
> >>>>>> +}
> >>>>>> +
> >>>>>> +#define PERCPU_OP(name, amo_insn) \
> >>>>>> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \
> >>>>>> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \
> >>>>>> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
> >>>>>> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \
> >>>>>> +
> >>>>>> +#define PERCPU_RET_OP(name, amo_insn) \
> >>>>>> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \
> >>>>>> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \
> >>>>>> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
> >>>>>> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
> >>>>>> +
> >>>>>> +PERCPU_RW_OPS(8)
> >>>>>> +PERCPU_RW_OPS(16)
> >>>>>> +PERCPU_RW_OPS(32)
> >>>>>> +PERCPU_RW_OPS(64)
> >>>>>> +
> >>>>>> +PERCPU_OP(add, add)
> >>>>>> +PERCPU_OP(andnot, and)
> >>>>>> +PERCPU_OP(or, or)
> >>>>>> +PERCPU_RET_OP(add, add)
> >>>>>> +
> >>>>>> +#undef PERCPU_RW_OPS
> >>>>>> +#undef __PERCPU_AMO_OP_CASE
> >>>>>> +#undef __PERCPU_AMO_RET_OP_CASE
> >>>>>> +#undef PERCPU_OP
> >>>>>> +#undef PERCPU_RET_OP
> >>>>>> +
> >>>>>> +#define _pcp_protect(op, pcp, ...) \
> >>>>>> +({ \
> >>>>>> + preempt_disable_notrace(); \
> >>>>>> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> >>>>>> + preempt_enable_notrace(); \
> >>>>>> +})
> >>>>>> +
> >>>>>> +#define _pcp_protect_return(op, pcp, args...) \
> >>>>>> +({ \
> >>>>>> + typeof(pcp) __retval; \
> >>>>>> + preempt_disable_notrace(); \
> >>>>>> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> >>>>>> + preempt_enable_notrace(); \
> >>>>>> + __retval; \
> >>>>>> +})
> >>>>>> +
> >>>>>> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
> >>>>>> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
> >>>>>> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
> >>>>>> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
> >>>>>> +
> >>>>>> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
> >>>>>> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
> >>>>>> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
> >>>>>> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
> >>>>>> +
> >>>>>> +#define this_cpu_add_1(pcp, val) _pcp_protect(__percpu_add_amo_case_8, pcp, val)
> >>>>>> +#define this_cpu_add_2(pcp, val) _pcp_protect(__percpu_add_amo_case_16, pcp, val)
> >>>>>> +#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
> >>>>>> +#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
> >>>>>> +
> >>>>>> +#define this_cpu_add_return_1(pcp, val) \
> >>>>>> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
> >>>>>> +
> >>>>>> +#define this_cpu_add_return_2(pcp, val) \
> >>>>>> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
> >>>>>> +
> >>>>>> +#define this_cpu_add_return_4(pcp, val) \
> >>>>>> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
> >>>>>> +
> >>>>>> +#define this_cpu_add_return_8(pcp, val) \
> >>>>>> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
> >>>>>> +
> >>>>>> +#define this_cpu_and_1(pcp, val) _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val)
> >>>>>> +#define this_cpu_and_2(pcp, val) _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val)
> >>>>>> +#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
> >>>>>> +#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
> >>>>> Why do we define __percpu_andnot based on amoand, and use
> >>>>> __percpu_andnot with ~val here? Can't we just define __percpu_and?
> >>
> >> What about that ^?
> >>
> >>
> >>>>>> +
> >>>>>> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8, pcp, val)
> >>>>>> +#define this_cpu_or_2(pcp, val) _pcp_protect(__percpu_or_amo_case_16, pcp, val)
> >>>>>> +#define this_cpu_or_4(pcp, val) _pcp_protect(__percpu_or_amo_case_32, pcp, val)
> >>>>>> +#define this_cpu_or_8(pcp, val) _pcp_protect(__percpu_or_amo_case_64, pcp, val)
> >>>>>> +
> >>>>>> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> >>>>>> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> >>>>>> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> >>>>>> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
> >>>>>> +
> >>>>>> +#define this_cpu_cmpxchg_1(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >>>>>> +#define this_cpu_cmpxchg_2(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >>>>>> +#define this_cpu_cmpxchg_4(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >>>>>> +#define this_cpu_cmpxchg_8(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> >>>>>> +
> >>>>>> +#include <asm-generic/percpu.h>
> >>>>>> +
> >>>>>> +#endif /* __ASM_PERCPU_H */
> >>>>> It all looks good to me, just one thing, can you also implement
> >>>>> this_cpu_cmpxchg64/128()?
> >>>>>
> >>>> One last thing sorry, can you add a cover letter too?
> >>> Okay.
> >>>
> >>>> Thanks!
> >>>>
> >>>> Alex
> >>>>
> >>>>
> >>>>> And since this is almost a copy/paste from arm64, either mention it at
> >>>>> the top of the file or (better) merge both implementations somewhere
> >>>>> to avoid redefining existing code :) But up to you.
> >>> Actually, there's a concern here. We should account for scenarios
> >>> where ZABHA isn't supported. Given that xxx_8() and xxx_16() are
> >>> rarely used in practice, could we initially support only xxx_32() and
> >>> xxx_64()? For xxx_8() and xxx_16(), we could default to the generic
> >>> implementation.
> >>
> >> Why isn't lr/sc enough?
> > If I'm not mistaken, the current RISC-V does not support lr.bh/sc.bh,
> > is that right?
>
>
> Yes, that's right, but we have an implementation of cmpxchg[8|16]() that
> uses lr.w/sc.w which works (unless I missed something, I have just
> checked again), so I think that's alright no?
Since RISC-V does not support lr/sc on bytes/halfwords, when ZABHA is not
supported we would need to use lr.w/sc.w instead, which requires some
additional mask operations. In practice, 8/16-bit percpu operations are very
rare; the operation counts observed during system startup are as follows:
8-bit Adds: 3, 16-bit Adds: 3, 32-bit Adds: 31858, 64-bit Adds: 7656
8-bit Add-Returns: 0, 16-bit Add-Returns: 0, 32-bit Add-Returns: 0, 64-bit Add-Returns: 2
8-bit ANDs: 0, 16-bit ANDs: 0, 32-bit ANDs: 0, 64-bit ANDs: 0
8-bit ANDNOTs: 0, 16-bit ANDNOTs: 0, 32-bit ANDNOTs: 0, 64-bit ANDNOTs: 0
8-bit ORs: 0, 16-bit ORs: 0, 32-bit ORs: 70, 64-bit ORs: 0
So, when ZABHA is not supported, can the 8-bit/16-bit operations simply fall
back to the generic implementation?
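
Something like the following is what I have in mind (just a sketch, relying on
the fact that <asm-generic/percpu.h> only defines its irq-save based
this_cpu_*_1/2 fallbacks when the architecture has not provided them):

/* sketch: override only the 32/64-bit accessors with AMO-based ones */
#define this_cpu_add_4(pcp, val)	_pcp_protect(__percpu_add_amo_case_32, pcp, val)
#define this_cpu_add_8(pcp, val)	_pcp_protect(__percpu_add_amo_case_64, pcp, val)

/*
 * this_cpu_add_1()/this_cpu_add_2() are intentionally left undefined, so the
 * generic raw_local_irq_save() based versions from <asm-generic/percpu.h>
 * are picked up for the 8/16-bit cases.
 */
#include <asm-generic/percpu.h>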
>
> Thanks,
>
> Alex
>
>
> >
> >>
> >>>
> >>>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Alex
> >>>>>
> >>>>>
> >>>>>
> >>> Thanks,
> >>> Yunhui
> >>>
> > Thanks,
> > Yunhui
> >
Thanks,
Yunhui
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-08-19 15:45 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-18 3:43 [PATCH RFC 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
2025-06-18 3:43 ` [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
2025-07-17 13:04 ` Alexandre Ghiti
2025-07-17 13:05 ` Alexandre Ghiti
2025-07-18 6:40 ` [External] " yunhui cui
2025-07-18 14:22 ` Alexandre Ghiti
2025-07-18 14:33 ` yunhui cui
2025-08-07 14:54 ` Alexandre Ghiti
2025-08-19 12:45 ` yunhui cui
2025-06-21 14:33 ` [PATCH RFC 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yury Norov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).