* [PATCH 0/2] riscv: introduce percpu.h
@ 2025-08-19 13:50 Yunhui Cui
2025-08-19 13:50 ` [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
0 siblings, 2 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Cc: Yunhui Cui
Current per-CPU operations rely on generic code using raw_local_irq_save(),
which incurs significant overhead. This patch optimizes 32/64-bit paths with
RISC-V atomic instructions, reducing overhead.
RISC-V lacks lr/sc.b/h support; without ZABHA, emulating 8/16-bit operations
via lr/sc.w would require complex mask logic. However, data shows 8/16-bit
per-CPU operations are extremely rare (single-digit counts in boot and
hackbench tests). Thus, we let 8/16-bit ops fall back to the generic
implementation, avoiding unnecessary complexity. 32/64-bit ops use direct
atomic instructions for performance.
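To make the "complex mask logic" concrete, here is a minimal user-space sketch of a 16-bit add carried out inside an aligned 32-bit word. It is not code from this series: add16_in_word() is a hypothetical helper, and a C11 compare-exchange loop stands in for the lr.w/sc.w pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): add `val` to the 16-bit
 * lane selected by `lane` (0 = low half, 1 = high half) of one aligned
 * 32-bit word. The compare-exchange loop stands in for lr.w/sc.w.
 */
static void add16_in_word(_Atomic uint32_t *word, unsigned int lane,
			  uint16_t val)
{
	unsigned int shift;
	uint32_t mask, old, updated;

	assert(lane < 2);
	shift = lane * 16;
	mask = (uint32_t)0xffff << shift;
	old = atomic_load_explicit(word, memory_order_relaxed);

	do {
		/* Extract the lane, add with 16-bit wraparound, splice back. */
		uint16_t field = (uint16_t)((old & mask) >> shift);

		field = (uint16_t)(field + val);
		updated = (old & ~mask) | ((uint32_t)field << shift);
	} while (!atomic_compare_exchange_weak_explicit(word, &old, updated,
							memory_order_relaxed,
							memory_order_relaxed));
}
```

Every 8/16-bit operation would need the same extract, modify and splice steps, which is the extra complexity the fallback avoids.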
Yunhui Cui (2):
riscv: remove irqflags.h inclusion in asm/bitops.h
riscv: introduce percpu.h into include/asm
arch/riscv/include/asm/bitops.h | 1 -
arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
2 files changed, 138 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/percpu.h
--
2.39.5
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h
2025-08-19 13:50 [PATCH 0/2] riscv: introduce percpu.h Yunhui Cui
@ 2025-08-19 13:50 ` Yunhui Cui
2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
1 sibling, 0 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Cc: Yunhui Cui
arch/riscv/include/asm/bitops.h does not functionally require
<linux/irqflags.h>. In addition, once arch/riscv/include/asm/percpu.h is
added, keeping that include causes a circular inclusion:
kernel/bounds.c
->include/linux/log2.h
->include/linux/bitops.h
->arch/riscv/include/asm/bitops.h
->include/linux/irqflags.h
->include/linux/find.h
->return val ? __ffs(val) : size;
->arch/riscv/include/asm/bitops.h
The compilation log is as follows:
CC kernel/bounds.s
In file included from ./include/linux/bitmap.h:11,
from ./include/linux/cpumask.h:12,
from ./arch/riscv/include/asm/processor.h:55,
from ./arch/riscv/include/asm/thread_info.h:42,
from ./include/linux/thread_info.h:60,
from ./include/asm-generic/preempt.h:5,
from ./arch/riscv/include/generated/asm/preempt.h:1,
from ./include/linux/preempt.h:79,
from ./arch/riscv/include/asm/percpu.h:8,
from ./include/linux/irqflags.h:19,
from ./arch/riscv/include/asm/bitops.h:14,
from ./include/linux/bitops.h:68,
from ./include/linux/log2.h:12,
from kernel/bounds.c:13:
./include/linux/find.h: In function 'find_next_bit':
./include/linux/find.h:66:30: error: implicit declaration of function '__ffs' [-Wimplicit-function-declaration]
66 | return val ? __ffs(val) : size;
| ^~~~~
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/bitops.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
index d59310f74c2ba..d9837b3cf05fe 100644
--- a/arch/riscv/include/asm/bitops.h
+++ b/arch/riscv/include/asm/bitops.h
@@ -11,7 +11,6 @@
#endif /* _LINUX_BITOPS_H */
#include <linux/compiler.h>
-#include <linux/irqflags.h>
#include <asm/barrier.h>
#include <asm/bitsperlong.h>
--
2.39.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] riscv: introduce percpu.h into include/asm
2025-08-19 13:50 [PATCH 0/2] riscv: introduce percpu.h Yunhui Cui
2025-08-19 13:50 ` [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
@ 2025-08-19 13:50 ` Yunhui Cui
2025-08-20 6:44 ` kernel test robot
` (2 more replies)
1 sibling, 3 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
linux-kernel, dennis, tj, cl, linux-mm
Cc: Yunhui Cui
Current per-CPU operations rely on the generic implementations, where
raw_local_irq_save() introduces substantial overhead. This patch instead
uses atomic (AMO) instructions with preemption disabled around them.
Since RISC-V does not support lr/sc.b/h, when ZABHA is not available
8/16-bit operations would have to use lr/sc.w, which requires additional
mask operations. In practice, 8/16-bit per-CPU operations are very rare.
The counts during system startup are as follows:
Reads: 8-bit: 3, 16-bit: 3, 32-bit: 1531, 64-bit: 471
Writes: 8-bit: 4, 16-bit: 3, 32-bit: 32, 64-bit: 238
Adds: 8-bit: 3, 16-bit: 3, 32-bit: 31858, 64-bit: 7656
Add-Returns: 8-bit: 0, 16-bit: 0, 32-bit: 0, 64-bit: 2
ANDs: 8-bit: 0, 16-bit: 0, 32-bit: 0, 64-bit: 0
ANDNOTs: 8-bit: 0, 16-bit: 0, 32-bit: 0, 64-bit: 0
ORs: 8-bit: 0, 16-bit: 0, 32-bit: 70, 64-bit: 0
hackbench -l 1000:
Reads: 8-bit: 3, 16-bit: 3, 32-bit: 1531, 64-bit: 2522158
Writes: 8-bit: 4, 16-bit: 3, 32-bit: 34, 64-bit: 2521522
Adds: 8-bit: 3, 16-bit: 3, 32-bit: 47771, 64-bit: 19911
Add-Returns: 8-bit: 0, 16-bit: 0, 32-bit: 0, 64-bit: 2
ANDs: 8-bit: 0, 16-bit: 0, 32-bit: 0, 64-bit: 0
ANDNOTs: 8-bit: 0, 16-bit: 0, 32-bit: 0, 64-bit: 0
ORs: 8-bit: 0, 16-bit: 0, 32-bit: 70, 64-bit: 0
Based on this, 8-bit/16-bit per-CPU operations can directly fall back to
the generic implementation.
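As a rough illustration of the size-based split, the sketch below dispatches on sizeof() in the style of __pcpu_size_call(). All demo_* names are hypothetical, the hit counters exist only so the path taken can be observed, and the GCC/Clang __atomic_fetch_add() builtin stands in for an AMO instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative counters recording which path each call took. */
static int demo_generic_hits, demo_amo_hits;

/*
 * Stand-in for the generic path: a plain read-modify-write, which the
 * kernel would bracket with raw_local_irq_save()/raw_local_irq_restore().
 */
#define demo_add_generic(var, val) \
	do { demo_generic_hits++; (var) += (val); } while (0)

/* Stand-in for the AMO path: one atomic read-modify-write. */
#define demo_add_amo(var, val) \
	do { \
		demo_amo_hits++; \
		__atomic_fetch_add(&(var), (val), __ATOMIC_RELAXED); \
	} while (0)

/* Dispatch on operand size: 1/2 bytes fall back, 4/8 bytes use the AMO. */
#define demo_this_cpu_add(var, val) \
	do { \
		switch (sizeof(var)) { \
		case 1: case 2: demo_add_generic(var, val); break; \
		case 4: case 8: demo_add_amo(var, val); break; \
		default: assert(!"unsupported size"); \
		} \
	} while (0)
```

Because sizeof() is a compile-time constant, the compiler can drop the branch that is not taken.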
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
1 file changed, 138 insertions(+)
create mode 100644 arch/riscv/include/asm/percpu.h
diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 0000000000000..5a1fdb37a8056
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+#include <linux/preempt.h>
+
+#define PERCPU_RW_OPS(sz) \
+static inline unsigned long __percpu_read_##sz(void *ptr) \
+{ \
+ return READ_ONCE(*(u##sz *)ptr); \
+} \
+ \
+static inline void __percpu_write_##sz(void *ptr, unsigned long val) \
+{ \
+ WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
+}
+
+#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
+static inline void \
+__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
+{ \
+ asm volatile ( \
+ "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
+ : [ptr] "+A" (*(u##sz *)ptr) \
+ : [val] "r" ((u##sz)(val)) \
+ : "memory"); \
+}
+
+#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
+static inline u##sz \
+__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val) \
+{ \
+ register u##sz ret; \
+ \
+ asm volatile ( \
+ "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
+ : [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
+ : [val] "r" ((u##sz)(val)) \
+ : "memory"); \
+ \
+ return ret + val; \
+}
+
+#define PERCPU_OP(name, amo_insn) \
+ __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
+ __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn)
+
+#define PERCPU_RET_OP(name, amo_insn) \
+ __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
+ __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
+
+PERCPU_RW_OPS(8)
+PERCPU_RW_OPS(16)
+PERCPU_RW_OPS(32)
+PERCPU_RW_OPS(64)
+
+PERCPU_OP(add, add)
+PERCPU_OP(andnot, and)
+PERCPU_OP(or, or)
+PERCPU_RET_OP(add, add)
+
+#undef PERCPU_RW_OPS
+#undef __PERCPU_AMO_OP_CASE
+#undef __PERCPU_AMO_RET_OP_CASE
+#undef PERCPU_OP
+#undef PERCPU_RET_OP
+
+#define _pcp_protect(op, pcp, ...) \
+({ \
+ preempt_disable_notrace(); \
+ op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
+ preempt_enable_notrace(); \
+})
+
+#define _pcp_protect_return(op, pcp, args...) \
+({ \
+ typeof(pcp) __retval; \
+ preempt_disable_notrace(); \
+ __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
+ preempt_enable_notrace(); \
+ __retval; \
+})
+
+#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
+#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
+#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
+#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
+
+#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
+#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
+#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
+#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
+
+#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
+#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
+
+#define this_cpu_add_return_4(pcp, val) \
+_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
+
+#define this_cpu_add_return_8(pcp, val) \
+_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
+
+#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_andnot_amo_case_32, pcp, val)
+#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_andnot_amo_case_64, pcp, val)
+
+#define this_cpu_or_4(pcp, val) _pcp_protect(__percpu_or_amo_case_32, pcp, val)
+#define this_cpu_or_8(pcp, val) _pcp_protect(__percpu_or_amo_case_64, pcp, val)
+
+#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+
+#define this_cpu_cmpxchg_1(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_2(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_4(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_8(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+
+#define this_cpu_cmpxchg64(pcp, o, n) this_cpu_cmpxchg_8(pcp, o, n)
+
+#define this_cpu_cmpxchg128(pcp, o, n) \
+({ \
+ typedef typeof(pcp) pcp_op_T__; \
+ u128 old__, new__, ret__; \
+ pcp_op_T__ *ptr__; \
+ old__ = o; \
+ new__ = n; \
+ preempt_disable_notrace(); \
+ ptr__ = raw_cpu_ptr(&(pcp)); \
+ ret__ = cmpxchg128_local(ptr__, old__, new__); \
+ preempt_enable_notrace(); \
+ ret__; \
+})
+
+#include <asm-generic/percpu.h>
+
+#endif /* __ASM_PERCPU_H */
--
2.39.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2025-08-20 6:44 ` kernel test robot
2025-08-20 17:18 ` kernel test robot
2025-08-20 23:26 ` Christoph Lameter (Ampere)
2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-08-20 6:44 UTC (permalink / raw)
To: Yunhui Cui, yury.norov, linux, paul.walmsley, palmer, aou, alex,
linux-riscv, linux-kernel, dennis, tj, cl, linux-mm
Cc: llvm, oe-kbuild-all, Yunhui Cui
Hi Yunhui,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on dennis-percpu/for-next v6.17-rc2 next-20250819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20250819-215256
base: linus/master
patch link: https://lore.kernel.org/r/20250819135007.85646-3-cuiyunhui%40bytedance.com
patch subject: [PATCH 2/2] riscv: introduce percpu.h into include/asm
config: riscv-randconfig-002-20250820 (https://download.01.org/0day-ci/archive/20250820/202508201452.ciEgfhNO-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 93d24b6b7b148c47a2fa228a4ef31524fa1d9f3f)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250820/202508201452.ciEgfhNO-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508201452.ciEgfhNO-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from arch/riscv/kernel/asm-offsets.c:8:
In file included from include/linux/mm.h:7:
In file included from include/linux/gfp.h:7:
In file included from include/linux/mmzone.h:22:
In file included from include/linux/mm_types.h:19:
In file included from include/linux/workqueue.h:9:
>> include/linux/alloc_tag.h:219:2: warning: implicit conversion from 'typeof (tag->counters->calls)' (aka 'unsigned long long') to 'unsigned long' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
219 | this_cpu_dec(tag->counters->calls);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/percpu-defs.h:512:28: note: expanded from macro 'this_cpu_dec'
512 | #define this_cpu_dec(pcp) this_cpu_sub(pcp, 1)
| ^~~~~~~~~~~~~~~~~~~~
include/linux/percpu-defs.h:510:51: note: expanded from macro 'this_cpu_sub'
510 | #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
include/linux/percpu-defs.h:501:70: note: expanded from macro 'this_cpu_add'
501 | #define this_cpu_add(pcp, val) __pcpu_size_call(this_cpu_add_, pcp, val)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
include/linux/percpu-defs.h:372:29: note: expanded from macro '__pcpu_size_call'
372 | case 8: stem##8(variable, __VA_ARGS__);break; \
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
arch/riscv/include/asm/percpu.h:96:78: note: expanded from macro 'this_cpu_add_8'
96 | #define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
arch/riscv/include/asm/percpu.h:72:26: note: expanded from macro '_pcp_protect'
72 | op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
| ~~ ^~~~~~~~~~~
1 warning generated.
--
In file included from arch/riscv/errata/sifive/errata.c:7:
In file included from include/linux/memory.h:19:
In file included from include/linux/node.h:18:
In file included from include/linux/device.h:16:
In file included from include/linux/energy_model.h:7:
In file included from include/linux/kobject.h:20:
In file included from include/linux/sysfs.h:16:
In file included from include/linux/kernfs.h:12:
In file included from include/linux/idr.h:15:
In file included from include/linux/radix-tree.h:16:
In file included from include/linux/percpu.h:5:
>> include/linux/alloc_tag.h:219:2: warning: implicit conversion from 'typeof (tag->counters->calls)' (aka 'unsigned long long') to 'unsigned long' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
219 | this_cpu_dec(tag->counters->calls);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/percpu-defs.h:512:28: note: expanded from macro 'this_cpu_dec'
512 | #define this_cpu_dec(pcp) this_cpu_sub(pcp, 1)
| ^~~~~~~~~~~~~~~~~~~~
include/linux/percpu-defs.h:510:51: note: expanded from macro 'this_cpu_sub'
510 | #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
include/linux/percpu-defs.h:501:70: note: expanded from macro 'this_cpu_add'
501 | #define this_cpu_add(pcp, val) __pcpu_size_call(this_cpu_add_, pcp, val)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
include/linux/percpu-defs.h:372:29: note: expanded from macro '__pcpu_size_call'
372 | case 8: stem##8(variable, __VA_ARGS__);break; \
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
arch/riscv/include/asm/percpu.h:96:78: note: expanded from macro 'this_cpu_add_8'
96 | #define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
arch/riscv/include/asm/percpu.h:72:26: note: expanded from macro '_pcp_protect'
72 | op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
| ~~ ^~~~~~~~~~~
arch/riscv/errata/sifive/errata.c:29:14: warning: result of comparison of constant 9223372036854775815 with expression of type 'unsigned long' is always true [-Wtautological-constant-out-of-range-compare]
29 | if (arch_id != 0x8000000000000007 ||
| ~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~
arch/riscv/errata/sifive/errata.c:42:14: warning: result of comparison of constant 9223372036854775815 with expression of type 'unsigned long' is always true [-Wtautological-constant-out-of-range-compare]
42 | if (arch_id != 0x8000000000000007 && arch_id != 0x1)
| ~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~
3 warnings generated.
vim +219 include/linux/alloc_tag.h
22d407b164ff79 Suren Baghdasaryan 2024-03-21 202
22d407b164ff79 Suren Baghdasaryan 2024-03-21 203 static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
22d407b164ff79 Suren Baghdasaryan 2024-03-21 204 {
22d407b164ff79 Suren Baghdasaryan 2024-03-21 205 struct alloc_tag *tag;
22d407b164ff79 Suren Baghdasaryan 2024-03-21 206
22d407b164ff79 Suren Baghdasaryan 2024-03-21 207 alloc_tag_sub_check(ref);
22d407b164ff79 Suren Baghdasaryan 2024-03-21 208 if (!ref || !ref->ct)
22d407b164ff79 Suren Baghdasaryan 2024-03-21 209 return;
22d407b164ff79 Suren Baghdasaryan 2024-03-21 210
239d6c96d86f8a Suren Baghdasaryan 2024-03-21 211 if (is_codetag_empty(ref)) {
239d6c96d86f8a Suren Baghdasaryan 2024-03-21 212 ref->ct = NULL;
239d6c96d86f8a Suren Baghdasaryan 2024-03-21 213 return;
239d6c96d86f8a Suren Baghdasaryan 2024-03-21 214 }
239d6c96d86f8a Suren Baghdasaryan 2024-03-21 215
22d407b164ff79 Suren Baghdasaryan 2024-03-21 216 tag = ct_to_alloc_tag(ref->ct);
22d407b164ff79 Suren Baghdasaryan 2024-03-21 217
22d407b164ff79 Suren Baghdasaryan 2024-03-21 218 this_cpu_sub(tag->counters->bytes, bytes);
22d407b164ff79 Suren Baghdasaryan 2024-03-21 @219 this_cpu_dec(tag->counters->calls);
22d407b164ff79 Suren Baghdasaryan 2024-03-21 220
22d407b164ff79 Suren Baghdasaryan 2024-03-21 221 ref->ct = NULL;
22d407b164ff79 Suren Baghdasaryan 2024-03-21 222 }
22d407b164ff79 Suren Baghdasaryan 2024-03-21 223
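The effect behind this warning can be reproduced outside the kernel. The sketch below is not robot output: ulong32_t is an assumed stand-in for a 32-bit build's unsigned long, and the helper names are hypothetical. It shows how passing -(u64)1 through a 32-bit parameter turns a decrement into an addition of 4294967295.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for `unsigned long` on a 32-bit build (assumption of this sketch). */
typedef uint32_t ulong32_t;

/* Same shape as the patch's helper: the addend passes through a narrow type. */
static void add64_narrow(uint64_t *counter, ulong32_t val)
{
	*counter += val;
}

/* Same operation with a parameter as wide as the counter. */
static void add64_wide(uint64_t *counter, uint64_t val)
{
	*counter += val;
}

/* this_cpu_dec() expands to an add of -(typeof(pcp))1. */
static void demo_truncation(void)
{
	uint64_t narrow = 5, wide = 5;

	add64_narrow(&narrow, (ulong32_t)-(uint64_t)1);	/* adds 0xffffffff */
	add64_wide(&wide, -(uint64_t)1);		/* wraps to 5 - 1 */

	assert(narrow == 0x100000004ull);
	assert(wide == 4);
}
```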
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
2025-08-20 6:44 ` kernel test robot
@ 2025-08-20 17:18 ` kernel test robot
2025-08-20 23:26 ` Christoph Lameter (Ampere)
2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-08-20 17:18 UTC (permalink / raw)
To: Yunhui Cui, yury.norov, linux, paul.walmsley, palmer, aou, alex,
linux-riscv, linux-kernel, dennis, tj, cl, linux-mm
Cc: oe-kbuild-all, Yunhui Cui
Hi Yunhui,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on dennis-percpu/for-next v6.17-rc2 next-20250820]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20250819-215256
base: linus/master
patch link: https://lore.kernel.org/r/20250819135007.85646-3-cuiyunhui%40bytedance.com
patch subject: [PATCH 2/2] riscv: introduce percpu.h into include/asm
config: riscv-allnoconfig (https://download.01.org/0day-ci/archive/20250821/202508210101.WySkXlSZ-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250821/202508210101.WySkXlSZ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508210101.WySkXlSZ-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/linux/atomic.h:80,
from include/linux/cpumask.h:14,
from include/linux/smp.h:13,
from include/linux/lockdep.h:14,
from include/linux/spinlock.h:63,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:7,
from include/linux/mm.h:7,
from mm/slub.c:13:
mm/slub.c: In function '__update_cpu_freelist_fast':
>> include/linux/atomic/atomic-arch-fallback.h:414:30: error: implicit declaration of function 'arch_cmpxchg128_local'; did you mean 'arch_cmpxchg64_local'? [-Wimplicit-function-declaration]
414 | #define raw_cmpxchg128_local arch_cmpxchg128_local
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/atomic/atomic-instrumented.h:5005:9: note: in expansion of macro 'raw_cmpxchg128_local'
5005 | raw_cmpxchg128_local(__ai_ptr, __VA_ARGS__); \
| ^~~~~~~~~~~~~~~~~~~~
arch/riscv/include/asm/percpu.h:131:17: note: in expansion of macro 'cmpxchg128_local'
131 | ret__ = cmpxchg128_local(ptr__, old__, new__); \
| ^~~~~~~~~~~~~~~~
include/asm-generic/percpu.h:108:17: note: in expansion of macro 'this_cpu_cmpxchg128'
108 | __val = _cmpxchg(pcp, __old, nval); \
| ^~~~~~~~
include/asm-generic/percpu.h:527:9: note: in expansion of macro '__cpu_fallback_try_cmpxchg'
527 | __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg128)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
mm/slab.h:24:41: note: in expansion of macro 'this_cpu_try_cmpxchg128'
24 | #define this_cpu_try_cmpxchg_freelist this_cpu_try_cmpxchg128
| ^~~~~~~~~~~~~~~~~~~~~~~
mm/slub.c:3638:16: note: in expansion of macro 'this_cpu_try_cmpxchg_freelist'
3638 | return this_cpu_try_cmpxchg_freelist(s->cpu_slab->freelist_tid.full,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +414 include/linux/atomic/atomic-arch-fallback.h
9257959a6e5b4f Mark Rutland 2023-06-05 413
9257959a6e5b4f Mark Rutland 2023-06-05 @414 #define raw_cmpxchg128_local arch_cmpxchg128_local
e6ce9d741163af Uros Bizjak 2023-04-05 415
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
2025-08-20 6:44 ` kernel test robot
2025-08-20 17:18 ` kernel test robot
@ 2025-08-20 23:26 ` Christoph Lameter (Ampere)
2025-08-21 8:01 ` [External] " yunhui cui
2 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-08-20 23:26 UTC (permalink / raw)
To: Yunhui Cui
Cc: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
linux-kernel, dennis, tj, linux-mm
On Tue, 19 Aug 2025, Yunhui Cui wrote:
> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> +static inline void \
> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> +{ \
> + asm volatile ( \
> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> + : [ptr] "+A" (*(u##sz *)ptr) \
> + : [val] "r" ((u##sz)(val)) \
> + : "memory"); \
> +}
AMO creates a single instruction that performs the operation?
> +#define _pcp_protect(op, pcp, ...) \
> +({ \
> + preempt_disable_notrace(); \
> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> + preempt_enable_notrace(); \
> +})
Is "op" a single instruction? If so then preempt disable / enable would
not be needed if there is no other instruction created.
But raw_cpu_ptr performs a SHIFT_PERCPU_PTR which performs an addition.
So you need the disabling of preemption to protect the add.
Is there a way on RISC-V to embed the pointer arithmetic in the "AMO"
instruction? Or can you use relative addressing to a register that
contains the cpu offset? I believe RISC-V has a thread pointer?
If you can do this then a lot of preempt_enable/disable points can be
removed from the core kernel and the instruction may be as scalable as x86
which can do the per cpu operations with a single instruction.
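The window being described can be simulated deterministically in user space. This is only a sketch under stated assumptions: the demo_* names are hypothetical, "migration" is modelled by changing a variable, and a C11 atomic add stands in for the AMO.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_NR_CPUS 2

static _Atomic long demo_counter[DEMO_NR_CPUS];	/* one slot per simulated CPU */
static int demo_cur_cpu;			/* CPU the "task" is running on */

/* Step 1: the address computation (base plus per-CPU offset). */
static _Atomic long *demo_this_cpu_ptr(void)
{
	return &demo_counter[demo_cur_cpu];
}

/* Step 2: the single atomic instruction. */
static void demo_amo_add(_Atomic long *ptr, long val)
{
	atomic_fetch_add_explicit(ptr, val, memory_order_relaxed);
}

/* Preemption plus migration landing between step 1 and step 2. */
static void demo_unprotected_add(long val, int migrate_to)
{
	_Atomic long *ptr = demo_this_cpu_ptr();

	assert(migrate_to >= 0 && migrate_to < DEMO_NR_CPUS);
	demo_cur_cpu = migrate_to;	/* task resumes on another CPU */
	demo_amo_add(ptr, val);		/* still targets the old CPU's slot */
}
```

Starting on CPU 0 and migrating to CPU 1, the add lands in slot 0 while the task runs on CPU 1. Disabling preemption around both steps rules this out; folding the offset into the instruction would remove the window.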
> +
> +#define _pcp_protect_return(op, pcp, args...) \
> +({ \
> + typeof(pcp) __retval; \
> + preempt_disable_notrace(); \
> + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> + preempt_enable_notrace(); \
> + __retval; \
> +})
Same here.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [External] Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
2025-08-20 23:26 ` Christoph Lameter (Ampere)
@ 2025-08-21 8:01 ` yunhui cui
0 siblings, 0 replies; 7+ messages in thread
From: yunhui cui @ 2025-08-21 8:01 UTC (permalink / raw)
To: Christoph Lameter (Ampere)
Cc: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
linux-kernel, dennis, tj, linux-mm
Hi Christoph,
On Thu, Aug 21, 2025 at 7:39 AM Christoph Lameter (Ampere)
<cl@gentwo.org> wrote:
>
> On Tue, 19 Aug 2025, Yunhui Cui wrote:
>
> > +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> > +static inline void \
> > +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> > +{ \
> > + asm volatile ( \
> > + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> > + : [ptr] "+A" (*(u##sz *)ptr) \
> > + : [val] "r" ((u##sz)(val)) \
> > + : "memory"); \
> > +}
>
> AMO creates a single instruction that performs the operation?
>
> > +#define _pcp_protect(op, pcp, ...) \
> > +({ \
> > + preempt_disable_notrace(); \
> > + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> > + preempt_enable_notrace(); \
> > +})
>
> Is "op" a single instruction? If so then preempt disable / enable would
> not be needed if there is no other instruction created.
>
> But raw_cpu_ptr performs a SHIFT_PERCPU_PTR which performs an addition.
> So you need the disabling of preemption to protect the add.
>
> Is there a way on RISC-V to embed the pointer arithmetic in the "AMO"
> instruction? Or can you use relative addressing to a register that
> contains the cpu offset? I believe RISC-V has a thread pointer?
>
> If you can do this then a lot of preempt_enable/disable points can be
> removed from the core kernel and the instruction may be as scalable as x86
> which can do the per cpu operations with a single instruction.
Yes, thank you. While it would certainly be good to remove the preemption
disabling, RISC-V's amoadd.w/d instructions currently take only a plain
address held in a register; they cannot add a per-CPU offset register to it.
I previously submitted an attempt to use gp to store the percpu
offset, and we are also trying to push for an extension that uses a
register to store the percpu offset.
https://lore.kernel.org/all/CAEEQ3w=PsM5T+yMrEGdWZ2nm7m7SX3vzscLtWpOPVu1zpfm3YQ@mail.gmail.com/
https://lists.riscv.org/g/tech-privileged/topic/risc_v_tech_arch_review/113437553?page=2
>
> > +
> > +#define _pcp_protect_return(op, pcp, args...) \
> > +({ \
> > + typeof(pcp) __retval; \
> > + preempt_disable_notrace(); \
> > + __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> > + preempt_enable_notrace(); \
> > + __retval; \
> > +})
>
> Same here.
>
>
Thanks,
Yunhui
^ permalink raw reply [flat|nested] 7+ messages in thread