* [PATCH 0/2] riscv: introduce percpu.h
@ 2025-08-19 13:50 Yunhui Cui
  2025-08-19 13:50 ` [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
  0 siblings, 2 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
  To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, cl, linux-mm
  Cc: Yunhui Cui

Current per-CPU operations rely on generic code using
raw_local_irq_save(), which incurs significant overhead. This patch
optimizes 32/64-bit paths with RISC-V atomic instructions, reducing
overhead.

RISC-V lacks lr/sc.b/h support; without ZABHA, emulating 8/16-bit
operations via lr/sc.w would require complex mask logic. However, data
shows 8/16-bit per-CPU operations are extremely rare (single-digit
counts in boot and hackbench tests). Thus, we let 8/16-bit ops fall back
to the generic implementation, avoiding unnecessary complexity. 32/64-bit
ops use direct atomic instructions for performance.

Yunhui Cui (2):
  riscv: remove irqflags.h inclusion in asm/bitops.h
  riscv: introduce percpu.h into include/asm

 arch/riscv/include/asm/bitops.h |   1 -
 arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/percpu.h

-- 
2.39.5

^ permalink raw reply	[flat|nested] 7+ messages in thread
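For readers wondering what the "complex mask logic" in the cover letter refers to, here is a minimal user-space sketch. It is not taken from the series. It emulates an 8-bit add inside a 32-bit word with a word-sized compare-and-swap loop, using C11 atomics as a stand-in for an lr.w/sc.w sequence. The helper name emulated_add_u8() and the byte_idx parameter are hypothetical; a kernel version would derive the shift from the byte's address, which also depends on endianness.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: add 'val' to byte number
 * 'byte_idx' (0..3, counted from the least significant byte) of a 32-bit
 * word, using only word-sized atomic accesses. Returns the new byte value.
 */
static uint8_t emulated_add_u8(_Atomic uint32_t *word, unsigned int byte_idx,
			       uint8_t val)
{
	unsigned int shift = byte_idx * 8;
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t updated;

	do {
		/* Extract the byte, add, and let any carry past bit 7 fall off. */
		uint32_t byte = ((old & mask) >> shift) + val;

		/* Splice the result back without touching the neighbouring bytes. */
		updated = (old & ~mask) | ((byte << shift) & mask);
	} while (!atomic_compare_exchange_weak_explicit(word, &old, updated,
							memory_order_relaxed,
							memory_order_relaxed));

	return (uint8_t)((updated & mask) >> shift);
}
```

Calling emulated_add_u8(&w, 2, 3) on a word holding 0x11fe3344 leaves 0x11013344: the 0xfe byte wraps to 0x01 and the other three bytes are untouched. The extract, shift, mask and retry steps are the extra work that the series avoids by leaving 8/16-bit operations on the generic path.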
* [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h
  2025-08-19 13:50 [PATCH 0/2] riscv: introduce percpu.h Yunhui Cui
@ 2025-08-19 13:50 ` Yunhui Cui
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
  1 sibling, 0 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
  To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, cl, linux-mm
  Cc: Yunhui Cui

The arch/riscv/include/asm/bitops.h does not functionally require
including /linux/irqflags.h. Additionally, adding
arch/riscv/include/asm/percpu.h causes a circular inclusion:

kernel/bounds.c
->include/linux/log2.h
->include/linux/bitops.h
->arch/riscv/include/asm/bitops.h
->include/linux/irqflags.h
->include/linux/find.h
->return val ? __ffs(val) : size;
->arch/riscv/include/asm/bitops.h

The compilation log is as follows:

CC      kernel/bounds.s
In file included from ./include/linux/bitmap.h:11,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/riscv/include/asm/processor.h:55,
                 from ./arch/riscv/include/asm/thread_info.h:42,
                 from ./include/linux/thread_info.h:60,
                 from ./include/asm-generic/preempt.h:5,
                 from ./arch/riscv/include/generated/asm/preempt.h:1,
                 from ./include/linux/preempt.h:79,
                 from ./arch/riscv/include/asm/percpu.h:8,
                 from ./include/linux/irqflags.h:19,
                 from ./arch/riscv/include/asm/bitops.h:14,
                 from ./include/linux/bitops.h:68,
                 from ./include/linux/log2.h:12,
                 from kernel/bounds.c:13:
./include/linux/find.h: In function 'find_next_bit':
./include/linux/find.h:66:30: error: implicit declaration of function '__ffs' [-Wimplicit-function-declaration]
   66 |                 return val ? __ffs(val) : size;
      |                              ^~~~~

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/bitops.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
index d59310f74c2ba..d9837b3cf05fe 100644
--- a/arch/riscv/include/asm/bitops.h
+++ b/arch/riscv/include/asm/bitops.h
@@ -11,7 +11,6 @@
 #endif /* _LINUX_BITOPS_H */
 
 #include <linux/compiler.h>
-#include <linux/irqflags.h>
 #include <asm/barrier.h>
 #include <asm/bitsperlong.h>
 
-- 
2.39.5
* [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-19 13:50 [PATCH 0/2] riscv: introduce percpu.h Yunhui Cui
  2025-08-19 13:50 ` [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
@ 2025-08-19 13:50 ` Yunhui Cui
  2025-08-20  6:44   ` kernel test robot
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
  To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, cl, linux-mm
  Cc: Yunhui Cui

Current percpu operations rely on generic implementations, where
raw_local_irq_save() introduces substantial overhead. Optimization
is achieved through atomic operations and preemption disabling.

Since RISC-V does not support lr/sc.b/h, when ZABHA is not supported,
we need to use lr/sc.w instead, which requires some additional mask
operations. In fact, 8/16-bit per-CPU operations are very few. The
counts during system startup are as follows:

Reads:       8-bit: 3, 16-bit: 3, 32-bit: 1531,  64-bit: 471
Writes:      8-bit: 4, 16-bit: 3, 32-bit: 32,    64-bit: 238
Adds:        8-bit: 3, 16-bit: 3, 32-bit: 31858, 64-bit: 7656
Add-Returns: 8-bit: 0, 16-bit: 0, 32-bit: 0,     64-bit: 2
ANDs:        8-bit: 0, 16-bit: 0, 32-bit: 0,     64-bit: 0
ANDNOTs:     8-bit: 0, 16-bit: 0, 32-bit: 0,     64-bit: 0
ORs:         8-bit: 0, 16-bit: 0, 32-bit: 70,    64-bit: 0

hackbench -l 1000:

Reads:       8-bit: 3, 16-bit: 3, 32-bit: 1531,  64-bit: 2522158
Writes:      8-bit: 4, 16-bit: 3, 32-bit: 34,    64-bit: 2521522
Adds:        8-bit: 3, 16-bit: 3, 32-bit: 47771, 64-bit: 19911
Add-Returns: 8-bit: 0, 16-bit: 0, 32-bit: 0,     64-bit: 2
ANDs:        8-bit: 0, 16-bit: 0, 32-bit: 0,     64-bit: 0
ANDNOTs:     8-bit: 0, 16-bit: 0, 32-bit: 0,     64-bit: 0
ORs:         8-bit: 0, 16-bit: 0, 32-bit: 70,    64-bit: 0

Based on this, 8bit/16bit per-CPU operations can directly fall back
to the generic implementation.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 arch/riscv/include/asm/percpu.h

diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 0000000000000..5a1fdb37a8056
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+#include <linux/preempt.h>
+
+#define PERCPU_RW_OPS(sz) \
+static inline unsigned long __percpu_read_##sz(void *ptr) \
+{ \
+	return READ_ONCE(*(u##sz *)ptr); \
+} \
+ \
+static inline void __percpu_write_##sz(void *ptr, unsigned long val) \
+{ \
+	WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \
+}
+
+#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
+static inline void \
+__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
+{ \
+	asm volatile ( \
+		"amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
+		: [ptr] "+A" (*(u##sz *)ptr) \
+		: [val] "r" ((u##sz)(val)) \
+		: "memory"); \
+}
+
+#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \
+static inline u##sz \
+__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val) \
+{ \
+	register u##sz ret; \
+ \
+	asm volatile ( \
+		"amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \
+		: [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret) \
+		: [val] "r" ((u##sz)(val)) \
+		: "memory"); \
+ \
+	return ret + val; \
+}
+
+#define PERCPU_OP(name, amo_insn) \
+	__PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \
+	__PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn)
+
+#define PERCPU_RET_OP(name, amo_insn) \
+	__PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \
+	__PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
+
+PERCPU_RW_OPS(8)
+PERCPU_RW_OPS(16)
+PERCPU_RW_OPS(32)
+PERCPU_RW_OPS(64)
+
+PERCPU_OP(add, add)
+PERCPU_OP(andnot, and)
+PERCPU_OP(or, or)
+PERCPU_RET_OP(add, add)
+
+#undef PERCPU_RW_OPS
+#undef __PERCPU_AMO_OP_CASE
+#undef __PERCPU_AMO_RET_OP_CASE
+#undef PERCPU_OP
+#undef PERCPU_RET_OP
+
+#define _pcp_protect(op, pcp, ...) \
+({ \
+	preempt_disable_notrace(); \
+	op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
+	preempt_enable_notrace(); \
+})
+
+#define _pcp_protect_return(op, pcp, args...) \
+({ \
+	typeof(pcp) __retval; \
+	preempt_disable_notrace(); \
+	__retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
+	preempt_enable_notrace(); \
+	__retval; \
+})
+
+#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, pcp)
+#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, pcp)
+#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, pcp)
+#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, pcp)
+
+#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
+#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
+#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
+#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
+
+#define this_cpu_add_4(pcp, val) _pcp_protect(__percpu_add_amo_case_32, pcp, val)
+#define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
+
+#define this_cpu_add_return_4(pcp, val) \
+_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
+
+#define this_cpu_add_return_8(pcp, val) \
+_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
+
+#define this_cpu_and_4(pcp, val) _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val)
+#define this_cpu_and_8(pcp, val) _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val)
+
+#define this_cpu_or_4(pcp, val) _pcp_protect(__percpu_or_amo_case_32, pcp, val)
+#define this_cpu_or_8(pcp, val) _pcp_protect(__percpu_or_amo_case_64, pcp, val)
+
+#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed, pcp, val)
+
+#define this_cpu_cmpxchg_1(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_2(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_4(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_8(pcp, o, n) _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+
+#define this_cpu_cmpxchg64(pcp, o, n) this_cpu_cmpxchg_8(pcp, o, n)
+
+#define this_cpu_cmpxchg128(pcp, o, n) \
+({ \
+	typedef typeof(pcp) pcp_op_T__; \
+	u128 old__, new__, ret__; \
+	pcp_op_T__ *ptr__; \
+	old__ = o; \
+	new__ = n; \
+	preempt_disable_notrace(); \
+	ptr__ = raw_cpu_ptr(&(pcp)); \
+	ret__ = cmpxchg128_local(ptr__, old__, new__); \
+	preempt_enable_notrace(); \
+	ret__; \
+})
+
+#include <asm-generic/percpu.h>
+
+#endif /* __ASM_PERCPU_H */
-- 
2.39.5
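One detail in the hunk above may deserve a second look. this_cpu_and_4/8 pass ~val to helpers generated by PERCPU_OP(andnot, and), that is, to an amoand. If amoand stores (memory AND operand), as the RISC-V "A" extension describes it, the net effect would be pcp &= ~val rather than pcp &= val. Passing ~val is the right idiom only when the underlying instruction clears the bits set in its operand. The user-space sketch below illustrates the difference with C11 atomic_fetch_and standing in for amoand.w. All function names are hypothetical, and and_as_posted() is only this reading of the macros, not code from the series.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for amoand.w: atomically stores (*p & operand). */
static void amoand_w(_Atomic uint32_t *p, uint32_t operand)
{
	atomic_fetch_and_explicit(p, operand, memory_order_relaxed);
}

/* Reading of the posted this_cpu_and_4(), minus the preemption handling. */
static void and_as_posted(_Atomic uint32_t *p, uint32_t val)
{
	amoand_w(p, ~val);	/* ends up clearing the bits set in val */
}

/* What a plain "pcp &= val" would need with an AND-type AMO. */
static void and_expected(_Atomic uint32_t *p, uint32_t val)
{
	amoand_w(p, val);	/* keeps only the bits set in val */
}
```

Starting from 0xff with val = 0x0f, and_as_posted() leaves 0xf0 while and_expected() leaves 0x0f. The counters in the commit message show no AND operations during boot or hackbench, which may be why this would not show up in testing.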
* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2025-08-20  6:44   ` kernel test robot
  2025-08-20 17:18   ` kernel test robot
  2025-08-20 23:26   ` Christoph Lameter (Ampere)
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-08-20  6:44 UTC (permalink / raw)
  To: Yunhui Cui, yury.norov, linux, paul.walmsley, palmer, aou, alex,
	linux-riscv, linux-kernel, dennis, tj, cl, linux-mm
  Cc: llvm, oe-kbuild-all, Yunhui Cui

Hi Yunhui,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on dennis-percpu/for-next v6.17-rc2 next-20250819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20250819-215256
base:   linus/master
patch link:    https://lore.kernel.org/r/20250819135007.85646-3-cuiyunhui%40bytedance.com
patch subject: [PATCH 2/2] riscv: introduce percpu.h into include/asm
config: riscv-randconfig-002-20250820 (https://download.01.org/0day-ci/archive/20250820/202508201452.ciEgfhNO-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 93d24b6b7b148c47a2fa228a4ef31524fa1d9f3f)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250820/202508201452.ciEgfhNO-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508201452.ciEgfhNO-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/riscv/kernel/asm-offsets.c:8:
   In file included from include/linux/mm.h:7:
   In file included from include/linux/gfp.h:7:
   In file included from include/linux/mmzone.h:22:
   In file included from include/linux/mm_types.h:19:
   In file included from include/linux/workqueue.h:9:
>> include/linux/alloc_tag.h:219:2: warning: implicit conversion from 'typeof (tag->counters->calls)' (aka 'unsigned long long') to 'unsigned long' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
     219 |         this_cpu_dec(tag->counters->calls);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:512:28: note: expanded from macro 'this_cpu_dec'
     512 | #define this_cpu_dec(pcp) this_cpu_sub(pcp, 1)
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:510:51: note: expanded from macro 'this_cpu_sub'
     510 | #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))
         |                                ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:501:70: note: expanded from macro 'this_cpu_add'
     501 | #define this_cpu_add(pcp, val) __pcpu_size_call(this_cpu_add_, pcp, val)
         |                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   include/linux/percpu-defs.h:372:29: note: expanded from macro '__pcpu_size_call'
     372 |         case 8: stem##8(variable, __VA_ARGS__);break; \
         |                 ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
   arch/riscv/include/asm/percpu.h:96:78: note: expanded from macro 'this_cpu_add_8'
      96 | #define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
         |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   arch/riscv/include/asm/percpu.h:72:26: note: expanded from macro '_pcp_protect'
      72 |         op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
         |         ~~                      ^~~~~~~~~~~
   1 warning generated.
--
   In file included from arch/riscv/errata/sifive/errata.c:7:
   In file included from include/linux/memory.h:19:
   In file included from include/linux/node.h:18:
   In file included from include/linux/device.h:16:
   In file included from include/linux/energy_model.h:7:
   In file included from include/linux/kobject.h:20:
   In file included from include/linux/sysfs.h:16:
   In file included from include/linux/kernfs.h:12:
   In file included from include/linux/idr.h:15:
   In file included from include/linux/radix-tree.h:16:
   In file included from include/linux/percpu.h:5:
>> include/linux/alloc_tag.h:219:2: warning: implicit conversion from 'typeof (tag->counters->calls)' (aka 'unsigned long long') to 'unsigned long' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
     219 |         this_cpu_dec(tag->counters->calls);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:512:28: note: expanded from macro 'this_cpu_dec'
     512 | #define this_cpu_dec(pcp) this_cpu_sub(pcp, 1)
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:510:51: note: expanded from macro 'this_cpu_sub'
     510 | #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))
         |                                ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:501:70: note: expanded from macro 'this_cpu_add'
     501 | #define this_cpu_add(pcp, val) __pcpu_size_call(this_cpu_add_, pcp, val)
         |                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   include/linux/percpu-defs.h:372:29: note: expanded from macro '__pcpu_size_call'
     372 |         case 8: stem##8(variable, __VA_ARGS__);break; \
         |                 ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
   arch/riscv/include/asm/percpu.h:96:78: note: expanded from macro 'this_cpu_add_8'
      96 | #define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
         |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   arch/riscv/include/asm/percpu.h:72:26: note: expanded from macro '_pcp_protect'
      72 |         op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
         |         ~~                      ^~~~~~~~~~~
   arch/riscv/errata/sifive/errata.c:29:14: warning: result of comparison of constant 9223372036854775815 with expression of type 'unsigned long' is always true [-Wtautological-constant-out-of-range-compare]
      29 |         if (arch_id != 0x8000000000000007 ||
         |             ~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~
   arch/riscv/errata/sifive/errata.c:42:14: warning: result of comparison of constant 9223372036854775815 with expression of type 'unsigned long' is always true [-Wtautological-constant-out-of-range-compare]
      42 |         if (arch_id != 0x8000000000000007 && arch_id != 0x1)
         |             ~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~
   3 warnings generated.
--
   In file included from arch/riscv/kernel/asm-offsets.c:8:
   In file included from include/linux/mm.h:7:
   In file included from include/linux/gfp.h:7:
   In file included from include/linux/mmzone.h:22:
   In file included from include/linux/mm_types.h:19:
   In file included from include/linux/workqueue.h:9:
>> include/linux/alloc_tag.h:219:2: warning: implicit conversion from 'typeof (tag->counters->calls)' (aka 'unsigned long long') to 'unsigned long' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
     219 |         this_cpu_dec(tag->counters->calls);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:512:28: note: expanded from macro 'this_cpu_dec'
     512 | #define this_cpu_dec(pcp) this_cpu_sub(pcp, 1)
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:510:51: note: expanded from macro 'this_cpu_sub'
     510 | #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))
         |                                ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:501:70: note: expanded from macro 'this_cpu_add'
     501 | #define this_cpu_add(pcp, val) __pcpu_size_call(this_cpu_add_, pcp, val)
         |                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   include/linux/percpu-defs.h:372:29: note: expanded from macro '__pcpu_size_call'
     372 |         case 8: stem##8(variable, __VA_ARGS__);break; \
         |                 ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
   arch/riscv/include/asm/percpu.h:96:78: note: expanded from macro 'this_cpu_add_8'
      96 | #define this_cpu_add_8(pcp, val) _pcp_protect(__percpu_add_amo_case_64, pcp, val)
         |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   arch/riscv/include/asm/percpu.h:72:26: note: expanded from macro '_pcp_protect'
      72 |         op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
         |         ~~                      ^~~~~~~~~~~
   1 warning generated.


vim +219 include/linux/alloc_tag.h

22d407b164ff79 Suren Baghdasaryan 2024-03-21  202  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  203  static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
22d407b164ff79 Suren Baghdasaryan 2024-03-21  204  {
22d407b164ff79 Suren Baghdasaryan 2024-03-21  205  	struct alloc_tag *tag;
22d407b164ff79 Suren Baghdasaryan 2024-03-21  206  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  207  	alloc_tag_sub_check(ref);
22d407b164ff79 Suren Baghdasaryan 2024-03-21  208  	if (!ref || !ref->ct)
22d407b164ff79 Suren Baghdasaryan 2024-03-21  209  		return;
22d407b164ff79 Suren Baghdasaryan 2024-03-21  210  
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  211  	if (is_codetag_empty(ref)) {
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  212  		ref->ct = NULL;
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  213  		return;
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  214  	}
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  215  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  216  	tag = ct_to_alloc_tag(ref->ct);
22d407b164ff79 Suren Baghdasaryan 2024-03-21  217  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  218  	this_cpu_sub(tag->counters->bytes, bytes);
22d407b164ff79 Suren Baghdasaryan 2024-03-21 @219  	this_cpu_dec(tag->counters->calls);
22d407b164ff79 Suren Baghdasaryan 2024-03-21  220  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  221  	ref->ct = NULL;
22d407b164ff79 Suren Baghdasaryan 2024-03-21  222  }
22d407b164ff79 Suren Baghdasaryan 2024-03-21  223  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
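The warning above appears to come from a 32-bit configuration: the diagnostic reports 'unsigned long' holding at most 4294967295, while tag->counters->calls is a 64-bit type. The posted helpers take their operand as unsigned long, so a 64-bit decrement (adding -1 as a u64) would be narrowed before it reaches the 8-byte case. The sketch below reproduces only the arithmetic, in user space. ulong32_t and both function names are hypothetical stand-ins, not kernel identifiers.

```c
#include <stdint.h>

/* Stand-in for 'unsigned long' on a 32-bit target. */
typedef uint32_t ulong32_t;

/* Same shape as the posted helper: the operand is narrowed to 32 bits. */
static uint64_t add_narrow_operand(uint64_t counter, ulong32_t val)
{
	return counter + val;
}

/* Operand kept at the width of the per-CPU variable. */
static uint64_t add_full_operand(uint64_t counter, uint64_t val)
{
	return counter + val;
}
```

With a counter of 5, add_full_operand(5, -(uint64_t)1) yields 4, whereas add_narrow_operand(5, (ulong32_t)-(uint64_t)1) yields 4294967300, because the operand has become 4294967295. Separately, amoadd.d is an RV64 instruction, so one option a later revision might consider is leaving the 8-byte cases on the generic path when building for 32-bit.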
* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
  2025-08-20  6:44   ` kernel test robot
@ 2025-08-20 17:18   ` kernel test robot
  2025-08-20 23:26   ` Christoph Lameter (Ampere)
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-08-20 17:18 UTC (permalink / raw)
  To: Yunhui Cui, yury.norov, linux, paul.walmsley, palmer, aou, alex,
	linux-riscv, linux-kernel, dennis, tj, cl, linux-mm
  Cc: oe-kbuild-all, Yunhui Cui

Hi Yunhui,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on dennis-percpu/for-next v6.17-rc2 next-20250820]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20250819-215256
base:   linus/master
patch link:    https://lore.kernel.org/r/20250819135007.85646-3-cuiyunhui%40bytedance.com
patch subject: [PATCH 2/2] riscv: introduce percpu.h into include/asm
config: riscv-allnoconfig (https://download.01.org/0day-ci/archive/20250821/202508210101.WySkXlSZ-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250821/202508210101.WySkXlSZ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508210101.WySkXlSZ-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/atomic.h:80,
                    from include/linux/cpumask.h:14,
                    from include/linux/smp.h:13,
                    from include/linux/lockdep.h:14,
                    from include/linux/spinlock.h:63,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:7,
                    from include/linux/mm.h:7,
                    from mm/slub.c:13:
   mm/slub.c: In function '__update_cpu_freelist_fast':
>> include/linux/atomic/atomic-arch-fallback.h:414:30: error: implicit declaration of function 'arch_cmpxchg128_local'; did you mean 'arch_cmpxchg64_local'? [-Wimplicit-function-declaration]
     414 | #define raw_cmpxchg128_local arch_cmpxchg128_local
         |                              ^~~~~~~~~~~~~~~~~~~~~
   include/linux/atomic/atomic-instrumented.h:5005:9: note: in expansion of macro 'raw_cmpxchg128_local'
    5005 |         raw_cmpxchg128_local(__ai_ptr, __VA_ARGS__); \
         |         ^~~~~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/percpu.h:131:17: note: in expansion of macro 'cmpxchg128_local'
     131 |         ret__ = cmpxchg128_local(ptr__, old__, new__); \
         |                 ^~~~~~~~~~~~~~~~
   include/asm-generic/percpu.h:108:17: note: in expansion of macro 'this_cpu_cmpxchg128'
     108 |         __val = _cmpxchg(pcp, __old, nval); \
         |                 ^~~~~~~~
   include/asm-generic/percpu.h:527:9: note: in expansion of macro '__cpu_fallback_try_cmpxchg'
     527 |         __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg128)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/slab.h:24:41: note: in expansion of macro 'this_cpu_try_cmpxchg128'
      24 | #define this_cpu_try_cmpxchg_freelist this_cpu_try_cmpxchg128
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~
   mm/slub.c:3638:16: note: in expansion of macro 'this_cpu_try_cmpxchg_freelist'
    3638 |         return this_cpu_try_cmpxchg_freelist(s->cpu_slab->freelist_tid.full,
         |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +414 include/linux/atomic/atomic-arch-fallback.h

9257959a6e5b4f Mark Rutland 2023-06-05  413  
9257959a6e5b4f Mark Rutland 2023-06-05 @414  #define raw_cmpxchg128_local arch_cmpxchg128_local
e6ce9d741163af Uros Bizjak  2023-04-05  415  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
  2025-08-20  6:44   ` kernel test robot
  2025-08-20 17:18   ` kernel test robot
@ 2025-08-20 23:26   ` Christoph Lameter (Ampere)
  2025-08-21  8:01     ` [External] " yunhui cui
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-08-20 23:26 UTC (permalink / raw)
  To: Yunhui Cui
  Cc: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, linux-mm

On Tue, 19 Aug 2025, Yunhui Cui wrote:

> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> +static inline void \
> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> +{ \
> +	asm volatile ( \
> +		"amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> +		: [ptr] "+A" (*(u##sz *)ptr) \
> +		: [val] "r" ((u##sz)(val)) \
> +		: "memory"); \
> +}

AMO creates a single instruction that performs the operation?

> +#define _pcp_protect(op, pcp, ...) \
> +({ \
> +	preempt_disable_notrace(); \
> +	op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> +	preempt_enable_notrace(); \
> +})

Is "op" a single instruction? If so then preempt disable / enable would
not be needed if there is no other instruction created.

But raw_cpu_ptr performs a SHIFT_PERCPU_PTR which performs an addition.
So you need the disabling of preemption to protect the add.

Is there a way on RISC V to embed the pointer arithmetic in the "AMO"
instruction? Or can you use relative addressing to a register that
contains the cpu offset? I believe RISC V has a thread pointer?

If you can do this then a lot of preempt_enable/disable points can be
removed from the core kernel and the instruction may be as scalable as x86
which can do the per cpu operations with a single instruction.

> +
> +#define _pcp_protect_return(op, pcp, args...) \
> +({ \
> +	typeof(pcp) __retval; \
> +	preempt_disable_notrace(); \
> +	__retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> +	preempt_enable_notrace(); \
> +	__retval; \
> +})

Same here.
* Re: [External] Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-20 23:26   ` Christoph Lameter (Ampere)
@ 2025-08-21  8:01     ` yunhui cui
  0 siblings, 0 replies; 7+ messages in thread
From: yunhui cui @ 2025-08-21  8:01 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, linux-mm

Hi Christoph,

On Thu, Aug 21, 2025 at 7:39 AM Christoph Lameter (Ampere)
<cl@gentwo.org> wrote:
>
> On Tue, 19 Aug 2025, Yunhui Cui wrote:
>
> > +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \
> > +static inline void \
> > +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) \
> > +{ \
> > +	asm volatile ( \
> > +		"amo" #amo_insn #sfx " zero, %[val], %[ptr]" \
> > +		: [ptr] "+A" (*(u##sz *)ptr) \
> > +		: [val] "r" ((u##sz)(val)) \
> > +		: "memory"); \
> > +}
>
> AMO creates a single instruction that performs the operation?
>
> > +#define _pcp_protect(op, pcp, ...) \
> > +({ \
> > +	preempt_disable_notrace(); \
> > +	op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \
> > +	preempt_enable_notrace(); \
> > +})
>
> Is "op" a single instruction? If so then preempt disable / enable would
> not be needed if there is no other instruction created.
>
> But raw_cpu_ptr performs a SHIFT_PERCPU_PTR which performs an addition.
> So you need the disabling of preemption to protect the add.
>
> Is there a way on RISC V to embed the pointer arithmetic in the "AMO"
> instruction? Or can you use relative addressing to a register that
> contains the cpu offset? I believe RISC V has a thread pointer?
>
> If you can do this then a lot of preempt_enable/disable points can be
> removed from the core kernel and the instruction may be as scalable as x86
> which can do the per cpu operations with a single instruction.

Yes, thank you. While it’s certainly good to remove preemption
disabling, currently RISC-V’s amoadd.w/d instructions can take the
address of a variable rather than a register. I previously submitted
an attempt to use gp to store the percpu offset, and we are also
trying to push for an extension that uses a register to store the
percpu offset.

https://lore.kernel.org/all/CAEEQ3w=PsM5T+yMrEGdWZ2nm7m7SX3vzscLtWpOPVu1zpfm3YQ@mail.gmail.com/
https://lists.riscv.org/g/tech-privileged/topic/risc_v_tech_arch_review/113437553?page=2

>
> > +
> > +#define _pcp_protect_return(op, pcp, args...) \
> > +({ \
> > +	typeof(pcp) __retval; \
> > +	preempt_disable_notrace(); \
> > +	__retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \
> > +	preempt_enable_notrace(); \
> > +	__retval; \
> > +})
>
> Same here.
>
>

Thanks,
Yunhui
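To make the point about "protecting the add" concrete, here is a small user-space simulation. It is an illustration only, with hypothetical names throughout. The per-CPU address is computed in one step and the update happens in another, so a migration between the two makes the update land in the previous CPU's slot. An array index stands in for the per-CPU offset and a plain variable stands in for the scheduler's choice of CPU.

```c
#include <stdint.h>

#define NR_SIM_CPUS 2

static uint32_t counter[NR_SIM_CPUS];	/* one slot per simulated CPU */
static int current_cpu;			/* CPU the simulated task runs on */

/* Step 1: compute this CPU's slot (the addition raw_cpu_ptr() performs). */
static uint32_t *slot_for_current_cpu(void)
{
	return &counter[current_cpu];
}

/*
 * Simulates a task that is preempted and migrated to 'migrate_to' after
 * the address computation but before the update itself.
 */
static void add_with_migration(uint32_t val, int migrate_to)
{
	uint32_t *p = slot_for_current_cpu();

	current_cpu = migrate_to;	/* preemption point */
	*p += val;			/* step 2: update through a stale address */
}
```

Starting on CPU 0 and calling add_with_migration(1, 1) leaves counter[0] == 1 and counter[1] == 0 even though the task now runs on CPU 1. Disabling preemption around both steps, as _pcp_protect() does, closes that window. An addressing mode that folds the offset into the atomic instruction itself would make the two steps one, which is what the question about a dedicated offset register is getting at.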