linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] riscv: introduce percpu.h
@ 2025-08-19 13:50 Yunhui Cui
  2025-08-19 13:50 ` [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
  0 siblings, 2 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
  To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, cl, linux-mm
  Cc: Yunhui Cui

Current per-CPU operations rely on generic code using raw_local_irq_save(),
which incurs significant overhead. This series optimizes the 32/64-bit
paths with RISC-V atomic (AMO) instructions, reducing that overhead.

RISC-V lacks lr/sc.b/h support; without ZABHA, emulating 8/16-bit operations
via lr/sc.w would require complex mask logic. However, data shows 8/16-bit
per-CPU operations are extremely rare (single-digit counts in boot and
hackbench tests). Thus, we let 8/16-bit ops fall back to the generic
implementation, avoiding unnecessary complexity. 32/64-bit ops use direct
atomic instructions for performance.
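To make the "complex mask logic" concrete: without byte-sized AMOs, an 8-bit add has to be carried out on the containing 32-bit word. The user-space sketch below (not kernel code) uses a C11 32-bit compare-and-swap loop as a stand-in for an lr.w/sc.w sequence; the helper name and its byte-lane parameter are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically add 'val' to byte lane 'lane' (0..3)
 * of a 32-bit word using only a 32-bit CAS, the way an lr.w/sc.w loop
 * must when byte-sized AMOs (Zabha) are unavailable. On little-endian
 * RISC-V the lane would be the low two bits of the byte's address.
 * Returns the new value of the byte.
 */
static uint8_t add_u8_via_u32_cas(_Atomic uint32_t *word, unsigned lane,
				  uint8_t val)
{
	assert(lane < 4);
	const unsigned shift = lane * 8;
	const uint32_t mask = (uint32_t)0xff << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t desired;

	do {
		uint8_t cur = (uint8_t)((old & mask) >> shift);
		uint8_t sum = (uint8_t)(cur + val);	/* wraps modulo 256 */

		/* Splice the new byte in, preserving the other three lanes. */
		desired = (old & ~mask) | ((uint32_t)sum << shift);
		/* On failure, 'old' is reloaded with the current word. */
	} while (!atomic_compare_exchange_weak_explicit(word, &old, desired,
			memory_order_relaxed, memory_order_relaxed));

	return (uint8_t)((desired & mask) >> shift);
}
```

For example, adding 0xff to lane 1 of 0x11223344 gives 0x11223244: only the targeted byte changes. A real lr.w/sc.w version would additionally have to align the address down to 4 bytes and derive the shift from its low bits, which is the extra complexity the series avoids for these rare sizes.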

Yunhui Cui (2):
  riscv: remove irqflags.h inclusion in asm/bitops.h
  riscv: introduce percpu.h into include/asm

 arch/riscv/include/asm/bitops.h |   1 -
 arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/percpu.h

-- 
2.39.5



* [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h
  2025-08-19 13:50 [PATCH 0/2] riscv: introduce percpu.h Yunhui Cui
@ 2025-08-19 13:50 ` Yunhui Cui
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
  1 sibling, 0 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
  To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, cl, linux-mm
  Cc: Yunhui Cui

arch/riscv/include/asm/bitops.h does not functionally require
<linux/irqflags.h>. Moreover, once arch/riscv/include/asm/percpu.h is
added, that inclusion creates a circular dependency:
kernel/bounds.c
->include/linux/log2.h
->include/linux/bitops.h
->arch/riscv/include/asm/bitops.h
->include/linux/irqflags.h
->arch/riscv/include/asm/percpu.h
->include/linux/preempt.h -> ... -> include/linux/bitmap.h
->include/linux/find.h, whose "return val ? __ffs(val) : size;" needs
  __ffs(), which arch/riscv/include/asm/bitops.h has not yet defined
  at this point

The compilation log is as follows:
CC      kernel/bounds.s
In file included from ./include/linux/bitmap.h:11,
               from ./include/linux/cpumask.h:12,
               from ./arch/riscv/include/asm/processor.h:55,
               from ./arch/riscv/include/asm/thread_info.h:42,
               from ./include/linux/thread_info.h:60,
               from ./include/asm-generic/preempt.h:5,
               from ./arch/riscv/include/generated/asm/preempt.h:1,
               from ./include/linux/preempt.h:79,
               from ./arch/riscv/include/asm/percpu.h:8,
               from ./include/linux/irqflags.h:19,
               from ./arch/riscv/include/asm/bitops.h:14,
               from ./include/linux/bitops.h:68,
               from ./include/linux/log2.h:12,
               from kernel/bounds.c:13:
./include/linux/find.h: In function 'find_next_bit':
./include/linux/find.h:66:30: error: implicit declaration of function '__ffs' [-Wimplicit-function-declaration]
   66 |                 return val ? __ffs(val) : size;
      |                              ^~~~~

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/bitops.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
index d59310f74c2ba..d9837b3cf05fe 100644
--- a/arch/riscv/include/asm/bitops.h
+++ b/arch/riscv/include/asm/bitops.h
@@ -11,7 +11,6 @@
 #endif /* _LINUX_BITOPS_H */
 
 #include <linux/compiler.h>
-#include <linux/irqflags.h>
 #include <asm/barrier.h>
 #include <asm/bitsperlong.h>
 
-- 
2.39.5



* [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-19 13:50 [PATCH 0/2] riscv: introduce percpu.h Yunhui Cui
  2025-08-19 13:50 ` [PATCH 1/2] riscv: remove irqflags.h inclusion in asm/bitops.h Yunhui Cui
@ 2025-08-19 13:50 ` Yunhui Cui
  2025-08-20  6:44   ` kernel test robot
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: Yunhui Cui @ 2025-08-19 13:50 UTC (permalink / raw)
  To: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, cl, linux-mm
  Cc: Yunhui Cui

Current percpu operations rely on generic implementations, where
raw_local_irq_save() introduces substantial overhead. This patch instead
uses AMO instructions (or plain READ_ONCE()/WRITE_ONCE() accesses and
xchg/cmpxchg), with only preemption disabled around each access.

Since RISC-V has no lr/sc.b/h, 8/16-bit operations on systems without
ZABHA would have to be emulated with lr/sc.w plus additional masking.
In practice, 8/16-bit per-CPU operations are rare. The counts during
system startup are as follows:
              8-bit 16-bit 32-bit   64-bit
Reads             3      3   1531      471
Writes            4      3     32      238
Adds              3      3  31858     7656
Add-Returns       0      0      0        2
ANDs              0      0      0        0
ANDNOTs           0      0      0        0
ORs               0      0     70        0

hackbench -l 1000:
              8-bit 16-bit 32-bit   64-bit
Reads             3      3   1531  2522158
Writes            4      3     34  2521522
Adds              3      3  47771    19911
Add-Returns       0      0      0        2
ANDs              0      0      0        0
ANDNOTs           0      0      0        0
ORs               0      0     70        0

Given these counts, 8-bit and 16-bit per-CPU operations simply fall back
to the generic implementation.
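The resulting split can be sketched in user space as a size dispatch: 4- and 8-byte operands take a single relaxed atomic read-modify-write (the role amoadd.w/.d play in the patch), while 1- and 2-byte operands go through a critical section standing in for the generic raw_local_irq_save() path. The macro and lock below are hypothetical models, not kernel APIs; __atomic_fetch_add is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the generic path's raw_local_irq_save()/
 * raw_local_irq_restore(): one global critical section. */
static atomic_flag demo_fallback_lock = ATOMIC_FLAG_INIT;

static void demo_fallback_enter(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_fallback_lock,
						 memory_order_acquire))
		; /* spin until the section is free */
}

static void demo_fallback_exit(void)
{
	atomic_flag_clear_explicit(&demo_fallback_lock, memory_order_release);
}

/*
 * Size dispatch: 4- and 8-byte operands use one relaxed atomic RMW
 * (the job amoadd.w/.d do in the patch); 1- and 2-byte operands take
 * the critical-section fallback. sizeof() is a compile-time constant,
 * so the untaken branch is typically optimized away.
 */
#define demo_pcp_add(var, val)						\
	do {								\
		if (sizeof(var) == 4 || sizeof(var) == 8) {		\
			(void)__atomic_fetch_add(&(var), (val),		\
						 __ATOMIC_RELAXED);	\
		} else {						\
			demo_fallback_enter();				\
			(var) += (val);					\
			demo_fallback_exit();				\
		}							\
	} while (0)
```

Both paths give the same arithmetic result (including wrap-around for narrow types); only the cost differs, which is why the rare 8/16-bit cases can keep the slower path.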

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/percpu.h | 138 ++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 arch/riscv/include/asm/percpu.h

diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 0000000000000..5a1fdb37a8056
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+#include <linux/preempt.h>
+
+#define PERCPU_RW_OPS(sz)						\
+static inline unsigned long __percpu_read_##sz(void *ptr)		\
+{									\
+	return READ_ONCE(*(u##sz *)ptr);				\
+}									\
+									\
+static inline void __percpu_write_##sz(void *ptr, unsigned long val)	\
+{									\
+	WRITE_ONCE(*(u##sz *)ptr, (u##sz)val);				\
+}
+
+#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn)			\
+static inline void							\
+__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)		\
+{									\
+	asm volatile (							\
+		"amo" #amo_insn #sfx " zero, %[val], %[ptr]"		\
+		: [ptr] "+A" (*(u##sz *)ptr)				\
+		: [val] "r" ((u##sz)(val))				\
+		: "memory");						\
+}
+
+#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn)		\
+static inline u##sz							\
+__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val)	\
+{									\
+	register u##sz ret;						\
+									\
+	asm volatile (							\
+		"amo" #amo_insn #sfx " %[ret], %[val], %[ptr]"		\
+		: [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret)		\
+		: [val] "r" ((u##sz)(val))				\
+		: "memory");						\
+									\
+	return ret + val;						\
+}
+
+#define PERCPU_OP(name, amo_insn)					\
+	__PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn)			\
+	__PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn)
+
+#define PERCPU_RET_OP(name, amo_insn)					\
+	__PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn)		\
+	__PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
+
+PERCPU_RW_OPS(8)
+PERCPU_RW_OPS(16)
+PERCPU_RW_OPS(32)
+PERCPU_RW_OPS(64)
+
+PERCPU_OP(add, add)
+PERCPU_OP(andnot, and)
+PERCPU_OP(or, or)
+PERCPU_RET_OP(add, add)
+
+#undef PERCPU_RW_OPS
+#undef __PERCPU_AMO_OP_CASE
+#undef __PERCPU_AMO_RET_OP_CASE
+#undef PERCPU_OP
+#undef PERCPU_RET_OP
+
+#define _pcp_protect(op, pcp, ...)					\
+({									\
+	preempt_disable_notrace();					\
+	op(raw_cpu_ptr(&(pcp)), __VA_ARGS__);				\
+	preempt_enable_notrace();					\
+})
+
+#define _pcp_protect_return(op, pcp, args...)				\
+({									\
+	typeof(pcp) __retval;						\
+	preempt_disable_notrace();					\
+	__retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args);	\
+	preempt_enable_notrace();					\
+	__retval;							\
+})
+
+#define this_cpu_read_1(pcp)		_pcp_protect_return(__percpu_read_8, pcp)
+#define this_cpu_read_2(pcp)		_pcp_protect_return(__percpu_read_16, pcp)
+#define this_cpu_read_4(pcp)		_pcp_protect_return(__percpu_read_32, pcp)
+#define this_cpu_read_8(pcp)		_pcp_protect_return(__percpu_read_64, pcp)
+
+#define this_cpu_write_1(pcp, val)	_pcp_protect(__percpu_write_8, pcp, (unsigned long)(val))
+#define this_cpu_write_2(pcp, val)	_pcp_protect(__percpu_write_16, pcp, (unsigned long)(val))
+#define this_cpu_write_4(pcp, val)	_pcp_protect(__percpu_write_32, pcp, (unsigned long)(val))
+#define this_cpu_write_8(pcp, val)	_pcp_protect(__percpu_write_64, pcp, (unsigned long)(val))
+
+#define this_cpu_add_4(pcp, val)	_pcp_protect(__percpu_add_amo_case_32, pcp, val)
+#define this_cpu_add_8(pcp, val)	_pcp_protect(__percpu_add_amo_case_64, pcp, val)
+
+#define this_cpu_add_return_4(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
+
+#define this_cpu_add_return_8(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
+
+#define this_cpu_and_4(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_32, pcp, (val))
+#define this_cpu_and_8(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_64, pcp, (val))
+
+#define this_cpu_or_4(pcp, val)	_pcp_protect(__percpu_or_amo_case_32, pcp, val)
+#define this_cpu_or_8(pcp, val)	_pcp_protect(__percpu_or_amo_case_64, pcp, val)
+
+#define this_cpu_xchg_1(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_2(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_4(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_8(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+
+#define this_cpu_cmpxchg_1(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_2(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_4(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_8(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+
+#define this_cpu_cmpxchg64(pcp, o, n)	this_cpu_cmpxchg_8(pcp, o, n)
+
+#define this_cpu_cmpxchg128(pcp, o, n)					\
+({									\
+	typedef typeof(pcp) pcp_op_T__;					\
+	u128 old__, new__, ret__;					\
+	pcp_op_T__ *ptr__;						\
+	old__ = o;							\
+	new__ = n;							\
+	preempt_disable_notrace();					\
+	ptr__ = raw_cpu_ptr(&(pcp));					\
+	ret__ = cmpxchg128_local(ptr__, old__, new__);			\
+	preempt_enable_notrace();					\
+	ret__;								\
+})
+
+#include <asm-generic/percpu.h>
+
+#endif /* __ASM_PERCPU_H */
-- 
2.39.5
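Because amoand.w/.d store `*p & v`, a this_cpu_and() built on them should pass val unchanged; the `~val` idiom only yields an AND when the underlying primitive clears bits (an andnot/bit-clear operation). The sketch below models just the memory effect of each primitive, ignoring atomicity; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Memory effect of amoand.w: *p = *p & v (atomicity not modelled). */
static void amoand_effect(uint32_t *p, uint32_t v)
{
	*p &= v;
}

/* Memory effect of a bit-clear ("andnot") primitive: clears v's bits. */
static void bitclear_effect(uint32_t *p, uint32_t v)
{
	*p &= ~v;
}
```

With x = 0xF0F0 and this_cpu_and(x, 0xFF) as the intent, amoand with 0xFF and bit-clear with ~0xFF both give 0x00F0, whereas amoand with ~0xFF gives 0xF000, i.e. an andnot.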



* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2025-08-20  6:44   ` kernel test robot
  2025-08-20 17:18   ` kernel test robot
  2025-08-20 23:26   ` Christoph Lameter (Ampere)
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-08-20  6:44 UTC (permalink / raw)
  To: Yunhui Cui, yury.norov, linux, paul.walmsley, palmer, aou, alex,
	linux-riscv, linux-kernel, dennis, tj, cl, linux-mm
  Cc: llvm, oe-kbuild-all, Yunhui Cui

Hi Yunhui,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on dennis-percpu/for-next v6.17-rc2 next-20250819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20250819-215256
base:   linus/master
patch link:    https://lore.kernel.org/r/20250819135007.85646-3-cuiyunhui%40bytedance.com
patch subject: [PATCH 2/2] riscv: introduce percpu.h into include/asm
config: riscv-randconfig-002-20250820 (https://download.01.org/0day-ci/archive/20250820/202508201452.ciEgfhNO-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 93d24b6b7b148c47a2fa228a4ef31524fa1d9f3f)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250820/202508201452.ciEgfhNO-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508201452.ciEgfhNO-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/riscv/kernel/asm-offsets.c:8:
   In file included from include/linux/mm.h:7:
   In file included from include/linux/gfp.h:7:
   In file included from include/linux/mmzone.h:22:
   In file included from include/linux/mm_types.h:19:
   In file included from include/linux/workqueue.h:9:
>> include/linux/alloc_tag.h:219:2: warning: implicit conversion from 'typeof (tag->counters->calls)' (aka 'unsigned long long') to 'unsigned long' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
     219 |         this_cpu_dec(tag->counters->calls);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:512:28: note: expanded from macro 'this_cpu_dec'
     512 | #define this_cpu_dec(pcp)               this_cpu_sub(pcp, 1)
         |                                         ^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:510:51: note: expanded from macro 'this_cpu_sub'
     510 | #define this_cpu_sub(pcp, val)          this_cpu_add(pcp, -(typeof(pcp))(val))
         |                                         ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:501:70: note: expanded from macro 'this_cpu_add'
     501 | #define this_cpu_add(pcp, val)          __pcpu_size_call(this_cpu_add_, pcp, val)
         |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   include/linux/percpu-defs.h:372:29: note: expanded from macro '__pcpu_size_call'
     372 |                 case 8: stem##8(variable, __VA_ARGS__);break;           \
         |                         ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
   arch/riscv/include/asm/percpu.h:96:78: note: expanded from macro 'this_cpu_add_8'
      96 | #define this_cpu_add_8(pcp, val)        _pcp_protect(__percpu_add_amo_case_64, pcp, val)
         |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   arch/riscv/include/asm/percpu.h:72:26: note: expanded from macro '_pcp_protect'
      72 |         op(raw_cpu_ptr(&(pcp)), __VA_ARGS__);                           \
         |         ~~                      ^~~~~~~~~~~
   1 warning generated.
--
   In file included from arch/riscv/errata/sifive/errata.c:7:
   In file included from include/linux/memory.h:19:
   In file included from include/linux/node.h:18:
   In file included from include/linux/device.h:16:
   In file included from include/linux/energy_model.h:7:
   In file included from include/linux/kobject.h:20:
   In file included from include/linux/sysfs.h:16:
   In file included from include/linux/kernfs.h:12:
   In file included from include/linux/idr.h:15:
   In file included from include/linux/radix-tree.h:16:
   In file included from include/linux/percpu.h:5:
>> include/linux/alloc_tag.h:219:2: warning: implicit conversion from 'typeof (tag->counters->calls)' (aka 'unsigned long long') to 'unsigned long' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
     219 |         this_cpu_dec(tag->counters->calls);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:512:28: note: expanded from macro 'this_cpu_dec'
     512 | #define this_cpu_dec(pcp)               this_cpu_sub(pcp, 1)
         |                                         ^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:510:51: note: expanded from macro 'this_cpu_sub'
     510 | #define this_cpu_sub(pcp, val)          this_cpu_add(pcp, -(typeof(pcp))(val))
         |                                         ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:501:70: note: expanded from macro 'this_cpu_add'
     501 | #define this_cpu_add(pcp, val)          __pcpu_size_call(this_cpu_add_, pcp, val)
         |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   include/linux/percpu-defs.h:372:29: note: expanded from macro '__pcpu_size_call'
     372 |                 case 8: stem##8(variable, __VA_ARGS__);break;           \
         |                         ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
   arch/riscv/include/asm/percpu.h:96:78: note: expanded from macro 'this_cpu_add_8'
      96 | #define this_cpu_add_8(pcp, val)        _pcp_protect(__percpu_add_amo_case_64, pcp, val)
         |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
   arch/riscv/include/asm/percpu.h:72:26: note: expanded from macro '_pcp_protect'
      72 |         op(raw_cpu_ptr(&(pcp)), __VA_ARGS__);                           \
         |         ~~                      ^~~~~~~~~~~
   arch/riscv/errata/sifive/errata.c:29:14: warning: result of comparison of constant 9223372036854775815 with expression of type 'unsigned long' is always true [-Wtautological-constant-out-of-range-compare]
      29 |         if (arch_id != 0x8000000000000007 ||
         |             ~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~
   arch/riscv/errata/sifive/errata.c:42:14: warning: result of comparison of constant 9223372036854775815 with expression of type 'unsigned long' is always true [-Wtautological-constant-out-of-range-compare]
      42 |         if (arch_id != 0x8000000000000007 && arch_id != 0x1)
         |             ~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~
   3 warnings generated.


vim +219 include/linux/alloc_tag.h

22d407b164ff79 Suren Baghdasaryan 2024-03-21  202  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  203  static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
22d407b164ff79 Suren Baghdasaryan 2024-03-21  204  {
22d407b164ff79 Suren Baghdasaryan 2024-03-21  205  	struct alloc_tag *tag;
22d407b164ff79 Suren Baghdasaryan 2024-03-21  206  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  207  	alloc_tag_sub_check(ref);
22d407b164ff79 Suren Baghdasaryan 2024-03-21  208  	if (!ref || !ref->ct)
22d407b164ff79 Suren Baghdasaryan 2024-03-21  209  		return;
22d407b164ff79 Suren Baghdasaryan 2024-03-21  210  
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  211  	if (is_codetag_empty(ref)) {
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  212  		ref->ct = NULL;
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  213  		return;
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  214  	}
239d6c96d86f8a Suren Baghdasaryan 2024-03-21  215  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  216  	tag = ct_to_alloc_tag(ref->ct);
22d407b164ff79 Suren Baghdasaryan 2024-03-21  217  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  218  	this_cpu_sub(tag->counters->bytes, bytes);
22d407b164ff79 Suren Baghdasaryan 2024-03-21 @219  	this_cpu_dec(tag->counters->calls);
22d407b164ff79 Suren Baghdasaryan 2024-03-21  220  
22d407b164ff79 Suren Baghdasaryan 2024-03-21  221  	ref->ct = NULL;
22d407b164ff79 Suren Baghdasaryan 2024-03-21  222  }
22d407b164ff79 Suren Baghdasaryan 2024-03-21  223  
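The warning comes from the 64-bit helper taking its operand as unsigned long, which is 32 bits on RV32: this_cpu_dec() passes -(u64)1, the value is narrowed to 0xffffffff at the call, and it is then widened back by zero-extension. The sketch below models only that value path in portable C (uint32_t stands in for RV32's unsigned long; the helper names are hypothetical). It does not model the separate fact that amoadd.d is an RV64-only instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for 'unsigned long' on RV32 (ILP32), which is 32 bits wide. */
typedef uint32_t rv32_ulong;

/* Hypothetical model of the posted 64-bit add path on RV32: the operand
 * is narrowed to unsigned long at the call boundary, then widened back. */
static void add64_via_ulong(uint64_t *ctr, rv32_ulong val)
{
	*ctr += (uint64_t)val;	/* zero-extends, so -1 becomes 0xffffffff */
}

/* The same operation with the operand kept at its full 64-bit width. */
static void add64_full_width(uint64_t *ctr, uint64_t val)
{
	*ctr += val;
}
```

Starting from 5, decrementing through the narrowed path yields 0x100000004 instead of 4, which suggests the 64-bit operations would need a u64-wide operand on RV32 (or should not be provided there at all).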

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
  2025-08-20  6:44   ` kernel test robot
@ 2025-08-20 17:18   ` kernel test robot
  2025-08-20 23:26   ` Christoph Lameter (Ampere)
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-08-20 17:18 UTC (permalink / raw)
  To: Yunhui Cui, yury.norov, linux, paul.walmsley, palmer, aou, alex,
	linux-riscv, linux-kernel, dennis, tj, cl, linux-mm
  Cc: oe-kbuild-all, Yunhui Cui

Hi Yunhui,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on dennis-percpu/for-next v6.17-rc2 next-20250820]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yunhui-Cui/riscv-remove-irqflags-h-inclusion-in-asm-bitops-h/20250819-215256
base:   linus/master
patch link:    https://lore.kernel.org/r/20250819135007.85646-3-cuiyunhui%40bytedance.com
patch subject: [PATCH 2/2] riscv: introduce percpu.h into include/asm
config: riscv-allnoconfig (https://download.01.org/0day-ci/archive/20250821/202508210101.WySkXlSZ-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250821/202508210101.WySkXlSZ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508210101.WySkXlSZ-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/atomic.h:80,
                    from include/linux/cpumask.h:14,
                    from include/linux/smp.h:13,
                    from include/linux/lockdep.h:14,
                    from include/linux/spinlock.h:63,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:7,
                    from include/linux/mm.h:7,
                    from mm/slub.c:13:
   mm/slub.c: In function '__update_cpu_freelist_fast':
>> include/linux/atomic/atomic-arch-fallback.h:414:30: error: implicit declaration of function 'arch_cmpxchg128_local'; did you mean 'arch_cmpxchg64_local'? [-Wimplicit-function-declaration]
     414 | #define raw_cmpxchg128_local arch_cmpxchg128_local
         |                              ^~~~~~~~~~~~~~~~~~~~~
   include/linux/atomic/atomic-instrumented.h:5005:9: note: in expansion of macro 'raw_cmpxchg128_local'
    5005 |         raw_cmpxchg128_local(__ai_ptr, __VA_ARGS__); \
         |         ^~~~~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/percpu.h:131:17: note: in expansion of macro 'cmpxchg128_local'
     131 |         ret__ = cmpxchg128_local(ptr__, old__, new__);                  \
         |                 ^~~~~~~~~~~~~~~~
   include/asm-generic/percpu.h:108:17: note: in expansion of macro 'this_cpu_cmpxchg128'
     108 |         __val = _cmpxchg(pcp, __old, nval);                             \
         |                 ^~~~~~~~
   include/asm-generic/percpu.h:527:9: note: in expansion of macro '__cpu_fallback_try_cmpxchg'
     527 |         __cpu_fallback_try_cmpxchg(pcp, ovalp, nval, this_cpu_cmpxchg128)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/slab.h:24:41: note: in expansion of macro 'this_cpu_try_cmpxchg128'
      24 | #define this_cpu_try_cmpxchg_freelist   this_cpu_try_cmpxchg128
         |                                         ^~~~~~~~~~~~~~~~~~~~~~~
   mm/slub.c:3638:16: note: in expansion of macro 'this_cpu_try_cmpxchg_freelist'
    3638 |         return this_cpu_try_cmpxchg_freelist(s->cpu_slab->freelist_tid.full,
         |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +414 include/linux/atomic/atomic-arch-fallback.h

9257959a6e5b4f Mark Rutland 2023-06-05  413  
9257959a6e5b4f Mark Rutland 2023-06-05 @414  #define raw_cmpxchg128_local arch_cmpxchg128_local
e6ce9d741163af Uros Bizjak  2023-04-05  415  
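The error arises because percpu.h unconditionally defines this_cpu_cmpxchg128() in terms of cmpxchg128_local(), which this configuration does not provide. asm-generic/percpu.h supplies its generic fallback only when the architecture leaves the name undefined (#ifndef), so one possible fix is to define the override only when the primitive exists. The sketch below models that pattern with a 64-bit value; DEMO_ARCH_HAS_CAS_WIDE and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Uncomment to model an arch that really provides the wide CAS
 * (hypothetical switch standing in for a Kconfig/ISA check). */
/* #define DEMO_ARCH_HAS_CAS_WIDE 1 */

#ifdef DEMO_ARCH_HAS_CAS_WIDE
/* Arch override: only defined when the primitive exists. */
#define demo_this_cpu_cmpxchg_wide(p, o, n) \
	__sync_val_compare_and_swap((p), (o), (n))
#endif

/*
 * "Generic" fallback, used only if the arch left the name undefined,
 * mirroring the #ifndef pattern in asm-generic/percpu.h. Not atomic
 * against other threads; the kernel's generic version runs with
 * interrupts disabled instead.
 */
#ifndef demo_this_cpu_cmpxchg_wide
static uint64_t demo_generic_cmpxchg(uint64_t *p, uint64_t o, uint64_t n)
{
	uint64_t cur = *p;

	if (cur == o)
		*p = n;
	return cur;	/* like cmpxchg: return the value seen */
}
#define demo_this_cpu_cmpxchg_wide(p, o, n) \
	demo_generic_cmpxchg((p), (o), (n))
#endif
```

Either way callers see the same name and semantics; the build only breaks when an override references a primitive that is not there.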

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-19 13:50 ` [PATCH 2/2] riscv: introduce percpu.h into include/asm Yunhui Cui
  2025-08-20  6:44   ` kernel test robot
  2025-08-20 17:18   ` kernel test robot
@ 2025-08-20 23:26   ` Christoph Lameter (Ampere)
  2025-08-21  8:01     ` [External] " yunhui cui
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-08-20 23:26 UTC (permalink / raw)
  To: Yunhui Cui
  Cc: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, linux-mm

On Tue, 19 Aug 2025, Yunhui Cui wrote:

> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn)			\
> +static inline void							\
> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)		\
> +{									\
> +	asm volatile (							\
> +		"amo" #amo_insn #sfx " zero, %[val], %[ptr]"		\
> +		: [ptr] "+A" (*(u##sz *)ptr)				\
> +		: [val] "r" ((u##sz)(val))				\
> +		: "memory");						\
> +}

AMO creates a single instruction that performs the operation?

> +#define _pcp_protect(op, pcp, ...)					\
> +({									\
> +	preempt_disable_notrace();					\
> +	op(raw_cpu_ptr(&(pcp)), __VA_ARGS__);				\
> +	preempt_enable_notrace();					\
> +})

Is "op" a single instruction? If so then preempt disable / enable would
not be needed if there is no other instruction created.

But raw_cpu_ptr performs a SHIFT_PERCPU_PTR which performs an addition.
So you need the disabling of preemption to protect the add.

Is there a way on RISC-V to embed the pointer arithmetic in the "AMO"
instruction? Or can you use relative addressing to a register that
contains the cpu offset? I believe RISC-V has a thread pointer?

If you can do this then a lot of preempt_enable/disable points can be
removed from the core kernel and the instruction may be as scalable as x86
which can do the per cpu operations with a single instruction.

> +
> +#define _pcp_protect_return(op, pcp, args...)				\
> +({									\
> +	typeof(pcp) __retval;						\
> +	preempt_disable_notrace();					\
> +	__retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args);	\
> +	preempt_enable_notrace();					\
> +	__retval;							\
> +})

Same here.



* Re: [External] Re: [PATCH 2/2] riscv: introduce percpu.h into include/asm
  2025-08-20 23:26   ` Christoph Lameter (Ampere)
@ 2025-08-21  8:01     ` yunhui cui
  0 siblings, 0 replies; 7+ messages in thread
From: yunhui cui @ 2025-08-21  8:01 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: yury.norov, linux, paul.walmsley, palmer, aou, alex, linux-riscv,
	linux-kernel, dennis, tj, linux-mm

Hi Christoph,

On Thu, Aug 21, 2025 at 7:39 AM Christoph Lameter (Ampere)
<cl@gentwo.org> wrote:
>
> On Tue, 19 Aug 2025, Yunhui Cui wrote:
>
> > +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn)                        \
> > +static inline void                                                   \
> > +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)                \
> > +{                                                                    \
> > +     asm volatile (                                                  \
> > +             "amo" #amo_insn #sfx " zero, %[val], %[ptr]"            \
> > +             : [ptr] "+A" (*(u##sz *)ptr)                            \
> > +             : [val] "r" ((u##sz)(val))                              \
> > +             : "memory");                                            \
> > +}
>
> AMO creates a single instruction that performs the operation?
>
> > +#define _pcp_protect(op, pcp, ...)                                   \
> > +({                                                                   \
> > +     preempt_disable_notrace();                                      \
> > +     op(raw_cpu_ptr(&(pcp)), __VA_ARGS__);                           \
> > +     preempt_enable_notrace();                                       \
> > +})
>
> Is "op" a single instruction? If so then preempt disable / enable would
> not be needed if there is no other instruction created.
>
> But raw_cpu_ptr performs a SHIFT_PERCPU_PTR which performs an addition.
> So you need the disabling of preemption to protect the add.
>
> Is there a way on RISC-V to embed the pointer arithmetic in the "AMO"
> instruction? Or can you use relative addressing to a register that
> contains the cpu offset? I believe RISC-V has a thread pointer?
>
> If you can do this then a lot of preempt_enable/disable points can be
> removed from the core kernel and the instruction may be as scalable as x86
> which can do the per cpu operations with a single instruction.

Yes, thank you. Removing the preemption disabling would certainly be
good, but RISC-V's amoadd.w/d instructions currently take the target
address only from a single register, with no offset, so the per-CPU
offset cannot be folded into the instruction itself.

I previously submitted an attempt to use gp to store the percpu
offset, and we are also trying to push for an extension that uses a
register to store the percpu offset.
https://lore.kernel.org/all/CAEEQ3w=PsM5T+yMrEGdWZ2nm7m7SX3vzscLtWpOPVu1zpfm3YQ@mail.gmail.com/
https://lists.riscv.org/g/tech-privileged/topic/risc_v_tech_arch_review/113437553?page=2
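Christoph's point can be modelled outside the kernel: raw_cpu_ptr() is an address computation (base plus the current CPU's offset) that happens before the access, so if the task migrates in between, the single-instruction update still lands on the previous CPU's slot. The sketch below is a hypothetical user-space model with two fake CPUs and a per-CPU offset table; atomicity is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS 2	/* hypothetical */

static uint64_t demo_slots[DEMO_NR_CPUS];	/* one counter per fake CPU */
static size_t demo_cpu_offset[DEMO_NR_CPUS];	/* models the per-CPU offset table */

static void demo_init_offsets(void)
{
	for (size_t i = 0; i < DEMO_NR_CPUS; i++)
		demo_cpu_offset[i] = i * sizeof(uint64_t); /* byte offset of slot i */
}

/* Step 1, like raw_cpu_ptr(): base address plus the current CPU's offset. */
static uint64_t *demo_cpu_ptr(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return (uint64_t *)((char *)demo_slots + demo_cpu_offset[cpu]);
}

/* Step 2, the single-instruction update (atomicity not modelled). */
static void demo_amoadd(uint64_t *p, uint64_t v)
{
	*p += v;
}
```

If the address is formed on CPU 0 and the task then migrates to CPU 1 before demo_amoadd() runs, CPU 0's counter is the one updated. This is why the patch keeps preempt_disable_notrace()/preempt_enable_notrace() around both steps.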

>
> > +
> > +#define _pcp_protect_return(op, pcp, args...)                                \
> > +({                                                                   \
> > +     typeof(pcp) __retval;                                           \
> > +     preempt_disable_notrace();                                      \
> > +     __retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args);        \
> > +     preempt_enable_notrace();                                       \
> > +     __retval;                                                       \
> > +})
>
> Same here.
>
>

Thanks,
Yunhui
