BPF List
* [PATCH v4 0/3] riscv: improve percpu helpers and PIO mapping
@ 2026-05-05  6:20 Yunhui Cui
  2026-05-05  6:20 ` [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers Yunhui Cui
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Yunhui Cui @ 2026-05-05  6:20 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, dennis, tj, cl, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, bjorn,
	pulehui, puranjay, thuth, ajones, ben.dooks, rkrcmar, cuiyunhui,
	samuel.holland, zong.li, conor.dooley, tglx, debug, seanwascoding,
	andybnac, menglong8.dong, cyrilbur, wangruikang, atishp, apatel,
	linux-riscv, linux-kernel, linux-mm, bpf, arnd, nathan,
	nick.desaulniers+lkml, morbo, justinstitt, qingfang.deng,
	linux-arch, llvm

This series makes three small RISC-V cleanups and fixes around percpu access
and PIO helper handling.

This work is motivated in part by build reports from lkp and by follow-up
review/discussion around the percpu access rework:

https://lore.kernel.org/all/202512202218.FI6bB5kV-lkp@intel.com/
https://lore.kernel.org/all/202512210052.w0bpUAAO-lkp@intel.com/

1. Avoid forming invalid PIO address expressions when I/O port support is not
enabled, while keeping the generic pci_iounmap() behavior intact.

2. Introduce arch/riscv/include/asm/percpu.h with RISC-V-specific percpu
helpers, including the fix for the 8/16-bit add_return LR/SC fallback.

3. Cache the percpu offset in thread_info so percpu accesses can use it
directly across the relevant RISC-V paths.

Yunhui Cui (3):
  riscv: io: avoid null-pointer arithmetic in PIO helpers
  riscv: introduce percpu.h into include/asm
  riscv: store percpu offset into thread_info

 arch/riscv/include/asm/asm.h         |   6 +-
 arch/riscv/include/asm/io.h          |  26 ++-
 arch/riscv/include/asm/percpu.h      | 284 +++++++++++++++++++++++++++
 arch/riscv/include/asm/switch_to.h   |   8 +
 arch/riscv/include/asm/thread_info.h |   3 +-
 arch/riscv/kernel/asm-offsets.c      |   1 +
 arch/riscv/kernel/smpboot.c          |   7 +
 arch/riscv/net/bpf_jit_comp64.c      |   9 +-
 include/asm-generic/io.h             |   4 +
 9 files changed, 326 insertions(+), 22 deletions(-)
 create mode 100644 arch/riscv/include/asm/percpu.h

-- 
2.39.5


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers
  2026-05-05  6:20 [PATCH v4 0/3] riscv: improve percpu helpers and PIO mapping Yunhui Cui
@ 2026-05-05  6:20 ` Yunhui Cui
  2026-05-05  6:33   ` Arnd Bergmann
  2026-05-05  7:20   ` bot+bpf-ci
  2026-05-05  6:20 ` [PATCH v4 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
  2026-05-05  6:20 ` [PATCH v4 3/3] riscv: store percpu offset into thread_info Yunhui Cui
  2 siblings, 2 replies; 10+ messages in thread
From: Yunhui Cui @ 2026-05-05  6:20 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, dennis, tj, cl, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, bjorn,
	pulehui, puranjay, thuth, ajones, ben.dooks, rkrcmar, cuiyunhui,
	samuel.holland, zong.li, conor.dooley, tglx, debug, seanwascoding,
	andybnac, menglong8.dong, cyrilbur, wangruikang, atishp, apatel,
	linux-riscv, linux-kernel, linux-mm, bpf, arnd, nathan,
	nick.desaulniers+lkml, morbo, justinstitt, qingfang.deng,
	linux-arch, llvm

The RISC-V PIO helpers derive I/O addresses from PCI_IOBASE in ins*(),
outs*(), and ioport_map().

In configurations where I/O port support is not available, these
expressions are still formed at compile time and trigger
-Wnull-pointer-arithmetic warnings from clang.

Introduce a helper for the address calculation and guard the PIO-only
helpers with CONFIG_HAS_IOPORT so unsupported configurations do not
construct these PIO address expressions.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/io.h | 26 ++++++++++++++++++--------
 include/asm-generic/io.h    |  4 ++++
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
index 09bb5f57a9d34..6f5d70313c83e 100644
--- a/arch/riscv/include/asm/io.h
+++ b/arch/riscv/include/asm/io.h
@@ -56,6 +56,8 @@
 #define __io_pbw()	RISCV_FENCE(iow, o)
 #define __io_paw()	RISCV_FENCE(o, io)
 
+#define PCI_IO_ADDR(addr)	((void __iomem *)((unsigned long)PCI_IOBASE + (addr)))
+
 /*
  * Accesses from a single hart to a single I/O address must be ordered.  This
  * allows us to use the raw read macros, but we still need to fence before and
@@ -102,12 +104,14 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar(addr))
 #define readsw(addr, buffer, count) __readsw(addr, buffer, count)
 #define readsl(addr, buffer, count) __readsl(addr, buffer, count)
 
+#ifdef CONFIG_HAS_IOPORT
 __io_reads_ins(ins,  u8, b, __io_pbr(), __io_par(addr))
 __io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr))
 __io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr))
-#define insb(addr, buffer, count) __insb(PCI_IOBASE + (addr), buffer, count)
-#define insw(addr, buffer, count) __insw(PCI_IOBASE + (addr), buffer, count)
-#define insl(addr, buffer, count) __insl(PCI_IOBASE + (addr), buffer, count)
+#define insb(addr, buffer, count) __insb(PCI_IO_ADDR(addr), buffer, count)
+#define insw(addr, buffer, count) __insw(PCI_IO_ADDR(addr), buffer, count)
+#define insl(addr, buffer, count) __insl(PCI_IO_ADDR(addr), buffer, count)
+#endif
 
 __io_writes_outs(writes,  u8, b, __io_bw(), __io_aw())
 __io_writes_outs(writes, u16, w, __io_bw(), __io_aw())
@@ -116,25 +120,31 @@ __io_writes_outs(writes, u32, l, __io_bw(), __io_aw())
 #define writesw(addr, buffer, count) __writesw(addr, buffer, count)
 #define writesl(addr, buffer, count) __writesl(addr, buffer, count)
 
+#ifdef CONFIG_HAS_IOPORT
 __io_writes_outs(outs,  u8, b, __io_pbw(), __io_paw())
 __io_writes_outs(outs, u16, w, __io_pbw(), __io_paw())
 __io_writes_outs(outs, u32, l, __io_pbw(), __io_paw())
-#define outsb(addr, buffer, count) __outsb(PCI_IOBASE + (addr), buffer, count)
-#define outsw(addr, buffer, count) __outsw(PCI_IOBASE + (addr), buffer, count)
-#define outsl(addr, buffer, count) __outsl(PCI_IOBASE + (addr), buffer, count)
+#define outsb(addr, buffer, count) __outsb(PCI_IO_ADDR(addr), buffer, count)
+#define outsw(addr, buffer, count) __outsw(PCI_IO_ADDR(addr), buffer, count)
+#define outsl(addr, buffer, count) __outsl(PCI_IO_ADDR(addr), buffer, count)
+#endif
 
 #ifdef CONFIG_64BIT
 __io_reads_ins(reads, u64, q, __io_br(), __io_ar(addr))
 #define readsq(addr, buffer, count) __readsq(addr, buffer, count)
 
+#ifdef CONFIG_HAS_IOPORT
 __io_reads_ins(ins, u64, q, __io_pbr(), __io_par(addr))
-#define insq(addr, buffer, count) __insq(PCI_IOBASE + (addr), buffer, count)
+#define insq(addr, buffer, count) __insq(PCI_IO_ADDR(addr), buffer, count)
+#endif
 
 __io_writes_outs(writes, u64, q, __io_bw(), __io_aw())
 #define writesq(addr, buffer, count) __writesq(addr, buffer, count)
 
+#ifdef CONFIG_HAS_IOPORT
 __io_writes_outs(outs, u64, q, __io_pbr(), __io_paw())
-#define outsq(addr, buffer, count) __outsq(PCI_IOBASE + (addr), buffer, count)
+#define outsq(addr, buffer, count) __outsq(PCI_IO_ADDR(addr), buffer, count)
+#endif
 #endif
 
 #include <asm-generic/io.h>
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index ca5a1ce6f0f89..d799e5ccc9437 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -1205,8 +1205,12 @@ static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
 #define ioport_map ioport_map
 static inline void __iomem *ioport_map(unsigned long port, unsigned int nr)
 {
+#ifdef CONFIG_HAS_IOPORT
 	port &= IO_SPACE_LIMIT;
 	return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
+#else
+	return NULL;
+#endif
 }
 #define ARCH_HAS_GENERIC_IOPORT_MAP
 #endif
-- 
2.39.5



* [PATCH v4 2/3] riscv: introduce percpu.h into include/asm
  2026-05-05  6:20 [PATCH v4 0/3] riscv: improve percpu helpers and PIO mapping Yunhui Cui
  2026-05-05  6:20 ` [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers Yunhui Cui
@ 2026-05-05  6:20 ` Yunhui Cui
  2026-05-05  7:05   ` bot+bpf-ci
  2026-05-05  7:26   ` sashiko-bot
  2026-05-05  6:20 ` [PATCH v4 3/3] riscv: store percpu offset into thread_info Yunhui Cui
  2 siblings, 2 replies; 10+ messages in thread
From: Yunhui Cui @ 2026-05-05  6:20 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, dennis, tj, cl, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, bjorn,
	pulehui, puranjay, thuth, ajones, ben.dooks, rkrcmar, cuiyunhui,
	samuel.holland, zong.li, conor.dooley, tglx, debug, seanwascoding,
	andybnac, menglong8.dong, cyrilbur, wangruikang, atishp, apatel,
	linux-riscv, linux-kernel, linux-mm, bpf, arnd, nathan,
	nick.desaulniers+lkml, morbo, justinstitt, qingfang.deng,
	linux-arch, llvm

Introduce a RISC-V percpu header with direct this_cpu helpers for
read/write/add/and/or/xchg/cmpxchg operations.

Make the 8/16-bit add_return LR/SC fallback return the logical
subword result so callers at non-zero subword offsets see the
expected value.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/percpu.h | 280 ++++++++++++++++++++++++++++++++
 1 file changed, 280 insertions(+)
 create mode 100644 arch/riscv/include/asm/percpu.h

diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 0000000000000..3b26fe45e70f4
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,280 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+#include <linux/preempt.h>
+
+#include <asm/alternative-macros.h>
+#include <asm/cpufeature-macros.h>
+#include <asm/hwcap.h>
+
+#define PERCPU_RW_OPS(sz)						\
+static inline unsigned long __percpu_read_##sz(void *ptr)		\
+{									\
+	return READ_ONCE(*(u##sz *)ptr);				\
+}									\
+									\
+static inline void __percpu_write_##sz(void *ptr, unsigned long val)	\
+{									\
+	WRITE_ONCE(*(u##sz *)ptr, (u##sz)val);				\
+}
+
+PERCPU_RW_OPS(8)
+PERCPU_RW_OPS(16)
+PERCPU_RW_OPS(32)
+
+#ifdef CONFIG_64BIT
+PERCPU_RW_OPS(64)
+#endif
+
+#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn)			\
+static inline void							\
+__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)		\
+{									\
+	asm volatile (							\
+		"amo" #amo_insn #sfx " zero, %[val], %[ptr]"		\
+		: [ptr] "+A" (*(u##sz *)ptr)				\
+		: [val] "r" ((u##sz)(val))				\
+		: "memory");						\
+}
+
+#ifdef CONFIG_64BIT
+#define PERCPU_OP(name, amo_insn)					\
+	__PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn)			\
+	__PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn)
+#else
+#define PERCPU_OP(name, amo_insn)					\
+	__PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn)
+#endif
+
+PERCPU_OP(add, add)
+PERCPU_OP(andnot, and)
+PERCPU_OP(or, or)
+
+/*
+ * Currently, only this_cpu_add_return_xxx() requires a return value,
+ * and the PERCPU_RET_OP() does not account for other operations.
+ */
+#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn)		\
+static inline u##sz							\
+__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val)	\
+{									\
+	register u##sz ret;						\
+									\
+	asm volatile (							\
+		"amo" #amo_insn #sfx " %[ret], %[val], %[ptr]"		\
+		: [ptr] "+A" (*(u##sz *)ptr), [ret] "=r" (ret)		\
+		: [val] "r" ((u##sz)(val))				\
+		: "memory");						\
+									\
+	return ret + val;						\
+}
+
+#ifdef CONFIG_64BIT
+#define PERCPU_RET_OP(name, amo_insn)					\
+	__PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn)		\
+	__PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn)
+#else
+#define PERCPU_RET_OP(name, amo_insn)					\
+	__PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn)
+#endif
+
+PERCPU_RET_OP(add, add)
+
+#define PERCPU_8_16_GET_SHIFT(ptr)	(((unsigned long)(ptr) & 0x3) * BITS_PER_BYTE)
+#define PERCPU_8_16_GET_MASK(sz)	GENMASK((sz) - 1, 0)
+#define PERCPU_8_16_GET_PTR32(ptr)	((u32 *)((unsigned long)(ptr) & ~0x3))
+
+#define PERCPU_8_16_OP(name, amo_insn, sz, sfx, val_type, new_val_expr, asm_op)			\
+static inline void __percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)		\
+{												\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&						\
+		riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) {				\
+		asm volatile ("amo" #amo_insn #sfx " zero, %[val], %[ptr]"			\
+			: [ptr] "+A"(*(val_type *)ptr)						\
+			: [val] "r"((val_type)((new_val_expr) & PERCPU_8_16_GET_MASK(sz)))	\
+			: "memory");								\
+	} else {										\
+		u32 *ptr32 = PERCPU_8_16_GET_PTR32(ptr);					\
+		const unsigned long shift = PERCPU_8_16_GET_SHIFT(ptr);				\
+		const u32 mask = PERCPU_8_16_GET_MASK(sz) << shift;				\
+		const val_type val_trunc = (val_type)((new_val_expr)				\
+					   & PERCPU_8_16_GET_MASK(sz));				\
+		u32 retx, rc;									\
+		val_type new_val_type;								\
+												\
+		asm volatile (									\
+			"0: lr.w %0, %2\n"							\
+			"and %3, %0, %4\n"							\
+			"srl %3, %3, %5\n"							\
+			#asm_op " %3, %3, %6\n"							\
+			"sll %3, %3, %5\n"							\
+			"and %1, %0, %7\n"							\
+			"or %1, %1, %3\n"							\
+			"sc.w %1, %1, %2\n"							\
+			"bnez %1, 0b\n"								\
+			: "=&r"(retx), "=&r"(rc), "+A"(*ptr32), "=&r"(new_val_type)		\
+			: "r"(mask), "r"(shift), "r"(val_trunc), "r"(~mask)			\
+			: "memory");								\
+		}										\
+}
+
+#define PERCPU_OP_8_16(op_name, op, expr, final_op)			\
+	PERCPU_8_16_OP(op_name, op, 8, .b, u8, expr, final_op);		\
+	PERCPU_8_16_OP(op_name, op, 16, .h, u16, expr, final_op)
+
+PERCPU_OP_8_16(add, add, val, add)
+PERCPU_OP_8_16(andnot, and, ~(val), and)
+PERCPU_OP_8_16(or, or, val, or)
+
+#define PERCPU_8_16_RET_OP(name, amo_insn, sz, sfx, val_type, new_val_expr)			\
+static inline val_type __percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val)	\
+{												\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&						\
+		riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) {				\
+		register val_type ret;								\
+		asm volatile ("amo" #amo_insn #sfx " %[ret], %[val], %[ptr]"			\
+			: [ptr] "+A"(*(val_type *)ptr), [ret] "=r"(ret)				\
+			: [val] "r"((val_type)((new_val_expr) & PERCPU_8_16_GET_MASK(sz)))	\
+			: "memory");								\
+		return ret + (val_type)((new_val_expr) & PERCPU_8_16_GET_MASK(sz));		\
+	} else {										\
+		u32 *ptr32 = PERCPU_8_16_GET_PTR32(ptr);					\
+		const unsigned long shift = PERCPU_8_16_GET_SHIFT(ptr);				\
+		const u32 mask = (PERCPU_8_16_GET_MASK(sz) << shift);				\
+		const u32 inv_mask = ~mask;							\
+		const val_type val_trunc = (val_type)((new_val_expr)				\
+					   & PERCPU_8_16_GET_MASK(sz));				\
+		u32 old, new, tmp;								\
+												\
+		asm volatile (									\
+			"0: lr.w %0, %3\n"							\
+			"and %1, %0, %4\n"							\
+			"srl %1, %1, %5\n"							\
+			"add %1, %1, %6\n"							\
+			"and %1, %1, %7\n"							\
+			"sll %1, %1, %5\n"							\
+			"and %2, %0, %8\n"							\
+			"or %2, %2, %1\n"							\
+			"sc.w %2, %2, %3\n"							\
+			"bnez %2, 0b\n"								\
+			: "=r"(old), "=r"(tmp), "=&r"(new), "+A"(*ptr32)			\
+			: "r"(mask), "r"(shift), "r"(val_trunc), "r"(PERCPU_8_16_GET_MASK(sz)), \
+			"r"(inv_mask)								\
+			: "memory");								\
+		return (val_type)(tmp >> shift);						\
+	}											\
+}
+
+PERCPU_8_16_RET_OP(add, add, 8, .b, u8, val)
+PERCPU_8_16_RET_OP(add, add, 16, .h, u16, val)
+
+#define _pcp_protect(op, pcp, ...)					\
+({									\
+	preempt_disable_notrace();					\
+	op(raw_cpu_ptr(&(pcp)), __VA_ARGS__);				\
+	preempt_enable_notrace();					\
+})
+
+#define _pcp_protect_return(op, pcp, args...)				\
+({									\
+	typeof(pcp) __retval;						\
+	preempt_disable_notrace();					\
+	__retval = (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args);	\
+	preempt_enable_notrace();					\
+	__retval;							\
+})
+
+#define this_cpu_read_1(pcp)		_pcp_protect_return(__percpu_read_8, pcp)
+#define this_cpu_read_2(pcp)		_pcp_protect_return(__percpu_read_16, pcp)
+#define this_cpu_read_4(pcp)		_pcp_protect_return(__percpu_read_32, pcp)
+
+#ifdef CONFIG_64BIT
+#define this_cpu_read_8(pcp)		_pcp_protect_return(__percpu_read_64, pcp)
+#endif
+
+#define this_cpu_write_1(pcp, val)	_pcp_protect(__percpu_write_8, pcp, (unsigned long)val)
+#define this_cpu_write_2(pcp, val)	_pcp_protect(__percpu_write_16, pcp, (unsigned long)val)
+#define this_cpu_write_4(pcp, val)	_pcp_protect(__percpu_write_32, pcp, (unsigned long)val)
+
+#ifdef CONFIG_64BIT
+#define this_cpu_write_8(pcp, val)	_pcp_protect(__percpu_write_64, pcp, (unsigned long)val)
+#endif
+
+#define this_cpu_add_1(pcp, val)	_pcp_protect(__percpu_add_amo_case_8, pcp, val)
+#define this_cpu_add_2(pcp, val)	_pcp_protect(__percpu_add_amo_case_16, pcp, val)
+#define this_cpu_add_4(pcp, val)	_pcp_protect(__percpu_add_amo_case_32, pcp, val)
+
+#ifdef CONFIG_64BIT
+#define this_cpu_add_8(pcp, val)	_pcp_protect(__percpu_add_amo_case_64, pcp, val)
+#endif
+
+#define this_cpu_add_return_1(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val)
+
+#define this_cpu_add_return_2(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val)
+
+#define this_cpu_add_return_4(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val)
+
+#ifdef CONFIG_64BIT
+#define this_cpu_add_return_8(pcp, val)		\
+_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
+#endif
+
+#define this_cpu_and_1(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_8, pcp, ~(val))
+#define this_cpu_and_2(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_16, pcp, ~(val))
+#define this_cpu_and_4(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_32, pcp, ~(val))
+
+#ifdef CONFIG_64BIT
+#define this_cpu_and_8(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_64, pcp, ~(val))
+#endif
+
+#define this_cpu_or_1(pcp, val)	_pcp_protect(__percpu_or_amo_case_8, pcp, val)
+#define this_cpu_or_2(pcp, val)	_pcp_protect(__percpu_or_amo_case_16, pcp, val)
+#define this_cpu_or_4(pcp, val)	_pcp_protect(__percpu_or_amo_case_32, pcp, val)
+
+#ifdef CONFIG_64BIT
+#define this_cpu_or_8(pcp, val)	_pcp_protect(__percpu_or_amo_case_64, pcp, val)
+#endif
+
+#define this_cpu_xchg_1(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_2(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#define this_cpu_xchg_4(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+
+#ifdef CONFIG_64BIT
+#define this_cpu_xchg_8(pcp, val)	_pcp_protect_return(xchg_relaxed, pcp, val)
+#endif
+
+#define this_cpu_cmpxchg_1(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_2(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_4(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+
+#ifdef CONFIG_64BIT
+#define this_cpu_cmpxchg_8(pcp, o, n)	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+
+#define this_cpu_cmpxchg64(pcp, o, n)	this_cpu_cmpxchg_8(pcp, o, n)
+#endif
+
+#ifdef system_has_cmpxchg128
+#define this_cpu_cmpxchg128(pcp, o, n)					\
+({									\
+	u128 ret__;							\
+	typeof(pcp) *ptr__;						\
+									\
+	preempt_disable_notrace();					\
+	ptr__ = raw_cpu_ptr(&(pcp));					\
+	if (system_has_cmpxchg128())					\
+		ret__ = cmpxchg128_local(ptr__, (o), (n));		\
+	else								\
+		ret__ = this_cpu_generic_cmpxchg(pcp, (o), (n));	\
+	preempt_enable_notrace();					\
+	ret__;								\
+})
+#endif
+
+#include <asm-generic/percpu.h>
+
+#endif /* __ASM_PERCPU_H */
-- 
2.39.5



* [PATCH v4 3/3] riscv: store percpu offset into thread_info
  2026-05-05  6:20 [PATCH v4 0/3] riscv: improve percpu helpers and PIO mapping Yunhui Cui
  2026-05-05  6:20 ` [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers Yunhui Cui
  2026-05-05  6:20 ` [PATCH v4 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2026-05-05  6:20 ` Yunhui Cui
  2026-05-05  7:20   ` bot+bpf-ci
  2026-05-05  8:11   ` sashiko-bot
  2 siblings, 2 replies; 10+ messages in thread
From: Yunhui Cui @ 2026-05-05  6:20 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, dennis, tj, cl, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, bjorn,
	pulehui, puranjay, thuth, ajones, ben.dooks, rkrcmar, cuiyunhui,
	samuel.holland, zong.li, conor.dooley, tglx, debug, seanwascoding,
	andybnac, menglong8.dong, cyrilbur, wangruikang, atishp, apatel,
	linux-riscv, linux-kernel, linux-mm, bpf, arnd, nathan,
	nick.desaulniers+lkml, morbo, justinstitt, qingfang.deng,
	linux-arch, llvm

Originally we planned to dedicate a register to the percpu offset,
which would speed up percpu variable reads/writes and reduce the
number of access instructions. After discussion [1], the offset is
now stored in thread_info instead.

[1] https://lists.riscv.org/g/tech-privileged/topic/risc_v_tech_arch_review/113437553?page=2

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/asm.h         | 6 +-----
 arch/riscv/include/asm/percpu.h      | 4 ++++
 arch/riscv/include/asm/switch_to.h   | 8 ++++++++
 arch/riscv/include/asm/thread_info.h | 3 ++-
 arch/riscv/kernel/asm-offsets.c      | 1 +
 arch/riscv/kernel/smpboot.c          | 7 +++++++
 arch/riscv/net/bpf_jit_comp64.c      | 9 +--------
 7 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index e9e8ba83e632f..137a49488325e 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -91,11 +91,7 @@
 
 #ifdef CONFIG_SMP
 .macro asm_per_cpu dst sym tmp
-	lw    \tmp, TASK_TI_CPU_NUM(tp)
-	slli  \tmp, \tmp, RISCV_LGPTR
-	la    \dst, __per_cpu_offset
-	add   \dst, \dst, \tmp
-	REG_L \tmp, 0(\dst)
+	REG_L \tmp, TASK_TI_PCPU_OFFSET(tp)
 	la    \dst, \sym
 	add   \dst, \dst, \tmp
 .endm
diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
index 3b26fe45e70f4..84612d672105d 100644
--- a/arch/riscv/include/asm/percpu.h
+++ b/arch/riscv/include/asm/percpu.h
@@ -7,7 +7,9 @@
 
 #include <asm/alternative-macros.h>
 #include <asm/cpufeature-macros.h>
+#include <asm/current.h>
 #include <asm/hwcap.h>
+#include <asm/thread_info.h>
 
 #define PERCPU_RW_OPS(sz)						\
 static inline unsigned long __percpu_read_##sz(void *ptr)		\
@@ -275,6 +277,8 @@ _pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
 })
 #endif
 
+#define __my_cpu_offset (((struct thread_info *)current)->pcpu_offset)
+
 #include <asm-generic/percpu.h>
 
 #endif /* __ASM_PERCPU_H */
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920c..733b6cd306e40 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -88,6 +88,13 @@ static inline void __switch_to_envcfg(struct task_struct *next)
 			:: "r" (next->thread.envcfg) : "memory");
 }
 
+static inline void __switch_to_pcpu_offset(struct task_struct *next)
+{
+#ifdef CONFIG_SMP
+	next->thread_info.pcpu_offset = __my_cpu_offset;
+#endif
+}
+
 extern struct task_struct *__switch_to(struct task_struct *,
 				       struct task_struct *);
 
@@ -122,6 +129,7 @@ do {							\
 	if (switch_to_should_flush_icache(__next))	\
 		local_flush_icache_all();		\
 	__switch_to_envcfg(__next);			\
+	__switch_to_pcpu_offset(__next);		\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
 
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 55019fdfa9eca..f10ba62b61016 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -53,6 +53,7 @@
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
 	int                     preempt_count;  /* 0=>preemptible, <0=>BUG */
+	int			cpu;
 	/*
 	 * These stack pointers are overwritten on every system call or
 	 * exception.  SP is also saved to the stack it can be recovered when
@@ -60,8 +61,8 @@ struct thread_info {
 	 */
 	long			kernel_sp;	/* Kernel stack pointer */
 	long			user_sp;	/* User stack pointer */
-	int			cpu;
 	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
+	unsigned long		pcpu_offset;
 #ifdef CONFIG_SHADOW_CALL_STACK
 	void			*scs_base;
 	void			*scs_sp;
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index af827448a609e..fbf53b66b0e06 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -38,6 +38,7 @@ void asm_offsets(void)
 	OFFSET(TASK_THREAD_SUM, task_struct, thread.sum);
 
 	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
+	OFFSET(TASK_TI_PCPU_OFFSET, task_struct, thread_info.pcpu_offset);
 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
 	OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
 	OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 8b628580fe118..41463a8400f6c 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -209,6 +209,11 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 }
 #endif
 
+void __init smp_prepare_boot_cpu(void)
+{
+	__my_cpu_offset = per_cpu_offset(smp_processor_id());
+}
+
 void __init smp_cpus_done(unsigned int max_cpus)
 {
 }
@@ -234,6 +239,8 @@ asmlinkage __visible void smp_callin(void)
 	mmgrab(mm);
 	current->active_mm = mm;
 
+	__my_cpu_offset = per_cpu_offset(smp_processor_id());
+
 #ifdef CONFIG_HOTPLUG_PARALLEL
 	cpuhp_ap_sync_alive();
 #endif
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 2f1109dbf105b..177c19216013e 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1395,15 +1395,8 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			if (rd != rs)
 				emit_mv(rd, rs, ctx);
 #ifdef CONFIG_SMP
-			/* Load current CPU number in T1 */
-			emit_lw(RV_REG_T1, offsetof(struct thread_info, cpu),
+			emit_ld(RV_REG_T1, offsetof(struct thread_info, pcpu_offset),
 				RV_REG_TP, ctx);
-			/* Load address of __per_cpu_offset array in T2 */
-			emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx);
-			/* Get address of __per_cpu_offset[cpu] in T1 */
-			emit_sh3add(RV_REG_T1, RV_REG_T1, RV_REG_T2, ctx);
-			/* Load __per_cpu_offset[cpu] in T1 */
-			emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx);
 			/* Add the offset to Rd */
 			emit_add(rd, rd, RV_REG_T1, ctx);
 #endif
-- 
2.39.5



* Re: [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers
  2026-05-05  6:20 ` [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers Yunhui Cui
@ 2026-05-05  6:33   ` Arnd Bergmann
  2026-05-05  7:20   ` bot+bpf-ci
  1 sibling, 0 replies; 10+ messages in thread
From: Arnd Bergmann @ 2026-05-05  6:33 UTC (permalink / raw)
  To: Yunhui Cui, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Dennis Zhou, Tejun Heo,
	Christoph Lameter (Ampere), Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Björn Töpel, pulehui, puranjay, Thomas Huth,
	Andrew Jones, Ben Dooks, Radim Krčmář,
	Samuel Holland, Zong Li, Conor.Dooley, Thomas Gleixner,
	Deepak Gupta, seanwascoding, Andy Chiu, menglong8.dong, cyrilbur,
	wangruikang, Atish Patra, Anup Patel, linux-riscv, linux-kernel,
	linux-mm, bpf, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, qingfang.deng, Linux-Arch, llvm

On Tue, May 5, 2026, at 08:20, Yunhui Cui wrote:
> The RISC-V PIO helpers derive I/O addresses from PCI_IOBASE in ins*(),
> outs*(), and ioport_map().
>
> Under configurations where I/O port support is not available, these
> expressions can still be formed during compilation and trigger
> -Wnull-pointer-arithmetic warnings from clang.

If a driver attempts to use ISA port operations in a configuration
without CONFIG_HAS_IOPORT, there is a NULL pointer warning because
this is actually a NULL pointer access that will crash the
kernel if the driver is ever loaded. You should not attempt
to shut up the useful warning here but instead make sure every
such code has a proper 'depends on HAS_IOPORT' dependency.

> 
> +#ifdef CONFIG_HAS_IOPORT
>  __io_reads_ins(ins,  u8, b, __io_pbr(), __io_par(addr))
>  __io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr))
>  __io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr))
> -#define insb(addr, buffer, count) __insb(PCI_IOBASE + (addr), buffer, count)
> -#define insw(addr, buffer, count) __insw(PCI_IOBASE + (addr), buffer, count)
> -#define insl(addr, buffer, count) __insl(PCI_IOBASE + (addr), buffer, count)
> +#define insb(addr, buffer, count) __insb(PCI_IO_ADDR(addr), buffer, count)
> +#define insw(addr, buffer, count) __insw(PCI_IO_ADDR(addr), buffer, count)
> +#define insl(addr, buffer, count) __insl(PCI_IO_ADDR(addr), buffer, count)
> +#endif

Can you try to remove the custom accessors altogether and
use the ones from asm-generic instead? These should already
have the correct behavior all the time, in particular they
already come with the correct #ifdef check that is apparently
missing in the riscv version.

> diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
> index ca5a1ce6f0f89..d799e5ccc9437 100644
> --- a/include/asm-generic/io.h
> +++ b/include/asm-generic/io.h
> @@ -1205,8 +1205,12 @@ static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
>  #define ioport_map ioport_map
>  static inline void __iomem *ioport_map(unsigned long port, unsigned int nr)
>  {
> +#ifdef CONFIG_HAS_IOPORT
>  	port &= IO_SPACE_LIMIT;
>  	return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
> +#else
> +	return NULL;
> +#endif
>  }

The function definition here is guarded by the #ifdef CONFIG_HAS_IOPORT_MAP
check, which should already prevent it from being visible when
CONFIG_HAS_IOPORT is disabled.

     Arnd


* Re: [PATCH v4 2/3] riscv: introduce percpu.h into include/asm
  2026-05-05  6:20 ` [PATCH v4 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
@ 2026-05-05  7:05   ` bot+bpf-ci
  2026-05-05  7:26   ` sashiko-bot
  1 sibling, 0 replies; 10+ messages in thread
From: bot+bpf-ci @ 2026-05-05  7:05 UTC (permalink / raw)
  To: cuiyunhui, pjw, palmer, aou, alex, dennis, tj, cl, ast, daniel,
	andrii, martin.lau, eddyz87, memxor, song, yonghong.song, jolsa,
	bjorn, pulehui, puranjay, thuth, ajones, ben.dooks, rkrcmar,
	cuiyunhui, samuel.holland, zong.li, conor.dooley, tglx, debug,
	seanwascoding, andybnac, menglong8.dong, cyrilbur, wangruikang,
	atishp, apatel, linux-riscv, linux-kernel, linux-mm, bpf, arnd,
	nathan, nick.desaulniers+lkml, morbo, justinstitt, qingfang.deng,
	linux-arch, llvm
  Cc: ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai


> diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
> new file mode 100644
> index 000000000000..3b26fe45e70f
> --- /dev/null
> +++ b/arch/riscv/include/asm/percpu.h

[ ... ]

> @@ -0,0 +1,280 @@

[ ... ]

> +#ifdef CONFIG_64BIT
> +#define PERCPU_OP(name, amo_insn)					\
> +	__PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn)			\
> +	__PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn)
> +#else
> +#define PERCPU_OP(name, amo_insn)					\
> +	__PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn)
> +#endif
> +
> +PERCPU_OP(add, add)
> +PERCPU_OP(andnot, and)
> +PERCPU_OP(or, or)

[ ... ]

> +#define PERCPU_8_16_OP(name, amo_insn, sz, sfx, val_type, new_val_expr, asm_op)			\
> +static inline void __percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)		\
> +{												\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&						\
> +		riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) {				\
> +		asm volatile ("amo" #amo_insn #sfx " zero, %[val], %[ptr]"			\
> +			: [ptr] "+A"(*(val_type *)ptr)						\
> +			: [val] "r"((val_type)((new_val_expr) & PERCPU_8_16_GET_MASK(sz)))	\
> +			: "memory");								\
> +	} else {										\
> +		u32 *ptr32 = PERCPU_8_16_GET_PTR32(ptr);					\
> +		const unsigned long shift = PERCPU_8_16_GET_SHIFT(ptr);				\
> +		const u32 mask = PERCPU_8_16_GET_MASK(sz) << shift;				\
> +		const val_type val_trunc = (val_type)((new_val_expr)				\
> +					   & PERCPU_8_16_GET_MASK(sz));				\
> +		u32 retx, rc;									\
> +		val_type new_val_type;								\
> +												\
> +		asm volatile (									\
> +			"0: lr.w %0, %2\n"							\
> +			"and %3, %0, %4\n"							\
> +			"srl %3, %3, %5\n"							\
> +			#asm_op " %3, %3, %6\n"							\
> +			"sll %3, %3, %5\n"							\
> +			"and %1, %0, %7\n"							\
> +			"or %1, %1, %3\n"							\
> +			"sc.w %1, %1, %2\n"							\
> +			"bnez %1, 0b\n"								\
> +			: "=&r"(retx), "=&r"(rc), "+A"(*ptr32), "=&r"(new_val_type)		\
> +			: "r"(mask), "r"(shift), "r"(val_trunc), "r"(~mask)			\
> +			: "memory");								\
> +		}										\
> +}

Can the 8/16-bit LR/SC fallback for add operations corrupt adjacent
percpu data?

In the non-ZABHA path above, after extracting and shifting the subword
(lines 3-4), the code performs the operation (line 5) and immediately
shifts back (line 6). For the add instantiation:

    srl %3, %3, %5     ; new_val >>= shift
    add %3, %3, %6     ; new_val += val_trunc (may overflow 8/16 bits)
    sll %3, %3, %5     ; new_val <<= shift

There's no masking between add and sll, so a carry-out from u8 (bit 8)
or u16 (bit 16) survives the shift. The result is then ORed into the
read-modify-write word:

    and %1, %0, %7     ; rc = retx & ~mask  (clears target subword only)
    or  %1, %1, %3     ; rc |= new_val     (overflow bit enters adjacent byte)

Example: a u8 percpu at offset 0 holds 0x80; calling this_cpu_add()
with 0x80 wraps the u8 to 0 (correct), but the carry bit 8 corrupts the
adjacent byte at offset 1 in the same u32.

The sibling add_return helper (PERCPU_8_16_RET_OP) includes the missing
truncation with "and %1, %1, %7" using PERCPU_8_16_GET_MASK(sz) before
the shift-back. Why doesn't PERCPU_8_16_OP do the same?
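
For readers following along, the failure mode is easy to reproduce in
plain C. The sketch below is a userspace simulation of the non-ZABHA
subword RMW (the helper name and layout are illustrative, not taken from
the patch); "truncate" models whether the masking AND sits between the
add and the shift-back:

```c
#include <stdint.h>

/*
 * Simulate the LR/SC fallback for a u8 at byte offset 0 of its
 * containing u32.  With truncate == 0 this mirrors the add sequence in
 * PERCPU_8_16_OP as posted; with truncate == 1 it mirrors the extra
 * masking step present in PERCPU_8_16_RET_OP.
 */
uint32_t rmw_add_u8(uint32_t word, uint8_t addend, int truncate)
{
	const unsigned int shift = 0;		/* subword at offset 0 */
	const uint32_t mask = 0xffu << shift;
	uint32_t sub = (word & mask) >> shift;	/* and + srl */

	sub += addend;				/* carry may reach bit 8 */
	if (truncate)
		sub &= 0xffu;			/* the AND the add path lacks */
	sub <<= shift;				/* sll */

	return (word & ~mask) | sub;		/* and + or, then sc.w */
}
```

With word = 0x00000080 (the u8 holds 0x80, the adjacent byte is 0) and
addend 0x80, the untruncated path returns 0x00000100: the u8 wraps to 0
as expected, but the carry lands in the neighbouring byte. The truncated
path returns 0x00000000.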

> +
> +#define PERCPU_OP_8_16(op_name, op, expr, final_op)			\
> +	PERCPU_8_16_OP(op_name, op, 8, .b, u8, expr, final_op);		\
> +	PERCPU_8_16_OP(op_name, op, 16, .h, u16, expr, final_op)
> +
> +PERCPU_OP_8_16(add, add, val, add)
> +PERCPU_OP_8_16(andnot, and, ~(val), and)
> +PERCPU_OP_8_16(or, or, val, or)

[ ... ]

> +#define PERCPU_8_16_RET_OP(name, amo_insn, sz, sfx, val_type, new_val_expr)			\
> +static inline val_type __percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val)	\
> +{												\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&						\
> +		riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) {				\
> +		register val_type ret;								\
> +		asm volatile ("amo" #amo_insn #sfx " %[ret], %[val], %[ptr]"			\
> +			: [ptr] "+A"(*(val_type *)ptr), [ret] "=r"(ret)				\
> +			: [val] "r"((val_type)((new_val_expr) & PERCPU_8_16_GET_MASK(sz)))	\
> +			: "memory");								\
> +		return ret + (val_type)((new_val_expr) & PERCPU_8_16_GET_MASK(sz));		\
> +	} else {										\
> +		u32 *ptr32 = PERCPU_8_16_GET_PTR32(ptr);					\
> +		const unsigned long shift = PERCPU_8_16_GET_SHIFT(ptr);				\
> +		const u32 mask = (PERCPU_8_16_GET_MASK(sz) << shift);				\
> +		const u32 inv_mask = ~mask;							\
> +		const val_type val_trunc = (val_type)((new_val_expr)				\
> +					   & PERCPU_8_16_GET_MASK(sz));				\
> +		u32 old, new, tmp;								\
> +												\
> +		asm volatile (									\
> +			"0: lr.w %0, %3\n"							\
> +			"and %1, %0, %4\n"							\
> +			"srl %1, %1, %5\n"							\
> +			"add %1, %1, %6\n"							\
> +			"and %1, %1, %7\n"							\
> +			"sll %1, %1, %5\n"							\
> +			"and %2, %0, %8\n"							\
> +			"or %2, %2, %1\n"							\
> +			"sc.w %2, %2, %3\n"							\
> +			"bnez %2, 0b\n"								\
> +			: "=r"(old), "=r"(tmp), "=&r"(new), "+A"(*ptr32)			\
> +			: "r"(mask), "r"(shift), "r"(val_trunc), "r"(PERCPU_8_16_GET_MASK(sz)), \
> +			"r"(inv_mask)								\
> +			: "memory");								\
> +		return (val_type)(tmp >> shift);						\
> +	}											\
> +}

Does the LR/SC fallback need early-clobber qualifiers on the output
registers?

The inline asm writes "old" and "tmp" (via lr.w and and) before reading
all input registers (%4..%8: mask, shift, val_trunc, MASK(sz),
inv_mask). The GCC inline-asm contract requires "=&r" (early-clobber)
when an output is written before all inputs are consumed. Without it,
the compiler may allocate the same physical register for "old" or "tmp"
and one of the inputs, clobbering that input.

The sibling PERCPU_8_16_OP uses "=&r"(retx), "=&r"(rc),
"=&r"(new_val_type). The equivalent __arch_xchg_masked helper in
arch/riscv/include/asm/cmpxchg.h also uses "=&r" for outputs written
early.

Should "old" and "tmp" be declared "=&r" instead of "=r"?

> +
> +PERCPU_8_16_RET_OP(add, add, 8, .b, u8, val)
> +PERCPU_8_16_RET_OP(add, add, 16, .h, u16, val)

[ ... ]

> +#define this_cpu_and_1(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_8, pcp, ~(val))
> +#define this_cpu_and_2(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_16, pcp, ~(val))
> +#define this_cpu_and_4(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_32, pcp, ~(val))
> +
> +#ifdef CONFIG_64BIT
> +#define this_cpu_and_8(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_64, pcp, ~(val))
> +#endif

Do this_cpu_and_4() and this_cpu_and_8() compute the wrong result?

The macros pass ~(val) to __percpu_andnot_amo_case_32/64, which
are generated by PERCPU_OP(andnot, and). That expands to
__PERCPU_AMO_OP_CASE(.w/.d, andnot, 32/64, and) — the asm body is
"amoand.w/d zero, %[val], %[ptr]", which performs *ptr &= val (plain
AND).

With the caller passing ~val, the final operation becomes *ptr & ~val,
which is the opposite of the expected *ptr & val. Looking at the arm64
pattern (arch/arm64/include/asm/percpu.h), PERCPU_OP(andnot, bic, ldclr)
uses the BIC instruction that provides the inversion: bic performs
*ptr & ~val, so calling with ~val yields *ptr & ~(~val) = *ptr & val.
RISC-V's amoand has no equivalent inversion.

Example: a u32 percpu initialized to 0xFFFFFFFF with
this_cpu_and(var, 0x1) should produce 0x1 but yields 0xFFFFFFFE instead.

The 8/16-bit helpers happen to be correct because PERCPU_OP_8_16(andnot,
and, ~(val), and) feeds new_val_expr = ~(val), producing a second
inversion in the macro body that cancels out. Only the 32/64-bit paths
are broken.

Should PERCPU_OP(andnot, and) be replaced with a generator that does
explicit inversion, or should the this_cpu_and_4/8 callers pass val
without negation?
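
The AND vs AND-NOT mixup can be checked with a trivial C model of the
amoand semantics (non-atomic, function names illustrative):

```c
#include <stdint.h>

/* amoand.w/d semantics: a plain AND into memory, *ptr &= val. */
uint32_t amoand_w(uint32_t old, uint32_t val)
{
	return old & val;
}

/* What this_cpu_and_4(pcp, val) requests as written: amoand with ~val. */
uint32_t this_cpu_and_4_as_written(uint32_t old, uint32_t val)
{
	return amoand_w(old, ~val);	/* old & ~val: an AND-NOT */
}
```

Starting from 0xFFFFFFFF with val = 0x1, the intended AND gives 0x1,
while the version as written gives 0xFFFFFFFE, matching the example
above.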

[ ... ]


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25361792504

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 3/3] riscv: store percpu offset into thread_info
  2026-05-05  6:20 ` [PATCH v4 3/3] riscv: store percpu offset into thread_info Yunhui Cui
@ 2026-05-05  7:20   ` bot+bpf-ci
  2026-05-05  8:11   ` sashiko-bot
  1 sibling, 0 replies; 10+ messages in thread
From: bot+bpf-ci @ 2026-05-05  7:20 UTC (permalink / raw)
  To: cuiyunhui, pjw, palmer, aou, alex, dennis, tj, cl, ast, daniel,
	andrii, martin.lau, eddyz87, memxor, song, yonghong.song, jolsa,
	bjorn, pulehui, puranjay, thuth, ajones, ben.dooks, rkrcmar,
	cuiyunhui, samuel.holland, zong.li, conor.dooley, tglx, debug,
	seanwascoding, andybnac, menglong8.dong, cyrilbur, wangruikang,
	atishp, apatel, linux-riscv, linux-kernel, linux-mm, bpf, arnd,
	nathan, nick.desaulniers+lkml, morbo, justinstitt, qingfang.deng,
	linux-arch, llvm
  Cc: ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 5542 bytes --]

> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> index e9e8ba83e632..137a49488325 100644
> --- a/arch/riscv/include/asm/asm.h
> +++ b/arch/riscv/include/asm/asm.h
> @@ -91,11 +91,7 @@
>
>  #ifdef CONFIG_SMP
>  .macro asm_per_cpu dst sym tmp
> -	lw    \tmp, TASK_TI_CPU_NUM(tp)
> -	slli  \tmp, \tmp, RISCV_LGPTR
> -	la    \dst, __per_cpu_offset
> -	add   \dst, \dst, \tmp
> -	REG_L \tmp, 0(\dst)
> +	REG_L \tmp, TASK_TI_PCPU_OFFSET(tp)
>  	la    \dst, \sym
>  	add   \dst, \dst, \tmp
>  .endm

[ ... ]

> diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
> index 3b26fe45e70f..84612d672105 100644
> --- a/arch/riscv/include/asm/percpu.h
> +++ b/arch/riscv/include/asm/percpu.h
> @@ -7,7 +7,9 @@
>
>  #include <asm/alternative-macros.h>
>  #include <asm/cpufeature-macros.h>
> +#include <asm/current.h>
>  #include <asm/hwcap.h>
> +#include <asm/thread_info.h>

[ ... ]

> @@ -275,6 +277,8 @@ _pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val)
>  })
>  #endif
>
> +#define __my_cpu_offset (((struct thread_info *)current)->pcpu_offset)
> +
>  #include <asm-generic/percpu.h>

[ ... ]

> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index 0e71eb82f920..733b6cd306e4 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -88,6 +88,13 @@ static inline void __switch_to_envcfg(struct task_struct *next)
>  			:: "r" (next->thread.envcfg) : "memory");
>  }
>
> +static inline void __switch_to_pcpu_offset(struct task_struct *next)
> +{
> +#ifdef CONFIG_SMP
> +	next->thread_info.pcpu_offset = __my_cpu_offset;
> +#endif
> +}
> +

[ ... ]

> @@ -122,6 +129,7 @@ do {							\
>  	if (switch_to_should_flush_icache(__next))	\
>  		local_flush_icache_all();		\
>  	__switch_to_envcfg(__next);			\
> +	__switch_to_pcpu_offset(__next);		\
>  	((last) = __switch_to(__prev, __next));		\
>  } while (0)

[ ... ]

> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> index 36918c9200c9..17996dc094fd 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -53,6 +53,7 @@
>  struct thread_info {
>  	unsigned long		flags;		/* low level flags */
>  	int                     preempt_count;  /* 0=>preemptible, <0=>BUG */
> +	int			cpu;
>  	/*
>  	 * These stack pointers are overwritten on every system call or
>  	 * exception.  SP is also saved to the stack it can be recovered when
> @@ -60,8 +61,8 @@ struct thread_info {
>  	 */
>  	long			kernel_sp;	/* Kernel stack pointer */
>  	long			user_sp;	/* User stack pointer */
> -	int			cpu;
>  	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
> +	unsigned long		pcpu_offset;

[ ... ]

> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index af827448a609..fbf53b66b0e0 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -38,6 +38,7 @@ void asm_offsets(void)
>  	OFFSET(TASK_THREAD_SUM, task_struct, thread.sum);
>
>  	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> +	OFFSET(TASK_TI_PCPU_OFFSET, task_struct, thread_info.pcpu_offset);
>  	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);

[ ... ]

> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index d85916a3660c..9e95c068b966 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -209,6 +209,11 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
>  }
>  #endif
>
> +void __init smp_prepare_boot_cpu(void)
> +{
> +	__my_cpu_offset = per_cpu_offset(smp_processor_id());
> +}
> +

[ ... ]

> @@ -234,6 +239,8 @@ asmlinkage __visible void smp_callin(void)
>  	mmgrab(mm);
>  	current->active_mm = mm;
>
> +	__my_cpu_offset = per_cpu_offset(smp_processor_id());
> +

[ ... ]

> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index 2f1109dbf105..177c19216013 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -1395,15 +1395,8 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>  			if (rd != rs)
>  				emit_mv(rd, rs, ctx);
>  #ifdef CONFIG_SMP
> -			/* Load current CPU number in T1 */
> -			emit_lw(RV_REG_T1, offsetof(struct thread_info, cpu),
> +			emit_ld(RV_REG_T1, offsetof(struct thread_info, pcpu_offset),
>  				RV_REG_TP, ctx);
> -			/* Load address of __per_cpu_offset array in T2 */
> -			emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx);
> -			/* Get address of __per_cpu_offset[cpu] in T1 */
> -			emit_sh3add(RV_REG_T1, RV_REG_T1, RV_REG_T2, ctx);
> -			/* Load __per_cpu_offset[cpu] in T1 */
> -			emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx);
>  			/* Add the offset to Rd */
>  			emit_add(rd, rd, RV_REG_T1, ctx);
>  #endif

Note: A lore search found an unaddressed review comment from Arnd Bergmann on
v4 of this patch series, dated Tue, 05 May 2026 08:33:04 +0200. The comment
references arch/riscv/include/asm/io.h regarding null-pointer arithmetic in
PIO helpers. However, that file is not modified in this commit. This may be
feedback on a different patch in the series.

Reference: https://lore.kernel.org/all/3720c3a4-cc0d-4ba8-97ae-35def0189e2e@app.fastmail.com/


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25361792504

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers
  2026-05-05  6:20 ` [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers Yunhui Cui
  2026-05-05  6:33   ` Arnd Bergmann
@ 2026-05-05  7:20   ` bot+bpf-ci
  1 sibling, 0 replies; 10+ messages in thread
From: bot+bpf-ci @ 2026-05-05  7:20 UTC (permalink / raw)
  To: cuiyunhui, pjw, palmer, aou, alex, dennis, tj, cl, ast, daniel,
	andrii, martin.lau, eddyz87, memxor, song, yonghong.song, jolsa,
	bjorn, pulehui, puranjay, thuth, ajones, ben.dooks, rkrcmar,
	cuiyunhui, samuel.holland, zong.li, conor.dooley, tglx, debug,
	seanwascoding, andybnac, menglong8.dong, cyrilbur, wangruikang,
	atishp, apatel, linux-riscv, linux-kernel, linux-mm, bpf, arnd,
	nathan, nick.desaulniers+lkml, morbo, justinstitt, qingfang.deng,
	linux-arch, llvm
  Cc: ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]

> diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
> index 09bb5f57a9d3..6f5d70313c83 100644
> --- a/arch/riscv/include/asm/io.h
> +++ b/arch/riscv/include/asm/io.h
> @@ -56,6 +56,8 @@
>  #define __io_pbw()	RISCV_FENCE(iow, o)
>  #define __io_paw()	RISCV_FENCE(o, io)
>
> +#define PCI_IO_ADDR(addr)	((void __iomem *)((unsigned long)PCI_IOBASE + (addr)))
> +
>  /*
>   * Accesses from a single hart to a single I/O address must be ordered.  This
>   * allows us to use the raw read macros, but we still need to fence before and

Arnd Bergmann raised concerns about this approach in his review of v4
(https://lore.kernel.org/all/3720c3a4-cc0d-4ba8-97ae-35def0189e2e@app.fastmail.com/).

He points out that the NULL pointer arithmetic warnings are actually
useful because they indicate real bugs - NULL pointer dereferences that
will crash the kernel if the driver is ever loaded. The proper fix is to
ensure drivers have 'depends on HAS_IOPORT' in their Kconfig entries.

He suggests removing the custom RISC-V PIO accessors (ins*/outs*)
entirely and using the asm-generic versions instead, which already have
the correct #ifdef checks. This would be a cleaner approach than adding
guards to suppress warnings.

[ ... ]

> diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
> index ca5a1ce6f0f8..d799e5ccc943 100644
> --- a/include/asm-generic/io.h
> +++ b/include/asm-generic/io.h
> @@ -1205,8 +1205,12 @@ static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size)
>  #define ioport_map ioport_map
>  static inline void __iomem *ioport_map(unsigned long port, unsigned int nr)
>  {
> +#ifdef CONFIG_HAS_IOPORT
>  	port &= IO_SPACE_LIMIT;
>  	return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
> +#else
> +	return NULL;
> +#endif
>  }
>  #define ARCH_HAS_GENERIC_IOPORT_MAP
>  #endif

Is this change necessary? The function definition here is already
guarded by the #ifdef CONFIG_HAS_IOPORT_MAP check, which should prevent
it from being visible when CONFIG_HAS_IOPORT is disabled.


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25361792504

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 2/3] riscv: introduce percpu.h into include/asm
  2026-05-05  6:20 ` [PATCH v4 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
  2026-05-05  7:05   ` bot+bpf-ci
@ 2026-05-05  7:26   ` sashiko-bot
  1 sibling, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-05  7:26 UTC (permalink / raw)
  To: Yunhui Cui; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [Critical] The 8-bit and 16-bit `this_cpu_add` fallback logic fails to mask arithmetic overflow, corrupting adjacent percpu variables.
- [High] The 32-bit and 64-bit `this_cpu_and` macros incorrectly perform an AND-NOT operation instead of a bitwise AND.
- [High] Missing early-clobber modifiers (`=&r`) on output operands in `PERCPU_8_16_RET_OP` allow destructive register overlapping.
--

commit f4a490dc0b24e17cbd874e03d02e506d017e61b6
Author: Yunhui Cui <cuiyunhui@bytedance.com>

riscv: introduce percpu.h into include/asm

This commit introduces a RISC-V percpu header to implement direct this_cpu
operations. It provides macros for 8, 16, 32, and 64-bit read, write, and
atomic memory operations.

> diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
> new file mode 100644
> index 0000000000000..3b26fe45e70f4
> --- /dev/null
> +++ b/arch/riscv/include/asm/percpu.h

[ ... ]

> +#define PERCPU_8_16_OP(name, amo_insn, sz, sfx, val_type, new_val_expr, asm_op)			\
> +static inline void __percpu_##name##_amo_case_##sz(void *ptr, unsigned long val)		\
> +{												\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&						\
> +		riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) {				\
> +		asm volatile ("amo" #amo_insn #sfx " zero, %[val], %[ptr]"			\
> +			: [ptr] "+A"(*(val_type *)ptr)						\
> +			: [val] "r"((val_type)((new_val_expr) & PERCPU_8_16_GET_MASK(sz)))	\
> +			: "memory");								\
> +	} else {										\
> +		u32 *ptr32 = PERCPU_8_16_GET_PTR32(ptr);					\
> +		const unsigned long shift = PERCPU_8_16_GET_SHIFT(ptr);				\
> +		const u32 mask = PERCPU_8_16_GET_MASK(sz) << shift;				\
> +		const val_type val_trunc = (val_type)((new_val_expr)				\
> +					   & PERCPU_8_16_GET_MASK(sz));				\
> +		u32 retx, rc;									\
> +		val_type new_val_type;								\
> +												\
> +		asm volatile (									\
> +			"0: lr.w %0, %2\n"							\
> +			"and %3, %0, %4\n"							\
> +			"srl %3, %3, %5\n"							\
> +			#asm_op " %3, %3, %6\n"							\

Is it possible for an 8-bit or 16-bit addition to overflow here? If there are
carry bits outside the subword boundary, they aren't masked out before the
result is shifted left.

Could this silent overflow spill into and corrupt adjacent per-CPU variables?

> +			"sll %3, %3, %5\n"							\
> +			"and %1, %0, %7\n"							\
> +			"or %1, %1, %3\n"							\
> +			"sc.w %1, %1, %2\n"							\
> +			"bnez %1, 0b\n"								\
> +			: "=&r"(retx), "=&r"(rc), "+A"(*ptr32), "=&r"(new_val_type)		\
> +			: "r"(mask), "r"(shift), "r"(val_trunc), "r"(~mask)			\
> +			: "memory");								\
> +		}										\
> +}

[ ... ]

> +#define PERCPU_8_16_RET_OP(name, amo_insn, sz, sfx, val_type, new_val_expr)			\
> +static inline val_type __percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long val)	\
> +{												\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&						\
> +		riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) {				\

[ ... ]

> +	} else {										\
> +		u32 *ptr32 = PERCPU_8_16_GET_PTR32(ptr);					\
> +		const unsigned long shift = PERCPU_8_16_GET_SHIFT(ptr);				\
> +		const u32 mask = (PERCPU_8_16_GET_MASK(sz) << shift);				\
> +		const u32 inv_mask = ~mask;							\
> +		const val_type val_trunc = (val_type)((new_val_expr)				\
> +					   & PERCPU_8_16_GET_MASK(sz));				\
> +		u32 old, new, tmp;								\
> +												\
> +		asm volatile (									\
> +			"0: lr.w %0, %3\n"							\
> +			"and %1, %0, %4\n"							\
> +			"srl %1, %1, %5\n"							\
> +			"add %1, %1, %6\n"							\
> +			"and %1, %1, %7\n"							\
> +			"sll %1, %1, %5\n"							\
> +			"and %2, %0, %8\n"							\
> +			"or %2, %2, %1\n"							\
> +			"sc.w %2, %2, %3\n"							\
> +			"bnez %2, 0b\n"								\
> +			: "=r"(old), "=r"(tmp), "=&r"(new), "+A"(*ptr32)			\

Should the old and tmp output operands use the early-clobber modifier (=&r)?

Since they are written to early in the instruction sequence before inputs
like mask (%4), shift (%5), or val_trunc (%6) are consumed, could the compiler
allocate them to the same physical registers, leading to clobbered inputs?

> +			: "r"(mask), "r"(shift), "r"(val_trunc), "r"(PERCPU_8_16_GET_MASK(sz)), \
> +			"r"(inv_mask)								\
> +			: "memory");								\
> +		return (val_type)(tmp >> shift);						\
> +	}											\
> +}

[ ... ]

> +#define this_cpu_and_1(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_8, pcp, ~(val))
> +#define this_cpu_and_2(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_16, pcp, ~(val))
> +#define this_cpu_and_4(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_32, pcp, ~(val))
> +
> +#ifdef CONFIG_64BIT
> +#define this_cpu_and_8(pcp, val)	_pcp_protect(__percpu_andnot_amo_case_64, pcp, ~(val))
> +#endif

Since the underlying __percpu_andnot_amo_case_32 and 64 macros directly use the
amoand instruction on the value passed, does negating val here with ~(val)
cause the hardware to perform an AND-NOT operation (pcp &= ~val) instead of a
bitwise AND (pcp &= val)?

> +
> +#define this_cpu_or_1(pcp, val)	_pcp_protect(__percpu_or_amo_case_8, pcp, val)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260505062026.91724-1-cuiyunhui@bytedance.com?part=2

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 3/3] riscv: store percpu offset into thread_info
  2026-05-05  6:20 ` [PATCH v4 3/3] riscv: store percpu offset into thread_info Yunhui Cui
  2026-05-05  7:20   ` bot+bpf-ci
@ 2026-05-05  8:11   ` sashiko-bot
  1 sibling, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-05  8:11 UTC (permalink / raw)
  To: Yunhui Cui; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [High] Secondary CPUs run with the boot CPU's `pcpu_offset` during early boot, leading to concurrent lockless access to per-CPU data.
--

commit 9630a483e2d689234bf576230edd1da2035ea88b
Author: Yunhui Cui <cuiyunhui@bytedance.com>

riscv: store percpu offset into thread_info

This commit moves the percpu offset calculation to be directly stored in
and retrieved from the thread_info structure. This speeds up percpu variable
reads and writes by reducing the number of access instructions.

Link: https://lists.riscv.org/g/tech-privileged/topic/risc_v_tech_arch_review/113437553?page=2

> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index d85916a3660c3..9e95c068b966b 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c

[ ... ]

> @@ -234,6 +239,8 @@ asmlinkage __visible void smp_callin(void)
>  	mmgrab(mm);
>  	current->active_mm = mm;
>  
> +	__my_cpu_offset = per_cpu_offset(smp_processor_id());
> +

Could this lead to concurrent lockless access to the boot CPU's per-CPU
data during early boot of secondary CPUs?

When the idle task for a secondary CPU is created via fork_idle(), it
copies the parent's task_struct and thread_info. As a result, the new
idle task inherits the parent's (the boot CPU's) pcpu_offset.

When the secondary CPU boots, it sets the tp register to this idle task
and begins executing C code without going through switch_to(). This
means any per-CPU access prior to the reassignment here in smp_callin()
will incorrectly access the boot CPU's per-CPU area.

Before this patch, __my_cpu_offset resolved dynamically using
thread_info->cpu, which was already correctly initialized on the boot
CPU via __set_task_cpu(), ensuring safe early access.

Now, for example, ftrace/kprobe hooks on function prologues, lockdep
checks, or mmgrab() itself could trigger a per-CPU access before this
assignment, leading to data races and potential memory corruption.

Would it be safer to initialize task_thread_info(tidle)->pcpu_offset =
per_cpu_offset(cpu) from the boot CPU in __cpu_up() prior to starting
the secondary CPU?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260505062026.91724-1-cuiyunhui@bytedance.com?part=3

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-05-05  8:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2026-05-05  6:20 [PATCH v4 0/3] riscv: improve percpu helpers and PIO mapping Yunhui Cui
2026-05-05  6:20 ` [PATCH v4 1/3] riscv: io: avoid null-pointer arithmetic in PIO helpers Yunhui Cui
2026-05-05  6:33   ` Arnd Bergmann
2026-05-05  7:20   ` bot+bpf-ci
2026-05-05  6:20 ` [PATCH v4 2/3] riscv: introduce percpu.h into include/asm Yunhui Cui
2026-05-05  7:05   ` bot+bpf-ci
2026-05-05  7:26   ` sashiko-bot
2026-05-05  6:20 ` [PATCH v4 3/3] riscv: store percpu offset into thread_info Yunhui Cui
2026-05-05  7:20   ` bot+bpf-ci
2026-05-05  8:11   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox