* [RFC PATCH 00/11] Add a percpu subsection for hot data
@ 2025-02-22 19:06 Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 01/11] percpu: Introduce percpu hot section Brian Gerst
                   ` (11 more replies)
  0 siblings, 12 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

Add a new percpu subsection for data that is frequently accessed and
exclusive to each processor.  This is intended to replace the pcpu_hot
struct on X86, and is available to all architectures.

The one caveat with this approach is that it depends on the linker to
efficiently pack data that is smaller than the machine word size.  The
binutils linker does this properly:

ffffffff842f6000 D __per_cpu_hot_start
ffffffff842f6000 D softirq_pending
ffffffff842f6002 D hardirq_stack_inuse
ffffffff842f6008 D hardirq_stack_ptr
ffffffff842f6010 D __ref_stack_chk_guard
ffffffff842f6010 D __stack_chk_guard
ffffffff842f6018 D const_cpu_current_top_of_stack
ffffffff842f6018 D cpu_current_top_of_stack
ffffffff842f6020 D const_current_task
ffffffff842f6020 D current_task
ffffffff842f6028 D __preempt_count
ffffffff842f602c D cpu_number
ffffffff842f6030 D this_cpu_off
ffffffff842f6038 D __x86_call_depth
ffffffff842f6040 D __per_cpu_hot_end

The LLVM linker doesn't do as well with packing smaller data objects,
causing the hot section to spill over into a second cacheline.
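
Why packing matters can be sketched with a small user-space model. It is not the placement algorithm of either linker: it simply lays objects out in a given input order, rounding each start offset up to the object's alignment. The sizes and alignments (u16 = 2, bool = 1, int = 4, pointers, unsigned long and u64 = 8) are assumptions for x86-64. The first table follows the order of the binutils listing above; the second is a hypothetical interleaving, chosen only to show how alignment padding can push the same objects past 64 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* One per-CPU object as this model sees it: size and required alignment. */
struct hot_obj {
	const char *name;
	size_t size;
	size_t align;	/* must be a power of two */
};

/* Round off up to the next multiple of align. */
static size_t align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/*
 * Place objects back to back in the given order, padding only as needed
 * for alignment.  Stores each start offset in offs[] (if non-NULL) and
 * returns the end offset, i.e. the bytes the section occupies.
 */
static size_t pack_in_order(const struct hot_obj *objs, size_t n, size_t *offs)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		off = align_up(off, objs[i].align);
		if (offs)
			offs[i] = off;
		off += objs[i].size;
	}
	return off;
}

/* Order taken from the binutils symbol listing (x86-64 sizes assumed). */
static const struct hot_obj binutils_order[] = {
	{ "softirq_pending",          2, 2 },
	{ "hardirq_stack_inuse",      1, 1 },
	{ "hardirq_stack_ptr",        8, 8 },
	{ "__stack_chk_guard",        8, 8 },
	{ "cpu_current_top_of_stack", 8, 8 },
	{ "current_task",             8, 8 },
	{ "__preempt_count",          4, 4 },
	{ "cpu_number",               4, 4 },
	{ "this_cpu_off",             8, 8 },
	{ "__x86_call_depth",         8, 8 },
};

/* Hypothetical order interleaving small and word-sized objects. */
static const struct hot_obj interleaved_order[] = {
	{ "softirq_pending",          2, 2 },
	{ "hardirq_stack_ptr",        8, 8 },
	{ "hardirq_stack_inuse",      1, 1 },
	{ "__stack_chk_guard",        8, 8 },
	{ "cpu_current_top_of_stack", 8, 8 },
	{ "current_task",             8, 8 },
	{ "__preempt_count",          4, 4 },
	{ "this_cpu_off",             8, 8 },
	{ "cpu_number",               4, 4 },
	{ "__x86_call_depth",         8, 8 },
};
```

Run on the first table, pack_in_order() returns 64 and reproduces each offset in the listing (for example cpu_number at 0x2c and __x86_call_depth at 0x38). The interleaved table needs 80 bytes, which no longer fits in a single 64-byte cacheline.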

Brian Gerst (11):
  percpu: Introduce percpu hot section
  x86/preempt: Move preempt count to percpu hot section
  x86/smp: Move cpu number to percpu hot section
  x86/retbleed: Move call depth to percpu hot section
  x86/percpu: Move top_of_stack to percpu hot section
  x86/percpu: Move current_task to percpu hot section
  x86/softirq: Move softirq_pending to percpu hot section
  x86/irq: Move irq stacks to percpu hot section
  x86/percpu: Remove pcpu_hot
  x86/stackprotector: Move __stack_chk_guard to percpu hot section
  x86/smp: Move this_cpu_off to percpu hot section

 arch/x86/entry/entry_32.S             |  4 +--
 arch/x86/entry/entry_64.S             |  6 ++---
 arch/x86/entry/entry_64_compat.S      |  4 +--
 arch/x86/include/asm/current.h        | 35 ++++-----------------------
 arch/x86/include/asm/hardirq.h        |  3 ++-
 arch/x86/include/asm/irq_stack.h      | 12 ++++-----
 arch/x86/include/asm/nospec-branch.h  | 10 +++++---
 arch/x86/include/asm/percpu.h         |  4 +--
 arch/x86/include/asm/preempt.h        | 25 ++++++++++---------
 arch/x86/include/asm/processor.h      | 15 ++++++++++--
 arch/x86/include/asm/smp.h            |  7 +++---
 arch/x86/include/asm/stackprotector.h |  2 +-
 arch/x86/kernel/asm-offsets.c         |  5 ----
 arch/x86/kernel/callthunks.c          |  3 +++
 arch/x86/kernel/cpu/common.c          | 17 +++++++------
 arch/x86/kernel/dumpstack_32.c        |  4 +--
 arch/x86/kernel/dumpstack_64.c        |  2 +-
 arch/x86/kernel/head_64.S             |  4 +--
 arch/x86/kernel/irq.c                 |  8 ++++++
 arch/x86/kernel/irq_32.c              | 12 +++++----
 arch/x86/kernel/irq_64.c              |  6 ++---
 arch/x86/kernel/process_32.c          |  6 ++---
 arch/x86/kernel/process_64.c          |  6 ++---
 arch/x86/kernel/setup_percpu.c        |  7 ++++--
 arch/x86/kernel/smpboot.c             |  4 +--
 arch/x86/kernel/vmlinux.lds.S         |  5 +++-
 arch/x86/lib/retpoline.S              |  2 +-
 include/asm-generic/vmlinux.lds.h     | 10 ++++++++
 include/linux/percpu-defs.h           | 10 ++++++++
 kernel/bpf/verifier.c                 |  4 +--
 scripts/gdb/linux/cpus.py             |  2 +-
 31 files changed, 135 insertions(+), 109 deletions(-)


base-commit: 01157ddc58dc2fe428ec17dd5a18cc13f134639f
-- 
2.48.1



* [RFC PATCH 01/11] percpu: Introduce percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 02/11] x86/preempt: Move preempt count to " Brian Gerst
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

Add a subsection to the percpu data for frequently accessed variables
that should remain cached on each processor.  These variables should not
be accessed from other processors, to avoid cacheline bouncing.

This is intended to replace the pcpu_hot struct on X86, and open up
similar functionality to other architectures.
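
With CONFIG_SMP, DEFINE_PER_CPU_SECTION() places a variable in a section named PER_CPU_BASE_SECTION (".data..percpu") followed by the given suffix, so DEFINE_PER_CPU_HOT(int, cpu_number) should land in ".data..percpu..hot..cpu_number", which the new "*(.data..percpu..hot.*)" pattern collects between __per_cpu_hot_start and __per_cpu_hot_end. Without SMP the base section is ".data", and the HOT_DATA() patterns pick the variable up instead. The sketch below models only this string pasting and matching in user space; the MODEL_* macros and matches_any() are hypothetical, and POSIX fnmatch() stands in for linker-script wildcard matching, which is similar but not identical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PER_CPU_BASE_SECTION with and without SMP. */
#define MODEL_SMP_BASE_SECTION	".data..percpu"
#define MODEL_UP_BASE_SECTION	".data"

/* Mirrors the suffix used by DECLARE/DEFINE_PER_CPU_HOT(): "..hot.." #name */
#define MODEL_HOT_SUFFIX(name)	"..hot.." #name

/* Adjacent string literals concatenate into one section name. */
#define MODEL_SMP_HOT_SECTION(name)	MODEL_SMP_BASE_SECTION MODEL_HOT_SUFFIX(name)
#define MODEL_UP_HOT_SECTION(name)	MODEL_UP_BASE_SECTION MODEL_HOT_SUFFIX(name)

/* True if an input section name matches any of the wildcard patterns. */
static bool matches_any(const char *section, const char *const *patterns,
			size_t n)
{
	assert(section != NULL && patterns != NULL);
	for (size_t i = 0; i < n; i++)
		if (fnmatch(patterns[i], section, 0) == 0)
			return true;
	return false;
}

/* Patterns from the percpu output section and from HOT_DATA(). */
static const char *const percpu_hot_patterns[] = {
	".data..percpu..hot", ".data..percpu..hot.*",
};
static const char *const hot_data_patterns[] = {
	".data..hot", ".data..hot.*",
};
```

In this model the SMP name is matched only by the percpu hot patterns, not by HOT_DATA(), and read-mostly percpu data is not swept into the hot range.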

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/kernel/vmlinux.lds.S     |  2 ++
 include/asm-generic/vmlinux.lds.h | 10 ++++++++++
 include/linux/percpu-defs.h       | 10 ++++++++++
 3 files changed, 22 insertions(+)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1769a7126224..049485513f3c 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -187,6 +187,8 @@ SECTIONS
 
 		PAGE_ALIGNED_DATA(PAGE_SIZE)
 
+		HOT_DATA(L1_CACHE_BYTES)
+
 		CACHELINE_ALIGNED_DATA(L1_CACHE_BYTES)
 
 		DATA_DATA
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 92fc06f7da74..aaa83ec3afe4 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -385,6 +385,11 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 	. = ALIGN(PAGE_SIZE);						\
 	__nosave_end = .;
 
+#define HOT_DATA(page_align)						\
+	. = ALIGN(page_align);						\
+	*(.data..hot) *(.data..hot.*)					\
+	. = ALIGN(page_align);
+
 #define PAGE_ALIGNED_DATA(page_align)					\
 	. = ALIGN(page_align);						\
 	*(.data..page_aligned)						\
@@ -1065,6 +1070,10 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 	. = ALIGN(PAGE_SIZE);						\
 	*(.data..percpu..page_aligned)					\
 	. = ALIGN(cacheline);						\
+	__per_cpu_hot_start = .;					\
+	*(.data..percpu..hot) *(.data..percpu..hot.*)			\
+	__per_cpu_hot_end = .;						\
+	. = ALIGN(cacheline);						\
 	*(.data..percpu..read_mostly)					\
 	. = ALIGN(cacheline);						\
 	*(.data..percpu)						\
@@ -1112,6 +1121,7 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 		INIT_TASK_DATA(inittask)				\
 		NOSAVE_DATA						\
 		PAGE_ALIGNED_DATA(pagealigned)				\
+		HOT_DATA(cacheline)					\
 		CACHELINE_ALIGNED_DATA(cacheline)			\
 		READ_MOSTLY_DATA(cacheline)				\
 		DATA_DATA						\
diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
index 40d34e032d5b..7db773ae5b52 100644
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -112,6 +112,16 @@
 #define DEFINE_PER_CPU(type, name)					\
 	DEFINE_PER_CPU_SECTION(type, name, "")
 
+/*
+ * Declaration/definition used for per-CPU variables that are frequently
+ * accessed and should be in a single cacheline.
+ */
+#define DECLARE_PER_CPU_HOT(type, name)					\
+	DECLARE_PER_CPU_SECTION(type, name, "..hot.." #name)
+
+#define DEFINE_PER_CPU_HOT(type, name)					\
+	DEFINE_PER_CPU_SECTION(type, name, "..hot.." #name)
+
 /*
  * Declaration/definition used for per-CPU variables that must be cacheline
  * aligned under SMP conditions so that, whilst a particular instance of the
-- 
2.48.1



* [RFC PATCH 02/11] x86/preempt: Move preempt count to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 01/11] percpu: Introduce percpu hot section Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-23 10:05   ` kernel test robot
                     ` (2 more replies)
  2025-02-22 19:06 ` [RFC PATCH 03/11] x86/smp: Move cpu number " Brian Gerst
                   ` (9 subsequent siblings)
  11 siblings, 3 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/current.h |  1 -
 arch/x86/include/asm/preempt.h | 25 +++++++++++++------------
 arch/x86/kernel/cpu/common.c   |  4 +++-
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index bf5953883ec3..9a2fe2fd7d74 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -16,7 +16,6 @@ struct pcpu_hot {
 	union {
 		struct {
 			struct task_struct	*current_task;
-			int			preempt_count;
 			int			cpu_number;
 #ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
 			u64			call_depth;
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 919909d8cb77..2f3a40cbdd76 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -4,10 +4,11 @@
 
 #include <asm/rmwcc.h>
 #include <asm/percpu.h>
-#include <asm/current.h>
 
 #include <linux/static_call_types.h>
 
+DECLARE_PER_CPU_HOT(int, __preempt_count);
+
 /* We use the MSB mostly because its available */
 #define PREEMPT_NEED_RESCHED	0x80000000
 
@@ -23,18 +24,18 @@
  */
 static __always_inline int preempt_count(void)
 {
-	return raw_cpu_read_4(pcpu_hot.preempt_count) & ~PREEMPT_NEED_RESCHED;
+	return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
 }
 
 static __always_inline void preempt_count_set(int pc)
 {
 	int old, new;
 
-	old = raw_cpu_read_4(pcpu_hot.preempt_count);
+	old = raw_cpu_read_4(__preempt_count);
 	do {
 		new = (old & PREEMPT_NEED_RESCHED) |
 			(pc & ~PREEMPT_NEED_RESCHED);
-	} while (!raw_cpu_try_cmpxchg_4(pcpu_hot.preempt_count, &old, new));
+	} while (!raw_cpu_try_cmpxchg_4(__preempt_count, &old, new));
 }
 
 /*
@@ -43,7 +44,7 @@ static __always_inline void preempt_count_set(int pc)
 #define init_task_preempt_count(p) do { } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
-	per_cpu(pcpu_hot.preempt_count, (cpu)) = PREEMPT_DISABLED; \
+	per_cpu(__preempt_count, (cpu)) = PREEMPT_DISABLED; \
 } while (0)
 
 /*
@@ -57,17 +58,17 @@ static __always_inline void preempt_count_set(int pc)
 
 static __always_inline void set_preempt_need_resched(void)
 {
-	raw_cpu_and_4(pcpu_hot.preempt_count, ~PREEMPT_NEED_RESCHED);
+	raw_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
 }
 
 static __always_inline void clear_preempt_need_resched(void)
 {
-	raw_cpu_or_4(pcpu_hot.preempt_count, PREEMPT_NEED_RESCHED);
+	raw_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
 }
 
 static __always_inline bool test_preempt_need_resched(void)
 {
-	return !(raw_cpu_read_4(pcpu_hot.preempt_count) & PREEMPT_NEED_RESCHED);
+	return !(raw_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
 }
 
 /*
@@ -76,12 +77,12 @@ static __always_inline bool test_preempt_need_resched(void)
 
 static __always_inline void __preempt_count_add(int val)
 {
-	raw_cpu_add_4(pcpu_hot.preempt_count, val);
+	raw_cpu_add_4(__preempt_count, val);
 }
 
 static __always_inline void __preempt_count_sub(int val)
 {
-	raw_cpu_add_4(pcpu_hot.preempt_count, -val);
+	raw_cpu_add_4(__preempt_count, -val);
 }
 
 /*
@@ -91,7 +92,7 @@ static __always_inline void __preempt_count_sub(int val)
  */
 static __always_inline bool __preempt_count_dec_and_test(void)
 {
-	return GEN_UNARY_RMWcc("decl", __my_cpu_var(pcpu_hot.preempt_count), e,
+	return GEN_UNARY_RMWcc("decl", __my_cpu_var(__preempt_count), e,
 			       __percpu_arg([var]));
 }
 
@@ -100,7 +101,7 @@ static __always_inline bool __preempt_count_dec_and_test(void)
  */
 static __always_inline bool should_resched(int preempt_offset)
 {
-	return unlikely(raw_cpu_read_4(pcpu_hot.preempt_count) == preempt_offset);
+	return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
 }
 
 #ifdef CONFIG_PREEMPTION
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8b49b1338f76..519e2ec2027d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2016,12 +2016,14 @@ __setup("clearcpuid=", setup_clearcpuid);
 
 DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot) = {
 	.current_task	= &init_task,
-	.preempt_count	= INIT_PREEMPT_COUNT,
 	.top_of_stack	= TOP_OF_INIT_STACK,
 };
 EXPORT_PER_CPU_SYMBOL(pcpu_hot);
 EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
 
+DEFINE_PER_CPU_HOT(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
+
 #ifdef CONFIG_X86_64
 static void wrmsrl_cstar(unsigned long val)
 {
-- 
2.48.1
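
The accessors converted in this patch keep the inverted PREEMPT_NEED_RESCHED convention: the MSB is cleared when a reschedule is needed, so the single decl in __preempt_count_dec_and_test() reaches zero only when the count drops to zero and a reschedule is pending. A minimal user-space model follows, assuming a plain global in place of the per-CPU __preempt_count and ordinary C operators in place of the raw_cpu_*() helpers; all model_* names are hypothetical and per-CPU addressing is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit as PREEMPT_NEED_RESCHED; set means "no reschedule needed". */
#define MODEL_NEED_RESCHED	0x80000000u

/* Hypothetical plain global standing in for per-CPU __preempt_count. */
static uint32_t model_preempt_count = MODEL_NEED_RESCHED;

/* Like preempt_count(): the count without the folded flag. */
static int model_preempt_count_get(void)
{
	return (int)(model_preempt_count & ~MODEL_NEED_RESCHED);
}

static void model_set_need_resched(void)
{
	model_preempt_count &= ~MODEL_NEED_RESCHED;	/* cleared = pending */
}

static void model_clear_need_resched(void)
{
	model_preempt_count |= MODEL_NEED_RESCHED;
}

static bool model_test_need_resched(void)
{
	return !(model_preempt_count & MODEL_NEED_RESCHED);
}

/* Unsigned arithmetic wraps the same way the kernel's int does. */
static void model_preempt_count_add(int val)
{
	model_preempt_count += (uint32_t)val;
}

/* True only when the count hits 0 and a reschedule is pending. */
static bool model_preempt_count_dec_and_test(void)
{
	return --model_preempt_count == 0;
}
```

Folding the flag in with inverted sense makes "preemptible and reschedule pending" exactly the all-zero value, which is what lets one decrement double as both tests.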



* [RFC PATCH 03/11] x86/smp: Move cpu number to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 01/11] percpu: Introduce percpu hot section Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 02/11] x86/preempt: Move preempt count to " Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 04/11] x86/retbleed: Move call depth " Brian Gerst
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/current.h | 1 -
 arch/x86/include/asm/smp.h     | 7 ++++---
 arch/x86/kernel/setup_percpu.c | 5 ++++-
 kernel/bpf/verifier.c          | 4 ++--
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index 9a2fe2fd7d74..119099431c76 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -16,7 +16,6 @@ struct pcpu_hot {
 	union {
 		struct {
 			struct task_struct	*current_task;
-			int			cpu_number;
 #ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
 			u64			call_depth;
 #endif
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ca073f40698f..a249f9991708 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -5,9 +5,10 @@
 #include <linux/cpumask.h>
 
 #include <asm/cpumask.h>
-#include <asm/current.h>
 #include <asm/thread_info.h>
 
+DECLARE_PER_CPU_HOT(int, cpu_number);
+
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
@@ -133,8 +134,8 @@ __visible void smp_call_function_single_interrupt(struct pt_regs *r);
  * This function is needed by all SMP systems. It must _always_ be valid
  * from the initial startup.
  */
-#define raw_smp_processor_id()  this_cpu_read(pcpu_hot.cpu_number)
-#define __smp_processor_id() __this_cpu_read(pcpu_hot.cpu_number)
+#define raw_smp_processor_id()  this_cpu_read(cpu_number)
+#define __smp_processor_id() __this_cpu_read(cpu_number)
 
 #ifdef CONFIG_X86_32
 extern int safe_smp_processor_id(void);
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 1e7be9409aa2..0ea3443433c5 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -23,6 +23,9 @@
 #include <asm/cpumask.h>
 #include <asm/cpu.h>
 
+DEFINE_PER_CPU_HOT(int, cpu_number);
+EXPORT_PER_CPU_SYMBOL(cpu_number);
+
 DEFINE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off);
 EXPORT_PER_CPU_SYMBOL(this_cpu_off);
 
@@ -161,7 +164,7 @@ void __init setup_per_cpu_areas(void)
 	for_each_possible_cpu(cpu) {
 		per_cpu_offset(cpu) = delta + pcpu_unit_offsets[cpu];
 		per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
-		per_cpu(pcpu_hot.cpu_number, cpu) = cpu;
+		per_cpu(cpu_number, cpu) = cpu;
 		setup_percpu_segment(cpu);
 		/*
 		 * Copy data used in early init routines from the
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9971c03adfd5..604134d33282 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21687,12 +21687,12 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 		if (insn->imm == BPF_FUNC_get_smp_processor_id &&
 		    verifier_inlines_helper_call(env, insn->imm)) {
 			/* BPF_FUNC_get_smp_processor_id inlining is an
-			 * optimization, so if pcpu_hot.cpu_number is ever
+			 * optimization, so if cpu_number is ever
 			 * changed in some incompatible and hard to support
 			 * way, it's fine to back out this inlining logic
 			 */
 #ifdef CONFIG_SMP
-			insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
+			insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&cpu_number);
 			insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
 			insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
 			cnt = 3;
-- 
2.48.1



* [RFC PATCH 04/11] x86/retbleed: Move call depth to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (2 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 03/11] x86/smp: Move cpu number " Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 05/11] x86/percpu: Move top_of_stack " Brian Gerst
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/current.h       |  3 ---
 arch/x86/include/asm/nospec-branch.h | 10 ++++++----
 arch/x86/kernel/asm-offsets.c        |  3 ---
 arch/x86/kernel/callthunks.c         |  3 +++
 arch/x86/lib/retpoline.S             |  2 +-
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index 119099431c76..fbc7eb92adb2 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -16,9 +16,6 @@ struct pcpu_hot {
 	union {
 		struct {
 			struct task_struct	*current_task;
-#ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
-			u64			call_depth;
-#endif
 			unsigned long		top_of_stack;
 			void			*hardirq_stack_ptr;
 			u16			softirq_pending;
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 7e8bf78c03d5..8441d1da5382 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -78,21 +78,21 @@
 #include <asm/asm-offsets.h>
 
 #define CREDIT_CALL_DEPTH					\
-	movq	$-1, PER_CPU_VAR(pcpu_hot + X86_call_depth);
+	movq	$-1, PER_CPU_VAR(__x86_call_depth);
 
 #define RESET_CALL_DEPTH					\
 	xor	%eax, %eax;					\
 	bts	$63, %rax;					\
-	movq	%rax, PER_CPU_VAR(pcpu_hot + X86_call_depth);
+	movq	%rax, PER_CPU_VAR(__x86_call_depth);
 
 #define RESET_CALL_DEPTH_FROM_CALL				\
 	movb	$0xfc, %al;					\
 	shl	$56, %rax;					\
-	movq	%rax, PER_CPU_VAR(pcpu_hot + X86_call_depth);	\
+	movq	%rax, PER_CPU_VAR(__x86_call_depth);		\
 	CALL_THUNKS_DEBUG_INC_CALLS
 
 #define INCREMENT_CALL_DEPTH					\
-	sarq	$5, PER_CPU_VAR(pcpu_hot + X86_call_depth);	\
+	sarq	$5, PER_CPU_VAR(__x86_call_depth);		\
 	CALL_THUNKS_DEBUG_INC_CALLS
 
 #else
@@ -388,6 +388,8 @@ extern void call_depth_return_thunk(void);
 		    __stringify(INCREMENT_CALL_DEPTH),		\
 		    X86_FEATURE_CALL_DEPTH)
 
+DECLARE_PER_CPU_HOT(u64, __x86_call_depth);
+
 #ifdef CONFIG_CALL_THUNKS_DEBUG
 DECLARE_PER_CPU(u64, __x86_call_count);
 DECLARE_PER_CPU(u64, __x86_ret_count);
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index a98020bf31bb..6fae88f8ae1e 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -109,9 +109,6 @@ static void __used common(void)
 	OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
 	OFFSET(X86_top_of_stack, pcpu_hot, top_of_stack);
 	OFFSET(X86_current_task, pcpu_hot, current_task);
-#ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
-	OFFSET(X86_call_depth, pcpu_hot, call_depth);
-#endif
 #if IS_ENABLED(CONFIG_CRYPTO_ARIA_AESNI_AVX_X86_64)
 	/* Offset for fields in aria_ctx */
 	BLANK();
diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
index 8418a892d195..7032d4caf242 100644
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -39,6 +39,9 @@ static int __init debug_thunks(char *str)
 }
 __setup("debug-callthunks", debug_thunks);
 
+DEFINE_PER_CPU_HOT(u64, __x86_call_depth);
+EXPORT_PER_CPU_SYMBOL(__x86_call_depth);
+
 #ifdef CONFIG_CALL_THUNKS_DEBUG
 DEFINE_PER_CPU(u64, __x86_call_count);
 DEFINE_PER_CPU(u64, __x86_ret_count);
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 391059b2c6fb..04502e843de0 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -342,7 +342,7 @@ SYM_FUNC_START(call_depth_return_thunk)
 	 * case.
 	 */
 	CALL_THUNKS_DEBUG_INC_RETS
-	shlq	$5, PER_CPU_VAR(pcpu_hot + X86_call_depth)
+	shlq	$5, PER_CPU_VAR(__x86_call_depth)
 	jz	1f
 	ANNOTATE_UNRET_SAFE
 	ret
-- 
2.48.1
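
The macros touched above treat __x86_call_depth as a shift register: RESET_CALL_DEPTH stores bit 63, each accounted call does an arithmetic right shift by 5 (sarq $5), and call_depth_return_thunk shifts left by 5 (shlq $5) and takes the jz 1f slow path once the value becomes zero (upstream, that path refills the return stack buffer; it is not shown in this hunk). The C model below mirrors only that arithmetic; the model_* names and sar5() are hypothetical, and the slow path is represented as a boolean result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU __x86_call_depth. */
static uint64_t model_call_depth;

/* sarq $5: shift right, replicating bit 63 into the 5 vacated bits. */
static uint64_t sar5(uint64_t v)
{
	uint64_t fill = (v >> 63) ? ~(UINT64_MAX >> 5) : 0;

	return (v >> 5) | fill;
}

static void model_reset_call_depth(void)	/* RESET_CALL_DEPTH */
{
	model_call_depth = UINT64_C(1) << 63;
}

static void model_credit_call_depth(void)	/* CREDIT_CALL_DEPTH */
{
	model_call_depth = UINT64_MAX;		/* movq $-1 */
}

static void model_call(void)			/* INCREMENT_CALL_DEPTH */
{
	model_call_depth = sar5(model_call_depth);
}

/* Return thunk: shlq $5, then true if jz would take the slow path. */
static bool model_ret_needs_stuffing(void)
{
	model_call_depth <<= 5;
	return model_call_depth == 0;
}
```

One call after a reset yields 0xFC00000000000000, the same value RESET_CALL_DEPTH_FROM_CALL builds with movb $0xfc, %al; shl $56, %rax. Starting from the all-ones value of CREDIT_CALL_DEPTH, the model allows 12 returns before the 13th takes the slow path, since each return shifts five bits out of a 64-bit word.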



* [RFC PATCH 05/11] x86/percpu: Move top_of_stack to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (3 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 04/11] x86/retbleed: Move call depth " Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 06/11] x86/percpu: Move current_task " Brian Gerst
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/entry/entry_32.S        | 4 ++--
 arch/x86/entry/entry_64.S        | 6 +++---
 arch/x86/entry/entry_64_compat.S | 4 ++--
 arch/x86/include/asm/current.h   | 1 -
 arch/x86/include/asm/percpu.h    | 2 +-
 arch/x86/include/asm/processor.h | 8 ++++++--
 arch/x86/kernel/asm-offsets.c    | 1 -
 arch/x86/kernel/cpu/common.c     | 3 ++-
 arch/x86/kernel/process_32.c     | 4 ++--
 arch/x86/kernel/process_64.c     | 2 +-
 arch/x86/kernel/smpboot.c        | 2 +-
 arch/x86/kernel/vmlinux.lds.S    | 1 +
 12 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 20be5758c2d2..92c0b4a94e0a 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1153,7 +1153,7 @@ SYM_CODE_START(asm_exc_nmi)
 	 * is using the thread stack right now, so it's safe for us to use it.
 	 */
 	movl	%esp, %ebx
-	movl	PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %esp
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	call	exc_nmi
 	movl	%ebx, %esp
 
@@ -1217,7 +1217,7 @@ SYM_CODE_START(rewind_stack_and_make_dead)
 	/* Prevent any naive code from trying to unwind to our caller. */
 	xorl	%ebp, %ebp
 
-	movl	PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %esi
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esi
 	leal	-TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%esi), %esp
 
 	call	make_task_dead
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 33a955aa01d8..9baf32a7a118 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -92,7 +92,7 @@ SYM_CODE_START(entry_SYSCALL_64)
 	/* tss.sp2 is scratch space. */
 	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
-	movq	PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %rsp
+	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 SYM_INNER_LABEL(entry_SYSCALL_64_safe_stack, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
@@ -1166,7 +1166,7 @@ SYM_CODE_START(asm_exc_nmi)
 	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
-	movq	PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %rsp
+	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 	UNWIND_HINT_IRET_REGS base=%rdx offset=8
 	pushq	5*8(%rdx)	/* pt_regs->ss */
 	pushq	4*8(%rdx)	/* pt_regs->rsp */
@@ -1484,7 +1484,7 @@ SYM_CODE_START_NOALIGN(rewind_stack_and_make_dead)
 	/* Prevent any naive code from trying to unwind to our caller. */
 	xorl	%ebp, %ebp
 
-	movq	PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %rax
+	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rax
 	leaq	-PTREGS_SIZE(%rax), %rsp
 	UNWIND_HINT_REGS
 
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index ed0a5f2dc129..a45e1125fc6c 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -57,7 +57,7 @@ SYM_CODE_START(entry_SYSENTER_compat)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 	popq	%rax
 
-	movq	PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %rsp
+	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER_DS		/* pt_regs->ss */
@@ -193,7 +193,7 @@ SYM_CODE_START(entry_SYSCALL_compat)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
 
 	/* Switch to the kernel stack */
-	movq	PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %rsp
+	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 SYM_INNER_LABEL(entry_SYSCALL_compat_safe_stack, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index fbc7eb92adb2..8adbe0e3c5e7 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -16,7 +16,6 @@ struct pcpu_hot {
 	union {
 		struct {
 			struct task_struct	*current_task;
-			unsigned long		top_of_stack;
 			void			*hardirq_stack_ptr;
 			u16			softirq_pending;
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 1a76eb87c5d8..cc19bd785f0e 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -560,7 +560,7 @@ do {									\
  * it is accessed while this_cpu_read_stable() allows the value to be cached.
  * this_cpu_read_stable() is more efficient and can be used if its value
  * is guaranteed to be valid across CPUs.  The current users include
- * pcpu_hot.current_task and pcpu_hot.top_of_stack, both of which are
+ * pcpu_hot.current_task and cpu_current_top_of_stack, both of which are
  * actually per-thread variables implemented as per-CPU variables and
  * thus stable for the duration of the respective task.
  */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b3d153730f63..1505cb1d09a8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -420,6 +420,10 @@ struct irq_stack {
 	char		stack[IRQ_STACK_SIZE];
 } __aligned(IRQ_STACK_SIZE);
 
+DECLARE_PER_CPU_HOT(unsigned long, cpu_current_top_of_stack);
+/* const-qualified alias provided by the linker. */
+DECLARE_PER_CPU_HOT(const unsigned long, const_cpu_current_top_of_stack);
+
 #ifdef CONFIG_X86_64
 static inline unsigned long cpu_kernelmode_gs_base(int cpu)
 {
@@ -545,9 +549,9 @@ static __always_inline unsigned long current_top_of_stack(void)
 	 *  entry trampoline.
 	 */
 	if (IS_ENABLED(CONFIG_USE_X86_SEG_SUPPORT))
-		return this_cpu_read_const(const_pcpu_hot.top_of_stack);
+		return this_cpu_read_const(const_cpu_current_top_of_stack);
 
-	return this_cpu_read_stable(pcpu_hot.top_of_stack);
+	return this_cpu_read_stable(cpu_current_top_of_stack);
 }
 
 static __always_inline bool on_thread_stack(void)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 6fae88f8ae1e..54ace808defd 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -107,7 +107,6 @@ static void __used common(void)
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
-	OFFSET(X86_top_of_stack, pcpu_hot, top_of_stack);
 	OFFSET(X86_current_task, pcpu_hot, current_task);
 #if IS_ENABLED(CONFIG_CRYPTO_ARIA_AESNI_AVX_X86_64)
 	/* Offset for fields in aria_ctx */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 519e2ec2027d..25a5806e15aa 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2016,7 +2016,6 @@ __setup("clearcpuid=", setup_clearcpuid);
 
 DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot) = {
 	.current_task	= &init_task,
-	.top_of_stack	= TOP_OF_INIT_STACK,
 };
 EXPORT_PER_CPU_SYMBOL(pcpu_hot);
 EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
@@ -2024,6 +2023,8 @@ EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
 DEFINE_PER_CPU_HOT(int, __preempt_count) = INIT_PREEMPT_COUNT;
 EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
+DEFINE_PER_CPU_HOT(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
+
 #ifdef CONFIG_X86_64
 static void wrmsrl_cstar(unsigned long val)
 {
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 0917c7f25720..3afb2428bedb 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -190,13 +190,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	arch_end_context_switch(next_p);
 
 	/*
-	 * Reload esp0 and pcpu_hot.top_of_stack.  This changes
+	 * Reload esp0 and cpu_current_top_of_stack.  This changes
 	 * current_thread_info().  Refresh the SYSENTER configuration in
 	 * case prev or next is vm86.
 	 */
 	update_task_stack(next_p);
 	refresh_sysenter_cs(next);
-	this_cpu_write(pcpu_hot.top_of_stack,
+	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 226472332a70..4252b11718f2 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -669,7 +669,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * Switch the PDA and FPU contexts.
 	 */
 	raw_cpu_write(pcpu_hot.current_task, next_p);
-	raw_cpu_write(pcpu_hot.top_of_stack, task_top_of_stack(next_p));
+	raw_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
 	switch_fpu_finish(next_p);
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c10850ae6f09..15e054f4cbf6 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -851,7 +851,7 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
 
 #ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
-	per_cpu(pcpu_hot.top_of_stack, cpu) = task_top_of_stack(idle);
+	per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
 #endif
 	return 0;
 }
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 049485513f3c..ee019c1ea859 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -44,6 +44,7 @@ ENTRY(phys_startup_64)
 
 jiffies = jiffies_64;
 const_pcpu_hot = pcpu_hot;
+const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 
 #if defined(CONFIG_X86_64)
 /*
-- 
2.48.1



* [RFC PATCH 06/11] x86/percpu: Move current_task to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (4 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 05/11] x86/percpu: Move top_of_stack " Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 07/11] x86/softirq: Move softirq_pending " Brian Gerst
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/current.h | 9 ++++++---
 arch/x86/include/asm/percpu.h  | 2 +-
 arch/x86/kernel/asm-offsets.c  | 1 -
 arch/x86/kernel/cpu/common.c   | 8 +++++---
 arch/x86/kernel/head_64.S      | 4 ++--
 arch/x86/kernel/process_32.c   | 2 +-
 arch/x86/kernel/process_64.c   | 2 +-
 arch/x86/kernel/smpboot.c      | 2 +-
 arch/x86/kernel/vmlinux.lds.S  | 1 +
 scripts/gdb/linux/cpus.py      | 2 +-
 10 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index 8adbe0e3c5e7..d51299af6145 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -15,7 +15,6 @@ struct task_struct;
 struct pcpu_hot {
 	union {
 		struct {
-			struct task_struct	*current_task;
 			void			*hardirq_stack_ptr;
 			u16			softirq_pending;
 #ifdef CONFIG_X86_64
@@ -35,12 +34,16 @@ DECLARE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);
 DECLARE_PER_CPU_ALIGNED(const struct pcpu_hot __percpu_seg_override,
 			const_pcpu_hot);
 
+DECLARE_PER_CPU_HOT(struct task_struct *, current_task);
+/* const-qualified alias provided by the linker. */
+DECLARE_PER_CPU_HOT(struct task_struct * const, const_current_task);
+
 static __always_inline struct task_struct *get_current(void)
 {
 	if (IS_ENABLED(CONFIG_USE_X86_SEG_SUPPORT))
-		return this_cpu_read_const(const_pcpu_hot.current_task);
+		return this_cpu_read_const(const_current_task);
 
-	return this_cpu_read_stable(pcpu_hot.current_task);
+	return this_cpu_read_stable(current_task);
 }
 
 #define current get_current()
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index cc19bd785f0e..370778c55091 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -560,7 +560,7 @@ do {									\
  * it is accessed while this_cpu_read_stable() allows the value to be cached.
  * this_cpu_read_stable() is more efficient and can be used if its value
  * is guaranteed to be valid across CPUs.  The current users include
- * pcpu_hot.current_task and cpu_current_top_of_stack, both of which are
+ * current_task and cpu_current_top_of_stack, both of which are
  * actually per-thread variables implemented as per-CPU variables and
  * thus stable for the duration of the respective task.
  */
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 54ace808defd..ad4ea6fb3b6c 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -107,7 +107,6 @@ static void __used common(void)
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
-	OFFSET(X86_current_task, pcpu_hot, current_task);
 #if IS_ENABLED(CONFIG_CRYPTO_ARIA_AESNI_AVX_X86_64)
 	/* Offset for fields in aria_ctx */
 	BLANK();
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 25a5806e15aa..f4ec6bcb2a5e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2014,12 +2014,14 @@ static __init int setup_clearcpuid(char *arg)
 }
 __setup("clearcpuid=", setup_clearcpuid);
 
-DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot) = {
-	.current_task	= &init_task,
-};
+DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);
 EXPORT_PER_CPU_SYMBOL(pcpu_hot);
 EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
 
+DEFINE_PER_CPU_HOT(struct task_struct *, current_task) = &init_task;
+EXPORT_PER_CPU_SYMBOL(current_task);
+EXPORT_PER_CPU_SYMBOL(const_current_task);
+
 DEFINE_PER_CPU_HOT(int, __preempt_count) = INIT_PREEMPT_COUNT;
 EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 2843b0a56198..fefe2a25cf02 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -322,7 +322,7 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
 	 *
 	 * RDX contains the per-cpu offset
 	 */
-	movq	pcpu_hot + X86_current_task(%rdx), %rax
+	movq	current_task(%rdx), %rax
 	movq	TASK_threadsp(%rax), %rsp
 
 	/*
@@ -433,7 +433,7 @@ SYM_CODE_START(soft_restart_cpu)
 	UNWIND_HINT_END_OF_STACK
 
 	/* Find the idle task stack */
-	movq	PER_CPU_VAR(pcpu_hot + X86_current_task), %rcx
+	movq	PER_CPU_VAR(current_task), %rcx
 	movq	TASK_threadsp(%rcx), %rsp
 
 	jmp	.Ljump_to_C_code
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 3afb2428bedb..c276dfda387f 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -206,7 +206,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (prev->gs | next->gs)
 		loadsegment(gs, next->gs);
 
-	raw_cpu_write(pcpu_hot.current_task, next_p);
+	raw_cpu_write(current_task, next_p);
 
 	switch_fpu_finish(next_p);
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4252b11718f2..1517314da34a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -668,7 +668,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Switch the PDA and FPU contexts.
 	 */
-	raw_cpu_write(pcpu_hot.current_task, next_p);
+	raw_cpu_write(current_task, next_p);
 	raw_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
 	switch_fpu_finish(next_p);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 15e054f4cbf6..c89545a61d08 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -841,7 +841,7 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
 
-	per_cpu(pcpu_hot.current_task, cpu) = idle;
+	per_cpu(current_task, cpu) = idle;
 	cpu_init_stack_canary(cpu, idle);
 
 	/* Initialize the interrupt stack(s) */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index ee019c1ea859..3c87bb620434 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -44,6 +44,7 @@ ENTRY(phys_startup_64)
 
 jiffies = jiffies_64;
 const_pcpu_hot = pcpu_hot;
+const_current_task = current_task;
 const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 
 #if defined(CONFIG_X86_64)
diff --git a/scripts/gdb/linux/cpus.py b/scripts/gdb/linux/cpus.py
index 13eb8b3901b8..8f7c4fb78c2c 100644
--- a/scripts/gdb/linux/cpus.py
+++ b/scripts/gdb/linux/cpus.py
@@ -164,7 +164,7 @@ def get_current_task(cpu):
             var_ptr = gdb.parse_and_eval("(struct task_struct *)cpu_tasks[0].task")
             return var_ptr.dereference()
         else:
-            var_ptr = gdb.parse_and_eval("&pcpu_hot.current_task")
+            var_ptr = gdb.parse_and_eval("&current_task")
             return per_cpu(var_ptr, cpu).dereference()
     elif utils.is_target_arch("aarch64"):
         current_task_addr = gdb.parse_and_eval("(unsigned long)$SP_EL0")
-- 
2.48.1



* [RFC PATCH 07/11] x86/softirq: Move softirq_pending to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (5 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 06/11] x86/percpu: Move current_task " Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 08/11] x86/irq: Move irq stacks " Brian Gerst
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/current.h | 1 -
 arch/x86/include/asm/hardirq.h | 3 ++-
 arch/x86/kernel/irq.c          | 3 +++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index d51299af6145..6de46e2ae2b6 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -16,7 +16,6 @@ struct pcpu_hot {
 	union {
 		struct {
 			void			*hardirq_stack_ptr;
-			u16			softirq_pending;
 #ifdef CONFIG_X86_64
 			bool			hardirq_stack_inuse;
 #else
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 6ffa8b75f4cd..fa8ae99d62dd 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -66,7 +66,8 @@ extern u64 arch_irq_stat_cpu(unsigned int cpu);
 extern u64 arch_irq_stat(void);
 #define arch_irq_stat		arch_irq_stat
 
-#define local_softirq_pending_ref       pcpu_hot.softirq_pending
+DECLARE_PER_CPU_HOT(u16, softirq_pending);
+#define local_softirq_pending_ref       softirq_pending
 
 #if IS_ENABLED(CONFIG_KVM_INTEL)
 /*
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 385e3a5fc304..1b51d5c05583 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -31,6 +31,9 @@
 DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
 EXPORT_PER_CPU_SYMBOL(irq_stat);
 
+DEFINE_PER_CPU_HOT(u16, softirq_pending);
+EXPORT_PER_CPU_SYMBOL(softirq_pending);
+
 atomic_t irq_err_count;
 
 /*
-- 
2.48.1



* [RFC PATCH 08/11] x86/irq: Move irq stacks to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (6 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 07/11] x86/softirq: Move softirq_pending " Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 09/11] x86/percpu: Remove pcpu_hot Brian Gerst
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/current.h   | 10 ----------
 arch/x86/include/asm/irq_stack.h | 12 ++++++------
 arch/x86/include/asm/processor.h |  7 +++++++
 arch/x86/kernel/dumpstack_32.c   |  4 ++--
 arch/x86/kernel/dumpstack_64.c   |  2 +-
 arch/x86/kernel/irq.c            |  5 +++++
 arch/x86/kernel/irq_32.c         | 12 +++++++-----
 arch/x86/kernel/irq_64.c         |  6 +++---
 arch/x86/kernel/process_64.c     |  2 +-
 9 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index 6de46e2ae2b6..043888c258bd 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -13,17 +13,7 @@
 struct task_struct;
 
 struct pcpu_hot {
-	union {
-		struct {
-			void			*hardirq_stack_ptr;
-#ifdef CONFIG_X86_64
-			bool			hardirq_stack_inuse;
-#else
-			void			*softirq_stack_ptr;
-#endif
-		};
 		u8	pad[64];
-	};
 };
 static_assert(sizeof(struct pcpu_hot) == 64);
 
diff --git a/arch/x86/include/asm/irq_stack.h b/arch/x86/include/asm/irq_stack.h
index 562a547c29a5..735c3a491f60 100644
--- a/arch/x86/include/asm/irq_stack.h
+++ b/arch/x86/include/asm/irq_stack.h
@@ -116,7 +116,7 @@
 	ASM_CALL_ARG2
 
 #define call_on_irqstack(func, asm_call, argconstr...)			\
-	call_on_stack(__this_cpu_read(pcpu_hot.hardirq_stack_ptr),	\
+	call_on_stack(__this_cpu_read(hardirq_stack_ptr),		\
 		      func, asm_call, argconstr)
 
 /* Macros to assert type correctness for run_*_on_irqstack macros */
@@ -135,7 +135,7 @@
 	 * User mode entry and interrupt on the irq stack do not	\
 	 * switch stacks. If from user mode the task stack is empty.	\
 	 */								\
-	if (user_mode(regs) || __this_cpu_read(pcpu_hot.hardirq_stack_inuse)) { \
+	if (user_mode(regs) || __this_cpu_read(hardirq_stack_inuse)) {	\
 		irq_enter_rcu();					\
 		func(c_args);						\
 		irq_exit_rcu();						\
@@ -146,9 +146,9 @@
 		 * places. Invoke the stack switch macro with the call	\
 		 * sequence which matches the above direct invocation.	\
 		 */							\
-		__this_cpu_write(pcpu_hot.hardirq_stack_inuse, true);	\
+		__this_cpu_write(hardirq_stack_inuse, true);		\
 		call_on_irqstack(func, asm_call, constr);		\
-		__this_cpu_write(pcpu_hot.hardirq_stack_inuse, false);	\
+		__this_cpu_write(hardirq_stack_inuse, false);		\
 	}								\
 }
 
@@ -212,9 +212,9 @@
  */
 #define do_softirq_own_stack()						\
 {									\
-	__this_cpu_write(pcpu_hot.hardirq_stack_inuse, true);		\
+	__this_cpu_write(hardirq_stack_inuse, true);			\
 	call_on_irqstack(__do_softirq, ASM_CALL_ARG0);			\
-	__this_cpu_write(pcpu_hot.hardirq_stack_inuse, false);		\
+	__this_cpu_write(hardirq_stack_inuse, false);			\
 }
 
 #endif
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1505cb1d09a8..9dde4ffed917 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -420,6 +420,13 @@ struct irq_stack {
 	char		stack[IRQ_STACK_SIZE];
 } __aligned(IRQ_STACK_SIZE);
 
+DECLARE_PER_CPU_HOT(struct irq_stack *, hardirq_stack_ptr);
+#ifdef CONFIG_X86_64
+DECLARE_PER_CPU_HOT(bool, hardirq_stack_inuse);
+#else
+DECLARE_PER_CPU_HOT(struct irq_stack *, softirq_stack_ptr);
+#endif
+
 DECLARE_PER_CPU_HOT(unsigned long, cpu_current_top_of_stack);
 /* const-qualified alias provided by the linker. */
 DECLARE_PER_CPU_HOT(const unsigned long, const_cpu_current_top_of_stack);
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index b4905d5173fd..722fd712e1cf 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -37,7 +37,7 @@ const char *stack_type_name(enum stack_type type)
 
 static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin = (unsigned long *)this_cpu_read(pcpu_hot.hardirq_stack_ptr);
+	unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
 	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
 
 	/*
@@ -62,7 +62,7 @@ static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info)
 
 static bool in_softirq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *begin = (unsigned long *)this_cpu_read(pcpu_hot.softirq_stack_ptr);
+	unsigned long *begin = (unsigned long *)this_cpu_read(softirq_stack_ptr);
 	unsigned long *end   = begin + (THREAD_SIZE / sizeof(long));
 
 	/*
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f05339fee778..6c5defd6569a 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -134,7 +134,7 @@ static __always_inline bool in_exception_stack(unsigned long *stack, struct stac
 
 static __always_inline bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
-	unsigned long *end = (unsigned long *)this_cpu_read(pcpu_hot.hardirq_stack_ptr);
+	unsigned long *end = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
 	unsigned long *begin;
 
 	/*
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 1b51d5c05583..262e477b4651 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -34,6 +34,11 @@ EXPORT_PER_CPU_SYMBOL(irq_stat);
 DEFINE_PER_CPU_HOT(u16, softirq_pending);
 EXPORT_PER_CPU_SYMBOL(softirq_pending);
 
+DEFINE_PER_CPU_HOT(struct irq_stack *, hardirq_stack_ptr);
+#ifdef CONFIG_X86_64
+DEFINE_PER_CPU_HOT(bool, hardirq_stack_inuse);
+#endif
+
 atomic_t irq_err_count;
 
 /*
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index dc1049c01f9b..dd7d9ba87bd0 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -52,6 +52,8 @@ static inline int check_stack_overflow(void) { return 0; }
 static inline void print_stack_overflow(void) { }
 #endif
 
+DEFINE_PER_CPU_HOT(struct irq_stack *, softirq_stack_ptr);
+
 static void call_on_stack(void *func, void *stack)
 {
 	asm volatile("xchgl	%%ebx,%%esp	\n"
@@ -74,7 +76,7 @@ static inline int execute_on_irq_stack(int overflow, struct irq_desc *desc)
 	u32 *isp, *prev_esp, arg1;
 
 	curstk = (struct irq_stack *) current_stack();
-	irqstk = __this_cpu_read(pcpu_hot.hardirq_stack_ptr);
+	irqstk = __this_cpu_read(hardirq_stack_ptr);
 
 	/*
 	 * this is where we switch to the IRQ stack. However, if we are
@@ -112,7 +114,7 @@ int irq_init_percpu_irqstack(unsigned int cpu)
 	int node = cpu_to_node(cpu);
 	struct page *ph, *ps;
 
-	if (per_cpu(pcpu_hot.hardirq_stack_ptr, cpu))
+	if (per_cpu(hardirq_stack_ptr, cpu))
 		return 0;
 
 	ph = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
@@ -124,8 +126,8 @@ int irq_init_percpu_irqstack(unsigned int cpu)
 		return -ENOMEM;
 	}
 
-	per_cpu(pcpu_hot.hardirq_stack_ptr, cpu) = page_address(ph);
-	per_cpu(pcpu_hot.softirq_stack_ptr, cpu) = page_address(ps);
+	per_cpu(hardirq_stack_ptr, cpu) = page_address(ph);
+	per_cpu(softirq_stack_ptr, cpu) = page_address(ps);
 	return 0;
 }
 
@@ -135,7 +137,7 @@ void do_softirq_own_stack(void)
 	struct irq_stack *irqstk;
 	u32 *isp, *prev_esp;
 
-	irqstk = __this_cpu_read(pcpu_hot.softirq_stack_ptr);
+	irqstk = __this_cpu_read(softirq_stack_ptr);
 
 	/* build the stack frame on the softirq stack */
 	isp = (u32 *) ((char *)irqstk + sizeof(*irqstk));
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 56bdeecd8ee0..4834e317e568 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -50,7 +50,7 @@ static int map_irq_stack(unsigned int cpu)
 		return -ENOMEM;
 
 	/* Store actual TOS to avoid adjustment in the hotpath */
-	per_cpu(pcpu_hot.hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE - 8;
+	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE - 8;
 	return 0;
 }
 #else
@@ -63,14 +63,14 @@ static int map_irq_stack(unsigned int cpu)
 	void *va = per_cpu_ptr(&irq_stack_backing_store, cpu);
 
 	/* Store actual TOS to avoid adjustment in the hotpath */
-	per_cpu(pcpu_hot.hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE - 8;
+	per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE - 8;
 	return 0;
 }
 #endif
 
 int irq_init_percpu_irqstack(unsigned int cpu)
 {
-	if (per_cpu(pcpu_hot.hardirq_stack_ptr, cpu))
+	if (per_cpu(hardirq_stack_ptr, cpu))
 		return 0;
 	return map_irq_stack(cpu);
 }
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1517314da34a..13893ec03d85 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -614,7 +614,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	int cpu = smp_processor_id();
 
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
-		     this_cpu_read(pcpu_hot.hardirq_stack_inuse));
+		     this_cpu_read(hardirq_stack_inuse));
 
 	if (!test_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD))
 		switch_fpu_prepare(prev_p, cpu);
-- 
2.48.1



* [RFC PATCH 09/11] x86/percpu: Remove pcpu_hot
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (7 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 08/11] x86/irq: Move irq stacks " Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 10/11] x86/stackprotector: Move __stack_chk_guard to percpu hot section Brian Gerst
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

All fields have been moved to the percpu hot section.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/current.h | 11 -----------
 arch/x86/kernel/cpu/common.c   |  4 ----
 arch/x86/kernel/vmlinux.lds.S  |  1 -
 3 files changed, 16 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index 043888c258bd..b6d1adb5538f 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -12,17 +12,6 @@
 
 struct task_struct;
 
-struct pcpu_hot {
-		u8	pad[64];
-};
-static_assert(sizeof(struct pcpu_hot) == 64);
-
-DECLARE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);
-
-/* const-qualified alias to pcpu_hot, aliased by linker. */
-DECLARE_PER_CPU_ALIGNED(const struct pcpu_hot __percpu_seg_override,
-			const_pcpu_hot);
-
 DECLARE_PER_CPU_HOT(struct task_struct *, current_task);
 /* const-qualified alias provided by the linker. */
 DECLARE_PER_CPU_HOT(struct task_struct * const, const_current_task);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f4ec6bcb2a5e..ba78ee8fdb21 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2014,10 +2014,6 @@ static __init int setup_clearcpuid(char *arg)
 }
 __setup("clearcpuid=", setup_clearcpuid);
 
-DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);
-EXPORT_PER_CPU_SYMBOL(pcpu_hot);
-EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
-
 DEFINE_PER_CPU_HOT(struct task_struct *, current_task) = &init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 EXPORT_PER_CPU_SYMBOL(const_current_task);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3c87bb620434..0cfdaa0e05a0 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -43,7 +43,6 @@ ENTRY(phys_startup_64)
 #endif
 
 jiffies = jiffies_64;
-const_pcpu_hot = pcpu_hot;
 const_current_task = current_task;
 const_cpu_current_top_of_stack = cpu_current_top_of_stack;
 
-- 
2.48.1



* [RFC PATCH 10/11] x86/stackprotector: Move __stack_chk_guard to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (8 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 09/11] x86/percpu: Remove pcpu_hot Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-22 19:06 ` [RFC PATCH 11/11] x86/smp: Move this_cpu_off " Brian Gerst
  2025-02-23  9:36 ` [RFC PATCH 00/11] Add a percpu subsection for hot data Ingo Molnar
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/stackprotector.h | 2 +-
 arch/x86/kernel/cpu/common.c          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index d43fb589fcf6..f08304ac262b 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -20,7 +20,7 @@
 
 #include <linux/sched.h>
 
-DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
+DECLARE_PER_CPU_HOT(unsigned long, __stack_chk_guard);
 
 /*
  * Initialize the stackprotector canary value.
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ba78ee8fdb21..eb7ac92e8565 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2089,7 +2089,7 @@ void syscall_init(void)
 #endif /* CONFIG_X86_64 */
 
 #ifdef CONFIG_STACKPROTECTOR
-DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+DEFINE_PER_CPU_HOT(unsigned long, __stack_chk_guard);
 #ifndef CONFIG_SMP
 EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
 #endif
-- 
2.48.1



* [RFC PATCH 11/11] x86/smp: Move this_cpu_off to percpu hot section
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (9 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 10/11] x86/stackprotector: Move __stack_chk_guard to percpu hot section Brian Gerst
@ 2025-02-22 19:06 ` Brian Gerst
  2025-02-23  9:36 ` [RFC PATCH 00/11] Add a percpu subsection for hot data Ingo Molnar
  11 siblings, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-22 19:06 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Borislav Petkov,
	Ard Biesheuvel, Uros Bizjak, Brian Gerst

No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/percpu.h  | 2 +-
 arch/x86/kernel/setup_percpu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 370778c55091..e8034fe81ec1 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -595,7 +595,7 @@ do {									\
 #include <asm-generic/percpu.h>
 
 /* We can use this directly for local CPU (faster). */
-DECLARE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off);
+DECLARE_PER_CPU_HOT(unsigned long, this_cpu_off);
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 0ea3443433c5..11a81e2a9675 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -26,7 +26,7 @@
 DEFINE_PER_CPU_HOT(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
 
-DEFINE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off);
+DEFINE_PER_CPU_HOT(unsigned long, this_cpu_off);
 EXPORT_PER_CPU_SYMBOL(this_cpu_off);
 
 unsigned long __per_cpu_offset[NR_CPUS] __ro_after_init;
-- 
2.48.1



* Re: [RFC PATCH 00/11] Add a percpu subsection for hot data
  2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
                   ` (10 preceding siblings ...)
  2025-02-22 19:06 ` [RFC PATCH 11/11] x86/smp: Move this_cpu_off " Brian Gerst
@ 2025-02-23  9:36 ` Ingo Molnar
  2025-02-23 10:20   ` Ard Biesheuvel
  2025-02-23 18:00   ` Linus Torvalds
  11 siblings, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2025-02-23  9:36 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, H . Peter Anvin, Thomas Gleixner,
	Borislav Petkov, Ard Biesheuvel, Uros Bizjak, Linus Torvalds,
	Andy Lutomirski, Peter Zijlstra, Andrew Morton


* Brian Gerst <brgerst@gmail.com> wrote:

> Add a new percpu subsection for data that is frequently accessed and
> exclusive to each processor.  This is intended to replace the pcpu_hot
> struct on X86, and is available to all architectures.
> 
> The one caveat with this approach is that it depends on the linker to
> efficiently pack data that is smaller than machine word size.  The
> binutils linker does this properly:
> 
> ffffffff842f6000 D __per_cpu_hot_start
> ffffffff842f6000 D softirq_pending
> ffffffff842f6002 D hardirq_stack_inuse
> ffffffff842f6008 D hardirq_stack_ptr
> ffffffff842f6010 D __ref_stack_chk_guard
> ffffffff842f6010 D __stack_chk_guard
> ffffffff842f6018 D const_cpu_current_top_of_stack
> ffffffff842f6018 D cpu_current_top_of_stack
> ffffffff842f6020 D const_current_task
> ffffffff842f6020 D current_task
> ffffffff842f6028 D __preempt_count
> ffffffff842f602c D cpu_number
> ffffffff842f6030 D this_cpu_off
> ffffffff842f6038 D __x86_call_depth
> ffffffff842f6040 D __per_cpu_hot_end
> 
> The LLVM linker doesn't do as well with packing smaller data objects,
> causing it to spill over into a second cacheline.

Ok, so I like how it decentralizes the decision about what is 'hot' and
what is not:

  --- a/arch/x86/kernel/irq.c
  +++ b/arch/x86/kernel/irq.c

  DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
  EXPORT_PER_CPU_SYMBOL(irq_stat);

  +DEFINE_PER_CPU_HOT(u16, softirq_pending);

This can also be a drawback if it's abused by random driver code, so I 
think it should at minimum be documented as intended for core & arch 
code only. Maybe add a build #error too if it's defined in modular code?

Other variants like DEFINE_PER_CPU_SHARED_ALIGNED aren't being abused 
really AFAICS, so maybe this isn't too much of a concern.
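A minimal user-space sketch of the build-time guard suggested above, assuming GCC or Clang on an ELF target and relying on kbuild passing -DMODULE when compiling modular objects. The macro body and the ".data..percpu..hot.<name>" section name are hypothetical stand-ins, not the DEFINE_PER_CPU_HOT() from patch 01, and the variables are plain globals rather than segment-relative per-CPU data:

```c
/*
 * Hypothetical stand-in for DEFINE_PER_CPU_HOT(), user space only.
 * Built with -DMODULE, any use becomes a compile-time error; otherwise
 * the variable is placed in its own (assumed) hot input section.
 */
#include <assert.h>

#ifdef MODULE
#define DEFINE_PER_CPU_HOT(type, name)					\
	_Static_assert(0, "DEFINE_PER_CPU_HOT() is for core and arch code only"); \
	type name
#else
#define DEFINE_PER_CPU_HOT(type, name)					\
	__attribute__((section(".data..percpu..hot." #name))) type name
#endif

/* Definitions look the same as in the patch series... */
DEFINE_PER_CPU_HOT(unsigned short, softirq_pending);
DEFINE_PER_CPU_HOT(int, cpu_number) = 3;

/* ...and usage sites do not need to know about the section at all. */
void raise_softirq_bit(unsigned int nr)
{
	softirq_pending |= (unsigned short)(1u << nr);
}
```

Compiling the same file with -DMODULE fails at the first DEFINE_PER_CPU_HOT() line with the static assertion message, which is the kind of actionable error a build-time check would give driver code.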

One potential drawback would be that previously the section was 
hand-ordered:

 struct pcpu_hot {
        union {
                struct {
                        struct task_struct      *current_task;
                        int                     preempt_count;
                        int                     cpu_number;
 #ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
                        u64                     call_depth;
 #endif
                        unsigned long           top_of_stack;
                        void                    *hardirq_stack_ptr;
                        u16                     softirq_pending;
 #ifdef CONFIG_X86_64
                        bool                    hardirq_stack_inuse;
 #else
                        void                    *softirq_stack_ptr;
 #endif
                };
                u8      pad[64];
        };
 };

... while now it's linker-ordered. But on the other hand that can be an 
advantage too: the linker will try to (or at least has a chance to) 
order the fields optimally for cache density, while the hand-packing 
always has the potential to bitrot without much of an outside, 
actionable indicator for the bitrot.
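As a simplified model of what "linker-ordered" means for cache density, the sketch below places each variable, in the order of the quoted `nm` listing, at the next offset that satisfies its natural alignment (x86-64 sizes assumed). It reproduces the listed offsets and the 0x40 end, but it does not model how either linker actually chooses the input order; the struct and function names are hypothetical:

```c
/*
 * Model of natural-alignment placement in link order. Sizes and
 * alignments assume x86-64 (LP64): pointers/unsigned long/u64 = 8.
 */
#include <assert.h>
#include <stddef.h>

struct hot_var {
	const char *name;
	size_t size;
	size_t align;		/* must be a power of two */
	size_t offset;		/* filled in by place_vars() */
};

static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Assign offsets in array order; return the end of the last variable. */
size_t place_vars(struct hot_var *v, size_t n)
{
	size_t cur = 0;

	for (size_t i = 0; i < n; i++) {
		cur = align_up(cur, v[i].align);
		v[i].offset = cur;
		cur += v[i].size;
	}
	return cur;
}

/* Order taken from the quoted nm output. */
struct hot_var hot_vars[] = {
	{ "softirq_pending",          2, 2, 0 },	/* u16 */
	{ "hardirq_stack_inuse",      1, 1, 0 },	/* bool */
	{ "hardirq_stack_ptr",        8, 8, 0 },
	{ "__stack_chk_guard",        8, 8, 0 },
	{ "cpu_current_top_of_stack", 8, 8, 0 },
	{ "current_task",             8, 8, 0 },
	{ "__preempt_count",          4, 4, 0 },	/* int */
	{ "cpu_number",               4, 4, 0 },	/* int */
	{ "this_cpu_off",             8, 8, 0 },
	{ "__x86_call_depth",         8, 8, 0 },	/* u64 */
};
#define NR_HOT_VARS (sizeof(hot_vars) / sizeof(hot_vars[0]))
```

With this order place_vars() returns 0x40, so everything fits in one 64-byte cacheline with a single 5-byte hole after hardirq_stack_inuse. Moving hardirq_stack_inuse to just after hardirq_stack_ptr instead leaves a 6-byte and a 7-byte hole, and the same ten variables then end at 72 bytes. This is only an illustration of how padding can spill into a second cacheline, not a description of lld's actual behaviour.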

One naming suggestion, wouldn't it be better to make it explicit that 
the 'hot' qualifier is about cache locality:

  +DEFINE_PER_CPU_CACHE_HOT(u16, softirq_pending);

Makes it more of a mouthful to write definitions/declarations, but the 
actual per-cpu usage sites are unaffected as this too is otherwise part 
of the generic percpu namespace.

... and yes, DEFINE_PER_CPU_ALIGNED should probably have been named 
DEFINE_PER_CPU_CACHE_ALIGNED too. (Because 'aligned' often means 
machine-word alignment, the naming is a bit ambiguous.)

I.e. in an ideal world the complete set of DEFINE_PER_CPU_XXX 
attributes should be something like:

 DEFINE_PER_CPU_CACHE_HOT
 DEFINE_PER_CPU_CACHE_ALIGNED		# was: DEFINE_PER_CPU_ALIGNED
 DEFINE_PER_CPU_CACHE_ALIGNED_SHARED	# was: DEFINE_PER_CPU_SHARED_ALIGNED

 DEFINE_PER_CPU_PAGE_ALIGNED

 DEFINE_PER_CPU_READ_MOSTLY
 DEFINE_PER_CPU_DECRYPTED

But I digress...

Anyway, I've Cc:-ed various potentially interested parties, please 
speak up now or forever hold your peace. ;-)

Thanks,

	Ingo


* Re: [RFC PATCH 02/11] x86/preempt: Move preempt count to percpu hot section
  2025-02-22 19:06 ` [RFC PATCH 02/11] x86/preempt: Move preempt count to " Brian Gerst
@ 2025-02-23 10:05   ` kernel test robot
  2025-02-23 10:49   ` kernel test robot
  2025-02-23 11:31   ` kernel test robot
  2 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-02-23 10:05 UTC (permalink / raw)
  To: Brian Gerst; +Cc: oe-kbuild-all

Hi Brian,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on 01157ddc58dc2fe428ec17dd5a18cc13f134639f]

url:    https://github.com/intel-lab-lkp/linux/commits/Brian-Gerst/percpu-Introduce-percpu-hot-section/20250223-031046
base:   01157ddc58dc2fe428ec17dd5a18cc13f134639f
patch link:    https://lore.kernel.org/r/20250222190623.262689-3-brgerst%40gmail.com
patch subject: [RFC PATCH 02/11] x86/preempt: Move preempt count to percpu hot section
config: x86_64-buildonly-randconfig-003-20250223 (https://download.01.org/0day-ci/archive/20250223/202502231726.Sy9P7hJc-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250223/202502231726.Sy9P7hJc-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502231726.Sy9P7hJc-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/linux/spinlock.h:56,
                    from include/linux/swait.h:7,
                    from include/linux/completion.h:12,
                    from include/linux/crypto.h:15,
                    from arch/x86/kernel/asm-offsets.c:9:
>> include/linux/preempt.h:340:34: warning: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration
     340 |                           struct task_struct *next);
         |                                  ^~~~~~~~~~~
--
   In file included from include/linux/spinlock.h:56,
                    from include/linux/wait.h:9,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from include/linux/highmem.h:5,
                    from include/linux/bvec.h:10,
                    from include/linux/blk_types.h:10,
                    from drivers/md/dm-vdo/vdo.h:10,
                    from drivers/md/dm-vdo/vdo.c:30:
>> include/linux/preempt.h:340:34: warning: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration
     340 |                           struct task_struct *next);
         |                                  ^~~~~~~~~~~
   drivers/md/dm-vdo/vdo.c: In function 'vdo_make':
   drivers/md/dm-vdo/vdo.c:562:19: warning: '%s' directive output may be truncated writing up to 55 bytes into a region of size 16 [-Wformat-truncation=]
     562 |                  "%s%u", MODULE_NAME, instance);
         |                   ^~
   drivers/md/dm-vdo/vdo.c:561:9: note: 'snprintf' output between 2 and 66 bytes into a destination of size 16
     561 |         snprintf(vdo->thread_name_prefix, sizeof(vdo->thread_name_prefix),
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     562 |                  "%s%u", MODULE_NAME, instance);
         |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from include/linux/spinlock.h:56,
                    from include/linux/wait.h:9,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from include/linux/highmem.h:5,
                    from kernel/sched/core.c:10:
>> include/linux/preempt.h:340:34: warning: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration
     340 |                           struct task_struct *next);
         |                                  ^~~~~~~~~~~
   kernel/sched/core.c: In function '__fire_sched_out_preempt_notifiers':
   kernel/sched/core.c:4958:52: error: passing argument 2 of 'notifier->ops->sched_out' from incompatible pointer type [-Werror=incompatible-pointer-types]
    4958 |                 notifier->ops->sched_out(notifier, next);
         |                                                    ^~~~
         |                                                    |
         |                                                    struct task_struct *
   kernel/sched/core.c:4958:52: note: expected 'struct task_struct *' but argument is of type 'struct task_struct *'
   cc1: some warnings being treated as errors
--
   In file included from include/linux/spinlock.h:56,
                    from include/linux/wait.h:9,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from include/linux/highmem.h:5,
                    from include/linux/bvec.h:10,
                    from include/linux/blk_types.h:10,
                    from vdo.h:10,
                    from vdo.c:30:
>> include/linux/preempt.h:340:34: warning: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration
     340 |                           struct task_struct *next);
         |                                  ^~~~~~~~~~~
   vdo.c: In function 'vdo_make':
   vdo.c:562:19: warning: '%s' directive output may be truncated writing up to 55 bytes into a region of size 16 [-Wformat-truncation=]
     562 |                  "%s%u", MODULE_NAME, instance);
         |                   ^~
   vdo.c:561:9: note: 'snprintf' output between 2 and 66 bytes into a destination of size 16
     561 |         snprintf(vdo->thread_name_prefix, sizeof(vdo->thread_name_prefix),
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     562 |                  "%s%u", MODULE_NAME, instance);
         |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from include/linux/spinlock.h:56,
                    from include/linux/swait.h:7,
                    from include/linux/completion.h:12,
                    from include/linux/crypto.h:15,
                    from arch/x86/kernel/asm-offsets.c:9:
>> include/linux/preempt.h:340:34: warning: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration
     340 |                           struct task_struct *next);
         |                                  ^~~~~~~~~~~


vim +340 include/linux/preempt.h

e107be36efb2a2 Avi Kivity 2007-07-26  322  
e107be36efb2a2 Avi Kivity 2007-07-26  323  /**
e107be36efb2a2 Avi Kivity 2007-07-26  324   * preempt_ops - notifiers called when a task is preempted and rescheduled
e107be36efb2a2 Avi Kivity 2007-07-26  325   * @sched_in: we're about to be rescheduled:
e107be36efb2a2 Avi Kivity 2007-07-26  326   *    notifier: struct preempt_notifier for the task being scheduled
e107be36efb2a2 Avi Kivity 2007-07-26  327   *    cpu:  cpu we're scheduled on
e107be36efb2a2 Avi Kivity 2007-07-26  328   * @sched_out: we've just been preempted
e107be36efb2a2 Avi Kivity 2007-07-26  329   *    notifier: struct preempt_notifier for the task being preempted
e107be36efb2a2 Avi Kivity 2007-07-26  330   *    next: the task that's kicking us out
8592e6486a177a Tejun Heo  2009-12-02  331   *
8592e6486a177a Tejun Heo  2009-12-02  332   * Please note that sched_in and out are called under different
8592e6486a177a Tejun Heo  2009-12-02  333   * contexts.  sched_out is called with rq lock held and irq disabled
8592e6486a177a Tejun Heo  2009-12-02  334   * while sched_in is called without rq lock and irq enabled.  This
8592e6486a177a Tejun Heo  2009-12-02  335   * difference is intentional and depended upon by its users.
e107be36efb2a2 Avi Kivity 2007-07-26  336   */
e107be36efb2a2 Avi Kivity 2007-07-26  337  struct preempt_ops {
e107be36efb2a2 Avi Kivity 2007-07-26  338  	void (*sched_in)(struct preempt_notifier *notifier, int cpu);
e107be36efb2a2 Avi Kivity 2007-07-26  339  	void (*sched_out)(struct preempt_notifier *notifier,
e107be36efb2a2 Avi Kivity 2007-07-26 @340  			  struct task_struct *next);
e107be36efb2a2 Avi Kivity 2007-07-26  341  };
e107be36efb2a2 Avi Kivity 2007-07-26  342  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 00/11] Add a percpu subsection for hot data
  2025-02-23  9:36 ` [RFC PATCH 00/11] Add a percpu subsection for hot data Ingo Molnar
@ 2025-02-23 10:20   ` Ard Biesheuvel
  2025-02-23 10:30     ` Uros Bizjak
  2025-02-23 14:44     ` Brian Gerst
  2025-02-23 18:00   ` Linus Torvalds
  1 sibling, 2 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2025-02-23 10:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Brian Gerst, linux-kernel, x86, H . Peter Anvin, Thomas Gleixner,
	Borislav Petkov, Uros Bizjak, Linus Torvalds, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton

On Sun, 23 Feb 2025 at 10:37, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
> > Add a new percpu subsection for data that is frequently accessed and
> > exclusive to each processor.  This is intended to replace the pcpu_hot
> > struct on X86, and is available to all architectures.
> >
> > The one caveat with this approach is that it depends on the linker to
> > efficiently pack data that is smaller than machine word size.  The
> > binutils linker does this properly:
> >
> > ffffffff842f6000 D __per_cpu_hot_start
> > ffffffff842f6000 D softirq_pending
> > ffffffff842f6002 D hardirq_stack_inuse
> > ffffffff842f6008 D hardirq_stack_ptr
> > ffffffff842f6010 D __ref_stack_chk_guard
> > ffffffff842f6010 D __stack_chk_guard
> > ffffffff842f6018 D const_cpu_current_top_of_stack
> > ffffffff842f6018 D cpu_current_top_of_stack
> > ffffffff842f6020 D const_current_task
> > ffffffff842f6020 D current_task
> > ffffffff842f6028 D __preempt_count
> > ffffffff842f602c D cpu_number
> > ffffffff842f6030 D this_cpu_off
> > ffffffff842f6038 D __x86_call_depth
> > ffffffff842f6040 D __per_cpu_hot_end
> >
> > The LLVM linker doesn't do as well with packing smaller data objects,
> > causing it to spill over into a second cacheline.
>
> ... now it's linker-ordered. But on the other hand that can be an
> advantage too: the linker will try to (or at least has a chance to)
> order the fields optimally for cache density, while the hand-packing
> always has the potential to bitrot without much of an outside,
> actionable indicator for the bitrot.
>

The linker will need some help here - by default, it just emits these
variables in the order they appear in the input.

If we emit each such variable 'foo' into .data..hot.foo, and define
the contents of the section as

*(SORT_BY_ALIGNMENT(.data..hot.*))

we should get optimal packing as long as the alignment of these
variables does not exceed their size.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 00/11] Add a percpu subsection for hot data
  2025-02-23 10:20   ` Ard Biesheuvel
@ 2025-02-23 10:30     ` Uros Bizjak
  2025-02-23 17:25       ` Brian Gerst
  2025-02-23 14:44     ` Brian Gerst
  1 sibling, 1 reply; 22+ messages in thread
From: Uros Bizjak @ 2025-02-23 10:30 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Brian Gerst, linux-kernel, x86, H . Peter Anvin,
	Thomas Gleixner, Borislav Petkov, Linus Torvalds, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton

On Sun, Feb 23, 2025 at 11:20 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sun, 23 Feb 2025 at 10:37, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Brian Gerst <brgerst@gmail.com> wrote:
> >
> > > Add a new percpu subsection for data that is frequently accessed and
> > > exclusive to each processor.  This is intended to replace the pcpu_hot
> > > struct on X86, and is available to all architectures.
> > >
> > > The one caveat with this approach is that it depends on the linker to
> > > efficiently pack data that is smaller than machine word size.  The
> > > binutils linker does this properly:
> > >
> > > ffffffff842f6000 D __per_cpu_hot_start
> > > ffffffff842f6000 D softirq_pending
> > > ffffffff842f6002 D hardirq_stack_inuse
> > > ffffffff842f6008 D hardirq_stack_ptr
> > > ffffffff842f6010 D __ref_stack_chk_guard
> > > ffffffff842f6010 D __stack_chk_guard
> > > ffffffff842f6018 D const_cpu_current_top_of_stack
> > > ffffffff842f6018 D cpu_current_top_of_stack
> > > ffffffff842f6020 D const_current_task
> > > ffffffff842f6020 D current_task
> > > ffffffff842f6028 D __preempt_count
> > > ffffffff842f602c D cpu_number
> > > ffffffff842f6030 D this_cpu_off
> > > ffffffff842f6038 D __x86_call_depth
> > > ffffffff842f6040 D __per_cpu_hot_end
> > >
> > > The LLVM linker doesn't do as well with packing smaller data objects,
> > > causing it to spill over into a second cacheline.
> >
> > ... now it's linker-ordered. But on the other hand that can be an
> > advantage too: the linker will try to (or at least has a chance to)
> > order the fields optimally for cache density, while the hand-packing
> > always has the potential to bitrot without much of an outside,
> > actionable indicator for the bitrot.
> >
>
> The linker will need some help here - by default, it just emits these
> variables in the order they appear in the input.
>
> If we emit each such variable 'foo' into .data..hot.foo, and define
> the contents of the section as
>
> *(SORT_BY_ALIGNMENT(.data..hot.*))
>
> we should get optimal packing as long as the alignment of these
> variables does not exceed their size.

Is it possible to warn/error when data is spilled over the cache line?
Previously, there was:

-static_assert(sizeof(struct pcpu_hot) == 64);

that failed the build in this case.

Uros.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 02/11] x86/preempt: Move preempt count to percpu hot section
  2025-02-22 19:06 ` [RFC PATCH 02/11] x86/preempt: Move preempt count to " Brian Gerst
  2025-02-23 10:05   ` kernel test robot
@ 2025-02-23 10:49   ` kernel test robot
  2025-02-23 11:31   ` kernel test robot
  2 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-02-23 10:49 UTC (permalink / raw)
  To: Brian Gerst; +Cc: oe-kbuild-all

Hi Brian,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 01157ddc58dc2fe428ec17dd5a18cc13f134639f]

url:    https://github.com/intel-lab-lkp/linux/commits/Brian-Gerst/percpu-Introduce-percpu-hot-section/20250223-031046
base:   01157ddc58dc2fe428ec17dd5a18cc13f134639f
patch link:    https://lore.kernel.org/r/20250222190623.262689-3-brgerst%40gmail.com
patch subject: [RFC PATCH 02/11] x86/preempt: Move preempt count to percpu hot section
config: x86_64-buildonly-randconfig-003-20250223 (https://download.01.org/0day-ci/archive/20250223/202502231853.Kren3HuC-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250223/202502231853.Kren3HuC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502231853.Kren3HuC-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/spinlock.h:56,
                    from include/linux/wait.h:9,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from include/linux/highmem.h:5,
                    from kernel/sched/core.c:10:
   include/linux/preempt.h:340:34: warning: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration
     340 |                           struct task_struct *next);
         |                                  ^~~~~~~~~~~
   kernel/sched/core.c: In function '__fire_sched_out_preempt_notifiers':
>> kernel/sched/core.c:4958:52: error: passing argument 2 of 'notifier->ops->sched_out' from incompatible pointer type [-Werror=incompatible-pointer-types]
    4958 |                 notifier->ops->sched_out(notifier, next);
         |                                                    ^~~~
         |                                                    |
         |                                                    struct task_struct *
   kernel/sched/core.c:4958:52: note: expected 'struct task_struct *' but argument is of type 'struct task_struct *'
   cc1: some warnings being treated as errors


vim +4958 kernel/sched/core.c

1cde2930e15473c kernel/sched/core.c Peter Zijlstra 2015-06-08  4950  
e107be36efb2a23 kernel/sched.c      Avi Kivity     2007-07-26  4951  static void
1cde2930e15473c kernel/sched/core.c Peter Zijlstra 2015-06-08  4952  __fire_sched_out_preempt_notifiers(struct task_struct *curr,
e107be36efb2a23 kernel/sched.c      Avi Kivity     2007-07-26  4953  				   struct task_struct *next)
e107be36efb2a23 kernel/sched.c      Avi Kivity     2007-07-26  4954  {
e107be36efb2a23 kernel/sched.c      Avi Kivity     2007-07-26  4955  	struct preempt_notifier *notifier;
e107be36efb2a23 kernel/sched.c      Avi Kivity     2007-07-26  4956  
b67bfe0d42cac56 kernel/sched/core.c Sasha Levin    2013-02-27  4957  	hlist_for_each_entry(notifier, &curr->preempt_notifiers, link)
e107be36efb2a23 kernel/sched.c      Avi Kivity     2007-07-26 @4958  		notifier->ops->sched_out(notifier, next);
e107be36efb2a23 kernel/sched.c      Avi Kivity     2007-07-26  4959  }
e107be36efb2a23 kernel/sched.c      Avi Kivity     2007-07-26  4960  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 02/11] x86/preempt: Move preempt count to percpu hot section
  2025-02-22 19:06 ` [RFC PATCH 02/11] x86/preempt: Move preempt count to " Brian Gerst
  2025-02-23 10:05   ` kernel test robot
  2025-02-23 10:49   ` kernel test robot
@ 2025-02-23 11:31   ` kernel test robot
  2 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-02-23 11:31 UTC (permalink / raw)
  To: Brian Gerst; +Cc: llvm, oe-kbuild-all

Hi Brian,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on 01157ddc58dc2fe428ec17dd5a18cc13f134639f]

url:    https://github.com/intel-lab-lkp/linux/commits/Brian-Gerst/percpu-Introduce-percpu-hot-section/20250223-031046
base:   01157ddc58dc2fe428ec17dd5a18cc13f134639f
patch link:    https://lore.kernel.org/r/20250222190623.262689-3-brgerst%40gmail.com
patch subject: [RFC PATCH 02/11] x86/preempt: Move preempt count to percpu hot section
config: i386-buildonly-randconfig-004-20250223 (https://download.01.org/0day-ci/archive/20250223/202502231946.RXtsMrTp-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250223/202502231946.RXtsMrTp-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502231946.RXtsMrTp-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/x86/kernel/asm-offsets.c:9:
   In file included from include/linux/crypto.h:15:
   In file included from include/linux/completion.h:12:
   In file included from include/linux/swait.h:7:
   In file included from include/linux/spinlock.h:56:
>> include/linux/preempt.h:340:13: warning: declaration of 'struct task_struct' will not be visible outside of this function [-Wvisibility]
     340 |                           struct task_struct *next);
         |                                  ^
   1 warning generated.
--
   In file included from drivers/platform/surface/surface_hotplug.c:16:
   In file included from include/linux/acpi.h:13:
   In file included from include/linux/resource_ext.h:11:
   In file included from include/linux/slab.h:16:
   In file included from include/linux/gfp.h:7:
   In file included from include/linux/mmzone.h:8:
   In file included from include/linux/spinlock.h:56:
>> include/linux/preempt.h:340:13: warning: declaration of 'struct task_struct' will not be visible outside of this function [-Wvisibility]
     340 |                           struct task_struct *next);
         |                                  ^
   drivers/platform/surface/surface_hotplug.c:79:39: warning: arithmetic between different enumeration types ('enum shps_dsm_fn' and 'enum shps_irq_type') [-Wenum-enum-conversion]
      79 |         return SHPS_DSM_FN_IRQ_BASE_PRESENCE + type;
         |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~
   2 warnings generated.
--
   In file included from kernel/sched/core.c:10:
   In file included from include/linux/highmem.h:5:
   In file included from include/linux/fs.h:6:
   In file included from include/linux/wait_bit.h:8:
   In file included from include/linux/wait.h:9:
   In file included from include/linux/spinlock.h:56:
>> include/linux/preempt.h:340:13: warning: declaration of 'struct task_struct' will not be visible outside of this function [-Wvisibility]
     340 |                           struct task_struct *next);
         |                                  ^
   kernel/sched/core.c:4958:38: error: incompatible pointer types passing 'struct task_struct *' to parameter of type 'struct task_struct *' [-Werror,-Wincompatible-pointer-types]
    4958 |                 notifier->ops->sched_out(notifier, next);
         |                                                    ^~~~
   1 warning and 1 error generated.
--
   In file included from arch/x86/kernel/asm-offsets.c:9:
   In file included from include/linux/crypto.h:15:
   In file included from include/linux/completion.h:12:
   In file included from include/linux/swait.h:7:
   In file included from include/linux/spinlock.h:56:
>> include/linux/preempt.h:340:13: warning: declaration of 'struct task_struct' will not be visible outside of this function [-Wvisibility]
     340 |                           struct task_struct *next);
         |                                  ^
   1 warning generated.


vim +340 include/linux/preempt.h

e107be36efb2a23 Avi Kivity 2007-07-26  322  
e107be36efb2a23 Avi Kivity 2007-07-26  323  /**
e107be36efb2a23 Avi Kivity 2007-07-26  324   * preempt_ops - notifiers called when a task is preempted and rescheduled
e107be36efb2a23 Avi Kivity 2007-07-26  325   * @sched_in: we're about to be rescheduled:
e107be36efb2a23 Avi Kivity 2007-07-26  326   *    notifier: struct preempt_notifier for the task being scheduled
e107be36efb2a23 Avi Kivity 2007-07-26  327   *    cpu:  cpu we're scheduled on
e107be36efb2a23 Avi Kivity 2007-07-26  328   * @sched_out: we've just been preempted
e107be36efb2a23 Avi Kivity 2007-07-26  329   *    notifier: struct preempt_notifier for the task being preempted
e107be36efb2a23 Avi Kivity 2007-07-26  330   *    next: the task that's kicking us out
8592e6486a177a0 Tejun Heo  2009-12-02  331   *
8592e6486a177a0 Tejun Heo  2009-12-02  332   * Please note that sched_in and out are called under different
8592e6486a177a0 Tejun Heo  2009-12-02  333   * contexts.  sched_out is called with rq lock held and irq disabled
8592e6486a177a0 Tejun Heo  2009-12-02  334   * while sched_in is called without rq lock and irq enabled.  This
8592e6486a177a0 Tejun Heo  2009-12-02  335   * difference is intentional and depended upon by its users.
e107be36efb2a23 Avi Kivity 2007-07-26  336   */
e107be36efb2a23 Avi Kivity 2007-07-26  337  struct preempt_ops {
e107be36efb2a23 Avi Kivity 2007-07-26  338  	void (*sched_in)(struct preempt_notifier *notifier, int cpu);
e107be36efb2a23 Avi Kivity 2007-07-26  339  	void (*sched_out)(struct preempt_notifier *notifier,
e107be36efb2a23 Avi Kivity 2007-07-26 @340  			  struct task_struct *next);
e107be36efb2a23 Avi Kivity 2007-07-26  341  };
e107be36efb2a23 Avi Kivity 2007-07-26  342  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 00/11] Add a percpu subsection for hot data
  2025-02-23 10:20   ` Ard Biesheuvel
  2025-02-23 10:30     ` Uros Bizjak
@ 2025-02-23 14:44     ` Brian Gerst
  1 sibling, 0 replies; 22+ messages in thread
From: Brian Gerst @ 2025-02-23 14:44 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, linux-kernel, x86, H . Peter Anvin, Thomas Gleixner,
	Borislav Petkov, Uros Bizjak, Linus Torvalds, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton

On Sun, Feb 23, 2025 at 5:20 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sun, 23 Feb 2025 at 10:37, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Brian Gerst <brgerst@gmail.com> wrote:
> >
> > > Add a new percpu subsection for data that is frequently accessed and
> > > exclusive to each processor.  This is intended to replace the pcpu_hot
> > > struct on X86, and is available to all architectures.
> > >
> > > The one caveat with this approach is that it depends on the linker to
> > > efficiently pack data that is smaller than machine word size.  The
> > > binutils linker does this properly:
> > >
> > > ffffffff842f6000 D __per_cpu_hot_start
> > > ffffffff842f6000 D softirq_pending
> > > ffffffff842f6002 D hardirq_stack_inuse
> > > ffffffff842f6008 D hardirq_stack_ptr
> > > ffffffff842f6010 D __ref_stack_chk_guard
> > > ffffffff842f6010 D __stack_chk_guard
> > > ffffffff842f6018 D const_cpu_current_top_of_stack
> > > ffffffff842f6018 D cpu_current_top_of_stack
> > > ffffffff842f6020 D const_current_task
> > > ffffffff842f6020 D current_task
> > > ffffffff842f6028 D __preempt_count
> > > ffffffff842f602c D cpu_number
> > > ffffffff842f6030 D this_cpu_off
> > > ffffffff842f6038 D __x86_call_depth
> > > ffffffff842f6040 D __per_cpu_hot_end
> > >
> > > The LLVM linker doesn't do as well with packing smaller data objects,
> > > causing it to spill over into a second cacheline.
> >
> > ... now it's linker-ordered. But on the other hand that can be an
> > advantage too: the linker will try to (or at least has a chance to)
> > order the fields optimally for cache density, while the hand-packing
> > always has the potential to bitrot without much of an outside,
> > actionable indicator for the bitrot.
> >
>
> The linker will need some help here - by default, it just emits these
> variables in the order they appear in the input.
>
> If we emit each such variable 'foo' into .data..hot.foo, and define
> the contents of the section as
>
> *(SORT_BY_ALIGNMENT(.data..hot.*))
>
> we should get optimal packing as long as the alignment of these
> variables does not exceed their size.

Thanks for the tip on SORT_BY_ALIGNMENT().  That got the LLVM linker
to pack the data correctly.


Brian Gerst

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 00/11] Add a percpu subsection for hot data
  2025-02-23 10:30     ` Uros Bizjak
@ 2025-02-23 17:25       ` Brian Gerst
  2025-02-23 17:30         ` Ard Biesheuvel
  0 siblings, 1 reply; 22+ messages in thread
From: Brian Gerst @ 2025-02-23 17:25 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Ard Biesheuvel, Ingo Molnar, linux-kernel, x86, H . Peter Anvin,
	Thomas Gleixner, Borislav Petkov, Linus Torvalds, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton

On Sun, Feb 23, 2025 at 5:30 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Sun, Feb 23, 2025 at 11:20 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Sun, 23 Feb 2025 at 10:37, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >
> > > * Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > > Add a new percpu subsection for data that is frequently accessed and
> > > > exclusive to each processor.  This is intended to replace the pcpu_hot
> > > > struct on X86, and is available to all architectures.
> > > >
> > > > The one caveat with this approach is that it depends on the linker to
> > > > efficiently pack data that is smaller than machine word size.  The
> > > > binutils linker does this properly:
> > > >
> > > > ffffffff842f6000 D __per_cpu_hot_start
> > > > ffffffff842f6000 D softirq_pending
> > > > ffffffff842f6002 D hardirq_stack_inuse
> > > > ffffffff842f6008 D hardirq_stack_ptr
> > > > ffffffff842f6010 D __ref_stack_chk_guard
> > > > ffffffff842f6010 D __stack_chk_guard
> > > > ffffffff842f6018 D const_cpu_current_top_of_stack
> > > > ffffffff842f6018 D cpu_current_top_of_stack
> > > > ffffffff842f6020 D const_current_task
> > > > ffffffff842f6020 D current_task
> > > > ffffffff842f6028 D __preempt_count
> > > > ffffffff842f602c D cpu_number
> > > > ffffffff842f6030 D this_cpu_off
> > > > ffffffff842f6038 D __x86_call_depth
> > > > ffffffff842f6040 D __per_cpu_hot_end
> > > >
> > > > The LLVM linker doesn't do as well with packing smaller data objects,
> > > > causing it to spill over into a second cacheline.
> > >
> > > ... now it's linker-ordered. But on the other hand that can be an
> > > advantage too: the linker will try to (or at least has a chance to)
> > > order the fields optimally for cache density, while the hand-packing
> > > always has the potential to bitrot without much of an outside,
> > > actionable indicator for the bitrot.
> > >
> >
> > The linker will need some help here - by default, it just emits these
> > variables in the order they appear in the input.
> >
> > If we emit each such variable 'foo' into .data..hot.foo, and define
> > the contents of the section as
> >
> > *(SORT_BY_ALIGNMENT(.data..hot.*))
> >
> > we should get optimal packing as long as the alignment of these
> > variables does not exceed their size.
>
> Is it possible to warn/error when data is spilled over the cache line?
> Previously, there was:
>
> -static_assert(sizeof(struct pcpu_hot) == 64);
>
> that failed the build in this case.

I think it should be a warning and not an error.  If it does spill
into a second cacheline the kernel will still boot and function
properly so it's not a fatal error, it just could hit performance a
bit.  By decentralizing this it does make it harder to account for
size, especially with conditional builds.  Unfortunately, the linker
script language does not have a WARNING() counterpart to ASSERT().


Brian Gerst

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 00/11] Add a percpu subsection for hot data
  2025-02-23 17:25       ` Brian Gerst
@ 2025-02-23 17:30         ` Ard Biesheuvel
  0 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2025-02-23 17:30 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Uros Bizjak, Ingo Molnar, linux-kernel, x86, H . Peter Anvin,
	Thomas Gleixner, Borislav Petkov, Linus Torvalds, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton

On Sun, 23 Feb 2025 at 18:25, Brian Gerst <brgerst@gmail.com> wrote:
>
> On Sun, Feb 23, 2025 at 5:30 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Sun, Feb 23, 2025 at 11:20 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Sun, 23 Feb 2025 at 10:37, Ingo Molnar <mingo@kernel.org> wrote:
> > > >
> > > >
> > > > * Brian Gerst <brgerst@gmail.com> wrote:
> > > >
> > > > > Add a new percpu subsection for data that is frequently accessed and
> > > > > exclusive to each processor.  This is intended to replace the pcpu_hot
> > > > > struct on X86, and is available to all architectures.
> > > > >
> > > > > The one caveat with this approach is that it depends on the linker to
> > > > > efficiently pack data that is smaller than machine word size.  The
> > > > > binutils linker does this properly:
> > > > >
> > > > > ffffffff842f6000 D __per_cpu_hot_start
> > > > > ffffffff842f6000 D softirq_pending
> > > > > ffffffff842f6002 D hardirq_stack_inuse
> > > > > ffffffff842f6008 D hardirq_stack_ptr
> > > > > ffffffff842f6010 D __ref_stack_chk_guard
> > > > > ffffffff842f6010 D __stack_chk_guard
> > > > > ffffffff842f6018 D const_cpu_current_top_of_stack
> > > > > ffffffff842f6018 D cpu_current_top_of_stack
> > > > > ffffffff842f6020 D const_current_task
> > > > > ffffffff842f6020 D current_task
> > > > > ffffffff842f6028 D __preempt_count
> > > > > ffffffff842f602c D cpu_number
> > > > > ffffffff842f6030 D this_cpu_off
> > > > > ffffffff842f6038 D __x86_call_depth
> > > > > ffffffff842f6040 D __per_cpu_hot_end
> > > > >
> > > > > The LLVM linker doesn't do as well with packing smaller data objects,
> > > > > causing it to spill over into a second cacheline.
> > > >
> > > > ... now it's linker-ordered. But on the other hand that can be an
> > > > advantage too: the linker will try to (or at least has a chance to)
> > > > order the fields optimally for cache density, while the hand-packing
> > > > always has the potential to bitrot without much of an outside,
> > > > actionable indicator for the bitrot.
> > > >
> > >
> > > The linker will need some help here - by default, it just emits these
> > > variables in the order they appear in the input.
> > >
> > > If we emit each such variable 'foo' into .data..hot.foo, and define
> > > the contents of the section as
> > >
> > > *(SORT_BY_ALIGNMENT(.data..hot.*))
> > >
> > > we should get optimal packing as long as the alignment of these
> > > variables does not exceed their size.
> >
> > Is it possible to warn/error when data is spilled over the cache line?
> > Previously, there was:
> >
> > -static_assert(sizeof(struct pcpu_hot) == 64);
> >
> > that failed the build in this case.
>
> I think it should be a warning and not an error.  If it does spill
> into a second cacheline the kernel will still boot and function
> properly so it's not a fatal error, it just could hit performance a
> bit.  By decentralizing this it does make it harder to account for
> size, especially with conditional builds.  Unfortunately, the linker
> script language does not have a WARNING() counterpart to ASSERT().
>

Why should it even be a warning? What is the problem if the build in
question has two cachelines worth of hot per-CPU data?
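
The packing idea quoted above relies on GNU ld's `SORT_BY_ALIGNMENT`, which sorts matching input sections into descending order of alignment before placing them. The C sketch below is only a model of that placement, not kernel or linker code. The `struct hot_var` type, the helper names and the 64-byte `CACHELINE_BYTES` value are assumptions made for illustration, and a stable sort is used for determinism (ld's tie-breaking order is not modelled). It lays variables out back to back, padding each to its alignment, so input order and sorted order can be compared. A spill past one cacheline is reported by a predicate rather than treated as fatal, in line with the "warning, not error" position quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cacheline size for the spill check (64 bytes on current x86). */
#define CACHELINE_BYTES 64

/* Hypothetical model of one hot per-CPU variable as a linker input section. */
struct hot_var {
	const char *name;
	size_t size;
	size_t align; /* must be a power of two */
};

/* Round off up to the next multiple of align. */
static size_t align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/* Place the variables in array order, padding each to its alignment,
 * and return the offset one past the last byte used. */
static size_t layout_end(const struct hot_var *v, size_t n)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++)
		off = align_up(off, v[i].align) + v[i].size;
	return off;
}

/* Stable insertion sort into descending alignment, modelling the
 * ordering that SORT_BY_ALIGNMENT asks the linker for. */
static void sort_by_alignment(struct hot_var *v, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct hot_var key = v[i];
		size_t j = i;

		while (j > 0 && v[j - 1].align < key.align) {
			v[j] = v[j - 1];
			j--;
		}
		v[j] = key;
	}
}

/* True if a layout ending at 'end' no longer fits in one cacheline;
 * a caller can print a warning here instead of failing the build. */
static bool spills_cacheline(size_t end)
{
	return end > CACHELINE_BYTES;
}
```

With six hypothetical variables declared in the order bool (1 byte), pointer, int, pointer, u16, pointer on a 64-bit target, `layout_end()` returns 48 for the input order and 31 after `sort_by_alignment()`, so the 17 bytes of padding disappear. The model also reflects the caveat about alignment: a small object forced to a larger alignment than its size still leaves a hole after it.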

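The quoted `static_assert(sizeof(struct pcpu_hot) == 64)` check depends on hand packing. One way such a check works, and the approach modelled here, is to overlay the fields with a 64-byte pad array in a union: the union stays exactly 64 bytes while the fields fit, and grows (failing the assertion) as soon as they do not. The struct below is a simplified, hypothetical stand-in with an illustrative field set, not the kernel's actual definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical hand-packed block of hot per-CPU fields (illustrative
 * field set only). The anonymous struct shares storage with a 64-byte
 * pad, so the union stays exactly one cacheline while the fields fit.
 */
struct hot_block_sketch {
	union {
		struct {
			void *current_task;
			int preempt_count;
			int cpu_number;
			unsigned long top_of_stack;
			void *hardirq_stack_ptr;
			uint16_t softirq_pending;
			bool hardirq_stack_inuse;
		};
		uint8_t pad[64];
	};
};

/* Adding fields past 64 bytes grows the union and breaks the build here. */
static_assert(sizeof(struct hot_block_sketch) == 64,
	      "hot fields no longer fit in one cacheline");
```

On an LP64 target these seven fields take 40 bytes including trailing padding, so the check passes. It catches overflow, but not a field order that wastes space inside the 64 bytes, which is the kind of unflagged bitrot the quoted discussion worries about.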

* Re: [RFC PATCH 00/11] Add a percpu subsection for hot data
  2025-02-23  9:36 ` [RFC PATCH 00/11] Add a percpu subsection for hot data Ingo Molnar
  2025-02-23 10:20   ` Ard Biesheuvel
@ 2025-02-23 18:00   ` Linus Torvalds
  1 sibling, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2025-02-23 18:00 UTC
  To: Ingo Molnar
  Cc: Brian Gerst, linux-kernel, x86, H . Peter Anvin, Thomas Gleixner,
	Borislav Petkov, Ard Biesheuvel, Uros Bizjak, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton

On Sun, 23 Feb 2025 at 01:37, Ingo Molnar <mingo@kernel.org> wrote:
>
> This can also be a drawback if it's abused by random driver code - so I
> think it should at minimum be documented to be used by core & arch
> code. Maybe add a build #error too if it's defined in modular code?

Yes, please.

Everybody always thinks that *their* code is the most important code,
so making it easy for random filesystems or drivers to just say "this
is my hot piece of data" needs to be avoided.

That is also an argument for having the final size be asserted to be
smaller than one cacheline.

Because I do think that the patches look fine, but it's too much of an
invitation for random developers to go "Oh, *MY* code deserves a hot
marker".

           Linus
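
The build error for modular code suggested above could be sketched as follows. Everything here is hypothetical: `DEFINE_HOT_VAR_SKETCH` is not the kernel's macro, and the `.data..hot.<name>` section naming follows the per-variable section proposal earlier in the thread. It relies on Kbuild defining `MODULE` when compiling objects destined for a module, and on the GCC/Clang `section` attribute on an ELF target. Because `#error` cannot appear inside a macro expansion, and placing it directly in a shared header would break every module that merely includes that header, the check is a `static_assert` that fires only where the macro is used. The size limit itself would still have to be checked at link time, for example with the linker script `ASSERT()` mentioned in the thread, because only the linker knows the final packed size.

```c
#include <assert.h>

/*
 * Hypothetical definition helper (not the kernel's API). Kbuild passes
 * -DMODULE when building modular objects, so any use of the macro there
 * fails at compile time; built-in code gets one input section per variable.
 */
#ifdef MODULE
#define DEFINE_HOT_VAR_SKETCH(type, name) \
	static_assert(0, "hot per-CPU data is for core and arch code only: " #name)
#else
#define DEFINE_HOT_VAR_SKETCH(type, name) \
	__attribute__((section(".data..hot." #name))) type name
#endif

/* Built-in use: emitted into the input section ".data..hot.hot_counter".
 * Outside the kernel this is an ordinary global; per-CPU access is not modelled. */
DEFINE_HOT_VAR_SKETCH(int, hot_counter);
```

Compiling the same file with `-DMODULE` turns the `DEFINE_HOT_VAR_SKETCH(int, hot_counter)` line into a compile-time error naming the variable.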

Thread overview: 22+ messages
2025-02-22 19:06 [RFC PATCH 00/11] Add a percpu subsection for hot data Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 01/11] percpu: Introduce percpu hot section Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 02/11] x86/preempt: Move preempt count to " Brian Gerst
2025-02-23 10:05   ` kernel test robot
2025-02-23 10:49   ` kernel test robot
2025-02-23 11:31   ` kernel test robot
2025-02-22 19:06 ` [RFC PATCH 03/11] x86/smp: Move cpu number " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 04/11] x86/retbleed: Move call depth " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 05/11] x86/percpu: Move top_of_stack " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 06/11] x86/percpu: Move current_task " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 07/11] x86/softirq: Move softirq_pending " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 08/11] x86/irq: Move irq stacks " Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 09/11] x86/percpu: Remove pcpu_hot Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 10/11] x86/stackprotector: Move __stack_chk_guard to percpu hot section Brian Gerst
2025-02-22 19:06 ` [RFC PATCH 11/11] x86/smp: Move this_cpu_off " Brian Gerst
2025-02-23  9:36 ` [RFC PATCH 00/11] Add a percpu subsection for hot data Ingo Molnar
2025-02-23 10:20   ` Ard Biesheuvel
2025-02-23 10:30     ` Uros Bizjak
2025-02-23 17:25       ` Brian Gerst
2025-02-23 17:30         ` Ard Biesheuvel
2025-02-23 14:44     ` Brian Gerst
2025-02-23 18:00   ` Linus Torvalds
