* [RFC PATCH 0/3] Move thread_info into task_struct for LoongArch
@ 2026-04-28  7:20 Tiezhu Yang
  2026-04-28  7:20 ` [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK Tiezhu Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Tiezhu Yang @ 2026-04-28  7:20 UTC
  To: loongarch, bpf, linux-arch, linux-kernel

This is still an RFC, based on 7.1-rc1. I will perform more tests locally.
Looking forward to more review comments and test results.

Tiezhu Yang (3):
  LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  LoongArch: BPF: Inline bpf_get_current_task{_btf}() helpers
  LoongArch: BPF: Inline bpf_get_smp_processor_id() helper

 arch/loongarch/Kconfig                   |  1 +
 arch/loongarch/include/asm/current.h     | 22 ++++++++++++
 arch/loongarch/include/asm/processor.h   | 44 +++---------------------
 arch/loongarch/include/asm/ptrace.h      |  6 ----
 arch/loongarch/include/asm/smp.h         |  3 +-
 arch/loongarch/include/asm/stackframe.h  |  9 +++--
 arch/loongarch/include/asm/switch_to.h   | 15 ++++++--
 arch/loongarch/include/asm/thread_info.h | 13 +------
 arch/loongarch/kernel/asm-offsets.c      |  7 ++--
 arch/loongarch/kernel/entry.S            |  7 ++--
 arch/loongarch/kernel/head.S             | 18 ++++++----
 arch/loongarch/kernel/process.c          |  6 ++++
 arch/loongarch/kernel/relocate.c         |  2 +-
 arch/loongarch/kernel/setup.c            |  3 ++
 arch/loongarch/kernel/smp.c              |  8 +++--
 arch/loongarch/kernel/switch.S           |  9 ++---
 arch/loongarch/net/bpf_jit.c             | 26 ++++++++++++++
 17 files changed, 115 insertions(+), 84 deletions(-)
 create mode 100644 arch/loongarch/include/asm/current.h

-- 
2.42.0



* [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  2026-04-28  7:20 [RFC PATCH 0/3] Move thread_info into task_struct for LoongArch Tiezhu Yang
@ 2026-04-28  7:20 ` Tiezhu Yang
  2026-04-28 20:38   ` sashiko-bot
  2026-06-01 13:46   ` Huacai Chen
  2026-04-28  7:20 ` [RFC PATCH 2/3] LoongArch: BPF: Inline bpf_get_current_task{_btf}() helpers Tiezhu Yang
  2026-04-28  7:20 ` [RFC PATCH 3/3] LoongArch: BPF: Inline bpf_get_smp_processor_id() helper Tiezhu Yang
  2 siblings, 2 replies; 12+ messages in thread
From: Tiezhu Yang @ 2026-04-28  7:20 UTC
  To: loongarch, bpf, linux-arch, linux-kernel

Like other architectures such as x86, arm64, riscv, powerpc and s390,
select THREAD_INFO_IN_TASK for LoongArch to move thread_info off the
stack into task_struct. This follows modern kernel standards and also
makes the system more secure.

With this patch, thread_info is included in task_struct at an offset
of 0 instead of being placed at the bottom of the kernel stack. Thus,
the $tp register points to both thread_info and task_struct.
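
As a rough illustration of the layout described above, the following C
sketch uses hypothetical, simplified structures (thread_info_sketch and
task_sketch are stand-ins, not the kernel's real definitions). It only
shows that, with thread_info as the first member, one pointer value can
be read as either type:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct thread_info_sketch {
	unsigned long flags;
	unsigned int cpu;
	int preempt_count;
};

struct task_sketch {
	/* Must stay the first member so both types share one address. */
	struct thread_info_sketch thread_info;
	void *stack;
	int pid;
};

static_assert(offsetof(struct task_sketch, thread_info) == 0,
	      "thread_info must be at offset 0 of the task structure");

/* Given the task pointer (what $tp would hold), return its thread_info. */
static inline struct thread_info_sketch *
task_thread_info_sketch(struct task_sketch *tsk)
{
	return &tsk->thread_info;
}
```

Because the offset is 0, no pointer adjustment is needed when moving
between the two views of the same object.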

To support this, introduce a per-CPU variable __entry_task to store
the pointer to the current task_struct. This decouples the recovery
of the $tp register from the stack pointer during exception entry.
Relying on SP for task identification is insecure since the SP can be
controlled or corrupted by userspace.

Then initialize __entry_task for both the primary and secondary CPUs
during arch-specific setup and SMP boot paths, modify SAVE_SOME and
handle_syscall to restore the $tp register from __entry_task, and use
la_abs absolute addressing for __entry_task access in assembly to
bypass relocation limits within exception handling sections. Also,
add entry_task_switch() to update __entry_task during switch_to().
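
The exception-entry lookup described above can be modelled in plain C
roughly as follows. This is only a user-space sketch under simplifying
assumptions: percpu_area_sketch, percpu_base_sketch and NR_CPUS_SKETCH
are hypothetical names, and an array of structures stands in for the
real per-CPU areas. It mirrors the steps in the assembly: locate the
variable, add this CPU's per-CPU base, then load the task pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS_SKETCH 4

struct task_sketch {
	int pid;
};

/* One private copy of every per-CPU variable exists per CPU. */
struct percpu_area_sketch {
	unsigned long other_var;
	struct task_sketch *entry_task;	/* models __entry_task */
};

static struct percpu_area_sketch percpu_areas[NR_CPUS_SKETCH];

/* Models reading PERCPU_BASE_KS: the base of this CPU's per-CPU area. */
static unsigned char *percpu_base_sketch(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	return (unsigned char *)&percpu_areas[cpu];
}

/* Models entry_task_switch(): record the task now running on this CPU. */
static void entry_task_switch_sketch(unsigned int cpu, struct task_sketch *next)
{
	assert(cpu < NR_CPUS_SKETCH);
	percpu_areas[cpu].entry_task = next;
}

/*
 * Models the entry sequence: variable location plus per-CPU base, then a
 * load. The stack pointer is never consulted.
 */
static struct task_sketch *load_entry_task_sketch(unsigned int cpu)
{
	unsigned char *addr = percpu_base_sketch(cpu) +
			      offsetof(struct percpu_area_sketch, entry_task);
	struct task_sketch *tsk;

	memcpy(&tsk, addr, sizeof(tsk));
	return tsk;
}
```

The point of the sketch is that the lookup depends only on which CPU is
running, so a corrupted stack pointer cannot change which task is
recovered.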

Additionally, add reg02 ($tp) to thread_struct and initialize it in
both INIT_THREAD and copy_thread() to ensure that the "current" task
pointer is correctly set up for the initial idle task and all future
child processes. While __switch_to() directly updates $tp from a1 for
efficiency, storing it in thread_struct ensures correct restoration
for new processes and in non-standard switch paths.

Furthermore, initialize reg03 ($sp) in INIT_THREAD to set the initial
kernel stack pointer for the idle task. This ensures the idle task has
a valid stack to use when the system first starts or whenever it is
switched back to.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/Kconfig                   |  1 +
 arch/loongarch/include/asm/current.h     | 22 ++++++++++++
 arch/loongarch/include/asm/processor.h   | 44 +++---------------------
 arch/loongarch/include/asm/ptrace.h      |  6 ----
 arch/loongarch/include/asm/smp.h         |  3 +-
 arch/loongarch/include/asm/stackframe.h  |  9 +++--
 arch/loongarch/include/asm/switch_to.h   | 15 ++++++--
 arch/loongarch/include/asm/thread_info.h | 13 +------
 arch/loongarch/kernel/asm-offsets.c      |  7 ++--
 arch/loongarch/kernel/entry.S            |  7 ++--
 arch/loongarch/kernel/head.S             | 18 ++++++----
 arch/loongarch/kernel/process.c          |  6 ++++
 arch/loongarch/kernel/relocate.c         |  2 +-
 arch/loongarch/kernel/setup.c            |  3 ++
 arch/loongarch/kernel/smp.c              |  8 +++--
 arch/loongarch/kernel/switch.S           |  9 ++---
 16 files changed, 89 insertions(+), 84 deletions(-)
 create mode 100644 arch/loongarch/include/asm/current.h

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 3b042dbb2c41..ea29d5d17588 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -210,6 +210,7 @@ config LOONGARCH
 	select SYSCTL_ARCH_UNALIGN_NO_WARN
 	select SYSCTL_EXCEPTION_TRACE
 	select SWIOTLB if 64BIT
+	select THREAD_INFO_IN_TASK
 	select TRACE_IRQFLAGS_SUPPORT
 	select USE_PERCPU_NUMA_NODE_ID
 	select USER_STACKTRACE_SUPPORT
diff --git a/arch/loongarch/include/asm/current.h b/arch/loongarch/include/asm/current.h
new file mode 100644
index 000000000000..c03cf0abb863
--- /dev/null
+++ b/arch/loongarch/include/asm/current.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_LOONGARCH_CURRENT_H
+#define __ASM_LOONGARCH_CURRENT_H
+
+#include <linux/compiler.h>
+
+#ifndef __ASSEMBLER__
+
+struct task_struct;
+
+register struct task_struct *current_thread_pointer __asm__("$tp");
+
+static __always_inline struct task_struct *get_current(void)
+{
+	return current_thread_pointer;
+}
+
+#define current get_current()
+
+#endif /* __ASSEMBLER__ */
+
+#endif /* __ASM_LOONGARCH_CURRENT_H */
diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
index ce8b953f8c79..df927a4318cc 100644
--- a/arch/loongarch/include/asm/processor.h
+++ b/arch/loongarch/include/asm/processor.h
@@ -109,7 +109,7 @@ struct loongarch_vdso_info;
  */
 struct thread_struct {
 	/* Main processor registers. */
-	unsigned long reg01, reg03, reg22; /* ra sp fp */
+	unsigned long reg01, reg02, reg03, reg22; /* ra tp sp fp */
 	unsigned long reg23, reg24, reg25, reg26; /* s0-s3 */
 	unsigned long reg27, reg28, reg29, reg30, reg31; /* s4-s8 */
 
@@ -145,45 +145,9 @@ struct thread_struct {
 #define thread_saved_ra(tsk)	(tsk->thread.sched_ra)
 #define thread_saved_fp(tsk)	(tsk->thread.sched_cfa)
 
-#define INIT_THREAD  {						\
-	/*							\
-	 * Main processor registers				\
-	 */							\
-	.reg01			= 0,				\
-	.reg03			= 0,				\
-	.reg22			= 0,				\
-	.reg23			= 0,				\
-	.reg24			= 0,				\
-	.reg25			= 0,				\
-	.reg26			= 0,				\
-	.reg27			= 0,				\
-	.reg28			= 0,				\
-	.reg29			= 0,				\
-	.reg30			= 0,				\
-	.reg31			= 0,				\
-	.sched_ra		= 0,				\
-	.sched_cfa		= 0,				\
-	.csr_crmd		= 0,				\
-	.csr_prmd		= 0,				\
-	.csr_euen		= 0,				\
-	.csr_ecfg		= 0,				\
-	.csr_badvaddr		= 0,				\
-	/*							\
-	 * Other stuff associated with the process		\
-	 */							\
-	.trap_nr		= 0,				\
-	.error_code		= 0,				\
-	/*							\
-	 * FPU & vector registers				\
-	 */							\
-	.fpu			= {				\
-		.fcc		= 0,				\
-		.fcsr		= 0,				\
-		.ftop		= 0,				\
-		.fpr		= {{{0,},},},			\
-	},							\
-	.hbp_break		= {0},				\
-	.hbp_watch		= {0},				\
+#define INIT_THREAD  {							\
+	.reg02 = (unsigned long)&init_task,				\
+	.reg03 = (unsigned long)&init_stack + sizeof(init_stack),	\
 }
 
 struct task_struct;
diff --git a/arch/loongarch/include/asm/ptrace.h b/arch/loongarch/include/asm/ptrace.h
index e5d21e836d99..37f53629d3c7 100644
--- a/arch/loongarch/include/asm/ptrace.h
+++ b/arch/loongarch/include/asm/ptrace.h
@@ -170,12 +170,6 @@ static inline void die_if_kernel(const char *str, struct pt_regs *regs)
 		die(str, regs);
 }
 
-#define current_pt_regs()						\
-({									\
-	unsigned long sp = (unsigned long)__builtin_frame_address(0);	\
-	(struct pt_regs *)((sp | (THREAD_SIZE - 1)) + 1) - 1;		\
-})
-
 /* Helpers for working with the user stack pointer */
 
 static inline unsigned long user_stack_pointer(struct pt_regs *regs)
diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
index 3a47f52959a8..761cc0016df8 100644
--- a/arch/loongarch/include/asm/smp.h
+++ b/arch/loongarch/include/asm/smp.h
@@ -82,7 +82,8 @@ struct seq_file;
 
 struct secondary_data {
 	unsigned long stack;
-	unsigned long thread_info;
+	unsigned long task;
+	unsigned long offset;
 };
 extern struct secondary_data cpuboot_data;
 
diff --git a/arch/loongarch/include/asm/stackframe.h b/arch/loongarch/include/asm/stackframe.h
index ecc8e50fffa8..eeda5dcc982e 100644
--- a/arch/loongarch/include/asm/stackframe.h
+++ b/arch/loongarch/include/asm/stackframe.h
@@ -191,8 +191,13 @@
 	andi	t0, t0, 0x3	/* extract pplv bit */
 	beqz	t0, 9f
 
-	LONG_LI	tp, ~_THREAD_MASK
-	and	tp, tp, sp
+	la_abs      t1, __entry_task
+#ifdef CONFIG_SMP
+	csrrd       t0, PERCPU_BASE_KS
+	LONG_ADD    t1, t1, t0
+#endif
+	LONG_L      tp, t1, 0
+
 	cfi_st  u0, PT_R21, \docfi
 	csrrd	u0, PERCPU_BASE_KS
 9:
diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
index 5b225aff3ba2..9932429cfe17 100644
--- a/arch/loongarch/include/asm/switch_to.h
+++ b/arch/loongarch/include/asm/switch_to.h
@@ -5,17 +5,25 @@
 #ifndef _ASM_SWITCH_TO_H
 #define _ASM_SWITCH_TO_H
 
+#include <linux/percpu.h>
+
 #include <asm/cpu-features.h>
 #include <asm/fpu.h>
 #include <asm/lbt.h>
 
 struct task_struct;
 
+DECLARE_PER_CPU(struct task_struct *, __entry_task);
+
+static inline void entry_task_switch(struct task_struct *next)
+{
+	__this_cpu_write(__entry_task, next);
+}
+
 /**
  * __switch_to - switch execution of a task
  * @prev:	The task previously executed.
  * @next:	The task to begin executing.
- * @next_ti:	task_thread_info(next).
  * @sched_ra:	__schedule return address.
  * @sched_cfa:	__schedule call frame address.
  *
@@ -23,7 +31,7 @@ struct task_struct;
  * the context of next. Returns prev.
  */
 extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
-			struct task_struct *next, struct thread_info *next_ti,
+			struct task_struct *next,
 			void *sched_ra, void *sched_cfa);
 
 /*
@@ -37,7 +45,8 @@ do {										\
 	lose_fpu_inatomic(1, prev);						\
 	lose_lbt_inatomic(1, prev);						\
 	hw_breakpoint_thread_switch(next);					\
-	(last) = __switch_to(prev, next, task_thread_info(next),		\
+	entry_task_switch(next);						\
+	(last) = __switch_to(prev, next,					\
 		 __builtin_return_address(0), __builtin_frame_address(0));	\
 } while (0)
 
diff --git a/arch/loongarch/include/asm/thread_info.h b/arch/loongarch/include/asm/thread_info.h
index 4d7117fcdc78..2c95a5134976 100644
--- a/arch/loongarch/include/asm/thread_info.h
+++ b/arch/loongarch/include/asm/thread_info.h
@@ -22,9 +22,7 @@
  *   must also be changed
  */
 struct thread_info {
-	struct task_struct	*task;		/* main task structure */
 	unsigned long		flags;		/* low level flags */
-	unsigned long		tp_value;	/* thread pointer */
 	__u32			cpu;		/* current CPU */
 	int			preempt_count;	/* 0 => preemptible, <0 => BUG */
 	struct pt_regs		*regs;
@@ -37,20 +35,11 @@ struct thread_info {
  */
 #define INIT_THREAD_INFO(tsk)			\
 {						\
-	.task		= &tsk,			\
-	.flags		= _TIF_FIXADE,		\
+	.flags		= 0,			\
 	.cpu		= 0,			\
 	.preempt_count	= INIT_PREEMPT_COUNT,	\
 }
 
-/* How to get the thread information struct from C. */
-register struct thread_info *__current_thread_info __asm__("$tp");
-
-static inline struct thread_info *current_thread_info(void)
-{
-	return __current_thread_info;
-}
-
 register unsigned long current_stack_pointer __asm__("$sp");
 
 #endif /* !__ASSEMBLER__ */
diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
index 2cc953f113ac..55f4ffc1aca5 100644
--- a/arch/loongarch/kernel/asm-offsets.c
+++ b/arch/loongarch/kernel/asm-offsets.c
@@ -70,7 +70,7 @@ static void __used output_task_defines(void)
 {
 	COMMENT("LoongArch task_struct offsets.");
 	OFFSET(TASK_STATE, task_struct, __state);
-	OFFSET(TASK_THREAD_INFO, task_struct, stack);
+	OFFSET(TASK_STACK, task_struct, stack);
 	OFFSET(TASK_FLAGS, task_struct, flags);
 	OFFSET(TASK_MM, task_struct, mm);
 	OFFSET(TASK_PID, task_struct, pid);
@@ -84,9 +84,7 @@ static void __used output_task_defines(void)
 static void __used output_thread_info_defines(void)
 {
 	COMMENT("LoongArch thread_info offsets.");
-	OFFSET(TI_TASK, thread_info, task);
 	OFFSET(TI_FLAGS, thread_info, flags);
-	OFFSET(TI_TP_VALUE, thread_info, tp_value);
 	OFFSET(TI_CPU, thread_info, cpu);
 	OFFSET(TI_PRE_COUNT, thread_info, preempt_count);
 	OFFSET(TI_REGS, thread_info, regs);
@@ -267,7 +265,8 @@ static void __used output_smpboot_defines(void)
 {
 	COMMENT("Linux smp cpu boot offsets.");
 	OFFSET(CPU_BOOT_STACK, secondary_data, stack);
-	OFFSET(CPU_BOOT_TINFO, secondary_data, thread_info);
+	OFFSET(CPU_BOOT_TASK, secondary_data, task);
+	OFFSET(CPU_BOOT_OFFSET, secondary_data, offset);
 	BLANK();
 }
 #endif
diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S
index b53d333a7c42..b099672eab60 100644
--- a/arch/loongarch/kernel/entry.S
+++ b/arch/loongarch/kernel/entry.S
@@ -67,8 +67,11 @@ SYM_CODE_START(handle_syscall)
 #endif
 
 	move		u0, t0
-	LONG_LI		tp, ~_THREAD_MASK
-	and		tp, tp, sp
+	la_abs		t1, __entry_task
+#ifdef CONFIG_SMP
+	LONG_ADD	t1, t1, u0
+#endif
+	LONG_L		tp, t1, 0
 
 	move		a0, sp
 	bl		do_syscall
diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S
index 4eed7bc312a8..ec67faab907b 100644
--- a/arch/loongarch/kernel/head.S
+++ b/arch/loongarch/kernel/head.S
@@ -74,10 +74,11 @@ SYM_CODE_START(kernel_entry)			# kernel entry point
 	/* GPR21 used for percpu base (runtime), initialized as 0 */
 	move		u0, zero
 
-	la.pcrel	tp, init_thread_union
-	/* Set the SP after an empty pt_regs.  */
-	PTR_LI		sp, (_THREAD_SIZE - PT_SIZE)
-	PTR_ADD		sp, sp, tp
+	la.pcrel	tp, init_task
+	la.pcrel	t0, init_stack
+	PTR_LI		t1, _THREAD_SIZE
+	PTR_ADD		t0, t0, t1
+	PTR_ADDI	sp, t0, -PT_SIZE
 	set_saved_sp	sp, t0, t1
 
 #ifdef CONFIG_RELOCATABLE
@@ -86,8 +87,10 @@ SYM_CODE_START(kernel_entry)			# kernel entry point
 
 #ifdef CONFIG_RANDOMIZE_BASE
 	/* Repoint the sp into the new kernel */
-	PTR_LI		sp, (_THREAD_SIZE - PT_SIZE)
-	PTR_ADD		sp, sp, tp
+	LONG_LPTR	t0, tp, TASK_STACK
+	PTR_LI		t1, _THREAD_SIZE
+	PTR_ADD		t0, t0, t1
+	PTR_ADDI	sp, t0, -PT_SIZE
 	set_saved_sp	sp, t0, t1
 
 	/* Jump to the new kernel: new_pc = current_pc + random_offset */
@@ -128,7 +131,8 @@ SYM_CODE_START(smpboot_entry)
 #endif
 	la.pcrel	t0, cpuboot_data
 	ld.d		sp, t0, CPU_BOOT_STACK
-	ld.d		tp, t0, CPU_BOOT_TINFO
+	ld.d		tp, t0, CPU_BOOT_TASK
+	ld.d		u0, t0, CPU_BOOT_OFFSET
 
 	bl		start_secondary
 	ASM_BUG()
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index 5505fc355e1b..71c9c6468e60 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -54,6 +54,9 @@
 #include <asm/vdso.h>
 #include <asm/vdso/vdso.h>
 
+DEFINE_PER_CPU(struct task_struct *, __entry_task);
+EXPORT_PER_CPU_SYMBOL_GPL(__entry_task);
+
 #ifdef CONFIG_STACKPROTECTOR
 #include <linux/stackprotector.h>
 unsigned long __stack_chk_guard __read_mostly;
@@ -223,6 +226,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	if (clone_flags & CLONE_SETTLS)
 		childregs->regs[2] = tls;
 
+	/* Set tp to the new task structure for context switching */
+	p->thread.reg02 = (unsigned long)p;
+
 out:
 	ptrace_hw_copy_thread(p);
 	clear_tsk_thread_flag(p, TIF_USEDFPU);
diff --git a/arch/loongarch/kernel/relocate.c b/arch/loongarch/kernel/relocate.c
index 16f6a9b39659..a9ef38459629 100644
--- a/arch/loongarch/kernel/relocate.c
+++ b/arch/loongarch/kernel/relocate.c
@@ -263,7 +263,7 @@ unsigned long __init relocate_kernel(void)
 		reloc_offset += random_offset;
 
 		/* The current thread is now within the relocated kernel */
-		__current_thread_info = RELOCATED_KASLR(__current_thread_info);
+		current_thread_pointer = RELOCATED_KASLR(current_thread_pointer);
 
 		update_reloc_offset(&reloc_offset, random_offset);
 	}
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index 839b23edee87..5d434c5612ab 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -46,6 +46,7 @@
 #include <asm/pgalloc.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
+#include <asm/switch_to.h>
 #include <asm/time.h>
 #include <asm/unwind.h>
 
@@ -617,4 +618,6 @@ void __init setup_arch(char **cmdline_p)
 #ifdef CONFIG_KASAN
 	kasan_init();
 #endif
+
+	entry_task_switch(&init_task);
 }
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index 64a048f1b880..e8b0d2fc2a9a 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -34,6 +34,7 @@
 #include <asm/paravirt.h>
 #include <asm/processor.h>
 #include <asm/setup.h>
+#include <asm/switch_to.h>
 #include <asm/time.h>
 
 int __cpu_number_map[NR_CPUS];   /* Map physical to logical */
@@ -400,8 +401,9 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
 	pr_info("Booting CPU#%d...\n", cpu);
 
 	entry = __pa_symbol((unsigned long)&smpboot_entry);
-	cpuboot_data.stack = (unsigned long)__KSTK_TOS(idle);
-	cpuboot_data.thread_info = (unsigned long)task_thread_info(idle);
+	cpuboot_data.stack = (unsigned long)task_pt_regs(idle);
+	cpuboot_data.task = (unsigned long)idle;
+	cpuboot_data.offset = per_cpu_offset(cpu);
 
 	csr_mail_send(entry, cpu_logical_map(cpu), 0);
 
@@ -686,6 +688,8 @@ asmlinkage void start_secondary(void)
 	 */
 	complete(&cpu_running);
 
+	entry_task_switch(current);
+
 	/*
 	 * irq will be enabled in loongson_smp_finish(), enabling it too
 	 * early is dangerous.
diff --git a/arch/loongarch/kernel/switch.S b/arch/loongarch/kernel/switch.S
index f377d8f5c51a..644348e05f6a 100644
--- a/arch/loongarch/kernel/switch.S
+++ b/arch/loongarch/kernel/switch.S
@@ -12,7 +12,7 @@
 
 /*
  * task_struct *__switch_to(task_struct *prev, task_struct *next,
- *			    struct thread_info *next_ti, void *sched_ra, void *sched_cfa)
+ *			    void *sched_ra, void *sched_cfa)
  */
 	.align	5
 SYM_FUNC_START(__switch_to)
@@ -33,11 +33,12 @@ SYM_FUNC_START(__switch_to)
 	LONG_SPTR	t8, t7, 0
 #endif
 
-	move	tp, a2
+	move	tp, a1
 	cpu_restore_nonscratch a1
 
-	li.w		t0, _THREAD_SIZE
-	PTR_ADD		t0, t0, tp
+	LONG_LPTR	t0, tp, TASK_STACK
+	PTR_LI		t1, _THREAD_SIZE
+	PTR_ADD		t0, t0, t1
 	set_saved_sp	t0, t1, t2
 
 	LONG_LPTR	t1, a1, (THREAD_CSRPRMD - TASK_STRUCT_OFFSET)
-- 
2.42.0



* [RFC PATCH 2/3] LoongArch: BPF: Inline bpf_get_current_task{_btf}() helpers
  2026-04-28  7:20 [RFC PATCH 0/3] Move thread_info into task_struct for LoongArch Tiezhu Yang
  2026-04-28  7:20 ` [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK Tiezhu Yang
@ 2026-04-28  7:20 ` Tiezhu Yang
  2026-04-28  7:20 ` [RFC PATCH 3/3] LoongArch: BPF: Inline bpf_get_smp_processor_id() helper Tiezhu Yang
  2 siblings, 0 replies; 12+ messages in thread
From: Tiezhu Yang @ 2026-04-28  7:20 UTC
  To: loongarch, bpf, linux-arch, linux-kernel

Since the pointer to task_struct is always available in the $tp register,
calls to bpf_get_current_task() and bpf_get_current_task_btf() can be
inlined into a single move instruction.

(1) Here is the sample test.c:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("raw_tp/sys_enter")
  long test_task(void *ctx)
  {
	  return (long)bpf_get_current_task();
  }
  char _license[] SEC("license") = "GPL";

(2) Here are the test steps:

  sudo yum install libbpf-devel kernel-devel bpftool
  clang -target bpf -O2 -c test.c -o test.o
  sudo sysctl -w net.core.bpf_jit_enable=1
  sudo bpftool prog show name test_task
  sudo rm -f /sys/fs/bpf/test_task
  sudo bpftool prog load test.o /sys/fs/bpf/test_task
  ID=$(sudo bpftool prog show pinned /sys/fs/bpf/test_task | grep -oE '^[0-9]+')
  sudo bpftool prog dump jited id $ID

(3) Here are the test results:

Before: 6 instructions

  ...
  64:	lu12i.w	$t1, 1093
  68:	ori	$t1, $t1, 3320
  6c:	lu32i.d	$t1, 0
  70:	lu52i.d	$t1, $t1, -1792
  74:	jirl	$ra, $t1, 0
  78:	move	$a5, $a0
  ...

After: 1 instruction

  ...
  64:	move	$a5, $tp
  ...

This is similar to commit 2bb138cb20a6 ("bpf, arm64: Inline
bpf_get_current_task/_btf() helpers").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/net/bpf_jit.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index 24913dc7f4e8..b93b67d22c26 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -1147,6 +1147,13 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
 
 	/* function call */
 	case BPF_JMP | BPF_CALL:
+		/* Implement helper call to bpf_get_current_task/_btf() inline */
+		if (insn->src_reg == 0 && (insn->imm == BPF_FUNC_get_current_task ||
+					   insn->imm == BPF_FUNC_get_current_task_btf)) {
+			move_reg(ctx, regmap[BPF_REG_0], LOONGARCH_GPR_TP);
+			break;
+		}
+
 		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
 					    &func_addr, &func_addr_fixed);
 		if (ret < 0)
@@ -2367,3 +2374,14 @@ bool bpf_jit_supports_subprog_tailcalls(void)
 {
 	return true;
 }
+
+bool bpf_jit_inlines_helper_call(s32 imm)
+{
+	switch (imm) {
+	case BPF_FUNC_get_current_task:
+	case BPF_FUNC_get_current_task_btf:
+		return true;
+	default:
+		return false;
+	}
+}
-- 
2.42.0



* [RFC PATCH 3/3] LoongArch: BPF: Inline bpf_get_smp_processor_id() helper
  2026-04-28  7:20 [RFC PATCH 0/3] Move thread_info into task_struct for LoongArch Tiezhu Yang
  2026-04-28  7:20 ` [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK Tiezhu Yang
  2026-04-28  7:20 ` [RFC PATCH 2/3] LoongArch: BPF: Inline bpf_get_current_task{_btf}() helpers Tiezhu Yang
@ 2026-04-28  7:20 ` Tiezhu Yang
  2026-04-28 21:25   ` sashiko-bot
  2 siblings, 1 reply; 12+ messages in thread
From: Tiezhu Yang @ 2026-04-28  7:20 UTC
  To: loongarch, bpf, linux-arch, linux-kernel

Since the pointer to thread_info is always available in the $tp register,
the call to bpf_get_smp_processor_id() can be inlined into a single
load instruction.

(1) Here is the sample test.c:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("raw_tp/sys_enter")
  int test_cpuid(void *ctx)
  {
	  return bpf_get_smp_processor_id();
  }
  char _license[] SEC("license") = "GPL";

(2) Here are the test steps:

  sudo yum install libbpf-devel kernel-devel bpftool
  clang -target bpf -O2 -c test.c -o test.o
  sudo sysctl -w net.core.bpf_jit_enable=1
  sudo bpftool prog show name test_cpuid
  sudo rm -f /sys/fs/bpf/test_cpuid
  sudo bpftool prog load test.o /sys/fs/bpf/test_cpuid
  ID=$(sudo bpftool prog show pinned /sys/fs/bpf/test_cpuid | grep -oE '^[0-9]+')
  sudo bpftool prog dump jited id $ID

(3) Here are the test results:

Before: 6 instructions

  ...
  64:	lu12i.w	$t1, 1213
  68:	ori	$t1, $t1, 1680
  6c:	lu32i.d	$t1, 0
  70:	lu52i.d	$t1, $t1, -1792
  74:	jirl	$ra, $t1, 0
  78:	move	$a5, $a0
  ...

After: 1 instruction

  ...
  64:	ld.w	$a5, $tp, 8
  ...

This is similar to commit 2ddec2c80b44 ("riscv, bpf: inline
bpf_get_smp_processor_id()").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/net/bpf_jit.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index b93b67d22c26..b1c6194c4099 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2022 Loongson Technology Corporation Limited
  */
 #include <linux/memory.h>
+#include <asm/asm-offsets.h>
 #include "bpf_jit.h"
 
 #define LOONGARCH_MAX_REG_ARGS 8
@@ -1154,6 +1155,12 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
 			break;
 		}
 
+		/* Implement helper call to bpf_get_smp_processor_id() inline */
+		if (insn->src_reg == 0 && insn->imm == BPF_FUNC_get_smp_processor_id) {
+			emit_insn(ctx, ldw, regmap[BPF_REG_0], LOONGARCH_GPR_TP, TI_CPU);
+			break;
+		}
+
 		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
 					    &func_addr, &func_addr_fixed);
 		if (ret < 0)
@@ -2380,6 +2387,7 @@ bool bpf_jit_inlines_helper_call(s32 imm)
 	switch (imm) {
 	case BPF_FUNC_get_current_task:
 	case BPF_FUNC_get_current_task_btf:
+	case BPF_FUNC_get_smp_processor_id:
 		return true;
 	default:
 		return false;
-- 
2.42.0



* Re: [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  2026-04-28  7:20 ` [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK Tiezhu Yang
@ 2026-04-28 20:38   ` sashiko-bot
  2026-06-01 13:46   ` Huacai Chen
  1 sibling, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-04-28 20:38 UTC
  To: Tiezhu Yang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [Critical] The default thread flags initialization drops `_TIF_FIXADE`, silently disabling unaligned memory access software fixups by default.
- [High] The `__switch_to` assembly function incorrectly saves the `sched_ra` and `sched_cfa` arguments due to a shifted parameter list, corrupting the saved stack unwinding information.
- [Low] The `reg02` field added to `thread_struct` is dead code, never read or restored, and left uninitialized for kernel threads.
--

commit 40aa85795dd4ee444da3115a1909cec1be7bed8d
Author: Tiezhu Yang <yangtiezhu@loongson.cn>

LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK

This commit selects THREAD_INFO_IN_TASK for LoongArch to move thread_info
off the stack into task_struct. It introduces a per-CPU variable __entry_task
to store the current task_struct pointer and updates the context switch path
and exception entry to use it.

>     Additionally, add reg02 ($tp) to thread_struct and initialize it in
>     both INIT_THREAD and copy_thread() to ensure that the "current" task
>     pointer is correctly set up for the initial idle task and all future
>     child processes. While __switch_to() directly updates $tp from a1 for
>     efficiency, storing it in thread_struct ensures correct restoration
>     for new processes and in non-standard switch paths.

Is this description accurate? Looking at the code, the reg02 field does not
appear to be read during restoration. New threads restore their user tp from
pt_regs, and cpu_restore_nonscratch does not seem to restore it. 

> diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
> index ce8b953f8c79..df927a4318cc 100644
> --- a/arch/loongarch/include/asm/processor.h
> +++ b/arch/loongarch/include/asm/processor.h
[ ... ]
> @@ -109,7 +109,7 @@ struct loongarch_vdso_info;
>   */
>  struct thread_struct {
>  	/* Main processor registers. */
> -	unsigned long reg01, reg03, reg22; /* ra sp fp */
> +	unsigned long reg01, reg02, reg03, reg22; /* ra tp sp fp */
>  	unsigned long reg23, reg24, reg25, reg26; /* s0-s3 */
>  	unsigned long reg27, reg28, reg29, reg30, reg31; /* s4-s8 */

Could this reg02 field be dead code? If it is never read back out of the
thread_struct, is it needed here?

> diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
> index 5b225aff3ba2..9932429cfe17 100644
> --- a/arch/loongarch/include/asm/switch_to.h
> +++ b/arch/loongarch/include/asm/switch_to.h
[ ... ]
> @@ -23,7 +31,7 @@ struct task_struct;
>   * the context of next. Returns prev.
>   */
>  extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
> -			struct task_struct *next, struct thread_info *next_ti,
> +			struct task_struct *next,
>  			void *sched_ra, void *sched_cfa);

When the next_ti argument is removed here, sched_ra and sched_cfa shift to 
registers a2 and a3 under the LoongArch C calling convention.

Does the assembly implementation of __switch_to() in switch.S also need to be 
updated? If it still reads from a3 and a4, could this corrupt the saved stack 
unwinding information by storing the stack pointer into THREAD_SCHED_RA
instead? This might break stack unwinding for parked tasks.

> diff --git a/arch/loongarch/include/asm/thread_info.h b/arch/loongarch/include/asm/thread_info.h
> index 4d7117fcdc78..2c95a5134976 100644
> --- a/arch/loongarch/include/asm/thread_info.h
> +++ b/arch/loongarch/include/asm/thread_info.h
[ ... ]
> @@ -37,20 +35,11 @@ struct thread_info {
>   */
>  #define INIT_THREAD_INFO(tsk)			\
>  {						\
> -	.task		= &tsk,			\
> -	.flags		= _TIF_FIXADE,		\
> +	.flags		= 0,			\
>  	.cpu		= 0,			\
>  	.preempt_count	= INIT_PREEMPT_COUNT,	\
>  }

Does removing _TIF_FIXADE here globally disable the unaligned memory access 
software fixups? 

Since all newly created tasks inherit their initial thread info flags from 
init_task, could this cause user-space applications performing unaligned 
accesses to crash with a bus error?

> diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
> index 5505fc355e1b..71c9c6468e60 100644
> --- a/arch/loongarch/kernel/process.c
> +++ b/arch/loongarch/kernel/process.c
[ ... ]
> @@ -223,6 +226,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  	if (clone_flags & CLONE_SETTLS)
>  		childregs->regs[2] = tls;
>  
> +	/* Set tp to the new task structure for context switching */
> +	p->thread.reg02 = (unsigned long)p;
> +
>  out:
>  	ptrace_hw_copy_thread(p);

For kernel threads, copy_thread() jumps to the out label before reaching this 
initialization. Does this leave p->thread.reg02 uninitialized for kernel
threads, potentially retaining a stale task pointer from the parent process?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260428072021.7289-1-yangtiezhu@loongson.cn?part=1


* Re: [RFC PATCH 3/3] LoongArch: BPF: Inline bpf_get_smp_processor_id() helper
  2026-04-28  7:20 ` [RFC PATCH 3/3] LoongArch: BPF: Inline bpf_get_smp_processor_id() helper Tiezhu Yang
@ 2026-04-28 21:25   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-04-28 21:25 UTC
  To: Tiezhu Yang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Low] The inlined `bpf_get_smp_processor_id` helper uses a sign-extending load (`ldw`) instead of a zero-extending load (`ldwu`).
--

commit 537ce2d4859db91089b968d6b560566858b53b6f
Author: Tiezhu Yang <yangtiezhu@loongson.cn>

LoongArch: BPF: Inline bpf_get_smp_processor_id() helper

This commit inlines the bpf_get_smp_processor_id() helper into a single
load instruction on LoongArch. It accesses the thread_info which is always
available in the thread pointer register.

> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> index b93b67d22c26..b1c6194c4099 100644
> --- a/arch/loongarch/net/bpf_jit.c
> +++ b/arch/loongarch/net/bpf_jit.c

[ ... ]

> @@ -1154,6 +1155,12 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
>  			break;
>  		}
>  
> +		/* Implement helper call to bpf_get_smp_processor_id() inline */
> +		if (insn->src_reg == 0 && insn->imm == BPF_FUNC_get_smp_processor_id) {
> +			emit_insn(ctx, ldw, regmap[BPF_REG_0], LOONGARCH_GPR_TP, TI_CPU);

Does this code need to use a zero-extending load (ldwu) instead of a
sign-extending load (ldw)?

The C implementation of bpf_get_smp_processor_id() returns a u64, meaning
the 32-bit smp_processor_id() result is zero-extended to 64 bits.

While CPU IDs are small positive integers so sign-extending and
zero-extending produce the same result in practice, using the
zero-extending ldwu would strictly match the BPF calling convention and
original C helper semantics.
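
To illustrate the difference outside the JIT, here is a minimal C sketch. The helper names are hypothetical and only model the two load flavors (the LoongArch mnemonics are ld.w and ld.wu); this is not the JIT code itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a sign-extending 32-bit load: bit 31 of the
 * loaded word is copied into bits 63..32 of the result. The cast to
 * int32_t assumes the usual two's complement conversion.
 */
static uint64_t demo_load32_sign_extend(uint32_t raw)
{
	return (uint64_t)(int64_t)(int32_t)raw;
}

/* Hypothetical model of a zero-extending 32-bit load: bits 63..32 are 0. */
static uint64_t demo_load32_zero_extend(uint32_t raw)
{
	return (uint64_t)raw;
}

/* The two models agree whenever bit 31 of the loaded word is clear. */
static void demo_check_small_cpu_id(uint32_t cpu)
{
	assert((cpu & 0x80000000u) == 0);
	assert(demo_load32_sign_extend(cpu) == demo_load32_zero_extend(cpu));
}
```

For any realistic CPU number both helpers return the same value; they only diverge once bit 31 is set, which is why the issue is rated low.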

> +			break;
> +		}
> +
>  		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
>  					    &func_addr, &func_addr_fixed);
>  		if (ret < 0)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260428072021.7289-1-yangtiezhu@loongson.cn?part=3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  2026-04-28  7:20 ` [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK Tiezhu Yang
  2026-04-28 20:38   ` sashiko-bot
@ 2026-06-01 13:46   ` Huacai Chen
  2026-06-03  2:30     ` Tiezhu Yang
  1 sibling, 1 reply; 12+ messages in thread
From: Huacai Chen @ 2026-06-01 13:46 UTC (permalink / raw)
  To: Tiezhu Yang; +Cc: loongarch, bpf, linux-arch, linux-kernel

Hi, Tiezhu,

On Tue, Apr 28, 2026 at 3:20 PM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
> Like other architectures such as x86, arm64, riscv, powerpc and s390,
> select THREAD_INFO_IN_TASK for LoongArch to move thread_info off the
> stack into task_struct. This follows modern kernel standards and also
> makes the system more secure.
>
> With this patch, thread_info is included in task_struct at an offset
> of 0 instead of being placed at the bottom of the kernel stack. Thus,
> the $tp register points to both thread_info and task_struct.
>
> To support this, introduce a per-CPU variable __entry_task to store
> the pointer to the current task_struct. This decouples the recovery
> of the $tp register from the stack pointer during exception entry.
> Relying on SP for task identification is insecure since the SP can be
> controlled or corrupted by userspace.
>
> Then initialize __entry_task for both the primary and secondary CPUs
> during arch-specific setup and SMP boot paths, modify SAVE_SOME and
> handle_syscall to restore the $tp register from __entry_task, and use
> la_abs absolute addressing for __entry_task access in assembly to
> bypass relocation limits within exception handling sections. Also,
> add entry_task_switch() to update __entry_task during switch_to().
>
> Additionally, add reg02 ($tp) to thread_struct and initialize it in
> both INIT_THREAD and copy_thread() to ensure that the "current" task
> pointer is correctly set up for the initial idle task and all future
> child processes. While __switch_to() directly updates $tp from a1 for
> efficiency, storing it in thread_struct ensures correct restoration
> for new processes and in non-standard switch paths.
>
> Furthermore, initialize reg03 ($sp) in INIT_THREAD to set the initial
> kernel stack pointer for the idle task. This ensures the idle task has
> a valid stack to use when the system first starts or whenever it is
> switched back to.
>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>  arch/loongarch/Kconfig                   |  1 +
>  arch/loongarch/include/asm/current.h     | 22 ++++++++++++
>  arch/loongarch/include/asm/processor.h   | 44 +++---------------------
>  arch/loongarch/include/asm/ptrace.h      |  6 ----
>  arch/loongarch/include/asm/smp.h         |  3 +-
>  arch/loongarch/include/asm/stackframe.h  |  9 +++--
>  arch/loongarch/include/asm/switch_to.h   | 15 ++++++--
>  arch/loongarch/include/asm/thread_info.h | 13 +------
>  arch/loongarch/kernel/asm-offsets.c      |  7 ++--
>  arch/loongarch/kernel/entry.S            |  7 ++--
>  arch/loongarch/kernel/head.S             | 18 ++++++----
>  arch/loongarch/kernel/process.c          |  6 ++++
>  arch/loongarch/kernel/relocate.c         |  2 +-
>  arch/loongarch/kernel/setup.c            |  3 ++
>  arch/loongarch/kernel/smp.c              |  8 +++--
>  arch/loongarch/kernel/switch.S           |  9 ++---
>  16 files changed, 89 insertions(+), 84 deletions(-)
>  create mode 100644 arch/loongarch/include/asm/current.h
First of all, you should also update
Documentation/features/core/thread-info-in-task/arch-support.txt
in the same patch.

>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 3b042dbb2c41..ea29d5d17588 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -210,6 +210,7 @@ config LOONGARCH
>         select SYSCTL_ARCH_UNALIGN_NO_WARN
>         select SYSCTL_EXCEPTION_TRACE
>         select SWIOTLB if 64BIT
> +       select THREAD_INFO_IN_TASK
>         select TRACE_IRQFLAGS_SUPPORT
>         select USE_PERCPU_NUMA_NODE_ID
>         select USER_STACKTRACE_SUPPORT
> diff --git a/arch/loongarch/include/asm/current.h b/arch/loongarch/include/asm/current.h
> new file mode 100644
> index 000000000000..c03cf0abb863
> --- /dev/null
> +++ b/arch/loongarch/include/asm/current.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_LOONGARCH_CURRENT_H
> +#define __ASM_LOONGARCH_CURRENT_H
> +
> +#include <linux/compiler.h>
> +
> +#ifndef __ASSEMBLER__
> +
> +struct task_struct;
> +
> +register struct task_struct *current_thread_pointer __asm__("$tp");
> +
> +static __always_inline struct task_struct *get_current(void)
> +{
> +       return current_thread_pointer;
> +}
> +
> +#define current get_current()
> +
> +#endif /* __ASSEMBLER__ */
> +
> +#endif /* __ASM_LOONGARCH_CURRENT_H */
> diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
> index ce8b953f8c79..df927a4318cc 100644
> --- a/arch/loongarch/include/asm/processor.h
> +++ b/arch/loongarch/include/asm/processor.h
> @@ -109,7 +109,7 @@ struct loongarch_vdso_info;
>   */
>  struct thread_struct {
>         /* Main processor registers. */
> -       unsigned long reg01, reg03, reg22; /* ra sp fp */
> +       unsigned long reg01, reg02, reg03, reg22; /* ra tp sp fp */
>         unsigned long reg23, reg24, reg25, reg26; /* s0-s3 */
>         unsigned long reg27, reg28, reg29, reg30, reg31; /* s4-s8 */
>
> @@ -145,45 +145,9 @@ struct thread_struct {
>  #define thread_saved_ra(tsk)   (tsk->thread.sched_ra)
>  #define thread_saved_fp(tsk)   (tsk->thread.sched_cfa)
>
> -#define INIT_THREAD  {                                         \
> -       /*                                                      \
> -        * Main processor registers                             \
> -        */                                                     \
> -       .reg01                  = 0,                            \
> -       .reg03                  = 0,                            \
> -       .reg22                  = 0,                            \
> -       .reg23                  = 0,                            \
> -       .reg24                  = 0,                            \
> -       .reg25                  = 0,                            \
> -       .reg26                  = 0,                            \
> -       .reg27                  = 0,                            \
> -       .reg28                  = 0,                            \
> -       .reg29                  = 0,                            \
> -       .reg30                  = 0,                            \
> -       .reg31                  = 0,                            \
> -       .sched_ra               = 0,                            \
> -       .sched_cfa              = 0,                            \
> -       .csr_crmd               = 0,                            \
> -       .csr_prmd               = 0,                            \
> -       .csr_euen               = 0,                            \
> -       .csr_ecfg               = 0,                            \
> -       .csr_badvaddr           = 0,                            \
> -       /*                                                      \
> -        * Other stuff associated with the process              \
> -        */                                                     \
> -       .trap_nr                = 0,                            \
> -       .error_code             = 0,                            \
> -       /*                                                      \
> -        * FPU & vector registers                               \
> -        */                                                     \
> -       .fpu                    = {                             \
> -               .fcc            = 0,                            \
> -               .fcsr           = 0,                            \
> -               .ftop           = 0,                            \
> -               .fpr            = {{{0,},},},                   \
> -       },                                                      \
> -       .hbp_break              = {0},                          \
> -       .hbp_watch              = {0},                          \
> +#define INIT_THREAD  {                                                 \
> +       .reg02 = (unsigned long)&init_task,                             \
> +       .reg03 = (unsigned long)&init_stack + sizeof(init_stack),       \
>  }
Don't remove the old code; just adding reg02 is enough. Though the
result is the same, explicit initialization gives more
information.

>
>  struct task_struct;
> diff --git a/arch/loongarch/include/asm/ptrace.h b/arch/loongarch/include/asm/ptrace.h
> index e5d21e836d99..37f53629d3c7 100644
> --- a/arch/loongarch/include/asm/ptrace.h
> +++ b/arch/loongarch/include/asm/ptrace.h
> @@ -170,12 +170,6 @@ static inline void die_if_kernel(const char *str, struct pt_regs *regs)
>                 die(str, regs);
>  }
>
> -#define current_pt_regs()                                              \
> -({                                                                     \
> -       unsigned long sp = (unsigned long)__builtin_frame_address(0);   \
> -       (struct pt_regs *)((sp | (THREAD_SIZE - 1)) + 1) - 1;           \
> -})
> -
This is still correct after CONFIG_THREAD_INFO_IN_TASK, so please keep
it. Since CONFIG_THREAD_INFO_IN_TASK increases the cost of
exceptions and syscalls, keeping this minimizes the performance
impact.

>  /* Helpers for working with the user stack pointer */
>
>  static inline unsigned long user_stack_pointer(struct pt_regs *regs)
> diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
> index 3a47f52959a8..761cc0016df8 100644
> --- a/arch/loongarch/include/asm/smp.h
> +++ b/arch/loongarch/include/asm/smp.h
> @@ -82,7 +82,8 @@ struct seq_file;
>
>  struct secondary_data {
>         unsigned long stack;
> -       unsigned long thread_info;
> +       unsigned long task;
> +       unsigned long offset;
>  };
>  extern struct secondary_data cpuboot_data;
>
> diff --git a/arch/loongarch/include/asm/stackframe.h b/arch/loongarch/include/asm/stackframe.h
> index ecc8e50fffa8..eeda5dcc982e 100644
> --- a/arch/loongarch/include/asm/stackframe.h
> +++ b/arch/loongarch/include/asm/stackframe.h
> @@ -191,8 +191,13 @@
>         andi    t0, t0, 0x3     /* extract pplv bit */
>         beqz    t0, 9f
>
> -       LONG_LI tp, ~_THREAD_MASK
> -       and     tp, tp, sp
> +       la_abs      t1, __entry_task
> +#ifdef CONFIG_SMP
> +       csrrd       t0, PERCPU_BASE_KS
> +       LONG_ADD    t1, t1, t0
> +#endif
> +       LONG_L      tp, t1, 0
> +
>         cfi_st  u0, PT_R21, \docfi
>         csrrd   u0, PERCPU_BASE_KS
Move these lines near to "cfi_st  fp, PT_R22, \docfi", then the above
"csrrd       t0, PERCPU_BASE_KS" can be removed.

>  9:
> diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
> index 5b225aff3ba2..9932429cfe17 100644
> --- a/arch/loongarch/include/asm/switch_to.h
> +++ b/arch/loongarch/include/asm/switch_to.h
> @@ -5,17 +5,25 @@
>  #ifndef _ASM_SWITCH_TO_H
>  #define _ASM_SWITCH_TO_H
>
> +#include <linux/percpu.h>
> +
>  #include <asm/cpu-features.h>
>  #include <asm/fpu.h>
>  #include <asm/lbt.h>
>
>  struct task_struct;
>
> +DECLARE_PER_CPU(struct task_struct *, __entry_task);
> +
> +static inline void entry_task_switch(struct task_struct *next)
> +{
> +       __this_cpu_write(__entry_task, next);
> +}
I love the UML naming, which means renaming __entry_task to cpu_tasks
and entry_task_switch() to set_current(), then moving them to
current.h.

> +
>  /**
>   * __switch_to - switch execution of a task
>   * @prev:      The task previously executed.
>   * @next:      The task to begin executing.
> - * @next_ti:   task_thread_info(next).
>   * @sched_ra:  __schedule return address.
>   * @sched_cfa: __schedule call frame address.
>   *
> @@ -23,7 +31,7 @@ struct task_struct;
>   * the context of next. Returns prev.
>   */
>  extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
> -                       struct task_struct *next, struct thread_info *next_ti,
> +                       struct task_struct *next,
>                         void *sched_ra, void *sched_cfa);
>
>  /*
> @@ -37,7 +45,8 @@ do {                                                                          \
>         lose_fpu_inatomic(1, prev);                                             \
>         lose_lbt_inatomic(1, prev);                                             \
>         hw_breakpoint_thread_switch(next);                                      \
> -       (last) = __switch_to(prev, next, task_thread_info(next),                \
> +       entry_task_switch(next);                                                \
> +       (last) = __switch_to(prev, next,                                        \
>                  __builtin_return_address(0), __builtin_frame_address(0));      \
>  } while (0)
>
> diff --git a/arch/loongarch/include/asm/thread_info.h b/arch/loongarch/include/asm/thread_info.h
> index 4d7117fcdc78..2c95a5134976 100644
> --- a/arch/loongarch/include/asm/thread_info.h
> +++ b/arch/loongarch/include/asm/thread_info.h
> @@ -22,9 +22,7 @@
>   *   must also be changed
>   */
>  struct thread_info {
> -       struct task_struct      *task;          /* main task structure */
>         unsigned long           flags;          /* low level flags */
> -       unsigned long           tp_value;       /* thread pointer */
Don't remove tp_value; it has nothing to do with this patch. It is
reserved for future LBT TLS support.

>         __u32                   cpu;            /* current CPU */
>         int                     preempt_count;  /* 0 => preemptible, <0 => BUG */
>         struct pt_regs          *regs;
> @@ -37,20 +35,11 @@ struct thread_info {
>   */
>  #define INIT_THREAD_INFO(tsk)                  \
>  {                                              \
> -       .task           = &tsk,                 \
> -       .flags          = _TIF_FIXADE,          \
> +       .flags          = 0,                    \
Don't change flags.

>         .cpu            = 0,                    \
>         .preempt_count  = INIT_PREEMPT_COUNT,   \
>  }
>
> -/* How to get the thread information struct from C. */
> -register struct thread_info *__current_thread_info __asm__("$tp");
> -
> -static inline struct thread_info *current_thread_info(void)
> -{
> -       return __current_thread_info;
> -}
> -
>  register unsigned long current_stack_pointer __asm__("$sp");
>
>  #endif /* !__ASSEMBLER__ */
> diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
> index 2cc953f113ac..55f4ffc1aca5 100644
> --- a/arch/loongarch/kernel/asm-offsets.c
> +++ b/arch/loongarch/kernel/asm-offsets.c
> @@ -70,7 +70,7 @@ static void __used output_task_defines(void)
>  {
>         COMMENT("LoongArch task_struct offsets.");
>         OFFSET(TASK_STATE, task_struct, __state);
> -       OFFSET(TASK_THREAD_INFO, task_struct, stack);
> +       OFFSET(TASK_STACK, task_struct, stack);
>         OFFSET(TASK_FLAGS, task_struct, flags);
>         OFFSET(TASK_MM, task_struct, mm);
>         OFFSET(TASK_PID, task_struct, pid);
> @@ -84,9 +84,7 @@ static void __used output_task_defines(void)
>  static void __used output_thread_info_defines(void)
>  {
>         COMMENT("LoongArch thread_info offsets.");
> -       OFFSET(TI_TASK, thread_info, task);
>         OFFSET(TI_FLAGS, thread_info, flags);
> -       OFFSET(TI_TP_VALUE, thread_info, tp_value);
>         OFFSET(TI_CPU, thread_info, cpu);
>         OFFSET(TI_PRE_COUNT, thread_info, preempt_count);
>         OFFSET(TI_REGS, thread_info, regs);
> @@ -267,7 +265,8 @@ static void __used output_smpboot_defines(void)
>  {
>         COMMENT("Linux smp cpu boot offsets.");
>         OFFSET(CPU_BOOT_STACK, secondary_data, stack);
> -       OFFSET(CPU_BOOT_TINFO, secondary_data, thread_info);
> +       OFFSET(CPU_BOOT_TASK, secondary_data, task);
> +       OFFSET(CPU_BOOT_OFFSET, secondary_data, offset);
>         BLANK();
>  }
>  #endif
> diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S
> index b53d333a7c42..b099672eab60 100644
> --- a/arch/loongarch/kernel/entry.S
> +++ b/arch/loongarch/kernel/entry.S
> @@ -67,8 +67,11 @@ SYM_CODE_START(handle_syscall)
>  #endif
>
>         move            u0, t0
> -       LONG_LI         tp, ~_THREAD_MASK
> -       and             tp, tp, sp
> +       la_abs          t1, __entry_task
> +#ifdef CONFIG_SMP
> +       LONG_ADD        t1, t1, u0
> +#endif
> +       LONG_L          tp, t1, 0
>
>         move            a0, sp
>         bl              do_syscall
> diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S
> index 4eed7bc312a8..ec67faab907b 100644
> --- a/arch/loongarch/kernel/head.S
> +++ b/arch/loongarch/kernel/head.S
> @@ -74,10 +74,11 @@ SYM_CODE_START(kernel_entry)                        # kernel entry point
>         /* GPR21 used for percpu base (runtime), initialized as 0 */
>         move            u0, zero
>
> -       la.pcrel        tp, init_thread_union
> -       /* Set the SP after an empty pt_regs.  */
> -       PTR_LI          sp, (_THREAD_SIZE - PT_SIZE)
> -       PTR_ADD         sp, sp, tp
> +       la.pcrel        tp, init_task
> +       la.pcrel        t0, init_stack
> +       PTR_LI          t1, _THREAD_SIZE
> +       PTR_ADD         t0, t0, t1
> +       PTR_ADDI        sp, t0, -PT_SIZE
>         set_saved_sp    sp, t0, t1
>
>  #ifdef CONFIG_RELOCATABLE
> @@ -86,8 +87,10 @@ SYM_CODE_START(kernel_entry)                 # kernel entry point
>
>  #ifdef CONFIG_RANDOMIZE_BASE
>         /* Repoint the sp into the new kernel */
> -       PTR_LI          sp, (_THREAD_SIZE - PT_SIZE)
> -       PTR_ADD         sp, sp, tp
> +       LONG_LPTR       t0, tp, TASK_STACK
> +       PTR_LI          t1, _THREAD_SIZE
> +       PTR_ADD         t0, t0, t1
> +       PTR_ADDI        sp, t0, -PT_SIZE
>         set_saved_sp    sp, t0, t1
>
>         /* Jump to the new kernel: new_pc = current_pc + random_offset */
> @@ -128,7 +131,8 @@ SYM_CODE_START(smpboot_entry)
>  #endif
>         la.pcrel        t0, cpuboot_data
>         ld.d            sp, t0, CPU_BOOT_STACK
> -       ld.d            tp, t0, CPU_BOOT_TINFO
> +       ld.d            tp, t0, CPU_BOOT_TASK
> +       ld.d            u0, t0, CPU_BOOT_OFFSET
>
>         bl              start_secondary
>         ASM_BUG()
> diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
> index 5505fc355e1b..71c9c6468e60 100644
> --- a/arch/loongarch/kernel/process.c
> +++ b/arch/loongarch/kernel/process.c
> @@ -54,6 +54,9 @@
>  #include <asm/vdso.h>
>  #include <asm/vdso/vdso.h>
>
> +DEFINE_PER_CPU(struct task_struct *, __entry_task);
> +EXPORT_PER_CPU_SYMBOL_GPL(__entry_task);
> +
>  #ifdef CONFIG_STACKPROTECTOR
>  #include <linux/stackprotector.h>
>  unsigned long __stack_chk_guard __read_mostly;
> @@ -223,6 +226,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>         if (clone_flags & CLONE_SETTLS)
>                 childregs->regs[2] = tls;
>
> +       /* Set tp to the new task structure for context switching */
> +       p->thread.reg02 = (unsigned long)p;
This should be placed before "if (unlikely(args->fn))" so that it also covers kernel threads.

> +
>  out:
>         ptrace_hw_copy_thread(p);
>         clear_tsk_thread_flag(p, TIF_USEDFPU);
> diff --git a/arch/loongarch/kernel/relocate.c b/arch/loongarch/kernel/relocate.c
> index 16f6a9b39659..a9ef38459629 100644
> --- a/arch/loongarch/kernel/relocate.c
> +++ b/arch/loongarch/kernel/relocate.c
> @@ -263,7 +263,7 @@ unsigned long __init relocate_kernel(void)
>                 reloc_offset += random_offset;
>
>                 /* The current thread is now within the relocated kernel */
> -               __current_thread_info = RELOCATED_KASLR(__current_thread_info);
> +               current_thread_pointer = RELOCATED_KASLR(current_thread_pointer);
>
>                 update_reloc_offset(&reloc_offset, random_offset);
>         }
> diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
> index 839b23edee87..5d434c5612ab 100644
> --- a/arch/loongarch/kernel/setup.c
> +++ b/arch/loongarch/kernel/setup.c
> @@ -46,6 +46,7 @@
>  #include <asm/pgalloc.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
> +#include <asm/switch_to.h>
>  #include <asm/time.h>
>  #include <asm/unwind.h>
>
> @@ -617,4 +618,6 @@ void __init setup_arch(char **cmdline_p)
>  #ifdef CONFIG_KASAN
>         kasan_init();
>  #endif
> +
> +       entry_task_switch(&init_task);
This should be as early as possible, I suggest moving it after unwind_init().

>  }
> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> index 64a048f1b880..e8b0d2fc2a9a 100644
> --- a/arch/loongarch/kernel/smp.c
> +++ b/arch/loongarch/kernel/smp.c
> @@ -34,6 +34,7 @@
>  #include <asm/paravirt.h>
>  #include <asm/processor.h>
>  #include <asm/setup.h>
> +#include <asm/switch_to.h>
>  #include <asm/time.h>
>
>  int __cpu_number_map[NR_CPUS];   /* Map physical to logical */
> @@ -400,8 +401,9 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
>         pr_info("Booting CPU#%d...\n", cpu);
>
>         entry = __pa_symbol((unsigned long)&smpboot_entry);
> -       cpuboot_data.stack = (unsigned long)__KSTK_TOS(idle);
> -       cpuboot_data.thread_info = (unsigned long)task_thread_info(idle);
> +       cpuboot_data.stack = (unsigned long)task_pt_regs(idle);
> +       cpuboot_data.task = (unsigned long)idle;
> +       cpuboot_data.offset = per_cpu_offset(cpu);
>
>         csr_mail_send(entry, cpu_logical_map(cpu), 0);
>
> @@ -686,6 +688,8 @@ asmlinkage void start_secondary(void)
>          */
>         complete(&cpu_running);
>
> +       entry_task_switch(current);
This should be as early as possible, I suggest moving it after cpu_probe().

> +
>         /*
>          * irq will be enabled in loongson_smp_finish(), enabling it too
>          * early is dangerous.
> diff --git a/arch/loongarch/kernel/switch.S b/arch/loongarch/kernel/switch.S
> index f377d8f5c51a..644348e05f6a 100644
> --- a/arch/loongarch/kernel/switch.S
> +++ b/arch/loongarch/kernel/switch.S
> @@ -12,7 +12,7 @@
>
>  /*
>   * task_struct *__switch_to(task_struct *prev, task_struct *next,
> - *                         struct thread_info *next_ti, void *sched_ra, void *sched_cfa)
> + *                         void *sched_ra, void *sched_cfa)
>   */
>         .align  5
>  SYM_FUNC_START(__switch_to)
> @@ -33,11 +33,12 @@ SYM_FUNC_START(__switch_to)
>         LONG_SPTR       t8, t7, 0
>  #endif
>
> -       move    tp, a2
> +       move    tp, a1
>         cpu_restore_nonscratch a1
>
> -       li.w            t0, _THREAD_SIZE
> -       PTR_ADD         t0, t0, tp
> +       LONG_LPTR       t0, tp, TASK_STACK
This should be "LONG_LPTR       t0, tp, (TASK_STACK -
TASK_STRUCT_OFFSET)", otherwise it is wrong for 32BIT.

Huacai
> +       PTR_LI          t1, _THREAD_SIZE
> +       PTR_ADD         t0, t0, t1
>         set_saved_sp    t0, t1, t2
>
>         LONG_LPTR       t1, a1, (THREAD_CSRPRMD - TASK_STRUCT_OFFSET)
> --
> 2.42.0
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  2026-06-01 13:46   ` Huacai Chen
@ 2026-06-03  2:30     ` Tiezhu Yang
  2026-06-03 14:24       ` Huacai Chen
  0 siblings, 1 reply; 12+ messages in thread
From: Tiezhu Yang @ 2026-06-03  2:30 UTC (permalink / raw)
  To: Huacai Chen; +Cc: loongarch, bpf, linux-arch, linux-kernel

On 2026/6/1 9:46 PM, Huacai Chen wrote:
> Hi, Tiezhu,

...

> First of all, you should also update
> Documentation/features/core/thread-info-in-task/arch-support.txt
> in the same patch.

OK, will do it.

>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index 3b042dbb2c41..ea29d5d17588 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -210,6 +210,7 @@ config LOONGARCH
>>          select SYSCTL_ARCH_UNALIGN_NO_WARN
>>          select SYSCTL_EXCEPTION_TRACE
>>          select SWIOTLB if 64BIT
>> +       select THREAD_INFO_IN_TASK
>>          select TRACE_IRQFLAGS_SUPPORT
>>          select USE_PERCPU_NUMA_NODE_ID
>>          select USER_STACKTRACE_SUPPORT

...

>> +#define INIT_THREAD  {                                                 \
>> +       .reg02 = (unsigned long)&init_task,                             \
>> +       .reg03 = (unsigned long)&init_stack + sizeof(init_stack),       \
>>   }
> Don't remove the old code; just adding reg02 is enough. Though the
> result is the same, explicit initialization gives more
> information.

After thinking it through, the introduction and initialization of
thread_struct.reg02 (including the assignment in INIT_THREAD and
p->thread.reg02 = (unsigned long)p; in copy_thread()) are redundant
and should be removed. The reasons are as follows:

1. Direct update in __switch_to: In __switch_to within switch.S, the
    hardware $tp register is updated directly from the next argument
    (via register a1) using "move tp, a1".

2. No restoration path: The cpu_restore_nonscratch macro does not
    include any restoration logic for reg02. This means no assembly
    or C code ever reads thread_struct.reg02 across the entire context
    switch path, whether standard or non-standard.

3. Exception/Syscall recovery relies on per-CPU variables: At exception
    and system call entry points (e.g., in stackframe.h and entry.S),
    the recovery of the kernel-space $tp relies entirely on the per-CPU
    variable __entry_task, which is already properly and explicitly
    updated during entry_task_switch() and CPU initialization.

Consequently, reg02 is a classic piece of dead code (write-only, never
read), and trimming this field would keep the architecture code clean.

Regarding the explicit zero-initialization, it is redundant in modern
kernel development.

For static structures like init_task, any uninitialized fields are
automatically zeroed out by the compiler according to the C standard.
Stripping away dozens of lines of ".field = 0" complies with modern
Linux kernel code-cleaning standards. It makes the macro much shorter
and highlights the only field that actually requires a special
runtime value (the kernel stack top in .reg03).
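
As a small illustration of the zero-initialization rule, here is a sketch using a hypothetical, heavily simplified stand-in for thread_struct (the field list and the demo_ names are assumptions, not the kernel definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for thread_struct. */
struct demo_thread {
	unsigned long reg01, reg03, reg22;
	unsigned long csr_prmd;
	unsigned long hbp_break[4];
};

/* Hypothetical stand-in for init_stack. */
static unsigned char demo_stack[256];

/*
 * Only reg03 is named in the initializer. C requires every member
 * that is not named to be initialized as if it had static storage
 * duration, that is, to zero.
 */
static struct demo_thread demo_init_thread(void)
{
	struct demo_thread t = {
		.reg03 = (unsigned long)(uintptr_t)demo_stack + sizeof(demo_stack),
	};

	assert(t.reg01 == 0 && t.reg22 == 0);
	return t;
}
```

The members that are not listed read back as zero, so listing them as ".field = 0" changes nothing in the resulting object.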

For reference, please see how INIT_THREAD is defined in other major
architectures, where they only initialize what is strictly necessary:

x86:
#ifdef CONFIG_X86_32
#define INIT_THREAD  {							  \
	.sp0			= TOP_OF_INIT_STACK,			  \
	.sysenter_cs		= __KERNEL_CS,				  \
}

#else
extern unsigned long __top_init_kernel_stack[];

#define INIT_THREAD {							\
	.sp	= (unsigned long)&__top_init_kernel_stack,		\
}

#endif /* CONFIG_X86_64 */

arm64:
#define INIT_THREAD {				\
	.fpsimd_cpu = NR_CPUS,			\
}

riscv:
#define INIT_THREAD {					\
	.sp = sizeof(init_stack) + (long)&init_stack,	\
	.align_ctl = PR_UNALIGN_NOPRINT,		\
}

Therefore, a cleaner and more accurate approach is to drop
reg02 entirely and adopt the slimmed-down INIT_THREAD for
LoongArch.

>>   struct task_struct;
>> diff --git a/arch/loongarch/include/asm/ptrace.h b/arch/loongarch/include/asm/ptrace.h
>> index e5d21e836d99..37f53629d3c7 100644
>> --- a/arch/loongarch/include/asm/ptrace.h
>> +++ b/arch/loongarch/include/asm/ptrace.h
>> @@ -170,12 +170,6 @@ static inline void die_if_kernel(const char *str, struct pt_regs *regs)
>>                  die(str, regs);
>>   }
>>
>> -#define current_pt_regs()                                              \
>> -({                                                                     \
>> -       unsigned long sp = (unsigned long)__builtin_frame_address(0);   \
>> -       (struct pt_regs *)((sp | (THREAD_SIZE - 1)) + 1) - 1;           \
>> -})
>> -
> This is still correct after CONFIG_THREAD_INFO_IN_TASK, so please keep
> it. Since CONFIG_THREAD_INFO_IN_TASK increases the cost of
> exceptions and syscalls, keeping this minimizes the performance
> impact.

Regarding the suggestion to keep the custom current_pt_regs() macro
under CONFIG_THREAD_INFO_IN_TASK, it must be completely removed.
Keeping it would be fundamentally incorrect and dangerous for the
following reasons:

1. It becomes logically incorrect:

The old macro relies on aligning up the $sp to the top of the stack
via bitwise operations to locate the exact position of pt_regs.

With CONFIG_THREAD_INFO_IN_TASK enabled, the thread_info is moved
off the stack, and the strict coupling between the masked SP and
the absolute position of pt_regs is broken (especially if features
like VMAP_STACK are enabled in the future, where stacks are no
longer naturally aligned to THREAD_SIZE).

Keeping this macro will cause current_pt_regs() to return a
corrupted/incorrect pointer, leading to inevitable kernel panics
or silent data corruption.

2. No real performance benefit:

Once CONFIG_THREAD_INFO_IN_TASK is selected, current is simply
the hardware $tp register. Fetching pt_regs via task_pt_regs()
just compiles down to loading the stack pointer from $tp with
a single memory access, followed by a constant offset adjustment.

This is extremely fast and efficient on LoongArch, and it avoids
multiple ALU operations (or, add, sub) required by the old
SP-masking macro.

3. Alignment with other architectures:

Other major architectures (such as x86, arm64, and riscv) all
completely dropped their custom SP-masking current_pt_regs()
implementations when moving to THREAD_INFO_IN_TASK, relying
instead on the standard, safe, and generic task_pt_regs()
provided by the core kernel wrapper.

Therefore, this custom macro is both broken and insecure under
the new standard, and it must be removed to ensure kernel
stability and clean code alignment with upstream.
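
For readers following the arithmetic, here is a sketch of what the SP-masking expression computes. The stack size and pt_regs size below are placeholder values chosen for the example, and the demo_ helpers are hypothetical; the point is only that the result depends on the stack being aligned to its own size.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_THREAD_SIZE   0x4000UL /* assumed 16 KiB stack, power of two */
#define DEMO_PT_REGS_SIZE  0x170UL  /* placeholder for sizeof(struct pt_regs) */

/* Round sp up to the next DEMO_THREAD_SIZE boundary, as the macro does. */
static uintptr_t demo_stack_top(uintptr_t sp)
{
	assert((DEMO_THREAD_SIZE & (DEMO_THREAD_SIZE - 1)) == 0);
	return (sp | (DEMO_THREAD_SIZE - 1)) + 1;
}

/* pt_regs sits immediately below the computed stack top. */
static uintptr_t demo_current_pt_regs(uintptr_t sp)
{
	return demo_stack_top(sp) - DEMO_PT_REGS_SIZE;
}
```

With a stack based at 0x90000000, any sp inside it yields the top 0x90004000. With a stack based at 0x90002000 (not aligned to 16 KiB), an sp of 0x90003000 still yields 0x90004000 instead of the real top 0x90006000, so the computation is only valid while stacks stay THREAD_SIZE-aligned.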

>>   /* Helpers for working with the user stack pointer */

...

>> diff --git a/arch/loongarch/include/asm/stackframe.h b/arch/loongarch/include/asm/stackframe.h
>> index ecc8e50fffa8..eeda5dcc982e 100644
>> --- a/arch/loongarch/include/asm/stackframe.h
>> +++ b/arch/loongarch/include/asm/stackframe.h
>> @@ -191,8 +191,13 @@
>>          andi    t0, t0, 0x3     /* extract pplv bit */
>>          beqz    t0, 9f
>>
>> -       LONG_LI tp, ~_THREAD_MASK
>> -       and     tp, tp, sp
>> +       la_abs      t1, __entry_task
>> +#ifdef CONFIG_SMP
>> +       csrrd       t0, PERCPU_BASE_KS
>> +       LONG_ADD    t1, t1, t0
>> +#endif
>> +       LONG_L      tp, t1, 0
>> +
>>          cfi_st  u0, PT_R21, \docfi
>>          csrrd   u0, PERCPU_BASE_KS
> Move these lines near to "cfi_st  fp, PT_R22, \docfi", then the above
> "csrrd       t0, PERCPU_BASE_KS" can be removed.

Regarding the suggestion for stackframe.h:

Looking at the original macro context, this is an excellent and
completely feasible assembly optimization.

By moving the __entry_task restoration right after the preservation
of u0, we can advance the "csrrd u0, PERCPU_BASE_KS" instruction and
reuse the loaded u0 register directly for the LONG_ADD on SMP platforms.
This completely eliminates the need for a duplicate csrrd instruction
inside the #ifdef CONFIG_SMP block.

The optimized code block would look like this:

         beqz    t0, 9f

         cfi_st  u0, PT_R21, \docfi
         csrrd   u0, PERCPU_BASE_KS

         la_abs      t1, __entry_task
#ifdef CONFIG_SMP
         LONG_ADD    t1, t1, u0
#endif
         LONG_L      tp, t1, 0

9:

Thank you for catching this! I will gladly incorporate this assembly
optimization into the next version.

>> diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
>> index 5b225aff3ba2..9932429cfe17 100644
>> --- a/arch/loongarch/include/asm/switch_to.h
>> +++ b/arch/loongarch/include/asm/switch_to.h
>> @@ -5,17 +5,25 @@
>>   #ifndef _ASM_SWITCH_TO_H
>>   #define _ASM_SWITCH_TO_H
>>
>> +#include <linux/percpu.h>
>> +
>>   #include <asm/cpu-features.h>
>>   #include <asm/fpu.h>
>>   #include <asm/lbt.h>
>>
>>   struct task_struct;
>>
>> +DECLARE_PER_CPU(struct task_struct *, __entry_task);
>> +
>> +static inline void entry_task_switch(struct task_struct *next)
>> +{
>> +       __this_cpu_write(__entry_task, next);
>> +}
> I love the UML naming, which means renaming __entry_task to cpu_tasks
> and entry_task_switch() to set_current(), then moving them to
> current.h.

Regarding the suggestion to rename and move __entry_task and
entry_task_switch():

Thank you for the suggestion, but after checking the upstream
kernel implementation, the current naming and placement are
actually fully aligned with the multi-architecture standards
established by ARM/ARM64.

A quick grep in the kernel tree shows that ARM and ARM64
use the exact same pattern:

$ grep -rn entry_task arch
arch/arm/kernel/process.c:40:DEFINE_PER_CPU(struct task_struct *, __entry_task);
arch/arm/include/asm/switch_to.h:31:		__this_cpu_write(__entry_task, next);			\
arch/arm/include/asm/thread_info.h:40:DECLARE_PER_CPU(struct task_struct *, __entry_task);
arch/arm/include/asm/assembler.h:357:	ldr_this_cpu	\t1, __entry_task, \t1, \t2
arch/arm64/kernel/process.c:609:DEFINE_PER_CPU(struct task_struct *, __entry_task);
arch/arm64/kernel/process.c:611:static void entry_task_switch(struct task_struct *next)
arch/arm64/kernel/process.c:613:	__this_cpu_write(__entry_task, next);
arch/arm64/kernel/process.c:777:	entry_task_switch(next);
arch/arm64/kernel/entry.S:223:	ldr_this_cpu	tsk, __entry_task, x20
arch/arm64/kernel/entry.S:1033:	ldr_this_cpu	dst=x0, sym=__entry_task, tmp=x1

As we can see:
1. Moving to current.h is heavily avoided: Both ARM and ARM64 place
    these definitions in process.c or switch_to.h, rather than
    current.h. <asm/current.h> is a highly sensitive, low-level header
    included almost everywhere. Putting per-CPU macros there would pull
    in <linux/percpu.h> and <linux/sched.h>, inevitably triggering
    catastrophic circular header dependency compile errors.

2. "__entry_task" and "entry_task_switch" are the precise industry
    standards: Rather than adopting UML's historical naming style,
    following the ARM64 conventions makes the code much more canonical
    and easier for cross-architecture developers to maintain.
    It clearly expresses that this per-CPU pointer is strictly
    dedicated to the exception entry path for task recovery.

3. "set_current()" causes mental friction: Across the generic kernel,
    "current" is universally treated as a read-only concept. Introducing
    a set_current() helper might mislead developers into thinking they
    can modify the active task pointer at will, whereas
    "entry_task_switch" explicitly limits its semantics to the context
    switch boundary.

Therefore, I prefer to keep the current naming and structure in
switch_to.h to remain consistent with ARM64 and keep the header
dependencies perfectly clean.

>> +
>>   /**
>>    * __switch_to - switch execution of a task
>>    * @prev:      The task previously executed.
>>    * @next:      The task to begin executing.
>> - * @next_ti:   task_thread_info(next).
>>    * @sched_ra:  __schedule return address.
>>    * @sched_cfa: __schedule call frame address.

...

>>   struct thread_info {
>> -       struct task_struct      *task;          /* main task structure */
>>          unsigned long           flags;          /* low level flags */
>> -       unsigned long           tp_value;       /* thread pointer */
> Don't remove tp_value, it has nothing to do with this patch, instead,
> it is for future LBT tls.

Regarding the suggestion to keep tp_value in thread_info:

You are completely right. I mistakenly assumed that
tp_value was strictly coupled with the kernel-space $tp tracking.
Since its true purpose is to preserve the user-space TLS value
for the LBT (Loongson Binary Translation) extension context,
it should definitely be decoupled from this THREAD_INFO_IN_TASK
migration.

I will follow the "one patch does one thing" principle and keep
tp_value untouched in struct thread_info to avoid breaking any
future or existing LBT TLS logic.

Thank you for clarifying this! I will restore this field in the
next version.

>>          __u32                   cpu;            /* current CPU */
>>          int                     preempt_count;  /* 0 => preemptible, <0 => BUG */
>>          struct pt_regs          *regs;
>> @@ -37,20 +35,11 @@ struct thread_info {
>>    */
>>   #define INIT_THREAD_INFO(tsk)                  \
>>   {                                              \
>> -       .task           = &tsk,                 \
>> -       .flags          = _TIF_FIXADE,          \
>> +       .flags          = 0,                    \
> Don't change flags.

Regarding the suggestion to keep the flags initialization:

You are completely right. Modifying the default flags (changing
_TIF_FIXADE to 0) is an unrelated side-effect that goes beyond
the scope of migrating thread_info.

Changing this could alter the alignment error fixing behavior
for the initial idle task and cause unexpected regressions.

I will follow your advice, leave the flags logic untouched,
and only remove the deleted ".task = &tsk" member.

Thank you for your critical review!

>>          .cpu            = 0,                    \
>>          .preempt_count  = INIT_PREEMPT_COUNT,   \

...

>> @@ -223,6 +226,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>>          if (clone_flags & CLONE_SETTLS)
>>                  childregs->regs[2] = tls;
>>
>> +       /* Set tp to the new task structure for context switching */
>> +       p->thread.reg02 = (unsigned long)p;
> This should be before "if (unlikely(args->fn))" for kernel thread.

Regarding the feedback on process.c and thread_struct:

Actually, after double-checking the core architecture assembly,
we don't need to worry about where to place
"p->thread.reg02 = (unsigned long)p;"
because this line can be completely deleted, and reg02 shouldn't
be added to thread_struct at all.

As analyzed previously, during context switch, the hardware $tp
register is updated directly from the C argument "next" via
"move tp, a1".

Furthermore, the cpu_restore_nonscratch macro contains absolutely
no logic to read or restore reg02. This means thread_struct.reg02
has a write-only path and is never read anywhere (even for new
processes or kernel threads). To keep the architecture code clean
and avoid misleading future developers, I will completely drop
reg02 and its assignment from the next version.

>> +
>>   out:
>>          ptrace_hw_copy_thread(p);
>>          clear_tsk_thread_flag(p, TIF_USEDFPU);

...

>> +
>> +       entry_task_switch(&init_task);
> This should be as early as possible, I suggest moving it after unwind_init().

Regarding the suggestion to move entry_task_switch() in setup.c:

You are completely right, and this is a critical catch for early
boot stability.

Placing entry_task_switch(&init_task) at the very end of
setup_arch() leaves a massive window during early initialization
where __entry_task remains NULL.

If any early exception, interrupt, or panic occurs before the end
of setup_arch(), the exception entry path will load a NULL pointer
into $tp, triggering an immediate double-fault and completely
blinding the kernel's ability to print stack traces.

Moving it immediately after unwind_init() ensures that the $tp
recovery mechanism is armed as early as possible, providing robust
exception handling support during the rest of the boot sequence.

I will absolutely adopt this suggestion and move it right after
unwind_init() in the next version. Thank you!

>>   }
>> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
>> index 64a048f1b880..e8b0d2fc2a9a 100644
>> --- a/arch/loongarch/kernel/smp.c
>> +++ b/arch/loongarch/kernel/smp.c

...

>> +       entry_task_switch(current);
> This should be as early as possible, I suggest moving it after cpu_probe().

Regarding the suggestion to move entry_task_switch() in smp.c:

You are completely right, and this is another critical catch for
early boot stability, this time on the secondary CPU path.

Placing entry_task_switch(current) after complete(&cpu_running)
leaves a dangerous window during the early C entry of
start_secondary() where the secondary CPU's __entry_task remains
uninitialized (NULL). If any early exception or kernel panic
occurs during the secondary CPU initialization prior to the
completion signal, the exception entry path will load a NULL
pointer into $tp, inducing an immediate double-fault and
completely blinding the kernel's early SMP debugging
capabilities.

Moving it immediately after cpu_probe() ensures that the
secondary CPU arms its $tp recovery mechanism at the earliest
possible stage in its C entry path.

I will absolutely adopt this suggestion and move it right
after cpu_probe() in the next version. Thank you!

>> +
>>          /*
>>           * irq will be enabled in loongson_smp_finish(), enabling it too
>>           * early is dangerous.
>> diff --git a/arch/loongarch/kernel/switch.S b/arch/loongarch/kernel/switch.S
>> index f377d8f5c51a..644348e05f6a 100644
>> --- a/arch/loongarch/kernel/switch.S
>> +++ b/arch/loongarch/kernel/switch.S

...

>> +       LONG_LPTR       t0, tp, TASK_STACK
> This should be "LONG_LPTR       t0, tp, (TASK_STACK -
> TASK_STRUCT_OFFSET)", otherwise it is wrong for 32BIT.

Regarding the suggestion for (TASK_STACK - TASK_STRUCT_OFFSET)
in switch.S:

Thank you for bringing this up! With the definition of
TASK_STRUCT_OFFSET in mind:

#ifdef CONFIG_64BIT
#define TASK_STRUCT_OFFSET 0
#else
#define TASK_STRUCT_OFFSET 2000
#endif

This is an incredibly sharp and critical catch for 32BIT
architecture compatibility.

I will update this line to:
"LONG_LPTR       t0, tp, (TASK_STACK - TASK_STRUCT_OFFSET)"
in the next version.

This is the incremental diff based on the original patch:

----->8-----
diff --git a/Documentation/features/core/thread-info-in-task/arch-support.txt b/Documentation/features/core/thread-info-in-task/arch-support.txt
index f3d744c76061..e26efdfbb6b4 100644
--- a/Documentation/features/core/thread-info-in-task/arch-support.txt
+++ b/Documentation/features/core/thread-info-in-task/arch-support.txt
@@ -12,7 +12,7 @@
      |       arm64: |  ok  |
      |        csky: | TODO |
      |     hexagon: | TODO |
-    |   loongarch: | TODO |
+    |   loongarch: |  ok  |
      |        m68k: | TODO |
      |  microblaze: | TODO |
      |        mips: | TODO |
diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
index df927a4318cc..5d8e82b1dce7 100644
--- a/arch/loongarch/include/asm/processor.h
+++ b/arch/loongarch/include/asm/processor.h
@@ -109,7 +109,7 @@ struct loongarch_vdso_info;
   */
  struct thread_struct {
         /* Main processor registers. */
-       unsigned long reg01, reg02, reg03, reg22; /* ra tp sp fp */
+       unsigned long reg01, reg03, reg22; /* ra sp fp */
         unsigned long reg23, reg24, reg25, reg26; /* s0-s3 */
         unsigned long reg27, reg28, reg29, reg30, reg31; /* s4-s8 */

@@ -146,7 +146,6 @@ struct thread_struct {
  #define thread_saved_fp(tsk)   (tsk->thread.sched_cfa)

  #define INIT_THREAD  {                                                 \
-       .reg02 = (unsigned long)&init_task,                             \
         .reg03 = (unsigned long)&init_stack + sizeof(init_stack),       \
  }

diff --git a/arch/loongarch/include/asm/stackframe.h b/arch/loongarch/include/asm/stackframe.h
index eeda5dcc982e..770db1084e8d 100644
--- a/arch/loongarch/include/asm/stackframe.h
+++ b/arch/loongarch/include/asm/stackframe.h
@@ -191,15 +191,15 @@
         andi    t0, t0, 0x3     /* extract pplv bit */
         beqz    t0, 9f

+       cfi_st  u0, PT_R21, \docfi
+       csrrd   u0, PERCPU_BASE_KS
+
         la_abs      t1, __entry_task
  #ifdef CONFIG_SMP
-       csrrd       t0, PERCPU_BASE_KS
-       LONG_ADD    t1, t1, t0
+       LONG_ADD    t1, t1, u0
  #endif
         LONG_L      tp, t1, 0

-       cfi_st  u0, PT_R21, \docfi
-       csrrd   u0, PERCPU_BASE_KS
  9:
  #ifdef CONFIG_KGDB
         li.w    t0, CSR_CRMD_WE
diff --git a/arch/loongarch/include/asm/thread_info.h b/arch/loongarch/include/asm/thread_info.h
index 2c95a5134976..41eabe4fb647 100644
--- a/arch/loongarch/include/asm/thread_info.h
+++ b/arch/loongarch/include/asm/thread_info.h
@@ -23,6 +23,7 @@
   */
  struct thread_info {
         unsigned long           flags;          /* low level flags */
+       unsigned long           tp_value;       /* thread pointer */
         __u32                   cpu;            /* current CPU */
         int                     preempt_count;  /* 0 => preemptible, <0 => BUG */
         struct pt_regs          *regs;
@@ -35,7 +36,7 @@ struct thread_info {
   */
  #define INIT_THREAD_INFO(tsk)                  \
  {                                              \
-       .flags          = 0,                    \
+       .flags          = _TIF_FIXADE,          \
         .cpu            = 0,                    \
         .preempt_count  = INIT_PREEMPT_COUNT,   \
  }
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index 71c9c6468e60..2f916c4e0e8f 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -226,9 +226,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
         if (clone_flags & CLONE_SETTLS)
                 childregs->regs[2] = tls;

-       /* Set tp to the new task structure for context switching */
-       p->thread.reg02 = (unsigned long)p;
-
  out:
         ptrace_hw_copy_thread(p);
         clear_tsk_thread_flag(p, TIF_USEDFPU);
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index 5d434c5612ab..7065d195f2da 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -594,6 +594,7 @@ void __init setup_arch(char **cmdline_p)
  {
         cpu_probe();
         unwind_init();
+       entry_task_switch(&init_task);

         init_environ();
         efi_init();
@@ -618,6 +619,4 @@ void __init setup_arch(char **cmdline_p)
  #ifdef CONFIG_KASAN
         kasan_init();
  #endif
-
-       entry_task_switch(&init_task);
  }
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index e8b0d2fc2a9a..4b74409a98a3 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -665,6 +665,7 @@ asmlinkage void start_secondary(void)
         set_my_cpu_offset(per_cpu_offset(cpu));

         cpu_probe();
+       entry_task_switch(current);
         constant_clockevent_init();
         loongson_init_secondary();

@@ -688,8 +689,6 @@ asmlinkage void start_secondary(void)
          */
         complete(&cpu_running);

-       entry_task_switch(current);
-
         /*
          * irq will be enabled in loongson_smp_finish(), enabling it too
          * early is dangerous.
diff --git a/arch/loongarch/kernel/switch.S b/arch/loongarch/kernel/switch.S
index 644348e05f6a..33a10221d73a 100644
--- a/arch/loongarch/kernel/switch.S
+++ b/arch/loongarch/kernel/switch.S
@@ -24,8 +24,8 @@ SYM_FUNC_START(__switch_to)
         LONG_SPTR       t1, a0, (THREAD_CSRPRMD - TASK_STRUCT_OFFSET)

         cpu_save_nonscratch a0
-       LONG_SPTR       a3, a0, (THREAD_SCHED_RA - TASK_STRUCT_OFFSET)
-       LONG_SPTR       a4, a0, (THREAD_SCHED_CFA - TASK_STRUCT_OFFSET)
+       LONG_SPTR       a2, a0, (THREAD_SCHED_RA - TASK_STRUCT_OFFSET)
+       LONG_SPTR       a3, a0, (THREAD_SCHED_CFA - TASK_STRUCT_OFFSET)

  #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP)
         la              t7, __stack_chk_guard
@@ -36,7 +36,7 @@ SYM_FUNC_START(__switch_to)
         move    tp, a1
         cpu_restore_nonscratch a1

-       LONG_LPTR       t0, tp, TASK_STACK
+       LONG_LPTR       t0, tp, (TASK_STACK - TASK_STRUCT_OFFSET)
         PTR_LI          t1, _THREAD_SIZE
         PTR_ADD         t0, t0, t1
         set_saved_sp    t0, t1, t2

Here is a test script:

$ cat stress_test.sh
#!/bin/bash
set -e # Exit immediately if any command exits with a non-zero status

echo "=== Starting LoongArch THREAD_INFO_IN_TASK Extreme Stress Testing ==="
START_TIME=$(date)

# Clear existing dmesg buffer and back it up safely to /tmp
dmesg -c > /tmp/init_dmesg.log

# 1. Core Context Switch Stress Test
# Validates __switch_to() assembly and the 32-bit/64-bit structural offset calculations.
echo "Running: --context stressor (10 mins)..."
stress-ng --context $(nproc) --timeout 10m --metrics-brief

# 2. Bad System Calls and Exception Path Stress Test
# Validates handle_syscall and the __entry_task recovery path during exception entry.
# Fixed option to use the unambiguous '--sysbadaddr'
echo "Running: --sysbadaddr stressor (10 mins)..."
stress-ng --sysbadaddr $(nproc) --timeout 10m

# 3. Page Fault and Stack Stress Test
# Validates register reuse optimization (u0/PERCPU_BASE_KS) within the SAVE_SOME macro.
echo "Running: --fault stressor (10 mins)..."
stress-ng --fault $(nproc) --timeout 10m

# 4. Multi-Thread Cloning and Destruction Stress Test
# Validates the preservation of tp_value and the correctness of copy_thread().
echo "Running: --pthread stressor (10 mins)..."
stress-ng --pthread $(nproc) --timeout 10m

# 5. Ultimate Mixed Scheduling Matrix Test
# Simulates an extremely hostile system environment with high concurrency (20 mins).
echo "Running: Mixed Matrix (--schedmix + --yield) (20 mins)..."
stress-ng --schedmix $(nproc) --yield $(nproc) --timeout 20m --metrics

END_TIME=$(date)
echo "=== All stress-ng commands completed successfully ==="
echo "Start Time: $START_TIME"
echo "End Time:   $END_TIME"

# 6. Automated Kernel Log Integrity Check
# Scans dmesg for hidden kernel regressions, warnings, or silent corruption.
echo "=== Analyzing kernel dmesg logs... ==="
if sudo dmesg | grep -qEi "oops|panic|warning|bug|recursive|tainted"; then
     echo "❌ WARNING: System survived but dmesg contains kernel errors! Please check the logs below:"
     sudo dmesg | grep -Ei "oops|panic|warning|bug|recursive|tainted" -C 5
else
     echo "✅ SUCCESS: dmesg remains perfectly silent! No Oops, Warnings, or Panics found."
     echo "The patch successfully passed the 1-hour stress testing suite!"
fi

Here are the test steps:

sudo dnf install -y stress-ng
chmod +x stress_test.sh
sudo ./stress_test.sh

Here is the test result:

$ sudo ./stress_test.sh
=== Starting LoongArch THREAD_INFO_IN_TASK Extreme Stress Testing ===
Running: --context stressor (10 mins)...
stress-ng: info:  [2719] setting to a 10 mins run per stressor
stress-ng: info:  [2719] dispatching hogs: 8 context
stress-ng: metrc: [2719] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [2719]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [2719] context        41308615    600.00   2226.94   2571.93     68847.69        8607.98
stress-ng: info:  [2719] skipped: 0
stress-ng: info:  [2719] passed: 8: context (8)
stress-ng: info:  [2719] failed: 0
stress-ng: info:  [2719] metrics untrustworthy: 0
stress-ng: info:  [2719] successful run completed in 10 mins
Running: --sysbadaddr stressor (10 mins)...
stress-ng: info:  [2742] setting to a 10 mins run per stressor
stress-ng: info:  [2742] dispatching hogs: 8 sysbadaddr
stress-ng: info:  [2742] skipped: 0
stress-ng: info:  [2742] passed: 8: sysbadaddr (8)
stress-ng: info:  [2742] failed: 0
stress-ng: info:  [2742] metrics untrustworthy: 0
stress-ng: info:  [2742] successful run completed in 10 mins
Running: --fault stressor (10 mins)...
stress-ng: info:  [1090732] setting to a 10 mins run per stressor
stress-ng: info:  [1090732] dispatching hogs: 8 fault
stress-ng: info:  [1090732] skipped: 0
stress-ng: info:  [1090732] passed: 8: fault (8)
stress-ng: info:  [1090732] failed: 0
stress-ng: info:  [1090732] metrics untrustworthy: 0
stress-ng: info:  [1090732] successful run completed in 10 mins
Running: --pthread stressor (10 mins)...
stress-ng: info:  [1090760] setting to a 10 mins run per stressor
stress-ng: info:  [1090760] dispatching hogs: 8 pthread
stress-ng: info:  [1090760] skipped: 0
stress-ng: info:  [1090760] passed: 8: pthread (8)
stress-ng: info:  [1090760] failed: 0
stress-ng: info:  [1090760] metrics untrustworthy: 0
stress-ng: info:  [1090760] successful run completed in 10 mins
Running: Mixed Matrix (--schedmix + --yield) (20 mins)...
stress-ng: info:  [3131692] setting to a 20 mins run per stressor
stress-ng: info:  [3131692] dispatching hogs: 8 schedmix, 8 yield
stress-ng: metrc: [3131692] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [3131692]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [3131692] schedmix        6577020   1200.04   1817.35   5090.05      5480.67         952.17        71.95          3392
stress-ng: metrc: [3131692] yield         2861718847   1200.00    733.75   1937.44   2384764.49     1071325.09        27.82          3360
stress-ng: metrc: [3131692] miscellaneous metrics:
stress-ng: metrc: [3131692] yield               6672.42 ns duration per sched_yield call (harmonic mean of 8 instances)
stress-ng: info:  [3131692] skipped: 0
stress-ng: info:  [3131692] passed: 16: schedmix (8) yield (8)
stress-ng: info:  [3131692] failed: 0
stress-ng: info:  [3131692] metrics untrustworthy: 0
stress-ng: info:  [3131692] successful run completed in 20 mins
=== All stress-ng commands completed successfully ===
Start Time: Wed Jun  3 09:03:43 AM CST 2026
End Time:   Wed Jun  3 10:03:44 AM CST 2026
=== Analyzing kernel dmesg logs... ===
✅ SUCCESS: dmesg remains perfectly silent! No Oops, Warnings, or Panics found.
The patch successfully passed the 1-hour stress testing suite!

I will send the formal v1 patch next week.

Thanks,
Tiezhu



* Re: [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  2026-06-03  2:30     ` Tiezhu Yang
@ 2026-06-03 14:24       ` Huacai Chen
  2026-06-04  2:26         ` Tiezhu Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Huacai Chen @ 2026-06-03 14:24 UTC (permalink / raw)
  To: Tiezhu Yang; +Cc: loongarch, bpf, linux-arch, linux-kernel

On Wed, Jun 3, 2026 at 10:30 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
> On 2026/6/1 9:46 PM, Huacai Chen wrote:
> > Hi, Tiezhu,
>
> ...
>
> > First of all, you should update
> > Documentation/features/core/thread-info-in-task/arch-support.txt
> > together.
>
> OK, will do it.
>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index 3b042dbb2c41..ea29d5d17588 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -210,6 +210,7 @@ config LOONGARCH
> >>          select SYSCTL_ARCH_UNALIGN_NO_WARN
> >>          select SYSCTL_EXCEPTION_TRACE
> >>          select SWIOTLB if 64BIT
> >> +       select THREAD_INFO_IN_TASK
> >>          select TRACE_IRQFLAGS_SUPPORT
> >>          select USE_PERCPU_NUMA_NODE_ID
> >>          select USER_STACKTRACE_SUPPORT
>
> ...
>
> >> +#define INIT_THREAD  {                                                 \
> >> +       .reg02 = (unsigned long)&init_task,                             \
> >> +       .reg03 = (unsigned long)&init_stack + sizeof(init_stack),       \
> >>   }
> > Don't remove the old code; just adding reg02 is enough. Though the
> > result is the same, explicit initialization gives more
> > information.
>
> After thinking it through, the introduction and initialization of
> thread_struct.reg02 (including the assignment in INIT_THREAD and
> p->thread.reg02 = (unsigned long)p; in copy_thread()) are redundant
> and should be removed. The reasons are as follows:
>
> 1. Direct update in __switch_to: In __switch_to within switch.S, the
>     hardware $tp register is updated directly from the next argument
>     (via register a1) using "move tp, a1".
>
> 2. No restoration path: The cpu_restore_nonscratch macro does not
>     include any restoration logic for reg02. This means no assembly
>     or C code ever reads thread_struct.reg02 across the entire context
>     switch path, whether standard or non-standard.
>
> 3. Exception/Syscall recovery relies on per-CPU variables: At exception
>     and system call entry points (e.g., in stackframe.h and entry.S),
>     the recovery of the kernel-space $tp relies entirely on the per-CPU
>     variable __entry_task, which is already properly and explicitly
>     updated during entry_task_switch() and CPU initialization.
>
> Consequently, reg02 is a classic piece of dead code (write-only, never
> read), and trimming this field would keep the architecture code clean.
>
> Regarding the explicit zero-initialization, it is redundant in modern
> kernel development.
>
> For static structures like init_task, any uninitialized fields are
> automatically zeroed out by the compiler according to the C standard.
> Stripping away dozens of lines of ".field = 0" complies with modern
> Linux kernel code-cleaning standards. It makes the macro much shorter
> and highlights the only field that actually requires a special
> runtime value (the kernel stack top in .reg03).
I know it is the same for the compiler; my point is that the current way
gives more information to human readers.

In addition, without reg02, this piece has no relationship at all
with CONFIG_THREAD_INFO_IN_TASK, so please drop it.

>
> For reference, please see how INIT_THREAD is defined in other major
> architectures, where they only initialize what is strictly necessary:
>
> x86:
> #ifdef CONFIG_X86_32
> #define INIT_THREAD  {                                                    \
>         .sp0                    = TOP_OF_INIT_STACK,                      \
>         .sysenter_cs            = __KERNEL_CS,                            \
> }
>
> #else
> extern unsigned long __top_init_kernel_stack[];
>
> #define INIT_THREAD {                                                   \
>         .sp     = (unsigned long)&__top_init_kernel_stack,              \
> }
>
> #endif /* CONFIG_X86_64 */
>
> arm64:
> #define INIT_THREAD {                           \
>         .fpsimd_cpu = NR_CPUS,                  \
> }
>
> riscv:
> #define INIT_THREAD {                                   \
>         .sp = sizeof(init_stack) + (long)&init_stack,   \
>         .align_ctl = PR_UNALIGN_NOPRINT,                \
> }
>
> Therefore, a cleaner and more accurate approach is to drop
> reg02 entirely and adopt the slimmed-down INIT_THREAD for
> LoongArch.
>
> >>   struct task_struct;
> >> diff --git a/arch/loongarch/include/asm/ptrace.h b/arch/loongarch/include/asm/ptrace.h
> >> index e5d21e836d99..37f53629d3c7 100644
> >> --- a/arch/loongarch/include/asm/ptrace.h
> >> +++ b/arch/loongarch/include/asm/ptrace.h
> >> @@ -170,12 +170,6 @@ static inline void die_if_kernel(const char *str, struct pt_regs *regs)
> >>                  die(str, regs);
> >>   }
> >>
> >> -#define current_pt_regs()                                              \
> >> -({                                                                     \
> >> -       unsigned long sp = (unsigned long)__builtin_frame_address(0);   \
> >> -       (struct pt_regs *)((sp | (THREAD_SIZE - 1)) + 1) - 1;           \
> >> -})
> >> -
> > This is still correct after CONFIG_THREAD_INFO_IN_TASK, so please keep
> > it. Especially since CONFIG_THREAD_INFO_IN_TASK increases the cost of
> > exceptions/syscalls, keeping this can minimize the performance
> > impact.
>
> Regarding the suggestion to keep the custom current_pt_regs() macro
> under CONFIG_THREAD_INFO_IN_TASK, it must be completely removed.
> Keeping it would be fundamentally incorrect and dangerous for the
> following reasons:
>
> 1. It becomes logically incorrect:
>
> The old macro relies on aligning up the $sp to the top of the stack
> via bitwise operations to locate the exact position of pt_regs.
>
> With CONFIG_THREAD_INFO_IN_TASK enabled, the thread_info is moved
> off the stack, and the strict coupling between the masked SP and
> the absolute position of pt_regs is broken (especially if features
> like VMAP_STACK are enabled in the future, where stacks are no
> longer naturally aligned to THREAD_SIZE).
>
> Keeping this macro will cause current_pt_regs() to return a
> corrupted/incorrect pointer, leading to inevitable kernel panics
> or silent data corruption.
I don't think so. CONFIG_THREAD_INFO_IN_TASK decouples TP
(thread_info) and SP (stack), but doesn't decouple SP and THREAD_SIZE,
even for the VMAP_STACK case. This is from RISC-V:

#ifdef CONFIG_VMAP_STACK
#define THREAD_ALIGN            (2 * THREAD_SIZE)
#else
#define THREAD_ALIGN            THREAD_SIZE
#endif

The stack is still aligned to THREAD_SIZE.

>
> 2. No real performance benefit:
>
> Once CONFIG_THREAD_INFO_IN_TASK is selected, current is simply
> the hardware $tp register. Fetching pt_regs via task_pt_regs()
> just compiles down to loading the stack pointer from $tp with
> a single memory access, followed by a constant offset adjustment.
>
> This is extremely fast and efficient on LoongArch, and it avoids
> multiple ALU operations (or, add, sub) required by the old
> SP-masking macro.
Do you have performance data for the two cases?

>
> 3. Alignment with other architectures:
>
> Other major architectures (such as x86, arm64, and riscv) all
> completely dropped their custom SP-masking current_pt_regs()
> implementations when moving to THREAD_INFO_IN_TASK, relying
> instead on the standard, safe, and generic task_pt_regs()
> provided by the core kernel wrapper.
>
> Therefore, this custom macro is both broken and insecure under
> the new standard, and it must be removed to ensure kernel
> stability and clean code alignment with upstream.
PowerPC, PA-RISC, ARM32 and UML are the latest archs that support
THREAD_INFO_IN_TASK.

PowerPC in 5.1:
ed1cd6deb013a11959d17a94e35ce159197632da powerpc: Activate
CONFIG_THREAD_INFO_IN_TASK.

PA-RISC in 5.16:
2214c0e77259b420402e279e9ab4277ef320d371 parisc: Move thread_info into
task struct.

ARM32 in 5.16:
18ed1c01a7dd3d7c780b06a49124da237a4c1790 ARM: smp: Enable THREAD_INFO_IN_TASK.

UML in 6.13:
2f681ba4b352cdd5658ed2a96062375a12839755 um: move thread info into task.

None of these commits removes current_pt_regs(). Some of these
architectures had no current_pt_regs() before THREAD_INFO_IN_TASK, and
ARM32 still has its own implementation in
arch/arm/include/asm/ptrace.h, which is nearly the same as LoongArch's.

>
> >>   /* Helpers for working with the user stack pointer */
>
> ...
>
> >> diff --git a/arch/loongarch/include/asm/stackframe.h b/arch/loongarch/include/asm/stackframe.h
> >> index ecc8e50fffa8..eeda5dcc982e 100644
> >> --- a/arch/loongarch/include/asm/stackframe.h
> >> +++ b/arch/loongarch/include/asm/stackframe.h
> >> @@ -191,8 +191,13 @@
> >>          andi    t0, t0, 0x3     /* extract pplv bit */
> >>          beqz    t0, 9f
> >>
> >> -       LONG_LI tp, ~_THREAD_MASK
> >> -       and     tp, tp, sp
> >> +       la_abs      t1, __entry_task
> >> +#ifdef CONFIG_SMP
> >> +       csrrd       t0, PERCPU_BASE_KS
> >> +       LONG_ADD    t1, t1, t0
> >> +#endif
> >> +       LONG_L      tp, t1, 0
> >> +
> >>          cfi_st  u0, PT_R21, \docfi
> >>          csrrd   u0, PERCPU_BASE_KS
> > Move these lines near to "cfi_st  fp, PT_R22, \docfi", then the above
> > "csrrd       t0, PERCPU_BASE_KS" can be removed.
>
> Regarding the suggestion for stackframe.h:
>
> Looking at the original macro context, this is an excellent and
> completely feasible assembly optimization.
>
> By moving the __entry_task restoration right after the preservation
> of u0, we can advance the "csrrd u0, PERCPU_BASE_KS" instruction and
> reuse the loaded u0 register directly for the LONG_ADD on SMP platforms.
> This completely eliminates the need for a duplicate csrrd instruction
> inside the #ifdef CONFIG_SMP block.
>
> The optimized code block would look like this:
>
>          beqz    t0, 9f
>
>          cfi_st  u0, PT_R21, \docfi
>          csrrd   u0, PERCPU_BASE_KS
>
>          la_abs      t1, __entry_task
> #ifdef CONFIG_SMP
>          LONG_ADD    t1, t1, u0
> #endif
>          LONG_L      tp, t1, 0
>
> 9:
>
> Thank you for catching this! I will gladly incorporate this assembly
> optimization into the next version.
>
> >> diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
> >> index 5b225aff3ba2..9932429cfe17 100644
> >> --- a/arch/loongarch/include/asm/switch_to.h
> >> +++ b/arch/loongarch/include/asm/switch_to.h
> >> @@ -5,17 +5,25 @@
> >>   #ifndef _ASM_SWITCH_TO_H
> >>   #define _ASM_SWITCH_TO_H
> >>
> >> +#include <linux/percpu.h>
> >> +
> >>   #include <asm/cpu-features.h>
> >>   #include <asm/fpu.h>
> >>   #include <asm/lbt.h>
> >>
> >>   struct task_struct;
> >>
> >> +DECLARE_PER_CPU(struct task_struct *, __entry_task);
> >> +
> >> +static inline void entry_task_switch(struct task_struct *next)
> >> +{
> >> +       __this_cpu_write(__entry_task, next);
> >> +}
> > I love the UML naming, which means rename __entry_task to cpu_tasks
> > and rename entry_task_switch() to set_current(), then move them to
> > current.h.
>
> Regarding the suggestion to rename and move __entry_task and
> entry_task_switch():
>
> Thank you for the suggestion, but after checking the upstream
> kernel implementation, the current naming and placement are
> actually fully aligned with the multi-architecture standards
> established by ARM/ARM64.
>
> A quick grep in the kernel tree reveals that ARM and ARM64
> use the exact same pattern:
>
> $ grep -rn entry_task arch
> arch/arm/kernel/process.c:40:DEFINE_PER_CPU(struct task_struct *, __entry_task);
> arch/arm/include/asm/switch_to.h:31:            __this_cpu_write(__entry_task, next);                  \
> arch/arm/include/asm/thread_info.h:40:DECLARE_PER_CPU(struct task_struct *, __entry_task);
> arch/arm/include/asm/assembler.h:357:   ldr_this_cpu    \t1, __entry_task, \t1, \t2
> arch/arm64/kernel/process.c:609:DEFINE_PER_CPU(struct task_struct *, __entry_task);
> arch/arm64/kernel/process.c:611:static void entry_task_switch(struct task_struct *next)
> arch/arm64/kernel/process.c:613:        __this_cpu_write(__entry_task, next);
> arch/arm64/kernel/process.c:777:        entry_task_switch(next);
> arch/arm64/kernel/entry.S:223:  ldr_this_cpu    tsk, __entry_task, x20
> arch/arm64/kernel/entry.S:1033: ldr_this_cpu    dst=x0, sym=__entry_task, tmp=x1
>
> As we can see:
> 1. Moving to current.h is heavily avoided: Both ARM and ARM64 place
>     these definitions in process.c or switch_to.h, rather than
>     current.h. <asm/current.h> is a highly sensitive, low-level header
>     included almost everywhere. Putting per-CPU macros there would pull
>     in <linux/percpu.h> and <linux/sched.h>, inevitably triggering
>     catastrophic circular header dependency compile errors.
Frankly, I had no idea what you were doing when I first saw
"__entry_task". Then I saw the UML naming and understood everything
immediately.

ARM64 introduced "__entry_task", and ARM32 followed it in
18ed1c01a7dd3d7c780b06a49124da237a4c1790, but ARM32 only has
__entry_task and no entry_task_switch.

So this naming is an isolated case rather than a common convention.

>
> 2. "__entry_task" and "entry_task_switch" are the precise industry
>     standards: Rather than adopting UML's historical naming style,
>     following the ARM64 conventions makes the code much more canonical
>     and easier for cross-architecture developers to maintain.
>     It clearly expresses that this per-CPU pointer is strictly
>     dedicated to the exception entry path for task recovery.
As said before, UML is the most recent architecture to introduce
CONFIG_THREAD_INFO_IN_TASK, so "cpu_tasks" is not a "historical style",
and x86 uses "current_task" rather than "__entry_task". Both
"cpu_tasks" and "current_task" are better names than "__entry_task".

>
> 3. "set_current()" causes mental friction: Across the generic kernel,
>     "current" is universally treated as a read-only concept. Introducing
>     a set_current() helper might mislead developers into thinking they
>     can modify the active task pointer at will, whereas
>     "entry_task_switch" explicitly limits its semantics to the context
>     switch boundary.
set_current() is the natural counterpart of get_current(). Although
"$tp" is enough for get_current(), the x86 implementation shows that
the task can also be read from the per-cpu array (which is suboptimal).

Moving set_current() to current.h does not require including
<linux/percpu.h> or <linux/sched.h> either; it only needs a forward
declaration of "task_struct" and <asm/percpu.h>, which is exactly
what the x86 implementation does.

And you needn't worry about compilation; I tested this before commenting.
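
For reference, the pairing I have in mind could look roughly like the
sketch below. This is a user-space model and not the proposed kernel
code: NR_CPUS, the plain array standing in for a per-CPU variable, and
the explicit cpu argument are all assumptions for illustration (in the
kernel the slot would be a per-CPU variable written with
__this_cpu_write()). Note that a forward declaration of task_struct is
all the two helpers need:

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4		/* assumed CPU count for this model */

struct task_struct;		/* a forward declaration is enough here */

/* Stand-in for a per-CPU "current_task" pointer, one slot per CPU. */
static struct task_struct *current_task[NR_CPUS];

/* Record the task that is about to run on @cpu (context-switch path). */
void set_current(int cpu, struct task_struct *next)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_task[cpu] = next;
}

/* Slow-path lookup, for code that cannot rely on a register such as $tp. */
struct task_struct *get_current(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return current_task[cpu];
}
```

The names make the read/write pairing obvious, which is the point of
the suggestion; the storage behind them is a separate question.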


Huacai

>
> Therefore, I prefer to keep the current naming and structure in
> switch_to.h to remain consistent with ARM64 and keep the header
> dependencies perfectly clean.
>
> >> +
> >>   /**
> >>    * __switch_to - switch execution of a task
> >>    * @prev:      The task previously executed.
> >>    * @next:      The task to begin executing.
> >> - * @next_ti:   task_thread_info(next).
> >>    * @sched_ra:  __schedule return address.
> >>    * @sched_cfa: __schedule call frame address.
>
> ...
>
> >>   struct thread_info {
> >> -       struct task_struct      *task;          /* main task structure */
> >>          unsigned long           flags;          /* low level flags */
> >> -       unsigned long           tp_value;       /* thread pointer */
> > Don't remove tp_value, it has nothing to do with this patch, instead,
> > it is for future LBT tls.
>
> Regarding the suggestion to keep tp_value in thread_info:
>
> You are completely right. I mistakenly assumed that
> tp_value was strictly coupled with the kernel-space $tp tracking.
> Since its true purpose is to preserve the user-space TLS value
> for the LBT (Loongson Binary Translation) extension context,
> it should definitely be decoupled from this THREAD_INFO_IN_TASK
> migration.
>
> I will follow the "one patch does one thing" principle and keep
> tp_value untouched in struct thread_info to avoid breaking any
> future or existing LBT TLS logic.
>
> Thank you for clarifying this! I will restore this field in the
> next version.
>
> >>          __u32                   cpu;            /* current CPU */
> >>          int                     preempt_count;  /* 0 => preemptible, <0 => BUG */
> >>          struct pt_regs          *regs;
> >> @@ -37,20 +35,11 @@ struct thread_info {
> >>    */
> >>   #define INIT_THREAD_INFO(tsk)                  \
> >>   {                                              \
> >> -       .task           = &tsk,                 \
> >> -       .flags          = _TIF_FIXADE,          \
> >> +       .flags          = 0,                    \
> > Don't change flags.
>
> Regarding the suggestion to keep the flags initialization:
>
> You are completely right. Modifying the default flags (changing
> _TIF_FIXADE to 0) is an unrelated side-effect that goes beyond
> the scope of migrating thread_info.
>
> Changing this could alter the alignment error fixing behavior
> for the initial idle task and cause unexpected regressions.
>
> I will follow your advice, leave the flags logic untouched,
> and only remove the deleted ".task = &tsk" member.
>
> Thank you for your critical review!
>
> >>          .cpu            = 0,                    \
> >>          .preempt_count  = INIT_PREEMPT_COUNT,   \
>
> ...
>
> >> @@ -223,6 +226,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >>          if (clone_flags & CLONE_SETTLS)
> >>                  childregs->regs[2] = tls;
> >>
> >> +       /* Set tp to the new task structure for context switching */
> >> +       p->thread.reg02 = (unsigned long)p;
> > This should be before "if (unlikely(args->fn))" for kernel thread.
>
> Regarding the feedback on process.c and thread_struct:
>
> Actually, after double-checking the core architecture assembly,
> we don't need to worry about where to place
> "p->thread.reg02 = (unsigned long)p;"
> because this line can be completely deleted, and reg02 shouldn't
> be added to thread_struct at all.
>
> As analyzed previously, during context switch, the hardware $tp
> register is updated directly from the C argument "next" via
> "move tp, a1".
>
> Furthermore, the cpu_restore_nonscratch macro contains absolutely
> no logic to read or restore reg02. This means thread_struct.reg02
> has a write-only path and is never read anywhere (even for new
> processes or kernel threads). To keep the architecture code clean
> and avoid misleading future developers, I will completely drop
> reg02 and its assignment from the next version.
>
> >> +
> >>   out:
> >>          ptrace_hw_copy_thread(p);
> >>          clear_tsk_thread_flag(p, TIF_USEDFPU);
>
> ...
>
> >> +
> >> +       entry_task_switch(&init_task);
> > This should be as early as possible, I suggest moving it after unwind_init().
>
> Regarding the suggestion to move entry_task_switch() in setup.c:
>
> You are completely right, and this is a critical catch for early
> boot stability.
>
> Placing entry_task_switch(&init_task) at the very end of
> setup_arch() leaves a massive window during early initialization
> where __entry_task remains NULL.
>
> If any early exception, interrupt, or panic occurs before the end
> of setup_arch(), the exception entry path will load a NULL pointer
> into $tp, triggering an immediate double-fault and completely
> blinding the kernel's ability to print stack traces.
>
> Moving it immediately after unwind_init() ensures that the $tp
> recovery mechanism is armed as early as possible, providing robust
> exception handling support during the rest of the boot sequence.
>
> I will absolutely adopt this suggestion and move it right after
> unwind_init() in the next version. Thank you!
>
> >>   }
> >> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> >> index 64a048f1b880..e8b0d2fc2a9a 100644
> >> --- a/arch/loongarch/kernel/smp.c
> >> +++ b/arch/loongarch/kernel/smp.c
>
> ...
>
> >> +       entry_task_switch(current);
> > This should be as early as possible, I suggest moving it after cpu_probe().
>
> Regarding the suggestion to move entry_task_switch() in smp.c:
>
> You are completely right, and this is another critical catch for
> early boot stability, this time on the secondary CPU path.
>
> Placing entry_task_switch(current) after complete(&cpu_running)
> leaves a dangerous window during the early C entry of
> start_secondary() where the secondary CPU's __entry_task remains
> uninitialized (NULL). If any early exception or kernel panic
> occurs during the secondary CPU initialization prior to the
> completion signal, the exception entry path will load a NULL
> pointer into $tp, inducing an immediate double-fault and
> completely blinding the kernel's early SMP debugging
> capabilities.
>
> Moving it immediately after cpu_probe() ensures that the
> secondary CPU arms its $tp recovery mechanism at the earliest
> possible stage in its C entry path.
>
> I will absolutely adopt this suggestion and move it right
> after cpu_probe() in the next version. Thank you!
>
> >> +
> >>          /*
> >>           * irq will be enabled in loongson_smp_finish(), enabling it too
> >>           * early is dangerous.
> >> diff --git a/arch/loongarch/kernel/switch.S b/arch/loongarch/kernel/switch.S
> >> index f377d8f5c51a..644348e05f6a 100644
> >> --- a/arch/loongarch/kernel/switch.S
> >> +++ b/arch/loongarch/kernel/switch.S
>
> ...
>
> >> +       LONG_LPTR       t0, tp, TASK_STACK
> > This should be "LONG_LPTR       t0, tp, (TASK_STACK -
> > TASK_STRUCT_OFFSET)", otherwise it is wrong for 32BIT.
>
> Regarding the suggestion for (TASK_STACK - TASK_STRUCT_OFFSET)
> in switch.S:
>
> Thank you for bringing this up! With the definition of
> TASK_STRUCT_OFFSET in mind:
>
> #ifdef CONFIG_64BIT
> #define TASK_STRUCT_OFFSET 0
> #else
> #define TASK_STRUCT_OFFSET 2000
> #endif
>
> This is an incredibly sharp and critical catch for 32BIT
> architecture compatibility.
>
> I will update this line to:
> "LONG_LPTR       t0, tp, (TASK_STACK - TASK_STRUCT_OFFSET)"
> in the next version.
>
> This is the incremental diff based on the original patch:
>
> ----->8-----
> diff --git a/Documentation/features/core/thread-info-in-task/arch-support.txt b/Documentation/features/core/thread-info-in-task/arch-support.txt
> index f3d744c76061..e26efdfbb6b4 100644
> --- a/Documentation/features/core/thread-info-in-task/arch-support.txt
> +++ b/Documentation/features/core/thread-info-in-task/arch-support.txt
> @@ -12,7 +12,7 @@
>       |       arm64: |  ok  |
>       |        csky: | TODO |
>       |     hexagon: | TODO |
> -    |   loongarch: | TODO |
> +    |   loongarch: |  ok  |
>       |        m68k: | TODO |
>       |  microblaze: | TODO |
>       |        mips: | TODO |
> diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
> index df927a4318cc..5d8e82b1dce7 100644
> --- a/arch/loongarch/include/asm/processor.h
> +++ b/arch/loongarch/include/asm/processor.h
> @@ -109,7 +109,7 @@ struct loongarch_vdso_info;
>    */
>   struct thread_struct {
>          /* Main processor registers. */
> -       unsigned long reg01, reg02, reg03, reg22; /* ra tp sp fp */
> +       unsigned long reg01, reg03, reg22; /* ra sp fp */
>          unsigned long reg23, reg24, reg25, reg26; /* s0-s3 */
>          unsigned long reg27, reg28, reg29, reg30, reg31; /* s4-s8 */
>
> @@ -146,7 +146,6 @@ struct thread_struct {
>   #define thread_saved_fp(tsk)   (tsk->thread.sched_cfa)
>
>   #define INIT_THREAD  {                                                 \
> -       .reg02 = (unsigned long)&init_task,                             \
>          .reg03 = (unsigned long)&init_stack + sizeof(init_stack),       \
>   }
>
> diff --git a/arch/loongarch/include/asm/stackframe.h b/arch/loongarch/include/asm/stackframe.h
> index eeda5dcc982e..770db1084e8d 100644
> --- a/arch/loongarch/include/asm/stackframe.h
> +++ b/arch/loongarch/include/asm/stackframe.h
> @@ -191,15 +191,15 @@
>          andi    t0, t0, 0x3     /* extract pplv bit */
>          beqz    t0, 9f
>
> +       cfi_st  u0, PT_R21, \docfi
> +       csrrd   u0, PERCPU_BASE_KS
> +
>          la_abs      t1, __entry_task
>   #ifdef CONFIG_SMP
> -       csrrd       t0, PERCPU_BASE_KS
> -       LONG_ADD    t1, t1, t0
> +       LONG_ADD    t1, t1, u0
>   #endif
>          LONG_L      tp, t1, 0
>
> -       cfi_st  u0, PT_R21, \docfi
> -       csrrd   u0, PERCPU_BASE_KS
>   9:
>   #ifdef CONFIG_KGDB
>          li.w    t0, CSR_CRMD_WE
> diff --git a/arch/loongarch/include/asm/thread_info.h b/arch/loongarch/include/asm/thread_info.h
> index 2c95a5134976..41eabe4fb647 100644
> --- a/arch/loongarch/include/asm/thread_info.h
> +++ b/arch/loongarch/include/asm/thread_info.h
> @@ -23,6 +23,7 @@
>    */
>   struct thread_info {
>          unsigned long           flags;          /* low level flags */
> +       unsigned long           tp_value;       /* thread pointer */
>          __u32                   cpu;            /* current CPU */
>          int                     preempt_count;  /* 0 => preemptible, <0 => BUG */
>          struct pt_regs          *regs;
> @@ -35,7 +36,7 @@ struct thread_info {
>    */
>   #define INIT_THREAD_INFO(tsk)                  \
>   {                                              \
> -       .flags          = 0,                    \
> +       .flags          = _TIF_FIXADE,          \
>          .cpu            = 0,                    \
>          .preempt_count  = INIT_PREEMPT_COUNT,   \
>   }
> diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
> index 71c9c6468e60..2f916c4e0e8f 100644
> --- a/arch/loongarch/kernel/process.c
> +++ b/arch/loongarch/kernel/process.c
> @@ -226,9 +226,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>          if (clone_flags & CLONE_SETTLS)
>                  childregs->regs[2] = tls;
>
> -       /* Set tp to the new task structure for context switching */
> -       p->thread.reg02 = (unsigned long)p;
> -
>   out:
>          ptrace_hw_copy_thread(p);
>          clear_tsk_thread_flag(p, TIF_USEDFPU);
> diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
> index 5d434c5612ab..7065d195f2da 100644
> --- a/arch/loongarch/kernel/setup.c
> +++ b/arch/loongarch/kernel/setup.c
> @@ -594,6 +594,7 @@ void __init setup_arch(char **cmdline_p)
>   {
>          cpu_probe();
>          unwind_init();
> +       entry_task_switch(&init_task);
>
>          init_environ();
>          efi_init();
> @@ -618,6 +619,4 @@ void __init setup_arch(char **cmdline_p)
>   #ifdef CONFIG_KASAN
>          kasan_init();
>   #endif
> -
> -       entry_task_switch(&init_task);
>   }
> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> index e8b0d2fc2a9a..4b74409a98a3 100644
> --- a/arch/loongarch/kernel/smp.c
> +++ b/arch/loongarch/kernel/smp.c
> @@ -665,6 +665,7 @@ asmlinkage void start_secondary(void)
>          set_my_cpu_offset(per_cpu_offset(cpu));
>
>          cpu_probe();
> +       entry_task_switch(current);
>          constant_clockevent_init();
>          loongson_init_secondary();
>
> @@ -688,8 +689,6 @@ asmlinkage void start_secondary(void)
>           */
>          complete(&cpu_running);
>
> -       entry_task_switch(current);
> -
>          /*
>           * irq will be enabled in loongson_smp_finish(), enabling it too
>           * early is dangerous.
> diff --git a/arch/loongarch/kernel/switch.S b/arch/loongarch/kernel/switch.S
> index 644348e05f6a..33a10221d73a 100644
> --- a/arch/loongarch/kernel/switch.S
> +++ b/arch/loongarch/kernel/switch.S
> @@ -24,8 +24,8 @@ SYM_FUNC_START(__switch_to)
>          LONG_SPTR       t1, a0, (THREAD_CSRPRMD - TASK_STRUCT_OFFSET)
>
>          cpu_save_nonscratch a0
> -       LONG_SPTR       a3, a0, (THREAD_SCHED_RA - TASK_STRUCT_OFFSET)
> -       LONG_SPTR       a4, a0, (THREAD_SCHED_CFA - TASK_STRUCT_OFFSET)
> +       LONG_SPTR       a2, a0, (THREAD_SCHED_RA - TASK_STRUCT_OFFSET)
> +       LONG_SPTR       a3, a0, (THREAD_SCHED_CFA - TASK_STRUCT_OFFSET)
>
>   #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP)
>          la              t7, __stack_chk_guard
> @@ -36,7 +36,7 @@ SYM_FUNC_START(__switch_to)
>          move    tp, a1
>          cpu_restore_nonscratch a1
>
> -       LONG_LPTR       t0, tp, TASK_STACK
> +       LONG_LPTR       t0, tp, (TASK_STACK - TASK_STRUCT_OFFSET)
>          PTR_LI          t1, _THREAD_SIZE
>          PTR_ADD         t0, t0, t1
>          set_saved_sp    t0, t1, t2
>
> Here is a test script:
>
> $ cat stress_test.sh
> #!/bin/bash
> set -e # Exit immediately if any command exits with a non-zero status
>
> echo "=== Starting LoongArch THREAD_INFO_IN_TASK Extreme Stress Testing ==="
> START_TIME=$(date)
>
> # Clear existing dmesg buffer and back it up safely to /tmp
> dmesg -c > /tmp/init_dmesg.log
>
> # 1. Core Context Switch Stress Test
> # Validates __switch_to() assembly and the 32-bit/64-bit structural
> # offset calculations.
> echo "Running: --context stressor (10 mins)..."
> stress-ng --context $(nproc) --timeout 10m --metrics-brief
>
> # 2. Bad System Calls and Exception Path Stress Test
> # Validates handle_syscall and the __entry_task recovery path during
> # exception entry.
> # Fixed option to use the unambiguous '--sysbadaddr'
> echo "Running: --sysbadaddr stressor (10 mins)..."
> stress-ng --sysbadaddr $(nproc) --timeout 10m
>
> # 3. Page Fault and Stack Stress Test
> # Validates register reuse optimization (u0/PERCPU_BASE_KS) within the
> # SAVE_SOME macro.
> echo "Running: --fault stressor (10 mins)..."
> stress-ng --fault $(nproc) --timeout 10m
>
> # 4. Multi-Thread Cloning and Destruction Stress Test
> # Validates the preservation of tp_value and the correctness of
> # copy_thread().
> echo "Running: --pthread stressor (10 mins)..."
> stress-ng --pthread $(nproc) --timeout 10m
>
> # 5. Ultimate Mixed Scheduling Matrix Test
> # Simulates an extremely hostile system environment with high
> # concurrency (20 mins).
> echo "Running: Mixed Matrix (--schedmix + --yield) (20 mins)..."
> stress-ng --schedmix $(nproc) --yield $(nproc) --timeout 20m --metrics
>
> END_TIME=$(date)
> echo "=== All stress-ng commands completed successfully ==="
> echo "Start Time: $START_TIME"
> echo "End Time:   $END_TIME"
>
> # 6. Automated Kernel Log Integrity Check
> # Scans dmesg for hidden kernel regressions, warnings, or silent corruption.
> echo "=== Analyzing kernel dmesg logs... ==="
> if sudo dmesg | grep -qEi "oops|panic|warning|bug|recursive|tainted"; then
>      echo "❌ WARNING: System survived but dmesg contains kernel errors! Please check the logs below:"
>      sudo dmesg | grep -Ei "oops|panic|warning|bug|recursive|tainted" -C 5
> else
>      echo "✅ SUCCESS: dmesg remains perfectly silent! No Oops, Warnings, or Panics found."
>      echo "The patch successfully passed the 1-hour stress testing suite!"
> fi
>
> Here are the test steps:
>
> sudo dnf install -y stress-ng
> chmod +x stress_test.sh
> sudo ./stress_test.sh
>
> Here is the test result:
>
> $ sudo ./stress_test.sh
> === Starting LoongArch THREAD_INFO_IN_TASK Extreme Stress Testing ===
> Running: --context stressor (10 mins)...
> stress-ng: info:  [2719] setting to a 10 mins run per stressor
> stress-ng: info:  [2719] dispatching hogs: 8 context
> stress-ng: metrc: [2719] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
> stress-ng: metrc: [2719]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
> stress-ng: metrc: [2719] context        41308615    600.00   2226.94   2571.93     68847.69        8607.98
> stress-ng: info:  [2719] skipped: 0
> stress-ng: info:  [2719] passed: 8: context (8)
> stress-ng: info:  [2719] failed: 0
> stress-ng: info:  [2719] metrics untrustworthy: 0
> stress-ng: info:  [2719] successful run completed in 10 mins
> Running: --sysbadaddr stressor (10 mins)...
> stress-ng: info:  [2742] setting to a 10 mins run per stressor
> stress-ng: info:  [2742] dispatching hogs: 8 sysbadaddr
> stress-ng: info:  [2742] skipped: 0
> stress-ng: info:  [2742] passed: 8: sysbadaddr (8)
> stress-ng: info:  [2742] failed: 0
> stress-ng: info:  [2742] metrics untrustworthy: 0
> stress-ng: info:  [2742] successful run completed in 10 mins
> Running: --fault stressor (10 mins)...
> stress-ng: info:  [1090732] setting to a 10 mins run per stressor
> stress-ng: info:  [1090732] dispatching hogs: 8 fault
> stress-ng: info:  [1090732] skipped: 0
> stress-ng: info:  [1090732] passed: 8: fault (8)
> stress-ng: info:  [1090732] failed: 0
> stress-ng: info:  [1090732] metrics untrustworthy: 0
> stress-ng: info:  [1090732] successful run completed in 10 mins
> Running: --pthread stressor (10 mins)...
> stress-ng: info:  [1090760] setting to a 10 mins run per stressor
> stress-ng: info:  [1090760] dispatching hogs: 8 pthread
> stress-ng: info:  [1090760] skipped: 0
> stress-ng: info:  [1090760] passed: 8: pthread (8)
> stress-ng: info:  [1090760] failed: 0
> stress-ng: info:  [1090760] metrics untrustworthy: 0
> stress-ng: info:  [1090760] successful run completed in 10 mins
> Running: Mixed Matrix (--schedmix + --yield) (20 mins)...
> stress-ng: info:  [3131692] setting to a 20 mins run per stressor
> stress-ng: info:  [3131692] dispatching hogs: 8 schedmix, 8 yield
> stress-ng: metrc: [3131692] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
> stress-ng: metrc: [3131692]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
> stress-ng: metrc: [3131692] schedmix        6577020   1200.04   1817.35   5090.05      5480.67         952.17        71.95          3392
> stress-ng: metrc: [3131692] yield         2861718847   1200.00    733.75   1937.44   2384764.49     1071325.09        27.82          3360
> stress-ng: metrc: [3131692] miscellaneous metrics:
> stress-ng: metrc: [3131692] yield               6672.42 ns duration per sched_yield call (harmonic mean of 8 instances)
> stress-ng: info:  [3131692] skipped: 0
> stress-ng: info:  [3131692] passed: 16: schedmix (8) yield (8)
> stress-ng: info:  [3131692] failed: 0
> stress-ng: info:  [3131692] metrics untrustworthy: 0
> stress-ng: info:  [3131692] successful run completed in 20 mins
> === All stress-ng commands completed successfully ===
> Start Time: Wed Jun  3 09:03:43 AM CST 2026
> End Time:   Wed Jun  3 10:03:44 AM CST 2026
> === Analyzing kernel dmesg logs... ===
> ✅ SUCCESS: dmesg remains perfectly silent! No Oops, Warnings, or Panics found.
> The patch successfully passed the 1-hour stress testing suite!
>
> I will send formal patch v1 next week.
>
> Thanks,
> Tiezhu
>
>


* Re: [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  2026-06-03 14:24       ` Huacai Chen
@ 2026-06-04  2:26         ` Tiezhu Yang
  2026-06-04  5:05           ` Tiezhu Yang
  2026-06-04  6:26           ` Huacai Chen
  0 siblings, 2 replies; 12+ messages in thread
From: Tiezhu Yang @ 2026-06-04  2:26 UTC (permalink / raw)
  To: Huacai Chen; +Cc: loongarch, bpf, linux-arch, linux-kernel

On 2026/6/3 10:24 PM, Huacai Chen wrote:
> On Wed, Jun 3, 2026 at 10:30 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:

...

>> For static structures like init_task, any uninitialized fields are
>> automatically zeroed out by the compiler according to the C standard.
>> Stripping away dozens of lines of ".field = 0" complies with modern
>> Linux kernel code-cleaning standards. It makes the macro much shorter
>> and highlights the only field that actually requires a special
>> runtime value (the kernel stack top in .reg03).
> I know it is the same for compilers, I mean the current way gives more
> information for humans.
> 
> In addition, without reg02, this piece completely has no relationship
> with CONFIG_THREAD_INFO_IN_TASK, so please drop it.

Do you mean keeping the original INIT_THREAD as is, with no changes at all?
Is ".reg03 = (unsigned long)&init_stack + sizeof(init_stack)," then not
necessary?

That is to say, arch/loongarch/include/asm/processor.h should not be modified?
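
For the record, the C rule under discussion can be checked with a small
user-space sketch. The struct below is a cut-down stand-in for
thread_struct, and thread_demo, init_stack_demo and forms_are_equal()
are names made up for illustration, not the real processor.h
definitions. Members omitted from a designated initializer are
zero-initialized, so the short form and the form that spells out every
zero are the same for the compiler:

```c
#include <assert.h>

/* Cut-down stand-in for the LoongArch thread_struct. */
struct thread_demo {
	unsigned long reg01, reg03, reg22;	/* ra sp fp */
	unsigned long csr_crmd;
};

static char init_stack_demo[256];		/* placeholder for init_stack */

/* Returns 1 when the short and the explicit initializer give equal objects. */
int forms_are_equal(void)
{
	unsigned long top = (unsigned long)init_stack_demo +
			    sizeof(init_stack_demo);

	/* Short form: only the stack top is spelled out. */
	struct thread_demo short_form = { .reg03 = top };

	/* Explicit form: every zero is written down for the reader. */
	struct thread_demo long_form = {
		.reg01 = 0, .reg03 = top, .reg22 = 0, .csr_crmd = 0,
	};

	assert(short_form.reg03 == top);

	return short_form.reg01 == long_form.reg01 &&
	       short_form.reg03 == long_form.reg03 &&
	       short_form.reg22 == long_form.reg22 &&
	       short_form.csr_crmd == long_form.csr_crmd;
}
```

So the choice between the two forms is purely about what a human reader
sees, which is the point raised above.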

>> For reference, please see how INIT_THREAD is defined in other major
>> architectures, where they only initialize what is strictly necessary:

...

> None of these commits removes current_pt_regs. Some of these
> architectures had no current_pt_regs before THREAD_INFO_IN_TASK, and
> ARM32 still has its own implementation in
> arch/arm/include/asm/ptrace.h, which is nearly the same as LoongArch's.

If you think current_pt_regs() should be kept as is, I am OK with that
and will do so in the next version.

That is to say, arch/loongarch/include/asm/ptrace.h should not be modified?

>>>>    /* Helpers for working with the user stack pointer */

...

> set_current() is the natural counterpart of get_current(). Although
> "$tp" is enough for get_current(), the x86 implementation shows that
> the task can also be read from the per-cpu array (which is suboptimal).
>
> Moving set_current() to current.h does not require including
> <linux/percpu.h> or <linux/sched.h> either; it only needs a forward
> declaration of "task_struct" and <asm/percpu.h>, which is exactly
> what the x86 implementation does.
>
> And you needn't worry about compilation; I tested this before commenting.

If you think this is proper, I am OK with it.

Should __entry_task be replaced with current_task (x86) or cpu_tasks (UML)?
Should entry_task_switch() be replaced with set_current(), or something else?
The following code in smp.c would look a little strange:

diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index 4b74409a98a3..3afdeca7bc2a 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -665,7 +665,7 @@ asmlinkage void start_secondary(void)
         set_my_cpu_offset(per_cpu_offset(cpu));

         cpu_probe();
-       entry_task_switch(current);
+       set_current(current);
         constant_clockevent_init();
         loongson_init_secondary();

Thanks,
Tiezhu



* Re: [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  2026-06-04  2:26         ` Tiezhu Yang
@ 2026-06-04  5:05           ` Tiezhu Yang
  2026-06-04  6:26           ` Huacai Chen
  1 sibling, 0 replies; 12+ messages in thread
From: Tiezhu Yang @ 2026-06-04  5:05 UTC (permalink / raw)
  To: Huacai Chen; +Cc: loongarch, bpf, linux-arch, linux-kernel

On 2026/6/4 10:26 AM, Tiezhu Yang wrote:
> On 2026/6/3 10:24 PM, Huacai Chen wrote:
>> On Wed, Jun 3, 2026 at 10:30 AM Tiezhu Yang <yangtiezhu@loongson.cn> 
>> wrote:
> 
> ...
> 
>>> For static structures like init_task, any uninitialized fields are
>>> automatically zeroed out by the compiler according to the C standard.
>>> Stripping away dozens of lines of ".field = 0" complies with modern
>>> Linux kernel code-cleaning standards. It makes the macro much shorter
>>> and highlights the only field that actually requires a special
>>> runtime value (the kernel stack top in .reg03).
>> I know it is the same for compilers, I mean the current way gives more
>> information for humans.
>>
>> In addition, without reg02, this piece completely has no relationship
>> with CONFIG_THREAD_INFO_IN_TASK, so please drop it.
> 
> Do you mean keeping the original INIT_THREAD as is, with no changes at all?
> Is ".reg03 = (unsigned long)&init_stack + sizeof(init_stack)," then not
> necessary?
>
> That is to say, arch/loongarch/include/asm/processor.h should not be modified?

I tested with stress-ng and it works well without modifying processor.h,
so I will keep the original INIT_THREAD from the upstream code.

>>> For reference, please see how INIT_THREAD is defined in other major
>>> architectures, where they only initialize what is strictly necessary:
> 
> ...
> 
>> None of these commits removes current_pt_regs. Some of these
>> architectures had no current_pt_regs before THREAD_INFO_IN_TASK, and
>> ARM32 still has its own implementation in
>> arch/arm/include/asm/ptrace.h, which is nearly the same as LoongArch's.
> 
> If you think current_pt_regs() should be kept as is, I am OK with that
> and will do so in the next version.
>
> That is to say, arch/loongarch/include/asm/ptrace.h should not be modified?

I tested with stress-ng and it works well without modifying ptrace.h,
so I will keep the original current_pt_regs from the upstream code
for now. We can delete it later if there is an obvious benefit.

>>>>>    /* Helpers for working with the user stack pointer */
> 
> ...
> 
>> set_current() is the natural counterpart of get_current(). Although
>> "$tp" is enough for get_current(), the x86 implementation shows that
>> the task can also be read from the per-cpu array (which is suboptimal).
>>
>> Moving set_current() to current.h does not require including
>> <linux/percpu.h> or <linux/sched.h> either; it only needs a forward
>> declaration of "task_struct" and <asm/percpu.h>, which is exactly
>> what the x86 implementation does.
>>
>> And you needn't worry about compilation; I tested this before commenting.
> 
> If you think this is proper, I am OK with it.
> 
> Should __entry_task be replaced with current_task (x86) or cpu_tasks (uml)?
> Should entry_task_switch() be replaced with set_current(), or something else?
> The following code in smp.c looks a little strange:

I will use the current_task variable and set_current() in the next
version if there are no more comments.

Thanks,
Tiezhu



* Re: [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK
  2026-06-04  2:26         ` Tiezhu Yang
  2026-06-04  5:05           ` Tiezhu Yang
@ 2026-06-04  6:26           ` Huacai Chen
  1 sibling, 0 replies; 12+ messages in thread
From: Huacai Chen @ 2026-06-04  6:26 UTC (permalink / raw)
  To: Tiezhu Yang; +Cc: loongarch, bpf, linux-arch, linux-kernel

On Thu, Jun 4, 2026 at 10:26 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
> On 2026/6/3 10:24 PM, Huacai Chen wrote:
> > On Wed, Jun 3, 2026 at 10:30 AM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
> ...
>
> >> For static structures like init_task, any uninitialized fields are
> >> automatically zeroed out by the compiler according to the C standard.
> >> Stripping away dozens of lines of ".field = 0" complies with modern
> >> Linux kernel code-cleaning standards. It makes the macro much shorter
> >> and highlights the only field that actually requires a special
> >> runtime value (the kernel stack top in .reg03).
> > I know it is the same for the compiler; I mean that the current form
> > gives more information to human readers.
> >
> > In addition, without reg02, this piece has no relationship at all
> > with CONFIG_THREAD_INFO_IN_TASK, so please drop it.
>
> Do you mean keeping the original INIT_THREAD as is, with no changes
> at all? Is ".reg03 = (unsigned long)&init_stack + sizeof(init_stack),"
> not necessary?
>
> That is to say, do not modify arch/loongarch/include/asm/processor.h?
Yes.

>
> >> For reference, please see how INIT_THREAD is defined in other major
> >> architectures, where they only initialize what is strictly necessary:
>
> ...
>
> > None of these commits removes current_pt_regs. Some of those
> > architectures had no current_pt_regs before THREAD_INFO_IN_TASK, and
> > ARM32 still has its own implementation in
> > arch/arm/include/asm/ptrace.h, which is nearly the same as LoongArch's.
>
> If you think current_pt_regs() should be kept as is, I am OK with that
> and will do so in the next version.
>
> That is to say, do not modify arch/loongarch/include/asm/ptrace.h?
Yes.

>
> >>>>    /* Helpers for working with the user stack pointer */
>
> ...
>
> > set_current() is a very good friend of get_current(). Though "$tp" is
> > enough for get_current(), the x86 implementation shows that the
> > current task can also be read from the per-cpu array (though that is
> > suboptimal).
> >
> > Moving set_current() to current.h also does not require including
> > <linux/percpu.h> and <linux/sched.h>; it only needs a forward
> > declaration of "task_struct" and an include of <asm/percpu.h>, which
> > is exactly what the x86 implementation does.
> >
> > And you needn't worry about compilation; I tested it before commenting.
>
> If you think this is proper, I am OK with it.
>
> Should __entry_task be replaced with current_task (x86) or cpu_tasks (uml)?
> Should entry_task_switch() be replaced with set_current(), or something else?
It seems you want to avoid "current"; in that case, please use
"cpu_tasks" rather than "current_task". But "set_current" is a friend
of "get_current", so just use it.
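
A minimal userspace sketch of why a forward declaration is enough for
a set_current()-style helper: the helper only stores a pointer, so the
full task_struct definition is not needed where the helper is defined.
The plain array, the explicit cpu argument, and every "_sketch" name
are assumptions made for this example; real per-CPU accessors and the
"$tp"-based get_current() are not modeled here.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4

/* Forward declaration only: the helpers below never look inside it. */
struct task_struct;

/* Plain array standing in for a per-CPU "cpu_tasks" variable. */
static struct task_struct *cpu_tasks_sketch[NR_CPUS_SKETCH];

/* Record which task is running on the given CPU. */
static inline void set_current_sketch(int cpu, struct task_struct *task)
{
	cpu_tasks_sketch[cpu] = task;
}

/* Slower lookup through the per-CPU copy. */
static inline struct task_struct *cpu_task_sketch(int cpu)
{
	return cpu_tasks_sketch[cpu];
}

/* The full definition can come later, as it would from another header. */
struct task_struct {
	int pid;
};
```

The helpers compile before the struct is complete, which is the
property that keeps the header's include list short.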



Huacai

> The following code in smp.c looks a little strange:
>
> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> index 4b74409a98a3..3afdeca7bc2a 100644
> --- a/arch/loongarch/kernel/smp.c
> +++ b/arch/loongarch/kernel/smp.c
> @@ -665,7 +665,7 @@ asmlinkage void start_secondary(void)
>          set_my_cpu_offset(per_cpu_offset(cpu));
>
>          cpu_probe();
> -       entry_task_switch(current);
> +       set_current(current);
>          constant_clockevent_init();
>          loongson_init_secondary();
>
> Thanks,
> Tiezhu
>


Thread overview: 12+ messages
2026-04-28  7:20 [RFC PATCH 0/3] Move thread_info into task_struct for LoongArch Tiezhu Yang
2026-04-28  7:20 ` [RFC PATCH 1/3] LoongArch: Implement CONFIG_THREAD_INFO_IN_TASK Tiezhu Yang
2026-04-28 20:38   ` sashiko-bot
2026-06-01 13:46   ` Huacai Chen
2026-06-03  2:30     ` Tiezhu Yang
2026-06-03 14:24       ` Huacai Chen
2026-06-04  2:26         ` Tiezhu Yang
2026-06-04  5:05           ` Tiezhu Yang
2026-06-04  6:26           ` Huacai Chen
2026-04-28  7:20 ` [RFC PATCH 2/3] LoongArch: BPF: Inline bpf_get_current_task{_btf}() helpers Tiezhu Yang
2026-04-28  7:20 ` [RFC PATCH 3/3] LoongArch: BPF: Inline bpf_get_smp_processor_id() helper Tiezhu Yang
2026-04-28 21:25   ` sashiko-bot
