* [PATCH v2 1/4] Make enter_lazy_tlb inline on x86
2025-11-08 17:23 [PATCH v2 0/4] Optimize code generation during context switching Xie Yuanbin
@ 2025-11-08 17:23 ` Xie Yuanbin
2025-11-08 17:23 ` [PATCH v2 2/4] Make raw_spin_rq_unlock inline Xie Yuanbin
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Xie Yuanbin @ 2025-11-08 17:23 UTC (permalink / raw)
To: david, tglx, segher, riel, peterz, linux, mathieu.desnoyers,
paulmck, pjw, palmer, aou, alex, hca, gor, agordeev, borntraeger,
svens, davem, andreas, luto, mingo, bp, dave.hansen, hpa, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, anna-maria, frederic, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, nathan, nick.desaulniers+lkml, morbo, justinstitt,
qq570070308, thuth, brauner, arnd, jlayton, aalbersh, akpm, david,
lorenzo.stoakes, max.kellermann, ryan.roberts, nysal, urezki
Cc: x86, linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
This function is very short and is called during context switching, which
is a hot code path.
Change it to an inline function on x86 to optimize performance, just as it
is on other architectures.
Signed-off-by: Xie Yuanbin <qq570070308@gmail.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
---
arch/x86/include/asm/mmu_context.h | 21 ++++++++++++++++++++-
arch/x86/mm/tlb.c | 21 ---------------------
2 files changed, 20 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 73bf3b1b44e8..263e18bc5b3d 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -136,8 +136,27 @@ static inline void mm_reset_untag_mask(struct mm_struct *mm)
}
#endif
+/*
+ * Please ignore the name of this function. It should be called
+ * switch_to_kernel_thread().
+ *
+ * enter_lazy_tlb() is a hint from the scheduler that we are entering a
+ * kernel thread or other context without an mm. Acceptable implementations
+ * include doing nothing whatsoever, switching to init_mm, or various clever
+ * lazy tricks to try to minimize TLB flushes.
+ *
+ * The scheduler reserves the right to call enter_lazy_tlb() several times
+ * in a row. It will notify us that we're going back to a real mm by
+ * calling switch_mm_irqs_off().
+ */
#define enter_lazy_tlb enter_lazy_tlb
-extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
+static __always_inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
+{
+ if (this_cpu_read(cpu_tlbstate.loaded_mm) == &init_mm)
+ return;
+
+ this_cpu_write(cpu_tlbstate_shared.is_lazy, true);
+}
#define mm_init_global_asid mm_init_global_asid
extern void mm_init_global_asid(struct mm_struct *mm);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 5d221709353e..cb715e8e75e4 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -970,27 +970,6 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
}
}
-/*
- * Please ignore the name of this function. It should be called
- * switch_to_kernel_thread().
- *
- * enter_lazy_tlb() is a hint from the scheduler that we are entering a
- * kernel thread or other context without an mm. Acceptable implementations
- * include doing nothing whatsoever, switching to init_mm, or various clever
- * lazy tricks to try to minimize TLB flushes.
- *
- * The scheduler reserves the right to call enter_lazy_tlb() several times
- * in a row. It will notify us that we're going back to a real mm by
- * calling switch_mm_irqs_off().
- */
-void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
-{
- if (this_cpu_read(cpu_tlbstate.loaded_mm) == &init_mm)
- return;
-
- this_cpu_write(cpu_tlbstate_shared.is_lazy, true);
-}
-
/*
* Using a temporary mm allows to set temporary mappings that are not accessible
* by other CPUs. Such mappings are needed to perform sensitive memory writes
--
2.51.0
* [PATCH v2 2/4] Make raw_spin_rq_unlock inline
2025-11-08 17:23 [PATCH v2 0/4] Optimize code generation during context switching Xie Yuanbin
2025-11-08 17:23 ` [PATCH v2 1/4] Make enter_lazy_tlb inline on x86 Xie Yuanbin
@ 2025-11-08 17:23 ` Xie Yuanbin
2025-11-08 17:23 ` [PATCH v2 3/4] Provide the always inline version of some functions Xie Yuanbin
2025-11-08 17:23 ` [PATCH v2 4/4] Make finish_task_switch and its subfuncs inline in context switching Xie Yuanbin
3 siblings, 0 replies; 13+ messages in thread
From: Xie Yuanbin @ 2025-11-08 17:23 UTC (permalink / raw)
To: david, tglx, segher, riel, peterz, linux, mathieu.desnoyers,
paulmck, pjw, palmer, aou, alex, hca, gor, agordeev, borntraeger,
svens, davem, andreas, luto, mingo, bp, dave.hansen, hpa, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, anna-maria, frederic, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, nathan, nick.desaulniers+lkml, morbo, justinstitt,
qq570070308, thuth, brauner, arnd, jlayton, aalbersh, akpm, david,
lorenzo.stoakes, max.kellermann, ryan.roberts, nysal, urezki
Cc: x86, linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
This function is short and is called on some critical hot code paths,
such as finish_lock_switch.
Make it inline to optimize performance.
Signed-off-by: Xie Yuanbin <qq570070308@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
kernel/sched/core.c | 5 -----
kernel/sched/sched.h | 6 +++++-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81cf8452449a..0e50ef3d819a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -677,11 +677,6 @@ bool raw_spin_rq_trylock(struct rq *rq)
}
}
-void raw_spin_rq_unlock(struct rq *rq)
-{
- raw_spin_unlock(rq_lockp(rq));
-}
-
/*
* double_rq_lock - safely lock two runqueues
*/
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f702fb452eb6..7d305ec10374 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1541,13 +1541,17 @@ static inline void lockdep_assert_rq_held(struct rq *rq)
extern void raw_spin_rq_lock_nested(struct rq *rq, int subclass);
extern bool raw_spin_rq_trylock(struct rq *rq);
-extern void raw_spin_rq_unlock(struct rq *rq);
static inline void raw_spin_rq_lock(struct rq *rq)
{
raw_spin_rq_lock_nested(rq, 0);
}
+static inline void raw_spin_rq_unlock(struct rq *rq)
+{
+ raw_spin_unlock(rq_lockp(rq));
+}
+
static inline void raw_spin_rq_lock_irq(struct rq *rq)
{
local_irq_disable();
--
2.51.0
* [PATCH v2 3/4] Provide the always inline version of some functions
2025-11-08 17:23 [PATCH v2 0/4] Optimize code generation during context switching Xie Yuanbin
2025-11-08 17:23 ` [PATCH v2 1/4] Make enter_lazy_tlb inline on x86 Xie Yuanbin
2025-11-08 17:23 ` [PATCH v2 2/4] Make raw_spin_rq_unlock inline Xie Yuanbin
@ 2025-11-08 17:23 ` Xie Yuanbin
2025-11-08 22:14 ` H. Peter Anvin
2025-11-09 11:31 ` Peter Zijlstra
2025-11-08 17:23 ` [PATCH v2 4/4] Make finish_task_switch and its subfuncs inline in context switching Xie Yuanbin
3 siblings, 2 replies; 13+ messages in thread
From: Xie Yuanbin @ 2025-11-08 17:23 UTC (permalink / raw)
To: david, tglx, segher, riel, peterz, linux, mathieu.desnoyers,
paulmck, pjw, palmer, aou, alex, hca, gor, agordeev, borntraeger,
svens, davem, andreas, luto, mingo, bp, dave.hansen, hpa, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, anna-maria, frederic, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, nathan, nick.desaulniers+lkml, morbo, justinstitt,
qq570070308, thuth, brauner, arnd, jlayton, aalbersh, akpm, david,
lorenzo.stoakes, max.kellermann, ryan.roberts, nysal, urezki
Cc: x86, linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
On critical hot code paths, inlining functions can improve performance.
However, current compilers provide no way to request inlining at a
specific call site of a function.
Add an always-inline version of some functions, so that it can be chosen
when they are called in hot paths.
Signed-off-by: Xie Yuanbin <qq570070308@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
arch/arm/include/asm/mmu_context.h | 12 +++++++-
arch/s390/include/asm/mmu_context.h | 12 +++++++-
arch/sparc/include/asm/mmu_context_64.h | 12 +++++++-
kernel/sched/core.c | 38 ++++++++++++++++++++++---
4 files changed, 67 insertions(+), 7 deletions(-)
diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h
index db2cb06aa8cf..e77b271570c1 100644
--- a/arch/arm/include/asm/mmu_context.h
+++ b/arch/arm/include/asm/mmu_context.h
@@ -80,7 +80,12 @@ static inline void check_and_switch_context(struct mm_struct *mm,
#ifndef MODULE
#define finish_arch_post_lock_switch \
finish_arch_post_lock_switch
-static inline void finish_arch_post_lock_switch(void)
+/*
+ * finish_arch_post_lock_switch_ainline - the always inline version of
+ * finish_arch_post_lock_switch, used for performance sensitive paths.
+ * If unsure, use finish_arch_post_lock_switch instead.
+ */
+static __always_inline void finish_arch_post_lock_switch_ainline(void)
{
struct mm_struct *mm = current->mm;
@@ -99,6 +104,11 @@ static inline void finish_arch_post_lock_switch(void)
preempt_enable_no_resched();
}
}
+
+static inline void finish_arch_post_lock_switch(void)
+{
+ finish_arch_post_lock_switch_ainline();
+}
#endif /* !MODULE */
#endif /* CONFIG_MMU */
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index d9b8501bc93d..577062834906 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -97,7 +97,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
}
#define finish_arch_post_lock_switch finish_arch_post_lock_switch
-static inline void finish_arch_post_lock_switch(void)
+/*
+ * finish_arch_post_lock_switch_ainline - the always inline version of
+ * finish_arch_post_lock_switch, used for performance sensitive paths.
+ * If unsure, use finish_arch_post_lock_switch instead.
+ */
+static __always_inline void finish_arch_post_lock_switch_ainline(void)
{
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
@@ -120,6 +125,11 @@ static inline void finish_arch_post_lock_switch(void)
local_irq_restore(flags);
}
+static inline void finish_arch_post_lock_switch(void)
+{
+ finish_arch_post_lock_switch_ainline();
+}
+
#define activate_mm activate_mm
static inline void activate_mm(struct mm_struct *prev,
struct mm_struct *next)
diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index 78bbacc14d2d..ca7019080574 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -160,7 +160,12 @@ static inline void arch_start_context_switch(struct task_struct *prev)
}
#define finish_arch_post_lock_switch finish_arch_post_lock_switch
-static inline void finish_arch_post_lock_switch(void)
+/*
+ * finish_arch_post_lock_switch_ainline - the always inline version of
+ * finish_arch_post_lock_switch, used for performance sensitive paths.
+ * If unsure, use finish_arch_post_lock_switch instead.
+ */
+static __always_inline void finish_arch_post_lock_switch_ainline(void)
{
/* Restore the state of MCDPER register for the new process
* just switched to.
@@ -185,6 +190,11 @@ static inline void finish_arch_post_lock_switch(void)
}
}
+static inline void finish_arch_post_lock_switch(void)
+{
+ finish_arch_post_lock_switch_ainline();
+}
+
#define mm_untag_mask mm_untag_mask
static inline unsigned long mm_untag_mask(struct mm_struct *mm)
{
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0e50ef3d819a..c50e672e22c4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4884,7 +4884,13 @@ static inline void finish_task(struct task_struct *prev)
smp_store_release(&prev->on_cpu, 0);
}
-static void do_balance_callbacks(struct rq *rq, struct balance_callback *head)
+/*
+ * do_balance_callbacks_ainline - the always inline version of
+ * do_balance_callbacks, used for performance sensitive paths.
+ * If unsure, use do_balance_callbacks instead.
+ */
+static __always_inline void do_balance_callbacks_ainline(struct rq *rq,
+ struct balance_callback *head)
{
void (*func)(struct rq *rq);
struct balance_callback *next;
@@ -4901,6 +4907,11 @@ static void do_balance_callbacks(struct rq *rq, struct balance_callback *head)
}
}
+static void do_balance_callbacks(struct rq *rq, struct balance_callback *head)
+{
+ do_balance_callbacks_ainline(rq, head);
+}
+
static void balance_push(struct rq *rq);
/*
@@ -4949,11 +4960,21 @@ struct balance_callback *splice_balance_callbacks(struct rq *rq)
return __splice_balance_callbacks(rq, true);
}
-static void __balance_callbacks(struct rq *rq)
+/*
+ * __balance_callbacks_ainline - the always inline version of
+ * __balance_callbacks, used for performance sensitive paths.
+ * If unsure, use __balance_callbacks instead.
+ */
+static __always_inline void __balance_callbacks_ainline(struct rq *rq)
{
do_balance_callbacks(rq, __splice_balance_callbacks(rq, false));
}
+static void __balance_callbacks(struct rq *rq)
+{
+ __balance_callbacks_ainline(rq);
+}
+
void balance_callbacks(struct rq *rq, struct balance_callback *head)
{
unsigned long flags;
@@ -5003,7 +5024,8 @@ static inline void finish_lock_switch(struct rq *rq)
#endif
#ifndef finish_arch_post_lock_switch
-# define finish_arch_post_lock_switch() do { } while (0)
+# define finish_arch_post_lock_switch() do { } while (0)
+# define finish_arch_post_lock_switch_ainline() do { } while (0)
#endif
static inline void kmap_local_sched_out(void)
@@ -5050,6 +5072,9 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
/**
* finish_task_switch - clean up after a task-switch
+ * finish_task_switch_ainline - the always inline version of this func
+ * used for performance sensitive paths
+ *
* @prev: the thread we just switched away from.
*
* finish_task_switch must be called after the context switch, paired
@@ -5067,7 +5092,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
* past. 'prev == current' is still correct but we need to recalculate this_rq
* because prev may have moved to another CPU.
*/
-static struct rq *finish_task_switch(struct task_struct *prev)
+static __always_inline struct rq *finish_task_switch_ainline(struct task_struct *prev)
__releases(rq->lock)
{
struct rq *rq = this_rq();
@@ -5159,6 +5184,11 @@ static struct rq *finish_task_switch(struct task_struct *prev)
return rq;
}
+static struct rq *finish_task_switch(struct task_struct *prev)
+{
+ return finish_task_switch_ainline(prev);
+}
+
/**
* schedule_tail - first thing a freshly forked thread must call.
* @prev: the thread we just switched away from.
--
2.51.0
* Re: [PATCH v2 3/4] Provide the always inline version of some functions
2025-11-08 17:23 ` [PATCH v2 3/4] Provide the always inline version of some functions Xie Yuanbin
@ 2025-11-08 22:14 ` H. Peter Anvin
2025-11-09 11:51 ` Peter Zijlstra
2025-11-09 11:31 ` Peter Zijlstra
1 sibling, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2025-11-08 22:14 UTC (permalink / raw)
To: Xie Yuanbin, david, tglx, segher, riel, peterz, linux,
mathieu.desnoyers, paulmck, pjw, palmer, aou, alex, hca, gor,
agordeev, borntraeger, svens, davem, andreas, luto, mingo, bp,
dave.hansen, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, james.clark, anna-maria, frederic,
juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, nathan, nick.desaulniers+lkml, morbo,
justinstitt, qq570070308, thuth, brauner, arnd, jlayton, aalbersh,
akpm, david, lorenzo.stoakes, max.kellermann, ryan.roberts, nysal,
urezki
Cc: x86, linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
On November 8, 2025 9:23:45 AM PST, Xie Yuanbin <qq570070308@gmail.com> wrote:
>On critical hot code paths, inline functions can optimize performance.
>However, for current compilers, there is no way to request them to inline
>at a specific calling point of a function.
>
>Add a always inline version to some functions, so that they can be chosen
>when called in hot paths.
>
>+static struct rq *finish_task_switch(struct task_struct *prev)
>+{
>+ return finish_task_switch_ainline(prev);
>+}
>+
> /**
> * schedule_tail - first thing a freshly forked thread must call.
> * @prev: the thread we just switched away from.
There is, in fact: you have to have an always_inline version, and wrap it in a noinline version.
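As a minimal sketch of that pattern (hypothetical names, not taken from this
series): the body lives in an __always_inline helper, ordinary callers go
through a noinline wrapper, and the hot call site uses the helper directly.
#include <linux/compiler.h>	/* __always_inline, noinline */
static __always_inline int do_work_inline(int x)
{
	/* the real body lives here */
	return x * 2;
}
/* out-of-line copy for ordinary callers */
static noinline int do_work(int x)
{
	return do_work_inline(x);
}
static int hot_caller(int x)
{
	return do_work_inline(x);	/* always expanded at this call site */
}
static int cold_caller(int x)
{
	return do_work(x);		/* shares the single out-of-line copy */
}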
* Re: [PATCH v2 3/4] Provide the always inline version of some functions
2025-11-08 22:14 ` H. Peter Anvin
@ 2025-11-09 11:51 ` Peter Zijlstra
2025-11-10 23:21 ` H. Peter Anvin
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-11-09 11:51 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Xie Yuanbin, david, tglx, segher, riel, linux, mathieu.desnoyers,
paulmck, pjw, palmer, aou, alex, hca, gor, agordeev, borntraeger,
svens, davem, andreas, luto, mingo, bp, dave.hansen, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, anna-maria, frederic, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, nathan, nick.desaulniers+lkml, morbo, justinstitt,
thuth, brauner, arnd, jlayton, aalbersh, akpm, david,
lorenzo.stoakes, max.kellermann, ryan.roberts, nysal, urezki, x86,
linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
On Sat, Nov 08, 2025 at 02:14:44PM -0800, H. Peter Anvin wrote:
> >+static struct rq *finish_task_switch(struct task_struct *prev)
> >+{
> >+ return finish_task_switch_ainline(prev);
> >+}
> >+
> > /**
> > * schedule_tail - first thing a freshly forked thread must call.
> > * @prev: the thread we just switched away from.
>
> There is, in fact: you have to have an always_inline version, and wrap it in a noinline version.
Yes, but all of this is particularly retarded, there are exactly _2_
callers of this function. Keeping an out-of-line copy for one while
inlining the other makes 0 sense.
Also, the amount of crap he needs to mark __always_inline doesn't make
much sense to me, is he building with -Os or something?
* Re: [PATCH v2 3/4] Provide the always inline version of some functions
2025-11-09 11:51 ` Peter Zijlstra
@ 2025-11-10 23:21 ` H. Peter Anvin
0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2025-11-10 23:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Xie Yuanbin, david, tglx, segher, riel, linux, mathieu.desnoyers,
paulmck, pjw, palmer, aou, alex, hca, gor, agordeev, borntraeger,
svens, davem, andreas, luto, mingo, bp, dave.hansen, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, anna-maria, frederic, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, nathan, nick.desaulniers+lkml, morbo, justinstitt,
thuth, brauner, arnd, jlayton, aalbersh, akpm, david,
lorenzo.stoakes, max.kellermann, ryan.roberts, nysal, urezki, x86,
linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
On 2025-11-09 03:51, Peter Zijlstra wrote:
> On Sat, Nov 08, 2025 at 02:14:44PM -0800, H. Peter Anvin wrote:
>
>>> +static struct rq *finish_task_switch(struct task_struct *prev)
>>> +{
>>> + return finish_task_switch_ainline(prev);
>>> +}
>>> +
>>> /**
>>> * schedule_tail - first thing a freshly forked thread must call.
>>> * @prev: the thread we just switched away from.
>>
>> There is, in fact: you have to have an always_inline version, and wrap it in a noinline version.
>
> Yes, but all of this is particularly retarded, there are exactly _2_
> callers of this function. Keeping an out-of-line copy for one while
> inlineing the other makes 0 sense.
>
> Also, the amount of crap he needs to mark __always_inline doesn't make
> much sense to me, is he building with -Os or something?
That's another issue -- unless the second instance of the function is on a
slow path which wants to be isolated from the rest of its function (unlikely.)
I was merely commenting on the claim that there is no way to control inlining
on a call site basis - there is.
-hpa
* Re: [PATCH v2 3/4] Provide the always inline version of some functions
2025-11-08 17:23 ` [PATCH v2 3/4] Provide the always inline version of some functions Xie Yuanbin
2025-11-08 22:14 ` H. Peter Anvin
@ 2025-11-09 11:31 ` Peter Zijlstra
2025-11-09 17:04 ` Xie Yuanbin
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-11-09 11:31 UTC (permalink / raw)
To: Xie Yuanbin
Cc: david, tglx, segher, riel, linux, mathieu.desnoyers, paulmck, pjw,
palmer, aou, alex, hca, gor, agordeev, borntraeger, svens, davem,
andreas, luto, mingo, bp, dave.hansen, hpa, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
james.clark, anna-maria, frederic, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, nathan,
nick.desaulniers+lkml, morbo, justinstitt, thuth, brauner, arnd,
jlayton, aalbersh, akpm, david, lorenzo.stoakes, max.kellermann,
ryan.roberts, nysal, urezki, x86, linux-arm-kernel, linux-kernel,
linux-riscv, linux-s390, sparclinux, linux-perf-users, llvm, will
On Sun, Nov 09, 2025 at 01:23:45AM +0800, Xie Yuanbin wrote:
> On critical hot code paths, inline functions can optimize performance.
> However, for current compilers, there is no way to request them to inline
> at a specific calling point of a function.
>
> Add a always inline version to some functions, so that they can be chosen
> when called in hot paths.
There isn't a single function in the entire kernel with an _ainline
suffix, while there are a ton of _inline suffixed functions.
On top of that, this function was already marked inline, and your
compiler just chose not to inline it for raisins. Just make the thing
__always_inline and forget it; don't make things extra ugly for no reason.
* Re: [PATCH v2 3/4] Provide the always inline version of some functions
2025-11-09 11:31 ` Peter Zijlstra
@ 2025-11-09 17:04 ` Xie Yuanbin
2025-11-09 17:35 ` Arnd Bergmann
0 siblings, 1 reply; 13+ messages in thread
From: Xie Yuanbin @ 2025-11-09 17:04 UTC (permalink / raw)
To: peterz, david, tglx, segher, riel, linux, mathieu.desnoyers,
paulmck, pjw, palmer, aou, alex, hca, gor, agordeev, borntraeger,
svens, davem, andreas, luto, mingo, bp, dave.hansen, hpa, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, anna-maria, frederic, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, nathan, nick.desaulniers+lkml, morbo, justinstitt,
qq570070308, thuth, brauner, arnd, jlayton, aalbersh, akpm, david,
lorenzo.stoakes, max.kellermann, ryan.roberts, nysal, urezki
Cc: x86, linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
On Sun, 9 Nov 2025 12:31:52 +0100, Peter Zijlstra wrote:
> There isn't a single function in the entire kernel with an _ainline
> suffix, while there are a ton of _inline suffixed functions.
>
> On top of that, this function was already marked inline, and your
> compiler just chose to not inline them for raisins. Just make the thing
> __always_inline and forget, dont make thing extra ugly for no reason.
Simple test: "make the original functions __always_inline" VS
"add an always-inline version of these functions (with an _ainline suffix)":
compiled as OPTIMIZE_FOR_SIZE: the size of bzImage is the same
compiled as OPTIMIZE_FOR_PERFORMANCE: the disassembly of vmlinux is the same
Adding the always-inline version of these functions can provide better
guidance for compiler optimization, but it does indeed lead to more
complex code.
The best solution may be a keyword that prompts the compiler to always
inline at a specific call site.
I noticed people discussing this issue on Stack Overflow as well, but it
seems that current compilers do not have such a feature.
Link: https://stackoverflow.com/questions/14571593
I have no objection to either option; it depends on the opinion of the
community maintainers.
Xie Yuanbin
* Re: [PATCH v2 3/4] Provide the always inline version of some functions
2025-11-09 17:04 ` Xie Yuanbin
@ 2025-11-09 17:35 ` Arnd Bergmann
2025-11-10 15:43 ` Xie Yuanbin
0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2025-11-09 17:35 UTC (permalink / raw)
To: Xie Yuanbin, Peter Zijlstra, David Hildenbrand, Thomas Gleixner,
Segher Boessenkool, riel, Russell King, Mathieu Desnoyers,
Paul E. McKenney, pjw, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Heiko Carstens, gor, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, David S . Miller, Andreas Larsson, Andy Lutomirski,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Anna-Maria Gleixner, Frederic Weisbecker, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Benjamin Segall, Mel Gorman, Valentin Schneider,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Thomas Huth, Christian Brauner, Jeff Layton, Andrey Albershteyn,
Andrew Morton, david, Lorenzo Stoakes, max.kellermann,
Ryan Roberts, nysal, Uladzislau Rezki (Sony)
Cc: x86, linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, Will Deacon
On Sun, Nov 9, 2025, at 18:04, Xie Yuanbin wrote:
> On Sun, 9 Nov 2025 12:31:52 +0100, Peter Zijlstra wrote:
> Adding the always-inline version of these functions can provide better
> guidance for compiler optimization, but it does indeed lead to more
> complex code.
> The best solution may be to prompt the compiler to always inline at a
> specific calling point through some keyword.
> I noticed that there are also people discussing this issue on stackerflow
> , but it seems that the current compiler does not have such a feature.
> Link: https://stackoverflow.com/questions/14571593
You can mark the caller as __attribute__((flatten)) to force all
functions to be inlined into that one if possible. I don't know
if that would be helpful or desired here though.
Arnd
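As a minimal sketch of what flatten does (hypothetical functions, not from
this series): the attribute goes on the caller and asks the compiler to
inline, where possible, every call made directly inside that caller's body;
the kernel wraps it as __flatten in its compiler attribute headers.
/* hypothetical example */
static int helper(int x)
{
	return x + 1;
}
/* flatten affects the call sites inside hot_entry(), not helper() itself:
 * both helper() calls below are inlined into hot_entry() where possible. */
static int __attribute__((flatten)) hot_entry(int x)
{
	return helper(x) + helper(x + 2);
}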
* Re: [PATCH v2 3/4] Provide the always inline version of some functions
2025-11-09 17:35 ` Arnd Bergmann
@ 2025-11-10 15:43 ` Xie Yuanbin
0 siblings, 0 replies; 13+ messages in thread
From: Xie Yuanbin @ 2025-11-10 15:43 UTC (permalink / raw)
To: arnd, david, tglx, segher, riel, peterz, linux, mathieu.desnoyers,
paulmck, pjw, palmer, aou, alex, hca, gor, agordeev, borntraeger,
svens, davem, andreas, luto, mingo, bp, dave.hansen, hpa, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, anna-maria, frederic, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, nathan, nick.desaulniers+lkml, morbo, justinstitt,
qq570070308, thuth, brauner, jlayton, aalbersh, akpm, david,
lorenzo.stoakes, max.kellermann, ryan.roberts, nysal, urezki
Cc: x86, linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
On Sun, 09 Nov 2025 18:35:23 +0100, Arnd Bergmann wrote:
> You can mark the caller as __attribute__((flatten)) to force all
> functions to be inlined into that one if possible. I don't know
> if that would be helpful or desired here though.
Thanks, I was not aware of this attribute before; it is a great
attribute, and it has already been added to the kernel's public headers.
However, it is rarely used in the kernel, which I think is a pity.
Perhaps I can try adding this attribute to some hot functions in another
patch. For this patch, I think the current approach is sufficient: there
are some cold if branches inside `__schedule`, and __flatten would inline
the function calls in that part of the code, which may cause some
unexpected performance degradation.
> Arnd
Xie Yuanbin
* [PATCH v2 4/4] Make finish_task_switch and its subfuncs inline in context switching
2025-11-08 17:23 [PATCH v2 0/4] Optimize code generation during context switching Xie Yuanbin
` (2 preceding siblings ...)
2025-11-08 17:23 ` [PATCH v2 3/4] Provide the always inline version of some functions Xie Yuanbin
@ 2025-11-08 17:23 ` Xie Yuanbin
2025-11-09 11:35 ` Peter Zijlstra
3 siblings, 1 reply; 13+ messages in thread
From: Xie Yuanbin @ 2025-11-08 17:23 UTC (permalink / raw)
To: david, tglx, segher, riel, peterz, linux, mathieu.desnoyers,
paulmck, pjw, palmer, aou, alex, hca, gor, agordeev, borntraeger,
svens, davem, andreas, luto, mingo, bp, dave.hansen, hpa, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, anna-maria, frederic, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, nathan, nick.desaulniers+lkml, morbo, justinstitt,
qq570070308, thuth, brauner, arnd, jlayton, aalbersh, akpm, david,
lorenzo.stoakes, max.kellermann, ryan.roberts, nysal, urezki
Cc: x86, linux-arm-kernel, linux-kernel, linux-riscv, linux-s390,
sparclinux, linux-perf-users, llvm, will
`finish_task_switch` is a hot path in context switching, and due to
possible mitigations inside switch_mm, performance here is greatly
affected by function calls and branch jumps. Make it inline to optimize
performance.
After `finish_task_switch` is changed to an inline function, the number of
calls to its subfunctions increases in this translation unit due to the
inline expansion of `finish_task_switch`. Because of compiler optimization
strategies, those functions may then no longer be inlined, which can
actually lead to performance degradation.
Make the subfunctions of finish_task_switch inline to prevent this
degradation.
Time spent on calling finish_task_switch (rdtsc):
| compiler && appended cmdline | without patch | with patch |
| gcc + NA | 13.93 - 13.94 | 12.39 - 12.44 |
| gcc + "spectre_v2_user=on" | 24.69 - 24.85 | 13.68 - 13.73 |
| clang + NA | 13.89 - 13.90 | 12.70 - 12.73 |
| clang + "spectre_v2_user=on" | 29.00 - 29.02 | 18.88 - 18.97 |
Signed-off-by: Xie Yuanbin <qq570070308@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
arch/riscv/include/asm/sync_core.h | 2 +-
arch/x86/include/asm/sync_core.h | 2 +-
include/linux/perf_event.h | 2 +-
include/linux/sched/mm.h | 10 +++++-----
include/linux/tick.h | 4 ++--
include/linux/vtime.h | 8 ++++----
kernel/sched/core.c | 18 +++++++++---------
kernel/sched/sched.h | 20 ++++++++++----------
8 files changed, 33 insertions(+), 33 deletions(-)
diff --git a/arch/riscv/include/asm/sync_core.h b/arch/riscv/include/asm/sync_core.h
index 9153016da8f1..2fe6b7fe6b12 100644
--- a/arch/riscv/include/asm/sync_core.h
+++ b/arch/riscv/include/asm/sync_core.h
@@ -6,7 +6,7 @@
* RISC-V implements return to user-space through an xRET instruction,
* which is not core serializing.
*/
-static inline void sync_core_before_usermode(void)
+static __always_inline void sync_core_before_usermode(void)
{
asm volatile ("fence.i" ::: "memory");
}
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index 96bda43538ee..4b55fa353bb5 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -93,7 +93,7 @@ static __always_inline void sync_core(void)
* to user-mode. x86 implements return to user-space through sysexit,
* sysrel, and sysretq, which are not core serializing.
*/
-static inline void sync_core_before_usermode(void)
+static __always_inline void sync_core_before_usermode(void)
{
/* With PTI, we unconditionally serialize before running user code. */
if (static_cpu_has(X86_FEATURE_PTI))
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9870d768db4c..d9de20c20f38 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1624,7 +1624,7 @@ static inline void perf_event_task_migrate(struct task_struct *task)
task->sched_migrated = 1;
}
-static inline void perf_event_task_sched_in(struct task_struct *prev,
+static __always_inline void perf_event_task_sched_in(struct task_struct *prev,
struct task_struct *task)
{
if (static_branch_unlikely(&perf_sched_events))
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 0e1d73955fa5..e7787a6e7d22 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -44,7 +44,7 @@ static inline void smp_mb__after_mmgrab(void)
extern void __mmdrop(struct mm_struct *mm);
-static inline void mmdrop(struct mm_struct *mm)
+static __always_inline void mmdrop(struct mm_struct *mm)
{
/*
* The implicit full barrier implied by atomic_dec_and_test() is
@@ -71,14 +71,14 @@ static inline void __mmdrop_delayed(struct rcu_head *rhp)
* Invoked from finish_task_switch(). Delegates the heavy lifting on RT
* kernels via RCU.
*/
-static inline void mmdrop_sched(struct mm_struct *mm)
+static __always_inline void mmdrop_sched(struct mm_struct *mm)
{
/* Provides a full memory barrier. See mmdrop() */
if (atomic_dec_and_test(&mm->mm_count))
call_rcu(&mm->delayed_drop, __mmdrop_delayed);
}
#else
-static inline void mmdrop_sched(struct mm_struct *mm)
+static __always_inline void mmdrop_sched(struct mm_struct *mm)
{
mmdrop(mm);
}
@@ -104,7 +104,7 @@ static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
}
}
-static inline void mmdrop_lazy_tlb_sched(struct mm_struct *mm)
+static __always_inline void mmdrop_lazy_tlb_sched(struct mm_struct *mm)
{
if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
mmdrop_sched(mm);
@@ -531,7 +531,7 @@ enum {
#include <asm/membarrier.h>
#endif
-static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
+static __always_inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
/*
* The atomic_read() below prevents CSE. The following should
diff --git a/include/linux/tick.h b/include/linux/tick.h
index ac76ae9fa36d..fce16aa10ba2 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -175,7 +175,7 @@ extern cpumask_var_t tick_nohz_full_mask;
#ifdef CONFIG_NO_HZ_FULL
extern bool tick_nohz_full_running;
-static inline bool tick_nohz_full_enabled(void)
+static __always_inline bool tick_nohz_full_enabled(void)
{
if (!context_tracking_enabled())
return false;
@@ -299,7 +299,7 @@ static inline void __tick_nohz_task_switch(void) { }
static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { }
#endif
-static inline void tick_nohz_task_switch(void)
+static __always_inline void tick_nohz_task_switch(void)
{
if (tick_nohz_full_enabled())
__tick_nohz_task_switch();
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 29dd5b91dd7d..428464bb81b3 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -67,24 +67,24 @@ static __always_inline void vtime_account_guest_exit(void)
* For now vtime state is tied to context tracking. We might want to decouple
* those later if necessary.
*/
-static inline bool vtime_accounting_enabled(void)
+static __always_inline bool vtime_accounting_enabled(void)
{
return context_tracking_enabled();
}
-static inline bool vtime_accounting_enabled_cpu(int cpu)
+static __always_inline bool vtime_accounting_enabled_cpu(int cpu)
{
return context_tracking_enabled_cpu(cpu);
}
-static inline bool vtime_accounting_enabled_this_cpu(void)
+static __always_inline bool vtime_accounting_enabled_this_cpu(void)
{
return context_tracking_enabled_this_cpu();
}
extern void vtime_task_switch_generic(struct task_struct *prev);
-static inline void vtime_task_switch(struct task_struct *prev)
+static __always_inline void vtime_task_switch(struct task_struct *prev)
{
if (vtime_accounting_enabled_this_cpu())
vtime_task_switch_generic(prev);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c50e672e22c4..cba8b93ed37b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4868,7 +4868,7 @@ static inline void prepare_task(struct task_struct *next)
WRITE_ONCE(next->on_cpu, 1);
}
-static inline void finish_task(struct task_struct *prev)
+static __always_inline void finish_task(struct task_struct *prev)
{
/*
* This must be the very last reference to @prev from this CPU. After
@@ -4930,7 +4930,7 @@ struct balance_callback balance_push_callback = {
.func = balance_push,
};
-static inline struct balance_callback *
+static __always_inline struct balance_callback *
__splice_balance_callbacks(struct rq *rq, bool split)
{
struct balance_callback *head = rq->balance_callback;
@@ -4967,12 +4967,12 @@ struct balance_callback *splice_balance_callbacks(struct rq *rq)
*/
static __always_inline void __balance_callbacks_ainline(struct rq *rq)
{
- do_balance_callbacks(rq, __splice_balance_callbacks(rq, false));
+ do_balance_callbacks_ainline(rq, __splice_balance_callbacks(rq, false));
}
static void __balance_callbacks(struct rq *rq)
{
- __balance_callbacks_ainline(rq);
+ do_balance_callbacks(rq, __splice_balance_callbacks(rq, false));
}
void balance_callbacks(struct rq *rq, struct balance_callback *head)
@@ -5003,7 +5003,7 @@ prepare_lock_switch(struct rq *rq, struct task_struct *next, struct rq_flags *rf
#endif
}
-static inline void finish_lock_switch(struct rq *rq)
+static __always_inline void finish_lock_switch(struct rq *rq)
{
/*
* If we are tracking spinlock dependencies then we have to
@@ -5011,7 +5011,7 @@ static inline void finish_lock_switch(struct rq *rq)
* prev into current:
*/
spin_acquire(&__rq_lockp(rq)->dep_map, 0, 0, _THIS_IP_);
- __balance_callbacks(rq);
+ __balance_callbacks_ainline(rq);
raw_spin_rq_unlock_irq(rq);
}
@@ -5036,7 +5036,7 @@ static inline void kmap_local_sched_out(void)
#endif
}
-static inline void kmap_local_sched_in(void)
+static __always_inline void kmap_local_sched_in(void)
{
#ifdef CONFIG_KMAP_LOCAL
if (unlikely(current->kmap_ctrl.idx))
@@ -5134,7 +5134,7 @@ static __always_inline struct rq *finish_task_switch_ainline(struct task_struct
finish_task(prev);
tick_nohz_task_switch();
finish_lock_switch(rq);
- finish_arch_post_lock_switch();
+ finish_arch_post_lock_switch_ainline();
kcov_finish_switch(current);
/*
* kmap_local_sched_out() is invoked with rq::lock held and
@@ -5289,7 +5289,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
switch_to(prev, next, prev);
barrier();
- return finish_task_switch(prev);
+ return finish_task_switch_ainline(prev);
}
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7d305ec10374..ec301a91cb43 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1374,12 +1374,12 @@ static inline struct cpumask *sched_group_span(struct sched_group *sg);
DECLARE_STATIC_KEY_FALSE(__sched_core_enabled);
-static inline bool sched_core_enabled(struct rq *rq)
+static __always_inline bool sched_core_enabled(struct rq *rq)
{
return static_branch_unlikely(&__sched_core_enabled) && rq->core_enabled;
}
-static inline bool sched_core_disabled(void)
+static __always_inline bool sched_core_disabled(void)
{
return !static_branch_unlikely(&__sched_core_enabled);
}
@@ -1388,7 +1388,7 @@ static inline bool sched_core_disabled(void)
* Be careful with this function; not for general use. The return value isn't
* stable unless you actually hold a relevant rq->__lock.
*/
-static inline raw_spinlock_t *rq_lockp(struct rq *rq)
+static __always_inline raw_spinlock_t *rq_lockp(struct rq *rq)
{
if (sched_core_enabled(rq))
return &rq->core->__lock;
@@ -1396,7 +1396,7 @@ static inline raw_spinlock_t *rq_lockp(struct rq *rq)
return &rq->__lock;
}
-static inline raw_spinlock_t *__rq_lockp(struct rq *rq)
+static __always_inline raw_spinlock_t *__rq_lockp(struct rq *rq)
{
if (rq->core_enabled)
return &rq->core->__lock;
@@ -1487,12 +1487,12 @@ static inline bool sched_core_disabled(void)
return true;
}
-static inline raw_spinlock_t *rq_lockp(struct rq *rq)
+static __always_inline raw_spinlock_t *rq_lockp(struct rq *rq)
{
return &rq->__lock;
}
-static inline raw_spinlock_t *__rq_lockp(struct rq *rq)
+static __always_inline raw_spinlock_t *__rq_lockp(struct rq *rq)
{
return &rq->__lock;
}
@@ -1542,23 +1542,23 @@ static inline void lockdep_assert_rq_held(struct rq *rq)
extern void raw_spin_rq_lock_nested(struct rq *rq, int subclass);
extern bool raw_spin_rq_trylock(struct rq *rq);
-static inline void raw_spin_rq_lock(struct rq *rq)
+static __always_inline void raw_spin_rq_lock(struct rq *rq)
{
raw_spin_rq_lock_nested(rq, 0);
}
-static inline void raw_spin_rq_unlock(struct rq *rq)
+static __always_inline void raw_spin_rq_unlock(struct rq *rq)
{
raw_spin_unlock(rq_lockp(rq));
}
-static inline void raw_spin_rq_lock_irq(struct rq *rq)
+static __always_inline void raw_spin_rq_lock_irq(struct rq *rq)
{
local_irq_disable();
raw_spin_rq_lock(rq);
}
-static inline void raw_spin_rq_unlock_irq(struct rq *rq)
+static __always_inline void raw_spin_rq_unlock_irq(struct rq *rq)
{
raw_spin_rq_unlock(rq);
local_irq_enable();
--
2.51.0
* Re: [PATCH v2 4/4] Make finish_task_switch and its subfuncs inline in context switching
2025-11-08 17:23 ` [PATCH v2 4/4] Make finish_task_switch and its subfuncs inline in context switching Xie Yuanbin
@ 2025-11-09 11:35 ` Peter Zijlstra
0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2025-11-09 11:35 UTC (permalink / raw)
To: Xie Yuanbin
Cc: david, tglx, segher, riel, linux, mathieu.desnoyers, paulmck, pjw,
palmer, aou, alex, hca, gor, agordeev, borntraeger, svens, davem,
andreas, luto, mingo, bp, dave.hansen, hpa, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
james.clark, anna-maria, frederic, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, nathan,
nick.desaulniers+lkml, morbo, justinstitt, thuth, brauner, arnd,
jlayton, aalbersh, akpm, david, lorenzo.stoakes, max.kellermann,
ryan.roberts, nysal, urezki, x86, linux-arm-kernel, linux-kernel,
linux-riscv, linux-s390, sparclinux, linux-perf-users, llvm, will
On Sun, Nov 09, 2025 at 01:23:46AM +0800, Xie Yuanbin wrote:
> `finish_task_switch` is a hot path in context switching, and due to
> possible mitigations inside switch_mm, performance here is greatly
> affected by function calls and branch jumps. Make it inline to optimize
> the performance.
>
> After `finish_task_switch` is changed to an inline function, the number of
> calls to the subfunctions (called by `finish_task_switch`) increases in
> this translation unit due to the inline expansion of `finish_task_switch`.
> Due to compiler optimization strategies, these functions may transition
> from inline functions to non inline functions, which can actually lead to
> performance degradation.
>
> Make the subfunctions of finish_task_stwitch inline to prevent
> degradation.
Yeah, try again without that _ainline garbage.