From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
rcu@vger.kernel.org, x86@kernel.org,
linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
linux-riscv@lists.infradead.org, linux-arch@vger.kernel.org,
linux-trace-kernel@vger.kernel.org
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>,
Nicolas Saenz Julienne <nsaenzju@redhat.com>,
Frederic Weisbecker <frederic@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Arnd Bergmann <arnd@arndb.de>,
"Paul E. McKenney" <paulmck@kernel.org>,
Jason Baron <jbaron@akamai.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ard Biesheuvel <ardb@kernel.org>,
Sami Tolvanen <samitolvanen@google.com>,
"David S. Miller" <davem@davemloft.net>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Masahiro Yamada <masahiroy@kernel.org>,
Han Shen <shenhan@google.com>, Rik van Riel <riel@surriel.com>,
Jann Horn <jannh@google.com>,
Dan Carpenter <dan.carpenter@linaro.org>,
Oleg Nesterov <oleg@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Clark Williams <williams@redhat.com>,
Yair Podemsky <ypodemsk@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Daniel Wagner <dwagner@suse.de>, Petr Tesarik <ptesarik@suse.com>,
Shrikanth Hegde <sshegde@linux.ibm.com>
Subject: [PATCH v7 25/31] context_tracking,x86: Defer kernel text patching IPIs
Date: Fri, 14 Nov 2025 16:14:22 +0100
Message-ID: <20251114151428.1064524-5-vschneid@redhat.com>
In-Reply-To: <20251114150133.1056710-1-vschneid@redhat.com>
text_poke_bp_batch() sends IPIs to all online CPUs to synchronize them
against the newly patched instruction. CPUs that are executing in userspace
do not need this synchronization to happen immediately, and this is
actually harmful interference for NOHZ_FULL CPUs.
As the synchronization IPIs are sent using a blocking call, returning from
text_poke_bp_batch() implies all CPUs will observe the patched
instruction(s), and this should be preserved even if the IPI is deferred.
In other words, to safely defer this synchronization, any kernel
instruction leading to the execution of the deferred instruction
sync (ct_work_flush()) must *not* be mutable (patchable) at runtime.
This means we must pay attention to mutable instructions in the early entry
code:
- alternatives
- static keys
- static calls
- all sorts of probes (kprobes/ftrace/bpf/???)
The early entry code leading to ct_work_flush() is noinstr, which gets rid
of the probes.
Alternatives are safe because they are patched at boot time (before SMP is
even brought up), which is before any IPI deferral can happen.
This leaves us with static keys and static calls.
Any static key used in early entry code should only ever be enabled at
boot time and never flipped again, IOW __ro_after_init (pretty much like
alternatives). Exceptions are explicitly marked as allowed in .noinstr and
will always generate an IPI when flipped.
The same applies to static calls - they should be only updated at boot
time, or manually marked as an exception.
Objtool is now able to point at static keys/calls that don't respect this,
and all static keys/calls used in early entry code have now been verified
as behaving appropriately.
Leverage the new context_tracking infrastructure to defer sync_core() IPIs
to a target CPU's next kernel entry.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
---
arch/x86/include/asm/context_tracking_work.h | 6 ++-
arch/x86/include/asm/text-patching.h | 1 +
arch/x86/kernel/alternative.c | 39 +++++++++++++++++---
arch/x86/kernel/kprobes/core.c | 4 +-
arch/x86/kernel/kprobes/opt.c | 4 +-
arch/x86/kernel/module.c | 2 +-
include/asm-generic/sections.h | 15 ++++++++
include/linux/context_tracking_work.h | 4 +-
8 files changed, 60 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h
index 5f3b2d0977235..485b32881fde5 100644
--- a/arch/x86/include/asm/context_tracking_work.h
+++ b/arch/x86/include/asm/context_tracking_work.h
@@ -2,11 +2,13 @@
#ifndef _ASM_X86_CONTEXT_TRACKING_WORK_H
#define _ASM_X86_CONTEXT_TRACKING_WORK_H
+#include <asm/sync_core.h>
+
static __always_inline void arch_context_tracking_work(enum ct_work work)
{
switch (work) {
- case CT_WORK_n:
- // Do work...
+ case CT_WORK_SYNC:
+ sync_core();
break;
case CT_WORK_MAX:
WARN_ON_ONCE(true);
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index f2d142a0a862e..ca989b0b6c0ae 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -33,6 +33,7 @@ extern void text_poke_apply_relocation(u8 *buf, const u8 * const instr, size_t i
*/
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void smp_text_poke_sync_each_cpu(void);
+extern void smp_text_poke_sync_each_cpu_deferrable(void);
extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
#define text_poke_copy text_poke_copy
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8ee5ff547357a..ce8989e02ae77 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -6,6 +6,7 @@
#include <linux/vmalloc.h>
#include <linux/memory.h>
#include <linux/execmem.h>
+#include <linux/context_tracking.h>
#include <asm/text-patching.h>
#include <asm/insn.h>
@@ -2708,9 +2709,24 @@ static void do_sync_core(void *info)
sync_core();
}
+static bool do_sync_core_defer_cond(int cpu, void *info)
+{
+ return !ct_set_cpu_work(cpu, CT_WORK_SYNC);
+}
+
+static void __smp_text_poke_sync_each_cpu(smp_cond_func_t cond_func)
+{
+ on_each_cpu_cond(cond_func, do_sync_core, NULL, 1);
+}
+
void smp_text_poke_sync_each_cpu(void)
{
- on_each_cpu(do_sync_core, NULL, 1);
+ __smp_text_poke_sync_each_cpu(NULL);
+}
+
+void smp_text_poke_sync_each_cpu_deferrable(void)
+{
+ __smp_text_poke_sync_each_cpu(do_sync_core_defer_cond);
}
/*
@@ -2880,6 +2896,7 @@ noinstr int smp_text_poke_int3_handler(struct pt_regs *regs)
*/
void smp_text_poke_batch_finish(void)
{
+ smp_cond_func_t cond = do_sync_core_defer_cond;
unsigned char int3 = INT3_INSN_OPCODE;
unsigned int i;
int do_sync;
@@ -2916,11 +2933,21 @@ void smp_text_poke_batch_finish(void)
* First step: add a INT3 trap to the address that will be patched.
*/
for (i = 0; i < text_poke_array.nr_entries; i++) {
- text_poke_array.vec[i].old = *(u8 *)text_poke_addr(&text_poke_array.vec[i]);
- text_poke(text_poke_addr(&text_poke_array.vec[i]), &int3, INT3_INSN_SIZE);
+ void *addr = text_poke_addr(&text_poke_array.vec[i]);
+
+ /*
+ * There's no safe way to defer IPIs for patching text in
+ * .noinstr, record whether there is at least one such poke.
+ */
+ if (is_kernel_noinstr_text((unsigned long)addr) ||
+ is_module_noinstr_text_address((unsigned long)addr))
+ cond = NULL;
+
+ text_poke_array.vec[i].old = *((u8 *)addr);
+ text_poke(addr, &int3, INT3_INSN_SIZE);
}
- smp_text_poke_sync_each_cpu();
+ __smp_text_poke_sync_each_cpu(cond);
/*
* Second step: update all but the first byte of the patched range.
@@ -2982,7 +3009,7 @@ void smp_text_poke_batch_finish(void)
* not necessary and we'd be safe even without it. But
* better safe than sorry (plus there's not only Intel).
*/
- smp_text_poke_sync_each_cpu();
+ __smp_text_poke_sync_each_cpu(cond);
}
/*
@@ -3003,7 +3030,7 @@ void smp_text_poke_batch_finish(void)
}
if (do_sync)
- smp_text_poke_sync_each_cpu();
+ __smp_text_poke_sync_each_cpu(cond);
/*
* Remove and wait for refs to be zero.
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 3863d7709386f..51957cd737f52 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -790,7 +790,7 @@ void arch_arm_kprobe(struct kprobe *p)
u8 int3 = INT3_INSN_OPCODE;
text_poke(p->addr, &int3, 1);
- smp_text_poke_sync_each_cpu();
+ smp_text_poke_sync_each_cpu_deferrable();
perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1);
}
@@ -800,7 +800,7 @@ void arch_disarm_kprobe(struct kprobe *p)
perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1);
text_poke(p->addr, &p->opcode, 1);
- smp_text_poke_sync_each_cpu();
+ smp_text_poke_sync_each_cpu_deferrable();
}
void arch_remove_kprobe(struct kprobe *p)
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 0aabd4c4e2c4f..eada8dca1c2e8 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -513,11 +513,11 @@ void arch_unoptimize_kprobe(struct optimized_kprobe *op)
JMP32_INSN_SIZE - INT3_INSN_SIZE);
text_poke(addr, new, INT3_INSN_SIZE);
- smp_text_poke_sync_each_cpu();
+ smp_text_poke_sync_each_cpu_deferrable();
text_poke(addr + INT3_INSN_SIZE,
new + INT3_INSN_SIZE,
JMP32_INSN_SIZE - INT3_INSN_SIZE);
- smp_text_poke_sync_each_cpu();
+ smp_text_poke_sync_each_cpu_deferrable();
perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE);
}
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 0ffbae902e2fe..c6c4f391eb465 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -206,7 +206,7 @@ static int write_relocate_add(Elf64_Shdr *sechdrs,
write, apply);
if (!early) {
- smp_text_poke_sync_each_cpu();
+ smp_text_poke_sync_each_cpu_deferrable();
mutex_unlock(&text_mutex);
}
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 0755bc39b0d80..7d2403014010e 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -199,6 +199,21 @@ static inline bool is_kernel_inittext(unsigned long addr)
addr < (unsigned long)_einittext;
}
+
+/**
+ * is_kernel_noinstr_text - checks if the pointer address is located in the
+ * .noinstr section
+ *
+ * @addr: address to check
+ *
+ * Returns: true if the address is located in .noinstr, false otherwise.
+ */
+static inline bool is_kernel_noinstr_text(unsigned long addr)
+{
+ return addr >= (unsigned long)__noinstr_text_start &&
+ addr < (unsigned long)__noinstr_text_end;
+}
+
/**
* __is_kernel_text - checks if the pointer address is located in the
* .text section
diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h
index 3742f461183ac..abd3c196855f4 100644
--- a/include/linux/context_tracking_work.h
+++ b/include/linux/context_tracking_work.h
@@ -5,12 +5,12 @@
#include <linux/bitops.h>
enum {
- CT_WORK_n_OFFSET,
+ CT_WORK_SYNC_OFFSET,
CT_WORK_MAX_OFFSET
};
enum ct_work {
- CT_WORK_n = BIT(CT_WORK_n_OFFSET),
+ CT_WORK_SYNC = BIT(CT_WORK_SYNC_OFFSET),
CT_WORK_MAX = BIT(CT_WORK_MAX_OFFSET)
};
--
2.51.0