From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
rcu@vger.kernel.org, x86@kernel.org,
linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
linux-riscv@lists.infradead.org, linux-arch@vger.kernel.org,
linux-trace-kernel@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Arnd Bergmann <arnd@arndb.de>,
Frederic Weisbecker <frederic@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Jason Baron <jbaron@akamai.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ard Biesheuvel <ardb@kernel.org>,
Sami Tolvanen <samitolvanen@google.com>,
"David S. Miller" <davem@davemloft.net>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Masahiro Yamada <masahiroy@kernel.org>,
Han Shen <shenhan@google.com>, Rik van Riel <riel@surriel.com>,
Jann Horn <jannh@google.com>,
Dan Carpenter <dan.carpenter@linaro.org>,
Oleg Nesterov <oleg@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Clark Williams <williams@redhat.com>,
Yair Podemsky <ypodemsk@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Daniel Wagner <dwagner@suse.de>, Petr Tesarik <ptesarik@suse.com>,
Shrikanth Hegde <sshegde@linux.ibm.com>
Subject: [RFC PATCH v7 29/31] x86/mm/pti: Implement a TLB flush immediately after a switch to kernel CR3
Date: Fri, 14 Nov 2025 16:14:26 +0100 [thread overview]
Message-ID: <20251114151428.1064524-9-vschneid@redhat.com> (raw)
In-Reply-To: <20251114150133.1056710-1-vschneid@redhat.com>
Deferring kernel range TLB flushes requires the guarantee that upon
entering the kernel, no stale entry may be accessed. The simplest way to
provide such a guarantee is to issue an unconditional flush upon switching
to the kernel CR3, as this is the pivoting point where such stale entries
may be accessed.
As this is only relevant to NOHZ_FULL, restrict the mechanism to NOHZ_FULL
CPUs.
Note that the COALESCE_TLBI config option is introduced in a later commit,
when the whole feature is implemented.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
arch/x86/entry/calling.h | 25 ++++++++++++++++++++++---
arch/x86/kernel/asm-offsets.c | 1 +
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0187c0ea2fddb..620203ef04e9f 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -10,6 +10,7 @@
#include <asm/msr.h>
#include <asm/nospec-branch.h>
#include <asm/jump_label.h>
+#include <asm/invpcid.h>
/*
@@ -171,9 +172,27 @@ For 32-bit we have the following conventions - kernel is built with
andq $(~PTI_USER_PGTABLE_AND_PCID_MASK), \reg
.endm
-.macro COALESCE_TLBI
+.macro COALESCE_TLBI scratch_reg:req
#ifdef CONFIG_COALESCE_TLBI
STATIC_BRANCH_FALSE_LIKELY housekeeping_overridden, .Lend_\@
+ /* No point in doing this for housekeeping CPUs */
+ movslq PER_CPU_VAR(cpu_number), \scratch_reg
+ bt \scratch_reg, tick_nohz_full_mask(%rip)
+ jnc .Lend_tlbi_\@
+
+ ALTERNATIVE "jmp .Lcr4_\@", "", X86_FEATURE_INVPCID
+ movq $(INVPCID_TYPE_ALL_INCL_GLOBAL), \scratch_reg
+ /* descriptor is all zeroes, point at the zero page */
+ invpcid empty_zero_page(%rip), \scratch_reg
+ jmp .Lend_tlbi_\@
+.Lcr4_\@:
+ /* Note: this gives CR4 pinning the finger */
+ movq PER_CPU_VAR(cpu_tlbstate + TLB_STATE_cr4), \scratch_reg
+ xorq $(X86_CR4_PGE), \scratch_reg
+ movq \scratch_reg, %cr4
+ xorq $(X86_CR4_PGE), \scratch_reg
+ movq \scratch_reg, %cr4
+.Lend_tlbi_\@:
movl $1, PER_CPU_VAR(kernel_cr3_loaded)
.Lend_\@:
#endif // CONFIG_COALESCE_TLBI
@@ -192,7 +211,7 @@ For 32-bit we have the following conventions - kernel is built with
mov %cr3, \scratch_reg
ADJUST_KERNEL_CR3 \scratch_reg
mov \scratch_reg, %cr3
- COALESCE_TLBI
+ COALESCE_TLBI \scratch_reg
.Lend_\@:
.endm
@@ -260,7 +279,7 @@ For 32-bit we have the following conventions - kernel is built with
ADJUST_KERNEL_CR3 \scratch_reg
movq \scratch_reg, %cr3
- COALESCE_TLBI
+ COALESCE_TLBI \scratch_reg
.Ldone_\@:
.endm
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 32ba599a51f88..deb92e9c8923d 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -106,6 +106,7 @@ static void __used common(void)
/* TLB state for the entry code */
OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
+ OFFSET(TLB_STATE_cr4, tlb_state, cr4);
/* Layout info for cpu_entry_area */
OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
--
2.51.0
next prev parent reply other threads:[~2025-11-14 15:17 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-14 15:01 [PATCH v7 00/31] context_tracking,x86: Defer some IPIs until a user->kernel transition Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 01/31] objtool: Make validate_call() recognize indirect calls to pv_ops[] Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 02/31] objtool: Flesh out warning related to pv_ops[] calls Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 03/31] rcu: Add a small-width RCU watching counter debug option Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 04/31] rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 05/31] jump_label: Add annotations for validating noinstr usage Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 06/31] static_call: Add read-only-after-init static calls Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 07/31] x86/paravirt: Mark pv_sched_clock static call as __ro_after_init Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 08/31] x86/idle: Mark x86_idle " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 09/31] x86/paravirt: Mark pv_steal_clock " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 10/31] riscv/paravirt: " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 11/31] loongarch/paravirt: " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 12/31] arm64/paravirt: " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 13/31] arm/paravirt: " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 14/31] perf/x86/amd: Mark perf_lopwr_cb " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 15/31] sched/clock: Mark sched_clock_running key " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 16/31] KVM: VMX: Mark __kvm_is_using_evmcs static " Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 17/31] x86/bugs: Mark cpu_buf_vm_clear key as allowed in .noinstr Valentin Schneider
2025-11-14 15:01 ` [PATCH v7 18/31] x86/speculation/mds: Mark cpu_buf_idle_clear " Valentin Schneider
2025-11-14 15:10 ` [PATCH v7 19/31] sched/clock, x86: Mark __sched_clock_stable " Valentin Schneider
2025-11-14 15:10 ` [PATCH v7 20/31] KVM: VMX: Mark vmx_l1d_should flush and vmx_l1d_flush_cond keys " Valentin Schneider
2025-11-14 15:14 ` [PATCH v7 21/31] stackleack: Mark stack_erasing_bypass key " Valentin Schneider
2025-11-14 15:14 ` [PATCH v7 22/31] objtool: Add noinstr validation for static branches/calls Valentin Schneider
2025-11-14 15:14 ` [PATCH v7 23/31] module: Add MOD_NOINSTR_TEXT mem_type Valentin Schneider
2025-11-14 15:14 ` [PATCH v7 24/31] context-tracking: Introduce work deferral infrastructure Valentin Schneider
2025-11-14 15:14 ` [PATCH v7 25/31] context_tracking,x86: Defer kernel text patching IPIs Valentin Schneider
2025-11-14 15:14 ` [PATCH v7 26/31] x86/jump_label: Add ASM support for static_branch_likely() Valentin Schneider
2025-11-14 15:14 ` [PATCH v7 27/31] x86/mm: Make INVPCID type macros available to assembly Valentin Schneider
2025-11-14 15:14 ` [RFC PATCH v7 28/31] x86/mm/pti: Introduce a kernel/user CR3 software signal Valentin Schneider
2025-11-14 15:14 ` Valentin Schneider [this message]
2025-11-19 14:31 ` [RFC PATCH v7 29/31] x86/mm/pti: Implement a TLB flush immediately after a switch to kernel CR3 Andy Lutomirski
2025-11-19 15:44 ` Valentin Schneider
2025-11-19 17:31 ` Andy Lutomirski
2025-11-21 10:12 ` Valentin Schneider
2025-11-14 15:14 ` [RFC PATCH v7 30/31] x86/mm, mm/vmalloc: Defer kernel TLB flush IPIs under CONFIG_COALESCE_TLBI=y Valentin Schneider
2025-11-19 18:31 ` Dave Hansen
2025-11-19 18:33 ` Andy Lutomirski
2025-11-21 17:37 ` Valentin Schneider
2025-11-21 17:50 ` Dave Hansen
2025-11-14 15:14 ` [RFC PATCH v7 31/31] x86/entry: Add an option to coalesce TLB flushes Valentin Schneider
2025-11-14 16:20 ` [PATCH v7 00/31] context_tracking,x86: Defer some IPIs until a user->kernel transition Andy Lutomirski
2025-11-14 17:22 ` Andy Lutomirski
2025-11-14 18:14 ` Paul E. McKenney
2025-11-14 18:45 ` Andy Lutomirski
2025-11-14 20:03 ` Paul E. McKenney
2025-11-15 0:29 ` Andy Lutomirski
2025-11-15 2:30 ` Paul E. McKenney
2025-11-14 20:06 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251114151428.1064524-9-vschneid@redhat.com \
--to=vschneid@redhat.com \
--cc=acme@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=ardb@kernel.org \
--cc=arnd@arndb.de \
--cc=boqun.feng@gmail.com \
--cc=bp@alien8.de \
--cc=dan.carpenter@linaro.org \
--cc=dave.hansen@linux.intel.com \
--cc=davem@davemloft.net \
--cc=dwagner@suse.de \
--cc=frederic@kernel.org \
--cc=hpa@zytor.com \
--cc=jannh@google.com \
--cc=jbaron@akamai.com \
--cc=joelagnelf@nvidia.com \
--cc=josh@joshtriplett.org \
--cc=jpoimboe@kernel.org \
--cc=juri.lelli@redhat.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-riscv@lists.infradead.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=loongarch@lists.linux.dev \
--cc=luto@kernel.org \
--cc=masahiroy@kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mtosatti@redhat.com \
--cc=neeraj.upadhyay@kernel.org \
--cc=oleg@redhat.com \
--cc=paulmck@kernel.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=ptesarik@suse.com \
--cc=rcu@vger.kernel.org \
--cc=riel@surriel.com \
--cc=rostedt@goodmis.org \
--cc=samitolvanen@google.com \
--cc=shenhan@google.com \
--cc=sshegde@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=urezki@gmail.com \
--cc=williams@redhat.com \
--cc=x86@kernel.org \
--cc=ypodemsk@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).