From: Peter Zijlstra <peterz@infradead.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Valentin Schneider <vschneid@redhat.com>,
linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
linux-riscv@lists.infradead.org,
linux-perf-users@vger.kernel.org, kvm@vger.kernel.org,
linux-arch@vger.kernel.org, linux-modules@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, rcu@vger.kernel.org,
linux-hardening@vger.kernel.org, linux-kselftest@vger.kernel.org,
bpf@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Yair Podemsky <ypodemsk@redhat.com>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Daniel Wagner <dwagner@suse.de>, Petr Tesarik <ptesarik@suse.com>,
Nicolas Saenz Julienne <nsaenz@amazon.com>,
Frederic Weisbecker <frederic@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Sean Christopherson <seanjc@google.com>,
Juergen Gross <jgross@suse.com>,
Ajay Kaher <ajay.kaher@broadcom.com>,
Alexey Makhalov <alexey.amakhalov@broadcom.com>,
Broadcom internal kernel review list
<bcm-kernel-feedback-list@broadcom.com>,
Russell King <linux@armlinux.org.uk>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Huacai Chen <chenhuacai@kernel.org>,
WANG Xuerui <kernel@xen0n.name>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Alexandre Ghiti <alex@ghiti.fr>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
"Liang, Kan" <kan.liang@linux.intel.com>,
Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Arnd Bergmann <arnd@arndb.de>, Jason Baron <jbaron@akamai.com>,
Ard Biesheuvel <ardb@kernel.org>,
Luis Chamberlain <mcgrof@kernel.org>,
Petr Pavlu <petr.pavlu@suse.com>,
Sami Tolvanen <samitolvanen@google.com>,
Daniel Gomez <da.gomez@samsung.com>,
Naveen N Rao <naveen@kernel.org>,
Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>,
"David S. Miller" <davem@davemloft.net>,
Masami Hiramatsu <mhiramat@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Kees Cook <kees@kernel.org>, Shuah Khan <shuah@kernel.org>,
Masahiro Yamada <masahiroy@kernel.org>,
Alice Ryhl <aliceryhl@google.com>,
Miguel Ojeda <ojeda@kernel.org>,
"Mike Rapoport (Microsoft)" <rppt@kernel.org>,
Rong Xu <xur@google.com>, Rafael Aquini <aquini@redhat.com>,
Song Liu <song@kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
Dan Carpenter <dan.carpenter@linaro.org>,
Brian Gerst <brgerst@gmail.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Benjamin Berg <benjamin.berg@intel.com>,
Vishal Annapurve <vannapurve@google.com>,
Randy Dunlap <rdunlap@infradead.org>,
John Stultz <jstultz@google.com>,
Tiezhu Yang <yangtiezhu@loongson.cn>
Subject: Re: [PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition
Date: Fri, 2 May 2025 17:20:02 +0200
Message-ID: <20250502152002.GX4439@noisy.programming.kicks-ass.net>
In-Reply-To: <6c44fa0e-28ed-400e-aaf2-e0e0720d3811@intel.com>

On Fri, May 02, 2025 at 07:33:55AM -0700, Dave Hansen wrote:
> On 5/2/25 04:22, Peter Zijlstra wrote:
> > On Wed, Apr 30, 2025 at 11:07:35AM -0700, Dave Hansen wrote:
> >
> >> Both AMD and Intel have hardware to do it. ARM CPUs do it too, I think.
> >> You can go buy the Intel hardware off the shelf today.
> > To be fair, the Intel RAR thing is pretty horrific 🙁 Definitely
> > sub-par compared to the AMD and ARM things.
> >
> > Furthermore, the paper states it is a uarch feature for SPR with no
> > guarantee future uarchs will get it (and to be fair, I'd prefer it if
> > they didn't).
>
> I don't think any of that is set in stone, fwiw. It should be entirely
> possible to obtain a longer promise about its availability.
>
> Or ask that AMD and Intel put their heads together in their fancy new
> x86 advisory group and figure out a single way forward.

This might be a good thing regardless.

> > Furthermore, I suspect it will actually be slower than IPIs for anything
> > with more than 64 logical CPUs due to reduced parallelism.
>
> Maybe my brain is crusty and I need to go back and read the spec, but I
> remember RAR using the normal old APIC programming that normal old TLB
> flush IPIs use. So they have similar restrictions. If it's inefficient
> to program a wide IPI, it's also inefficient to program a RAR operation.
> So the (theoretical) pro is that you program it like an IPI and it slots
> into the IPI code fairly easily. But the con is that it has the same
> limitations as IPIs.

The problem is in the request structure. Sending an IPI is an async
action. You send it and you're done.

OTOH RAR has a request buffer where pending requests are put and 'polled'
for completion. This buffer does not have room for more than 64 CPUs.

This means that if you want to invalidate across more, you need to do it
in multiple batches.

So where IPI is:

 - IPI all CPUs
 - local invalidate
 - wait for completion

This then becomes:

 for (each batch of up to 64 CPUs)
   - RAR some CPUs
   - wait for completion

Or so I understood it; the paper isn't the easiest to read.
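To put a rough number on the batching argument, here is a minimal sketch in C. It is not kernel code; the helper names and RAR_BATCH_MAX are made up for illustration, and the 64-CPU limit is simply the figure from my reading of the paper above. It only counts how many times the sender has to sit and wait.

```c
/* Assumption taken from the text above: one RAR request covers at most 64 CPUs. */
#define RAR_BATCH_MAX 64u

/*
 * IPI path: the send is async, so all targets are kicked at once and the
 * sender waits a single time after doing its local invalidate.
 */
static unsigned int ipi_wait_rounds(unsigned int nr_targets)
{
	return nr_targets ? 1u : 0u;
}

/*
 * RAR path as described above: every batch is issued and then polled for
 * completion before the next one goes out, so the waits serialize.
 */
static unsigned int rar_wait_rounds(unsigned int nr_targets)
{
	return (nr_targets + RAR_BATCH_MAX - 1u) / RAR_BATCH_MAX;
}
```

With 256 targets the IPI path waits once while the batched path waits four times; at 64 or fewer targets the two are equal. This says nothing about how long each individual wait takes.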
> I was actually concerned that INVLPGB won't be scalable. Since it
> doesn't have the ability to target specific CPUs in the ISA, it
> fundamentally need to either have a mechanism to reach all CPUs, or some
> way to know which TLB entries each CPU might have.
>
> Maybe AMD has something super duper clever to limit the broadcast scope.
> But if they don't, then a small range flush on a small number of CPUs
> might end up being pretty expensive, relatively.

So the way I understand things:

Sending IPIs is sending a message on the interconnect. Mostly this is a
cacheline in size (because MESI). Sparc (v9?) has a fun feature where
you can actually put a data payload in an IPI.

Now, we can target an IPI to a single CPU, to a (limited) set of CPUs,
or broadcast it to all CPUs. In fact, a targeted IPI might still be a
broadcast IPI, except most CPUs will ignore it because it doesn't match
them.
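As a toy model of "targeted is really broadcast plus a filter", something like the following C sketch. The struct ipi_msg layout and both helpers are hypothetical and do not correspond to any real APIC or interconnect format; the point is only that every CPU sees the message and the destination check happens on the receiving side.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message format: one destination bit per CPU, up to 64 CPUs. */
struct ipi_msg {
	uint64_t dest_mask;
};

/* Every CPU sees the message; only those whose bit is set act on it. */
static bool cpu_accepts_ipi(const struct ipi_msg *msg, unsigned int cpu)
{
	return cpu < 64 && ((msg->dest_mask >> cpu) & 1u) != 0;
}

/* Count how many of the first nr_cpus CPUs would act on the message. */
static unsigned int ipi_nr_acting(const struct ipi_msg *msg, unsigned int nr_cpus)
{
	unsigned int cpu, n = 0;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu_accepts_ipi(msg, cpu))
			n++;
	return n;
}
```

A mask of 0x5 on an 8-CPU system is seen by all eight CPUs but acted on only by CPUs 0 and 2.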
TLBI broadcast is like sending IPIs to all CPUs: the message goes out,
everybody sees it.

Much like how snoop filters and the like function, a CPU can process
these messages async -- your CPU doesn't stall for a cacheline
invalidate message either (except of course if it is actively using that
line). Same for TLBI: if the local TLB does not have anything that
matches, it's done. Even if it does match, as long as nothing makes
active use of it, it can just drop the TLB entry without disturbing the
actual core.

Only if the CPU has a matching TLB entry *and* it is active do we
have options. One option is to interrupt the core, another option is to
wait for it to stop using it.

IIUC the current AMD implementation does the 'interrupt' thing.
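The receive-side cases above can be written down as a small decision function. This is only a sketch of my mental model, not a description of any real hardware; the enum, the function name and the interrupt_on_active knob are all invented for illustration.

```c
#include <stdbool.h>

enum tlbi_action {
	TLBI_DONE,       /* nothing matched, nothing to do */
	TLBI_DROP,       /* matched but idle: drop the entry, core undisturbed */
	TLBI_INTERRUPT,  /* matched and in use: interrupt the core */
	TLBI_WAIT,       /* matched and in use: wait until it stops using it */
};

/*
 * What a CPU does on receiving a broadcast TLB invalidate.
 * interrupt_on_active picks between the two options for the active case.
 */
static enum tlbi_action tlbi_receive(bool has_match, bool in_use,
				     bool interrupt_on_active)
{
	if (!has_match)
		return TLBI_DONE;
	if (!in_use)
		return TLBI_DROP;
	return interrupt_on_active ? TLBI_INTERRUPT : TLBI_WAIT;
}
```

Only the last branch ever disturbs the core; the first two are handled without it noticing.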
One thing to consider in all this is that if we TLBI for an executable
page, we should very much also wipe the u-ops cache and all such related
structures -- ARM might have an 'issue' here.

That is, I think the TLBI problem is very similar to the I in MESI --
except possibly simpler, because E must not happen until all CPUs
acknowledge I etc. TLBI does not have this, it has until the next
TLBSYNC.

Anyway, I'm not a hardware person, but this is how I understand these
things to work.