From: Joel Fernandes <joelagnelf@nvidia.com>
To: Valentin Schneider <vschneid@redhat.com>
Cc: Jann Horn <jannh@google.com>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	virtualization@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linux-riscv@lists.infradead.org,
	linux-perf-users@vger.kernel.org, xen-devel@lists.xenproject.org,
	kvm@vger.kernel.org, linux-arch@vger.kernel.org,
	rcu@vger.kernel.org, linux-hardening@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	bpf@vger.kernel.org, bcm-kernel-feedback-list@broadcom.com,
	Juergen Gross <jgross@suse.com>,
	Ajay Kaher <ajay.kaher@broadcom.com>,
	Alexey Makhalov <alexey.amakhalov@broadcom.com>,
	Russell King <linux@armlinux.org.uk>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	"Liang, Kan" <kan.liang@linux.intel.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Frederic Weisbecker <frederic@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Jason Baron <jbaron@akamai.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Clark Williams <williams@redhat.com>,
	Yair Podemsky <ypodemsk@redhat.com>,
	Tomas Glozar <tglozar@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Kees Cook <kees@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Shuah Khan <shuah@kernel.org>,
	Sami Tolvanen <samitolvanen@google.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Alice Ryhl <aliceryhl@google.com>,
	"Mike Rapoport (Microsoft)" <rppt@kernel.org>,
	Samuel Holland <samuel.holland@sifive.com>,
	Rong Xu <xur@google.com>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Yosry Ahmed <yosryahmed@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Masami Hiramatsu (Google)" <mhiramat@kernel.org>,
	Jinghao Jia <jinghao7@illinois.edu>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Tiezhu Yang <yangtiezhu@loongson.cn>
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
Date: Wed, 19 Feb 2025 10:05:47 -0500
Message-ID: <20250219145302.GA480110@joelnvbox>
In-Reply-To: <xhsmhzfjpfkky.mognet@vschneid-thinkpadt14sgen2i.remote.csb>

On Fri, Jan 17, 2025 at 05:53:33PM +0100, Valentin Schneider wrote:
> On 17/01/25 16:52, Jann Horn wrote:
> > On Fri, Jan 17, 2025 at 4:25 PM Valentin Schneider <vschneid@redhat.com> wrote:
> >> On 14/01/25 19:16, Jann Horn wrote:
> >> > On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider <vschneid@redhat.com> wrote:
> >> >> vunmap()'s issued from housekeeping CPUs are a relatively common source of
> >> >> interference for isolated NOHZ_FULL CPUs, as they are hit by the
> >> >> flush_tlb_kernel_range() IPIs.
> >> >>
> >> >> Given that CPUs executing in userspace do not access data in the vmalloc
> >> >> range, these IPIs could be deferred until their next kernel entry.
> >> >>
> >> >> Deferral vs early entry danger zone
> >> >> ===================================
> >> >>
> >> >> This requires a guarantee that nothing in the vmalloc range can be vunmap'd
> >> >> and then accessed in early entry code.
> >> >
> >> > In other words, it needs a guarantee that no vmalloc allocations that
> >> > have been created in the vmalloc region while the CPU was idle can
> >> > then be accessed during early entry, right?
> >>
> >> I'm not sure if that would be a problem (not an mm expert, please do
> >> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
> >> deferred anyway.
> >
> > flush_cache_vmap() is about stuff like flushing data caches on
> > architectures with virtually indexed caches; that doesn't do TLB
> > maintenance. When you look for its definition on x86 or arm64, you'll
> > see that they use the generic implementation which is simply an empty
> > inline function.
> >
> >> So after vmapping something, I wouldn't expect isolated CPUs to have
> >> invalid TLB entries for the newly vmapped page.
> >>
> >> However, upon vunmap'ing something, the TLB flush is deferred, and thus
> >> stale TLB entries can and will remain on isolated CPUs, up until they
> >> execute the deferred flush themselves (IOW for the entire duration of the
> >> "danger zone").
> >>
> >> Does that make sense?
> >
> > The design idea wrt TLB flushes in the vmap code is that you don't do
> > TLB flushes when you unmap stuff or when you map stuff, because doing
> > TLB flushes across the entire system on every vmap/vunmap would be a
> > bit costly; instead you just do batched TLB flushes in between, in
> > __purge_vmap_area_lazy().
> >
> > In other words, the basic idea is that you can keep calling vmap() and
> > vunmap() a bunch of times without ever doing TLB flushes until you run
> > out of virtual memory in the vmap region; then you do one big TLB
> > flush, and afterwards you can reuse the free virtual address space for
> > new allocations again.
> >
> > So if you "defer" that batched TLB flush for CPUs that are not
> > currently running in the kernel, I think the consequence is that those
> > CPUs may end up with incoherent TLB state after a reallocation of the
> > virtual address space.
> >
> 
> Ah, gotcha, thank you for laying this out! In which case yes, any vmalloc
> that occurred while an isolated CPU was NOHZ-FULL can be an issue if said
> CPU accesses it during early entry;

So the issue is:

CPU1: unmaps vmalloc page X which was previously mapped to physical page
P1.

CPU2: does a whole bunch of vmalloc and vfree eventually crossing some lazy
threshold and sending out IPIs. It then goes ahead and does an allocation
that maps the same virtual page X to physical page P2.

CPU3: is isolated and executes some early entry code before receiving said
IPIs, which are supposedly deferred by Valentin's patches.

It does not receive the IPI because it is deferred, so an access by early
entry code to page X on this CPU hits the stale TLB entry and results in a
UAF access to P1.
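
To make the sequence concrete, here is a toy Python model of it. This is only
a sketch under stated assumptions: the dict-based page table and TLB, the Cpu
class, purge_lazy() and scenario() are all made-up names for illustration and
not kernel code, and the model assumes a TLB hit never consults the page table.

```python
# Toy model only: a dict stands in for the kernel page table and for each
# CPU's TLB. All names here are hypothetical, not kernel APIs.

class Cpu:
    def __init__(self, name, isolated=False):
        self.name = name
        self.isolated = isolated      # NOHZ_FULL CPU running in userspace
        self.tlb = {}                 # cached virtual page -> physical page
        self.flush_pending = False    # deferred flush not yet executed

    def access(self, page_table, vpage):
        # A TLB hit never consults the page table, so a stale entry wins.
        if vpage not in self.tlb:
            self.tlb[vpage] = page_table[vpage]
        return self.tlb[vpage]

    def flush(self):
        self.tlb.clear()
        self.flush_pending = False


def purge_lazy(cpus, defer_isolated):
    """Batched flush: IPI every CPU, or defer it for isolated ones."""
    for cpu in cpus:
        if defer_isolated and cpu.isolated:
            cpu.flush_pending = True   # would run at the next kernel entry
        else:
            cpu.flush()


def scenario(defer_isolated, flush_before_access):
    page_table = {"X": "P1"}
    cpu1, cpu2 = Cpu("CPU1"), Cpu("CPU2")
    cpu3 = Cpu("CPU3", isolated=True)
    cpus = [cpu1, cpu2, cpu3]

    cpu3.access(page_table, "X")       # CPU3 caches X -> P1, then runs userspace
    del page_table["X"]                # CPU1 unmaps X; no flush yet (lazy)
    purge_lazy(cpus, defer_isolated)   # CPU2 crosses the lazy threshold
    page_table["X"] = "P2"             # CPU2 reuses virtual page X

    # CPU3 enters the kernel; early entry code touches X.
    if cpu3.flush_pending and flush_before_access:
        cpu3.flush()
    return cpu3.access(page_table, "X")
```

In this model, scenario(True, False) is the case above: the flush is deferred,
early entry code touches X before the deferred flush has run, and the access
resolves to the stale P1. Both scenario(False, False) (IPI delivered as today)
and scenario(True, True) (deferred flush executed before the access) resolve
to P2.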

Is that the issue?

thanks,

 - Joel


