From: Frederic Weisbecker <frederic@kernel.org>
To: Valentin Schneider <vschneid@redhat.com>
Cc: Phil Auld <pauld@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	rcu@vger.kernel.org, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linux-riscv@lists.infradead.org, linux-arch@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Jason Baron <jbaron@akamai.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	Sami Tolvanen <samitolvanen@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Han Shen <shenhan@google.com>, Rik van Riel <riel@surriel.com>,
	Jann Horn <jannh@google.com>,
	Dan Carpenter <dan.carpenter@linaro.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Clark Williams <williams@redhat.com>,
	Yair Podemsky <ypodemsk@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Daniel Wagner <dwagner@suse.de>, Petr Tesarik <ptesarik@suse.com>
Subject: Re: [PATCH v6 00/29] context_tracking,x86: Defer some IPIs until a user->kernel transition
Date: Wed, 5 Nov 2025 18:46:28 +0100
Message-ID: <aQuNdOEmPYkI03my@localhost.localdomain>
In-Reply-To: <xhsmh8qgk5txe.mognet@vschneid-thinkpadt14sgen2i.remote.csb>

On Wed, Nov 05, 2025 at 05:24:29PM +0100, Valentin Schneider wrote:
> On 29/10/25 18:15, Frederic Weisbecker wrote:
> > On Wed, Oct 29, 2025 at 11:32:58AM +0100, Valentin Schneider wrote:
> >> I need to have a think about that one; one pain point I see is the context
> >> tracking work has to be NMI safe since e.g. an NMI can take us out of
> >> userspace. Another is that NOHZ-full CPUs need to be special cased in the
> >> stop machine queueing / completion.
> >>
> >> /me goes fetch a new notebook
> >
> > Something like the below (untested) ?
> >
> 
> Some minor nits below but otherwise that looks promising.
> 
> One problem I'm having however is reasoning about the danger zone; what
> forbidden actions could a NO_HZ_FULL CPU take when entering the kernel
> while take_cpu_down() is happening?
> 
> I'm actually not familiar with why we actually use stop_machine() for CPU
> hotplug; I see things like CPUHP_AP_SMPCFD_DYING::smpcfd_dying_cpu() or
> CPUHP_AP_TICK_DYING::tick_cpu_dying() expect other CPUs to be patiently
> spinning in multi_cpu_stop(), and I *think* nothing in the entry code up to
> context_tracking entry would disrupt that, but it's not a small thing to
> reason about.
> 
> AFAICT we need to reason about every .teardown callback from
> CPUHP_TEARDOWN_CPU to CPUHP_AP_OFFLINE and their explicit & implicit
> dependencies on other CPUs being STOP'd.

You're raising a very interesting question. The original point of using
stop_machine() for CPU hotplug is to synchronize this teardown sequence:

    set_cpu_online(cpu, 0)
    migrate timers;
    migrate hrtimers;
    flush IPIs;
    etc...

against this pattern:

    preempt_disable()
    if (cpu_online(cpu))
        queue something; // could be timer, IPI, etc...
    preempt_enable()

There have been attempts:

      https://lore.kernel.org/all/20241218171531.2217275-1-costa.shul@redhat.com/

And really it should be fine to just do:

    set_cpu_online(cpu, 0)
    synchronize_rcu()
    migrate / flush stuff

We should probably try that instead of the busy loop I proposed,
which only papers over the problem.

Of course there are other assumptions. For example, the tick
timekeeper duty is easy to migrate because all online CPUs are known
not to be idle (cf. tick_cpu_dying()). So I expect a few traps, with RCU
for example, and indeed all these hotplug callbacks must be audited
one by one.

I'm reasonably familiar with many of them. Let me see what I can do...

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
