Date: Wed, 29 Oct 2025 18:15:17 +0100
From: Frederic Weisbecker
To: Valentin Schneider
Cc: Phil Auld, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    rcu@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org,
    loongarch@lists.linux.dev, linux-riscv@lists.infradead.org,
    linux-arch@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
    Arnaldo Carvalho de Melo, Josh Poimboeuf, Paolo Bonzini, Arnd Bergmann,
    "Paul E. McKenney", Jason Baron, Steven Rostedt, Ard Biesheuvel,
    Sami Tolvanen, "David S. Miller",
Miller" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Mathieu Desnoyers , Mel Gorman , Andrew Morton , Masahiro Yamada , Han Shen , Rik van Riel , Jann Horn , Dan Carpenter , Oleg Nesterov , Juri Lelli , Clark Williams , Yair Podemsky , Marcelo Tosatti , Daniel Wagner , Petr Tesarik Subject: Re: [PATCH v6 00/29] context_tracking,x86: Defer some IPIs until a user->kernel transition Message-ID: References: <20251010153839.151763-1-vschneid@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Le Wed, Oct 29, 2025 at 11:32:58AM +0100, Valentin Schneider a écrit : > I need to have a think about that one; one pain point I see is the context > tracking work has to be NMI safe since e.g. an NMI can take us out of > userspace. Another is that NOHZ-full CPUs need to be special cased in the > stop machine queueing / completion. > > /me goes fetch a new notebook Something like the below (untested) ? diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h index 485b32881fde..2940e28ecea6 100644 --- a/arch/x86/include/asm/context_tracking_work.h +++ b/arch/x86/include/asm/context_tracking_work.h @@ -3,6 +3,7 @@ #define _ASM_X86_CONTEXT_TRACKING_WORK_H #include +#include static __always_inline void arch_context_tracking_work(enum ct_work work) { @@ -10,6 +11,9 @@ static __always_inline void arch_context_tracking_work(enum ct_work work) case CT_WORK_SYNC: sync_core(); break; + case CT_WORK_STOP_MACHINE: + stop_machine_poll_wait(); + break; case CT_WORK_MAX: WARN_ON_ONCE(true); } diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h index 2facc621be06..b63200bd73d6 100644 --- a/include/linux/context_tracking_work.h +++ b/include/linux/context_tracking_work.h @@ -6,12 +6,14 @@ enum { CT_WORK_SYNC_OFFSET, + CT_WORK_STOP_MACHINE_OFFSET, CT_WORK_MAX_OFFSET }; enum ct_work { - CT_WORK_SYNC = BIT(CT_WORK_SYNC_OFFSET), - CT_WORK_MAX = BIT(CT_WORK_MAX_OFFSET) + CT_WORK_SYNC = BIT(CT_WORK_SYNC_OFFSET), + CT_WORK_STOP_MACHINE = BIT(CT_WORK_STOP_MACHINE_OFFSET), + CT_WORK_MAX = BIT(CT_WORK_MAX_OFFSET) }; #include diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h index 72820503514c..0efe88e84b8a 100644 --- a/include/linux/stop_machine.h +++ b/include/linux/stop_machine.h @@ -36,6 +36,7 @@ bool stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg, void stop_machine_park(int cpu); void stop_machine_unpark(int cpu); void stop_machine_yield(const struct cpumask *cpumask); +void stop_machine_poll_wait(void); extern void print_stop_info(const char *log_lvl, struct task_struct *task); diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 3fe6b0c99f3d..8f0281b0db64 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -22,6 +22,7 @@ #include #include #include +#include /* * Structure to determine completion condition and record errors. 
@@ -176,6 +177,68 @@ struct multi_stop_data {
 	atomic_t		thread_ack;
 };
 
+static DEFINE_PER_CPU(int, stop_machine_poll);
+
+void stop_machine_poll_wait(void)
+{
+	int *poll = this_cpu_ptr(&stop_machine_poll);
+
+	while (*poll)
+		cpu_relax();
+	/* Enforce the work in stop machine to be visible */
+	smp_mb();
+}
+
+static void stop_machine_poll_start(struct multi_stop_data *msdata)
+{
+	int cpu;
+
+	if (housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
+		return;
+
+	/* Random target can't be known in advance */
+	if (!msdata->active_cpus)
+		return;
+
+	for_each_cpu_andnot(cpu, cpu_online_mask, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)) {
+		int *poll = per_cpu_ptr(&stop_machine_poll, cpu);
+
+		if (cpumask_test_cpu(cpu, msdata->active_cpus))
+			continue;
+
+		*poll = 1;
+
+		/*
+		 * Act as a full barrier so that if the work is queued, polling is
+		 * visible.
+		 */
+		if (ct_set_cpu_work(cpu, CT_WORK_STOP_MACHINE))
+			msdata->num_threads--;
+		else
+			*poll = 0;
+	}
+}
+
+static void stop_machine_poll_complete(struct multi_stop_data *msdata)
+{
+	int cpu;
+
+	if (housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
+		return;
+
+	for_each_cpu_andnot(cpu, cpu_online_mask, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)) {
+		int *poll = per_cpu_ptr(&stop_machine_poll, cpu);
+
+		if (cpumask_test_cpu(cpu, msdata->active_cpus))
+			continue;
+		/*
+		 * The RmW in ack_state() fully orders the work performed in stop_machine()
+		 * with polling.
+		 */
+		*poll = 0;
+	}
+}
+
 static void set_state(struct multi_stop_data *msdata,
 		      enum multi_stop_state newstate)
 {
@@ -186,10 +249,13 @@ static void set_state(struct multi_stop_data *msdata,
 }
 
 /* Last one to ack a state moves to the next state. */
-static void ack_state(struct multi_stop_data *msdata)
+static bool ack_state(struct multi_stop_data *msdata)
 {
-	if (atomic_dec_and_test(&msdata->thread_ack))
+	if (atomic_dec_and_test(&msdata->thread_ack)) {
 		set_state(msdata, msdata->state + 1);
+		return true;
+	}
+	return false;
 }
 
 notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
@@ -240,7 +306,8 @@ static int multi_cpu_stop(void *data)
 		default:
 			break;
 		}
-		ack_state(msdata);
+		if (ack_state(msdata) && msdata->state == MULTI_STOP_EXIT)
+			stop_machine_poll_complete(msdata);
 	} else if (curstate > MULTI_STOP_PREPARE) {
 		/*
 		 * At this stage all other CPUs we depend on must spin
@@ -615,6 +682,8 @@ int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
 		return ret;
 	}
 
+	stop_machine_poll_start(&msdata);
+
 	/* Set the initial state and stop all online cpus. */
 	set_state(&msdata, MULTI_STOP_PREPARE);
 	return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
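
[Editor's note: for readers following the ordering argument above, here is a
stand-alone user-space model of the handshake, not part of the patch. The
names poll_flag, work_queued, protected_data and remote_cpu are invented for
illustration; poll_flag stands in for the per-CPU stop_machine_poll flag,
the seq_cst store to work_queued for the full barrier in ct_set_cpu_work(),
and the fence after the polling loop for the smp_mb() in
stop_machine_poll_wait().]

/*
 * Model of the poll handshake (illustrative only, not kernel code).
 * Build: cc -pthread -o model model.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int poll_flag;	/* stands in for per-CPU stop_machine_poll */
static atomic_int work_queued;	/* stands in for a successful ct_set_cpu_work() */
static int protected_data;	/* state changed while "stop machine" runs */

/* Stands in for the isolated CPU taking a user->kernel transition. */
static void *remote_cpu(void *arg)
{
	(void)arg;
	if (atomic_load(&work_queued)) {
		/* stop_machine_poll_wait(): spin until completion... */
		while (atomic_load_explicit(&poll_flag, memory_order_relaxed))
			;	/* cpu_relax() */
		/* ...then order polling against the stopper's writes (smp_mb()). */
		atomic_thread_fence(memory_order_seq_cst);
		printf("remote CPU sees %d\n", protected_data);	/* always 42 */
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	/* stop_machine_poll_start(): raise the flag, then queue the work. */
	atomic_store(&poll_flag, 1);
	atomic_store(&work_queued, 1);	/* seq_cst store: the "full barrier" */

	pthread_create(&t, NULL, remote_cpu, NULL);

	/* The stop-machine critical section runs here... */
	protected_data = 42;

	/* ...and the last ack at MULTI_STOP_EXIT releases the poller,
	 * as stop_machine_poll_complete() does. */
	atomic_store(&poll_flag, 0);

	pthread_join(t, NULL);
	return 0;
}

[End editor's note. Because poll_flag is raised before the work is published
and only cleared after the critical section, the remote side can never
observe the work without then spinning until the update is complete.]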