Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug operations


From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Samir M <samir@linux.ibm.com>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	paulmck@kernel.org, frederic@kernel.org,
	neeraj.upadhyay@kernel.org, josh@joshtriplett.org,
	boqun.feng@gmail.com, urezki@gmail.com, rostedt@goodmis.org,
	tglx@linutronix.de, peterz@infradead.org, sshegde@linux.ibm.com,
	srikar@linux.ibm.com
Subject: Re: [PATCH] cpuhp: Expedite synchronize_rcu during CPU hotplug operations
Date: Mon, 2 Feb 2026 14:16:53 +0530
Message-ID: <aYBkfZnP6zmWCaTX@linux.ibm.com>
In-Reply-To: <20260119051835.GA696111@joelbox2>

On Mon, Jan 12, 2026 at 03:13:33PM +0530, Vishal Chourasia wrote:
> Performance data on a PPC64 system with 400 CPUs:
>
> + ppc64_cpu --smt=1 (SMT8 to SMT1)
> Before: real 1m14.792s
> After:  real 0m03.205s  # ~23x improvement
>
> + ppc64_cpu --smt=8 (SMT1 to SMT8)
> Before: real 2m27.695s
> After:  real 0m02.510s  # ~58x improvement
>
> Above numbers were collected on Linux 6.19.0-rc4-00310-g755bc1335e3b

On Mon, Jan 19, 2026 at 12:18:35AM -0500, Joel Fernandes wrote:
> Hi Vishal, Samir,
> 
> Thanks for testing on your large-CPU-count system.
> 
> Since the SMT=on performance is still terrible, could we try the approach
> Peter suggested (avoiding repeated lock/unlock) before we expedite RCU?
> I've written a patch for it below.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git
>   tag: cpuhp-bulk-optimize-rfc-v1
> 
> I tested it lightly with the rcutorture hotplug test, and it passes.
> Please share any performance results, thanks.
> 
> Also, TBH, I'd like to use RCU expediting only as a last resort. We should
> instead optimize the outer operations that require RCU in the first place,
> per Peter's suggestion, since that improves the overall efficiency of the
> code. And if/when we do expedite RCU, Peter's other suggestion to do it
> from cpuhp_smt_enable() rather than in cpus_write_lock() also makes sense
> to me.
> 
> ---8<-----------------------
> 
> From: Joel Fernandes <joelagnelf@nvidia.com>
> Subject: [PATCH] cpuhp: Optimize batch SMT enable by reducing lock acquiring
> 
> Bulk CPU hotplug operations such as enabling SMT across all cores
> require hotplugging multiple CPUs. The current implementation takes
> cpus_write_lock() for each individual CPU, causing multiple slow
> grace-period requests.
> 
> Therefore introduce cpu_up_locked() that assumes the caller already
> holds cpus_write_lock(). The cpuhp_smt_enable() function is updated to
> hold the lock once around the entire loop rather than for each CPU.
> 
> Link: https://lore.kernel.org/all/20260113090153.GS830755@noisy.programming.kicks-ass.net/
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  kernel/cpu.c | 40 +++++++++++++++++++++++++---------------
>  1 file changed, 25 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 8df2d773fe3b..4ce7deb236d7 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1623,34 +1623,31 @@ void cpuhp_online_idle(enum cpuhp_state state)
>  	complete_ap_thread(st, true);
>  }
> -/* Requires cpu_add_remove_lock to be held */
> -static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
> +/* Requires cpu_add_remove_lock and cpus_write_lock to be held. */
> +static int cpu_up_locked(unsigned int cpu, int tasks_frozen,
> +			 enum cpuhp_state target)
>  {
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>  	struct task_struct *idle;
>  	int ret = 0;
> -	cpus_write_lock();
> +	lockdep_assert_cpus_held();
> -	if (!cpu_present(cpu)) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> +	if (!cpu_present(cpu))
> +		return -EINVAL;
>  	/*
>  	 * The caller of cpu_up() might have raced with another
>  	 * caller. Nothing to do.
>  	 */
>  	if (st->state >= target)
> -		goto out;
> +		return 0;
>  	if (st->state == CPUHP_OFFLINE) {
>  		/* Let it fail before we try to bring the cpu up */
>  		idle = idle_thread_get(cpu);
> -		if (IS_ERR(idle)) {
> -			ret = PTR_ERR(idle);
> -			goto out;
> -		}
> +		if (IS_ERR(idle))
> +			return PTR_ERR(idle);
>  		/*
>  		 * Reset stale stack state from the last time this CPU was online.
> @@ -1673,7 +1670,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>  		 * return the error code..
>  		 */
>  		if (ret)
> -			goto out;
> +			return ret;
>  	}
>  	/*
> @@ -1683,7 +1680,16 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
>  	 */
>  	target = min((int)target, CPUHP_BRINGUP_CPU);
>  	ret = cpuhp_up_callbacks(cpu, st, target);
> -out:
> +	return ret;
> +}
> +
> +/* Requires cpu_add_remove_lock to be held */
> +static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
> +{
> +	int ret;
> +
> +	cpus_write_lock();
> +	ret = cpu_up_locked(cpu, tasks_frozen, target);
>  	cpus_write_unlock();
>  	arch_smt_update();
>  	return ret;
> @@ -2715,6 +2721,8 @@ int cpuhp_smt_enable(void)
>  	int cpu, ret = 0;
>  	cpu_maps_update_begin();
> +	/* Hold cpus_write_lock() for entire batch operation. */
> +	cpus_write_lock();
>  	cpu_smt_control = CPU_SMT_ENABLED;
>  	for_each_present_cpu(cpu) {
>  		/* Skip online CPUs and CPUs on offline nodes */
> @@ -2722,12 +2730,14 @@ int cpuhp_smt_enable(void)
>  			continue;
>  		if (!cpu_smt_thread_allowed(cpu) || !topology_is_core_online(cpu))
>  			continue;
> -		ret = _cpu_up(cpu, 0, CPUHP_ONLINE);
> +		ret = cpu_up_locked(cpu, 0, CPUHP_ONLINE);
>  		if (ret)
>  			break;
>  		/* See comment in cpuhp_smt_disable() */
>  		cpuhp_online_cpu_device(cpu);
>  	}
> +	cpus_write_unlock();
> +	arch_smt_update();
>  	cpu_maps_update_done();
>  	return ret;
>  }
> -- 
> 2.34.1
> 


Hi Joel,

I tested the above patch on the same 400-CPU machine for which I originally
posted numbers. The timings are essentially unchanged with the patch:

# time echo 1 > /sys/devices/system/cpu/smt/control
real    1m27.133s  # Base
real    1m25.859s  # With patch

# time echo 8 > /sys/devices/system/cpu/smt/control
real    1m0.682s   # Base
real    1m3.423s   # With patch
