public inbox for rcu@vger.kernel.org
From: Samir M <samir@linux.ibm.com>
To: Joel Fernandes <joelagnelf@nvidia.com>,
	Vishal Chourasia <vishalc@linux.ibm.com>
Cc: peterz@infradead.org, aboorvad@linux.ibm.com,
	boqun.feng@gmail.com, frederic@kernel.org, josh@joshtriplett.org,
	linux-kernel@vger.kernel.org, neeraj.upadhyay@kernel.org,
	paulmck@kernel.org, rcu@vger.kernel.org, rostedt@goodmis.org,
	srikar@linux.ibm.com, sshegde@linux.ibm.com, tglx@linutronix.de,
	urezki@gmail.com
Subject: Re: [PATCH v3 2/2] cpuhp: Expedite RCU grace periods during SMT operations
Date: Mon, 2 Mar 2026 17:17:16 +0530	[thread overview]
Message-ID: <94b3284b-885d-4263-99ed-728375c1a2b7@linux.ibm.com> (raw)
In-Reply-To: <20260227011352.GA1089964@joelbox2>


On 27/02/26 6:43 am, Joel Fernandes wrote:
> On Wed, Feb 18, 2026 at 02:09:18PM +0530, Vishal Chourasia wrote:
>> Expedite synchronize_rcu() during SMT mode switch operations
>> initiated via the /sys/devices/system/cpu/smt/control interface.
>>
>> SMT mode switches (e.g. between SMT 8 and SMT 1, or between other SMT
>> levels) are user-driven operations and should therefore complete as
>> soon as possible. Switching SMT states involves iterating over a list
>> of CPUs and performing hotplug operations on each. These transitions
>> were found to take a significantly long time to complete, particularly
>> on high-core-count systems.
>>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
>> ---
>>   include/linux/rcupdate.h | 8 ++++++++
>>   kernel/cpu.c             | 4 ++++
>>   kernel/rcu/rcu.h         | 4 ----
>>   3 files changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index 7729fef249e1..61b80c29d53b 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -1190,6 +1190,14 @@ rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
>>   extern int rcu_expedited;
>>   extern int rcu_normal;
>>   
>> +#ifdef CONFIG_TINY_RCU
>> +static inline void rcu_expedite_gp(void) { }
>> +static inline void rcu_unexpedite_gp(void) { }
>> +#else
>> +void rcu_expedite_gp(void);
>> +void rcu_unexpedite_gp(void);
>> +#endif
>> +
>>   DEFINE_LOCK_GUARD_0(rcu, rcu_read_lock(), rcu_read_unlock())
>>   DECLARE_LOCK_GUARD_0_ATTRS(rcu, __acquires_shared(RCU), __releases_shared(RCU))
>>   
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 62e209eda78c..1377a68d6f47 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -2682,6 +2682,7 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
>>   		ret = -EBUSY;
>>   		goto out;
>>   	}
>> +	rcu_expedite_gp();
> After the locking related changes in patch 1, is expediting still required? I
> am just a bit concerned that we are papering over the real issue of over
> usage of synchronize_rcu() (which IIRC we discussed in earlier versions of
> the patches that reducing the number of lock acquire/release was supposed to
> help.)
>
> Could you provide more justification of why expediting these sections is
> required if the locking concerns were addressed? It would be great if you can
> provide performance numbers with only the first patch and without the second
> patch. That way we can quantify this patch.
>
> thanks,
>
> --
> Joel Fernandes
>
Hi Vishal/Joel,


Configuration:
     •    Kernel version: 7.0.0-rc1
     •    Number of CPUs: 1536

I have verified the two patches below applied together and observed
improvements.

Patch 1:
https://lore.kernel.org/all/20260218083915.660252-4-vishalc@linux.ibm.com/

Patch 2:
https://lore.kernel.org/all/20260218083915.660252-6-vishalc@linux.ibm.com/

SMT Mode    | Without Patch (Base) | Both patches applied | % Improvement |
---------------------------------------------------------------------------
SMT=off     | 16m 13.956s          |     6m 18.435s       |  +61.14 %     |
SMT=on      | 12m 0.982s           |     5m 59.576s       |  +50.10 %     |

When I tested patch 1 independently, I did not observe any improvement
for either smt=on or smt=off. However, in the smt=off scenario, I
encountered hung task splats (with call traces), where some threads
were blocked on cpus_read_lock. Please also refer to the attached call
trace below.

Patch 1:
https://lore.kernel.org/all/20260218083915.660252-4-vishalc@linux.ibm.com/

SMT Mode    | Without Patch (Base) | Just patch 1 applied | % Improvement |
---------------------------------------------------------------------------
SMT=off     | 16m 13.956s          |    16m 9.793s        |  +0.43 %      |
SMT=on      | 12m 0.982s           |    12m 19.494s       |  -2.57 %      |
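
For reference, the "% Improvement" columns in the tables above are
computed as (base - patched) / base * 100 after converting the timings
to seconds. A quick sketch (not part of the patches; values hardcoded
from the tables) reproduces two of the entries:

```python
# Sanity check of the "% Improvement" columns: improvement is
# (base - patched) / base * 100, with "XmY.Zs" timings converted
# to seconds first. Values are taken from the tables above.

def to_seconds(minutes, seconds):
    """Convert a minutes/seconds timing into total seconds."""
    return minutes * 60 + seconds

def improvement(base, patched):
    """Percent improvement of 'patched' over 'base' (negative = regression)."""
    return (base - patched) / base * 100

# SMT=off, both patches applied: 16m 13.956s -> 6m 18.435s
both = improvement(to_seconds(16, 13.956), to_seconds(6, 18.435))
# SMT=on, just patch 1 applied: 12m 0.982s -> 12m 19.494s
p1_on = improvement(to_seconds(12, 0.982), to_seconds(12, 19.494))

print(f"both patches, SMT=off: {both:+.2f} %")   # matches the table's +61.14 %
print(f"patch 1 only, SMT=on:  {p1_on:+.2f} %")  # matches the table's -2.57 %
```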


Call traces:
[ 1477.612377] [  T8746]    Tainted: G      E 7.0.0-rc1-150700.51-default-dirty #1
[ 1477.612384] [  T8746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1477.612389] [  T8746] task:systemd     state:D stack:0   pid:1     tgid:1   ppid:0   task_flags:0x400100 flags:0x00040000
[ 1477.612397] [  T8746] Call Trace:
[ 1477.612399] [  T8746] [c00000000cc0f4f0] [0000000000100000] 0x100000 (unreliable)
[ 1477.612416] [  T8746] [c00000000cc0f6a0] [c00000000001fe5c] __switch_to+0x1dc/0x290
[ 1477.612425] [  T8746] [c00000000cc0f6f0] [c0000000012598ac] __schedule+0x40c/0x1a70
[ 1477.612433] [  T8746] [c00000000cc0f840] [c00000000125af58] schedule+0x48/0x1a0
[ 1477.612439] [  T8746] [c00000000cc0f870] [c0000000002e27b8] percpu_rwsem_wait+0x198/0x200
[ 1477.612445] [  T8746] [c00000000cc0f8f0] [c000000001262930] __percpu_down_read+0xb0/0x210
[ 1477.612449] [  T8746] [c00000000cc0f930] [c00000000022f400] cpus_read_lock+0xc0/0xd0
[ 1477.612456] [  T8746] [c00000000cc0f950] [c0000000003a6398] cgroup_procs_write_start+0x328/0x410
[ 1477.612462] [  T8746] [c00000000cc0fa00] [c0000000003a9620] __cgroup_procs_write+0x70/0x2c0
[ 1477.612468] [  T8746] [c00000000cc0fac0] [c0000000003a98e8] cgroup_procs_write+0x28/0x50
[ 1477.612473] [  T8746] [c00000000cc0faf0] [c0000000003a1624] cgroup_file_write+0xb4/0x240
[ 1477.612478] [  T8746] [c00000000cc0fb50] [c000000000853ba8] kernfs_fop_write_iter+0x1a8/0x2a0
[ 1477.612485] [  T8746] [c00000000cc0fba0] [c000000000733d5c] vfs_write+0x27c/0x540
[ 1477.612491] [  T8746] [c00000000cc0fc50] [c000000000734350] ksys_write+0x80/0x150
[ 1477.612495] [  T8746] [c00000000cc0fca0] [c000000000032898] system_call_exception+0x148/0x320
[ 1477.612500] [  T8746] [c00000000cc0fe50] [c00000000000d6a0] system_call_common+0x160/0x2c4
[ 1477.612506] [  T8746] ---- interrupt: c00 at 0x7fffa8f73df4
[ 1477.612509] [  T8746] NIP: 00007fffa8f73df4 LR: 00007fffa8eb6144 CTR: 0000000000000000
[ 1477.612512] [  T8746] REGS: c00000000cc0fe80 TRAP: 0c00 Tainted: G       E    (7.0.0-rc1-150700.51-default-dirty)
[ 1477.612515] [  T8746] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 28002288 XER: 00000000



Regards,
Samir
>>   	/* Hold cpus_write_lock() for entire batch operation. */
>>   	cpus_write_lock();
>>   	for_each_online_cpu(cpu) {
>> @@ -2714,6 +2715,7 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
>>   	if (!ret)
>>   		cpu_smt_control = ctrlval;
>>   	cpus_write_unlock();
>> +	rcu_unexpedite_gp();
>>   	arch_smt_update();
>>   out:
>>   	cpu_maps_update_done();
>> @@ -2733,6 +2735,7 @@ int cpuhp_smt_enable(void)
>>   	int cpu, ret = 0;
>>   
>>   	cpu_maps_update_begin();
>> +	rcu_expedite_gp();
>>   	/* Hold cpus_write_lock() for entire batch operation. */
>>   	cpus_write_lock();
>>   	cpu_smt_control = CPU_SMT_ENABLED;
>> @@ -2749,6 +2752,7 @@ int cpuhp_smt_enable(void)
>>   		cpuhp_online_cpu_device(cpu);
>>   	}
>>   	cpus_write_unlock();
>> +	rcu_unexpedite_gp();
>>   	arch_smt_update();
>>   	cpu_maps_update_done();
>>   	return ret;
>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
>> index dc5d614b372c..41a0d262e964 100644
>> --- a/kernel/rcu/rcu.h
>> +++ b/kernel/rcu/rcu.h
>> @@ -512,8 +512,6 @@ do {									\
>>   static inline bool rcu_gp_is_normal(void) { return true; }
>>   static inline bool rcu_gp_is_expedited(void) { return false; }
>>   static inline bool rcu_async_should_hurry(void) { return false; }
>> -static inline void rcu_expedite_gp(void) { }
>> -static inline void rcu_unexpedite_gp(void) { }
>>   static inline void rcu_async_hurry(void) { }
>>   static inline void rcu_async_relax(void) { }
>>   static inline bool rcu_cpu_online(int cpu) { return true; }
>> @@ -521,8 +519,6 @@ static inline bool rcu_cpu_online(int cpu) { return true; }
>>   bool rcu_gp_is_normal(void);     /* Internal RCU use. */
>>   bool rcu_gp_is_expedited(void);  /* Internal RCU use. */
>>   bool rcu_async_should_hurry(void);  /* Internal RCU use. */
>> -void rcu_expedite_gp(void);
>> -void rcu_unexpedite_gp(void);
>>   void rcu_async_hurry(void);
>>   void rcu_async_relax(void);
>>   void rcupdate_announce_bootup_oddness(void);
>> -- 
>> 2.53.0
>>



Thread overview: 11+ messages
2026-02-18  8:39 [PATCH v3 0/2] cpuhp: Improve SMT switch time via lock batching and RCU expedition Vishal Chourasia
2026-02-18  8:39 ` [PATCH v3 1/2] cpuhp: Optimize SMT switch operation by batching lock acquisition Vishal Chourasia
2026-03-25 19:09   ` Thomas Gleixner
2026-03-26 10:06     ` Vishal Chourasia
2026-02-18  8:39 ` [PATCH v3 2/2] cpuhp: Expedite RCU grace periods during SMT operations Vishal Chourasia
2026-02-27  1:13   ` Joel Fernandes
2026-03-02 11:47     ` Samir M [this message]
2026-03-06  5:44       ` Vishal Chourasia
2026-03-06 15:12         ` Paul E. McKenney
2026-03-20 18:49           ` Vishal Chourasia
2026-03-25 19:10   ` Thomas Gleixner
