From: "Paul E. McKenney" <paulmck@kernel.org>
To: Vishal Chourasia <vishalc@linux.ibm.com>
Cc: Samir M <samir@linux.ibm.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
peterz@infradead.org, aboorvad@linux.ibm.com,
boqun.feng@gmail.com, frederic@kernel.org, josh@joshtriplett.org,
linux-kernel@vger.kernel.org, neeraj.upadhyay@kernel.org,
rcu@vger.kernel.org, rostedt@goodmis.org, srikar@linux.ibm.com,
sshegde@linux.ibm.com, tglx@linutronix.de, urezki@gmail.com
Subject: Re: [PATCH v3 2/2] cpuhp: Expedite RCU grace periods during SMT operations
Date: Fri, 6 Mar 2026 07:12:04 -0800
Message-ID: <bde1a8b9-7f56-45fb-830c-038fa7b85f0d@paulmck-laptop>
In-Reply-To: <aapprY-prH0l_WeK@linux.ibm.com>
On Fri, Mar 06, 2026 at 11:14:13AM +0530, Vishal Chourasia wrote:
> On Mon, Mar 02, 2026 at 05:17:16PM +0530, Samir M wrote:
> >
> > On 27/02/26 6:43 am, Joel Fernandes wrote:
> > > On Wed, Feb 18, 2026 at 02:09:18PM +0530, Vishal Chourasia wrote:
> > > > Expedite synchronize_rcu during the SMT mode switch operation when
> > > > initiated via /sys/devices/system/cpu/smt/control interface
> > > >
> > > After the locking related changes in patch 1, is expediting still required? I
> Yes.
> > > am just a bit concerned that we are papering over the real issue of over
> > > usage of synchronize_rcu() (which IIRC we discussed in earlier versions of
> > > the patches that reducing the number of lock acquire/release was supposed to
> > > help.)
> At present, I am not sure about the underlying issue. So far, what I have
> found is that when synchronize_rcu() is invoked, it marks the start of a
> new grace period, say A. The thread invoking synchronize_rcu() blocks
> until all CPUs have reported a QS for GP "A". There is an RCU grace-period
> kthread that runs periodically, looping over a CPU list to check whether
> all CPUs have reported a QS. In the trace, I find some CPUs reporting a QS
> for a sequence number far in the past, e.g. A - N where N > 10.

This can happen when a CPU goes idle for multiple grace periods, then
wakes up in the middle of a later grace period. This is (or at least is
supposed to be) harmless because a quiescent state was reported on that
CPU's behalf when RCU noticed that it was idle. The report is quashed
when RCU notices that the quiescent state being reported is for a grace
period that has already completed. Grace-period counter wrap is handled
by the infamous ->gpwrap field in the rcu_data structure.

I have seen N having four digits, with deep embedded devices being most
likely to have extremely large values of N.

							Thanx, Paul

> > > Could you provide more justification of why expediting these sections is
> > > required if the locking concerns were addressed? It would be great if you can
> > > provide performance numbers with only the first patch and without the second
> > > patch. That way we can quantify this patch.
> > >
> > >
> > SMT Mode | Without Patch (Base) | Both patches applied | % Improvement |
> > ---------|----------------------|----------------------|---------------|
> > SMT=off  | 16m 13.956s          | 6m 18.435s           | +61.14 %      |
> > SMT=on   | 12m 0.982s           | 5m 59.576s           | +50.10 %      |
> >
> > When I tested the below patch independently, I did not observe any
> > improvements for either smt=on or smt=off. However, in the smt=off scenario,
> > I encountered hung task splats (with call traces), where some threads were
> > blocked on cpus_read_lock. Please also refer to the attached call trace
> > below.
> > Patch 1:
> > https://lore.kernel.org/all/20260218083915.660252-4-vishalc@linux.ibm.com/
> >
> > SMT Mode | Without Patch (Base) | Just patch 1 applied | % Improvement |
> > ---------|----------------------|----------------------|---------------|
> > SMT=off  | 16m 13.956s          | 16m 9.793s           | +0.43 %       |
> > SMT=on   | 12m 0.982s           | 12m 19.494s          | -2.57 %       |
> >
> >
> > Call traces:
> > 12377] [ T8746] Tainted: G E 7.0.0-rc1-150700.51-default-dirty #1
> > [ 1477.612384] [ T8746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 1477.612389] [ T8746] task:systemd state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00040000
> > [ 1477.612397] [ T8746] Call Trace:
> > [ 1477.612399] [ T8746] [c00000000cc0f4f0] [0000000000100000] 0x100000 (unreliable)
> > [ 1477.612416] [ T8746] [c00000000cc0f6a0] [c00000000001fe5c] __switch_to+0x1dc/0x290
> > [ 1477.612425] [ T8746] [c00000000cc0f6f0] [c0000000012598ac] __schedule+0x40c/0x1a70
> > [ 1477.612433] [ T8746] [c00000000cc0f840] [c00000000125af58] schedule+0x48/0x1a0
> > [ 1477.612439] [ T8746] [c00000000cc0f870] [c0000000002e27b8] percpu_rwsem_wait+0x198/0x200
> > [ 1477.612445] [ T8746] [c00000000cc0f8f0] [c000000001262930] __percpu_down_read+0xb0/0x210
> > [ 1477.612449] [ T8746] [c00000000cc0f930] [c00000000022f400] cpus_read_lock+0xc0/0xd0
> > [ 1477.612456] [ T8746] [c00000000cc0f950] [c0000000003a6398] cgroup_procs_write_start+0x328/0x410
> > [ 1477.612462] [ T8746] [c00000000cc0fa00] [c0000000003a9620] __cgroup_procs_write+0x70/0x2c0
> > [ 1477.612468] [ T8746] [c00000000cc0fac0] [c0000000003a98e8] cgroup_procs_write+0x28/0x50
> > [ 1477.612473] [ T8746] [c00000000cc0faf0] [c0000000003a1624] cgroup_file_write+0xb4/0x240
> > [ 1477.612478] [ T8746] [c00000000cc0fb50] [c000000000853ba8] kernfs_fop_write_iter+0x1a8/0x2a0
> > [ 1477.612485] [ T8746] [c00000000cc0fba0] [c000000000733d5c] vfs_write+0x27c/0x540
> > [ 1477.612491] [ T8746] [c00000000cc0fc50] [c000000000734350] ksys_write+0x80/0x150
> > [ 1477.612495] [ T8746] [c00000000cc0fca0] [c000000000032898] system_call_exception+0x148/0x320
> > [ 1477.612500] [ T8746] [c00000000cc0fe50] [c00000000000d6a0] system_call_common+0x160/0x2c4
> > [ 1477.612506] [ T8746] ---- interrupt: c00 at 0x7fffa8f73df4
> > [ 1477.612509] [ T8746] NIP: 00007fffa8f73df4 LR: 00007fffa8eb6144 CTR: 0000000000000000
> > [ 1477.612512] [ T8746] REGS: c00000000cc0fe80 TRAP: 0c00 Tainted: G E (7.0.0-rc1-150700.51-default-dirty)
> > [ 1477.612515] [ T8746] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 28002288 XER: 00000000
> >
> >
>
> Default timeout is set to 8 mins.
>
> $ grep . /proc/sys/kernel/hung_task_timeout_secs
> /proc/sys/kernel/hung_task_timeout_secs:480
>
> Now that cpus_write_lock is taken only once, and an SMT mode switch can
> take tens of minutes to complete and relinquish the lock, threads waiting
> on cpus_read_lock will be blocked for this entire duration.
>
> Although no splats were observed in the "both patches applied" case,
> the issue still remains.
>
> regards,
> vishal