From: Tim Chen <tim.c.chen@linux.intel.com>
To: Shrikanth Hegde <sshegde@linux.ibm.com>,
Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>, Chen Yu <yu.c.chen@intel.com>,
Doug Nelson <doug.nelson@intel.com>,
Mohini Narkhede <mohini.narkhede@intel.com>,
linux-kernel@vger.kernel.org,
Vincent Guittot <vincent.guittot@linaro.org>,
K Prateek Nayak <kprateek.nayak@amd.com>
Subject: Re: [RESEND PATCH] sched/fair: Skip sched_balance_running cmpxchg when balance is not due
Date: Fri, 03 Oct 2025 09:37:42 -0700
Message-ID: <16f4c4312978bc1093df4cdba2f352fee33f8927.camel@linux.intel.com>
In-Reply-To: <204e1921-f3e3-41cf-bae7-36884f50503b@linux.ibm.com>

On Fri, 2025-10-03 at 10:53 +0530, Shrikanth Hegde wrote:
>
> On 10/3/25 4:30 AM, Tim Chen wrote:
> > Repost comments:
> >
> > There have been past discussions about avoiding serialization in load
> > balancing, but no objections were raised to this patch itself during
> > its last posting:
> > https://lore.kernel.org/lkml/20250416035823.1846307-1-tim.c.chen@linux.intel.com/
> >
> > Vincent and Chen Yu have already provided their Reviewed-by tags.
> >
> > We recently encountered this issue again on a 2-socket, 240-core
> > Clearwater Forest server running SPECjbb. In this case, 14% of CPU
> > cycles were wasted on unnecessary acquisitions of
> > sched_balance_running. This reinforces the need for the change, and we
> > hope it can be merged.
> >
> > Tim
> >
> > ---
> >
> > During load balancing, balancing at the LLC level and above must be
> > serialized. The scheduler currently checks the atomic
> > `sched_balance_running` flag before verifying whether a balance is
> > actually due. This causes high contention, as multiple CPUs may attempt
> > to acquire the flag concurrently.
> >
> > On a 2-socket Granite Rapids system with sub-NUMA clustering enabled
> > and running OLTP workloads, 7.6% of CPU cycles were spent on cmpxchg
> > operations for `sched_balance_running`. In most cases, the attempt
> > aborts immediately after acquisition because the load balance time is
> > not yet due.
> >
> > Fix this by checking whether a balance is due *before* trying to
> > acquire `sched_balance_running`. This avoids many wasted acquisitions
> > and reduces the cmpxchg overhead in `sched_balance_domain()` from 7.6%
> > to 0.05%. As a result, OLTP throughput improves by 11%.
> >
> > Reviewed-by: Chen Yu <yu.c.chen@intel.com>
> > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
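
As a rough illustration of the reordering described above, here is a
userspace sketch using C11 atomics. It is not the kernel code: the names
`balance_running`, `struct domain`, `balance_old()`, `balance_new()` and the
`cmpxchg_count` counter are hypothetical stand-ins, the counter is only
meaningful single-threaded, and the plain `<` comparison ignores the jiffies
wraparound that the kernel has to handle.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sched_balance_running flag. */
static atomic_int balance_running;

/* Illustration only: number of cmpxchg operations issued so far. */
static unsigned long cmpxchg_count;

/* Hypothetical, heavily simplified stand-in for per-domain state. */
struct domain {
	unsigned long next_balance;	/* time at which a balance is due */
	unsigned long interval;		/* distance between balances */
	bool needs_serialize;		/* true for LLC level and above */
};

/* Try to take the flag with one compare-and-exchange. */
static bool try_acquire(void)
{
	int expected = 0;

	cmpxchg_count++;
	return atomic_compare_exchange_strong_explicit(&balance_running,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

static void release(void)
{
	atomic_store_explicit(&balance_running, 0, memory_order_release);
}

/* Old ordering: acquire first, then discover the balance is not due. */
static bool balance_old(struct domain *d, unsigned long now)
{
	bool balanced = false;

	if (d->needs_serialize && !try_acquire())
		return false;

	if (now >= d->next_balance) {
		/* ... the actual load balancing would happen here ... */
		d->next_balance = now + d->interval;
		balanced = true;
	}

	if (d->needs_serialize)
		release();
	return balanced;
}

/* New ordering: no atomic operation at all unless a balance is due. */
static bool balance_new(struct domain *d, unsigned long now)
{
	if (now < d->next_balance)
		return false;

	if (d->needs_serialize && !try_acquire())
		return false;

	/* ... the actual load balancing would happen here ... */
	d->next_balance = now + d->interval;

	if (d->needs_serialize)
		release();
	return true;
}
```

In this sketch, a call made before `next_balance` costs one cmpxchg with the
old ordering and none with the new one, which is the saving the changelog
describes.
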
>
> Hi Tim.
>
> Fine by me. Unnecessary atomic operations do hurt on large systems.
> The further optimization that I pointed out can come later, I guess.
> That would only help further; this should be good to begin with.

Thanks for your review and your past comments. We'll look into further
optimization if we find that this becomes a hot path again.
For now, this change seems to be good enough.

Tim

>
> With that,
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>