From: "Chen, Yu C" <yu.c.chen@intel.com>
To: Shrikanth Hegde <sshegde@linux.ibm.com>,
	Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Doug Nelson <doug.nelson@intel.com>,
	Mohini Narkhede <mohini.narkhede@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Ingo Molnar" <mingo@kernel.org>
Subject: Re: [PATCH] sched: Skip useless sched_balance_running acquisition if load balance is not due
Date: Wed, 16 Apr 2025 14:28:55 +0800
Message-ID: <667f2076-fbcd-4da7-8e4b-a8190a673355@intel.com>
In-Reply-To: <fbe29b49-92af-4b8c-b7c8-3c15405e5f15@linux.ibm.com>

Hi Shrikanth,

On 4/16/2025 1:30 PM, Shrikanth Hegde wrote:
> 
> 
> On 4/16/25 09:28, Tim Chen wrote:
>> At load balance time, balancing of last level cache domains and
>> above needs to be serialized. The scheduler checks the atomic var
>> sched_balance_running first and only then sees whether a load balance
>> is due. This is an expensive operation, as multiple CPUs can attempt
>> sched_balance_running acquisition at the same time.
>>
>> On a 2-socket Granite Rapids system with sub-NUMA clustering enabled,
>> running OLTP workloads, 7.6% of CPU cycles are spent on the cmpxchg of
>> sched_balance_running.  Most of the time, a balance attempt is aborted
>> immediately after acquiring sched_balance_running because the load
>> balance is not yet due.
>>
>> Instead, check whether the balance is due before acquiring
>> sched_balance_running. This skips many useless acquisitions
>> of sched_balance_running and knocks the 7.6% CPU overhead in
>> sched_balance_domains() down to 0.05%.  Throughput of the OLTP workload
>> improved by 11%.
>>
> 
> Hi Tim.
> 
> The time check makes sense, especially on large systems, mainly because of
> NEWIDLE balance.
> 

Could you elaborate a little on this statement? NEWLY_IDLE balance has no
timeout mechanism like the one for the periodic load balancer, right?


> One more point to add: a lot of the time, the CPU that acquired
> sched_balance_running need not end up doing the load balance, since
> it is not the CPU meant to do the load balance.
> 
> This thread.
> https://lore.kernel.org/all/1e43e783-55e7-417f-a1a7-503229eb163a@linux.ibm.com/
> 
> 
> The best thing is probably to acquire it only if this CPU has passed the
> time check and is actually going to do the load balance.
> 
> 

This is a good point. We might want to handle only the periodic load
balancer here rather than NEWLY_IDLE balance, because the latter runs too
frequently and contention on sched_balance_running could cause heavy
cache contention.
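
For reference, the ordering the patch moves to (check whether the balance
is due first, and only then try to take sched_balance_running) can be
modelled in userspace roughly as below. This is only a sketch using C11
atomics, not kernel code: balance_running, tick_after_eq() and
try_balance() are hypothetical names, and the balance work itself is
left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's sched_balance_running flag. */
static atomic_int balance_running;

/* Wrap-safe "a is at or after b", the same idea as time_after_eq(). */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Returns true if this caller ran the (simulated) balance pass.
 * The due-time check is a plain local read; the compare-and-swap on
 * the shared flag is attempted only once the interval has elapsed.
 */
static bool try_balance(unsigned long now, unsigned long *last_balance,
			unsigned long interval, bool need_serialize)
{
	if (!tick_after_eq(now, *last_balance + interval))
		return false;	/* not due: the shared flag is never touched */

	if (need_serialize) {
		int expected = 0;

		if (!atomic_compare_exchange_strong_explicit(&balance_running,
				&expected, 1,
				memory_order_acquire, memory_order_relaxed))
			return false;	/* someone else is already balancing */
	}

	/* ... the actual balance work would happen here ... */
	*last_balance = now;

	if (need_serialize)
		atomic_store_explicit(&balance_running, 0,
				      memory_order_release);
	return true;
}
```

With this ordering, a caller whose interval has not elapsed returns
before issuing any atomic read-modify-write, so it does not pull the
flag's cache line into exclusive state.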

thanks,
Chenyu

>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>> Reported-by: Mohini Narkhede <mohini.narkhede@intel.com>
>> Tested-by: Mohini Narkhede <mohini.narkhede@intel.com>
>> ---
>>   kernel/sched/fair.c | 16 ++++++++--------
>>   1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e43993a4e580..5e5f7a770b2f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -12220,13 +12220,13 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
>>           interval = get_sd_balance_interval(sd, busy);
>> -        need_serialize = sd->flags & SD_SERIALIZE;
>> -        if (need_serialize) {
>> -            if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
>> -                goto out;
>> -        }
>> -
>>           if (time_after_eq(jiffies, sd->last_balance + interval)) {
>> +            need_serialize = sd->flags & SD_SERIALIZE;
>> +            if (need_serialize) {
>> +                if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
>> +                    goto out;
>> +            }
>> +
>>               if (sched_balance_rq(cpu, rq, sd, idle, &continue_balancing)) {
>>                   /*
>>                    * The LBF_DST_PINNED logic could have changed
>> @@ -12238,9 +12238,9 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
>>               }
>>               sd->last_balance = jiffies;
>>               interval = get_sd_balance_interval(sd, busy);
>> +            if (need_serialize)
>> +                atomic_set_release(&sched_balance_running, 0);
>>           }
>> -        if (need_serialize)
>> -            atomic_set_release(&sched_balance_running, 0);
>>   out:
>>           if (time_after(next_balance, sd->last_balance + interval)) {
>>               next_balance = sd->last_balance + interval;
> 

Thread overview: 23+ messages
2025-04-16  3:58 [PATCH] sched: Skip useless sched_balance_running acquisition if load balance is not due Tim Chen
2025-04-16  5:30 ` Shrikanth Hegde
2025-04-16  6:28   ` Chen, Yu C [this message]
2025-04-16  9:16     ` Shrikanth Hegde
2025-04-16  9:29       ` Shrikanth Hegde
2025-04-16  9:47         ` Vincent Guittot
2025-04-16 14:14           ` Shrikanth Hegde
2025-04-17 11:10             ` K Prateek Nayak
2025-04-18 15:02             ` Vincent Guittot
2025-04-18 17:55               ` Shrikanth Hegde
2025-04-17 11:31           ` K Prateek Nayak
2025-04-17 12:01             ` Peter Zijlstra
2025-04-18  5:26               ` K Prateek Nayak
2025-04-18  9:28                 ` Peter Zijlstra
2025-04-18 12:13                   ` K Prateek Nayak
2025-04-16 16:19       ` Tim Chen
2025-04-16 17:11         ` Shrikanth Hegde
2025-04-17  9:19         ` Shrikanth Hegde
2025-04-17 17:12           ` Tim Chen
2025-05-29  9:00 ` K Prateek Nayak
2025-06-04  4:26 ` Chen, Yu C
2025-06-06 13:51 ` Vincent Guittot
2025-10-27 18:06   ` Mel Gorman
