All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tim Chen <tim.c.chen@linux.intel.com>
To: Chen Yu <yu.c.chen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org,
	Vinicius Gomes <vinicius.gomes@intel.com>
Subject: Re: [PATCH] sched/balance: Skip unnecessary updates to idle load balancer's flags
Date: Mon, 03 Jun 2024 09:13:47 -0700	[thread overview]
Message-ID: <c3d2a49e580cea9ae86e692f72094119310adc8f.camel@linux.intel.com> (raw)
In-Reply-To: <Zlygeqy+SVs1KZYW@chenyu5-mobl2>

On Mon, 2024-06-03 at 00:40 +0800, Chen Yu wrote:
> > 
> > With instrumentation, we found that 81% of the updates do not result in
> > any change in the ilb_cpu's flags.  That is, multiple cpus are asking
> > the ilb_cpu to do the same things over and over again, before the ilb_cpu
> > has a chance to run NOHZ load balance.
> > 
> > Skip updates to ilb_cpu's flags if no new work needs to be done.
> > Such updates do not change ilb_cpu's NOHZ flags.  This requires an extra
> > atomic read but it is less expensive than frequent unnecessary atomic
> > updates that generate cache bounces.
> 
> A race condition is that many CPUs choose the same ilb_cpu and ask it to trigger
> the nohz idle balance. This is because find_new_ilb() always finds the first
> nohz idle CPU. I wonder if we could change the
> for_each_cpu_and(ilb_cpu, nohz.idle_cpus_mask, hk_mask)
> into
> for_each_cpu_wrap(ilb_cpu,  cpumask_and(nohz.idle_cpus_mask, hk_mask), this_cpu+1) 
> so different ilb_cpu might be found by different CPUs.
> Then the extra atomic read could brings less cache bounces.
> 

Your proposal improves scaling.  However,
that could result in many idle CPUs getting kicked.  I assume that
current approach of delegating to a common idle CPU will disturb fewer CPUs
and let them stay in deeper idle states, and get the power benefits
from NOHZ scheme.

> > 
> > We saw that on the OLTP workload, cpu cycles from trigger_load_balance()
> > (or sched_balance_trigger()) got reduced from 0.7% to 0.2%.
> > 
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  kernel/sched/fair.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a5b1ae0aa55..9ab6dff6d8ac 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -11891,6 +11891,13 @@ static void kick_ilb(unsigned int flags)
> >  	if (ilb_cpu < 0)
> >  		return;
> >  
> > +	/*
> > +	 * Don't bother if no new NOHZ balance work items for ilb_cpu,
> > +	 * i.e. all bits in flags are already set in ilb_cpu.
> > +	 */
> > +	if ((atomic_read(nohz_flags(ilb_cpu)) & flags) == flags)
> 
> Maybe also mention in the comment that when above statement is true, the
> current ilb_cpu's flags is not 0 and in NOHZ_KICK_MASK, so return directly
> here is safe(anyway just 2 cents)

Not sure I follow your comments about return being safe.  Let me explain
in details.

We will return directly if and only if the bits set in flags are also set
in nohz_flags(ilb_cpu).  

The comment's intention is to say that if the above statement is true, then
the later operation of 

	atomic_fetch_or(flags, nohz_flags(ilb_cpu))

will be useless and not result in any change to nohz_flags(ilb_cpu), since all the set bits
in flags are already set in nohz_flags(ilb_cpu).

So will it be clearer if I say

	/*
	 * Don't bother if no new NOHZ balance work items for ilb_cpu,
	 * i.e. all bits in flags are already set in ilb_cpu.
	 * Later OR of flags to nohz_flags(ilb_cpu)
	 * will not change nohz_flags(ilb_cpu).
	 */

Thanks.


Tim

> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
> 
> thanks,
> Chenyu
> 
> > +		return;
> > +
> >  	/*
> >  	 * Access to rq::nohz_csd is serialized by NOHZ_KICK_MASK; he who sets
> >  	 * the first flag owns it; cleared by nohz_csd_func().
> > -- 
> > 2.32.0
> > 


  reply	other threads:[~2024-06-03 16:16 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-31 20:54 [PATCH] sched/balance: Skip unnecessary updates to idle load balancer's flags Tim Chen
2024-06-02 16:40 ` Chen Yu
2024-06-03 16:13   ` Tim Chen [this message]
2024-06-04  2:10     ` Chen Yu
2024-06-04 22:32       ` Tim Chen
2024-06-04 14:37 ` Vincent Guittot
2024-06-05 14:54 ` [tip: sched/core] " tip-bot2 for Tim Chen
2024-06-05 17:07   ` Tim Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c3d2a49e580cea9ae86e692f72094119310adc8f.camel@linux.intel.com \
    --to=tim.c.chen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vinicius.gomes@intel.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.