From: Joel Fernandes <joel@joelfernandes.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Vineeth Pillai <vineethrp@google.com>,
Suleiman Souhlal <suleiman@google.com>,
Hsin Yi <hsinyi@google.com>,
Frederic Weisbecker <frederic@kernel.org>,
"Paul E . McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH RFC] sched/fair: Avoid unnecessary IPIs for ILB
Date: Fri, 6 Oct 2023 16:32:16 +0000 [thread overview]
Message-ID: <20231006163216.GA2188081@google.com> (raw)
In-Reply-To: <ZR/mvd8Uw8PG+jx0@gmail.com>
On Fri, Oct 06, 2023 at 12:51:41PM +0200, Ingo Molnar wrote:
>
> * Joel Fernandes (Google) <joel@joelfernandes.org> wrote:
>
> > From: Vineeth Pillai <vineethrp@google.com>
> >
> > Whenever a CPU stops its tick, it now requires another idle CPU to handle the
> > balancing for it because it can't perform its own periodic load balancing.
> > This means it might need to update 'nohz.next_balance' to 'rq->next_balance' if
> > the upcoming nohz-idle load balancing is too distant in the future. This update
> > process is done by triggering an ILB, as the general ILB handler
> > (_nohz_idle_balance) that manages regular nohz balancing also refreshes
> > 'nohz.next_balance' by looking at the 'rq->next_balance' of all other idle CPUs
> > and selecting the smallest value.
> >
[...snip...]
> > kernel/sched/fair.c | 21 ++++++++++++++-------
> > 1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index cb225921bbca..2ece55f32782 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -11786,13 +11786,12 @@ void nohz_balance_enter_idle(int cpu)
> > /*
> > * Ensures that if nohz_idle_balance() fails to observe our
> > * @idle_cpus_mask store, it must observe the @has_blocked
> > - * and @needs_update stores.
> > + * stores.
> > */
> > smp_mb__after_atomic();
> >
> > set_cpu_sd_state_idle(cpu);
> >
> > - WRITE_ONCE(nohz.needs_update, 1);
> > out:
> > /*
> > * Each time a cpu enter idle, we assume that it has blocked load and
> > @@ -11945,21 +11944,25 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> > }
> >
> > /*
> > - * Check if we need to run the ILB for updating blocked load before entering
> > - * idle state.
> > + * Check if we need to run the ILB for updating blocked load and/or updating
> > + * nohz.next_balance before entering idle state.
> > */
> > void nohz_run_idle_balance(int cpu)
> > {
> > unsigned int flags;
> >
> > - flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
> > + flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK | NOHZ_NEXT_KICK, nohz_flags(cpu));
> > +
> > + if (!flags)
> > + return;
> >
> > /*
> > * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
> > * (ie NOHZ_STATS_KICK set) and will do the same.
> > */
> > - if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
> > - _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK);
> > + if ((flags == (flags & (NOHZ_NEXT_KICK | NOHZ_NEWILB_KICK))) &&
> > + !need_resched())
> > + _nohz_idle_balance(cpu_rq(cpu), flags);
> > }
> >
> > static void nohz_newidle_balance(struct rq *this_rq)
> > @@ -11977,6 +11980,10 @@ static void nohz_newidle_balance(struct rq *this_rq)
> > if (this_rq->avg_idle < sysctl_sched_migration_cost)
> > return;
> >
> > + /* If rq->next_balance before nohz.next_balance, trigger ILB */
> > + if (time_before(this_rq->next_balance, READ_ONCE(nohz.next_balance)))
> > + atomic_or(NOHZ_NEXT_KICK, nohz_flags(this_cpu));
> > +
> > /* Don't need to update blocked load of idle CPUs*/
> > if (!READ_ONCE(nohz.has_blocked) ||
> > time_before(jiffies, READ_ONCE(nohz.next_blocked)))
>
> Ok, judging by your IPI reduction numbers this is definitely an
> optimization we want to do.
Great, thanks.
> The patch does make _nohz_idle_balance() run more parallel, as previously
> it would be generally run by the first-idle CPU in nohz.idle_cpus_mask (at
> least for next_balance updates), but I think it's still SMP-safe, as all
> key data structure updates are already rq-locked AFAICS.
Yes true, we are looking into the parallelism aspect more and will update on
how it goes. Ideally, we'd like to ensure that nohz.next_balance is set to
the earliest rq->next_balance even in the presence of concurrency.
Theoretically, we feel the parallelism should not increase more than the
current code but we'll look more into it.
> One thing I noticed is that we now use nohz.needs_update only in a single
> remaining case, when _nohz_idle_balance() "self-defers":
>
> /*
> * If this CPU gets work to do, stop the load balancing
> * work being done for other CPUs. Next load
> * balancing owner will pick it up.
> */
> if (need_resched()) {
> if (flags & NOHZ_STATS_KICK)
> has_blocked_load = true;
> if (flags & NOHZ_NEXT_KICK)
> WRITE_ONCE(nohz.needs_update, 1);
> goto abort;
> }
>
> Getting a need-resched flag set on this CPU is a pretty dubious reason to
> skip an ILB run IMO, and we could do entirely without that complication,
> allowing us to remove the nohz.needs_update flag handling logic altogether?
Yes you are right I think, we can continue doing the ILB run in the case we
are only doing lighter-weight stats update and not full-blown idle balancing.
if (need_resched() && (flags & NOHZ_BALANCE_KICK))
goto abort;
That way we can get rid of the needs_update variable as well, as you and
Vincent pointed out. We could also add this as a separate patch in a series.
Thanks for pointing out this idea.
> If we do that then the !need_resched() flag needs to go from
> nohz_run_idle_balance() too:
>
> /*
> * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
> * (ie NOHZ_STATS_KICK set) and will do the same.
> */
> if ((flags == (flags & (NOHZ_NEXT_KICK | NOHZ_NEWILB_KICK))) &&
> !need_resched())
> _nohz_idle_balance(cpu_rq(cpu), flags);
>
> ... if I'm reading the code right that is.
Yes, that sounds right.
We will work more on this and post the next revision soon. Thank you!
- Joel & Vineeth
next prev parent reply other threads:[~2023-10-06 16:32 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-05 16:17 [PATCH RFC] sched/fair: Avoid unnecessary IPIs for ILB Joel Fernandes (Google)
2023-10-06 10:51 ` Ingo Molnar
2023-10-06 16:32 ` Joel Fernandes [this message]
2023-10-08 17:35 ` Joel Fernandes
2023-10-09 18:33 ` Vineeth Pillai
2023-10-10 7:15 ` Vincent Guittot
2023-10-10 19:32 ` Joel Fernandes
2023-10-06 13:46 ` Vincent Guittot
2023-10-06 16:46 ` Joel Fernandes
2023-10-06 19:18 ` Shrikanth Hegde
2023-10-06 20:10 ` Shrikanth Hegde
2023-10-08 16:50 ` Joel Fernandes
2023-10-06 21:20 ` Vineeth Pillai
2023-10-08 16:46 ` Joel Fernandes
2023-10-06 20:01 ` Peter Zijlstra
2023-10-08 16:39 ` Joel Fernandes
2023-10-09 11:25 ` Ingo Molnar
2023-10-09 20:11 ` Steven Rostedt
2023-10-10 17:55 ` Joel Fernandes
2023-10-19 14:56 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231006163216.GA2188081@google.com \
--to=joel@joelfernandes.org \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=frederic@kernel.org \
--cc=hsinyi@google.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=suleiman@google.com \
--cc=vincent.guittot@linaro.org \
--cc=vineethrp@google.com \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.