From: Mel Gorman <mgorman@techsingularity.net>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Valentin Schneider <valentin.schneider@arm.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] sched/fair: Age the average idle time
Date: Thu, 17 Jun 2021 16:47:59 +0100 [thread overview]
Message-ID: <20210617154759.GR30378@techsingularity.net> (raw)
In-Reply-To: <CAKfTPtBJkpSMFFGwgdFLyO5aSnGuzQSPrtpwOFckMQa4xaex=Q@mail.gmail.com>
On Thu, Jun 17, 2021 at 05:03:54PM +0200, Vincent Guittot wrote:
> On Thu, 17 Jun 2021 at 13:05, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Thu, Jun 17, 2021 at 12:02:56PM +0200, Vincent Guittot wrote:
> > > > > >
> > > > > > Fundamentally though, as the changelog notes "due to the nature of the
> > > > > > patch, this is a regression magnet". There are going to be examples
> > > > > > where a deep search is better even if a machine is fully busy or
> > > > > > overloaded and examples where cutting off the search is better. I think
> > > > > > it's better to have an idle estimate that gets updated if CPUs are fully
> > > > > > busy even if it's not a universal win.
> > > > >
> > > > > Although I agree that using a stall average idle time value of local
> > > > > is not good, I'm not sure this proposal is better. The main problem is
> > > > > that we use the avg_idle of the local CPU to estimate how many times
> > > > > we should loop and try to find another idle CPU. But there is no
> > > > > direct relation between both.
> > > >
> > > > This is true. The idle time of the local CPU is used to estimate the
> > > > idle time of the domain which is inevitably going to be inaccurate but
> > >
> > > I'm more and more convinced that using average idle time (of the
> > > local cpu or the full domain) is not the right metric. In
> > > select_idle_cpu(), we looks for an idle CPU but we don't care about
> > > how long it will be idle.
> >
> > Can we predict that accurately? cpufreq for intel_pstate used to try
> > something like that but it was a bit fuzzy and I don't know if the
> > scheduler could do much better. There is some idle prediction stuff but
> > it's related to nohz which does not really help us if a machine is nearly
> > fully busy or overloaded.
> >
> > I guess for tracking idle that revisiting
> > https://lore.kernel.org/lkml/1615872606-56087-1-git-send-email-aubrey.li@intel.com/
> > is an option now that the scan is somewhat unified. A two-pass scan
> > could be used to check potentially idle CPUs first and if there is
> > sufficient search depth left, scan other CPUs. There were some questions
>
> I think it's the other way around:
> a CPU is busy for sure if it is not set in the cpuidle_mask and we
> don't need to check it. But a cpu might not be idle even if it is set
> in the idle mask might because it's cleared during the tick
>
Tick is a long time so scan depth may still be a problem.
> > Selecting based on avg idle time could be interesting but hazardous. If
> > for example, we prioritised selecting a CPU that is mostly idle, it'll
> > also pick CPUs that are potentially in a deep idle state incurring a
> > larger wakeup cost. Right now we are not much better because we just
> > select an idle CPU and hope for the best but always targetting the most
> > idle CPU could have problems. There would also be the cost of tracking
> > idle CPUs in priority order. It would eliminate the scan depth cost
> > calculations but the overall cost would be much worse.
> >
> > Hence, I still think we can improve the scan depth costs in the short
> > term until a replacement is identified that works reasonably well.
> >
> > > Even more, we can scan all CPUs whatever the
> > > avg idle time if there is a chance that there is an idle core.
> > >
> >
> > That is an important, but separate topic. It's known that the idle core
> > detection can yield false positives. Putting core scanning under SIS_PROP
> > had mixed results when we last tried but things change. Again, it doesn't
> > help with scan depth calculations.
>
> my point was mainly to highlight that the path can take opposite
> decision for the same avg_idle value:
> - scan all cpus if has_idle_core is true whatever avg_idle
> - limit the depth if has_idle_core is false and avg_idle is short
>
I do understand the point but the idle core scan anomaly was not
intended to be addressed in the patch because putting the idle scan
under SIS_PROP potentially means using cpus with active idle siblings
prematurely.
> >
> > > > tracking idle time for the domain will be cache write intensive and
> > > > potentially very expensive. I think this was discussed before but maybe
> > > > it is my imaginaction.
> > > >
> > > > > Typically, a short average idle time on
> > > > > the local CPU doesn't mean that there are less idle CPUs and that's
> > > > > why we have a mix a gain and loss
> > > > >
> > > >
> > > > Can you evaluate if scanning proportional to cores helps if applied on
> > > > top? The patch below is a bit of pick&mix and has only seen a basic build
> > >
> > > I will queue it for some test later today
> > >
> >
> > Thanks. The proposed patch since passed a build and boot test,
> > performance evaluation is under way but as it's x86 and SMT2, I'm mostly
> > just checking that it's neutral.
>
> Results stay similar:
> group tip/sched/core + this patch + latest addon
> 1 13.358(+/- 1.82%) 12.850(+/- 2.21%) +4% 13.411(+/- 2.47%) -0%
> 4 4.286(+/- 2.77%) 4.114(+/- 2.25%) +4% 4.163(+/- 1.88%) +3%
> 16 3.175(+/- 0.55%) 3.559(+/- 0.43%) -12% 3.535(+/- 0.52%) -11%
> 32 2.912(+/- 0.79%) 3.165(+/- 0.95%) -8% 3.153(+/- 0.76%) -10%
> 64 2.859(+/- 1.12%) 2.937(+/- 0.91%) -3% 2.919(+/- 0.73%) -2%
> 128 3.092(+/- 4.75%) 3.003(+/-5.18%) +3% 2.973(+/- 0.90%) +4%
> 256 3.233(+/- 3.03%) 2.973(+/- 0.80%) +8% 3.036(+/- 1.05%) +6%
>
Ok, accounting for SMT4 didn't help.
--
Mel Gorman
SUSE Labs
next prev parent reply other threads:[~2021-06-17 15:48 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-15 11:16 [PATCH v2] sched/fair: Age the average idle time Mel Gorman
2021-06-15 20:42 ` Peter Zijlstra
2021-06-15 21:13 ` Phil Auld
2021-06-16 9:03 ` Mel Gorman
2021-06-17 15:01 ` Phil Auld
2021-06-17 15:40 ` Mel Gorman
2021-06-18 19:22 ` Phil Auld
2021-06-16 15:52 ` Vincent Guittot
2021-06-17 7:44 ` Mel Gorman
2021-06-17 8:30 ` Vincent Guittot
2021-06-17 9:40 ` Mel Gorman
2021-06-17 10:02 ` Vincent Guittot
2021-06-17 11:05 ` Mel Gorman
2021-06-17 15:03 ` Vincent Guittot
2021-06-17 15:47 ` Mel Gorman [this message]
2021-06-17 16:05 ` Vincent Guittot
2021-06-18 8:46 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-06-23 14:46 ` [sched/fair] 5359f5ca0f: phoronix-test-suite.stress-ng.SystemVMessagePassing.bogo_ops_s 77.9% improvement kernel test robot
2021-06-23 14:46 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210617154759.GR30378@techsingularity.net \
--to=mgorman@techsingularity.net \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=valentin.schneider@arm.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.