From: Hongyan Xia <hongyan.xia2@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: linux-kernel@vger.kernel.org, qyousef@layalina.io,
mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, vschneid@redhat.com, lukasz.luba@arm.com,
mgorman@suse.de, rafael.j.wysocki@intel.com
Subject: Re: [PATCH 3/5] sched/fair: Rework feec() to use cost instead of spare capacity
Date: Fri, 6 Sep 2024 16:32:20 +0100 [thread overview]
Message-ID: <27e1b3c6-7e5b-4e52-9ba4-4e08fe7a11fc@arm.com> (raw)
In-Reply-To: <CAKfTPtAX22QPPfA1smH9eOWpqBzXZ=enUOAE8zscEFY9wBbkZQ@mail.gmail.com>
On 06/09/2024 08:08, Vincent Guittot wrote:
> On Mon, 2 Sept 2024 at 13:03, Hongyan Xia <hongyan.xia2@arm.com> wrote:
>>
>> On 30/08/2024 14:03, Vincent Guittot wrote:
>>> feec() looks for the CPU with the highest spare capacity in a PD, assuming
>>> it will be the best CPU from an energy efficiency PoV because it will
>>> require the smallest increase of OPP. Although this is generally true,
>>> this policy also filters out some other CPUs which will be just as
>>> efficient because they use the same OPP.
>>> In fact, we really care about the cost of the new OPP that will be
>>> selected to handle the waking task. In many cases, several CPUs will end
>>> up selecting the same OPP and as a result using the same energy cost. In
>>> these cases, we can use other metrics to select the best CPU for the same
>>> energy cost.
>>>
>>> Rework feec() to look first for the lowest cost in a PD and then for the
>>> most performant CPU among the CPUs sharing that cost.
>>>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> ---
>>> kernel/sched/fair.c | 466 +++++++++++++++++++++++---------------------
>>> 1 file changed, 244 insertions(+), 222 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index e67d6029b269..2273eecf6086 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> [...]
>>>
>>> - energy = em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap);
>>> +/* For the same cost, select the CPU that will provide the best performance for the task */
>>> +static bool select_best_cpu(struct energy_cpu_stat *target,
>>> + struct energy_cpu_stat *min,
>>> + int prev, struct sched_domain *sd)
>>> +{
>>> + /* Select the one with the least number of running tasks */
>>> + if (target->nr_running < min->nr_running)
>>> + return true;
>>> + if (target->nr_running > min->nr_running)
>>> + return false;
>>>
>> This makes me a bit worried about systems with coarse-grained OPPs. All
>> my dev boards and one of my old phones have <= 3 OPPs. On my Juno board,
>> the lowest OPP on the big core spans across 512 utilization, half of the
>> full capacity. Assuming a scenario where there are 4 tasks, each with
>> 300, 100, 100, 100 utilization, the placement should be 300 on one core
>> and 3 tasks with 100 on another, but the nr_running check here would
>> give 2 tasks (300 + 100) on one CPU and 2 tasks (100 + 100) on another
>> because they are still under the lowest OPP on Juno. The second CPU will
>> also finish faster and idle more than the first one.
>
> By balancing the number of tasks on each cpu, I try to minimize the
> scheduling latency. In your case above, tasks will wait for no more
> than a slice before running, whereas a task might have to wait up to 2
> slices if I put all the (100 utilization) tasks on the same CPU.
Viewed from another angle, we are now asking the 300 task (which
potentially has a heavier workload to finish) to compete with a 100
task, so one core finishes faster and the other takes longer,
making the overall execution time longer.
>>
>> To give an extreme example, assuming the system has only one OPP (such a
>> system is dumb to begin with, but just to make a point), before this
>> patch EAS would still work okay in task placement, but after this patch,
>
> Not sure what you mean by would still work okay. Do you have an
> example in mind that would not work correctly ?
With only one OPP, this patch will balance task placement purely on the
number of tasks without considering utilization, and I don't think
that's entirely acceptable (I actually have to deal with such a device
with only one OPP in real life, although that's the fault of the
device). Before, we were still balancing on total utilization, which
results in the lowest execution time.
>
>> EAS would just balance on the number of tasks, regardless of utilization
>> of tasks on wake-up.
>
> You have to keep in mind that utilization is already taken into
> account to check whether the task fits the CPU and when selecting the
> OPP (which is a no-op in the case of one OPP). So we know that there is
> enough capacity for the waking task.
Still, taking my Juno board as an example where the 1st OPP is at
utilization 512: assuming no 25% margin, four tasks with utilization
200, 200, 50, 50 and two CPUs, I would strongly favor 200 + 50 on each
CPU rather than 200 + 200 on one and 50 + 50 on the other.
However, with this patch, these two scheduling decisions look the same,
as long as both CPUs stay under the 512 OPP.
Of course, this becomes less of a problem with fine-grained OPPs. On my
Pixel 6 with 18 OPPs on one CPU, I don't have such concerns.
>>
>> I wonder if there is a way to still take total utilization as a factor.
>
> utilization is still used to check that the task utilization fits with
> current cpu utilization and then to select the OPP. At this step we
> know that there is enough capacity for everybody
>
>> It used to be 100% of the decision making, but maybe now it is only 60%,
>> and the other 40% are things like number of tasks and contention.
>>
>>> - trace_sched_compute_energy_tp(p, dst_cpu, energy, max_util, busy_time);
>>> + /* Favor previous CPU otherwise */
>>> + if (target->cpu == prev)
>>> + return true;
>>> + if (min->cpu == prev)
>>> + return false;
>>>
>>> - return energy;
>>> + /*
>>> + * Choose CPU with lowest contention. One might want to consider load instead of
>>> + * runnable but we are supposed to not be overutilized so there is enough compute
>>> + * capacity for everybody.
>>> + */
>>> + if ((target->runnable * min->capa * sd->imbalance_pct) >=
>>> + (min->runnable * target->capa * 100))
>>> + return false;
>>> +
>>> + return true;
>>> }
>>> [...]
>>