From: Hongyan Xia <hongyan.xia2@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: linux-kernel@vger.kernel.org, qyousef@layalina.io,
mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, vschneid@redhat.com, lukasz.luba@arm.com,
mgorman@suse.de, rafael.j.wysocki@intel.com
Subject: Re: [PATCH 3/5] sched/fair: Rework feec() to use cost instead of spare capacity
Date: Fri, 6 Sep 2024 16:32:20 +0100 [thread overview]
Message-ID: <27e1b3c6-7e5b-4e52-9ba4-4e08fe7a11fc@arm.com> (raw)
In-Reply-To: <CAKfTPtAX22QPPfA1smH9eOWpqBzXZ=enUOAE8zscEFY9wBbkZQ@mail.gmail.com>
On 06/09/2024 08:08, Vincent Guittot wrote:
> On Mon, 2 Sept 2024 at 13:03, Hongyan Xia <hongyan.xia2@arm.com> wrote:
>>
>> On 30/08/2024 14:03, Vincent Guittot wrote:
>>> feec() looks for the CPU with the highest spare capacity in a PD, assuming
>>> it will be the best CPU from an energy efficiency PoV because it will
>>> require the smallest increase of OPP. Although this is generally true,
>>> this policy also filters out some other CPUs which will be just as
>>> efficient because they use the same OPP.
>>> In fact, we really care about the cost of the new OPP that will be
>>> selected to handle the waking task. In many cases, several CPUs will end
>>> up selecting the same OPP and as a result using the same energy cost. In
>>> these cases, we can use other metrics to select the best CPU for the same
>>> energy cost.
>>>
>>> Rework feec() to look first for the lowest cost in a PD and then for the
>>> most performant CPU among the CPUs sharing that cost.
>>>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> ---
>>> kernel/sched/fair.c | 466 +++++++++++++++++++++++---------------------
>>> 1 file changed, 244 insertions(+), 222 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index e67d6029b269..2273eecf6086 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> [...]
>>>
>>> - energy = em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap);
>>> +/* For the same cost, select the CPU that will provide the best performance for the task */
>>> +static bool select_best_cpu(struct energy_cpu_stat *target,
>>> + struct energy_cpu_stat *min,
>>> + int prev, struct sched_domain *sd)
>>> +{
>>> + /* Select the one with the least number of running tasks */
>>> + if (target->nr_running < min->nr_running)
>>> + return true;
>>> + if (target->nr_running > min->nr_running)
>>> + return false;
>>>
>> This makes me a bit worried about systems with coarse-grained OPPs. All
>> my dev boards and one of my old phones have <= 3 OPPs. On my Juno board,
>> the lowest OPP on the big core spans across 512 utilization, half of the
>> full capacity. Assuming a scenario where there are 4 tasks, each with
>> 300, 100, 100, 100 utilization, the placement should be 300 on one core
>> and 3 tasks with 100 on another, but the nr_running check here would
>> give 2 tasks (300 + 100) on one CPU and 2 tasks (100 + 100) on another
>> because they are still under the lowest OPP on Juno. The second CPU will
>> also finish faster and idle more than the first one.
>
> By balancing the number of tasks on each cpu, I try to minimize the
> scheduling latency. In your case above, tasks will wait for no more
> than a slice before running, whereas a task might have to wait up to 2
> slices if I put all the (100 utilization) tasks on the same CPU.
Viewed from another angle, we are now asking the 300 task (which
potentially has a heavier workload to finish) to compete with a 100
task, so one core finishes faster and the other takes longer,
making the overall execution time longer.
>>
>> To give an extreme example, assuming the system has only one OPP (such a
>> system is dumb to begin with, but just to make a point), before this
>> patch EAS would still work okay in task placement, but after this patch,
>
> Not sure what you mean by would still work okay. Do you have an
> example in mind that would not work correctly ?
With only one OPP, this patch will balance task placement purely on the
number of tasks without considering utilization, and I don't think
that's entirely acceptable (I actually have to deal with such a device
with only one OPP in real life, although that's the fault of the
device). Before, we were still balancing on total utilization, which
results in the lowest execution time.
>
>> EAS would just balance on the number of tasks, regardless of utilization
>> of tasks on wake-up.
>
> You have to keep in mind that utilization is already taken into
> account to check whether the task fits the CPU and when selecting the
> OPP (which is a no-op in the case of one OPP). So we know that there is
> enough capacity for the waking task.
Still, taking my Juno board as an example where the 1st OPP is at
utilization 512: assuming no 25% margin, four tasks with utilization
200, 200, 50, 50 and two CPUs, I would strongly favor 200 + 50 on each
CPU rather than 200 + 200 on one and 50 + 50 on the other.
However, with this patch, these two scheduling decisions look the same,
as long as both CPUs stay under the 512 OPP.
Of course, this becomes less of a problem with fine-grained OPPs. On my
Pixel 6 with 18 OPPs on one CPU, I don't have such concerns.
>>
>> I wonder if there is a way to still take total utilization as a factor.
>
> utilization is still used to check that the task utilization fits with
> current cpu utilization and then to select the OPP. At this step we
> know that there is enough capacity for everybody
>
>> It used to be 100% of the decision making, but maybe now it is only 60%,
>> and the other 40% are things like number of tasks and contention.
>>
>>> - trace_sched_compute_energy_tp(p, dst_cpu, energy, max_util, busy_time);
>>> + /* Favor previous CPU otherwise */
>>> + if (target->cpu == prev)
>>> + return true;
>>> + if (min->cpu == prev)
>>> + return false;
>>>
>>> - return energy;
>>> + /*
>>> + * Choose CPU with lowest contention. One might want to consider load instead of
>>> + * runnable but we are supposed to not be overutilized so there is enough compute
>>> + * capacity for everybody.
>>> + */
>>> + if ((target->runnable * min->capa * sd->imbalance_pct) >=
>>> + (min->runnable * target->capa * 100))
>>> + return false;
>>> +
>>> + return true;
>>> }
>>> [...]
>>