public inbox for linux-kernel@vger.kernel.org
From: Pierre Gondois <pierre.gondois@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>,
	Xuewen Yan <xuewen.yan94@gmail.com>
Cc: Xuewen Yan <xuewen.yan@unisoc.com>,
	dietmar.eggemann@arm.com, mingo@redhat.com, peterz@infradead.org,
	juri.lelli@redhat.com, qyousef@layalina.io, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, christian.loehle@arm.com,
	vincent.donnefort@arm.com, ke.wang@unisoc.com,
	di.shen@unisoc.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
Date: Thu, 4 Jul 2024 19:01:27 +0200	[thread overview]
Message-ID: <c4eaadcd-e563-41cf-b174-cd0fb4453c6f@arm.com> (raw)
In-Reply-To: <CAKfTPtDuuF8XX3gbsjF_Vgys2UtbzwtaX9QFgA-3kZ7+eqk02w@mail.gmail.com>

Hello,

On 6/27/24 18:15, Vincent Guittot wrote:
> On Thu, 27 Jun 2024 at 04:02, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>>
>> On Tue, Jun 25, 2024 at 9:05 PM Vincent Guittot
>> <vincent.guittot@linaro.org> wrote:
>>>
>>> On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>>>>
>>>> Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
>>>> changed the PD's util capping from per-CPU to per-PD. But because
>>>> effective_cpu_util() may return a util that is bigger than the
>>>> actual_cpu_capacity, this could make the pd_busy_time
>>>> calculation wrong.
>>>
>>> I'm still not convinced that this is an error. Your example used for v1 is :
>>>
>>> The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
>>> of cpufreq-limit, the cpu_actual_cap = 512.
>>>
>>> Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
>>> effective_cpu_util(4) = 1024;
>>> effective_cpu_util(5) = 1024;
>>> effective_cpu_util(6) = 256;
>>> effective_cpu_util(7) = 0;
>>>
>>> so env->pd_busy_time = 2304

IIUC, with this configuration:
Before patch:
- env->pd_busy_time = 2304
After patch:
- env->pd_busy_time = 512 + 512 + 256 + 0 = 1280

But when computing the energy for the task to place:
compute_energy()
{
   if (dst_cpu >= 0)
     busy_time = min(eenv->pd_cap, busy_time + eenv->task_busy_time);
}

the contribution of the task might be truncated.
E.g.:
Trying to place a task on CPU7 with task_busy_time=300.
Then in compute_energy():

Before patch:
busy_time = min(2048, 2304 + 300)
busy_time = 2048
-> busy_time delta = 2048 - 2304 = -256

After patch:
busy_time = min(2048, 1280 + 300)
busy_time = 1580
-> busy_time delta = 1580 - 1280 = 300

>>>
>>> Even if effective_cpu_util(4) = 1024 is above the current max compute
>>> capacity of 512, it also means that the activity of cpu4 will run twice
>>> as long. If you cap effective_cpu_util(4) to 512, you miss the
>>> information that it will run twice as long at the selected OPP. The
>>> extreme case being:
>>> effective_cpu_util(4) = 1024;
>>> effective_cpu_util(5) = 1024;
>>> effective_cpu_util(6) = 1024;
>>> effective_cpu_util(7) = 1024;
>>>
>>> in this case env->pd_busy_time = 4096
>>>
>>> If we cap, we can't make any difference between the 2 cases
>>>
>>> Do you have more details about the problem you are facing ?
>>
>> Because of the cpufreq limit, the OPP is also limited, and in compute_energy():
>>
>> energy = ps->cost * sum_util = ps->cost * eenv->pd_busy_time;
>>
>> Because of the cpufreq limit, ps->cost is the cost of the limited
>> frequency's OPP instead of the max frequency's cost.
>> So the energy is determined by pd_busy_time.
>>
>> Still the example above:
>>
>> The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
>> of cpufreq-limit, the cpu_actual_cap = 512.
>>
>> Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
>> effective_cpu_util(4) = 1024;
>> effective_cpu_util(5) = 1024;
>> effective_cpu_util(6) = 256;
>> effective_cpu_util(7) = 0;
>>
>> Before the patch:
>> env->pd_busy_time = min(1024+1024+256, eenv->pd_cap) = 2048.
>> However, because effective_cpu_util(7) = 0, this 2048 is in fact bigger
>> than the actual capacity the busy CPUs can provide (3 * 512 = 1536).
>>
>> After the patch:
>> cpu_util(4) = min(1024, eenv->cpu_cap) = 512;
>> cpu_util(5) = min(1024, eenv->cpu_cap) = 512;
>> cpu_util(6) = min(256, eenv->cpu_cap) = 256;
>> cpu_util(7) = 0;
>> env->pd_busy_time = min(512+512+256, eenv->pd_cap) = 1280.

If we take a similar example, on a pd with 2 CPUs and:
- effective_cpu_util(0) = 1024
- effective_cpu_util(1) = 0
and:
- eenv->cpu_cap = 100
- eenv->pd_cap = 100 + 100 = 200

Before the patch:
- env->pd_busy_time = min(1024 + 0, eenv->pd_cap) = 200

After the patch:
- cpu_util(0) = min(1024, eenv->cpu_cap) = 100
- cpu_util(1) = min(0, eenv->cpu_cap) = 0
and:
- env->pd_busy_time = min(100 + 0, eenv->pd_cap) = 100

-------------

Same example while adding additional empty CPUs:
- effective_cpu_util(0) = 1024
- effective_cpu_util(1) = 0
- effective_cpu_util(2) = 0
- effective_cpu_util(3) = 0
and:
- eenv->cpu_cap = 100
- eenv->pd_cap = 100 * 4 = 400

Before the patch:
- env->pd_busy_time = min(1024 + 0 + 0 + 0, eenv->pd_cap) = 400

After the patch:
- cpu_util(0) = min(1024, eenv->cpu_cap) = 100
- cpu_util(1) = min(0, eenv->cpu_cap) = 0
- ...
and:
- env->pd_busy_time = min(100 + 0 + 0 + 0, eenv->pd_cap) = 100

-------------

What seems strange is that the more empty CPUs we add to a pd,
the more energy the first (before-patch) computation reports, which
indeed seems incorrect (unless I missed something).

>>
>> As a result, without this patch, the energy is bigger than actual_energy.
>>
>> And even if cpu4 runs twice as long, the energy may not be equal.
>> Because:
>> *             ps->power * cpu_max_freq
>> *   cpu_nrg = ------------------------ * cpu_util           (3)
>> *               ps->freq * scale_cpu
>>
>> the ps->power = C * f * V^2, and then:
>>
>> *                C * V^2 * cpu_max_freq
>> *   cpu_nrg = ------------------------ * cpu_util           (3)
>> *                    scale_cpu
>>
>> because the limited-freq's voltage is not equal to the max-freq's voltage.
> 
> I'm still struggling to understand why it's wrong. If the frequency is
> capped, we will never go above this limited frequency and its
> associated voltage, so there is no reason to consider the max-freq's
> voltage. If there is more work to do than the actual capacity can do
> per unit of time, then we will simply run for more than 1 unit of time.
> 
> nrg of PD = /Sum(cpu) ps->power * cpu-running-time
> 
> ps->power is fixed because of the limited frequency constraint
> 
> we estimate cpu-running-time = utilization / ps->performance
> with
> - utilization = util_avg
> - performance = ps->freq / cpu_max_freq * arch_scale_cpu_capacity() =
> ps->performance
> 
> Up to now we were assuming that utilization was always lower than
> performance; otherwise the system was overutilized and we fall back to
> performance mode. But when the frequency of a cpu is limited by
> userspace or thermal mitigation, the utilization can become higher
> than the limited capacity, which translates to the cpu running
> longer.

I thought EAS was comparing instantaneous power and not energy,
i.e., given how the energy computation is done:

   *             ps->power * cpu_max_freq
   *   cpu_nrg = ------------------------ * cpu_util           (3)
   *               ps->freq * scale_cpu

cpu_nrg should have the same dimension as ps->power (i.e. energy/s).
From this PoV, the energy computation should not take into account how
much time a task is expected to run. But this might be a side discussion.

Regards,
Pierre


> 
>>
>>>
>>>
>>>
>>>> So clamp the cpu_busy_time with the eenv->cpu_cap, which is
>>>> the actual_cpu_capacity.
>>>>
>>>> Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
>>>> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
>>>> Tested-by: Christian Loehle <christian.loehle@arm.com>
>>>> ---
>>>> V2:
>>>> - change commit message.
>>>> - remove the eenv->pd_cap capping in eenv_pd_busy_time(). (Dietmar)
>>>> - add Tested-by.
>>>> ---
>>>>   kernel/sched/fair.c | 9 +++++----
>>>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 8a5b1ae0aa55..5ca6396ef0b7 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -7864,16 +7864,17 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
>>>>                                       struct cpumask *pd_cpus,
>>>>                                       struct task_struct *p)
>>>>   {
>>>> -       unsigned long busy_time = 0;
>>>>          int cpu;
>>>>
>>>> +       eenv->pd_busy_time = 0;
>>>> +
>>>>          for_each_cpu(cpu, pd_cpus) {
>>>>                  unsigned long util = cpu_util(cpu, p, -1, 0);
>>>>
>>>> -               busy_time += effective_cpu_util(cpu, util, NULL, NULL);
>>>> +               util = effective_cpu_util(cpu, util, NULL, NULL);
>>>> +               util = min(eenv->cpu_cap, util);
>>>> +               eenv->pd_busy_time += util;
>>>>          }
>>>> -
>>>> -       eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
>>>>   }
>>>>
>>>>   /*
>>>> --
>>>> 2.25.1
>>>>
>>>>
> 
