public inbox for linux-kernel@vger.kernel.org
From: Pierre Gondois <pierre.gondois@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>,
	mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
	linux-kernel@vger.kernel.org, kprateek.nayak@amd.com
Cc: qyousef@layalina.io, christian.loehle@arm.com
Subject: Re: [PATCH 0/6 v8] sched/fair: Add push task mechanism and handle more EAS cases
Date: Thu, 26 Feb 2026 18:34:04 +0100	[thread overview]
Message-ID: <520385b2-545a-4ac7-ae05-06d24845f512@arm.com> (raw)
In-Reply-To: <20251202181242.1536213-1-vincent.guittot@linaro.org>


On 12/2/25 19:12, Vincent Guittot wrote:
> This is a subset of [1] (sched/fair: Rework EAS to handle more cases)
>
> [1] https://lore.kernel.org/all/20250314163614.1356125-1-vincent.guittot@linaro.org/
>
> The current Energy Aware Scheduler has some known limitations which have
> become more and more visible with features like uclamp. This
> series tries to fix some of those issues:
> - tasks stacked on the same CPU of a PD
> - tasks stuck on the wrong CPU.

Following some other comments, I'm not sure I understand the use case
the patchset tries to solve.
- If this is for UCLAMP_MAX tasks:
As Christian said (somewhere), the utilization of a long-running task doesn't
represent anything, so using EAS to do task placement cannot give a good
placement. The push mechanism effectively allows down-migrating UCLAMP_MAX
tasks, but the repartition of these tasks is then subject to randomness.

On a Radxa Orion:
- 12 CPUs
- CPU[1-4] are little CPUs with capa=290
- using an artificial EM

Running 8 CPU-bound tasks with UCLAMP_MAX=100, the task placement can be:
- CPU1: 6 tasks
- CPU2: 1 task
- CPU3: 1 task
- CPU4: idle
The push mechanism triggers feec() and down-migrates tasks to little CPUs.
However, it doesn't balance the ratio of (load / capacity) between CPUs as
the load balancer would do. So the above placement is correct in that regard.
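
A back-of-envelope sketch of that point (plain arithmetic, not kernel code;
numbers taken from the Radxa Orion example above):

```python
# Per-task CPU time for the observed placement vs. an equal spread,
# on little CPUs of capacity 290. CPU-bound tasks stacked on one CPU
# share its capacity equally, so the stacked placement is very uneven
# per task even though feec() accepted it.

CAPACITY = 290

observed = [6, 1, 1, 0]   # tasks per little CPU as seen in the test
balanced = [2, 2, 2, 2]   # what balancing load/capacity would target

def per_task_share(placement):
    # capacity each task gets on its CPU (idle CPUs skipped)
    return [CAPACITY / n for n in placement if n]

print(per_task_share(observed))  # tasks stacked on CPU1 get ~48 each
print(per_task_share(balanced))  # vs. 145 each when spread out
```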

Another point is that it is hard to reason about what a 'fair' task placement
is for UCLAMP_MAX tasks, as their throughput is limited on purpose.

The previous version of your patchset was trying to solve that issue,
but IMO this issue is inherent to UCLAMP_MAX setting. EAS doesn't
consider load during the task placement as all tasks are supposed
to be ~periodic and have wake-up events. CPUs are also supposed to have
some idle time, which guarantees that tasks are never really
starving, but UCLAMP_MAX contradicts this assumption.
With:
- Task[0-1]: NICE=-19, cpumask = CPUA,CPUB
- Task[2-3]: NICE=20, cpumask = CPUA,CPUB
The following task placement:
- CPUA: Task0 + Task1
- CPUB: Task2 + Task3
is fine for EAS, but sched_balance_find_dst_cpu() would do:
- CPUA: Task0 + Task2
- CPUB: Task1 + Task3
to balance the load, which is more 'fair'.
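
The load arithmetic behind this can be sketched as follows (weights taken
from the kernel's sched_prio_to_weight table; NICE=20 above is read here as
the lowest valid priority, nice 19, which is an assumption on my part):

```python
# Illustrative only: a CPU-bound task's load contribution is roughly
# its priority weight, so stacking the two nice -19 tasks on one CPU
# leaves the load wildly imbalanced, which utilization-based EAS
# placement doesn't see.

WEIGHT = {-19: 71755, 19: 15}  # sched_prio_to_weight extremes

def cpu_load(nice_values):
    return sum(WEIGHT[n] for n in nice_values)

# Placement EAS is happy with (utilization-based, load-blind):
eas = {"CPUA": [-19, -19], "CPUB": [19, 19]}
# Placement sched_balance_find_dst_cpu() would prefer:
lb = {"CPUA": [-19, 19], "CPUB": [-19, 19]}

for name, placement in (("EAS", eas), ("LB", lb)):
    print(name, {cpu: cpu_load(t) for cpu, t in placement.items()})
# EAS: CPUA carries 143510 vs CPUB's 30 -> hugely imbalanced load
# LB:  both CPUs carry 71770 -> the 'fair' placement
```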

------------

- If this is to have better energy results by running feec() more often

You say later in the cover letter that other numbers will come
later, so I'm curious to see the improvement.

Also, I think Christian mentioned somewhere that feec() is subject to
concurrency. I quickly got some numbers and didn't see a huge increase
in concurrent decisions with the push mechanism, but this indeed seems
like something to worry about.

feec() is also costly to run, although I don't have any numbers to provide.

------------

- If this is to bail out of the OU state faster, by migrating tasks to
idle CPUs or running feec() before a CPU is considered overutilized:

I can understand this point. When testing the patches, it seemed that
an inflating task still triggered the OU state.

Indeed, other CPUs still go through a load balance via:

sched_balance_find_src_group()
\-update_sd_lb_stats
   \-set_rd_overutilized()

and trigger the OU state, or through:

task_tick_fair()
\-check_pushable_task()
   \-if (rq->nr_running > 1) -> return False
\-check_update_overutilized_status()

Also, task_stuck_on_cpu() checks whether a single task fills the CPU
capacity, not whether the CPU utilization reaches the 80% threshold.


So I didn't see that much improvement on the OU front.
However, as Qais noted, the load balancer is effectively quite slow to
migrate misfit tasks.

The patchset runs some checks on each sched tick and each time a rq switches
to another task. If the goal is to:
- non-EAS: push misfit tasks quickly
- EAS: avoid going into the OU state
this would already be a great improvement. I assume this would also allow
removing the misfit handling code from the load balancer.

This would also mean extending the push mechanism to all HMP systems,
not just EAS-enabled systems.

------------

Summary:

- IMO UCLAMP_MAX tasks will always be an issue for EAS. Even if these tasks
were down-migrated, other issues would come up.

- I'm interested in seeing energy consumption improvement numbers,
or other performance numbers.

- Following Qais (IIUC), the push mechanism could be useful to improve
misfit task migration latency and avoid going into the OU state. I tried
to do some modifications in that sense and didn't see any showstopper so
far. This would also allow removing some code in the load balancer.


>
> Patch 1 fixes the case where a CPU is wrongly classified as overloaded
> whereas it is capped to a lower compute capacity. This wrong classification
> can prevent the periodic load balancer from selecting a group_misfit_task
> CPU because group_overloaded has higher priority.
>
> Patch 2 removes the need to test uclamp_min in cpu_overutilized to
> trigger the active migration of a task to another CPU.
>
> Patch 3 prepares select_task_rq_fair() to be called without TTWU, Fork or
> Exec flags when we just want to look for a possible better CPU.
>
> Patch 4 adds a push callback mechanism to the fair scheduler but doesn't
> enable it.
>
> Patch 5 enables has_idle_core for !SMT systems to track whether there may
> be an idle CPU in the LLC.
>
> Patch 6 adds some conditions to enable pushing runnable tasks for EAS:
> - when a task is stuck on a CPU and the system is not overutilized.
> - if there is a possible idle CPU when the system is overutilized.
>
> More test results will come later, as I wanted to send the patchset before
> LPC.
>
> I have kept the tbench figures as I added them in v7, but the results are
> the same with the corrected patch 6.
>
> Tbench on dragonboard rb5
> schedutil and EAS enabled
>
> # process     tip                   +patchset
> 1              29.3(+/-0.3%)        29.2(+/-0.2%) +0%
> 2              61.1(+/-1.8%)        61.7(+/-3.2%) +1%
> 4             260.0(+/-1.7%)       258.8(+/-2.8%) -1%
> 8            1361.2(+/-3.1%)      1377.1(+/-1.9%) +1%
> 16            981.5(+/-0.6%)       958.0(+/-1.7%) -2%
>
> Hackbench didn't show any difference
>
> Changes since v7:
> - Rebased on latest tip/sched/core
> - Fix some typos
> - Fix patch 6 mess
>
> Vincent Guittot (6):
>    sched/fair: Filter false overloaded_group case for EAS
>    sched/fair: Update overutilized detection
>    sched/fair: Prepare select_task_rq_fair() to be called for new cases
>    sched/fair: Add push task mechanism for fair
>    sched/fair: Enable idle core tracking for !SMT
>    sched/fair: Add EAS and idle cpu push trigger
>
>   kernel/sched/fair.c     | 350 +++++++++++++++++++++++++++++++++++-----
>   kernel/sched/sched.h    |  46 ++++--
>   kernel/sched/topology.c |   2 +
>   3 files changed, 345 insertions(+), 53 deletions(-)
>


Thread overview: 47+ messages
2025-12-02 18:12 [PATCH 0/6 v8] sched/fair: Add push task mechanism and handle more EAS cases Vincent Guittot
2025-12-02 18:12 ` [PATCH 1/6 v8] sched/fair: Filter false overloaded_group case for EAS Vincent Guittot
2025-12-02 18:12 ` [PATCH 2/6 v8] sched/fair: Update overutilized detection Vincent Guittot
2026-02-06 17:42   ` Qais Yousef
2025-12-02 18:12 ` [PATCH 3/6 v8] sched/fair: Prepare select_task_rq_fair() to be called for new cases Vincent Guittot
2025-12-07 13:23   ` Shrikanth Hegde
2026-02-09 13:21     ` Vincent Guittot
2026-02-06 18:03   ` Qais Yousef
2026-02-09 13:21     ` Vincent Guittot
2025-12-02 18:12 ` [PATCH 4/6 v8] sched/fair: Add push task mechanism for fair Vincent Guittot
2025-12-04 10:46   ` Peter Zijlstra
2025-12-04 14:32     ` Vincent Guittot
2025-12-04 11:29   ` Peter Zijlstra
2025-12-04 14:34     ` Vincent Guittot
2025-12-05  8:59       ` Peter Zijlstra
2025-12-05 12:49         ` K Prateek Nayak
2025-12-05 12:56           ` Peter Zijlstra
2025-12-05 13:05             ` K Prateek Nayak
2025-12-05 13:36           ` Vincent Guittot
2025-12-06  3:08             ` K Prateek Nayak
2025-12-05 13:26         ` Vincent Guittot
2025-12-07 12:13   ` Shrikanth Hegde
2026-02-09 13:17     ` Vincent Guittot
2025-12-10 14:01   ` Dietmar Eggemann
2026-02-09 13:17     ` Vincent Guittot
2026-02-06 18:21   ` Qais Yousef
2026-02-09 13:18     ` Vincent Guittot
2025-12-02 18:12 ` [RFC PATCH 5/6 v8] sched/fair: Enable idle core tracking for !SMT Vincent Guittot
2025-12-05 15:52   ` Christian Loehle
2025-12-06  2:11     ` Chen, Yu C
2025-12-06 10:18       ` Vincent Guittot
2025-12-06 10:09     ` Vincent Guittot
2025-12-08 18:43   ` Christian Loehle
2025-12-02 18:12 ` [RFC PATCH 6/6 v8] sched/fair: Add EAS and idle cpu push trigger Vincent Guittot
2026-02-06 18:30   ` Qais Yousef
2026-02-09 13:20     ` Vincent Guittot
2026-02-11  0:59       ` Qais Yousef
2025-12-03 14:06 ` [PATCH 0/6 v8] sched/fair: Add push task mechanism and handle more EAS cases Christian Loehle
2025-12-10 13:30 ` Dietmar Eggemann
2026-02-06 18:32 ` Qais Yousef
2026-02-09 13:20   ` Vincent Guittot
2026-02-26 17:34 ` Pierre Gondois [this message]
2026-03-10  4:16   ` Qais Yousef
2026-03-10 10:27     ` Pierre Gondois
2026-03-10 15:11       ` Qais Yousef
2026-03-10 16:59         ` Pierre Gondois
2026-03-12  8:19           ` Vincent Guittot
