linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] sched/fair: do not scan twice in detach_tasks()
@ 2025-07-18  6:35 Huang Shijie
  2025-07-18 18:49 ` Valentin Schneider
  0 siblings, 1 reply; 3+ messages in thread
From: Huang Shijie @ 2025-07-18  6:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, vschneid
  Cc: patches, cl, Shubhang, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel, Huang Shijie

When detach_tasks() scans the src_cpu's task list, the task list
may shrink during the scanning. For example, the task list
may have four tasks at the beginning, it may becomes to two
during the scanning in detach_tasks():
    Task list at beginning : "ABCD"
    Task list in scanning  : "CD"

    (ABCD stands for differnt tasks.)

In this scenario, the env->loop_max is still four, so
detach_tasks() may scan twice for some tasks:
    the scanning order maybe : "DCDC"

In the Specjbb test, this issue can be catched many times.
(Over 330,000 times in a 30-min Specjbb test)

The patch introduces "first_back" to record the first task which
is put back to the task list. If we get a task which is equal to
first_back, we break the loop, and avoid to scan twice for it.

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
v2 --> v3:
    Fix a typo in the commit message.
    v2: https://lore.kernel.org/all/20250718054709.8781-1-shijie@os.amperecomputing.com/

v1 --> v2:
    Add more comment from Valentin Schneider
    v1: https://lore.kernel.org/all/20250707083636.38380-1-shijie@os.amperecomputing.com/
---
 kernel/sched/fair.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e2963efe800..7cc9d50e3e11 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9443,6 +9443,7 @@ static int detach_tasks(struct lb_env *env)
 {
 	struct list_head *tasks = &env->src_rq->cfs_tasks;
 	unsigned long util, load;
+	struct task_struct *first_back = NULL;
 	struct task_struct *p;
 	int detached = 0;
 
@@ -9481,6 +9482,12 @@ static int detach_tasks(struct lb_env *env)
 		}
 
 		p = list_last_entry(tasks, struct task_struct, se.group_node);
+		/*
+		 * We're back to an already visited task that couldn't be
+		 * detached, we've seen all there is to see.
+		 */
+		if (p == first_back)
+			break;
 
 		if (!can_migrate_task(p, env))
 			goto next;
@@ -9562,6 +9569,8 @@ static int detach_tasks(struct lb_env *env)
 			schedstat_inc(p->stats.nr_failed_migrations_hot);
 
 		list_move(&p->se.group_node, tasks);
+		if (!first_back)
+			first_back = p;
 	}
 
 	/*
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v3] sched/fair: do not scan twice in detach_tasks()
  2025-07-18  6:35 [PATCH v3] sched/fair: do not scan twice in detach_tasks() Huang Shijie
@ 2025-07-18 18:49 ` Valentin Schneider
  2025-07-21  1:55   ` Shijie Huang
  0 siblings, 1 reply; 3+ messages in thread
From: Valentin Schneider @ 2025-07-18 18:49 UTC (permalink / raw)
  To: Huang Shijie, mingo, peterz, juri.lelli, vincent.guittot
  Cc: patches, cl, Shubhang, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel, Huang Shijie

On 18/07/25 14:35, Huang Shijie wrote:
> When detach_tasks() scans the src_cpu's task list, the task list
> may shrink during the scanning. For example, the task list
> may have four tasks at the beginning, it may becomes to two
> during the scanning in detach_tasks():
>     Task list at beginning : "ABCD"
>     Task list in scanning  : "CD"
>
>     (ABCD stands for differnt tasks.)
>
> In this scenario, the env->loop_max is still four, so
> detach_tasks() may scan twice for some tasks:
>     the scanning order maybe : "DCDC"
>

How about something like so:
"""
detach_tasks() uses struct lb_env.loop_max as an env.src_rq->cfs_tasks
iteration count limit. It is however set without the source RQ lock held,
and besides detach_tasks() can be re-invoked after releasing and
re-acquiring the RQ lock per LBF_NEED_BREAK.

This means that env.loop_max and the actual length of env.src_rq->cfs_tasks
as observed within detach_tasks() can differ. This can cause some tasks to
be unnecessarily iterated over more than once, for instance:

  env.loop_max := 4
  detach_tasks()
    // Here env.src->cfs_tasks only contains two tasks which can't be
    // migrated anywhere, so they're put back in the list each time.
    env.src->cfs_tasks := [p1, p0]
    // The iteration goes:
    p0; cfs_tasks := [p0, p1]
    p1; cfs_tasks := [p1, p0]
    p0; cfs_tasks := [p0, p1]
    p1; cfs_tasks := [p1, p0]

    // IOW we iterate over each task twice
"""

> In the Specjbb test, this issue can be catched many times.
                                         ^^^^^^^
                                         caught

> (Over 330,000 times in a 30-min Specjbb test)
>
> The patch introduces "first_back" to record the first task which
> is put back to the task list. If we get a task which is equal to
> first_back, we break the loop, and avoid to scan twice for it.
>
> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>

Reviewed-by: Valentin Schneider <vschneid@redhat.com>


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v3] sched/fair: do not scan twice in detach_tasks()
  2025-07-18 18:49 ` Valentin Schneider
@ 2025-07-21  1:55   ` Shijie Huang
  0 siblings, 0 replies; 3+ messages in thread
From: Shijie Huang @ 2025-07-21  1:55 UTC (permalink / raw)
  To: Valentin Schneider, Huang Shijie, mingo, peterz, juri.lelli,
	vincent.guittot
  Cc: patches, cl, Shubhang, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel


On 2025/7/19 2:49, Valentin Schneider wrote:
> On 18/07/25 14:35, Huang Shijie wrote:
>> When detach_tasks() scans the src_cpu's task list, the task list
>> may shrink during the scanning. For example, the task list
>> may have four tasks at the beginning, it may becomes to two
>> during the scanning in detach_tasks():
>>      Task list at beginning : "ABCD"
>>      Task list in scanning  : "CD"
>>
>>      (ABCD stands for differnt tasks.)
>>
>> In this scenario, the env->loop_max is still four, so
>> detach_tasks() may scan twice for some tasks:
>>      the scanning order maybe : "DCDC"
>>
> How about something like so:
> """
> detach_tasks() uses struct lb_env.loop_max as an env.src_rq->cfs_tasks
> iteration count limit. It is however set without the source RQ lock held,
> and besides detach_tasks() can be re-invoked after releasing and
> re-acquiring the RQ lock per LBF_NEED_BREAK.
>
> This means that env.loop_max and the actual length of env.src_rq->cfs_tasks
> as observed within detach_tasks() can differ. This can cause some tasks to
> be unnecessarily iterated over more than once, for instance:
>
>    env.loop_max := 4
>    detach_tasks()
>      // Here env.src->cfs_tasks only contains two tasks which can't be
>      // migrated anywhere, so they're put back in the list each time.
>      env.src->cfs_tasks := [p1, p0]
>      // The iteration goes:
>      p0; cfs_tasks := [p0, p1]
>      p1; cfs_tasks := [p1, p0]
>      p0; cfs_tasks := [p0, p1]
>      p1; cfs_tasks := [p1, p0]
>
>      // IOW we iterate over each task twice
> """
Okay, I will change the commit message later..
>> In the Specjbb test, this issue can be catched many times.
>                                           ^^^^^^^
>                                           caught
>
>> (Over 330,000 times in a 30-min Specjbb test)
>>
>> The patch introduces "first_back" to record the first task which
>> is put back to the task list. If we get a task which is equal to
>> first_back, we break the loop, and avoid to scan twice for it.
>>
>> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>


Thanks

Huang Shijie


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-07-21  1:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-18  6:35 [PATCH v3] sched/fair: do not scan twice in detach_tasks() Huang Shijie
2025-07-18 18:49 ` Valentin Schneider
2025-07-21  1:55   ` Shijie Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).