* Re: [PATCH v3] sched/fair: Use all little CPUs for CPU-bound workload

From: Pierre Gondois @ 2024-06-25 13:25 UTC
To: stable
Cc: linux-kernel, Qais Yousef, Vincent Guittot, Dietmar Eggemann,
    Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall,
    Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, Lukasz Luba

Hello stable folk,

This patch was merged as:
commit 3af7524b1419 ("sched/fair: Use all little CPUs for CPU-bound workloads")
into 6.7, improving the following:
commit 0b0695f2b34a ("sched/fair: Rework load_balance()")

Would it be possible to port it to the 6.1 stable branch?
The patch should apply cleanly when cherry-picked onto v6.1.94.

Regards,
Pierre


On 12/6/23 10:00, Pierre Gondois wrote:
> Running n CPU-bound tasks on an n-CPU platform:
> - with asymmetric CPU capacity
> - not being a DynamIQ system (i.e. having a PKG-level sched domain
>   without the SD_SHARE_PKG_RESOURCES flag set)
> might result in a task placement where two tasks run on a big CPU
> and none on a little CPU. The placement could be improved by
> using all CPUs.
>
> Testing platform:
> Juno-r2:
> - 2 big CPUs (1-2), maximum capacity of 1024
> - 4 little CPUs (0,3-5), maximum capacity of 383
>
> Testing workload ([1]):
> Spawn 6 CPU-bound tasks. During the first 100ms (step 1), each task
> is affine to a CPU, except for:
> - one little CPU which is left idle.
> - one big CPU which has 2 tasks affine.
> After the 100ms (step 2), remove the cpumask affinity.
>
> Before patch:
> During step 2, the load balancer running from the idle CPU tags sched
> domains as:
> - little CPUs: 'group_has_spare'. Cf. group_has_capacity() and
>   group_is_overloaded(): 3 CPU-bound tasks run on a 4-CPU
>   sched domain, and the idle CPU provides enough spare capacity
>   regarding the imbalance_pct.
> - big CPUs: 'group_overloaded'. Indeed, 3 tasks run on a 2-CPU
>   sched domain, so the following path is used:
>     group_is_overloaded()
>     \-if (sgs->sum_nr_running <= sgs->group_weight) return true;
>
> The following path which would change the migration type to
> 'migrate_task' is not taken:
>     calculate_imbalance()
>     \-if (env->idle != CPU_NOT_IDLE && env->imbalance == 0)
> as the local group has some spare capacity, so the imbalance
> is not 0.
>
> The migration type requested is 'migrate_util' and the busiest
> runqueue is the big CPU's runqueue having 2 tasks (each having a
> utilization of 512). The idle little CPU cannot pull one of these
> tasks as its capacity is too small for the task. The following path
> is used:
>     detach_tasks()
>     \-case migrate_util:
>       \-if (util > env->imbalance) goto next;
>
> After patch:
> As the number of failed balancing attempts grows (with
> 'nr_balance_failed'), progressively make it easier to migrate
> a big task to the idling little CPU. A similar mechanism is
> used for the 'migrate_load' migration type.
>
> Improvement:
> Running the testing workload [1] with step 2 representing
> a ~10s load for a big CPU:
>     Before patch: ~19.3s
>     After patch:  ~18s (-6.7%)
>
> Similar issue reported at:
> https://lore.kernel.org/lkml/20230716014125.139577-1-qyousef@layalina.io/
>
> v1:
> https://lore.kernel.org/all/20231110125902.2152380-1-pierre.gondois@arm.com/
> v2:
> https://lore.kernel.org/all/20231124153323.3202444-1-pierre.gondois@arm.com/
>
> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>
> Notes:
>     v2:
>     - Used Vincent's approach.
>     v3:
>     - Updated commit message.
>     - Added Reviewed-by tags.
>
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d7a3c63a2171..9481b8cff31b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9060,7 +9060,7 @@ static int detach_tasks(struct lb_env *env)
>  		case migrate_util:
>  			util = task_util_est(p);
>
> -			if (util > env->imbalance)
> +			if (shr_bound(util, env->sd->nr_balance_failed) > env->imbalance)
>  				goto next;
>
>  			env->imbalance -= util;
* Re: [PATCH v3] sched/fair: Use all little CPUs for CPU-bound workload

From: Pierre Gondois @ 2024-07-29 9:50 UTC
To: stable, Sasha Levin, Lukasz Luba
Cc: linux-kernel, Qais Yousef, Vincent Guittot, Dietmar Eggemann,
    Ingo Molnar, Peter Zijlstra

Hello Sasha,
Would it be possible to pick this patch for the 6.1 stable branch?
Or is there something I should do for this purpose?

Regards,
Pierre

On 6/25/24 15:25, Pierre Gondois wrote:
> Hello stable folk,
>
> This patch was merged as:
> commit 3af7524b1419 ("sched/fair: Use all little CPUs for CPU-bound workloads")
> into 6.7, improving the following:
> commit 0b0695f2b34a ("sched/fair: Rework load_balance()")
>
> Would it be possible to port it to the 6.1 stable branch?
> The patch should apply cleanly when cherry-picked onto v6.1.94.
>
> Regards,
> Pierre
>
> [...]
* Re: [PATCH v3] sched/fair: Use all little CPUs for CPU-bound workload

From: Greg KH @ 2024-07-29 10:07 UTC
To: Pierre Gondois
Cc: stable, Sasha Levin, Lukasz Luba, linux-kernel, Qais Yousef,
    Vincent Guittot, Dietmar Eggemann, Ingo Molnar, Peter Zijlstra

On Mon, Jul 29, 2024 at 11:50:40AM +0200, Pierre Gondois wrote:
> Hello Sasha,
> Would it be possible to pick this patch for the 6.1 stable branch?
> Or is there something I should do for this purpose?
>
> Regards,
> Pierre
>
> On 6/25/24 15:25, Pierre Gondois wrote:
> > Hello stable folk,
> >
> > This patch was merged as:
> > commit 3af7524b1419 ("sched/fair: Use all little CPUs for CPU-bound workloads")
> > into 6.7, improving the following:
> > commit 0b0695f2b34a ("sched/fair: Rework load_balance()")
> >
> > Would it be possible to port it to the 6.1 stable branch?
> > The patch should apply cleanly when cherry-picked onto v6.1.94.

You also forgot 5.10.y and 5.15.y which need it; I've queued it up for
those as well now, thanks.

greg k-h