public inbox for linux-kernel@vger.kernel.org
From: Hagar Hemdan <hagarhem@amazon.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Ben Segall" <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	<linux-kernel@vger.kernel.org>, <abuehaze@amazon.com>
Subject: Re: [PATCH] /sched/core: Fix Unixbench spawn test regression
Date: Fri, 7 Mar 2025 11:07:33 +0000
Message-ID: <20250307110733.GA10571@amazon.com>
In-Reply-To: <20250306162635.2614376-1-dietmar.eggemann@arm.com>

On Thu, Mar 06, 2025 at 05:26:35PM +0100, Dietmar Eggemann wrote:
> Hagar reported a 30% drop in UnixBench spawn test with commit
> eff6c8ce8d4d ("sched/core: Reduce cost of sched_move_task when config
> autogroup") on an m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM
> (aarch64) (single level MC sched domain) [1].
> 
> There is an early bail from sched_move_task() if p->sched_task_group is
> equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are
> pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope'
> (Ubuntu '22.04.5 LTS').
> 
> So in:
> 
>   do_exit()
> 
>     sched_autogroup_exit_task()
> 
>       sched_move_task()
> 
>         if sched_get_task_group(p) == p->sched_task_group
>           return
> 
>         /* p is enqueued */
>         dequeue_task()              \
>         sched_change_group()        |
>           task_change_group_fair()  |
>             detach_task_cfs_rq()    |                              (1)
>             set_task_rq()           |
>             attach_task_cfs_rq()    |
>         enqueue_task()              /
> 
> (1) isn't called for p anymore.
> 
> It turns out that the regression is related to sgs->group_util in
> group_is_overloaded() and group_has_capacity(). If (1) isn't called for
> all the 'spawn' tasks, then sgs->group_util is ~900 and
> sgs->group_capacity = 1024 (single CPU sched domain), and this leads to
> group_is_overloaded() returning true (2) and group_has_capacity() false
> (3) much more often compared to the case when (1) is called.
> 
> I.e. there are many more cases of 'group_overloaded' and
> 'group_fully_busy' in WF_FORK wakeup sched_balance_find_dst_cpu(), which
> then much more often returns a CPU != smp_processor_id() (5).
> 
> This isn't good for these extremely short-running tasks (FORK + EXIT)
> and also involves calling sched_balance_find_dst_group_cpu() unnecessarily
> (single CPU sched domain).
> 
> Instead, if (1) is called for tasks with 'p->flags & PF_EXITING' set,
> then the path (4),(6) is taken much more often.
> 
>   select_task_rq_fair(..., wake_flags = WF_FORK)
> 
>     cpu = smp_processor_id()
> 
>     new_cpu = sched_balance_find_dst_cpu(..., cpu, ...)
> 
>       group = sched_balance_find_dst_group(..., cpu)
> 
>         do {
> 
>           update_sg_wakeup_stats()
> 
>             sgs->group_type = group_classify()
> 
>               if group_is_overloaded()                             (2)
>                 return group_overloaded
> 
>               if !group_has_capacity()                             (3)
>                 return group_fully_busy
> 
>               return group_has_spare                               (4)
> 
>         } while group
> 
>         if local_sgs.group_type > idlest_sgs.group_type
>           return idlest                                            (5)
> 
>         case group_has_spare:
> 
>           if local_sgs.idle_cpus >= idlest_sgs.idle_cpus
>             return NULL                                            (6)
> 
> UnixBench test './Run -c 4 spawn' on:
> 
> (a) VM AWS instance (m7gd.16xlarge) with v6.13 ('maxcpus=4 nr_cpus=4')
>     and Ubuntu 22.04.5 LTS (aarch64).
> 
>     Shell & test run in '/user.slice/user-1000.slice/session-1.scope'.
> 
>     w/o patch	w/ patch
>     21005	27120
> 
> (b) i7-13700K with tip/sched/core ('nosmt maxcpus=8 nr_cpus=8') and
>     Ubuntu 22.04.5 LTS (x86_64).
> 
>     Shell & test run in '/A'.
> 
>     w/o patch	w/ patch
>     67675	88806
> 
> CONFIG_SCHED_AUTOGROUP=y and /proc/sys/kernel/sched_autogroup_enabled set
> to 0 or 1.
> 
> [1] https://lkml.kernel.org/r/20250205151026.13061-1-hagarhem@amazon.com
> 
> Reported-by: Hagar Hemdan <hagarhem@amazon.com>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b00f884701a6..ca0e3c2eb94a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9064,7 +9064,7 @@ void sched_move_task(struct task_struct *tsk)
>  	 * group changes.
>  	 */
>  	group = sched_get_task_group(tsk);
> -	if (group == tsk->sched_task_group)
> +	if ((group == tsk->sched_task_group) && !(tsk->flags & PF_EXITING))
>  		return;
>  
>  	update_rq_clock(rq);
> -- 
> 2.34.1
>

Thank you very much for submitting the fix and for all the explanations.

Could you please add a "Fixes:" tag for commit eff6c8ce8d4d to your patch, so that it gets backported to stable 6.12?
This was actually discovered internally by Hazem Mohamed Abuelfotoh, so please also add Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> and Tested-by: Hagar Hemdan <hagarhem@amazon.com>.

Thanks,
Hagar


Thread overview: 10+ messages
2025-03-06 16:26 [PATCH] /sched/core: Fix Unixbench spawn test regression Dietmar Eggemann
2025-03-07 11:07 ` Hagar Hemdan [this message]
2025-03-10 13:59 ` Vincent Guittot
2025-03-10 15:29   ` Dietmar Eggemann
2025-03-11 16:35     ` Vincent Guittot
2025-03-12 14:41       ` Dietmar Eggemann
2025-03-12 16:35         ` Vincent Guittot
2025-03-13  9:21         ` Hagar Hemdan
2025-03-14 16:06           ` Vincent Guittot
2025-03-14 16:20             ` Hagar Hemdan
