public inbox for linux-kernel@vger.kernel.org
From: Hagar Hemdan <hagarhem@amazon.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: <abuehaze@amazon.com>, <linux-kernel@vger.kernel.org>,
	<hagarhem@amazon.com>, <wuchi.zero@gmail.com>
Subject: Re: BUG Report: Fork benchmark drop by 30% on aarch64
Date: Fri, 28 Feb 2025 19:39:45 +0000
Message-ID: <20250228193945.GA13237@amazon.com>
In-Reply-To: <5f92761b-c7d4-4b96-9398-183a5bf7556a@arm.com>

On Mon, Feb 17, 2025 at 11:51:45PM +0100, Dietmar Eggemann wrote:
> On 13/02/2025 19:55, Dietmar Eggemann wrote:
> > On 11/02/2025 22:40, Hagar Hemdan wrote:
> >> On Tue, Feb 11, 2025 at 05:27:47PM +0100, Dietmar Eggemann wrote:
> >>> On 10/02/2025 22:31, Hagar Hemdan wrote:
> >>>> On Mon, Feb 10, 2025 at 11:38:51AM +0100, Dietmar Eggemann wrote:
> >>>>> On 07/02/2025 12:07, Hagar Hemdan wrote:
> >>>>>> On Fri, Feb 07, 2025 at 10:14:54AM +0100, Dietmar Eggemann wrote:
> >>>>>>> Hi Hagar,
> >>>>>>>
> >>>>>>> On 05/02/2025 16:10, Hagar Hemdan wrote:
> >>>
> >>> [...]
> >>>
> >>>>> The 'spawn' tasks in sched_move_task() are 'running' and 'queued' so we
> >>>>> call dequeue_task(), put_prev_task(), enqueue_task() and
> >>>>> set_next_task().
> >>>>>
> >>>>> I guess what we need here is the cfs_rq->avg.load_avg (cpu_load() in
> >>>>> case of root tg) update in:
> >>>>>
> >>>>>   task_change_group_fair() -> detach_task_cfs_rq() -> ...,
> >>>>>   attach_task_cfs_rq() -> ...
> >>>>>
> >>>>> since this is used for WF_FORK, WF_EXEC handling in wakeup:
> >>>>>
> >>>>>   select_task_rq_fair() -> sched_balance_find_dst_cpu() ->
> >>>>>   sched_balance_find_dst_group_cpu()
> >>>>>
> >>>>> in the form of 'least_loaded_cpu' and 'load = cpu_load(cpu_rq(i))'.
> >>>>>
> >>>>> You mentioned AutoGroups (AG). I don't see this issue on my Debian 12
> >>>>> Juno-r0 Arm64 board. When I run w/ AG, 'group' is '/' and
> >>>>> 'tsk->sched_task_group' is '/autogroup-x' so the condition 'if (group ==
> >>>>> tsk->sched_task_group)' isn't true in sched_move_task(). If I disable AG
> >>>>> then they match "/" == "/".
> >>>>>
> >>>>> I assume you run Ubuntu on your AWS instances? What kind of
> >>>>> 'cgroup/taskgroup' related setup are you using?
> >>>>
> >>>> I'm running AL2023 and use Vanilla kernel 6.13.1 on m6g.xlarge AWS instance.
> >>>> AL2023 uses cgroupv2 by default.
> >>>>>
> >>>>> Can you run w/ this debug snippet w/ and w/o AG enabled?
> >>>>
> >>>> I have run that and have attached the trace files to this email.
> >>>
> >>> Thanks!
> >>>
> >>> So w/ AG you see that 'group' and 'tsk->sched_task_group' are both
> >>> '/user.slice/user-1000.slice/session-1.scope' so we bail for those tasks
> >>> w/o doing the 'cfs_rq->avg.load_avg' update I described above.
> >>
> >> Yes, both groups are identical, so sched_move_task() returns
> >> without the {de|en}queue and without calling task_change_group_fair().
> > 
> > OK.
> > 
> >>> You said that there is no issue w/o AG. 
> >>
> >> To clarify: by "no regression when autogroup is disabled" I meant that
> >> the fork results w/o AG are the same with or without the commit
> >> "sched/core: Reduce cost of sched_move_task when config autogroup". However,
> >> the fork results are consistently lower when AG is disabled than when
> >> it's enabled (without the commit applied). This is illustrated in the tables
> >> provided in the report.
> > 
> > OK, but I don't quite get yet why w/o AG the results are lower even w/o
> > eff6c8ce8d4d? Have to dig further I guess. Maybe there is more than this
> > p->se.avg.load_avg update when we go via task_change_group_fair()?
> 
> './Run -c 4 spawn' on AWS instance (m7gd.16xlarge) with v6.13, 'mem=16G
> maxcpus=4 nr_cpus=4' and Ubuntu '22.04.5 LTS':
> 
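> CFG_SCHED_AUTOGROUP | sched_ag_enabled | eff6c8ce8d4d | Fork (lps)
> --------------------+------------------+--------------+-----------------
>          y          |        1         |      y       | 21005 (27120 **)
>          y          |        0         |      y       | 21059 (27012 **)
>          n          |        -         |      y       | 21299
>          y          |        1         |      n       | 27745 *
>          y          |        0         |      n       | 27493 *
>          n          |        -         |      n       | 20928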
> 
> (*) So here the higher numbers are only achieved when
> 'sched_autogroup_exit_task() -> sched_move_task() ->
> sched_change_group()' is called for the 'spawn' tasks.
> 
> (**) When I apply the fix from
> https://lkml.kernel.org/r/4a9cc5ab-c538-4427-8a7c-99cb317a283f@arm.com.
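
To put the table in relative terms, here is a minimal sketch of the
arithmetic. The helper name pct_change() is made up for illustration and is
not part of any kernel or benchmark code; the inputs are the Fork (lps)
values quoted above.

```c
#include <assert.h>

/*
 * Hypothetical helper: relative change of 'after' versus 'before',
 * in percent. A negative result means a drop. 'before' must be non-zero.
 */
double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

With the AG-enabled rows, pct_change(27745, 21005) comes out at roughly -24%
(with vs. without eff6c8ce8d4d), and pct_change(27745, 27120) at roughly -2%
once the proposed fix is applied on top.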

This regression is currently impacting our kernel. Do you have any
concerns about submitting this fix upstream?
Thanks,
Hagar
> 
> These results support the story that we need:
> 
>   task_change_group_fair() -> detach_task_cfs_rq() -> ...,
>   attach_task_cfs_rq() -> ...
> 
> i.e. the related 'cfs_rq->avg.load_avg' update during do_exit() so that
> WF_FORK handling in wakeup:
> 
>   select_task_rq_fair() -> sched_balance_find_dst_cpu() ->
>   sched_balance_find_dst_group_cpu()
> 
> can use more recent 'load = cpu_load(cpu_rq(i))' values to get a better
> 'least_loaded_cpu'.
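
A toy model of why stale load values matter for this selection. This is only
a sketch under simplifying assumptions: the function
find_least_loaded_cpu() and the load arrays are hypothetical, and the real
sched_balance_find_dst_group_cpu() considers more than the load (idle CPUs,
for example). The array simply stands in for the per-CPU values read via
cpu_load(cpu_rq(i)).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified selection: return the index of the CPU with
 * the smallest load. 'load[i]' stands in for cpu_load(cpu_rq(i)).
 * Ties go to the lowest CPU index.
 */
int find_least_loaded_cpu(const unsigned long *load, size_t nr_cpus)
{
	assert(load != NULL && nr_cpus > 0);

	size_t best = 0;

	for (size_t i = 1; i < nr_cpus; i++) {
		if (load[i] < load[best])
			best = i;
	}
	return (int)best;
}
```

If a task on CPU1 has exited but its contribution is still accounted, a load
array such as {300, 1024, 700, 600} makes the sketch pick CPU0, while the
updated array {300, 0, 700, 600} makes it pick the actually idle CPU1. The
numbers are invented; only the effect is the point.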
> 
> The AWS instance runs systemd, so the shell and the test run in a taskgroup
> other than root, which trumps autogroups:
> 
>   task_wants_autogroup()
> 
>      if (tg != &root_task_group)
>        return false;
> 
>      ...
> 
> That's why 'group == tsk->sched_task_group' in sched_move_task() is
> true, which is different on my Juno: the shell from which I launch the
> tests runs in '/' so that the test ends up in an autogroup, i.e. 'group
> != tsk->sched_task_group'.
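
The two setups can be contrasted with a small stand-alone model. This is a
sketch, not kernel code: the struct, wants_autogroup(), effective_group()
and move_is_noop() are hypothetical names, and only the root-taskgroup check
quoted above is modelled (the checks elided by '...' are left out).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a task group, identified by its cgroup path. */
struct task_group {
	const char *path;
};

struct task_group root_task_group = { "/" };

/* Models only the quoted check: autogroups apply to root-group tasks. */
bool wants_autogroup(const struct task_group *tg)
{
	return tg == &root_task_group;
}

/* Group a task ends up in, given its cgroup tg and its autogroup tg. */
const struct task_group *effective_group(const struct task_group *tg,
					 const struct task_group *ag_tg)
{
	return wants_autogroup(tg) ? ag_tg : tg;
}

/* Models the 'group == tsk->sched_task_group' early return. */
bool move_is_noop(const struct task_group *group,
		  const struct task_group *sched_task_group)
{
	return group == sched_task_group;
}
```

In the systemd case the task's group is the session scope both before and
after the lookup, so move_is_noop() is true and the load update is skipped.
In the Juno case the test sits in '/autogroup-x' while 'group' is '/', so
move_is_noop() is false and the detach/attach path runs.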
> 
> [...]
