From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Jean-Baptiste Roquefere <jb.roquefere@ateme.com>,
Peter Zijlstra <peterz@infradead.org>,
"mingo@kernel.org" <mingo@kernel.org>,
Juri Lelli <juri.lelli@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Borislav Petkov <bp@alien8.de>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
"Gautham R. Shenoy" <gautham.shenoy@amd.com>,
Swapnil Sapkal <swapnil.sapkal@amd.com>,
Valentin Schneider <vschneid@redhat.com>,
"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: IPC drop down on AMD epyc 7702P
Date: Mon, 5 May 2025 17:59:13 +0530
Message-ID: <de22462b-cda6-400f-b28c-4d1b9b244eec@amd.com>
In-Reply-To: <CAKfTPtBovA700=_0BajnzkdDP6MkdgLU=E3M0GTq4zoLW=RGhA@mail.gmail.com>
Hello Vincent,
On 5/5/2025 3:58 PM, Vincent Guittot wrote:
> On Wed, 30 Apr 2025 at 11:13, K Prateek Nayak<kprateek.nayak@amd.com> wrote:
>> (+ more scheduler folks)
>>
>> tl;dr
>>
>> JB has a workload that hates aggressive migration on the 2nd Generation
>> EPYC platform that has a small LLC domain (4C/8T) and very noticeable
>> C2C latency.
>>
>> Based on JB's observation so far, reverting commit 16b0a7a1a0af
>> ("sched/fair: Ensure tasks spreading in LLC during LB") and commit
>> c5b0a7eefc70 ("sched/fair: Remove sysctl_sched_migration_cost
>> condition") helps the workload. Both those commits allow aggressive
>> migrations for work conservation except it also increased cache
>> misses which slows the workload quite a bit.
> commit 16b0a7a1a0af ("sched/fair: Ensure tasks spreading in LLC
> during LB") eases the spread of task inside a LLC so It's not obvious
> for me how it would increase "a lot of CPU migrations go out of CCX,
> then L3 miss,". On the other hand, it will spread task in SMT and in
> LLC which can prevent running at highest freq on some system but I
> don't know if it's relevant for this SoC.
I misspoke there. JB's workload seems to be sensitive even to
core-to-core migrations - "relax_domain_level=2" actually disabled
newidle balance above the CLUSTER level, which is a subset of MC on
x86 and gets degenerated into the SMT domain.
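
To make the effect described above concrete, here is a minimal sketch in Python of how a relax level could map onto per-domain newidle balancing. It is a model, not kernel code. It assumes the x86 level numbering SMT=0, CLUSTER=1, MC=2, PKG=3, and it assumes that newidle balance is turned off on every domain whose level is at or above the requested value; both should be checked against kernel/sched/topology.c in the tree being run. The names X86_LEVELS and newidle_domains are hypothetical.

```python
# Hypothetical model, not kernel code. The level numbering below is an
# assumption for an x86 build with SMT, CLUSTER, MC and PKG levels.
X86_LEVELS = {"SMT": 0, "CLUSTER": 1, "MC": 2, "PKG": 3}


def newidle_domains(relax_level, levels=X86_LEVELS):
    """Return the domain names that keep newidle balance enabled.

    Assumed rule: a domain whose level is >= relax_level has newidle
    balance turned off. A negative relax_level leaves every domain as is.
    """
    ordered = sorted(levels, key=levels.get)
    if relax_level < 0:
        return ordered
    return [name for name in ordered if levels[name] < relax_level]


if __name__ == "__main__":
    # Print which domains would still do newidle balance at each setting.
    for level in (-1, 1, 2, 3):
        print(level, newidle_domains(level))
```

Under these assumptions a value of 2 leaves only SMT and CLUSTER in the list, and since CLUSTER degenerates into SMT on this platform, newidle balance is effectively limited to SMT siblings.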
>
> commit c5b0a7eefc70 ("sched/fair: Remove sysctl_sched_migration_cost
> condition") makes newly idle migration happen more often which can
> then do migrate tasks across LLC. But then It's more about why
> enabling newly idle load balance out of LLC if it is so costly.
It seems to be a very workload-specific, and possibly platform-specific,
characteristic where re-priming the cache is actually very costly.
I'm not sure if there are any other uarch factors at play here that
require re-priming (branch prediction, prefetcher, etc.) after a task
migration to reach the same IPC.
Essentially, "relax_domain_level" gets the desired characteristic
where only the periodic balance corrects long-term imbalance, but as
Libo mentioned, short-term imbalances can build up, and using
"relax_domain_level" might lead to other problems.
Short of pinning, or more analysis of which part of the migrations
makes the workload unhappy, I couldn't think of a better way to
communicate this requirement.
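
For reference, the pinning fallback mentioned above could be sketched roughly as follows. This is only an illustration: it assumes the L3 is exposed as cache index3 in sysfs (common on x86, but the "level" file in the same directory should be checked), and parse_cpu_list and pin_to_llc_of are hypothetical helper names. os.sched_setaffinity is available on Linux only.

```python
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0-3,64-67" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_llc_of(cpu, pid=0):
    """Restrict pid (0 = calling process) to the CPUs sharing cpu's L3.

    Assumes index3 is the L3 on this machine; this is not guaranteed.
    """
    path = f"/sys/devices/system/cpu/cpu{cpu}/cache/index3/shared_cpu_list"
    with open(path) as f:
        cpus = parse_cpu_list(f.read())
    os.sched_setaffinity(pid, cpus)
    return cpus
```

Calling pin_to_llc_of(0) would confine the calling process to the LLC that contains CPU 0. Since the workload appears sensitive even to core-to-core moves, a stricter variant would pass a single-CPU set to os.sched_setaffinity instead.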
>
>> "relax_domain_level" helps but cannot be set at runtime and I couldn't
>> think of any stable / debug interfaces that JB hasn't tried out
>> already that can help this workload.
>>
>> There is a patch towards the end to set "relax_domain_level" at
>> runtime but given cpusets got away with this when transitioning to
>> cgroup-v2, I don't know what the sentiments are around its usage.
>> Any input / feedback is greatly appreciated.
--
Thanks and Regards,
Prateek