From: Libo Chen <libo.chen@oracle.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>,
Jean-Baptiste Roquefere <jb.roquefere@ateme.com>,
Peter Zijlstra <peterz@infradead.org>,
"mingo@kernel.org" <mingo@kernel.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
"Gautham R. Shenoy" <gautham.shenoy@amd.com>,
Swapnil Sapkal <swapnil.sapkal@amd.com>,
Valentin Schneider <vschneid@redhat.com>,
"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
Konrad Wilk <konrad.wilk@oracle.com>
Subject: Re: IPC drop down on AMD epyc 7702P
Date: Wed, 30 Apr 2025 19:46:06 -0700
Message-ID: <f94a10cb-e65d-4697-875e-43f624f79099@oracle.com>
In-Reply-To: <020e7310-397c-4967-9635-8e197078f333@amd.com>
Hi Prateek,
On 4/30/25 04:29, K Prateek Nayak wrote:
> Hello Libo,
>
> On 4/30/2025 4:11 PM, Libo Chen wrote:
>>
>>
>> On 4/30/25 02:13, K Prateek Nayak wrote:
>>> (+ more scheduler folks)
>>>
>>> tl;dr
>>>
>>> JB has a workload that hates aggressive migration on the 2nd Generation
>>> EPYC platform that has a small LLC domain (4C/8T) and very noticeable
>>> C2C latency.
>>>
>>> Based on JB's observation so far, reverting commit 16b0a7a1a0af
>>> ("sched/fair: Ensure tasks spreading in LLC during LB") and commit
>>> c5b0a7eefc70 ("sched/fair: Remove sysctl_sched_migration_cost
>>> condition") helps the workload. Both those commits allow aggressive
>>> migrations for work conservation except it also increased cache
>>> misses which slows the workload quite a bit.
>>>
>>> "relax_domain_level" helps but cannot be set at runtime and I couldn't
>>> think of any stable / debug interfaces that JB hasn't tried out
>>> already that can help this workload.
>>>
>>> There is a patch towards the end to set "relax_domain_level" at
>>> runtime but given cpusets got away with this when transitioning to
>>> cgroup-v2, I don't know what the sentiments are around its usage.
>>> Any input / feedback is greatly appreciated.
>>>
>>
>>
>> Hi Prateek,
>>
>> Oh no, not "relax_domain_level" again, this can lead to load imbalance
>> in variety of ways. We were so glad this one went away with cgroupv2,
>
> I agree it is not pretty. JB also tried strategic pinning and they
> did report that things are better overall but unfortunately, it is
> very hard to deploy across multiple architectures and would also
> require some redesign + testing from their application side.
>
I was stressing more broadly how badly setting "relax_domain_level"
can go wrong if a user doesn't know that it essentially disables newidle
balancing at higher levels, so the ability to balance load across CCXes
or NUMA nodes becomes a lot weaker. A subset of CCXes may consistently
get much more load for a whole bunch of reasons. Sometimes this is
hard to spot in testing but does show up in real-world scenarios, esp.
when users have other weird hacks.
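
To make the mechanism concrete, here is a toy model of the effect, not
the kernel's code. The struct, the flag values, the function names and
the three-level hierarchy are all made up for illustration, and the
exact cutoff comparison is an assumption of the sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the kernel's SD_* values differ. */
#define TOY_BALANCE_NEWIDLE 0x1u
#define TOY_BALANCE_WAKE    0x2u

/* One level of a made-up domain hierarchy, lowest level first. */
struct toy_domain {
	const char *name;
	int level;
	unsigned int flags;
};

/*
 * Clear the idle-balance flags on every domain whose level is at or
 * above `request`. A negative request means "no request" and leaves
 * the flags alone. The ">=" cutoff is an assumption of this sketch.
 */
static void toy_apply_relax_level(struct toy_domain *doms, size_t n, int request)
{
	assert(doms != NULL || n == 0);

	if (request < 0)
		return;

	for (size_t i = 0; i < n; i++) {
		if (doms[i].level >= request)
			doms[i].flags &= ~(TOY_BALANCE_NEWIDLE | TOY_BALANCE_WAKE);
	}
}

/* Nonzero if a CPU going idle may still pull work across this domain. */
static int toy_newidle_allowed(const struct toy_domain *d)
{
	return (d->flags & TOY_BALANCE_NEWIDLE) != 0;
}
```

With levels SMT=0, MC (one CCX)=1 and PKG=2, a request of 2 keeps
newidle balancing inside a CCX and turns it off across CCXes, which is
where the imbalance can quietly build up.
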
>> it tends to be abused by users as an "easy" fix for some urgent perf
>> issues instead of addressing their root causes.
>
> Was there ever a report of similar issue where migrations for right
> reasons has led to performance degradation as a result of platform
> architecture? I doubt there is a straightforward way to solve this
> using the current interfaces - at least I haven't found one yet.
>
For us it wasn't due to platform architecture but more to an "exotic"
NUMA topology (like a cube: a node is one hop away from 3 neighbors, two
hops away from the other 4) in combination with certain user-level
settings that cause more wakeups in a subset of domains. If
relax_domain_level is left untouched, you get no load imbalance but
perf is bad. Once you set relax_domain_level to restrict newidle
balancing to lower domain levels, you actually see better performance
numbers in testing even though CPU loads are not well balanced. Then
one day you find out the imbalance is so bad that it slows down
everything. Luckily it wasn't too hard to fix from the application side.

I get that it may not be easy to fix from their application side in
this case, but I still think this is too hacky, and one may end up
regretting it.

I certainly want to hear what others think about relax_domain_level!
> Perhaps cache-aware scheduling is the way forward to solve these
> set of issues as Peter highlighted.
>
Hope so! We will start testing that series and provide feedback.
Thanks,
Libo
Thread overview: 16+ messages
2025-04-17 21:08 IPC drop down on AMD epyc 7702P Jean-Baptiste Roquefere
2025-04-18 6:39 ` K Prateek Nayak
2025-04-28 7:43 ` Jean-Baptiste Roquefere
2025-04-30 9:13 ` K Prateek Nayak
2025-04-30 9:25 ` Peter Zijlstra
2025-04-30 10:41 ` Libo Chen
2025-04-30 11:29 ` K Prateek Nayak
2025-05-01 2:46 ` Libo Chen [this message]
2025-05-05 10:28 ` Vincent Guittot
2025-05-05 12:29 ` K Prateek Nayak
2025-05-05 15:10 ` Vincent Guittot
2025-05-05 15:16 ` K Prateek Nayak
2025-05-16 15:05 ` Jean-Baptiste Roquefere
2025-05-22 14:51 ` Vincent Guittot
2025-05-23 12:24 ` Jean-Baptiste Roquefere
2025-05-26 7:53 ` Vincent Guittot