From: Jean-Baptiste Roquefere <jb.roquefere@ateme.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
"Gautham R. Shenoy" <gautham.shenoy@amd.com>,
Swapnil Sapkal <swapnil.sapkal@amd.com>
Cc: "regressions@lists.linux.dev" <regressions@lists.linux.dev>,
"mingo@kernel.org" <mingo@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Borislav Petkov <bp@alien8.de>
Subject: Re: IPC drop down on AMD epyc 7702P
Date: Mon, 28 Apr 2025 07:43:05 +0000
Message-ID: <996ca8cb-3ac8-4f1b-93f1-415f43922d7a@ateme.com>
In-Reply-To: <4c0f13ab-c9cd-42c4-84bd-244365b450e2@amd.com>
[-- Attachment #1: Type: text/plain, Size: 3169 bytes --]
Hello Prateek,
Thanks for your response.
> Looking at the commit logs, it looks like these commits do solve other
> problems around load balancing and might not be trivial to revert
> without evaluating the damages.
It's definitely not a productizable workaround!
> The processor you are running on, the AMD EPYC 7702P based on the Zen2
> architecture contains 4 cores / 8 threads per CCX (LLC domain) which is
> perhaps why reducing the thread count to below this limit is helping
> your workload.
>
> What we suspect is that when running the workload, the threads that
> regularly sleep trigger a newidle balancing which causes them to move
> to another CCX leading to higher number of L3 misses.
>
> To confirm this, would it be possible to run the workload with the
> not-yet-upstream perf sched stats [1] tool and share the result from
> perf sched stats diff for the data from v6.12.17 and v6.12.17 + patch
> to rule out any other second order effect.
>
> [1]
> https://lore.kernel.org/all/20250311120230.61774-1-swapnil.sapkal@amd.com/
I had to patch open_file_read(struct perf_data *data) in
tools/perf/util/session.c because of a "failed to open perf.data: File
exists" error (it looked more like a compiler issue than a tools/perf issue).
$ ./perf sched stats diff perf.data.6.12.17 perf.data.6.12.17patched > perf.diff
(see perf.diff attached)
> Assuming you control these deployments, would it be possible to run
> the workload on a kernel running with "relax_domain_level=2" kernel
> cmdline that restricts newidle balance to only within the CCX. As a
> side effect, it also limits task wakeups to the same LLC domain but
> I would still like to know if this makes a difference to the
> workload you are running.
On vanilla 6.12.17, relax_domain_level=2 gives the IPC we expected:
+--------------------+--------------------------+----------------------+
| Metric             | relax_domain_level unset | relax_domain_level=2 |
+--------------------+--------------------------+----------------------+
| Threads            |                      210 |                  210 |
| Utilization (%)    |                    65,86 |                52,01 |
| CPU effective freq |                 1 622,93 |             1 294,12 |
| IPC                |                     1,14 |                 1,42 |
| L2 access (pti)    |                    34,36 |                38,18 |
| L2 miss (pti)      |                     7,34 |                 7,78 |
| L3 miss (abs)      |           39 711 971 741 |       33 929 609 924 |
| Mem (GB/s)         |                    70,68 |                49,10 |
| Context switches   |              109 281 524 |          107 896 729 |
+--------------------+--------------------------+----------------------+
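For reading the table above, the relative change between the two columns can be worked out with a small Python sketch. The values are copied from the table as written (decimal commas, spaces as thousands separators). The helper names `parse_number` and `pct_change` are only illustrative and not part of any tool used in this thread.

```python
def parse_number(text: str) -> float:
    """Parse a number written with a decimal comma and space-grouped thousands."""
    # "".join(text.split()) drops any whitespace, including non-breaking spaces.
    return float("".join(text.split()).replace(",", "."))


def pct_change(base: str, new: str) -> float:
    """Relative change of `new` against `base`, in percent."""
    b = parse_number(base)
    n = parse_number(new)
    return (n - b) / b * 100.0


# (relax_domain_level unset, relax_domain_level=2), copied from the table.
ROWS = {
    "Utilization (%)": ("65,86", "52,01"),
    "CPU effective freq": ("1 622,93", "1 294,12"),
    "IPC": ("1,14", "1,42"),
    "L2 access (pti)": ("34,36", "38,18"),
    "L2 miss (pti)": ("7,34", "7,78"),
    "L3 miss (abs)": ("39 711 971 741", "33 929 609 924"),
    "Mem (GB/s)": ("70,68", "49,10"),
    "Context switches": ("109 281 524", "107 896 729"),
}

if __name__ == "__main__":
    for name, (unset, level2) in ROWS.items():
        print(f"{name:<20} {pct_change(unset, level2):+7.2f}%")
```

With these inputs, IPC rises by roughly a quarter, while L3 misses and memory bandwidth both drop.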
Kind regards,
JB
[-- Attachment #2: perf.diff --]
[-- Type: text/x-patch; name="perf.diff", Size: 20149 bytes --]
Columns description
----------------------------------------------------------------------------------------------------
DESC -> Description of the field
COUNT -> Value of the field
PCT_CHANGE -> Percent change with corresponding base value
AVG_JIFFIES -> Avg time in jiffies between two consecutive occurrence of event
----------------------------------------------------------------------------------------------------
Time elapsed (in jiffies) : 48349, 48345
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>
----------------------------------------------------------------------------------------------------
DESC COUNT1 COUNT2 PCT_CHANGE PCT_CHANGE1 PCT_CHANGE2
----------------------------------------------------------------------------------------------------
sched_yield() count : 0, 8 | 0.00% |
Legacy counter can be ignored : 0, 0 | 0.00% |
schedule() called : 856174, 886448 | 3.54% |
schedule() left the processor idle : 354060, 396363 | 11.95% | ( 41.35%, 44.71% )
try_to_wake_up() was called : 478156, 469763 | -1.76% |
try_to_wake_up() was called to wake up the local cpu : 71136, 42146 | -40.75% | ( 14.88%, 8.97% )
total runtime by tasks on this processor (in jiffies) : 123927676874,108531911002 | -12.42% |
total waittime by tasks on this processor (in jiffies) : 34729211241,27076295778 | -22.04% | ( 28.02%, 24.95% )
total timeslices run on this cpu : 501606, 489799 | -2.35% |
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>, DOMAIN 0
----------------------------------------------------------------------------------------------------
DESC COUNT1 COUNT2 PCT_CHANGE AVG_JIFFIES1 AVG_JIFFIES2
----------------------------------------- <Category busy> ------------------------------------------
load_balance() count on cpu busy : 2494, 730 | -70.73% | $ 19.39, 66.23 $
load_balance() found balanced on cpu busy : 2445, 641 | -73.78% | $ 19.77, 75.42 $
load_balance() move task failed on cpu busy : 20, 55 | 175.00% | $ 2417.45, 879.00 $
imbalance sum on cpu busy : 453, 29661 | 6447.68% |
pull_task() count on cpu busy : 29, 35 | 20.69% |
pull_task() when target task was cache-hot on cpu busy : 0, 0 | 0.00% |
load_balance() failed to find busier queue on cpu busy : 0, 0 | 0.00% | $ 0.00, 0.00 $
load_balance() failed to find busier group on cpu busy : 2445, 641 | -73.78% | $ 19.77, 75.42 $
*load_balance() success count on cpu busy : 29, 34 | 17.24% |
*avg task pulled per successful lb attempt (cpu busy) : 1.00, 1.03 | 2.94% |
----------------------------------------- <Category idle> ------------------------------------------
load_balance() count on cpu idle : 11936, 14590 | 22.24% | $ 4.05, 3.31 $
load_balance() found balanced on cpu idle : 11690, 14069 | 20.35% | $ 4.14, 3.44 $
load_balance() move task failed on cpu idle : 8, 164 | 1950.00% | $ 6043.62, 294.79 $
imbalance sum on cpu idle : 253, 154633 | 61019.76% |
pull_task() count on cpu idle : 240, 363 | 51.25% |
pull_task() when target task was cache-hot on cpu idle : 0, 0 | 0.00% |
load_balance() failed to find busier queue on cpu idle : 0, 0 | 0.00% | $ 0.00, 0.00 $
load_balance() failed to find busier group on cpu idle : 11689, 14069 | 20.36% | $ 4.14, 3.44 $
*load_balance() success count on cpu idle : 238, 357 | 50.00% |
*avg task pulled per successful lb attempt (cpu idle) : 1.01, 1.02 | 0.83% |
---------------------------------------- <Category newidle> ----------------------------------------
load_balance() count on cpu newly idle : 331664, 31153 | -90.61% | $ 0.15, 1.55 $
load_balance() found balanced on cpu newly idle : 302817, 28735 | -90.51% | $ 0.16, 1.68 $
load_balance() move task failed on cpu newly idle : 461, 874 | 89.59% | $ 104.88, 55.31 $
imbalance sum on cpu newly idle : 28955, 829603 | 2765.15% |
pull_task() count on cpu newly idle : 28493, 1557 | -94.54% |
pull_task() when target task was cache-hot on cpu newly idle : 0, 0 | 0.00% |
load_balance() failed to find busier queue on cpu newly idle : 0, 0 | 0.00% | $ 0.00, 0.00 $
load_balance() failed to find busier group on cpu newly idle : 300234, 28470 | -90.52% | $ 0.16, 1.70 $
*load_balance() success count on cpu newly idle : 28386, 1544 | -94.56% |
*avg task pulled per successful lb attempt (cpu newly idle) : 1.00, 1.01 | 0.46% |
--------------------------------- <Category active_load_balance()> ---------------------------------
active_load_balance() count : 0, 0 | 0.00% |
active_load_balance() move task failed : 0, 0 | 0.00% |
active_load_balance() successfully moved a task : 0, 0 | 0.00% |
--------------------------------- <Category sched_balance_exec()> ----------------------------------
sbe_count is not used : 0, 0 | 0.00% |
sbe_balanced is not used : 0, 0 | 0.00% |
sbe_pushed is not used : 0, 0 | 0.00% |
--------------------------------- <Category sched_balance_fork()> ----------------------------------
sbf_count is not used : 0, 0 | 0.00% |
sbf_balanced is not used : 0, 0 | 0.00% |
sbf_pushed is not used : 0, 0 | 0.00% |
------------------------------------------ <Wakeup Info> -------------------------------------------
try_to_wake_up() awoke a task that last ran on a diff cpu : 25939, 31717 | 22.28% |
try_to_wake_up() moved task because cache-cold on own cpu : 7221, 5908 | -18.18% |
try_to_wake_up() started passive balancing : 0, 0 | 0.00% |
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>, DOMAIN 1
----------------------------------------------------------------------------------------------------
DESC COUNT1 COUNT2 PCT_CHANGE AVG_JIFFIES1 AVG_JIFFIES2
----------------------------------------- <Category busy> ------------------------------------------
load_balance() count on cpu busy : 45, 17 | -62.22% | $ 1074.42, 2843.82 $
load_balance() found balanced on cpu busy : 45, 16 | -64.44% | $ 1074.42, 3021.56 $
load_balance() move task failed on cpu busy : 0, 0 | 0.00% | $ 0.00, 0.00 $
imbalance sum on cpu busy : 2, 356 | 17700.00% |
pull_task() count on cpu busy : 0, 0 | 0.00% |
pull_task() when target task was cache-hot on cpu busy : 0, 0 | 0.00% |
load_balance() failed to find busier queue on cpu busy : 0, 0 | 0.00% | $ 0.00, 0.00 $
load_balance() failed to find busier group on cpu busy : 8, 2 | -75.00% | $ 6043.62, 24172.50 $
*load_balance() success count on cpu busy : 0, 1 | 0.00% |
*avg task pulled per successful lb attempt (cpu busy) : 0.00, 0.00 | 0.00% |
----------------------------------------- <Category idle> ------------------------------------------
load_balance() count on cpu idle : 7753, 7930 | 2.28% | $ 6.24, 6.10 $
load_balance() found balanced on cpu idle : 6208, 6591 | 6.17% | $ 7.79, 7.34 $
load_balance() move task failed on cpu idle : 1334, 1000 | -25.04% | $ 36.24, 48.35 $
imbalance sum on cpu idle : 1612, 274184 | 16908.93% |
pull_task() count on cpu idle : 216, 357 | 65.28% |
pull_task() when target task was cache-hot on cpu idle : 0, 10 | 0.00% |
load_balance() failed to find busier queue on cpu idle : 0, 0 | 0.00% | $ 0.00, 0.00 $
load_balance() failed to find busier group on cpu idle : 4065, 4062 | -0.07% | $ 11.89, 11.90 $
*load_balance() success count on cpu idle : 211, 339 | 60.66% |
*avg task pulled per successful lb attempt (cpu idle) : 1.02, 1.05 | 2.87% |
---------------------------------------- <Category newidle> ----------------------------------------
load_balance() count on cpu newly idle : 258017, 29345 | -88.63% | $ 0.19, 1.65 $
load_balance() found balanced on cpu newly idle : 131570, 16162 | -87.72% | $ 0.37, 2.99 $
load_balance() move task failed on cpu newly idle : 103161, 11002 | -89.34% | $ 0.47, 4.39 $
imbalance sum on cpu newly idle : 131916, 2537851 | 1823.84% |
pull_task() count on cpu newly idle : 23922, 2213 | -90.75% |
pull_task() when target task was cache-hot on cpu newly idle : 5, 5 | 0.00% |
load_balance() failed to find busier queue on cpu newly idle : 0, 2 | 0.00% | $ 0.00, 24172.50 $
load_balance() failed to find busier group on cpu newly idle : 131096, 16081 | -87.73% | $ 0.37, 3.01 $
*load_balance() success count on cpu newly idle : 23286, 2181 | -90.63% |
*avg task pulled per successful lb attempt (cpu newly idle) : 1.03, 1.01 | -1.23% |
--------------------------------- <Category active_load_balance()> ---------------------------------
active_load_balance() count : 0, 1 | 0.00% |
active_load_balance() move task failed : 0, 0 | 0.00% |
active_load_balance() successfully moved a task : 0, 1 | 0.00% |
--------------------------------- <Category sched_balance_exec()> ----------------------------------
sbe_count is not used : 0, 0 | 0.00% |
sbe_balanced is not used : 0, 0 | 0.00% |
sbe_pushed is not used : 0, 0 | 0.00% |
--------------------------------- <Category sched_balance_fork()> ----------------------------------
sbf_count is not used : 0, 0 | 0.00% |
sbf_balanced is not used : 0, 0 | 0.00% |
sbf_pushed is not used : 0, 0 | 0.00% |
------------------------------------------ <Wakeup Info> -------------------------------------------
try_to_wake_up() awoke a task that last ran on a diff cpu : 209758, 283095 | 34.96% |
try_to_wake_up() moved task because cache-cold on own cpu : 37946, 33835 | -10.83% |
try_to_wake_up() started passive balancing : 0, 0 | 0.00% |
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>, DOMAIN 2
----------------------------------------------------------------------------------------------------
DESC COUNT1 COUNT2 PCT_CHANGE AVG_JIFFIES1 AVG_JIFFIES2
----------------------------------------- <Category busy> ------------------------------------------
load_balance() count on cpu busy : 0, 0 | 0.00% | $ 0.00, 0.00 $
load_balance() found balanced on cpu busy : 0, 0 | 0.00% | $ 0.00, 0.00 $
load_balance() move task failed on cpu busy : 0, 0 | 0.00% | $ 0.00, 0.00 $
imbalance sum on cpu busy : 0, 0 | 0.00% |
pull_task() count on cpu busy : 0, 0 | 0.00% |
pull_task() when target task was cache-hot on cpu busy : 0, 0 | 0.00% |
load_balance() failed to find busier queue on cpu busy : 0, 0 | 0.00% | $ 0.00, 0.00 $
load_balance() failed to find busier group on cpu busy : 0, 0 | 0.00% | $ 0.00, 0.00 $
*load_balance() success count on cpu busy : 0, 0 | 0.00% |
*avg task pulled per successful lb attempt (cpu busy) : 0.00, 0.00 | 0.00% |
----------------------------------------- <Category idle> ------------------------------------------
load_balance() count on cpu idle : 1285, 1321 | 2.80% | $ 37.63, 36.60 $
load_balance() found balanced on cpu idle : 908, 1006 | 10.79% | $ 53.25, 48.06 $
load_balance() move task failed on cpu idle : 310, 209 | -32.58% | $ 155.96, 231.32 $
imbalance sum on cpu idle : 251700, 220823 | -12.27% |
pull_task() count on cpu idle : 75, 136 | 81.33% |
pull_task() when target task was cache-hot on cpu idle : 0, 0 | 0.00% |
load_balance() failed to find busier queue on cpu idle : 2, 0 | -100.00% | $ 24174.50, 0.00 $
load_balance() failed to find busier group on cpu idle : 62, 45 | -27.42% | $ 779.82, 1074.33 $
*load_balance() success count on cpu idle : 67, 106 | 58.21% |
*avg task pulled per successful lb attempt (cpu idle) : 1.12, 1.28 | 14.62% |
---------------------------------------- <Category newidle> ----------------------------------------
load_balance() count on cpu newly idle : 124013, 27086 | -78.16% | $ 0.39, 1.78 $
load_balance() found balanced on cpu newly idle : 13528, 3242 | -76.03% | $ 3.57, 14.91 $
load_balance() move task failed on cpu newly idle : 96593, 19105 | -80.22% | $ 0.50, 2.53 $
imbalance sum on cpu newly idle : 23681561, 10057827 | -57.53% |
pull_task() count on cpu newly idle : 14841, 5231 | -64.75% |
pull_task() when target task was cache-hot on cpu newly idle : 4, 3 | -25.00% |
load_balance() failed to find busier queue on cpu newly idle : 1211, 30 | -97.52% | $ 39.92, 1611.50 $
load_balance() failed to find busier group on cpu newly idle : 11812, 3063 | -74.07% | $ 4.09, 15.78 $
*load_balance() success count on cpu newly idle : 13892, 4739 | -65.89% |
*avg task pulled per successful lb attempt (cpu newly idle) : 1.07, 1.10 | 3.32% |
--------------------------------- <Category active_load_balance()> ---------------------------------
active_load_balance() count : 0, 0 | 0.00% |
active_load_balance() move task failed : 0, 0 | 0.00% |
active_load_balance() successfully moved a task : 0, 0 | 0.00% |
--------------------------------- <Category sched_balance_exec()> ----------------------------------
sbe_count is not used : 0, 0 | 0.00% |
sbe_balanced is not used : 0, 0 | 0.00% |
sbe_pushed is not used : 0, 0 | 0.00% |
--------------------------------- <Category sched_balance_fork()> ----------------------------------
sbf_count is not used : 0, 0 | 0.00% |
sbf_balanced is not used : 0, 0 | 0.00% |
sbf_pushed is not used : 0, 0 | 0.00% |
------------------------------------------ <Wakeup Info> -------------------------------------------
try_to_wake_up() awoke a task that last ran on a diff cpu : 171321, 112803 | -34.16% |
try_to_wake_up() moved task because cache-cold on own cpu : 47112, 18467 | -60.80% |
try_to_wake_up() started passive balancing : 0, 0 | 0.00% |
----------------------------------------------------------------------------------------------------
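The PCT_CHANGE and AVG_JIFFIES columns of the report above can be cross-checked from the raw counts. The sketch below uses formulas inferred from the numbers in the attachment (percent change against the first count, and elapsed jiffies divided by the event count). They are an assumption, not taken from the tool's source, and the function names are hypothetical.

```python
# Elapsed jiffies for the two runs, from the "Time elapsed" line of the report.
ELAPSED = (48349, 48345)


def pct_change(count1: float, count2: float) -> float:
    """Percent change of count2 against count1; 0.0 when the base is 0,
    which matches how the report prints rows such as "0, 8 | 0.00%"."""
    if count1 == 0:
        return 0.0
    return (count2 - count1) / count1 * 100.0


def avg_jiffies(elapsed: int, count: int) -> float:
    """Average jiffies between two consecutive occurrences of an event."""
    return elapsed / count if count else 0.0


# "load_balance() count on cpu newly idle" per domain: (v6.12.17, v6.12.17 + patch).
NEWIDLE_LB_COUNT = {
    0: (331664, 31153),
    1: (258017, 29345),
    2: (124013, 27086),
}

if __name__ == "__main__":
    for domain, (c1, c2) in NEWIDLE_LB_COUNT.items():
        print(
            f"domain {domain}: {pct_change(c1, c2):7.2f}% | "
            f"{avg_jiffies(ELAPSED[0], c1):.2f}, {avg_jiffies(ELAPSED[1], c2):.2f}"
        )
```

The printed values should agree with the newidle rows of each domain. A mismatch would mean the inferred formulas do not hold for that row.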