Linux kernel -stable discussions
From: Jean-Baptiste Roquefere <jb.roquefere@ateme.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"Gautham R. Shenoy" <gautham.shenoy@amd.com>,
	Swapnil Sapkal <swapnil.sapkal@amd.com>
Cc: "regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Borislav Petkov <bp@alien8.de>
Subject: Re: IPC drop down on AMD epyc 7702P
Date: Mon, 28 Apr 2025 07:43:05 +0000
Message-ID: <996ca8cb-3ac8-4f1b-93f1-415f43922d7a@ateme.com>
In-Reply-To: <4c0f13ab-c9cd-42c4-84bd-244365b450e2@amd.com>

[-- Attachment #1: Type: text/plain, Size: 3169 bytes --]

Hello Prateek,

Thanks for your response.


> Looking at the commit logs, it looks like these commits do solve other
> problems around load balancing and might not be trivial to revert
> without evaluating the damages.

It's definitely not a productizable workaround!

> The processor you are running on, the AMD EPYC 7702P based on the Zen2
> architecture contains 4 cores / 8 threads per CCX (LLC domain) which is
> perhaps why reducing the thread count to below this limit is helping
> your workload.
>
> What we suspect is that when running the workload, the threads that
> regularly sleep trigger a newidle balancing which causes them to move
> to another CCX leading to higher number of L3 misses.
>
> To confirm this, would it be possible to run the workload with the
> not-yet-upstream perf sched stats [1] tool and share the result from
> perf sched stats diff for the data from v6.12.17 and v6.12.17 + patch
> to rule out any other second order effect.
>
> [1] 
> https://lore.kernel.org/all/20250311120230.61774-1-swapnil.sapkal@amd.com/
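
The CCX (LLC domain) grouping described in the quoted text can be inspected from userspace. The following is a minimal sketch, assuming the L3 cache is exposed as cache/index3 under /sys/devices/system/cpu (typical on x86, but not guaranteed); the function names parse_cpu_list and llc_groups are hypothetical helpers, not part of any kernel tool.

```python
from collections import defaultdict
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-3,64-67" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def llc_groups(sysfs_root="/sys/devices/system/cpu"):
    """Group CPU ids by the shared_cpu_list string of their L3 cache.

    Assumption: index3 is the L3 (last-level) cache entry on this machine.
    """
    groups = defaultdict(list)
    pattern = "cpu[0-9]*/cache/index3/shared_cpu_list"
    for path in Path(sysfs_root).glob(pattern):
        cpu = int(path.parts[-4][3:])  # directory name "cpu17" -> 17
        groups[path.read_text().strip()].append(cpu)
    return {shared: sorted(cpus) for shared, cpus in groups.items()}


if __name__ == "__main__":
    # One line per LLC domain, ordered by the lowest CPU id in each group.
    for shared, cpus in sorted(llc_groups().items(), key=lambda kv: kv[1]):
        print(f"LLC {shared}: {len(cpus)} CPUs")
```

On a part with 4 cores / 8 threads per CCX, each printed group would be expected to contain 8 CPUs if SMT is enabled.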

I had to patch static int open_file_read(struct perf_data *data) in
tools/perf/util/session.c because of a "failed to open perf.data: File
exists" error (it looked more like a compiler issue than a tools/perf
issue).

$ ./perf sched stats diff perf.data.6.12.17 perf.data.6.12.17patched > perf.diff

(see perf.diff attached)

> Assuming you control these deployments, would it be possible to run
> the workload on a kernel running with "relax_domain_level=2" kernel
> cmdline that restricts newidle balance to only within the CCX. As a
> side effect, it also limits task wakeups to the same LLC domain but
> I would still like to know if this makes a difference to the
> workload you are running.

On vanilla 6.12.17, it gives the IPC we expected:

+--------------------+--------------------------+-----------------------+
|                    | relax_domain_level unset | relax_domain_level=2  |
+--------------------+--------------------------+-----------------------+
| Threads            | 210                      | 210                   |
| Utilization (%)    | 65.86                    | 52.01                 |
| CPU effective freq | 1 622.93                 | 1 294.12              |
| IPC                | 1.14                     | 1.42                  |
| L2 access (pti)    | 34.36                    | 38.18                 |
| L2 miss   (pti)    | 7.34                     | 7.78                  |
| L3 miss   (abs)    | 39 711 971 741           | 33 929 609 924        |
| Mem (GB/s)         | 70.68                    | 49.10                 |
| Context switches   | 109 281 524              | 107 896 729           |
+--------------------+--------------------------+-----------------------+
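
To make the size of each difference easier to read, the relative change per metric can be computed from the table above. This is only an illustrative sketch: the dictionary keys and the relative_change helper are hypothetical names, and the values are copied from the table with decimal commas written as points.

```python
# Values copied from the table above; keys are illustrative names only.
unset = {
    "utilization_pct": 65.86,
    "cpu_effective_freq": 1622.93,
    "ipc": 1.14,
    "l3_miss_abs": 39_711_971_741,
    "mem_gb_per_s": 70.68,
    "context_switches": 109_281_524,
}
level_2 = {
    "utilization_pct": 52.01,
    "cpu_effective_freq": 1294.12,
    "ipc": 1.42,
    "l3_miss_abs": 33_929_609_924,
    "mem_gb_per_s": 49.10,
    "context_switches": 107_896_729,
}


def relative_change(before, after):
    """Percent change of `after` relative to `before`."""
    return (after - before) / before * 100.0


# Print one line per metric: unset -> relax_domain_level=2.
for name in unset:
    change = relative_change(unset[name], level_2[name])
    print(f"{name:20s} {change:+7.2f}%")
```

With these inputs the IPC row comes out at roughly +24.6% and the L3 miss row at roughly -14.6%.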

Kind regards,

JB

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: perf.diff --]
[-- Type: text/x-patch; name="perf.diff", Size: 20149 bytes --]

Columns description
----------------------------------------------------------------------------------------------------
DESC			-> Description of the field
COUNT			-> Value of the field
PCT_CHANGE		-> Percent change with corresponding base value
AVG_JIFFIES		-> Avg time in jiffies between two consecutive occurrence of event
----------------------------------------------------------------------------------------------------
Time elapsed (in jiffies)                                        :       48349,      48345
----------------------------------------------------------------------------------------------------
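
The derived columns can be reproduced from the raw counts. The sketch below is inferred from the numbers in this output, not taken from the tool's source: PCT_CHANGE appears to be the change of COUNT2 relative to COUNT1, and AVG_JIFFIES appears to be the elapsed jiffies divided by the count, with 0.00 printed when the divisor is zero. The helper names are hypothetical.

```python
def pct_change(count1, count2):
    """Percent change of count2 relative to count1.

    Returns 0.0 for a zero base, which matches the rows in this output
    where COUNT1 is 0 (an inference from the data, not from the tool).
    """
    if count1 == 0:
        return 0.0
    return (count2 - count1) / count1 * 100.0


def avg_jiffies(elapsed_jiffies, count):
    """Average jiffies between two occurrences of an event (0.0 if none)."""
    if count == 0:
        return 0.0
    return elapsed_jiffies / count


# Cross-check against two rows of this output:
# "schedule() called": 856174 -> 886448
print(f"{pct_change(856174, 886448):.2f}%")
# "load_balance() count on cpu busy", DOMAIN 0, COUNT1 = 2494 over 48349 jiffies
print(f"{avg_jiffies(48349, 2494):.2f}")
```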

----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>
----------------------------------------------------------------------------------------------------
DESC                                                                    COUNT1      COUNT2   PCT_CHANGE    PCT_CHANGE1 PCT_CHANGE2
----------------------------------------------------------------------------------------------------
sched_yield() count                                              :           0,          8  |     0.00% |
Legacy counter can be ignored                                    :           0,          0  |     0.00% |
schedule() called                                                :      856174,     886448  |     3.54% |
schedule() left the processor idle                               :      354060,     396363  |    11.95% |  (    41.35%,     44.71% )
try_to_wake_up() was called                                      :      478156,     469763  |    -1.76% |
try_to_wake_up() was called to wake up the local cpu             :       71136,      42146  |   -40.75% |  (    14.88%,      8.97% )
total runtime by tasks on this processor (in jiffies)            : 123927676874, 108531911002  |   -12.42% |
total waittime by tasks on this processor (in jiffies)           : 34729211241, 27076295778  |   -22.04% |  (    28.02%,     24.95% )
total timeslices run on this cpu                                 :      501606,     489799  |    -2.35% |
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>, DOMAIN 0
----------------------------------------------------------------------------------------------------
DESC                                                                    COUNT1      COUNT2   PCT_CHANGE     AVG_JIFFIES1 AVG_JIFFIES2
----------------------------------------- <Category busy> ------------------------------------------
load_balance() count on cpu busy                                 :        2494,        730  |   -70.73% |  $       19.39,       66.23 $
load_balance() found balanced on cpu busy                        :        2445,        641  |   -73.78% |  $       19.77,       75.42 $
load_balance() move task failed on cpu busy                      :          20,         55  |   175.00% |  $     2417.45,      879.00 $
imbalance sum on cpu busy                                        :         453,      29661  |  6447.68% |
pull_task() count on cpu busy                                    :          29,         35  |    20.69% |
pull_task() when target task was cache-hot on cpu busy           :           0,          0  |     0.00% |
load_balance() failed to find busier queue on cpu busy           :           0,          0  |     0.00% |  $        0.00,        0.00 $
load_balance() failed to find busier group on cpu busy           :        2445,        641  |   -73.78% |  $       19.77,       75.42 $
*load_balance() success count on cpu busy                        :          29,         34  |    17.24% |
*avg task pulled per successful lb attempt (cpu busy)            :        1.00,       1.03  |     2.94% |
----------------------------------------- <Category idle> ------------------------------------------
load_balance() count on cpu idle                                 :       11936,      14590  |    22.24% |  $        4.05,        3.31 $
load_balance() found balanced on cpu idle                        :       11690,      14069  |    20.35% |  $        4.14,        3.44 $
load_balance() move task failed on cpu idle                      :           8,        164  |  1950.00% |  $     6043.62,      294.79 $
imbalance sum on cpu idle                                        :         253,     154633  | 61019.76% |
pull_task() count on cpu idle                                    :         240,        363  |    51.25% |
pull_task() when target task was cache-hot on cpu idle           :           0,          0  |     0.00% |
load_balance() failed to find busier queue on cpu idle           :           0,          0  |     0.00% |  $        0.00,        0.00 $
load_balance() failed to find busier group on cpu idle           :       11689,      14069  |    20.36% |  $        4.14,        3.44 $
*load_balance() success count on cpu idle                        :         238,        357  |    50.00% |
*avg task pulled per successful lb attempt (cpu idle)            :        1.01,       1.02  |     0.83% |
---------------------------------------- <Category newidle> ----------------------------------------
load_balance() count on cpu newly idle                           :      331664,      31153  |   -90.61% |  $        0.15,        1.55 $
load_balance() found balanced on cpu newly idle                  :      302817,      28735  |   -90.51% |  $        0.16,        1.68 $
load_balance() move task failed on cpu newly idle                :         461,        874  |    89.59% |  $      104.88,       55.31 $
imbalance sum on cpu newly idle                                  :       28955,     829603  |  2765.15% |
pull_task() count on cpu newly idle                              :       28493,       1557  |   -94.54% |
pull_task() when target task was cache-hot on cpu newly idle     :           0,          0  |     0.00% |
load_balance() failed to find busier queue on cpu newly idle     :           0,          0  |     0.00% |  $        0.00,        0.00 $
load_balance() failed to find busier group on cpu newly idle     :      300234,      28470  |   -90.52% |  $        0.16,        1.70 $
*load_balance() success count on cpu newly idle                  :       28386,       1544  |   -94.56% |
*avg task pulled per successful lb attempt (cpu newly idle)      :        1.00,       1.01  |     0.46% |
--------------------------------- <Category active_load_balance()> ---------------------------------
active_load_balance() count                                      :           0,          0  |     0.00% |
active_load_balance() move task failed                           :           0,          0  |     0.00% |
active_load_balance() successfully moved a task                  :           0,          0  |     0.00% |
--------------------------------- <Category sched_balance_exec()> ----------------------------------
sbe_count is not used                                            :           0,          0  |     0.00% |
sbe_balanced is not used                                         :           0,          0  |     0.00% |
sbe_pushed is not used                                           :           0,          0  |     0.00% |
--------------------------------- <Category sched_balance_fork()> ----------------------------------
sbf_count is not used                                            :           0,          0  |     0.00% |
sbf_balanced is not used                                         :           0,          0  |     0.00% |
sbf_pushed is not used                                           :           0,          0  |     0.00% |
------------------------------------------ <Wakeup Info> -------------------------------------------
try_to_wake_up() awoke a task that last ran on a diff cpu        :       25939,      31717  |    22.28% |
try_to_wake_up() moved task because cache-cold on own cpu        :        7221,       5908  |   -18.18% |
try_to_wake_up() started passive balancing                       :           0,          0  |     0.00% |
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>, DOMAIN 1
----------------------------------------------------------------------------------------------------
DESC                                                                    COUNT1      COUNT2   PCT_CHANGE     AVG_JIFFIES1 AVG_JIFFIES2
----------------------------------------- <Category busy> ------------------------------------------
load_balance() count on cpu busy                                 :          45,         17  |   -62.22% |  $     1074.42,     2843.82 $
load_balance() found balanced on cpu busy                        :          45,         16  |   -64.44% |  $     1074.42,     3021.56 $
load_balance() move task failed on cpu busy                      :           0,          0  |     0.00% |  $        0.00,        0.00 $
imbalance sum on cpu busy                                        :           2,        356  | 17700.00% |
pull_task() count on cpu busy                                    :           0,          0  |     0.00% |
pull_task() when target task was cache-hot on cpu busy           :           0,          0  |     0.00% |
load_balance() failed to find busier queue on cpu busy           :           0,          0  |     0.00% |  $        0.00,        0.00 $
load_balance() failed to find busier group on cpu busy           :           8,          2  |   -75.00% |  $     6043.62,    24172.50 $
*load_balance() success count on cpu busy                        :           0,          1  |     0.00% |
*avg task pulled per successful lb attempt (cpu busy)            :        0.00,       0.00  |     0.00% |
----------------------------------------- <Category idle> ------------------------------------------
load_balance() count on cpu idle                                 :        7753,       7930  |     2.28% |  $        6.24,        6.10 $
load_balance() found balanced on cpu idle                        :        6208,       6591  |     6.17% |  $        7.79,        7.34 $
load_balance() move task failed on cpu idle                      :        1334,       1000  |   -25.04% |  $       36.24,       48.35 $
imbalance sum on cpu idle                                        :        1612,     274184  | 16908.93% |
pull_task() count on cpu idle                                    :         216,        357  |    65.28% |
pull_task() when target task was cache-hot on cpu idle           :           0,         10  |     0.00% |
load_balance() failed to find busier queue on cpu idle           :           0,          0  |     0.00% |  $        0.00,        0.00 $
load_balance() failed to find busier group on cpu idle           :        4065,       4062  |    -0.07% |  $       11.89,       11.90 $
*load_balance() success count on cpu idle                        :         211,        339  |    60.66% |
*avg task pulled per successful lb attempt (cpu idle)            :        1.02,       1.05  |     2.87% |
---------------------------------------- <Category newidle> ----------------------------------------
load_balance() count on cpu newly idle                           :      258017,      29345  |   -88.63% |  $        0.19,        1.65 $
load_balance() found balanced on cpu newly idle                  :      131570,      16162  |   -87.72% |  $        0.37,        2.99 $
load_balance() move task failed on cpu newly idle                :      103161,      11002  |   -89.34% |  $        0.47,        4.39 $
imbalance sum on cpu newly idle                                  :      131916,    2537851  |  1823.84% |
pull_task() count on cpu newly idle                              :       23922,       2213  |   -90.75% |
pull_task() when target task was cache-hot on cpu newly idle     :           5,          5  |     0.00% |
load_balance() failed to find busier queue on cpu newly idle     :           0,          2  |     0.00% |  $        0.00,    24172.50 $
load_balance() failed to find busier group on cpu newly idle     :      131096,      16081  |   -87.73% |  $        0.37,        3.01 $
*load_balance() success count on cpu newly idle                  :       23286,       2181  |   -90.63% |
*avg task pulled per successful lb attempt (cpu newly idle)      :        1.03,       1.01  |    -1.23% |
--------------------------------- <Category active_load_balance()> ---------------------------------
active_load_balance() count                                      :           0,          1  |     0.00% |
active_load_balance() move task failed                           :           0,          0  |     0.00% |
active_load_balance() successfully moved a task                  :           0,          1  |     0.00% |
--------------------------------- <Category sched_balance_exec()> ----------------------------------
sbe_count is not used                                            :           0,          0  |     0.00% |
sbe_balanced is not used                                         :           0,          0  |     0.00% |
sbe_pushed is not used                                           :           0,          0  |     0.00% |
--------------------------------- <Category sched_balance_fork()> ----------------------------------
sbf_count is not used                                            :           0,          0  |     0.00% |
sbf_balanced is not used                                         :           0,          0  |     0.00% |
sbf_pushed is not used                                           :           0,          0  |     0.00% |
------------------------------------------ <Wakeup Info> -------------------------------------------
try_to_wake_up() awoke a task that last ran on a diff cpu        :      209758,     283095  |    34.96% |
try_to_wake_up() moved task because cache-cold on own cpu        :       37946,      33835  |   -10.83% |
try_to_wake_up() started passive balancing                       :           0,          0  |     0.00% |
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>, DOMAIN 2
----------------------------------------------------------------------------------------------------
DESC                                                                    COUNT1      COUNT2   PCT_CHANGE     AVG_JIFFIES1 AVG_JIFFIES2
----------------------------------------- <Category busy> ------------------------------------------
load_balance() count on cpu busy                                 :           0,          0  |     0.00% |  $        0.00,        0.00 $
load_balance() found balanced on cpu busy                        :           0,          0  |     0.00% |  $        0.00,        0.00 $
load_balance() move task failed on cpu busy                      :           0,          0  |     0.00% |  $        0.00,        0.00 $
imbalance sum on cpu busy                                        :           0,          0  |     0.00% |
pull_task() count on cpu busy                                    :           0,          0  |     0.00% |
pull_task() when target task was cache-hot on cpu busy           :           0,          0  |     0.00% |
load_balance() failed to find busier queue on cpu busy           :           0,          0  |     0.00% |  $        0.00,        0.00 $
load_balance() failed to find busier group on cpu busy           :           0,          0  |     0.00% |  $        0.00,        0.00 $
*load_balance() success count on cpu busy                        :           0,          0  |     0.00% |
*avg task pulled per successful lb attempt (cpu busy)            :        0.00,       0.00  |     0.00% |
----------------------------------------- <Category idle> ------------------------------------------
load_balance() count on cpu idle                                 :        1285,       1321  |     2.80% |  $       37.63,       36.60 $
load_balance() found balanced on cpu idle                        :         908,       1006  |    10.79% |  $       53.25,       48.06 $
load_balance() move task failed on cpu idle                      :         310,        209  |   -32.58% |  $      155.96,      231.32 $
imbalance sum on cpu idle                                        :      251700,     220823  |   -12.27% |
pull_task() count on cpu idle                                    :          75,        136  |    81.33% |
pull_task() when target task was cache-hot on cpu idle           :           0,          0  |     0.00% |
load_balance() failed to find busier queue on cpu idle           :           2,          0  |  -100.00% |  $    24174.50,        0.00 $
load_balance() failed to find busier group on cpu idle           :          62,         45  |   -27.42% |  $      779.82,     1074.33 $
*load_balance() success count on cpu idle                        :          67,        106  |    58.21% |
*avg task pulled per successful lb attempt (cpu idle)            :        1.12,       1.28  |    14.62% |
---------------------------------------- <Category newidle> ----------------------------------------
load_balance() count on cpu newly idle                           :      124013,      27086  |   -78.16% |  $        0.39,        1.78 $
load_balance() found balanced on cpu newly idle                  :       13528,       3242  |   -76.03% |  $        3.57,       14.91 $
load_balance() move task failed on cpu newly idle                :       96593,      19105  |   -80.22% |  $        0.50,        2.53 $
imbalance sum on cpu newly idle                                  :    23681561,   10057827  |   -57.53% |
pull_task() count on cpu newly idle                              :       14841,       5231  |   -64.75% |
pull_task() when target task was cache-hot on cpu newly idle     :           4,          3  |   -25.00% |
load_balance() failed to find busier queue on cpu newly idle     :        1211,         30  |   -97.52% |  $       39.92,     1611.50 $
load_balance() failed to find busier group on cpu newly idle     :       11812,       3063  |   -74.07% |  $        4.09,       15.78 $
*load_balance() success count on cpu newly idle                  :       13892,       4739  |   -65.89% |
*avg task pulled per successful lb attempt (cpu newly idle)      :        1.07,       1.10  |     3.32% |
--------------------------------- <Category active_load_balance()> ---------------------------------
active_load_balance() count                                      :           0,          0  |     0.00% |
active_load_balance() move task failed                           :           0,          0  |     0.00% |
active_load_balance() successfully moved a task                  :           0,          0  |     0.00% |
--------------------------------- <Category sched_balance_exec()> ----------------------------------
sbe_count is not used                                            :           0,          0  |     0.00% |
sbe_balanced is not used                                         :           0,          0  |     0.00% |
sbe_pushed is not used                                           :           0,          0  |     0.00% |
--------------------------------- <Category sched_balance_fork()> ----------------------------------
sbf_count is not used                                            :           0,          0  |     0.00% |
sbf_balanced is not used                                         :           0,          0  |     0.00% |
sbf_pushed is not used                                           :           0,          0  |     0.00% |
------------------------------------------ <Wakeup Info> -------------------------------------------
try_to_wake_up() awoke a task that last ran on a diff cpu        :      171321,     112803  |   -34.16% |
try_to_wake_up() moved task because cache-cold on own cpu        :       47112,      18467  |   -60.80% |
try_to_wake_up() started passive balancing                       :           0,          0  |     0.00% |
----------------------------------------------------------------------------------------------------
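
Since the question in this thread is about newidle balancing, it may help to condense the newidle rows of the three domains. This is a rough sketch under stated assumptions: COUNT1 is taken to be v6.12.17 and COUNT2 the patched kernel (following the argument order of the diff command), and the dictionary layout and helper names are hypothetical.

```python
# (load_balance() count, success count) on cpu newly idle, per domain,
# copied from the DOMAIN 0/1/2 sections of the diff above.
newidle = {
    0: {"v6.12.17": (331664, 28386), "patched": (31153, 1544)},
    1: {"v6.12.17": (258017, 23286), "patched": (29345, 2181)},
    2: {"v6.12.17": (124013, 13892), "patched": (27086, 4739)},
}


def success_rate(attempts, successes):
    """Share of newidle balance attempts that pulled at least one task."""
    return successes / attempts * 100.0 if attempts else 0.0


def total_successes(kernel):
    """Successful newidle balances summed over all domains for one kernel."""
    return sum(per_kernel[kernel][1] for per_kernel in newidle.values())


for domain, per_kernel in newidle.items():
    for kernel, (attempts, successes) in per_kernel.items():
        rate = success_rate(attempts, successes)
        print(f"domain {domain} {kernel:9s} {attempts:7d} attempts, "
              f"{rate:5.2f}% successful")

print("total successful newidle balances:",
      total_successes("v6.12.17"), "->", total_successes("patched"))
```

The totals only count successful balance attempts; they say nothing by themselves about whether a pulled task crossed a CCX boundary.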

Thread overview: 16+ messages
2025-04-17 21:08 IPC drop down on AMD epyc 7702P Jean-Baptiste Roquefere
2025-04-18  6:39 ` K Prateek Nayak
2025-04-28  7:43   ` Jean-Baptiste Roquefere [this message]
2025-04-30  9:13     ` K Prateek Nayak
2025-04-30  9:25       ` Peter Zijlstra
2025-04-30 10:41       ` Libo Chen
2025-04-30 11:29         ` K Prateek Nayak
2025-05-01  2:46           ` Libo Chen
2025-05-05 10:28       ` Vincent Guittot
2025-05-05 12:29         ` K Prateek Nayak
2025-05-05 15:10           ` Vincent Guittot
2025-05-05 15:16             ` K Prateek Nayak
2025-05-16 15:05               ` Jean-Baptiste Roquefere
2025-05-22 14:51                 ` Vincent Guittot
2025-05-23 12:24                   ` Jean-Baptiste Roquefere
2025-05-26  7:53                     ` Vincent Guittot
