From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
clm@meta.com, Madhavan Srinivasan <maddy@linux.ibm.com>
Subject: Re: [PATCH v2 00/12] sched: Address schbench regression
Date: Wed, 9 Jul 2025 22:16:14 +0530 [thread overview]
Message-ID: <d5cb15bd-1096-45a8-9da6-a37ff490714c@linux.ibm.com> (raw)
In-Reply-To: <20250708190201.GE477119@noisy.programming.kicks-ass.net>
On 7/9/25 00:32, Peter Zijlstra wrote:
> On Mon, Jul 07, 2025 at 11:49:17PM +0530, Shrikanth Hegde wrote:
>
>> Git bisect points to
>> # first bad commit: [dc968ba0544889883d0912360dd72d90f674c140] sched: Add ttwu_queue support for delayed tasks
>
> Moo.. Are IPIs particularly expensive on your platform?
>
> The 5 cores makes me think this is a partition of sorts, but IIRC the
> power LPAR stuff was fixed physical, so routing interrupts shouldn't be
> much more expensive vs native hardware.
>
Yes, we call it a dedicated LPAR. (The hypervisor optimises such that the overhead is minimal;
I think that is true for interrupts too.)

Some more variations of testing and numbers:

The system had some config options I had messed up, such as CONFIG_SCHED_SMT=n. I copied the default
distro config back and ran the benchmark again. The numbers are slightly better than earlier, but it
is still a major regression. I also collected mpstat numbers; they show a much lower busy percentage
compared to earlier.
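For reference, here is roughly the kind of invocation used to collect the numbers below. The exact
schbench arguments are not recorded in this mail, so the thread counts shown are only illustrative:

  # schbench (https://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git)
  # -m: message threads, -t: worker threads per messenger, -r: runtime in seconds
  # (thread counts here are illustrative, not necessarily the ones actually used)
  ./schbench -m 4 -t 16 -r 30

  # in parallel, system-wide CPU utilization; two 15s samples over the 30s run
  # would correspond to the two "all" rows in the mpstat output below
  mpstat 15 2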
--------------------------------------------------------------------------
base: 8784fb5fa2e0 (tip/master)
Wakeup Latencies percentiles (usec) runtime 30 (s) (41567569 total samples)
50.0th: 11 (10767158 samples)
90.0th: 22 (16782627 samples)
* 99.0th: 36 (3347363 samples)
99.9th: 52 (344977 samples)
min=1, max=731
RPS percentiles (requests) runtime 30 (s) (31 total samples)
20.0th: 1443840 (31 samples)
* 50.0th: 1443840 (0 samples)
90.0th: 1443840 (0 samples)
min=1433480, max=1444037
average rps: 1442889.23
CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
all 3.24 0.00 11.39 0.00 37.30 0.00 0.00 0.00 0.00 48.07
all 2.59 0.00 11.56 0.00 37.62 0.00 0.00 0.00 0.00 48.23
base + clm's patch + series:
Wakeup Latencies percentiles (usec) runtime 30 (s) (27166787 total samples)
50.0th: 57 (8242048 samples)
90.0th: 120 (10677365 samples)
* 99.0th: 182 (2435082 samples)
99.9th: 262 (241664 samples)
min=1, max=89984
RPS percentiles (requests) runtime 30 (s) (31 total samples)
20.0th: 896000 (8 samples)
* 50.0th: 902144 (10 samples)
90.0th: 928768 (10 samples)
min=881548, max=971101
average rps: 907530.10 <<< close to 40% drop in RPS.
CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
all 1.95 0.00 7.67 0.00 14.84 0.00 0.00 0.00 0.00 75.55
all 1.61 0.00 7.91 0.00 13.53 0.05 0.00 0.00 0.00 76.90
-----------------------------------------------------------------------------
- To be sure, I tried on another system. That system had 30 cores.
base:
Wakeup Latencies percentiles (usec) runtime 30 (s) (40339785 total samples)
50.0th: 12 (12585268 samples)
90.0th: 24 (15194626 samples)
* 99.0th: 44 (3206872 samples)
99.9th: 59 (320508 samples)
min=1, max=1049
RPS percentiles (requests) runtime 30 (s) (31 total samples)
20.0th: 1320960 (14 samples)
* 50.0th: 1333248 (2 samples)
90.0th: 1386496 (12 samples)
min=1309615, max=1414281
base + clm's patch + series:
Wakeup Latencies percentiles (usec) runtime 30 (s) (34318584 total samples)
50.0th: 23 (10486283 samples)
90.0th: 64 (13436248 samples)
* 99.0th: 122 (3039318 samples)
99.9th: 166 (306231 samples)
min=1, max=7255
RPS percentiles (requests) runtime 30 (s) (31 total samples)
20.0th: 1006592 (8 samples)
* 50.0th: 1239040 (9 samples)
90.0th: 1259520 (11 samples)
min=852462, max=1268841
average rps: 1144229.23 << close to a 10-15% drop in RPS
- Then I resized that 30-core LPAR into a 5-core LPAR to see if the issue shows up in a smaller
config. It did: I see a similar regression, a 40-50% drop in RPS.
- Then I made it a 6-core system, to check whether the regression is due to any ping-pong caused by
the odd core count. The numbers are similar to the 5-core case.
- Maybe the regression is higher in smaller configurations.