From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
clm@meta.com
Subject: Re: [PATCH v2 00/12] sched: Address schbench regression
Date: Mon, 14 Jul 2025 23:24:36 +0530 [thread overview]
Message-ID: <c6c0c135-9d8f-4d9d-8fc5-bc703cac9bdb@linux.ibm.com> (raw)
In-Reply-To: <20250708190201.GE477119@noisy.programming.kicks-ass.net>
On 7/9/25 00:32, Peter Zijlstra wrote:
> On Mon, Jul 07, 2025 at 11:49:17PM +0530, Shrikanth Hegde wrote:
>
>> Git bisect points to
>> # first bad commit: [dc968ba0544889883d0912360dd72d90f674c140] sched: Add ttwu_queue support for delayed tasks
>
> Moo.. Are IPIs particularly expensive on your platform?
>
> The 5 cores makes me think this is a partition of sorts, but IIRC the
> power LPAR stuff was fixed physical, so routing interrupts shouldn't be
> much more expensive vs native hardware.
>
Some more data from the regression. I am looking at RPS numbers
while running ./schbench -L -m 4 -M auto -t 64 -n 0 -r 5 -i 5.
All the data is from an LPAR (VM) with 5 cores.
echo TTWU_QUEUE_DELAYED > features
average rps: 970491.00
echo NO_TTWU_QUEUE_DELAYED > features
current rps: 1555456.78
So the data points below were collected with the feature enabled or disabled, with the series applied plus clm's patch.
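For reproducibility, the toggle-and-measure loop can be sketched as below. The debugfs location of the sched features file is an assumption about this kernel config, and the schbench line is the one from above, left commented so the loop is harmless where schbench is not installed:

```shell
#!/bin/sh
# A/B loop over the sched feature under test. Writing the feature name
# (or its NO_ prefix) to the features file flips it at runtime.
FEAT=/sys/kernel/debug/sched/features
for f in TTWU_QUEUE_DELAYED NO_TTWU_QUEUE_DELAYED; do
    [ -w "$FEAT" ] && echo "$f" > "$FEAT"
    echo "== $f =="
    # ./schbench -L -m 4 -M auto -t 64 -n 0 -r 5 -i 5
done
```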
-------------------------------------------------------
./hardirqs
TTWU_QUEUE_DELAYED
HARDIRQ TOTAL_usecs
env2 816
IPI-2 1421603 << far fewer IPIs than with the feature disabled
NO_TTWU_QUEUE_DELAYED
HARDIRQ TOTAL_usecs
ibmvscsi 8
env2 266
IPI-2 6489980
-------------------------------------------------------
Disabled all the idle states. The regression still exists.
-------------------------------------------------------
I see this warning every time I run schbench; it happens with PATCH 12/12 only.
Something is triggering the check below. Is some clock update getting missed?
1637 static inline void assert_clock_updated(struct rq *rq)
1638 {
1639 /*
1640 * The only reason for not seeing a clock update since the
1641 * last rq_pin_lock() is if we're currently skipping updates.
1642 */
1643 WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
1644 }
WARNING: kernel/sched/sched.h:1643 at update_load_avg+0x424/0x48c, CPU#6: swapper/6/0
CPU: 6 UID: 0 PID: 0 Comm: swapper/6 Kdump: loaded Not tainted 6.16.0-rc4+ #276 PREEMPT(voluntary)
NIP: c0000000001cea60 LR: c0000000001d7254 CTR: c0000000001d77b0
REGS: c000000003a674c0 TRAP: 0700 Not tainted (6.16.0-rc4+)
MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28008208 XER: 20040000
CFAR: c0000000001ce68c IRQMASK: 3
GPR00: c0000000001d7254 c000000003a67760 c000000001bc8100 c000000061915400
GPR04: c00000008c80f480 0000000000000005 c000000003a679b0 0000000000000000
GPR08: 0000000000000001 0000000000000000 c0000003ff14d480 0000000000004000
GPR12: c0000000001d77b0 c0000003ffff7880 0000000000000000 000000002eef18c0
GPR16: 0000000000000006 0000000000000006 0000000000000008 c000000002ca2468
GPR20: 0000000000000000 0000000000000004 0000000000000009 0000000000000001
GPR24: 0000000000000000 0000000000000001 0000000000000001 c0000003ff14d480
GPR28: 0000000000000001 0000000000000005 c00000008c80f480 c000000061915400
NIP [c0000000001cea60] update_load_avg+0x424/0x48c
LR [c0000000001d7254] enqueue_entity+0x5c/0x5b8
Call Trace:
[c000000003a67760] [c000000003a677d0] 0xc000000003a677d0 (unreliable)
[c000000003a677d0] [c0000000001d7254] enqueue_entity+0x5c/0x5b8
[c000000003a67880] [c0000000001d7918] enqueue_task_fair+0x168/0x7d8
[c000000003a678f0] [c0000000001b9554] enqueue_task+0x5c/0x1c8
[c000000003a67930] [c0000000001c3f40] ttwu_do_activate+0x98/0x2fc
[c000000003a67980] [c0000000001c4460] sched_ttwu_pending+0x2bc/0x72c
[c000000003a67a60] [c0000000002c16ac] __flush_smp_call_function_queue+0x1a0/0x750
[c000000003a67b10] [c00000000005e1c4] smp_ipi_demux_relaxed+0xec/0xf4
[c000000003a67b50] [c000000000057dd4] doorbell_exception+0xe0/0x25c
[c000000003a67b90] [c0000000000383d0] __replay_soft_interrupts+0xf0/0x154
[c000000003a67d40] [c000000000038684] arch_local_irq_restore.part.0+0x1cc/0x214
[c000000003a67d90] [c0000000001b6ec8] finish_task_switch.isra.0+0xb4/0x2f8
[c000000003a67e30] [c00000000110fb9c] __schedule+0x294/0x83c
[c000000003a67ee0] [c0000000011105f0] schedule_idle+0x3c/0x64
[c000000003a67f10] [c0000000001f27f0] do_idle+0x15c/0x1ac
[c000000003a67f60] [c0000000001f2b08] cpu_startup_entry+0x4c/0x50
[c000000003a67f90] [c00000000005ede0] start_secondary+0x284/0x288
[c000000003a67fe0] [c00000000000e058] start_secondary_prolog+0x10/0x14
----------------------------------------------------------------
perf stat -a (idle states enabled):
TTWU_QUEUE_DELAYED:
13,612,930 context-switches # 0.000 /sec
912,737 cpu-migrations # 0.000 /sec
1,245 page-faults # 0.000 /sec
449,817,741,085 cycles
137,051,199,092 instructions # 0.30 insn per cycle
25,789,965,217 branches # 0.000 /sec
286,202,628 branch-misses # 1.11% of all branches
NO_TTWU_QUEUE_DELAYED:
24,782,786 context-switches # 0.000 /sec
4,697,384 cpu-migrations # 0.000 /sec
1,250 page-faults # 0.000 /sec
701,934,506,023 cycles
220,728,025,829 instructions # 0.31 insn per cycle
40,271,327,989 branches # 0.000 /sec
474,496,395 branch-misses # 1.18% of all branches
Both cycles and instructions are noticeably lower with the feature enabled.
-------------------------------------------------------------------
perf stat -a (idle states disabled):
TTWU_QUEUE_DELAYED:
15,402,193 context-switches # 0.000 /sec
1,237,128 cpu-migrations # 0.000 /sec
1,245 page-faults # 0.000 /sec
781,215,992,865 cycles
149,112,303,840 instructions # 0.19 insn per cycle
28,240,010,182 branches # 0.000 /sec
294,485,795 branch-misses # 1.04% of all branches
NO_TTWU_QUEUE_DELAYED:
25,332,898 context-switches # 0.000 /sec
4,756,682 cpu-migrations # 0.000 /sec
1,256 page-faults # 0.000 /sec
781,318,730,494 cycles
220,536,732,094 instructions # 0.28 insn per cycle
40,424,495,545 branches # 0.000 /sec
446,724,952 branch-misses # 1.11% of all branches
Since idle states are disabled, the CPUs are always burning cycles, so the cycle counts are more or less the same while the instruction counts differ. Does that mean that with the feature enabled some lock (maybe the rq lock) is being held for too long?
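To put a rough number on that, dividing the idle-states-disabled cycle counts above by the context-switch counts (plain arithmetic on the quoted perf stat output, not proof of a lock problem):

```shell
#!/bin/sh
# Cycles per context switch from the idle-states-disabled runs above.
# Cycles are nearly identical while context switches roughly halve with
# the feature, so cycles per switch roughly doubles -- consistent with
# more time per wakeup, whatever its cause.
cpcs_on=$(awk 'BEGIN { printf "%.0f", 781215992865 / 15402193 }')
cpcs_off=$(awk 'BEGIN { printf "%.0f", 781318730494 / 25332898 }')
echo "cycles/ctx-switch, TTWU_QUEUE_DELAYED:    $cpcs_on"
echo "cycles/ctx-switch, NO_TTWU_QUEUE_DELAYED: $cpcs_off"
```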
--------------------------------------------------------------------
Will try to gather more data on why this is happening.