Re: [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: Tom Gebhardt <tomge68@gmail.com>
To: Qais Yousef <qyousef@layalina.io>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time
Date: Fri, 29 May 2026 09:53:08 +0200	[thread overview]
Message-ID: <65ef80bbf22867d1ea59642a3e03e3ef.tomge68@gmail.com> (raw)
In-Reply-To: <20260529014333.w3tqvkirgo7jij6l@airbuntu>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 4176 bytes --]

On 05/29/26 02:43, Qais Yousef wrote:
> 7.0+ttwu+vincent is the best, right?

Yes.

> Have you verified your actual workload is seeing benefit? I think when
> I scanned the github bug you references the original report was observing
> a regression in some real setup, not this stressng tests.

I am the reporter. The original issue (#7308) was observed as a drop in
camera frame rate when running two parallel IMX477 streams via libcamera on
RPi5 under kernel 6.12+. The camera pipeline is pipe-IPC-heavy (GStreamer /
libcamera internal queues), so the regression surfaced there first in a real
workload. To isolate the cause I moved to synthetic pipe benchmarks
(stress-ng), which confirmed and quantified the regression cleanly.
A Raspberry Pi developer (popcornmix) also posted IPC benchmark results on
the issue, independently confirming the trend across kernel versions
(6.6=2065 > 6.18=1805 > 6.12=1662 > 7.0=1570 Kops/s).

The stress-ng pipe stressor is therefore not an artificial worst-case -- it
directly exercises the code path that causes the real-world camera regression.
That said, I agree stress-ng amplifies the effect, and I cannot give you an
exact frame-rate number yet with the ttwu+vincent patches applied.

> IPC drops 14% on 7.0 stock. Due to stalling you reckon?

Yes. The branch misprediction rate explains most of it. On Cortex-A76 a
branch mispredict costs ~13 cycles. Normalised by instruction count:

  Kernel             branch-miss rate   vs 6.6
  -----------------  -----------------  ------
  6.6.78             0.178%             ref
  7.0.0 stock        0.427%             +140%
  7.0.0+ttwu+vincent 0.271%             +52%

The raw counts I reported yesterday were misleading because the instruction
counts differ between kernels (different amounts of useful work). Apologies
for not normalising upfront. The rate tells a cleaner story: stock EEVDF
causes 2.4× more mispredictions per instruction than CFS, and ttwu+vincent
brings that down to 1.5× -- significant improvement but not full recovery.

> Do you have the full output? It would be interesting to use perf diff.

A proper perf diff with resolved kernel symbols requires running against the
matching kernel. I ran `perf report --no-children -s symbol` on each .data
file while booted on the corresponding kernel. Key findings:

7.0.0 stock (flat, self-overhead):

  12.98%  finish_task_switch.isra.0
          -> __schedule -> schedule
             -> anon_pipe_read   5.72%
             -> anon_pipe_write  1.38%

7.0.0+ttwu+vincent (flat, self-overhead):

  19.62%  finish_task_switch.isra.0
          -> __schedule -> schedule
             -> anon_pipe_read   8.22%
             -> anon_pipe_write  4.34%

The striking difference is in the pipe_write -> schedule() path: 1.38% on
stock vs 4.34% with ttwu+vincent. The ttwu patches make pipe writers yield
the CPU far more aggressively after each write, allowing the reader to run
immediately. Stock EEVDF leaves this to the scheduler's own timing, which
results in more latency and lower throughput.

The higher absolute percentage in finish_task_switch for vincent is expected:
vincent completes ~24% more pipe operations in the same wall time, so there
are proportionally more context switches completing.

On 6.6 (from the call-graph profile recorded separately), finish_task_switch
is not visible as a top-level hotspot at all -- consistent with CFS handling
this path much more efficiently.

> Maybe there's higher rq lock contention. But this finish_task_switch and
> __raw_spin_unlock_irqrestore are common to see, especially when there's
> high context switch rate.

Agreed -- I cannot rule out rq lock contention without perf diff with
matched build-IDs. The pattern I see (finish_task_switch dominant, driven
by pipe_read/write) is consistent with high context switch rate rather than
a pathological lock. But your point about a 'hot variable' like rq->clock
is noted -- I cannot confirm or deny that from flat profiles alone.

> I hope perfetto trace will help visualize the pattern that led to this
> higher context switching.

I will work on getting a perfetto trace. Expecting to have that in a
follow-up.

Tom

next prev parent reply	other threads:[~2026-05-29  7:53 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-04  1:59 [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time Qais Yousef
2026-05-04  1:59 ` [PATCH v2 01/13] sched: cpufreq: Rename map_util_perf to sugov_apply_dvfs_headroom Qais Yousef
2026-05-04  1:59 ` [PATCH v2 02/13] sched/pelt: Add a new function to approximate the future util_avg value Qais Yousef
2026-05-04  1:59 ` [PATCH v2 03/13] sched/pelt: Add a new function to approximate runtime to reach given util Qais Yousef
2026-06-04  8:50   ` Vincent Guittot
2026-05-04  1:59 ` [PATCH v2 04/13] sched/fair: Remove magic hardcoded margin in fits_capacity() Qais Yousef
2026-06-04 10:08   ` Vincent Guittot
2026-05-04  1:59 ` [PATCH v2 05/13] sched: cpufreq: Remove magic 1.25 headroom from sugov_apply_dvfs_headroom() Qais Yousef
2026-05-04  1:59 ` [PATCH v2 06/13] sched/fair: Extend util_est to improve rampup time Qais Yousef
2026-05-04  1:59 ` [PATCH v2 07/13] sched/fair: util_est: Take into account periodic tasks Qais Yousef
2026-05-04  1:59 ` [PATCH v2 RFC 08/13] sched/qos: Add a new sched-qos interface Qais Yousef
2026-05-06 20:38   ` Tim Chen
2026-05-07  9:55     ` Qais Yousef
2026-05-07 14:20       ` Chen, Yu C
2026-05-09  9:39         ` Qais Yousef
2026-05-11 10:57   ` Peter Zijlstra
2026-05-12  7:58     ` Qais Yousef
2026-05-12  8:30       ` Peter Zijlstra
2026-05-12  8:47         ` Qais Yousef
2026-05-19  9:47   ` Peter Zijlstra
2026-05-19 10:56     ` Qais Yousef
2026-05-04  1:59 ` [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS Qais Yousef
2026-05-11 11:03   ` Peter Zijlstra
2026-05-12  7:59     ` Qais Yousef
2026-05-12  8:37       ` Christian Loehle
2026-05-12  8:53         ` Qais Yousef
2026-05-04  2:00 ` [PATCH v2 10/13] sched/fair: Disable util_est when rampup_multiplier is 0 Qais Yousef
2026-05-04  2:00 ` [PATCH v2 11/13] sched/fair: Don't mess with util_avg post init Qais Yousef
2026-05-04  2:00 ` [PATCH v2 12/13] sched/fair: Call update_util_est() after dequeue_entities() Qais Yousef
2026-05-04  2:00 ` [PATCH v2 RFC 13/13] sched/pelt: Always allow load updates Qais Yousef
2026-05-11 17:58 ` [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time John Stultz
2026-05-12  8:01   ` Qais Yousef
2026-05-13 15:09 ` Tom Gebhardt
2026-05-15  1:42   ` Qais Yousef
2026-05-15  8:24     ` Tom Gebhardt
2026-05-15 10:01       ` Christian Loehle
2026-05-15 13:57         ` Tom Gebhardt
2026-05-16 13:43           ` Christian Loehle
2026-05-19  7:46             ` Tom Gebhardt
2026-05-25  7:25             ` Tom Gebhardt
2026-05-28  7:38               ` Christian Loehle
2026-05-28 11:08                 ` Tom Gebhardt
2026-05-16  3:01       ` Qais Yousef
2026-05-28 12:50         ` Tom Gebhardt
2026-05-29  1:43           ` Qais Yousef
2026-05-29  7:53             ` Tom Gebhardt [this message]
2026-05-31  2:35               ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=65ef80bbf22867d1ea59642a3e03e3ef.tomge68@gmail.com \
    --to=tomge68@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qyousef@layalina.io \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox