Re: [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS

From: Qais Yousef <qyousef@layalina.io>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	John Stultz <jstultz@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	"Chen, Yu C" <yu.c.chen@intel.com>,
	Thomas Gleixner <tglx@kernel.org>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS
Date: Tue, 12 May 2026 08:59:53 +0100
Message-ID: <20260512075953.uoicyuwwvqcejxpn@airbuntu>
In-Reply-To: <20260511110328.GS3126523@noisy.programming.kicks-ass.net>

On 05/11/26 13:03, Peter Zijlstra wrote:
> On Mon, May 04, 2026 at 02:59:59AM +0100, Qais Yousef wrote:
> 
> > diff --git a/Documentation/scheduler/sched-qos.rst b/Documentation/scheduler/sched-qos.rst
> > index 0911261cb124..f68856f23b6b 100644
> > --- a/Documentation/scheduler/sched-qos.rst
> > +++ b/Documentation/scheduler/sched-qos.rst
> > @@ -42,3 +42,25 @@ need for extension will arise; and when this happen the task should be
> >  simpler to add the kernel extension and allow userspace to use readily by
> >  setting the newly added flag without having to update the whole of
> >  sched_attr.
> > +
> > +2. QoS Tags
> > +===========
> > +
> > +SCHED_QOS_RAMPUP_MULTIPLIER
> > +---------------------------
> > +
> > +Controls how fast util signal rises. Affects frequency selection when schedutil
> > +is in use. And affects how fast tasks migrate between clusters on HMP systems.
> > +
> > +It affects bursty tasks only. Perfectly periodic tasks are well described by
> > +util_avg and the rampup multiplier will have no effect on them.
> > +
> > +When set to 0, util_est will be disabled to help further with power saving.
> > +This behavior can be controlled via UTIL_EST_RAMPUP_ZERO sched_feature.
> > +
> > +Value is not capped to retain flexibility, but it tapers off very quickly to
> > +notice a difference above 16. Roughly it takes ~200ms to reach a util_avg of
> > +1000 starting from 0. With 16 it should take ~12.5ms. A range of 0-8 is
> > +advised for general use.
> > +
> > +Cookie must always be set to 0.
> 
> So this is a very specific feature. This is made possible by basically
> having a huge type space, allowing for throw-away hints (as per the
> previous email).

Hmm. It is both specific and generic. It is specific in the sense that it is
about the rise time through performance levels and the scheduler's integration
with schedutil. It is also generic because it is about the time it takes the
scheduler/kernel to move through performance levels. I could change the
description to focus on these generic elements: DVFS response time and
migration time on HMP systems.

I think if we move away from PELT etc., the concept will still be valid but
implemented differently, unless the new implementation for some reason can't
use a multiplier to speed up the rise time.
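
As a rough illustration of the numbers in the documentation hunk quoted above,
here is a minimal sketch. It is not the code from this series. It assumes the
textbook PELT model (1 ms periods, a decay factor y with y^32 = 0.5, a maximum
of 1024) and it assumes the multiplier simply makes time advance that many
times faster for the signal. rampup_time_ms() and the two constants are
hypothetical names used only for this example.

```c
#include <assert.h>

/* Assumed PELT decay per 1 ms period: PELT_Y^32 == 0.5 */
#define PELT_Y   0.97857206208770013
#define PELT_MAX 1024.0

/*
 * Milliseconds of continuous running needed for util to climb from 0 to
 * at least `target`. Assumption: a multiplier of N lets the signal
 * accumulate N periods per millisecond of wall-clock time.
 */
static double rampup_time_ms(double target, unsigned int multiplier)
{
	double util = 0.0;
	unsigned long periods = 0;

	assert(multiplier > 0);
	assert(target >= 0.0 && target < PELT_MAX);

	while (util < target) {
		/* One fully busy period: decay the history, add the new period. */
		util = util * PELT_Y + PELT_MAX * (1.0 - PELT_Y);
		periods++;
	}

	return (double)periods / multiplier;
}
```

Under these assumptions rampup_time_ms(1000.0, 1) comes to 174 ms and
rampup_time_ms(1000.0, 16) to 174/16, about 10.9 ms. That is the same ballpark
as the rounded ~200ms and ~12.5ms in the documentation, and it shows the same
1/16 scaling.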

> 
> I suppose having these specific hints is easy, but as per always there
> is the discussion about describing task behaviour vs implementation
> details. With the argument being that task behaviour might be a more
> lasting / stable hint, while implementation details are far easier to
> actually do.
> 
> I'm missing this discussion.

The intention is to describe task behavior, but also to be practical and allow
us to solve real-world problems with ease. So if describing an implementation
detail helps us fix problems simply and easily, then I am for it.

The question is how to protect ourselves? :-)

This is where the two levels of QoS can help.

One level is for app developers: a high-level abstraction that is detached
from OS internals and details. This is done in schedqos, which I announced
recently. The goal is for users to use the QoS exposed by this service and not
to interact directly with the scheduler/kernel.

The other level is the one proposed here, which enables this smart service to
provide a meaningful abstraction for end users without being used by them
directly. We can define it however we like.

And this brings us to a contentious point: how to protect and enforce this
behavior?

I think we need to enforce that these hints are used by some all-knowing entity
and for sched_attr to be locked down for everyone except it. Vincent was
suggesting to use SELinux to lock down sched_attr, but given recent issues with
tcmalloc I think we must enforce something at kernel level. CAP_NICE is spread
around and we don't want to mix and match how sched_attr and these new QoS are
used.

To address this I think we need to introduce a new CAP_PERF_MANAGER (or pick
your favourite name here) that can only be set for specific binaries and only
one binary is allowed to exec with this capability. If two binaries with this
capability try to run, then the second one will fail unless the first one has
exited first. And when it is running, we lock down sched_setattr() except for
this CAP_PERF_MANAGER.
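
The semantics I have in mind can be sketched as a small userspace model. This
is purely illustrative: CAP_PERF_MANAGER does not exist, and struct
perf_manager_slot and the three helpers below are hypothetical names, not
kernel APIs or a proposed implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical system-wide state: who currently holds the capability. */
struct perf_manager_slot {
	pid_t holder;	/* 0 when no manager is running */
};

/* Models exec of a binary carrying the capability: one holder at a time. */
static bool perf_manager_claim(struct perf_manager_slot *slot, pid_t pid)
{
	assert(pid > 0);

	if (slot->holder != 0 && slot->holder != pid)
		return false;	/* a manager is already running */

	slot->holder = pid;
	return true;
}

/* Models the manager exiting, which frees the slot for another binary. */
static void perf_manager_exit(struct perf_manager_slot *slot, pid_t pid)
{
	if (slot->holder == pid)
		slot->holder = 0;
}

/* While a manager runs, only the manager may call sched_setattr(). */
static bool may_sched_setattr(const struct perf_manager_slot *slot,
			      pid_t caller)
{
	return slot->holder == 0 || slot->holder == caller;
}
```

In this model a second claim fails until the first holder has exited, and
everyone else is refused sched_setattr() only while a manager is running.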

I am not sure if this is enough, but I think we must enforce the usage pattern,
else we can end up with a mess. I think we all agree, with the benefit of
hindsight, that it is hard for applications to use sched_attr directly in
general. For example, I commonly see the simple nice value misused in
practice.

Ideally I'd love to enforce a single trusted binary if that can be done :p

Thread overview: 28+ messages
2026-05-04  1:59 [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time Qais Yousef
2026-05-04  1:59 ` [PATCH v2 01/13] sched: cpufreq: Rename map_util_perf to sugov_apply_dvfs_headroom Qais Yousef
2026-05-04  1:59 ` [PATCH v2 02/13] sched/pelt: Add a new function to approximate the future util_avg value Qais Yousef
2026-05-04  1:59 ` [PATCH v2 03/13] sched/pelt: Add a new function to approximate runtime to reach given util Qais Yousef
2026-05-04  1:59 ` [PATCH v2 04/13] sched/fair: Remove magic hardcoded margin in fits_capacity() Qais Yousef
2026-05-04  1:59 ` [PATCH v2 05/13] sched: cpufreq: Remove magic 1.25 headroom from sugov_apply_dvfs_headroom() Qais Yousef
2026-05-04  1:59 ` [PATCH v2 06/13] sched/fair: Extend util_est to improve rampup time Qais Yousef
2026-05-04  1:59 ` [PATCH v2 07/13] sched/fair: util_est: Take into account periodic tasks Qais Yousef
2026-05-04  1:59 ` [PATCH v2 RFC 08/13] sched/qos: Add a new sched-qos interface Qais Yousef
2026-05-06 20:38   ` Tim Chen
2026-05-07  9:55     ` Qais Yousef
2026-05-07 14:20       ` Chen, Yu C
2026-05-09  9:39         ` Qais Yousef
2026-05-11 10:57   ` Peter Zijlstra
2026-05-12  7:58     ` Qais Yousef
2026-05-12  8:30       ` Peter Zijlstra
2026-05-12  8:47         ` Qais Yousef
2026-05-04  1:59 ` [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS Qais Yousef
2026-05-11 11:03   ` Peter Zijlstra
2026-05-12  7:59     ` Qais Yousef [this message]
2026-05-12  8:37       ` Christian Loehle
2026-05-12  8:53         ` Qais Yousef
2026-05-04  2:00 ` [PATCH v2 10/13] sched/fair: Disable util_est when rampup_multiplier is 0 Qais Yousef
2026-05-04  2:00 ` [PATCH v2 11/13] sched/fair: Don't mess with util_avg post init Qais Yousef
2026-05-04  2:00 ` [PATCH v2 12/13] sched/fair: Call update_util_est() after dequeue_entities() Qais Yousef
2026-05-04  2:00 ` [PATCH v2 RFC 13/13] sched/pelt: Always allow load updates Qais Yousef
2026-05-11 17:58 ` [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time John Stultz
2026-05-12  8:01   ` Qais Yousef
