Re: [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS

From: Christian Loehle <christian.loehle@arm.com>
To: Qais Yousef <qyousef@layalina.io>, Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	John Stultz <jstultz@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	"Chen, Yu C" <yu.c.chen@intel.com>,
	Thomas Gleixner <tglx@kernel.org>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS
Date: Tue, 12 May 2026 09:37:32 +0100
Message-ID: <1745c99b-4e31-4d7e-8221-c775ed30436f@arm.com>
In-Reply-To: <20260512075953.uoicyuwwvqcejxpn@airbuntu>

On 5/12/26 08:59, Qais Yousef wrote:
> On 05/11/26 13:03, Peter Zijlstra wrote:
>> On Mon, May 04, 2026 at 02:59:59AM +0100, Qais Yousef wrote:
>>
>>> diff --git a/Documentation/scheduler/sched-qos.rst b/Documentation/scheduler/sched-qos.rst
>>> index 0911261cb124..f68856f23b6b 100644
>>> --- a/Documentation/scheduler/sched-qos.rst
>>> +++ b/Documentation/scheduler/sched-qos.rst
>>> @@ -42,3 +42,25 @@ need for extension will arise; and when this happen the task should be
>>>  simpler to add the kernel extension and allow userspace to use readily by
>>>  setting the newly added flag without having to update the whole of
>>>  sched_attr.
>>> +
>>> +2. QoS Tags
>>> +===========
>>> +
>>> +SCHED_QOS_RAMPUP_MULTIPLIER
>>> +---------------------------
>>> +
>>> +Controls how fast util signal rises. Affects frequency selection when schedutil
>>> +is in use. And affects how fast tasks migrate between clusters on HMP systems.
>>> +
>>> +It affects bursty tasks only. Perfectly periodic tasks are well described by
>>> +util_avg and the rampup multiplier will have no effect on them.
>>> +
>>> +When set to 0, util_est will be disabled to help further with power saving.
>>> +This behavior can be controlled via UTIL_EST_RAMPUP_ZERO sched_feature.
>>> +
>>> +Value is not capped to retain flexibility, but it tapers off very quickly to
>>> +notice a difference above 16. Roughly it takes ~200ms to reach a util_avg of
>>> +1000 starting from 0. With 16 it should take ~12.5ms. A range of 0-8 is
>>> +advised for general use.
>>> +
>>> +Cookie must always be set to 0.
>>
>> So this is a very specific feature. This is made possible by basically
>> having a huge type space, allowing for throw-away hints (as per the
>> previous email).
> 
> Hmm. It is both specific and generic. It is specific in the sense that it is
> about the rise time through performance levels and scheduler integration with
> schedutil. It is also generic because it is about the time it takes the
> scheduler/kernel to move through performance levels. I could change the
> description to focus on these generic elements of DVFS response time and
> migration time for HMP systems.
> 
> I think if we move away from PELT etc, the concept will still be valid but
> implemented differently unless the new implementation can't use the concept of
> a multiplier for some reason to speed up the rise time.
> 
>>
>> I suppose having these specific hints is easy, but as per always there
>> is the discussion about describing task behaviour vs implementation
>> details. With the argument being that task behaviour might be a more
>> lasting / stable hint, while implementation details are far easier to
>> actually do.
>>
>> I'm missing this discussion.
> 
> The intention is to describe task behavior. But also to be practical and
> allow solving real world problems with ease - so if an implementation detail
> description will help us fix problems simply and easily, then I am for it.
> 
> The question is how to protect ourselves? :-)
> 
> This is where the two levels of QoS can help.
> 
> One level is for app developers, which is high level abstraction that is
> detached from OS internals and details. This is done in schedqos I announced
> recently. The goal is for users to use the QoS exposed by this service and not
> to interact directly with scheduler/kernel.
> 
> The other level is this one proposed here; which is to enable this smart
> service to provide a meaningful abstraction for end users, but not directly
> being used by them - and we can define it whatever we like.
> 
> And this brings us to a contentious point, how to protect and enforce this
> behavior?
> 
> I think we need to enforce that these hints are used by some all-knowing entity
> and that sched_attr is locked down for everyone except it. Vincent was
> suggesting to use SELinux to lock down sched_attr, but given recent issues with
> tcmalloc I think we must enforce something at kernel level. CAP_NICE is spread
> around and we don't want to mix and match how sched_attr and these new QoS are
> used.
> 
> To address this I think we need to introduce a new CAP_PERF_MANAGER (or pick
> your favourite name here) that can only be set for specific binaries and only
> one binary is allowed to exec with this capability. If two binaries with this
> capability try to run, then the second one will fail unless the first one has
> exited first. And when it is running, we lock down sched_setattr() except for
> this CAP_PERF_MANAGER.
> 
> I am not sure if this is enough, but I think we must enforce the usage pattern
> else we can end up with a mess. I think we all agree, with the benefit of
> hindsight, that it is hard for applications to use sched_attr directly. I
> commonly see the simple nice value misused in practice, for example.
> 
> Ideally I'd love to enforce a single trusted binary if that can be done :p
> 


Just to follow along: does that mean that while an application is running with
CAP_PERF_MANAGER, any other application that doesn't have CAP_PERF_MANAGER and
calls any of
sched_setattr()
sched_setscheduler()
sched_setparam()
nice()
setpriority()

would get EPERM? Or would the call be silently dropped?
Either seems error-prone and would potentially no longer work as a "Zero API
adoption mechanism".
Chromium and Unity seem to handle sched_setattr() failing, but I'm unsure what
the situation looks like generally.
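
To illustrate why the two outcomes differ from the application's side, here is a
rough Python sketch. It is not from the series and assumes nothing about how
CAP_PERF_MANAGER would actually be implemented; it only uses today's
setpriority()/getpriority() as exposed by Python's os module. The helper name
set_nice_checked() and its return strings are made up for illustration.

```python
import os


def set_nice_checked(target):
    """Try to set this process's nice value and report what happened.

    Hypothetical helper: the three return strings are illustrative only.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, 0, target)
    except PermissionError:
        # Refused call (EPERM or EACCES): the caller sees the failure
        # immediately and can fall back or log it.
        return "denied"

    # A silently dropped call would also return success, so the only way
    # to notice it is to read the value back and compare.
    actual = os.getpriority(os.PRIO_PROCESS, 0)
    return "applied" if actual == target else "dropped"
```

With an error return, existing error handling in the application is at least
exercised. With a silent drop, only applications that read the value back, as
above, would ever notice that their request had no effect.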
