From: Christian Loehle <christian.loehle@arm.com>
To: Qais Yousef <qyousef@layalina.io>, Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
John Stultz <jstultz@google.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
"Chen, Yu C" <yu.c.chen@intel.com>,
Thomas Gleixner <tglx@kernel.org>,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS
Date: Tue, 12 May 2026 09:37:32 +0100 [thread overview]
Message-ID: <1745c99b-4e31-4d7e-8221-c775ed30436f@arm.com> (raw)
In-Reply-To: <20260512075953.uoicyuwwvqcejxpn@airbuntu>
On 5/12/26 08:59, Qais Yousef wrote:
> On 05/11/26 13:03, Peter Zijlstra wrote:
>> On Mon, May 04, 2026 at 02:59:59AM +0100, Qais Yousef wrote:
>>
>>> diff --git a/Documentation/scheduler/sched-qos.rst b/Documentation/scheduler/sched-qos.rst
>>> index 0911261cb124..f68856f23b6b 100644
>>> --- a/Documentation/scheduler/sched-qos.rst
>>> +++ b/Documentation/scheduler/sched-qos.rst
>>> @@ -42,3 +42,25 @@ need for extension will arise; and when this happen the task should be
>>> simpler to add the kernel extension and allow userspace to use readily by
>>> setting the newly added flag without having to update the whole of
>>> sched_attr.
>>> +
>>> +2. QoS Tags
>>> +===========
>>> +
>>> +SCHED_QOS_RAMPUP_MULTIPLIER
>>> +---------------------------
>>> +
>>> +Controls how fast util signal rises. Affects frequency selection when schedutil
>>> +is in use. And affects how fast tasks migrate between clusters on HMP systems.
>>> +
>>> +It affects bursty tasks only. Perfectly periodic tasks are well described by
>>> +util_avg and the rampup multiplier will have no effect on them.
>>> +
>>> +When set to 0, util_est will be disabled to help further with power saving.
>>> +This behavior can be controlled via UTIL_EST_RAMPUP_ZERO sched_feature.
>>> +
>>> +Value is not capped to retain flexibility, but it tapers off very quickly to
>>> +notice a difference above 16. Roughly it takes ~200ms to reach a util_avg of
>>> +1000 starting from 0. With 16 it should take ~12.5ms. A range of 0-8 is
>>> +advised for general use.
>>> +
>>> +Cookie must always be set to 0.
>>
>> So this is a very specific feature. This is made possible by basically
>> having a huge type space, allowing for throw-away hints (as per the
>> previous email).
>
> Hmm. It is both specific and generic. It is specific in the sense that it is
> about the rise time through performance levels and the scheduler's integration
> with schedutil. It is generic in that it is about the time it takes the
> scheduler/kernel to move through performance levels. I could change the
> description to focus on these generic elements of DVFS response time and
> migration time for HMP systems.
>
> I think if we move away from PELT etc, the concept will still be valid but
> implemented differently, unless the new implementation can't use the concept
> of a multiplier for some reason to speed up the rise time.
>
>>
>> I suppose having these specific hints is easy, but as per always there
>> is the discussion about describing task behaviour vs implementation
>> details. With the argument being that task behaviour might be a more
>> lasting / stable hint, while implementation details are far easier to
>> actually do.
>>
>> I'm missing this discussion.
>
> The intention is to describe task behavior. But we should be practical as well
> and allow solving real-world problems with ease - so if an implementation-detail
> description will help us fix problems simply and easily, then I am for it.
>
> The question is how to protect ourselves? :-)
>
> This is where the two levels of QoS can help.
>
> One level is for app developers, which is high level abstraction that is
> detached from OS internals and details. This is done in schedqos I announced
> recently. The goal is for users to use the QoS exposed by this service and not
> to interact directly with scheduler/kernel.
>
> The other level is this one proposed here; which is to enable this smart
> service to provide a meaningful abstraction for end users, but not directly
> being used by them - and we can define it whatever we like.
>
> And this brings us to a contentious point, how to protect and enforce this
> behavior?
>
> I think we need to enforce that these hints are used by some all-knowing entity
> and for sched_attr to be locked down for everyone except it. Vincent was
> suggesting to use SELinux to lock down sched_attrs, but given recent issues with
> tcmalloc I think we must enforce something at the kernel level. CAP_SYS_NICE is
> spread around and we don't want to mix and match how sched_attr and these new
> QoS are used.
>
> To address this I think we need to introduce a new CAP_PERF_MANAGER (or pick
> your favourite name here) that can only be set on specific binaries, and only
> one binary is allowed to exec with this capability. If two binaries with this
> capability try to run, the second one will fail unless the first one has
> exited. And while it is running, we lock down sched_setattr() for everyone
> except this CAP_PERF_MANAGER.
>
> I am not sure if this is enough, but I think we must enforce the usage pattern,
> else we can end up with a mess. I think we all agree it is hard for
> applications to use sched_attr directly in general, given the benefit of
> hindsight. I commonly see even the simple nice value misused in practice, for
> example.
>
> Ideally I'd love to enforce a single trusted binary if that can be done :p
>
Just to follow along: does that mean that if an application runs with
CAP_PERF_MANAGER, any other application that doesn't have CAP_PERF_MANAGER and
calls any of
sched_setattr()
sched_setscheduler()
sched_setparam()
nice()
setpriority()
would get EPERM? Or would the call silently be dropped?
Either seems error-prone, and it would potentially no longer work as a "zero
API" adoption mechanism.
Chromium and Unity seem to handle sched_setattr() failing, but I'm unsure what
the situation looks like more generally.
Thread overview: 28+ messages
2026-05-04 1:59 [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time Qais Yousef
2026-05-04 1:59 ` [PATCH v2 01/13] sched: cpufreq: Rename map_util_perf to sugov_apply_dvfs_headroom Qais Yousef
2026-05-04 1:59 ` [PATCH v2 02/13] sched/pelt: Add a new function to approximate the future util_avg value Qais Yousef
2026-05-04 1:59 ` [PATCH v2 03/13] sched/pelt: Add a new function to approximate runtime to reach given util Qais Yousef
2026-05-04 1:59 ` [PATCH v2 04/13] sched/fair: Remove magic hardcoded margin in fits_capacity() Qais Yousef
2026-05-04 1:59 ` [PATCH v2 05/13] sched: cpufreq: Remove magic 1.25 headroom from sugov_apply_dvfs_headroom() Qais Yousef
2026-05-04 1:59 ` [PATCH v2 06/13] sched/fair: Extend util_est to improve rampup time Qais Yousef
2026-05-04 1:59 ` [PATCH v2 07/13] sched/fair: util_est: Take into account periodic tasks Qais Yousef
2026-05-04 1:59 ` [PATCH v2 RFC 08/13] sched/qos: Add a new sched-qos interface Qais Yousef
2026-05-06 20:38 ` Tim Chen
2026-05-07 9:55 ` Qais Yousef
2026-05-07 14:20 ` Chen, Yu C
2026-05-09 9:39 ` Qais Yousef
2026-05-11 10:57 ` Peter Zijlstra
2026-05-12 7:58 ` Qais Yousef
2026-05-12 8:30 ` Peter Zijlstra
2026-05-12 8:47 ` Qais Yousef
2026-05-04 1:59 ` [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS Qais Yousef
2026-05-11 11:03 ` Peter Zijlstra
2026-05-12 7:59 ` Qais Yousef
2026-05-12 8:37 ` Christian Loehle [this message]
2026-05-12 8:53 ` Qais Yousef
2026-05-04 2:00 ` [PATCH v2 10/13] sched/fair: Disable util_est when rampup_multiplier is 0 Qais Yousef
2026-05-04 2:00 ` [PATCH v2 11/13] sched/fair: Don't mess with util_avg post init Qais Yousef
2026-05-04 2:00 ` [PATCH v2 12/13] sched/fair: Call update_util_est() after dequeue_entities() Qais Yousef
2026-05-04 2:00 ` [PATCH v2 RFC 13/13] sched/pelt: Always allow load updates Qais Yousef
2026-05-11 17:58 ` [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time John Stultz
2026-05-12 8:01 ` Qais Yousef