From: Peter Zijlstra <peterz@infradead.org>
To: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@kernel.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
John Stultz <jstultz@google.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
"Chen, Yu C" <yu.c.chen@intel.com>,
Thomas Gleixner <tglx@kernel.org>,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: [PATCH v2 RFC 08/13] sched/qos: Add a new sched-qos interface
Date: Mon, 11 May 2026 12:57:04 +0200
Message-ID: <20260511105704.GR3126523@noisy.programming.kicks-ass.net>
In-Reply-To: <20260504020003.71306-9-qyousef@layalina.io>

On Mon, May 04, 2026 at 02:59:58AM +0100, Qais Yousef wrote:
> Provide a generic and extensible interface to describe arbitrary QoS
> tags to tell the kernel about specific behavior that doesn't fall
> into the existing sched_attr.
>
> The interface is broken into three parts:
>
> * Type
> * Value
> * Cookie
>
> Type is an enum that should give us enough space to extend (and
> deprecate) comfortably.
>
> Value is a signed 64-bit number to allow for arbitrarily high values.
>
> Cookie helps group tasks selectively, since some QoS hints might want
> to operate on tasks per group. A value of 0 indicates system wide.
>
> There are two anticipated users being discussed on the list.
>
> 1. Per task rampup multiplier to allow controlling how fast util rises,
> and by implication how fast the task can migrate between cores on HMP
> systems and cause freqs to rise with schedutil.
>
> 2. Tag a group of tasks that are memory dependent for Cache Aware
> Scheduling.
>
> The interface is anticipated to be provisioned to apps via utilities and
> libraries. schedqos [1] is an example of how such an interface can be
> used to provide a higher level QoS abstraction to describe workloads
> without baking it into the binaries, and by implication without worrying
> about potential abuse. The interface requires privileged access since
> QoS is considered a scarce resource and requires admin control to ensure
> it is set properly. Again, that admin control is anticipated to be the
> schedqos utility service.
>
> QoS is treated as a scarce resource and the intention is for a syscall
> to be done for each individual QoS tag. QoS tags are not inherited on
> fork by default, for the same reason.
>
> A reasonable point of debate is whether to make the sched_qos an array
> of 3 or 5 values to avoid a potential bottleneck if this grows large and
> users end up having to issue too many syscalls to set all QoS. Being
> limited as it is now helps enforce intentionality and scarcity of
> tagging.
>
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=============
> +Scheduler QoS
> +=============
> +
> +1. Introduction
> +===============
> +
> +Different workloads have different scheduling requirements to operate
> +optimally. The same applies to tasks within the same workload.
> +
> +To enable smarter usage of system resources and to cater for the conflicting
> +demands of various tasks, Scheduler QoS provides a mechanism to pass more
> +information about those demands so that the scheduler can make a best
> +effort to honour them.
> +
> + @sched_qos_type what QoS hint to apply
> + @sched_qos_value value of the QoS hint
> + @sched_qos_cookie magic cookie to tag a group of tasks for which the QoS
> + applies. If 0, the hint will apply globally system
> + wide. If not 0, the hint will be relative to tasks that
> + have the same cookie value only.
> +
> +QoS hints are set once and not inherited by children by design. The
> +rationale is that each task has its individual characteristics and it is
> +encouraged to describe each of these separately. Also, since system
> +resources are finite, there's a limit to what can be done to honour these
> +requests before reaching a tipping point where there are too many requests
> +for a particular QoS to service all of them at once and some will start to
> +lose out. For example, if 10 tasks require better wake up latencies on
> +a 4-CPU SMP system and they all wake up at once, only 4 can perceive the
> +hint honoured and the rest will have to wait. Inheritance can easily turn
> +these 10 into 100 or 1000, and then the QoS hint will lose its meaning and
> +effectiveness rapidly. The chance of 10 tasks waking up at the same time
> +is lower than that of 100 or 1000.
> +
> +To set multiple QoS hints, a syscall is required for each. This is a
> +trade-off to reduce the churn on extending the interface. The hope is for
> +this to evolve as workloads and hardware get more sophisticated and the
> +need for extension arises; when this happens, it should be simpler to add
> +the kernel extension and allow userspace to use it readily by setting the
> +newly added flag without having to update the whole of sched_attr.

So 'type' is effectively meant to be an ephemeral space of hints. A
kernel can, or can not, support this arbitrary set of hints.

If a particular type is supported across two kernels, it is assumed to
be the same -- although its implementation might be different.

Your next patch implements type-0 to be this pelt multiplier thing.

I wonder about discoverability, suppose we create and discard a fair
number of these types, just because. Then how is someone (this
muddle-ware component for example) to discover which set of hints is
supported by the kernel of the day?

I suppose it can go and scan the space, by trying to set hints on itself
or something, but that seems sub-optimal.
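
For illustration only, the "scan the space" approach could look roughly
like the sketch below. Nothing here is the API from the patch: struct
qos_hint, qos_set_fn, qos_probe_types() and mock_set_hint() are all
hypothetical names, and the setter is a stand-in callback rather than
the real syscall. The mock assumes an unsupported type is rejected with
-EINVAL; type 0 being supported follows the mention of type-0 above,
while type 3 is made up purely to exercise the loop.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the three-part hint: type, value, cookie. */
struct qos_hint {
	uint32_t type;   /* which QoS hint to apply */
	int64_t value;   /* signed 64-bit value of the hint */
	uint64_t cookie; /* 0 = system wide, otherwise a group tag */
};

/*
 * Hypothetical setter: returns 0 on success or a negative errno.
 * In real use this would wrap whatever syscall ends up carrying the
 * hint; it is a callback here so the probe logic can be exercised
 * without kernel support.
 */
typedef int (*qos_set_fn)(const struct qos_hint *hint);

#define QOS_PROBE_MAX 64u /* the bitmap below holds at most 64 types */

/*
 * Try every type in [0, max_type) on the calling task and return a
 * bitmap with bit N set if type N was accepted.
 *
 * Caveats that make this sub-optimal: one call per type, a failure
 * does not distinguish "type unknown" from "value 0 invalid for this
 * type", and every successful probe really does change the caller's
 * own hint.
 */
static uint64_t qos_probe_types(qos_set_fn set, unsigned int max_type)
{
	uint64_t supported = 0;
	unsigned int t;

	if (max_type > QOS_PROBE_MAX)
		max_type = QOS_PROBE_MAX;

	for (t = 0; t < max_type; t++) {
		struct qos_hint h = { .type = t, .value = 0, .cookie = 0 };

		if (set(&h) == 0)
			supported |= UINT64_C(1) << t;
	}

	return supported;
}

/* Mock "kernel of the day": accepts only types 0 and 3 (assumption). */
static int mock_set_hint(const struct qos_hint *hint)
{
	return (hint->type == 0 || hint->type == 3) ? 0 : -EINVAL;
}
```

Calling qos_probe_types(mock_set_hint, 8) against the mock yields a
bitmap with bits 0 and 3 set. The cost is one call per candidate type
plus the side effects noted in the comments, which is the overhead an
explicit query mechanism would avoid.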