From: "Chen, Yu C" <yu.c.chen@intel.com>
To: Qais Yousef <qyousef@layalina.io>, Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
John Stultz <jstultz@google.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
"Thomas Gleixner" <tglx@kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-pm@vger.kernel.org>,
Vern Hao <vernhao@tencent.com>, Vern Hao <haoxing990@gmail.com>
Subject: Re: [PATCH v2 RFC 08/13] sched/qos: Add a new sched-qos interface
Date: Thu, 7 May 2026 22:20:52 +0800 [thread overview]
Message-ID: <615dfcf8-31da-4e65-8964-c39022b5a1b2@intel.com> (raw)
In-Reply-To: <20260507095516.vv7blulzskkyezin@airbuntu>
On 5/7/2026 5:55 PM, Qais Yousef wrote:
> On 05/06/26 13:38, Tim Chen wrote:
>> On Mon, 2026-05-04 at 02:59 +0100, Qais Yousef wrote:
[ ... ]
> The idea is that the cookie is per QOS per process. So QOS_TYPE_A would have
> its unique cookie range, and QOS_TYPE_B would have its independent unique
> cookie range. To allow flexibility and extensibility to describe independent
> behavior that requires independent grouping.
>
From a user's point of view, I can think of the following use cases for
fine-grained cache-aware scheduling:

u1. A user wants to enable or disable cache-aware scheduling for all
    threads of a process. (No extra tagging is needed.)
u2. A user wants to enable or disable cache-aware scheduling for all
    tasks within a cgroup. (No extra tagging is needed.) Vern from
    Tencent was advocating for this model.
u3. A user wants to enable or disable cache-aware scheduling for an
    arbitrary set of tasks. (Userspace tagging is required.)
If I understand correctly, u3 is exactly the use case where a schedqos
cookie can help. Under your design, we cannot tag an arbitrary set of
tasks with the same cookie; we are only allowed to assign the same cookie
to threads within the same process and under the same QoS type. So
this might rule out the case where different processes share data with
each other and we want to aggregate them (NUMA balancing's numa_group
is an indicator of tasks sharing data).
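To make the limitation concrete, here is a small userspace C model (not
kernel code; the struct and helper names are illustrative, not taken from
the patch set) of a cookie that is scoped per process and per QoS type:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model: a schedqos cookie is only meaningful within one
 * (process, QoS type) scope, mirroring the proposed per-process design. */
struct qos_tag {
	pid_t tgid;          /* owning process */
	unsigned int type;   /* QoS type, e.g. QOS_TYPE_A */
	unsigned int cookie; /* per-process, per-type cookie value */
};

/* Two tasks are grouped only if all three fields match, so tasks in
 * different processes can never share a group even when they pick the
 * same numeric cookie -- the cross-process sharing case from u3. */
static bool qos_same_group(const struct qos_tag *a, const struct qos_tag *b)
{
	return a->tgid == b->tgid &&
	       a->type == b->type &&
	       a->cookie == b->cookie;
}
```

Because tgid is part of the match, two cooperating processes can never land
in the same group, no matter which cookie value they agree on.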
>>
>> We probably need a sched_qos_cookie structure defined analogous to
>> sched_core_cookie to anchor the tasks. And the task's cookie could be a
>> pointer to a sched_qos_cookie, as with sched_core_cookie, instead of a
>> __u32 as in the patch below.
>
> As part of the API or internal implementation detail? I think we do need
> a cookie structure that stores the (sched_qos_type, sched_qos_cookie) tuple
> internally as an implementation detail, but not expose it as an interface.
>
Yes, I think Tim was referring to the internal implementation. We need
a pointer to link tasks to their shared sched_qos_cookie.
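As a rough userspace sketch of what that internal anchoring could look like
(the struct layout and helper names below are my guesses, not code from
this series; a real kernel version would use refcount_t and proper locking
instead of a plain counter and free()):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model: tasks hold a pointer to a shared, refcounted cookie
 * object, the way sched_core_cookie anchors core-scheduling groups,
 * rather than storing a raw __u32 value directly. */
struct sched_qos_cookie {
	unsigned int qos_type; /* which QoS this grouping belongs to */
	unsigned long id;      /* userspace-visible cookie value */
	unsigned int refcnt;   /* kernel would use refcount_t */
};

struct task_model {
	struct sched_qos_cookie *qos_cookie; /* shared pointer = same group */
};

/* Attach a task to a cookie, taking a reference. */
static void qos_cookie_get(struct task_model *t, struct sched_qos_cookie *c)
{
	c->refcnt++;
	t->qos_cookie = c;
}

/* Detach a task; the last reference frees the cookie object. */
static void qos_cookie_put(struct task_model *t)
{
	struct sched_qos_cookie *c = t->qos_cookie;

	t->qos_cookie = NULL;
	if (c && --c->refcnt == 0)
		free(c);
}
```

Group membership then becomes a pointer comparison, and the cookie object
naturally carries the (qos_type, cookie) tuple in one place.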
> I think the cookie values should be userspace managed. From experience, this
> has to be done in a centralized way via a service, otherwise you'd end up with
> a mess. There has to be an all-knowing entity managing things, which is
> what I am proposing in the schedqos service. That's why the whole QOS is now
> protected with the CAP_NICE capability - a change from v1 that I forgot to
> mention.
>
I'm not sure why we do not leverage the OS to allocate and manage cookies.
The OS has full visibility of system-wide information and can maintain
globally unique cookies. Users would only need to ask the OS to allocate
a cookie, or to attach or detach tasks to an existing group, without
supplying an explicit cookie value.

One possible reason I can think of: since the schedqos cookie is defined
per QoS type and per process, it may be more convenient to manage it
entirely within the schedqos service?
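A minimal userspace model of that allocation scheme (helper names are
hypothetical; a kernel implementation might use an IDA for the ID space):

```c
#include <assert.h>

/* Model: the kernel owns the cookie namespace and hands out globally
 * unique IDs, so userspace only asks to create a group or to attach/
 * detach a task, and never invents cookie values itself. */
static unsigned long next_cookie = 1; /* 0 means "untagged" */

struct task_m {
	unsigned long cookie; /* 0 = no group */
};

/* Allocate a system-wide unique cookie for a new task group. */
static unsigned long qos_group_create(void)
{
	return next_cookie++;
}

static void qos_group_attach(struct task_m *t, unsigned long id)
{
	t->cookie = id;
}

static void qos_group_detach(struct task_m *t)
{
	t->cookie = 0;
}
```

Since the ID space is global, nothing stops tasks from different processes
being attached to the same group, which is what u3 needs.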
thanks,
Chenyu
Thread overview:
2026-05-04 1:59 [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time Qais Yousef
2026-05-04 1:59 ` [PATCH v2 01/13] sched: cpufreq: Rename map_util_perf to sugov_apply_dvfs_headroom Qais Yousef
2026-05-04 1:59 ` [PATCH v2 02/13] sched/pelt: Add a new function to approximate the future util_avg value Qais Yousef
2026-05-04 1:59 ` [PATCH v2 03/13] sched/pelt: Add a new function to approximate runtime to reach given util Qais Yousef
2026-05-04 1:59 ` [PATCH v2 04/13] sched/fair: Remove magic hardcoded margin in fits_capacity() Qais Yousef
2026-05-04 1:59 ` [PATCH v2 05/13] sched: cpufreq: Remove magic 1.25 headroom from sugov_apply_dvfs_headroom() Qais Yousef
2026-05-04 1:59 ` [PATCH v2 06/13] sched/fair: Extend util_est to improve rampup time Qais Yousef
2026-05-04 1:59 ` [PATCH v2 07/13] sched/fair: util_est: Take into account periodic tasks Qais Yousef
2026-05-04 1:59 ` [PATCH v2 RFC 08/13] sched/qos: Add a new sched-qos interface Qais Yousef
2026-05-06 20:38 ` Tim Chen
2026-05-07 9:55 ` Qais Yousef
2026-05-07 14:20 ` Chen, Yu C [this message]
2026-05-09 9:39 ` Qais Yousef
2026-05-11 10:57 ` Peter Zijlstra
2026-05-04 1:59 ` [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS Qais Yousef
2026-05-11 11:03 ` Peter Zijlstra
2026-05-04 2:00 ` [PATCH v2 10/13] sched/fair: Disable util_est when rampup_multiplier is 0 Qais Yousef
2026-05-04 2:00 ` [PATCH v2 11/13] sched/fair: Don't mess with util_avg post init Qais Yousef
2026-05-04 2:00 ` [PATCH v2 12/13] sched/fair: Call update_util_est() after dequeue_entities() Qais Yousef
2026-05-04 2:00 ` [PATCH v2 RFC 13/13] sched/pelt: Always allow load updates Qais Yousef