public inbox for linux-pm@vger.kernel.org
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	John Stultz <jstultz@google.com>,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 11/16] sched/qos: Add rampup multiplier QoS
Date: Wed, 18 Sep 2024 14:21:26 -0700
Message-ID: <20240918212126.GA11943@ranerica-svr.sc.intel.com>
In-Reply-To: <20240917214337.GA10143@ranerica-svr.sc.intel.com>

On Tue, Sep 17, 2024 at 02:43:37PM -0700, Ricardo Neri wrote:
> On Tue, Aug 20, 2024 at 05:35:07PM +0100, Qais Yousef wrote:
> > Bursty tasks are hard to predict. To use resources efficiently, the
> > system would like to be as exact as possible. But this poses
> > a challenge for bursty tasks that need access to more resources
> > quickly.
> > 
> > The new SCHED_QOS_RAMPUP_MULTIPLIER allows userspace to do that. As the
> > name implies, it only helps them to transition to a higher performance
> > state when they get _busier_. That is, perfectly periodic tasks by
> > definition are not going through a transition and will run at a constant
> > performance level. It is the tasks that need to transition from one
> > periodic state to another periodic state that is at a higher level that
> > this rampup_multiplier will help with. It also slows down the ewma decay
> > of util_est which should help those bursty tasks to keep their faster
> > rampup.
> > 
> > This should be complementary to uclamp. uclamp tells the system
> > about min and max perf requirements which can be applied immediately.
> > 
> > rampup_multiplier is about the reactiveness of the task to change,
> > specifically to a change to a higher performance level. The task might
> > not necessarily need a min perf requirement, but it can have sudden
> > bursts of changes that require a higher perf level, and it needs the
> > system to provide this faster.
> > 
> > TODO: update the sched_qos docs
> > 
> > Signed-off-by: Qais Yousef <qyousef@layalina.io>
> > ---
> >  include/linux/sched.h      |  7 ++++
> >  include/uapi/linux/sched.h |  2 ++
> >  kernel/sched/core.c        | 66 ++++++++++++++++++++++++++++++++++++++
> >  kernel/sched/fair.c        |  6 ++--
> >  kernel/sched/syscalls.c    | 38 ++++++++++++++++++++--
> >  5 files changed, 115 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 2e8c5a9ffa76..a30ee43a25fb 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -404,6 +404,11 @@ struct sched_info {
> >  #endif /* CONFIG_SCHED_INFO */
> >  };
> >  
> > +struct sched_qos {
> > +	DECLARE_BITMAP(user_defined, SCHED_QOS_MAX);
> > +	unsigned int rampup_multiplier;
> > +};
> > +
> >  /*
> >   * Integer metrics need fixed point arithmetic, e.g., sched/fair
> >   * has a few: load, load_avg, util_avg, freq, and capacity.
> > @@ -882,6 +887,8 @@ struct task_struct {
> >  
> >  	struct sched_info		sched_info;
> >  
> > +	struct sched_qos		sched_qos;
> > +
> >  	struct list_head		tasks;
> >  #ifdef CONFIG_SMP
> >  	struct plist_node		pushable_tasks;
> > diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> > index 67ef99f64ddc..0baba91ba5b8 100644
> > --- a/include/uapi/linux/sched.h
> > +++ b/include/uapi/linux/sched.h
> > @@ -104,6 +104,8 @@ struct clone_args {
> >  };
> >  
> >  enum sched_qos_type {
> > +	SCHED_QOS_RAMPUP_MULTIPLIER,
> > +	SCHED_QOS_MAX,
> >  };
> >  #endif
> >  
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index c91e6a62c7ab..54faa845cb29 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -152,6 +152,8 @@ __read_mostly int sysctl_resched_latency_warn_once = 1;
> >   */
> >  const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK;
> >  
> > +unsigned int sysctl_sched_qos_default_rampup_multiplier	= 1;
> > +
> >  __read_mostly int scheduler_running;
> >  
> >  #ifdef CONFIG_SCHED_CORE
> > @@ -4488,6 +4490,47 @@ static int sysctl_schedstats(struct ctl_table *table, int write, void *buffer,
> >  #endif /* CONFIG_SCHEDSTATS */
> >  
> >  #ifdef CONFIG_SYSCTL
> > +static void sched_qos_sync_sysctl(void)
> > +{
> > +	struct task_struct *g, *p;
> > +
> > +	guard(rcu)();
> > +	for_each_process_thread(g, p) {
> > +		struct rq_flags rf;
> > +		struct rq *rq;
> > +
> > +		rq = task_rq_lock(p, &rf);
> > +		if (!test_bit(SCHED_QOS_RAMPUP_MULTIPLIER, p->sched_qos.user_defined))
> > +			p->sched_qos.rampup_multiplier = sysctl_sched_qos_default_rampup_multiplier;
> > +		task_rq_unlock(rq, p, &rf);
> > +	}
> > +}
> > +
> > +static int sysctl_sched_qos_handler(struct ctl_table *table, int write,
> > +				    void *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +	unsigned int old_rampup_mult;
> > +	int result;
> > +
> > +	old_rampup_mult = sysctl_sched_qos_default_rampup_multiplier;
> > +
> > +	result = proc_dointvec(table, write, buffer, lenp, ppos);
> > +	if (result)
> > +		goto undo;
> > +	if (!write)
> > +		return 0;
> > +
> > +	if (old_rampup_mult != sysctl_sched_qos_default_rampup_multiplier) {
> > +		sched_qos_sync_sysctl();
> > +	}
> > +
> > +	return 0;
> > +
> > +undo:
> > +	sysctl_sched_qos_default_rampup_multiplier = old_rampup_mult;
> > +	return result;
> > +}
> > +
> >  static struct ctl_table sched_core_sysctls[] = {
> >  #ifdef CONFIG_SCHEDSTATS
> >  	{
> > @@ -4534,6 +4577,13 @@ static struct ctl_table sched_core_sysctls[] = {
> >  		.extra2		= SYSCTL_FOUR,
> >  	},
> >  #endif /* CONFIG_NUMA_BALANCING */
> > +	{
> > +		.procname	= "sched_qos_default_rampup_multiplier",
> > +		.data           = &sysctl_sched_qos_default_rampup_multiplier,
> > +		.maxlen         = sizeof(unsigned int),
> 
> IIUC, user space needs to select a value between 0 and (2^32 - 1). Does
> this mean that it will need fine-tuning for each product and application?
> 
> Could there be some translation to a smaller number of qualitative QoS
> levels?
> 
> I am also thinking about Intel processors, which use hardware-controlled
> performance scaling. The proposed interface would help us to communicate
> per-task multipliers to hardware, but they would be used as hints to
> hardware and not acted upon by the kernel to scale frequency.

Also, as discussed during LPC 2024, it might be good to have an interface
that is compatible with other operating systems. They have qualitative
descriptions of QoS levels (see Len Brown's LPC 2022 presentation [1]).

It can be this hint or a new one.
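
To make the idea of a translation layer more concrete, here is a minimal
userspace sketch of what a table from qualitative levels to rampup
multipliers could look like. Everything in it is an assumption for the sake
of discussion: the enum qos_level names, the helper
qos_level_rampup_multiplier(), and the multiplier values are hypothetical
and are not part of the posted series. Only the default of 1 mirrors the
initial value of sysctl_sched_qos_default_rampup_multiplier in the patch.

```c
#include <errno.h>

/* Hypothetical qualitative QoS levels; not defined by the posted patch. */
enum qos_level {
	QOS_LEVEL_BACKGROUND,
	QOS_LEVEL_DEFAULT,
	QOS_LEVEL_RESPONSIVE,
	QOS_LEVEL_INTERACTIVE,
	QOS_LEVEL_MAX,
};

/*
 * Illustrative multipliers only. Real values would need tuning per
 * platform, which is exactly the burden this table is meant to keep
 * away from individual applications.
 */
static const unsigned int qos_level_to_rampup[QOS_LEVEL_MAX] = {
	[QOS_LEVEL_BACKGROUND]  = 1,
	[QOS_LEVEL_DEFAULT]     = 1,	/* matches the sysctl default in the patch */
	[QOS_LEVEL_RESPONSIVE]  = 2,
	[QOS_LEVEL_INTERACTIVE] = 4,
};

/*
 * Translate a qualitative level into a rampup multiplier.
 * Returns 0 and stores the multiplier in *mult on success,
 * or -EINVAL if the level is out of range.
 */
static int qos_level_rampup_multiplier(enum qos_level level, unsigned int *mult)
{
	if ((unsigned int)level >= QOS_LEVEL_MAX)
		return -EINVAL;

	*mult = qos_level_to_rampup[level];
	return 0;
}
```

With something along these lines, user space would request, say,
QOS_LEVEL_INTERACTIVE and the platform (or hardware, in the case of
hardware-controlled performance scaling) would decide what that level means
numerically.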

[1]. https://lpc.events/event/16/contributions/1276/attachments/1070/2039/Brown-Shankar%20LPC%202022.09.13%20Sched%20QOS%20API.pdf
