From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
John Stultz <jstultz@google.com>,
linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 11/16] sched/qos: Add rampup multiplier QoS
Date: Tue, 17 Sep 2024 14:43:37 -0700
Message-ID: <20240917214337.GA10143@ranerica-svr.sc.intel.com>
In-Reply-To: <20240820163512.1096301-12-qyousef@layalina.io>

On Tue, Aug 20, 2024 at 05:35:07PM +0100, Qais Yousef wrote:
> Bursty tasks are hard to predict. To use resources efficiently, the
> system would like to be exact as much as possible. But this poses
> a challenge for these bursty tasks that need to get access to more
> resources quickly.
>
> The new SCHED_QOS_RAMPUP_MULTIPLIER allows userspace to do that. As the
> name implies, it only helps them to transition to a higher performance
> state when they get _busier_. That is perfectly periodic tasks by
> definition are not going through a transition and will run at a constant
> performance level. It is the tasks that need to transition from one
> periodic state to another periodic state that is at a higher level that
> this rampup_multiplier will help with. It also slows down the ewma decay
> of util_est which should help those bursty tasks to keep their faster
> rampup.
>
> This should be complementary to uclamp. uclamp tells the system
> about min and max perf requirements which can be applied immediately.
>
> rampup_multiplier is about reactiveness of the task to change.
> Specifically to a change for a higher performance level. The task might
> necessary need to have a min perf requirements, but it can have sudden
> burst of changes that require higher perf level and it needs the system
> to provide this faster.
>
> TODO: update the sched_qos docs
>
> Signed-off-by: Qais Yousef <qyousef@layalina.io>
> ---
> include/linux/sched.h | 7 ++++
> include/uapi/linux/sched.h | 2 ++
> kernel/sched/core.c | 66 ++++++++++++++++++++++++++++++++++++++
> kernel/sched/fair.c | 6 ++--
> kernel/sched/syscalls.c | 38 ++++++++++++++++++++--
> 5 files changed, 115 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2e8c5a9ffa76..a30ee43a25fb 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -404,6 +404,11 @@ struct sched_info {
> #endif /* CONFIG_SCHED_INFO */
> };
>
> +struct sched_qos {
> + DECLARE_BITMAP(user_defined, SCHED_QOS_MAX);
> + unsigned int rampup_multiplier;
> +};
> +
> /*
> * Integer metrics need fixed point arithmetic, e.g., sched/fair
> * has a few: load, load_avg, util_avg, freq, and capacity.
> @@ -882,6 +887,8 @@ struct task_struct {
>
> struct sched_info sched_info;
>
> + struct sched_qos sched_qos;
> +
> struct list_head tasks;
> #ifdef CONFIG_SMP
> struct plist_node pushable_tasks;
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index 67ef99f64ddc..0baba91ba5b8 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -104,6 +104,8 @@ struct clone_args {
> };
>
> enum sched_qos_type {
> + SCHED_QOS_RAMPUP_MULTIPLIER,
> + SCHED_QOS_MAX,
> };
> #endif
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c91e6a62c7ab..54faa845cb29 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -152,6 +152,8 @@ __read_mostly int sysctl_resched_latency_warn_once = 1;
> */
> const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK;
>
> +unsigned int sysctl_sched_qos_default_rampup_multiplier = 1;
> +
> __read_mostly int scheduler_running;
>
> #ifdef CONFIG_SCHED_CORE
> @@ -4488,6 +4490,47 @@ static int sysctl_schedstats(struct ctl_table *table, int write, void *buffer,
> #endif /* CONFIG_SCHEDSTATS */
>
> #ifdef CONFIG_SYSCTL
> +static void sched_qos_sync_sysctl(void)
> +{
> + struct task_struct *g, *p;
> +
> + guard(rcu)();
> + for_each_process_thread(g, p) {
> + struct rq_flags rf;
> + struct rq *rq;
> +
> + rq = task_rq_lock(p, &rf);
> + if (!test_bit(SCHED_QOS_RAMPUP_MULTIPLIER, p->sched_qos.user_defined))
> + p->sched_qos.rampup_multiplier = sysctl_sched_qos_default_rampup_multiplier;
> + task_rq_unlock(rq, p, &rf);
> + }
> +}
> +
> +static int sysctl_sched_qos_handler(struct ctl_table *table, int write,
> + void *buffer, size_t *lenp, loff_t *ppos)
> +{
> + unsigned int old_rampup_mult;
> + int result;
> +
> + old_rampup_mult = sysctl_sched_qos_default_rampup_multiplier;
> +
> + result = proc_dointvec(table, write, buffer, lenp, ppos);
> + if (result)
> + goto undo;
> + if (!write)
> + return 0;
> +
> + if (old_rampup_mult != sysctl_sched_qos_default_rampup_multiplier) {
> + sched_qos_sync_sysctl();
> + }
> +
> + return 0;
> +
> +undo:
> + sysctl_sched_qos_default_rampup_multiplier = old_rampup_mult;
> + return result;
> +}
> +
> static struct ctl_table sched_core_sysctls[] = {
> #ifdef CONFIG_SCHEDSTATS
> {
> @@ -4534,6 +4577,13 @@ static struct ctl_table sched_core_sysctls[] = {
> .extra2 = SYSCTL_FOUR,
> },
> #endif /* CONFIG_NUMA_BALANCING */
> + {
> + .procname = "sched_qos_default_rampup_multiplier",
> + .data = &sysctl_sched_qos_default_rampup_multiplier,
> + .maxlen = sizeof(unsigned int),

IIUC, user space needs to select a value between 0 and (2^32 - 1). Does
this mean that the value will need fine-tuning for each product and
application? Could it instead be translated to a smaller number of
qualitative QoS levels?
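
To illustrate what I mean by qualitative levels, here is a rough sketch.
None of these names exist in the posted patch: the level enum, the table,
and rampup_level_to_mult() are hypothetical, and the multiplier values are
placeholders (only the default of 1 comes from the sysctl default above).

```c
#include <errno.h>

/* Hypothetical qualitative levels; not part of the posted patch. */
enum sched_qos_rampup_level {
	SCHED_QOS_RAMPUP_SLOW,
	SCHED_QOS_RAMPUP_DEFAULT,
	SCHED_QOS_RAMPUP_FAST,
	SCHED_QOS_RAMPUP_FASTEST,
	SCHED_QOS_RAMPUP_NR_LEVELS,
};

/* Placeholder multipliers; real values would need per-platform tuning. */
static const unsigned int rampup_level_mult[SCHED_QOS_RAMPUP_NR_LEVELS] = {
	[SCHED_QOS_RAMPUP_SLOW]    = 0,
	[SCHED_QOS_RAMPUP_DEFAULT] = 1,
	[SCHED_QOS_RAMPUP_FAST]    = 2,
	[SCHED_QOS_RAMPUP_FASTEST] = 4,
};

/*
 * Translate a qualitative level into a multiplier.
 * Returns 0 on success, or -EINVAL for an unknown level, in which
 * case *mult is left untouched.
 */
static int rampup_level_to_mult(unsigned int level, unsigned int *mult)
{
	if (level >= SCHED_QOS_RAMPUP_NR_LEVELS)
		return -EINVAL;

	*mult = rampup_level_mult[level];
	return 0;
}
```

With something like this, user space would pick among a handful of levels
and the per-platform tuning would live in the table rather than in each
application.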

I am also thinking about Intel processors, which use hardware-controlled
performance scaling. The proposed interface would help us communicate
per-task multipliers to hardware, but they would serve as hints to the
hardware rather than being acted upon by the kernel to scale frequency.