From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Steve Muckle <steve.muckle@linaro.org>,
Leo Yan <leo.yan@linaro.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
"Rafael J . Wysocki" <rjw@rjwysocki.net>,
Todd Kjos <tkjos@google.com>,
Srinath Sridharan <srinathsr@google.com>,
Andres Oportus <andresoportus@google.com>,
Juri Lelli <juri.lelli@arm.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Chris Redpath <chris.redpath@arm.com>,
Robin Randhawa <robin.randhawa@arm.com>,
Patrick Bellasi <patrick.bellasi@arm.com>,
Ingo Molnar <mingo@redhat.com>
Subject: [RFC v2 2/8] sched/tune: add sysctl interface to define a boost value
Date: Thu, 27 Oct 2016 18:41:02 +0100 [thread overview]
Message-ID: <20161027174108.31139-3-patrick.bellasi@arm.com> (raw)
In-Reply-To: <20161027174108.31139-1-patrick.bellasi@arm.com>
The current (CFS) scheduler implementation does not allow "to boost"
tasks performance by running them at a higher OPP compared to the
minimum required to meet their workload demands.
To support tasks performance boosting the scheduler should provide a
"knob" which allows to tune how much the system is going to be optimised
for energy efficiency vs performance boosting.
It's worth to notice that by energy-efficiency we mean running a CPU at
the minimum OPP which satisfy its utilization while for performance
boosting we mean running a task as fast as possible.
This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
/proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
0% boost requires to operate in "standard" mode by scheduling
tasks at the minimum capacities required by the workload demand
100% boost requires to push at maximum the task performances,
"regardless" of the incurred energy consumption
A boost value in between these two boundaries is used to bias the
power/performance trade-off, the higher the boost value the more the
scheduler is biased toward performance boosting instead of energy
efficiency.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
---
include/linux/sched/sysctl.h | 16 ++++++++++++++++
init/Kconfig | 31 +++++++++++++++++++++++++++++++
kernel/sched/Makefile | 1 +
kernel/sched/tune.c | 23 +++++++++++++++++++++++
kernel/sysctl.c | 11 +++++++++++
5 files changed, 82 insertions(+)
create mode 100644 kernel/sched/tune.c
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 4411453..5bfbb14 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -55,6 +55,22 @@ extern int sysctl_sched_rt_runtime;
extern unsigned int sysctl_sched_cfs_bandwidth_slice;
#endif
+#ifdef CONFIG_SCHED_TUNE
+extern unsigned int sysctl_sched_cfs_boost;
+int sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length,
+ loff_t *ppos);
+static inline unsigned int get_sysctl_sched_cfs_boost(void)
+{
+ return sysctl_sched_cfs_boost;
+}
+#else
+static inline unsigned int get_sysctl_sched_cfs_boost(void)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_SCHED_AUTOGROUP
extern unsigned int sysctl_sched_autogroup_enabled;
#endif
diff --git a/init/Kconfig b/init/Kconfig
index 34407f1..461e052 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1248,6 +1248,37 @@ config SCHED_AUTOGROUP
desktop applications. Task group autogeneration is currently based
upon task session.
+config SCHED_TUNE
+ bool "Boosting for CFS tasks (EXPERIMENTAL)"
+ depends on SMP
+ help
+ This option enables the system-wide support for task boosting.
+ When this support is enabled a new sysctl interface is exposed to
+ user-space via:
+ /proc/sys/kernel/sched_cfs_boost
+ which allows to set a system-wide boost value in range [0..100].
+
+ The currently boosting strategy is implemented in such a way that:
+ - a 0% boost value requires to operate in "standard" mode by
+ scheduling all tasks at the minimum capacities required by their
+ workload demand
+ - a 100% boost value requires to push at maximum the task
+ performances, "regardless" of the incurred energy consumption
+
+ A boost value in between these two boundaries is used to bias the
+ power/performance trade-off, the higher the boost value the more the
+ scheduler is biased toward performance boosting instead of energy
+ efficiency.
+
+ Since this support exposes a single system-wide knob, the specified
+ boost value is applied to all (CFS) tasks in the system.
+
+ NOTE: SchedTune support is available only on SMP system since only
+ for those systems is currently defined and tracked the utilization
+ signal for RQs and SEs.
+
+ If unsure, say N.
+
config SYSFS_DEPRECATED
bool "Enable deprecated sysfs features to support old userspace tools"
depends on SYSFS
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 5e59b83..26ab2a6 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o
obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
obj-$(CONFIG_SCHEDSTATS) += stats.o
obj-$(CONFIG_SCHED_DEBUG) += debug.o
+obj-$(CONFIG_SCHED_TUNE) += tune.o
obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
obj-$(CONFIG_CPU_FREQ) += cpufreq.o
obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
new file mode 100644
index 0000000..7336118
--- /dev/null
+++ b/kernel/sched/tune.c
@@ -0,0 +1,23 @@
+/*
+ * Scheduler Tunability (SchedTune) Extensions for CFS
+ *
+ * Copyright (C) 2016 ARM Ltd, Patrick Bellasi <patrick.bellasi@arm.com>
+ */
+
+#include "sched.h"
+
+unsigned int sysctl_sched_cfs_boost __read_mostly;
+
+int
+sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+ if (ret || !write)
+ return ret;
+
+ return 0;
+}
+
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 739fb17..43b6d14 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -442,6 +442,17 @@ static struct ctl_table kern_table[] = {
.extra1 = &one,
},
#endif
+#ifdef CONFIG_SCHED_TUNE
+ {
+ .procname = "sched_cfs_boost",
+ .data = &sysctl_sched_cfs_boost,
+ .maxlen = sizeof(sysctl_sched_cfs_boost),
+ .mode = 0644,
+ .proc_handler = &sysctl_sched_cfs_boost_handler,
+ .extra1 = &zero,
+ .extra2 = &one_hundred,
+ },
+#endif
#ifdef CONFIG_PROVE_LOCKING
{
.procname = "prove_locking",
--
2.10.1
next prev parent reply other threads:[~2016-10-27 17:41 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-27 17:41 [RFC v2 0/8] SchedTune: central, scheduler-driven, power-perfomance control Patrick Bellasi
2016-10-27 17:41 ` [RFC v2 1/8] sched/tune: add detailed documentation Patrick Bellasi
2016-11-04 9:46 ` Viresh Kumar
2016-11-08 10:53 ` Patrick Bellasi
2016-10-27 17:41 ` Patrick Bellasi [this message]
2016-10-27 17:41 ` [RFC v2 3/8] sched/fair: add function to convert boost value into "margin" Patrick Bellasi
2016-10-27 17:41 ` [RFC v2 4/8] sched/fair: add boosted CPU usage Patrick Bellasi
2016-10-27 17:41 ` [RFC v2 5/8] sched/tune: add initial support for CGroups based boosting Patrick Bellasi
2016-10-27 18:30 ` Tejun Heo
2016-10-27 20:14 ` Patrick Bellasi
2016-10-27 20:39 ` Tejun Heo
2016-10-27 22:34 ` Patrick Bellasi
2016-10-27 17:41 ` [RFC v2 6/8] sched/tune: compute and keep track of per CPU boost value Patrick Bellasi
2016-10-27 17:41 ` [RFC v2 7/8] sched/{fair,tune}: track RUNNABLE tasks impact on " Patrick Bellasi
2016-10-27 17:41 ` [RFC v2 8/8] sched/{fair,tune}: add support for negative boosting Patrick Bellasi
2016-10-27 20:58 ` [RFC v2 0/8] SchedTune: central, scheduler-driven, power-perfomance control Peter Zijlstra
2016-10-28 10:49 ` Patrick Bellasi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161027174108.31139-3-patrick.bellasi@arm.com \
--to=patrick.bellasi@arm.com \
--cc=andresoportus@google.com \
--cc=chris.redpath@arm.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@arm.com \
--cc=leo.yan@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=morten.rasmussen@arm.com \
--cc=peterz@infradead.org \
--cc=rjw@rjwysocki.net \
--cc=robin.randhawa@arm.com \
--cc=srinathsr@google.com \
--cc=steve.muckle@linaro.org \
--cc=tkjos@google.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.