linux-pm.vger.kernel.org archive mirror
* [PATCH v3 0/4] scheduler-driven cpu frequency selection
@ 2015-06-26 23:53 Michael Turquette
  2015-06-26 23:53 ` [PATCH v3 1/4] arm: Frequency invariant scheduler load-tracking support Michael Turquette
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Michael Turquette @ 2015-06-26 23:53 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, preeti, Morten.Rasmussen, riel, efault,
	nicolas.pitre, daniel.lezcano, dietmar.eggemann, vincent.guittot,
	amit.kucheria, juri.lelli, rjw, viresh.kumar, ashwin.chaugule,
	alex.shi, linux-pm, abelvesa, pebolle, Michael Turquette

This series addresses the comments from v2 and rearranges the code to
separate the bits that are safe to merge from the bits that are not. In
particular the simplified governor no longer implements any policy of
its own. That code is migrated to fair.c and marked as RFC. The
capacity-selection code in fair.c (patch #4) is a placeholder that can be
replaced by something more sophisticated; it illustrates the use of the
new governor APIs for anyone who wants to experiment with crafting a new
policy.

Patch #1 is a re-post from Morten, as it is the only dependency these
patches have on his EAS series. Please consider merging patches #2 and
#3 if they do not appear too controversial. Unless the feature is
enabled in Kconfig there is no impact on existing code.

Michael Turquette (3):
  cpufreq: introduce cpufreq_driver_might_sleep
  sched: scheduler-driven cpu frequency selection
  [RFC] sched: cfs: cpu frequency scaling policy

Morten Rasmussen (1):
  arm: Frequency invariant scheduler load-tracking support

 arch/arm/include/asm/topology.h |   7 +
 arch/arm/kernel/smp.c           |  53 ++++++-
 arch/arm/kernel/topology.c      |  17 +++
 drivers/cpufreq/Kconfig         |  24 ++++
 drivers/cpufreq/cpufreq.c       |   6 +
 include/linux/cpufreq.h         |  12 ++
 kernel/sched/Makefile           |   1 +
 kernel/sched/cpufreq_sched.c    | 308 ++++++++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c             |  41 ++++++
 kernel/sched/sched.h            |   8 ++
 10 files changed, 475 insertions(+), 2 deletions(-)
 create mode 100644 kernel/sched/cpufreq_sched.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/4] arm: Frequency invariant scheduler load-tracking support
  2015-06-26 23:53 [PATCH v3 0/4] scheduler-driven cpu frequency selection Michael Turquette
@ 2015-06-26 23:53 ` Michael Turquette
  2015-06-26 23:53 ` [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep Michael Turquette
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Michael Turquette @ 2015-06-26 23:53 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, preeti, Morten.Rasmussen, riel, efault,
	nicolas.pitre, daniel.lezcano, dietmar.eggemann, vincent.guittot,
	amit.kucheria, juri.lelli, rjw, viresh.kumar, ashwin.chaugule,
	alex.shi, linux-pm, abelvesa, pebolle, Russell King,
	Morten Rasmussen, Michael Turquette

From: Morten Rasmussen <Morten.Rasmussen@arm.com>

Implement an arch-specific function that provides the scheduler with a
frequency scaling correction factor for more accurate load-tracking. The
factor is:

	current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)

This implementation only provides frequency invariance. No
micro-architecture invariance yet.

Cc: Russell King <linux@arm.linux.org.uk>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
---
 arch/arm/include/asm/topology.h |  7 ++++++
 arch/arm/kernel/smp.c           | 53 +++++++++++++++++++++++++++++++++++++++--
 arch/arm/kernel/topology.c      | 17 +++++++++++++
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index 2fe85ff..4b985dc 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -24,6 +24,13 @@ void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
 
+#define arch_scale_freq_capacity arm_arch_scale_freq_capacity
+struct sched_domain;
+extern
+unsigned long arm_arch_scale_freq_capacity(struct sched_domain *sd, int cpu);
+
+DECLARE_PER_CPU(atomic_long_t, cpu_freq_capacity);
+
 #else
 
 static inline void init_cpu_topology(void) { }
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 86ef244..297ce1b 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -672,12 +672,34 @@ static DEFINE_PER_CPU(unsigned long, l_p_j_ref);
 static DEFINE_PER_CPU(unsigned long, l_p_j_ref_freq);
 static unsigned long global_l_p_j_ref;
 static unsigned long global_l_p_j_ref_freq;
+static DEFINE_PER_CPU(atomic_long_t, cpu_max_freq);
+DEFINE_PER_CPU(atomic_long_t, cpu_freq_capacity);
+
+/*
+ * Scheduler load-tracking scale-invariance
+ *
+ * Provides the scheduler with a scale-invariance correction factor that
+ * compensates for frequency scaling through arch_scale_freq_capacity()
+ * (implemented in topology.c).
+ */
+static inline
+void scale_freq_capacity(int cpu, unsigned long curr, unsigned long max)
+{
+	unsigned long capacity;
+
+	if (!max)
+		return;
+
+	capacity = (curr << SCHED_CAPACITY_SHIFT) / max;
+	atomic_long_set(&per_cpu(cpu_freq_capacity, cpu), capacity);
+}
 
 static int cpufreq_callback(struct notifier_block *nb,
 					unsigned long val, void *data)
 {
 	struct cpufreq_freqs *freq = data;
 	int cpu = freq->cpu;
+	unsigned long max = atomic_long_read(&per_cpu(cpu_max_freq, cpu));
 
 	if (freq->flags & CPUFREQ_CONST_LOOPS)
 		return NOTIFY_OK;
@@ -702,6 +724,9 @@ static int cpufreq_callback(struct notifier_block *nb,
 					per_cpu(l_p_j_ref_freq, cpu),
 					freq->new);
 	}
+
+	scale_freq_capacity(cpu, freq->new, max);
+
 	return NOTIFY_OK;
 }
 
@@ -709,11 +734,35 @@ static struct notifier_block cpufreq_notifier = {
 	.notifier_call  = cpufreq_callback,
 };
 
+static int cpufreq_policy_callback(struct notifier_block *nb,
+						unsigned long val, void *data)
+{
+	struct cpufreq_policy *policy = data;
+	int i;
+
+	for_each_cpu(i, policy->cpus) {
+		scale_freq_capacity(i, policy->cur, policy->max);
+		atomic_long_set(&per_cpu(cpu_max_freq, i), policy->max);
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block cpufreq_policy_notifier = {
+	.notifier_call	= cpufreq_policy_callback,
+};
+
 static int __init register_cpufreq_notifier(void)
 {
-	return cpufreq_register_notifier(&cpufreq_notifier,
+	int ret;
+
+	ret = cpufreq_register_notifier(&cpufreq_notifier,
 						CPUFREQ_TRANSITION_NOTIFIER);
+	if (ret)
+		return ret;
+
+	return cpufreq_register_notifier(&cpufreq_policy_notifier,
+						CPUFREQ_POLICY_NOTIFIER);
 }
 core_initcall(register_cpufreq_notifier);
-
 #endif
diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 08b7847..9c09e6e 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -169,6 +169,23 @@ static void update_cpu_capacity(unsigned int cpu)
 		cpu, arch_scale_cpu_capacity(NULL, cpu));
 }
 
+/*
+ * Scheduler load-tracking scale-invariance
+ *
+ * Provides the scheduler with a scale-invariance correction factor that
+ * compensates for frequency scaling (arch_scale_freq_capacity()). The scaling
+ * factor is updated in smp.c
+ */
+unsigned long arm_arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
+{
+	unsigned long curr = atomic_long_read(&per_cpu(cpu_freq_capacity, cpu));
+
+	if (!curr)
+		return SCHED_CAPACITY_SCALE;
+
+	return curr;
+}
+
 #else
 static inline void parse_dt_topology(void) {}
 static inline void update_cpu_capacity(unsigned int cpuid) {}
-- 
1.9.1



* [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep
  2015-06-26 23:53 [PATCH v3 0/4] scheduler-driven cpu frequency selection Michael Turquette
  2015-06-26 23:53 ` [PATCH v3 1/4] arm: Frequency invariant scheduler load-tracking support Michael Turquette
@ 2015-06-26 23:53 ` Michael Turquette
  2015-06-27  0:48   ` Felipe Balbi
  2015-06-26 23:53 ` [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection Michael Turquette
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Michael Turquette @ 2015-06-26 23:53 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, preeti, Morten.Rasmussen, riel, efault,
	nicolas.pitre, daniel.lezcano, dietmar.eggemann, vincent.guittot,
	amit.kucheria, juri.lelli, rjw, viresh.kumar, ashwin.chaugule,
	alex.shi, linux-pm, abelvesa, pebolle, Michael Turquette,
	Rafael J. Wysocki

From: Michael Turquette <mturquette@baylibre.com>

Some architectures and platforms perform CPU frequency transitions
through a non-blocking method, while some might block or sleep. This
distinction is important when trying to change frequency from interrupt
context or any other non-interruptible context, such as from the Linux
scheduler.

Describe this distinction with a cpufreq driver flag,
CPUFREQ_DRIVER_WILL_NOT_SLEEP. The default is to not have this flag set,
thus erring on the side of caution.

cpufreq_driver_might_sleep() is also introduced in this patch. Setting
the above flag will allow this function to return false.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
---
 drivers/cpufreq/cpufreq.c | 6 ++++++
 include/linux/cpufreq.h   | 9 +++++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 28e59a4..e5296c0 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -112,6 +112,12 @@ bool have_governor_per_policy(void)
 }
 EXPORT_SYMBOL_GPL(have_governor_per_policy);
 
+bool cpufreq_driver_might_sleep(void)
+{
+	return !(cpufreq_driver->flags & CPUFREQ_DRIVER_WILL_NOT_SLEEP);
+}
+EXPORT_SYMBOL_GPL(cpufreq_driver_might_sleep);
+
 struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
 {
 	if (have_governor_per_policy())
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 2ee4888..1f2c9a1 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -157,6 +157,7 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
 int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
 int cpufreq_update_policy(unsigned int cpu);
 bool have_governor_per_policy(void);
+bool cpufreq_driver_might_sleep(void);
 struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
 #else
 static inline unsigned int cpufreq_get(unsigned int cpu)
@@ -314,6 +315,14 @@ struct cpufreq_driver {
  */
 #define CPUFREQ_NEED_INITIAL_FREQ_CHECK	(1 << 5)
 
+/*
+ * Set by drivers that will never block or sleep during their frequency
+ * transition. Used to indicate when it is safe to call cpufreq_driver_target
+ * from non-interruptible context. Drivers must opt in to this flag, as the
+ * safe default is that they might sleep.
+ */
+#define CPUFREQ_DRIVER_WILL_NOT_SLEEP	(1 << 6)
+
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
 
-- 
1.9.1



* [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection
  2015-06-26 23:53 [PATCH v3 0/4] scheduler-driven cpu frequency selection Michael Turquette
  2015-06-26 23:53 ` [PATCH v3 1/4] arm: Frequency invariant scheduler load-tracking support Michael Turquette
  2015-06-26 23:53 ` [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep Michael Turquette
@ 2015-06-26 23:53 ` Michael Turquette
  2015-06-27  0:47   ` Felipe Balbi
  2015-07-06 20:06   ` Dietmar Eggemann
  2015-06-26 23:53 ` [PATCH v3 4/4] [RFC] sched: cfs: cpu frequency scaling policy Michael Turquette
  2015-07-03  9:57 ` [PATCH v3 0/4] scheduler-driven cpu frequency selection Vincent Guittot
  4 siblings, 2 replies; 14+ messages in thread
From: Michael Turquette @ 2015-06-26 23:53 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, preeti, Morten.Rasmussen, riel, efault,
	nicolas.pitre, daniel.lezcano, dietmar.eggemann, vincent.guittot,
	amit.kucheria, juri.lelli, rjw, viresh.kumar, ashwin.chaugule,
	alex.shi, linux-pm, abelvesa, pebolle, Michael Turquette

From: Michael Turquette <mturquette@baylibre.com>

Scheduler-driven cpu frequency selection is desirable as part of the
on-going effort to make the scheduler better aware of energy
consumption.  No piece of the Linux kernel has a better view of the
factors that affect a cpu frequency selection policy than the
scheduler[0], and this patch is an attempt to converge on an initial
solution.

This patch implements a simple shim layer between the Linux scheduler
and the cpufreq subsystem. This interface accepts a capacity request
from the Completely Fair Scheduler and honors the max request from all
cpus in the same frequency domain.

The policy magic comes from choosing the cpu capacity request from cfs
and is not contained in this cpufreq governor. This code is
intentionally dumb.

Note that this "governor" is event-driven. There is no polling loop to
check cpu idle time nor any other method which is unsynchronized with
the scheduler.

Thanks to Juri Lelli <juri.lelli@arm.com> for contributing design ideas,
code and test results.

[0] http://article.gmane.org/gmane.linux.kernel/1499836

Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
Changes in v3:
Removed all policy bits
Renamed from cpufreq_cfs to cpufreq_sched
Bug fixes
Support non-blocking frequency transitions
License fix (thanks Paul)

 drivers/cpufreq/Kconfig      |  24 ++++
 include/linux/cpufreq.h      |   3 +
 kernel/sched/Makefile        |   1 +
 kernel/sched/cpufreq_sched.c | 308 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h         |   8 ++
 5 files changed, 344 insertions(+)
 create mode 100644 kernel/sched/cpufreq_sched.c

diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index a171fef..0206889 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -102,6 +102,15 @@ config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
 	  Be aware that not all cpufreq drivers support the conservative
 	  governor. If unsure have a look at the help section of the
 	  driver. Fallback governor will be the performance governor.
+
+config CPU_FREQ_DEFAULT_GOV_SCHED
+	bool "sched"
+	select CPU_FREQ_GOV_SCHED
+	select CPU_FREQ_GOV_PERFORMANCE
+	help
+	  Use the CPUfreq governor 'sched' as default. This scales
+	  cpu frequency from the scheduler as per-entity load tracking
+	  statistics are updated.
 endchoice
 
 config CPU_FREQ_GOV_PERFORMANCE
@@ -183,6 +192,21 @@ config CPU_FREQ_GOV_CONSERVATIVE
 
 	  If in doubt, say N.
 
+config CPU_FREQ_GOV_SCHED
+	tristate "'sched' cpufreq governor"
+	depends on CPU_FREQ
+	select CPU_FREQ_GOV_COMMON
+	help
+	  'sched' - this governor scales cpu frequency from the
+	  scheduler as a function of cpu capacity utilization. It does
+	  not evaluate utilization on a periodic basis (as ondemand
+	  does) but instead is invoked from the completely fair
+	  scheduler when updating per-entity load tracking statistics.
+	  Latency to respond to changes in load is improved over polling
+	  governors due to its event-driven design.
+
+	  If in doubt, say N.
+
 comment "CPU frequency scaling drivers"
 
 config CPUFREQ_DT
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 1f2c9a1..30241c9 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -494,6 +494,9 @@ extern struct cpufreq_governor cpufreq_gov_ondemand;
 #elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE)
 extern struct cpufreq_governor cpufreq_gov_conservative;
 #define CPUFREQ_DEFAULT_GOVERNOR	(&cpufreq_gov_conservative)
+#elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED)
+extern struct cpufreq_governor cpufreq_gov_sched;
+#define CPUFREQ_DEFAULT_GOVERNOR	(&cpufreq_gov_sched)
 #endif
 
 /*********************************************************************
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 46be870..f04386c 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
+obj-$(CONFIG_CPU_FREQ_GOV_SCHED) += cpufreq_sched.o
diff --git a/kernel/sched/cpufreq_sched.c b/kernel/sched/cpufreq_sched.c
new file mode 100644
index 0000000..5020f24
--- /dev/null
+++ b/kernel/sched/cpufreq_sched.c
@@ -0,0 +1,308 @@
+/*
+ *  Copyright (C)  2015 Michael Turquette <mturquette@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpufreq.h>
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/percpu.h>
+#include <linux/irq_work.h>
+
+#include "sched.h"
+
+#define THROTTLE_NSEC		50000000 /* 50ms default */
+
+static DEFINE_PER_CPU(unsigned long, pcpu_capacity);
+static DEFINE_PER_CPU(struct cpufreq_policy *, pcpu_policy);
+
+/**
+ * gov_data - per-policy data internal to the governor
+ * @throttle: next throttling period expiry. Derived from throttle_nsec
+ * @throttle_nsec: throttle period length in nanoseconds
+ * @task: worker thread for dvfs transition that may block/sleep
+ * @irq_work: callback used to wake up worker thread
+ * @freq: new frequency stored in *_sched_update_cpu and used in *_sched_thread
+ *
+ * struct gov_data is the per-policy cpufreq_sched-specific data structure. A
+ * per-policy instance of it is created when the cpufreq_sched governor receives
+ * the CPUFREQ_GOV_START condition and a pointer to it exists in the gov_data
+ * member of struct cpufreq_policy.
+ *
+ * Readers of this data must call down_read(policy->rwsem). Writers must
+ * call down_write(policy->rwsem).
+ */
+struct gov_data {
+	ktime_t throttle;
+	unsigned int throttle_nsec;
+	struct task_struct *task;
+	struct irq_work irq_work;
+	struct cpufreq_policy *policy;
+	unsigned int freq;
+};
+
+static void cpufreq_sched_try_driver_target(struct cpufreq_policy *policy, unsigned int freq)
+{
+	struct gov_data *gd = policy->governor_data;
+
+	/* avoid race with cpufreq_sched_stop */
+	if (!down_write_trylock(&policy->rwsem))
+		return;
+
+	__cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
+
+	gd->throttle = ktime_add_ns(ktime_get(), gd->throttle_nsec);
+	up_write(&policy->rwsem);
+}
+
+/*
+ * we pass in struct cpufreq_policy. This is safe because changing out the
+ * policy requires a call to __cpufreq_governor(policy, CPUFREQ_GOV_STOP),
+ * which tears down all of the data structures and __cpufreq_governor(policy,
+ * CPUFREQ_GOV_START) will do a full rebuild, including this kthread with the
+ * new policy pointer
+ */
+static int cpufreq_sched_thread(void *data)
+{
+	struct sched_param param;
+	struct cpufreq_policy *policy;
+	struct gov_data *gd;
+	int ret;
+
+	policy = (struct cpufreq_policy *) data;
+	if (!policy) {
+		pr_warn("%s: missing policy\n", __func__);
+		do_exit(-EINVAL);
+	}
+
+	gd = policy->governor_data;
+	if (!gd) {
+		pr_warn("%s: missing governor data\n", __func__);
+		do_exit(-EINVAL);
+	}
+
+	param.sched_priority = 50;
+	ret = sched_setscheduler_nocheck(gd->task, SCHED_FIFO, &param);
+	if (ret) {
+		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
+		do_exit(-EINVAL);
+	} else {
+		pr_debug("%s: kthread (%d) set to SCHED_FIFO\n",
+				__func__, gd->task->pid);
+	}
+
+	ret = set_cpus_allowed_ptr(gd->task, policy->related_cpus);
+	if (ret) {
+		pr_warn("%s: failed to set allowed ptr\n", __func__);
+		do_exit(-EINVAL);
+	}
+
+	/* main loop of the per-policy kthread */
+	do {
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule();
+		if (kthread_should_stop())
+			break;
+
+		cpufreq_sched_try_driver_target(policy, gd->freq);
+	} while (!kthread_should_stop());
+
+	do_exit(0);
+}
+
+static void cpufreq_sched_irq_work(struct irq_work *irq_work)
+{
+	struct gov_data *gd;
+
+	gd = container_of(irq_work, struct gov_data, irq_work);
+	if (!gd) {
+		return;
+	}
+
+	wake_up_process(gd->task);
+}
+
+/**
+ * cpufreq_sched_set_cap - interface to scheduler for changing capacity values
+ * @cpu: cpu whose capacity utilization has recently changed
+ * @capacity: the new capacity requested by cpu
+ *
+ * cpufreq_sched_set_cap is an interface exposed to the scheduler so
+ * that the scheduler may inform the governor of updates to capacity
+ * utilization and make changes to cpu frequency. Currently this interface is
+ * designed around PELT values in CFS. It can be expanded to other scheduling
+ * classes in the future if needed.
+ *
+ * cpufreq_sched_set_cap raises an IPI. The irq_work handler for that IPI
+ * wakes up the thread that does the actual work, cpufreq_sched_thread.
+ *
+ * This function bails out early if either condition is true:
+ * 1) this cpu did not request the new maximum capacity for its frequency domain
+ * 2) no change in cpu frequency is necessary to meet the new capacity request
+ */
+void cpufreq_sched_set_cap(int cpu, unsigned long capacity)
+{
+	unsigned int freq_new, cpu_tmp;
+	struct cpufreq_policy *policy;
+	struct gov_data *gd;
+	unsigned long capacity_max = 0;
+
+	/* update per-cpu capacity request */
+	__this_cpu_write(pcpu_capacity, capacity);
+
+	policy = cpufreq_cpu_get(cpu);
+	if (IS_ERR_OR_NULL(policy)) {
+		return;
+	}
+
+	if (!policy->governor_data)
+		goto out;
+
+	gd = policy->governor_data;
+
+	/* bail early if we are throttled */
+	if (ktime_before(ktime_get(), gd->throttle))
+		goto out;
+
+	/* find max capacity requested by cpus in this policy */
+	for_each_cpu(cpu_tmp, policy->cpus)
+		capacity_max = max(capacity_max, per_cpu(pcpu_capacity, cpu_tmp));
+
+	/*
+	 * We only change frequency if this cpu's capacity request represents a
+	 * new max. If another cpu has requested a capacity greater than the
+	 * previous max then we rely on that cpu to hit this code path and make
+	 * the change. IOW, the cpu with the new max capacity is responsible
+	 * for setting the new capacity/frequency.
+	 *
+	 * If this cpu is not the new maximum then bail
+	 */
+	if (capacity_max > capacity)
+		goto out;
+
+	/* Convert the new maximum capacity request into a cpu frequency */
+	freq_new = capacity * policy->max >> SCHED_CAPACITY_SHIFT;
+
+	/* No change in frequency? Bail and return current capacity. */
+	if (freq_new == policy->cur)
+		goto out;
+
+	/* store the new frequency and perform the transition */
+	gd->freq = freq_new;
+
+	if (cpufreq_driver_might_sleep())
+		irq_work_queue_on(&gd->irq_work, cpu);
+	else
+		cpufreq_sched_try_driver_target(policy, freq_new);
+
+out:
+	cpufreq_cpu_put(policy);
+	return;
+}
+
+static int cpufreq_sched_start(struct cpufreq_policy *policy)
+{
+	struct gov_data *gd;
+	int cpu;
+
+	/* prepare per-policy private data */
+	gd = kzalloc(sizeof(*gd), GFP_KERNEL);
+	if (!gd) {
+		pr_debug("%s: failed to allocate private data\n", __func__);
+		return -ENOMEM;
+	}
+
+	/* initialize per-cpu data */
+	for_each_cpu(cpu, policy->cpus) {
+		per_cpu(pcpu_capacity, cpu) = 0;
+		per_cpu(pcpu_policy, cpu) = policy;
+	}
+
+	/*
+	 * Don't ask for freq changes at a higher rate than what
+	 * the driver advertises as transition latency.
+	 */
+	gd->throttle_nsec = policy->cpuinfo.transition_latency ?
+			    policy->cpuinfo.transition_latency :
+			    THROTTLE_NSEC;
+	pr_debug("%s: throttle threshold = %u [ns]\n",
+		  __func__, gd->throttle_nsec);
+
+	if (cpufreq_driver_might_sleep()) {
+		/* init per-policy kthread */
+		gd->task = kthread_run(cpufreq_sched_thread, policy, "kcpufreq_sched_task");
+		if (IS_ERR_OR_NULL(gd->task)) {
+			pr_err("%s: failed to create kcpufreq_sched_task thread\n", __func__);
+			goto err;
+		}
+		init_irq_work(&gd->irq_work, cpufreq_sched_irq_work);
+	}
+
+	policy->governor_data = gd;
+	gd->policy = policy;
+	return 0;
+
+err:
+	kfree(gd);
+	return -ENOMEM;
+}
+
+static int cpufreq_sched_stop(struct cpufreq_policy *policy)
+{
+	struct gov_data *gd = policy->governor_data;
+
+	if (cpufreq_driver_might_sleep()) {
+		kthread_stop(gd->task);
+	}
+
+	policy->governor_data = NULL;
+
+	/* FIXME replace with devm counterparts? */
+	kfree(gd);
+	return 0;
+}
+
+static int cpufreq_sched_setup(struct cpufreq_policy *policy, unsigned int event)
+{
+	switch (event) {
+		case CPUFREQ_GOV_START:
+			/* Start managing the frequency */
+			return cpufreq_sched_start(policy);
+
+		case CPUFREQ_GOV_STOP:
+			return cpufreq_sched_stop(policy);
+
+		case CPUFREQ_GOV_LIMITS:	/* unused */
+		case CPUFREQ_GOV_POLICY_INIT:	/* unused */
+		case CPUFREQ_GOV_POLICY_EXIT:	/* unused */
+			break;
+	}
+	return 0;
+}
+
+#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED
+static
+#endif
+struct cpufreq_governor cpufreq_gov_sched = {
+	.name			= "sched",
+	.governor		= cpufreq_sched_setup,
+	.owner			= THIS_MODULE,
+};
+
+static int __init cpufreq_sched_init(void)
+{
+	return cpufreq_register_governor(&cpufreq_gov_sched);
+}
+
+static void __exit cpufreq_sched_exit(void)
+{
+	cpufreq_unregister_governor(&cpufreq_gov_sched);
+}
+
+/* Try to make this the default governor */
+fs_initcall(cpufreq_sched_init);
+
+MODULE_LICENSE("GPL v2");
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e0e1299..25a1b85 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1396,6 +1396,13 @@ unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
 }
 #endif
 
+#ifdef CONFIG_CPU_FREQ_GOV_SCHED
+void cpufreq_sched_set_cap(int cpu, unsigned long util);
+#else
+static inline void cpufreq_sched_set_cap(int cpu, unsigned long util)
+{ }
+#endif
+
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
 	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
@@ -1404,6 +1411,7 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 #else
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
 static inline void sched_avg_update(struct rq *rq) { }
+static inline void gov_cfs_update_cpu(int cpu) {}
 #endif
 
 extern void start_bandwidth_timer(struct hrtimer *period_timer, ktime_t period);
-- 
1.9.1



* [PATCH v3 4/4] [RFC] sched: cfs: cpu frequency scaling policy
  2015-06-26 23:53 [PATCH v3 0/4] scheduler-driven cpu frequency selection Michael Turquette
                   ` (2 preceding siblings ...)
  2015-06-26 23:53 ` [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection Michael Turquette
@ 2015-06-26 23:53 ` Michael Turquette
  2015-07-03  9:57 ` [PATCH v3 0/4] scheduler-driven cpu frequency selection Vincent Guittot
  4 siblings, 0 replies; 14+ messages in thread
From: Michael Turquette @ 2015-06-26 23:53 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, preeti, Morten.Rasmussen, riel, efault,
	nicolas.pitre, daniel.lezcano, dietmar.eggemann, vincent.guittot,
	amit.kucheria, juri.lelli, rjw, viresh.kumar, ashwin.chaugule,
	alex.shi, linux-pm, abelvesa, pebolle, Michael Turquette

From: Michael Turquette <mturquette@baylibre.com>

Implements a very simple policy to scale cpu frequency as a function of
cfs utilization. This policy is a placeholder until something better
comes along. Its purpose is to illustrate how to use the
cpufreq_sched_set_cap API and to allow interested parties to hack on
this stuff.

Signed-off-by: Michael Turquette <mturquette@baylibre.com>
---
Changes in v3:
Split out into separate patch
Capacity calculation moved from cpufreq governor to cfs
Removed use of static key. Replaced with Kconfig option

 kernel/sched/fair.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 46855d0..5ccc384 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4217,6 +4217,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
+	unsigned long utilization, capacity;
 
 	for_each_sched_entity(se) {
 		if (se->on_rq)
@@ -4252,6 +4253,19 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_rq_runnable_avg(rq, rq->nr_running);
 		add_nr_running(rq, 1);
 	}
+
+#ifdef CONFIG_CPU_FREQ_GOV_SCHED
+	/* add 25% margin to current utilization */
+	utilization = rq->cfs.utilization_load_avg;
+	capacity = utilization + (utilization >> 2);
+
+	/* handle rounding errors */
+	capacity = (capacity > SCHED_LOAD_SCALE) ? SCHED_LOAD_SCALE :
+		capacity;
+
+	cpufreq_sched_set_cap(cpu_of(rq), capacity);
+#endif
+
 	hrtick_update(rq);
 }
 
@@ -4267,6 +4281,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
 	int task_sleep = flags & DEQUEUE_SLEEP;
+	unsigned long utilization, capacity;
 
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
@@ -4313,6 +4328,19 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		sub_nr_running(rq, 1);
 		update_rq_runnable_avg(rq, 1);
 	}
+
+#ifdef CONFIG_CPU_FREQ_GOV_SCHED
+	/* add 25% margin to current utilization */
+	utilization = rq->cfs.utilization_load_avg;
+	capacity = utilization + (utilization >> 2);
+
+	/* handle rounding errors */
+	capacity = (capacity > SCHED_LOAD_SCALE) ? SCHED_LOAD_SCALE :
+		capacity;
+
+	cpufreq_sched_set_cap(cpu_of(rq), capacity);
+#endif
+
 	hrtick_update(rq);
 }
 
@@ -7806,6 +7834,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &curr->se;
+	unsigned long utilization, capacity;
 
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
@@ -7816,6 +7845,18 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 		task_tick_numa(rq, curr);
 
 	update_rq_runnable_avg(rq, 1);
+
+#ifdef CONFIG_CPU_FREQ_GOV_SCHED
+	/* add 25% margin to current utilization */
+	utilization = rq->cfs.utilization_load_avg;
+	capacity = utilization + (utilization >> 2);
+
+	/* handle rounding errors */
+	capacity = (capacity > SCHED_LOAD_SCALE) ? SCHED_LOAD_SCALE :
+		capacity;
+
+	cpufreq_sched_set_cap(cpu_of(rq), capacity);
+#endif
 }
 
 /*
-- 
1.9.1



* Re: [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection
  2015-06-26 23:53 ` [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection Michael Turquette
@ 2015-06-27  0:47   ` Felipe Balbi
       [not found]     ` <20150629164943.9112.4253@quantum>
  2015-07-06 20:06   ` Dietmar Eggemann
  1 sibling, 1 reply; 14+ messages in thread
From: Felipe Balbi @ 2015-06-27  0:47 UTC (permalink / raw)
  To: Michael Turquette
  Cc: peterz, mingo, linux-kernel, preeti, Morten.Rasmussen, riel,
	efault, nicolas.pitre, daniel.lezcano, dietmar.eggemann,
	vincent.guittot, amit.kucheria, juri.lelli, rjw, viresh.kumar,
	ashwin.chaugule, alex.shi, linux-pm, abelvesa, pebolle


Hi,

On Fri, Jun 26, 2015 at 04:53:43PM -0700, Michael Turquette wrote:
> diff --git a/kernel/sched/cpufreq_sched.c b/kernel/sched/cpufreq_sched.c
> new file mode 100644
> index 0000000..5020f24
> --- /dev/null
> +++ b/kernel/sched/cpufreq_sched.c
> @@ -0,0 +1,308 @@
> +/*
> + *  Copyright (C)  2015 Michael Turquette <mturquette@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/cpufreq.h>
> +#include <linux/module.h>
> +#include <linux/kthread.h>
> +#include <linux/percpu.h>
> +#include <linux/irq_work.h>
> +
> +#include "sched.h"
> +
> +#define THROTTLE_NSEC		50000000 /* 50ms default */
> +
> +static DEFINE_PER_CPU(unsigned long, pcpu_capacity);
> +static DEFINE_PER_CPU(struct cpufreq_policy *, pcpu_policy);
> +
> +/**
> + * gov_data - per-policy data internal to the governor
> + * @throttle: next throttling period expiry. Derived from throttle_nsec
> + * @throttle_nsec: throttle period length in nanoseconds
> + * @task: worker thread for dvfs transition that may block/sleep
> + * @irq_work: callback used to wake up worker thread
> + * @freq: new frequency stored in *_sched_update_cpu and used in *_sched_thread
> + *
> + * struct gov_data is the per-policy cpufreq_sched-specific data structure. A
> + * per-policy instance of it is created when the cpufreq_sched governor receives
> + * the CPUFREQ_GOV_START condition and a pointer to it exists in the gov_data
> + * member of struct cpufreq_policy.
> + *
> + * Readers of this data must call down_read(policy->rwsem). Writers must
> + * call down_write(policy->rwsem).
> + */
> +struct gov_data {
> +	ktime_t throttle;
> +	unsigned int throttle_nsec;
> +	struct task_struct *task;
> +	struct irq_work irq_work;
> +	struct cpufreq_policy *policy;
> +	unsigned int freq;
> +};
> +
> +static void cpufreq_sched_try_driver_target(struct cpufreq_policy *policy, unsigned int freq)
> +{
> +	struct gov_data *gd = policy->governor_data;
> +
> +	/* avoid race with cpufreq_sched_stop */
> +	if (!down_write_trylock(&policy->rwsem))
> +		return;
> +
> +	__cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
> +
> +	gd->throttle = ktime_add_ns(ktime_get(), gd->throttle_nsec);
> +	up_write(&policy->rwsem);
> +}
> +
> +/*
> + * we pass in struct cpufreq_policy. This is safe because changing out the
> + * policy requires a call to __cpufreq_governor(policy, CPUFREQ_GOV_STOP),
> + * which tears down all of the data structures and __cpufreq_governor(policy,
> + * CPUFREQ_GOV_START) will do a full rebuild, including this kthread with the
> + * new policy pointer
> + */
> +static int cpufreq_sched_thread(void *data)
> +{
> +	struct sched_param param;
> +	struct cpufreq_policy *policy;
> +	struct gov_data *gd;
> +	int ret;
> +
> +	policy = (struct cpufreq_policy *) data;

unnecessary cast.

> +	if (!policy) {

Is this even possible ? I'd just let it oops since it would be a really
odd case.

> +		pr_warn("%s: missing policy\n", __func__);
> +		do_exit(-EINVAL);
> +	}
> +
> +	gd = policy->governor_data;
> +	if (!gd) {

likewise.

> +		pr_warn("%s: missing governor data\n", __func__);
> +		do_exit(-EINVAL);
> +	}
> +
> +	param.sched_priority = 50;
> +	ret = sched_setscheduler_nocheck(gd->task, SCHED_FIFO, &param);
> +	if (ret) {
> +		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
> +		do_exit(-EINVAL);
> +	} else {

else is unnecessary here, but no strong feelings.

> +		pr_debug("%s: kthread (%d) set to SCHED_FIFO\n",
> +				__func__, gd->task->pid);
> +	}
> +
> +	ret = set_cpus_allowed_ptr(gd->task, policy->related_cpus);
> +	if (ret) {
> +		pr_warn("%s: failed to set allowed ptr\n", __func__);
> +		do_exit(-EINVAL);
> +	}
> +
> +	/* main loop of the per-policy kthread */
> +	do {
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		schedule();
> +		if (kthread_should_stop())
> +			break;
> +
> +		cpufreq_sched_try_driver_target(policy, gd->freq);
> +	} while (!kthread_should_stop());

looks like this would be simpler with a plain while() instead of do
{} while:

	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule();
		cpufreq_sched_try_driver_target(policy, gd->freq);
	}

> +	do_exit(0);
> +}
> +
> +static void cpufreq_sched_irq_work(struct irq_work *irq_work)
> +{
> +	struct gov_data *gd;
> +
> +	gd = container_of(irq_work, struct gov_data, irq_work);

if irq_work is the first member in struct gov_data, this gets optimized
to a cast.

> +	if (!gd) {

unnecessary parens.

> +		return;
> +	}
> +
> +	wake_up_process(gd->task);
> +}
> +
> +/**
> + * cpufreq_sched_set_capacity - interface to scheduler for changing capacity values
> + * @cpu: cpu whose capacity utilization has recently changed
> + * @capacity: the new capacity requested by cpu
> + *
> + * cpufreq_sched_set_capacity is an interface exposed to the scheduler so
> + * that the scheduler may inform the governor of updates to capacity
> + * utilization and make changes to cpu frequency. Currently this interface is
> + * designed around PELT values in CFS. It can be expanded to other scheduling
> + * classes in the future if needed.
> + *
> + * cpufreq_sched_set_capacity raises an IPI. The irq_work handler for that IPI
> + * wakes up the thread that does the actual work, cpufreq_sched_thread.
> + *
> + * This function bails out early if either condition is true:
> + * 1) this cpu did not set the new maximum capacity for its frequency domain
> + * 2) no change in cpu frequency is necessary to meet the new capacity request
> + */
> +void cpufreq_sched_set_cap(int cpu, unsigned long capacity)
> +{
> +	unsigned int freq_new, cpu_tmp;
> +	struct cpufreq_policy *policy;
> +	struct gov_data *gd;
> +	unsigned long capacity_max = 0;
> +
> +	/* update per-cpu capacity request */
> +	__this_cpu_write(pcpu_capacity, capacity);
> +
> +	policy = cpufreq_cpu_get(cpu);
> +	if (IS_ERR_OR_NULL(policy)) {

can this really be ERR_PTR ? Also, unnecessary parens

> +		return;
> +	}
> +
> +	if (!policy->governor_data)
> +		goto out;
> +
> +	gd = policy->governor_data;
> +
> +	/* bail early if we are throttled */
> +	if (ktime_before(ktime_get(), gd->throttle))
> +		goto out;
> +
> +	/* find max capacity requested by cpus in this policy */
> +	for_each_cpu(cpu_tmp, policy->cpus)
> +		capacity_max = max(capacity_max, per_cpu(pcpu_capacity, cpu_tmp));
> +
> +	/*
> +	 * We only change frequency if this cpu's capacity request represents a
> +	 * new max. If another cpu has requested a capacity greater than the
> +	 * previous max then we rely on that cpu to hit this code path and make
> +	 * the change. IOW, the cpu with the new max capacity is responsible
> +	 * for setting the new capacity/frequency.
> +	 *
> +	 * If this cpu is not the new maximum then bail
> +	 */
> +	if (capacity_max > capacity)
> +		goto out;
> +
> +	/* Convert the new maximum capacity request into a cpu frequency */
> +	freq_new = capacity * policy->max >> SCHED_CAPACITY_SHIFT;
> +
> +	/* No change in frequency? Bail and return current capacity. */
> +	if (freq_new == policy->cur)
> +		goto out;
> +
> +	/* store the new frequency and perform the transition */
> +	gd->freq = freq_new;
> +
> +	if (cpufreq_driver_might_sleep())
> +		irq_work_queue_on(&gd->irq_work, cpu);
> +	else
> +		cpufreq_sched_try_driver_target(policy, freq_new);
> +
> +out:
> +	cpufreq_cpu_put(policy);
> +	return;

unnecessary return

> +}
> +
> +static int cpufreq_sched_start(struct cpufreq_policy *policy)
> +{
> +	struct gov_data *gd;
> +	int cpu;
> +
> +	/* prepare per-policy private data */
> +	gd = kzalloc(sizeof(*gd), GFP_KERNEL);
> +	if (!gd) {
> +		pr_debug("%s: failed to allocate private data\n", __func__);

unnecessary OOM message, that will render curly braces unnecessary too.

> +		return -ENOMEM;
> +	}
> +
> +	/* initialize per-cpu data */
> +	for_each_cpu(cpu, policy->cpus) {
> +		per_cpu(pcpu_capacity, cpu) = 0;
> +		per_cpu(pcpu_policy, cpu) = policy;
> +	}
> +
> +	/*
> +	 * Don't ask for freq changes at an higher rate than what

s/an higher/a higher

> +	 * the driver advertises as transition latency.
> +	 */
> +	gd->throttle_nsec = policy->cpuinfo.transition_latency ?
> +			    policy->cpuinfo.transition_latency :
> +			    THROTTLE_NSEC;
> +	pr_debug("%s: throttle threshold = %u [ns]\n",
> +		  __func__, gd->throttle_nsec);
> +
> +	if (cpufreq_driver_might_sleep()) {
> +		/* init per-policy kthread */
> +		gd->task = kthread_run(cpufreq_sched_thread, policy, "kcpufreq_sched_task");
> +		if (IS_ERR_OR_NULL(gd->task)) {

kthread_run() doesn't return NULL.

> +			pr_err("%s: failed to create kcpufreq_sched_task thread\n", __func__);
> +			goto err;
> +		}
> +		init_irq_work(&gd->irq_work, cpufreq_sched_irq_work);
> +	}
> +
> +	policy->governor_data = gd;
> +	gd->policy = policy;
> +	return 0;
> +
> +err:
> +	kfree(gd);
> +	return -ENOMEM;

why don't you pass along errors returned by any other function you call ?

> +}
> +
> +static int cpufreq_sched_stop(struct cpufreq_policy *policy)
> +{
> +	struct gov_data *gd = policy->governor_data;
> +
> +	if (cpufreq_driver_might_sleep()) {

unnecessary curly braces.

> +		kthread_stop(gd->task);

should you switch back to some default OPP when this is removed ? Some
SoCs can't run at certain OPPs forever (thermal limitations, or whatever
else), might be good to switch to something considered safe.

> +	}
> +
> +	policy->governor_data = NULL;
> +
> +	/* FIXME replace with devm counterparts? */
> +	kfree(gd);
> +	return 0;
> +}
> +
> +static int cpufreq_sched_setup(struct cpufreq_policy *policy, unsigned int event)
> +{
> +	switch (event) {
> +		case CPUFREQ_GOV_START:
> +			/* Start managing the frequency */
> +			return cpufreq_sched_start(policy);
> +
> +		case CPUFREQ_GOV_STOP:
> +			return cpufreq_sched_stop(policy);
> +
> +		case CPUFREQ_GOV_LIMITS:	/* unused */
> +		case CPUFREQ_GOV_POLICY_INIT:	/* unused */
> +		case CPUFREQ_GOV_POLICY_EXIT:	/* unused */
> +			break;

indentation

> +	}
> +	return 0;
> +}
> +
> +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED
> +static
> +#endif
> +struct cpufreq_governor cpufreq_gov_sched = {
> +	.name			= "sched",
> +	.governor		= cpufreq_sched_setup,
> +	.owner			= THIS_MODULE,
> +};
> +
> +static int __init cpufreq_sched_init(void)
> +{
> +	return cpufreq_register_governor(&cpufreq_gov_sched);
> +}
> +
> +static void __exit cpufreq_sched_exit(void)
> +{
> +	cpufreq_unregister_governor(&cpufreq_gov_sched);
> +}
> +
> +/* Try to make this the default governor */
> +fs_initcall(cpufreq_sched_init);

why fs_initcall() ? Why can't this be in module_init() ?

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep
  2015-06-26 23:53 ` [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep Michael Turquette
@ 2015-06-27  0:48   ` Felipe Balbi
       [not found]     ` <20150629162621.9112.4040@quantum>
  0 siblings, 1 reply; 14+ messages in thread
From: Felipe Balbi @ 2015-06-27  0:48 UTC (permalink / raw)
  To: Michael Turquette
  Cc: peterz, mingo, linux-kernel, preeti, Morten.Rasmussen, riel,
	efault, nicolas.pitre, daniel.lezcano, dietmar.eggemann,
	vincent.guittot, amit.kucheria, juri.lelli, rjw, viresh.kumar,
	ashwin.chaugule, alex.shi, linux-pm, abelvesa, pebolle,
	Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

Hi,

On Fri, Jun 26, 2015 at 04:53:42PM -0700, Michael Turquette wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 28e59a4..e5296c0 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -112,6 +112,12 @@ bool have_governor_per_policy(void)
>  }
>  EXPORT_SYMBOL_GPL(have_governor_per_policy);
>  
> +bool cpufreq_driver_might_sleep(void)
> +{
> +	return !(cpufreq_driver->flags & CPUFREQ_DRIVER_WILL_NOT_SLEEP);
> +}
> +EXPORT_SYMBOL_GPL(cpufreq_driver_might_sleep);
> +
>  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
>  {
>  	if (have_governor_per_policy())
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 2ee4888..1f2c9a1 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -157,6 +157,7 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
>  int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
>  int cpufreq_update_policy(unsigned int cpu);
>  bool have_governor_per_policy(void);
> +bool cpufreq_driver_might_sleep(void);
>  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
>  #else
>  static inline unsigned int cpufreq_get(unsigned int cpu)
> @@ -314,6 +315,14 @@ struct cpufreq_driver {
>   */
>  #define CPUFREQ_NEED_INITIAL_FREQ_CHECK	(1 << 5)
>  
> +/*
> + * Set by drivers that will never block or sleep during their frequency
> + * transition. Used to indicate when it is safe to call cpufreq_driver_target
> + * from non-interruptable context. Drivers must opt-in to this flag, as the
> + * safe default is that they might sleep.
> + */
> +#define CPUFREQ_DRIVER_WILL_NOT_SLEEP	(1 << 6)

don't you need to update current drivers and pass this flag where
necessary ?

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep
       [not found]     ` <20150629162621.9112.4040@quantum>
@ 2015-06-29 16:39       ` Felipe Balbi
  2015-06-29 16:56         ` Michael Turquette
  0 siblings, 1 reply; 14+ messages in thread
From: Felipe Balbi @ 2015-06-29 16:39 UTC (permalink / raw)
  To: Michael Turquette
  Cc: balbi, peterz, mingo, linux-kernel, preeti, Morten.Rasmussen,
	riel, efault, nicolas.pitre, daniel.lezcano, dietmar.eggemann,
	vincent.guittot, amit.kucheria, juri.lelli, rjw, viresh.kumar,
	ashwin.chaugule, alex.shi, linux-pm, abelvesa, pebolle,
	Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 2989 bytes --]

Hi,

On Mon, Jun 29, 2015 at 09:26:21AM -0700, Michael Turquette wrote:
> Quoting Felipe Balbi (2015-06-26 17:48:31)
> > Hi,
> > 
> > On Fri, Jun 26, 2015 at 04:53:42PM -0700, Michael Turquette wrote:
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index 28e59a4..e5296c0 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -112,6 +112,12 @@ bool have_governor_per_policy(void)
> > >  }
> > >  EXPORT_SYMBOL_GPL(have_governor_per_policy);
> > >  
> > > +bool cpufreq_driver_might_sleep(void)
> > > +{
> > > +     return !(cpufreq_driver->flags & CPUFREQ_DRIVER_WILL_NOT_SLEEP);
> > > +}
> > > +EXPORT_SYMBOL_GPL(cpufreq_driver_might_sleep);
> > > +
> > >  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
> > >  {
> > >       if (have_governor_per_policy())
> > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > index 2ee4888..1f2c9a1 100644
> > > --- a/include/linux/cpufreq.h
> > > +++ b/include/linux/cpufreq.h
> > > @@ -157,6 +157,7 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
> > >  int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
> > >  int cpufreq_update_policy(unsigned int cpu);
> > >  bool have_governor_per_policy(void);
> > > +bool cpufreq_driver_might_sleep(void);
> > >  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
> > >  #else
> > >  static inline unsigned int cpufreq_get(unsigned int cpu)
> > > @@ -314,6 +315,14 @@ struct cpufreq_driver {
> > >   */
> > >  #define CPUFREQ_NEED_INITIAL_FREQ_CHECK      (1 << 5)
> > >  
> > > +/*
> > > + * Set by drivers that will never block or sleep during their frequency
> > > + * transition. Used to indicate when it is safe to call cpufreq_driver_target
> > > + * from non-interruptable context. Drivers must opt-in to this flag, as the
> > > + * safe default is that they might sleep.
> > > + */
> > > +#define CPUFREQ_DRIVER_WILL_NOT_SLEEP        (1 << 6)
> > 
> > don't you need to update current drivers and pass this flag where
> > necessary ?
> 
> Felipe,
> 
> Thanks for the review.
> 
> Setting the flag can be done, but it is an opt-in feature. First, none
> of the legacy cpufreq governors would actually make use of this flag.
> Everything they do is in process context. The first potential user of it
> is in patch #3.
> 
> Secondly, the governor in patch #3 will work without this flag set for a
> cpufreq driver. It will just defer the dvfs transition to a kthread
> instead of performing it in the hot path of the scheduler.
> 
> Finally, the only hardware I am aware of that can make use of this flag
> is Intel hardware. I know nothing about it and am happy for someone more
> knowledgeable than myself submit a patch enabling this flag for that
> architecture.

the follow-up question would be: then why introduce the flag at all ?
:-p

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection
       [not found]     ` <20150629164943.9112.4253@quantum>
@ 2015-06-29 16:55       ` Felipe Balbi
  0 siblings, 0 replies; 14+ messages in thread
From: Felipe Balbi @ 2015-06-29 16:55 UTC (permalink / raw)
  To: Michael Turquette
  Cc: balbi, peterz, mingo, linux-kernel, preeti, Morten.Rasmussen,
	riel, efault, nicolas.pitre, daniel.lezcano, dietmar.eggemann,
	vincent.guittot, amit.kucheria, juri.lelli, rjw, viresh.kumar,
	ashwin.chaugule, alex.shi, linux-pm, abelvesa, pebolle

[-- Attachment #1: Type: text/plain, Size: 1090 bytes --]

Hi,

On Mon, Jun 29, 2015 at 09:49:43AM -0700, Michael Turquette wrote:

<snip>

> > > +static int cpufreq_sched_stop(struct cpufreq_policy *policy)
> > > +{
> > > +     struct gov_data *gd = policy->governor_data;
> > > +
> > > +     if (cpufreq_driver_might_sleep()) {
> > 
> > unnecessary curly braces.
> > 
> > > +             kthread_stop(gd->task);
> > 
> 
> Thanks for the review. I'll take into account everything above.
> 
> > should you switch back to some default OPP when this is removed ? Some
> > SoCs can't run at certain OPPs forever (thermal limitations, or whatever
> > else), might be good to switch to something considered safe.
> 
> The above only happens when we unload the module or switch governors,
> and every governor has this characteristic.
> 
> I do not think that open-coding a return to some default opp in every
> governor is a good solution. This sounds like something the cpufreq core
> should take care of.

indeed.

> Also, how do we know which opp is safe?

no idea, that needs to be described somehow.

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep
  2015-06-29 16:39       ` Felipe Balbi
@ 2015-06-29 16:56         ` Michael Turquette
  2015-06-29 17:03           ` Felipe Balbi
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Turquette @ 2015-06-29 16:56 UTC (permalink / raw)
  To: balbi
  Cc: peterz, mingo, linux-kernel, preeti, Morten Rasmussen, riel,
	efault, nicolas.pitre, daniel.lezcano, dietmar.eggemann,
	vincent.guittot, Amit Kucheria, Juri Lelli, rjw, Viresh Kumar,
	ashwin.chaugule, alex.shi, linux-pm, abelvesa, pebolle,
	Rafael J. Wysocki

On Mon, Jun 29, 2015 at 9:39 AM, Felipe Balbi <balbi@ti.com> wrote:
>
> Hi,
>
> On Mon, Jun 29, 2015 at 09:26:21AM -0700, Michael Turquette wrote:
> > Quoting Felipe Balbi (2015-06-26 17:48:31)
> > > Hi,
> > >
> > > On Fri, Jun 26, 2015 at 04:53:42PM -0700, Michael Turquette wrote:
> > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > > index 28e59a4..e5296c0 100644
> > > > --- a/drivers/cpufreq/cpufreq.c
> > > > +++ b/drivers/cpufreq/cpufreq.c
> > > > @@ -112,6 +112,12 @@ bool have_governor_per_policy(void)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(have_governor_per_policy);
> > > >
> > > > +bool cpufreq_driver_might_sleep(void)
> > > > +{
> > > > +     return !(cpufreq_driver->flags & CPUFREQ_DRIVER_WILL_NOT_SLEEP);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(cpufreq_driver_might_sleep);
> > > > +
> > > >  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
> > > >  {
> > > >       if (have_governor_per_policy())
> > > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > > index 2ee4888..1f2c9a1 100644
> > > > --- a/include/linux/cpufreq.h
> > > > +++ b/include/linux/cpufreq.h
> > > > @@ -157,6 +157,7 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
> > > >  int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
> > > >  int cpufreq_update_policy(unsigned int cpu);
> > > >  bool have_governor_per_policy(void);
> > > > +bool cpufreq_driver_might_sleep(void);
> > > >  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
> > > >  #else
> > > >  static inline unsigned int cpufreq_get(unsigned int cpu)
> > > > @@ -314,6 +315,14 @@ struct cpufreq_driver {
> > > >   */
> > > >  #define CPUFREQ_NEED_INITIAL_FREQ_CHECK      (1 << 5)
> > > >
> > > > +/*
> > > > + * Set by drivers that will never block or sleep during their frequency
> > > > + * transition. Used to indicate when it is safe to call cpufreq_driver_target
> > > > + * from non-interruptable context. Drivers must opt-in to this flag, as the
> > > > + * safe default is that they might sleep.
> > > > + */
> > > > +#define CPUFREQ_DRIVER_WILL_NOT_SLEEP        (1 << 6)
> > >
> > > don't you need to update current drivers and pass this flag where
> > > necessary ?
> >
> > Felipe,
> >
> > Thanks for the review.
> >
> > Setting the flag can be done, but it is an opt-in feature. First, none
> > of the legacy cpufreq governors would actually make use of this flag.
> > Everything they do is in process context. The first potential user of it
> > is in patch #3.
> >
> > Secondly, the governor in patch #3 will work without this flag set for a
> > cpufreq driver. It will just defer the dvfs transition to a kthread
> > instead of performing it in the hot path of the scheduler.
> >
> > Finally, the only hardware I am aware of that can make use of this flag
> > is Intel hardware. I know nothing about it and am happy for someone more
> > knowledgeable than myself submit a patch enabling this flag for that
> > architecture.
>
> the follow-up question would be: then why introduce the flag at all ?
> :-p

I included it at Rafael's request:

http://lkml.kernel.org/r/<49407954.UBSF2FlX46@vostro.rjw.lan>

Regards,
Mike

>
> --
> balbi




-- 
Michael Turquette
CEO
BayLibre - At the Heart of Embedded Linux
http://baylibre.com/


* Re: [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep
  2015-06-29 16:56         ` Michael Turquette
@ 2015-06-29 17:03           ` Felipe Balbi
  2015-06-29 17:10             ` Michael Turquette
  0 siblings, 1 reply; 14+ messages in thread
From: Felipe Balbi @ 2015-06-29 17:03 UTC (permalink / raw)
  To: Michael Turquette
  Cc: balbi, peterz, mingo, linux-kernel, preeti, Morten Rasmussen,
	riel, efault, nicolas.pitre, daniel.lezcano, dietmar.eggemann,
	vincent.guittot, Amit Kucheria, Juri Lelli, rjw, Viresh Kumar,
	ashwin.chaugule, alex.shi, linux-pm, abelvesa, pebolle,
	Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 3122 bytes --]

Hi,

On Mon, Jun 29, 2015 at 09:56:55AM -0700, Michael Turquette wrote:
> > > > > @@ -112,6 +112,12 @@ bool have_governor_per_policy(void)
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(have_governor_per_policy);
> > > > >
> > > > > +bool cpufreq_driver_might_sleep(void)
> > > > > +{
> > > > > +     return !(cpufreq_driver->flags & CPUFREQ_DRIVER_WILL_NOT_SLEEP);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(cpufreq_driver_might_sleep);
> > > > > +
> > > > >  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
> > > > >  {
> > > > >       if (have_governor_per_policy())
> > > > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > > > index 2ee4888..1f2c9a1 100644
> > > > > --- a/include/linux/cpufreq.h
> > > > > +++ b/include/linux/cpufreq.h
> > > > > @@ -157,6 +157,7 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
> > > > >  int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
> > > > >  int cpufreq_update_policy(unsigned int cpu);
> > > > >  bool have_governor_per_policy(void);
> > > > > +bool cpufreq_driver_might_sleep(void);
> > > > >  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
> > > > >  #else
> > > > >  static inline unsigned int cpufreq_get(unsigned int cpu)
> > > > > @@ -314,6 +315,14 @@ struct cpufreq_driver {
> > > > >   */
> > > > >  #define CPUFREQ_NEED_INITIAL_FREQ_CHECK      (1 << 5)
> > > > >
> > > > > +/*
> > > > > + * Set by drivers that will never block or sleep during their frequency
> > > > > + * transition. Used to indicate when it is safe to call cpufreq_driver_target
> > > > > + * from non-interruptable context. Drivers must opt-in to this flag, as the
> > > > > + * safe default is that they might sleep.
> > > > > + */
> > > > > +#define CPUFREQ_DRIVER_WILL_NOT_SLEEP        (1 << 6)
> > > >
> > > > don't you need to update current drivers and pass this flag where
> > > > necessary ?
> > >
> > > Felipe,
> > >
> > > Thanks for the review.
> > >
> > > Setting the flag can be done, but it is an opt-in feature. First, none
> > > of the legacy cpufreq governors would actually make use of this flag.
> > > Everything they do is in process context. The first potential user of it
> > > is in patch #3.
> > >
> > > Secondly, the governor in patch #3 will work without this flag set for a
> > > cpufreq driver. It will just defer the dvfs transition to a kthread
> > > instead of performing it in the hot path of the scheduler.
> > >
> > > Finally, the only hardware I am aware of that can make use of this flag
> > > is Intel hardware. I know nothing about it and am happy for someone more
> > > knowledgeable than myself submit a patch enabling this flag for that
> > > architecture.
> >
> > the follow-up question would be: then why introduce the flag at all ?
> > :-p
> 
> I included it at Rafael's request:
> 
> http://lkml.kernel.org/r/<49407954.UBSF2FlX46@vostro.rjw.lan>

Fair enough, just think it might be an unused code path for a while,
since the flag isn't enabled anywhere :-s

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep
  2015-06-29 17:03           ` Felipe Balbi
@ 2015-06-29 17:10             ` Michael Turquette
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Turquette @ 2015-06-29 17:10 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: peterz, mingo, linux-kernel, preeti, Morten Rasmussen, riel,
	efault, Nicolas Pitre, Daniel Lezcano, dietmar.eggemann,
	Vincent Guittot, Amit Kucheria, Juri Lelli, rjw, Viresh Kumar,
	Ashwin Chaugule, Alex Shi, linux-pm, Abel Vesa, pebolle,
	Rafael J. Wysocki

On Mon, Jun 29, 2015 at 10:03 AM, Felipe Balbi <balbi@ti.com> wrote:
> Hi,
>
> On Mon, Jun 29, 2015 at 09:56:55AM -0700, Michael Turquette wrote:
>> > > > > @@ -112,6 +112,12 @@ bool have_governor_per_policy(void)
>> > > > >  }
>> > > > >  EXPORT_SYMBOL_GPL(have_governor_per_policy);
>> > > > >
>> > > > > +bool cpufreq_driver_might_sleep(void)
>> > > > > +{
>> > > > > +     return !(cpufreq_driver->flags & CPUFREQ_DRIVER_WILL_NOT_SLEEP);
>> > > > > +}
>> > > > > +EXPORT_SYMBOL_GPL(cpufreq_driver_might_sleep);
>> > > > > +
>> > > > >  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
>> > > > >  {
>> > > > >       if (have_governor_per_policy())
>> > > > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
>> > > > > index 2ee4888..1f2c9a1 100644
>> > > > > --- a/include/linux/cpufreq.h
>> > > > > +++ b/include/linux/cpufreq.h
>> > > > > @@ -157,6 +157,7 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
>> > > > >  int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
>> > > > >  int cpufreq_update_policy(unsigned int cpu);
>> > > > >  bool have_governor_per_policy(void);
>> > > > > +bool cpufreq_driver_might_sleep(void);
>> > > > >  struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
>> > > > >  #else
>> > > > >  static inline unsigned int cpufreq_get(unsigned int cpu)
>> > > > > @@ -314,6 +315,14 @@ struct cpufreq_driver {
>> > > > >   */
>> > > > >  #define CPUFREQ_NEED_INITIAL_FREQ_CHECK      (1 << 5)
>> > > > >
>> > > > > +/*
>> > > > > + * Set by drivers that will never block or sleep during their frequency
>> > > > > + * transition. Used to indicate when it is safe to call cpufreq_driver_target
>> > > > > + * from non-interruptable context. Drivers must opt-in to this flag, as the
>> > > > > + * safe default is that they might sleep.
>> > > > > + */
>> > > > > +#define CPUFREQ_DRIVER_WILL_NOT_SLEEP        (1 << 6)
>> > > >
>> > > > don't you need to update current drivers and pass this flag where
>> > > > necessary ?
>> > >
>> > > Felipe,
>> > >
>> > > Thanks for the review.
>> > >
>> > > Setting the flag can be done, but it is an opt-in feature. First, none
>> > > of the legacy cpufreq governors would actually make use of this flag.
>> > > Everything they do is in process context. The first potential user of it
>> > > is in patch #3.
>> > >
>> > > Secondly, the governor in patch #3 will work without this flag set for a
>> > > cpufreq driver. It will just defer the dvfs transition to a kthread
>> > > instead of performing it in the hot path of the scheduler.
>> > >
>> > > Finally, the only hardware I am aware of that can make use of this flag
>> > > is Intel hardware. I know nothing about it and am happy for someone more
>> > > knowledgeable than myself submit a patch enabling this flag for that
>> > > architecture.
>> >
>> > the follow-up question would be: then why introduce the flag at all ?
>> > :-p
>>
>> I included it at Rafael's request:
>>
>> http://lkml.kernel.org/r/<49407954.UBSF2FlX46@vostro.rjw.lan>
>
> Fair enough, just think it might be an unused code path for a while,
> since the flag isn't enabled anywhere :-s

The point of the series isn't to provide the final word on
scheduler-driven dvfs. The policy needs to be experimented with (see
patch #4) and part of that experimentation is to test on lots of
hardware. Having the flag ready to go will help those that are
experimenting on Intel hardware, or any other platform that has an
async/fast interface to kick off cpu dvfs transitions.

Regards,
Mike

>
> --
> balbi



-- 
Michael Turquette
CEO
BayLibre - At the Heart of Embedded Linux
http://baylibre.com/


* Re: [PATCH v3 0/4] scheduler-driven cpu frequency selection
  2015-06-26 23:53 [PATCH v3 0/4] scheduler-driven cpu frequency selection Michael Turquette
                   ` (3 preceding siblings ...)
  2015-06-26 23:53 ` [PATCH v3 4/4] [RFC] sched: cfs: cpu frequency scaling policy Michael Turquette
@ 2015-07-03  9:57 ` Vincent Guittot
  4 siblings, 0 replies; 14+ messages in thread
From: Vincent Guittot @ 2015-07-03  9:57 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Preeti U Murthy,
	Morten Rasmussen, Rik van Riel, Mike Galbraith, Nicolas Pitre,
	Daniel Lezcano, Dietmar Eggemann, Amit Kucheria, Juri Lelli,
	rjw@rjwysocki.net, Viresh Kumar, ashwin.chaugule, Alex Shi,
	linux-pm@vger.kernel.org, abelvesa, pebolle, Michael Turquette

[-- Attachment #1: Type: text/plain, Size: 4411 bytes --]

Hi Mike,

I have tried to evaluate the performance and the power consumption of
the various policies that we have in the kernel to manage the DVFS of
cpus, including your sched-dvfs proposal. For this purpose, I have used
rt-app and a script that are available here:
https://git.linaro.org/power/rt-app.git/shortlog/refs/heads/master.
The script is in the directory:
doc/examples/cpufreq_governor_efficiency/
For evaluating the performance of DVFS policy, I use rt-app to run a
simple sequence that alternates run phase and sleep phase. The run
phase is done by looping a defined number of time on a function that
burns cycles (The number of loop is calibrated to be equivalent to a
specified duration at max frequency of the cpu). At the end of the
test, we know how many time is needed by each policy to run the same
pattern and we can compare policies. I have also done some power
consumption measurement so we can compare both the
performance/responsiveness and power consumption (the power figure
only reflects the power consumption of the A53 cluster).
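
The run/sleep pattern described above can be sketched roughly as
follows (a hypothetical Python mock-up purely for illustration; the
real rt-app is a C program with a calibrated cycle-burning loop):

```python
import time

def burn(loops):
    """Busy loop standing in for rt-app's calibrated cycle-burning
    function; `loops` would be chosen so the work corresponds to a
    target duration at the cpu's max frequency."""
    x = 0
    for _ in range(loops):
        x += 1
    return x

def run_sleep_pattern(loops, sleep_s, periods):
    """Alternate run and sleep phases, returning total wall-clock time
    spent in the run phases.  A governor that is slow to ramp the
    frequency back up after each idle period makes the fixed amount of
    work take longer -- which is what the perf column below measures."""
    busy = 0.0
    for _ in range(periods):
        t0 = time.perf_counter()
        burn(loops)
        busy += time.perf_counter() - t0
        time.sleep(sleep_s)
    return busy
```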

The test has been done on a Cortex-A53 platform, so I have ported the
frequency invariance patch [1/4] to arm64 (attached to this email for
reference).

The performance figures have been normalized: 100% means as fast as
the performance governor, which always uses the max frequency, and 0%
means as slow as the powersave governor, which always uses the min
frequency. The power consumption figures have been normalized against
the performance governor's power consumption. For all governors I have
used the default parameters, which means a sampling rate of 10ms and a
sampling_down_factor of 1 for ondemand, and a sampling rate of 200ms
for the conservative governor.

              performance   powersave     ondemand      sched         conservative
run/sleep     perf  energy  perf  energy  perf  energy  perf  energy  perf  energy
50ms/1000ms   100%  100%      0%   34%     95%   62%     49%   35%     72%   33%
100ms/1000ms  100%  100%      0%   41%     98%   78%     69%   41%     92%   45%
200ms/1000ms  100%  100%      0%   51%     98%   78%     86%   62%     93%   62%
400ms/1000ms  100%  100%      0%   55%     99%   83%     93%   63%     99%   77%
1000ms/100ms  100%  100%      0%   74%     99%   100%    96%   97%     99%   99%

We can see that the responsiveness of the sched governor is not that
good for short running durations, but this can probably be explained
by the responsiveness of the load tracking, which needs around 75ms
to reach 80% of max usage. There are several proposals to add the
blocked tasks into the usage of a cpu; this may improve the
responsiveness for some patterns.
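
The ~75ms figure follows directly from the PELT geometric decay: with
a 32ms half-life (a per-millisecond decay factor y where y^32 = 0.5),
the utilization of a task running continuously from zero ramps up as
1 - y^t. A quick back-of-the-envelope check (a sketch of the math, not
kernel code):

```python
import math

# PELT decays contributions by y per 1 ms, with y**32 == 0.5
# (i.e. a 32 ms half-life).
y = 0.5 ** (1.0 / 32.0)

def util_after(ms):
    """Fraction of max utilization after running continuously for
    `ms` milliseconds, starting from zero."""
    return 1.0 - y ** ms

def ms_to_reach(frac):
    """Continuous running time (ms) needed to reach `frac` of max."""
    return math.log(1.0 - frac) / math.log(y)

print(ms_to_reach(0.8))  # ~74 ms, matching the ~75 ms quoted above
```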

Regards,
Vincent

On 27 June 2015 at 01:53, Michael Turquette <mturquette@baylibre.com> wrote:
> This series addresses the comments from v2 and rearranges the code to
> separate the bits that are safe to merge from the bits that are not. In
> particular the simplified governor no longer implements any policy of
> its own. That code is migrated to fair.c and marked as RFC. The
> capacity-selection code in fair.c (patch #4) is a placeholder and can be
> replaced by something more sophisticated, and it illustrates the use of
> the new governor api's for anyone that wants to play around with
> crafting a new policy.
>
> Patch #1 is a re-post from Morten, as it is the only dependency these
> patches have on his EAS series. Please consider merging patches #2 and
> #3 if they do not appear too controversial. Without enabling the feature
> in Kconfig there will be no impact on existing code.
>
> Michael Turquette (3):
>   cpufreq: introduce cpufreq_driver_might_sleep
>   sched: scheduler-driven cpu frequency selection
>   [RFC] sched: cfs: cpu frequency scaling policy
>
> Morten Rasmussen (1):
>   arm: Frequency invariant scheduler load-tracking support
>
>  arch/arm/include/asm/topology.h |   7 +
>  arch/arm/kernel/smp.c           |  53 ++++++-
>  arch/arm/kernel/topology.c      |  17 +++
>  drivers/cpufreq/Kconfig         |  24 ++++
>  drivers/cpufreq/cpufreq.c       |   6 +
>  include/linux/cpufreq.h         |  12 ++
>  kernel/sched/Makefile           |   1 +
>  kernel/sched/cpufreq_sched.c    | 308 ++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/fair.c             |  41 ++++++
>  kernel/sched/sched.h            |   8 ++
>  10 files changed, 475 insertions(+), 2 deletions(-)
>  create mode 100644 kernel/sched/cpufreq_sched.c
>
> --
> 1.9.1
>

[-- Attachment #2: 0001-arm64-Frequency-invariant-scheduler-load-tracking-su.patch --]
[-- Type: text/x-patch, Size: 4623 bytes --]

From 1be4a1e126cdb1b47b4d8a5dd156f8bca04715c0 Mon Sep 17 00:00:00 2001
From: Vincent Guittot <vincent.guittot@linaro.org>
Date: Fri, 3 Jul 2015 10:02:53 +0200
Subject: [PATCH] arm64: Frequency invariant scheduler load-tracking support

Implement an arch-specific function to provide the scheduler with a
frequency scaling correction factor for more accurate load-tracking.

This patch is based on the implementation done for the arm architecture
by Morten Rasmussen <morten.rasmussen@arm.com>.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/arm64/include/asm/topology.h | 11 ++++++
 arch/arm64/kernel/smp.c           | 80 +++++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/topology.c      | 19 ++++++++++
 3 files changed, 110 insertions(+)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 7ebcd31..887835a 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -24,6 +24,17 @@ void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
 
+#ifdef CONFIG_CPU_FREQ
+
+#define arch_scale_freq_capacity arm64_arch_scale_freq_capacity
+struct sched_domain;
+extern unsigned long arm64_arch_scale_freq_capacity(struct sched_domain *sd,
+				int cpu);
+
+DECLARE_PER_CPU(atomic_long_t, cpu_freq_capacity);
+
+#endif /* CONFIG_CPU_FREQ */
+
 #else
 
 static inline void init_cpu_topology(void) { }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 328b8ce..45b88cb 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -34,6 +34,7 @@
 #include <linux/percpu.h>
 #include <linux/clockchips.h>
 #include <linux/completion.h>
+#include <linux/cpufreq.h>
 #include <linux/of.h>
 #include <linux/irq_work.h>
 
@@ -656,3 +657,82 @@ int setup_profiling_timer(unsigned int multiplier)
 {
 	return -EINVAL;
 }
+
+#ifdef CONFIG_CPU_FREQ
+
+static DEFINE_PER_CPU(atomic_long_t, cpu_max_freq);
+DEFINE_PER_CPU(atomic_long_t, cpu_freq_capacity);
+
+/*
+ * Scheduler load-tracking scale-invariance
+ *
+ * Provides the scheduler with a scale-invariance correction factor that
+ * compensates for frequency scaling through arch_scale_freq_capacity()
+ * (implemented in topology.c).
+ */
+static inline
+void scale_freq_capacity(int cpu, unsigned long curr, unsigned long max)
+{
+	unsigned long capacity;
+
+	if (!max)
+		return;
+
+	capacity = (curr << SCHED_CAPACITY_SHIFT) / max;
+	atomic_long_set(&per_cpu(cpu_freq_capacity, cpu), capacity);
+}
+
+static int cpufreq_callback(struct notifier_block *nb,
+					unsigned long val, void *data)
+{
+	struct cpufreq_freqs *freq = data;
+	int cpu = freq->cpu;
+	unsigned long max = atomic_long_read(&per_cpu(cpu_max_freq, cpu));
+
+	if (freq->flags & CPUFREQ_CONST_LOOPS)
+		return NOTIFY_OK;
+
+	scale_freq_capacity(cpu, freq->new, max);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block cpufreq_notifier = {
+	.notifier_call  = cpufreq_callback,
+};
+
+static int cpufreq_policy_callback(struct notifier_block *nb,
+						unsigned long val, void *data)
+{
+	struct cpufreq_policy *policy = data;
+	int i;
+
+	if (val != CPUFREQ_NOTIFY)
+		return NOTIFY_OK;
+
+	for_each_cpu(i, policy->cpus) {
+		scale_freq_capacity(i, policy->cur, policy->max);
+		atomic_long_set(&per_cpu(cpu_max_freq, i), policy->max);
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block cpufreq_policy_notifier = {
+	.notifier_call	= cpufreq_policy_callback,
+};
+
+static int __init register_cpufreq_notifier(void)
+{
+	int ret;
+
+	ret = cpufreq_register_notifier(&cpufreq_notifier,
+						CPUFREQ_TRANSITION_NOTIFIER);
+	if (ret)
+		return ret;
+
+	return cpufreq_register_notifier(&cpufreq_policy_notifier,
+						CPUFREQ_POLICY_NOTIFIER);
+}
+core_initcall(register_cpufreq_notifier);
+#endif
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index fcb8f7b..6387f65 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -200,6 +200,25 @@ static int __init parse_dt_topology(void)
 	return ret;
 }
 
+#ifdef CONFIG_CPU_FREQ
+/*
+ * Scheduler load-tracking scale-invariance
+ *
+ * Provides the scheduler with a scale-invariance correction factor that
+ * compensates for frequency scaling (arch_scale_freq_capacity()). The scaling
+ * factor is updated in smp.c
+ */
+unsigned long arm64_arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
+{
+	unsigned long curr = atomic_long_read(&per_cpu(cpu_freq_capacity, cpu));
+
+	if (!curr)
+		return SCHED_CAPACITY_SCALE;
+
+	return curr;
+}
+#endif
+
 /*
  * cpu topology table
  */
-- 
1.9.1
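
For reference, the correction factor that scale_freq_capacity() in the
patch above stores per cpu is a simple fixed-point ratio: curr/max
scaled by SCHED_CAPACITY_SCALE (1024). A sketch of the arithmetic with
made-up example frequencies:

```python
SCHED_CAPACITY_SHIFT = 10
SCHED_CAPACITY_SCALE = 1 << SCHED_CAPACITY_SHIFT  # 1024

def freq_capacity(curr_khz, max_khz):
    """Mirror of scale_freq_capacity(): capacity = (curr << 10) / max,
    using integer math as the kernel does.  Falls back to full scale
    when no max frequency is known, as arm64_arch_scale_freq_capacity()
    does when no value has been stored yet."""
    if not max_khz:
        return SCHED_CAPACITY_SCALE
    return (curr_khz << SCHED_CAPACITY_SHIFT) // max_khz

# e.g. running at 600 MHz out of a 1000 MHz max:
print(freq_capacity(600000, 1000000))  # 614, i.e. ~60% of 1024
```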


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection
  2015-06-26 23:53 ` [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection Michael Turquette
  2015-06-27  0:47   ` Felipe Balbi
@ 2015-07-06 20:06   ` Dietmar Eggemann
  1 sibling, 0 replies; 14+ messages in thread
From: Dietmar Eggemann @ 2015-07-06 20:06 UTC (permalink / raw)
  To: Michael Turquette, peterz@infradead.org, mingo@kernel.org
  Cc: linux-kernel@vger.kernel.org, preeti@linux.vnet.ibm.com,
	Morten Rasmussen, riel@redhat.com, efault@gmx.de,
	nicolas.pitre@linaro.org, daniel.lezcano@linaro.org,
	vincent.guittot@linaro.org, amit.kucheria@linaro.org, Juri Lelli,
	rjw@rjwysocki.net, viresh.kumar@linaro.org,
	ashwin.chaugule@linaro.org, alex.shi@linaro.org,
	linux-pm@vger.kernel.org, abelvesa@gmail.com, pebolle@tiscali.nl

Hi Mike,

On 27/06/15 00:53, Michael Turquette wrote:
> From: Michael Turquette <mturquette@baylibre.com>
> 

[...]

>  comment "CPU frequency scaling drivers"
> 
>  config CPUFREQ_DT
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 1f2c9a1..30241c9 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -494,6 +494,9 @@ extern struct cpufreq_governor cpufreq_gov_ondemand;
>  #elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE)
>  extern struct cpufreq_governor cpufreq_gov_conservative;
>  #define CPUFREQ_DEFAULT_GOVERNOR       (&cpufreq_gov_conservative)
> +#elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED_GOV)
> +extern struct cpufreq_governor cpufreq_gov_sched_gov;

To get it building with CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED=y:

-#elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED_GOV)
-extern struct cpufreq_governor cpufreq_gov_sched_gov;
+#elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED)
+extern struct cpufreq_governor cpufreq_gov_sched;

> +#define CPUFREQ_DEFAULT_GOVERNOR       (&cpufreq_gov_sched)
>  #endif

> +
> +/**
> + * cpufreq_sched_set_capacity - interface to scheduler for changing capacity values

[...]

minor nit:

s/cpufreq_sched_set_capacity/cpufreq_sched_set_cap

> + * @cpu: cpu whose capacity utilization has recently changed
> + * @capacity: the new capacity requested by cpu
> + *
> + * cpufreq_sched_sched_capacity is an interface exposed to the scheduler so
> + * that the scheduler may inform the governor of updates to capacity
> + * utilization and make changes to cpu frequency. Currently this interface is
> + * designed around PELT values in CFS. It can be expanded to other scheduling
> + * classes in the future if needed.
> + *
> + * cpufreq_sched_set_capacity raises an IPI. The irq_work handler for that IPI
> + * wakes up the thread that does the actual work, cpufreq_sched_thread.
> + *
> + * This functions bails out early if either condition is true:
> + * 1) this cpu did not the new maximum capacity for its frequency domain
> + * 2) no change in cpu frequency is necessary to meet the new capacity request
> + */
> +void cpufreq_sched_set_cap(int cpu, unsigned long capacity)
> +{
> +       unsigned int freq_new, cpu_tmp;
> +       struct cpufreq_policy *policy;
> +       struct gov_data *gd;
> +       unsigned long capacity_max = 0;

[...]


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-07-06 20:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-26 23:53 [PATCH v3 0/4] scheduler-driven cpu frequency selection Michael Turquette
2015-06-26 23:53 ` [PATCH v3 1/4] arm: Frequency invariant scheduler load-tracking support Michael Turquette
2015-06-26 23:53 ` [PATCH v3 2/4] cpufreq: introduce cpufreq_driver_might_sleep Michael Turquette
2015-06-27  0:48   ` Felipe Balbi
     [not found]     ` <20150629162621.9112.4040@quantum>
2015-06-29 16:39       ` Felipe Balbi
2015-06-29 16:56         ` Michael Turquette
2015-06-29 17:03           ` Felipe Balbi
2015-06-29 17:10             ` Michael Turquette
2015-06-26 23:53 ` [PATCH v3 3/4] sched: scheduler-driven cpu frequency selection Michael Turquette
2015-06-27  0:47   ` Felipe Balbi
     [not found]     ` <20150629164943.9112.4253@quantum>
2015-06-29 16:55       ` Felipe Balbi
2015-07-06 20:06   ` Dietmar Eggemann
2015-06-26 23:53 ` [PATCH v3 4/4] [RFC] sched: cfs: cpu frequency scaling policy Michael Turquette
2015-07-03  9:57 ` [PATCH v3 0/4] scheduler-driven cpu frequency selection Vincent Guittot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).