cpufreq.vger.kernel.org archive mirror
* [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor
@ 2008-07-17 20:55 venkatesh.pallipadi
  2008-07-17 20:55 ` [patch 1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg() venkatesh.pallipadi
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: venkatesh.pallipadi @ 2008-07-17 20:55 UTC
  To: cpufreq; +Cc: davej

Currently, the ondemand governor uses jiffy-based idle statistics, which
are very coarse-grained. As a result, ondemand uses a big guard band
around CPU utilization while calculating the next freq.

This patchset lets ondemand use idle micro-accounting when available
(when the tickless feature is present) and switches to a smaller guard
band. This allows more aggressive power savings in partially idle
workloads.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>


-- 


* [patch 1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()
  2008-07-17 20:55 [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor venkatesh.pallipadi
@ 2008-07-17 20:55 ` venkatesh.pallipadi
  2008-07-17 20:55 ` [patch 2/6] cpufreq: Change load calculation in ondemand for software coordination venkatesh.pallipadi
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: venkatesh.pallipadi @ 2008-07-17 20:55 UTC
  To: cpufreq; +Cc: davej

[-- Attachment #1: cpufreq_getavg_cpu_param.patch --]
[-- Type: text/plain, Size: 3399 bytes --]

Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination, where policy->cpu may not be the same as the CPU whose
average frequency we want to read.

A follow-on patch will use this parameter to get the average frequency of
every CPU in policy->cpus.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |    5 +++++
 drivers/cpufreq/cpufreq.c                  |    6 +++---
 drivers/cpufreq/cpufreq_ondemand.c         |    2 +-
 include/linux/cpufreq.h                    |    3 ++-
 4 files changed, 11 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq.c	2008-07-16 10:58:51.000000000 -0700
+++ linux-2.6/drivers/cpufreq/cpufreq.c	2008-07-17 10:02:23.000000000 -0700
@@ -1482,7 +1482,7 @@ int cpufreq_driver_target(struct cpufreq
 }
 EXPORT_SYMBOL_GPL(cpufreq_driver_target);
 
-int __cpufreq_driver_getavg(struct cpufreq_policy *policy)
+int __cpufreq_driver_getavg(struct cpufreq_policy *policy, unsigned int cpu)
 {
 	int ret = 0;
 
@@ -1490,8 +1490,8 @@ int __cpufreq_driver_getavg(struct cpufr
 	if (!policy)
 		return -EINVAL;
 
-	if (cpu_online(policy->cpu) && cpufreq_driver->getavg)
-		ret = cpufreq_driver->getavg(policy->cpu);
+	if (cpu_online(cpu) && cpufreq_driver->getavg)
+		ret = cpufreq_driver->getavg(cpu);
 
 	cpufreq_cpu_put(policy);
 	return ret;
Index: linux-2.6/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq_ondemand.c	2008-07-16 10:58:51.000000000 -0700
+++ linux-2.6/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 10:02:23.000000000 -0700
@@ -415,7 +415,7 @@ static void dbs_check_cpu(struct cpu_dbs
 	if (load < (dbs_tuners_ins.up_threshold - 10)) {
 		unsigned int freq_next, freq_cur;
 
-		freq_cur = __cpufreq_driver_getavg(policy);
+		freq_cur = __cpufreq_driver_getavg(policy, policy->cpu);
 		if (!freq_cur)
 			freq_cur = policy->cur;
 
Index: linux-2.6/include/linux/cpufreq.h
===================================================================
--- linux-2.6.orig/include/linux/cpufreq.h	2008-07-16 10:58:51.000000000 -0700
+++ linux-2.6/include/linux/cpufreq.h	2008-07-17 10:02:23.000000000 -0700
@@ -189,7 +189,8 @@ extern int __cpufreq_driver_target(struc
 				   unsigned int relation);
 
 
-extern int __cpufreq_driver_getavg(struct cpufreq_policy *policy);
+extern int __cpufreq_driver_getavg(struct cpufreq_policy *policy,
+				   unsigned int cpu);
 
 int cpufreq_register_governor(struct cpufreq_governor *governor);
 void cpufreq_unregister_governor(struct cpufreq_governor *governor);
Index: linux-2.6/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c	2008-07-16 10:58:51.000000000 -0700
+++ linux-2.6/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c	2008-07-17 10:02:23.000000000 -0700
@@ -611,6 +611,11 @@ static int acpi_cpufreq_cpu_init(struct 
 	}
 #endif
 
+	/* Set drv_data for all secondary CPUs in policy->cpus mask */
+	for_each_cpu_mask(i, policy->cpus)
+		if (i != policy->cpu)
+			per_cpu(drv_data, i) = data;
+
 	/* capability check */
 	if (perf->state_count <= 1) {
 		dprintk("No P-States\n");

-- 


* [patch 2/6] cpufreq: Change load calculation in ondemand for software coordination
  2008-07-17 20:55 [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor venkatesh.pallipadi
  2008-07-17 20:55 ` [patch 1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg() venkatesh.pallipadi
@ 2008-07-17 20:55 ` venkatesh.pallipadi
  2008-07-17 20:55 ` [patch 3/6] cpufreq: get_cpu_idle_time() changes in ondemand to suit idle-microaccounting venkatesh.pallipadi
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: venkatesh.pallipadi @ 2008-07-17 20:55 UTC
  To: cpufreq; +Cc: davej

[-- Attachment #1: ondemand_swcoord_simplify.patch --]
[-- Type: text/plain, Size: 4403 bytes --]

Change the load calculation algorithm in ondemand to work well with software
coordination of frequency across the dependent cpus.

Multiply each CPU's utilization by the average frequency of that logical CPU
during the measurement interval (obtained via the getavg call), and take the
maximum of these products across the CPUs in the policy. That maximum, a load
expressed in terms of CPU frequency, is then used to pick the target
frequency for the next sampling interval.
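
As a rough illustration of the calculation described above, the following
standalone C sketch computes the largest load-times-frequency product over a
set of CPUs. The struct cpu_sample type and the max_load_freq_of() name are
hypothetical stand-ins for the governor's per-CPU bookkeeping; this is not
the kernel code itself.

```c
#include <stddef.h>

/* Hypothetical per-CPU sample for one measurement interval. */
struct cpu_sample {
	unsigned int wall_time;	/* elapsed time in the interval */
	unsigned int idle_time;	/* idle part of wall_time */
	unsigned int freq_avg;	/* average frequency in kHz */
};

/*
 * Return the maximum of (load percentage * average frequency) over all
 * samples. Samples with no elapsed time, or with more idle time than
 * wall time, are skipped, mirroring the sanity check in the patch.
 */
unsigned int max_load_freq_of(const struct cpu_sample *s, size_t n)
{
	unsigned int max_load_freq = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int load, load_freq;

		if (s[i].wall_time == 0 || s[i].wall_time < s[i].idle_time)
			continue;

		load = 100 * (s[i].wall_time - s[i].idle_time) /
			s[i].wall_time;
		load_freq = load * s[i].freq_avg;
		if (load_freq > max_load_freq)
			max_load_freq = load_freq;
	}
	return max_load_freq;
}
```

For example, a CPU at 50% load averaging 2000000 kHz contributes 100000000,
while a sibling at 80% load averaging 1000000 kHz contributes 80000000, so
the first CPU determines the result even though its utilization is lower.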

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 drivers/cpufreq/cpufreq_ondemand.c |   65 +++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 30 deletions(-)

Index: linux-2.6/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 12:59:31.000000000 -0700
+++ linux-2.6/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 13:05:51.000000000 -0700
@@ -334,9 +334,7 @@ static struct attribute_group dbs_attr_g
 
 static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
 {
-	unsigned int idle_ticks, total_ticks;
-	unsigned int load = 0;
-	cputime64_t cur_jiffies;
+	unsigned int max_load_freq;
 
 	struct cpufreq_policy *policy;
 	unsigned int j;
@@ -346,13 +344,7 @@ static void dbs_check_cpu(struct cpu_dbs
 
 	this_dbs_info->freq_lo = 0;
 	policy = this_dbs_info->cur_policy;
-	cur_jiffies = jiffies64_to_cputime64(get_jiffies_64());
-	total_ticks = (unsigned int) cputime64_sub(cur_jiffies,
-			this_dbs_info->prev_cpu_wall);
-	this_dbs_info->prev_cpu_wall = get_jiffies_64();
 
-	if (!total_ticks)
-		return;
 	/*
 	 * Every sampling_rate, we check, if current idle time is less
 	 * than 20% (default), then we try to increase frequency
@@ -365,27 +357,46 @@ static void dbs_check_cpu(struct cpu_dbs
 	 * 5% (default) of current frequency
 	 */
 
-	/* Get Idle Time */
-	idle_ticks = UINT_MAX;
+	/* Get Absolute Load - in terms of freq */
+	max_load_freq = 0;
+
 	for_each_cpu_mask(j, policy->cpus) {
-		cputime64_t total_idle_ticks;
-		unsigned int tmp_idle_ticks;
 		struct cpu_dbs_info_s *j_dbs_info;
+		cputime64_t cur_wall_time, cur_idle_time;
+		unsigned int idle_time, wall_time;
+		unsigned int load, load_freq;
+		int freq_avg;
 
 		j_dbs_info = &per_cpu(cpu_dbs_info, j);
-		total_idle_ticks = get_cpu_idle_time(j);
-		tmp_idle_ticks = (unsigned int) cputime64_sub(total_idle_ticks,
+		cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
+		wall_time = (unsigned int) cputime64_sub(cur_wall_time,
+				j_dbs_info->prev_cpu_wall);
+		j_dbs_info->prev_cpu_wall = cur_wall_time;
+
+		cur_idle_time = get_cpu_idle_time(j);
+		idle_time = (unsigned int) cputime64_sub(cur_idle_time,
 				j_dbs_info->prev_cpu_idle);
-		j_dbs_info->prev_cpu_idle = total_idle_ticks;
+		j_dbs_info->prev_cpu_idle = cur_idle_time;
+
+		if (unlikely(wall_time <= idle_time ||
+			     (cputime_to_msecs(wall_time) <
+			      dbs_tuners_ins.sampling_rate / (2 * 1000)))) {
+			continue;
+		}
+
+		load = 100 * (wall_time - idle_time) / wall_time;
 
-		if (tmp_idle_ticks < idle_ticks)
-			idle_ticks = tmp_idle_ticks;
+		freq_avg = __cpufreq_driver_getavg(policy, j);
+		if (freq_avg <= 0)
+			freq_avg = policy->cur;
+
+		load_freq = load * freq_avg;
+		if (load_freq > max_load_freq)
+			max_load_freq = load_freq;
 	}
-	if (likely(total_ticks > idle_ticks))
-		load = (100 * (total_ticks - idle_ticks)) / total_ticks;
 
 	/* Check for frequency increase */
-	if (load > dbs_tuners_ins.up_threshold) {
+	if (max_load_freq > dbs_tuners_ins.up_threshold * policy->cur) {
 		/* if we are already at full speed then break out early */
 		if (!dbs_tuners_ins.powersave_bias) {
 			if (policy->cur == policy->max)
@@ -412,15 +423,9 @@ static void dbs_check_cpu(struct cpu_dbs
 	 * can support the current CPU usage without triggering the up
 	 * policy. To be safe, we focus 10 points under the threshold.
 	 */
-	if (load < (dbs_tuners_ins.up_threshold - 10)) {
-		unsigned int freq_next, freq_cur;
-
-		freq_cur = __cpufreq_driver_getavg(policy, policy->cpu);
-		if (!freq_cur)
-			freq_cur = policy->cur;
-
-		freq_next = (freq_cur * load) /
-			(dbs_tuners_ins.up_threshold - 10);
+	if (max_load_freq < (dbs_tuners_ins.up_threshold - 10) * policy->cur) {
+		unsigned int freq_next;
+		freq_next = max_load_freq / (dbs_tuners_ins.up_threshold - 10);
 
 		if (!dbs_tuners_ins.powersave_bias) {
 			__cpufreq_driver_target(policy, freq_next,

-- 


* [patch 3/6] cpufreq: get_cpu_idle_time() changes in ondemand to suit idle-microaccounting
  2008-07-17 20:55 [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor venkatesh.pallipadi
  2008-07-17 20:55 ` [patch 1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg() venkatesh.pallipadi
  2008-07-17 20:55 ` [patch 2/6] cpufreq: Change load calculation in ondemand for software coordination venkatesh.pallipadi
@ 2008-07-17 20:55 ` venkatesh.pallipadi
  2008-07-17 20:55 ` [patch 4/6] cpufreq_ondemand: Parameterize down differential venkatesh.pallipadi
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: venkatesh.pallipadi @ 2008-07-17 20:55 UTC
  To: cpufreq; +Cc: davej

[-- Attachment #1: prepare_idle_microaccounting.patch --]
[-- Type: text/plain, Size: 3121 bytes --]

Preparatory changes for doing idle micro-accounting in the ondemand governor.
get_cpu_idle_time() takes an extra parameter and returns, along with the idle
time, the wall time that corresponds to the idle time measurement.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 drivers/cpufreq/cpufreq_ondemand.c |   29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

Index: linux-2.6/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 13:05:51.000000000 -0700
+++ linux-2.6/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 13:12:26.000000000 -0700
@@ -94,13 +94,13 @@ static struct dbs_tuners {
 	.powersave_bias = 0,
 };
 
-static inline cputime64_t get_cpu_idle_time(unsigned int cpu)
+static inline cputime64_t get_cpu_idle_time(unsigned int cpu, cputime64_t *wall)
 {
 	cputime64_t idle_time;
-	cputime64_t cur_jiffies;
+	cputime64_t cur_wall_time;
 	cputime64_t busy_time;
 
-	cur_jiffies = jiffies64_to_cputime64(get_jiffies_64());
+	cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
 	busy_time = cputime64_add(kstat_cpu(cpu).cpustat.user,
 			kstat_cpu(cpu).cpustat.system);
 
@@ -113,7 +113,10 @@ static inline cputime64_t get_cpu_idle_t
 				kstat_cpu(cpu).cpustat.nice);
 	}
 
-	idle_time = cputime64_sub(cur_jiffies, busy_time);
+	idle_time = cputime64_sub(cur_wall_time, busy_time);
+	if (wall)
+		*wall = cur_wall_time;
+
 	return idle_time;
 }
 
@@ -277,8 +280,8 @@ static ssize_t store_ignore_nice_load(st
 	for_each_online_cpu(j) {
 		struct cpu_dbs_info_s *dbs_info;
 		dbs_info = &per_cpu(cpu_dbs_info, j);
-		dbs_info->prev_cpu_idle = get_cpu_idle_time(j);
-		dbs_info->prev_cpu_wall = get_jiffies_64();
+		dbs_info->prev_cpu_idle = get_cpu_idle_time(j,
+						&dbs_info->prev_cpu_wall);
 	}
 	mutex_unlock(&dbs_mutex);
 
@@ -368,21 +371,19 @@ static void dbs_check_cpu(struct cpu_dbs
 		int freq_avg;
 
 		j_dbs_info = &per_cpu(cpu_dbs_info, j);
-		cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
+
+		cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
+
 		wall_time = (unsigned int) cputime64_sub(cur_wall_time,
 				j_dbs_info->prev_cpu_wall);
 		j_dbs_info->prev_cpu_wall = cur_wall_time;
 
-		cur_idle_time = get_cpu_idle_time(j);
 		idle_time = (unsigned int) cputime64_sub(cur_idle_time,
 				j_dbs_info->prev_cpu_idle);
 		j_dbs_info->prev_cpu_idle = cur_idle_time;
 
-		if (unlikely(wall_time <= idle_time ||
-			     (cputime_to_msecs(wall_time) <
-			      dbs_tuners_ins.sampling_rate / (2 * 1000)))) {
+		if (unlikely(!wall_time || wall_time < idle_time))
 			continue;
-		}
 
 		load = 100 * (wall_time - idle_time) / wall_time;
 
@@ -531,8 +532,8 @@ static int cpufreq_governor_dbs(struct c
 			j_dbs_info = &per_cpu(cpu_dbs_info, j);
 			j_dbs_info->cur_policy = policy;
 
-			j_dbs_info->prev_cpu_idle = get_cpu_idle_time(j);
-			j_dbs_info->prev_cpu_wall = get_jiffies_64();
+			j_dbs_info->prev_cpu_idle = get_cpu_idle_time(j,
+						&j_dbs_info->prev_cpu_wall);
 		}
 		this_dbs_info->cpu = cpu;
 		/*

-- 


* [patch 4/6] cpufreq_ondemand: Parameterize down differential
  2008-07-17 20:55 [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor venkatesh.pallipadi
                   ` (2 preceding siblings ...)
  2008-07-17 20:55 ` [patch 3/6] cpufreq: get_cpu_idle_time() changes in ondemand to suit idle-microaccounting venkatesh.pallipadi
@ 2008-07-17 20:55 ` venkatesh.pallipadi
  2008-07-17 20:55 ` [patch 5/6] cpufreq: Changes to get_cpu_idle_time_us(), to be used in ondemand governor venkatesh.pallipadi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: venkatesh.pallipadi @ 2008-07-17 20:55 UTC
  To: cpufreq; +Cc: davej

[-- Attachment #1: add_down_differential.patch --]
[-- Type: text/plain, Size: 2007 bytes --]

Use a parameter for the down differential instead of the hardcoded 10%. A
follow-on patch changes the down differential dynamically, based on whether
we are using idle micro-accounting or not.
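
To make the role of the down differential concrete, here is a minimal
standalone C sketch of the scale-down decision. The function name and its
signature are assumptions made for illustration; the real logic lives inline
in dbs_check_cpu().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether to scale down. max_load_freq is the largest
 * (load percentage * average frequency) product seen in the interval,
 * cur_freq the current frequency in kHz. On success, *freq_next holds
 * the lowest frequency that keeps the load below the down threshold.
 */
bool scale_down_target(unsigned int max_load_freq, unsigned int cur_freq,
		       unsigned int up_threshold,
		       unsigned int down_differential,
		       unsigned int *freq_next)
{
	unsigned int down_threshold;

	/* Also guarantees the division below never divides by zero. */
	assert(up_threshold > down_differential);
	down_threshold = up_threshold - down_differential;

	if (max_load_freq >= down_threshold * cur_freq)
		return false;	/* load too close to the up threshold */

	*freq_next = max_load_freq / down_threshold;
	return true;
}
```

With up_threshold 80 and down_differential 10, a max_load_freq of 100000000
(50% load at 2000000 kHz) is below 70 * 2000000, so the sketch proposes
100000000 / 70 = 1428571 kHz.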

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 drivers/cpufreq/cpufreq_ondemand.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 13:12:26.000000000 -0700
+++ linux-2.6/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 13:17:06.000000000 -0700
@@ -24,6 +24,7 @@
  * It helps to keep variable names smaller, simpler
  */
 
+#define DEF_FREQUENCY_DOWN_DIFFERENTIAL		(10)
 #define DEF_FREQUENCY_UP_THRESHOLD		(80)
 #define MIN_FREQUENCY_UP_THRESHOLD		(11)
 #define MAX_FREQUENCY_UP_THRESHOLD		(100)
@@ -86,10 +87,12 @@ static struct workqueue_struct	*kondeman
 static struct dbs_tuners {
 	unsigned int sampling_rate;
 	unsigned int up_threshold;
+	unsigned int down_differential;
 	unsigned int ignore_nice;
 	unsigned int powersave_bias;
 } dbs_tuners_ins = {
 	.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
+	.down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL,
 	.ignore_nice = 0,
 	.powersave_bias = 0,
 };
@@ -424,9 +427,13 @@ static void dbs_check_cpu(struct cpu_dbs
 	 * can support the current CPU usage without triggering the up
 	 * policy. To be safe, we focus 10 points under the threshold.
 	 */
-	if (max_load_freq < (dbs_tuners_ins.up_threshold - 10) * policy->cur) {
+	if (max_load_freq <
+	    (dbs_tuners_ins.up_threshold - dbs_tuners_ins.down_differential) *
+	     policy->cur) {
 		unsigned int freq_next;
-		freq_next = max_load_freq / (dbs_tuners_ins.up_threshold - 10);
+		freq_next = max_load_freq /
+				(dbs_tuners_ins.up_threshold -
+				 dbs_tuners_ins.down_differential);
 
 		if (!dbs_tuners_ins.powersave_bias) {
 			__cpufreq_driver_target(policy, freq_next,

-- 


* [patch 5/6] cpufreq: Changes to get_cpu_idle_time_us(), to be used in ondemand governor
  2008-07-17 20:55 [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor venkatesh.pallipadi
                   ` (3 preceding siblings ...)
  2008-07-17 20:55 ` [patch 4/6] cpufreq_ondemand: Parameterize down differential venkatesh.pallipadi
@ 2008-07-17 20:55 ` venkatesh.pallipadi
  2008-07-17 20:56 ` [patch 6/6] cpufreq: Add idle microaccounting " venkatesh.pallipadi
  2008-07-30 16:56 ` [patch 0/6] cpufreq: Use idle micro-accounting information " Dave Jones
  6 siblings, 0 replies; 8+ messages in thread
From: venkatesh.pallipadi @ 2008-07-17 20:55 UTC
  To: cpufreq; +Cc: davej

[-- Attachment #1: generic_microaccounting_changes.patch --]
[-- Type: text/plain, Size: 1913 bytes --]

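Export get_cpu_idle_time_us() so that it can be used in the ondemand governor.
When the CPU is currently non-idle, the last update time is now reported as
the current time, which accounts for the busy time since the last idle period.
The function also returns -1 when tickless mode is not enabled, so callers can
tell that idle micro-accounting is unavailable.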
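
Because the function returns an unsigned 64-bit value, the -1 return acts as
an all-ones sentinel that callers compare against -1ULL. The standalone C
sketch below illustrates that convention; both function names are
hypothetical and only model the behaviour, they are not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones sentinel; the same bit pattern as -1 converted to 64 bits. */
#define IDLE_TIME_UNAVAILABLE UINT64_MAX

/*
 * Hypothetical model of the accounting source: report the accumulated
 * idle time in microseconds, or the sentinel when tickless idle
 * accounting is not active.
 */
uint64_t idle_time_us_or_sentinel(bool nohz_enabled,
				  uint64_t idle_sleeptime_us)
{
	if (!nohz_enabled)
		return (uint64_t)-1;	/* wraps to UINT64_MAX */
	return idle_sleeptime_us;
}

/* Caller-side check, as a governor might do before trusting the value. */
bool micro_accounting_available(uint64_t idle_time_us)
{
	return idle_time_us != IDLE_TIME_UNAVAILABLE;
}
```

The sentinel is safe in practice because a genuine idle time of 2^64 - 1
microseconds is far beyond any realistic uptime.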

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 include/linux/tick.h     |    2 +-
 kernel/time/tick-sched.c |   11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c	2008-07-17 13:19:08.000000000 -0700
+++ linux-2.6/kernel/time/tick-sched.c	2008-07-17 13:27:29.000000000 -0700
@@ -20,6 +20,7 @@
 #include <linux/profile.h>
 #include <linux/sched.h>
 #include <linux/tick.h>
+#include <linux/module.h>
 
 #include <asm/irq_regs.h>
 
@@ -184,9 +185,17 @@ u64 get_cpu_idle_time_us(int cpu, u64 *l
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
 
-	*last_update_time = ktime_to_us(ts->idle_lastupdate);
+	if (!tick_nohz_enabled)
+		return -1;
+
+	if (ts->idle_active)
+		*last_update_time = ktime_to_us(ts->idle_lastupdate);
+	else
+		*last_update_time = ktime_to_us(ktime_get());
+
 	return ktime_to_us(ts->idle_sleeptime);
 }
+EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
 /**
  * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h	2008-07-17 13:19:08.000000000 -0700
+++ linux-2.6/include/linux/tick.h	2008-07-17 13:27:29.000000000 -0700
@@ -122,7 +122,7 @@ static inline ktime_t tick_nohz_get_slee
 	return len;
 }
 static inline void tick_nohz_stop_idle(int cpu) { }
-static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return 0; }
+static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
 #endif

-- 


* [patch 6/6] cpufreq: Add idle microaccounting in ondemand governor
  2008-07-17 20:55 [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor venkatesh.pallipadi
                   ` (4 preceding siblings ...)
  2008-07-17 20:55 ` [patch 5/6] cpufreq: Changes to get_cpu_idle_time_us(), to be used in ondemand governor venkatesh.pallipadi
@ 2008-07-17 20:56 ` venkatesh.pallipadi
  2008-07-30 16:56 ` [patch 0/6] cpufreq: Use idle micro-accounting information " Dave Jones
  6 siblings, 0 replies; 8+ messages in thread
From: venkatesh.pallipadi @ 2008-07-17 20:56 UTC
  To: cpufreq; +Cc: davej

[-- Attachment #1: intro_idle_microaccounting.patch --]
[-- Type: text/plain, Size: 3542 bytes --]

Use get_cpu_idle_time_us() to get micro-accounted idle information.
This enables ondemand to get more accurate idle and busy timings
than the jiffy-based calculation. As a result, we can shrink the
ondemand safety guard band from 80-10 (up threshold 80, down
differential 10) to 95-3.

This results in more aggressive power savings.
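
The effect of the narrower guard band can be sketched in standalone C as
follows. The struct and function names are hypothetical; only the four
numeric values come from this patch series.

```c
#include <stdbool.h>

/* Hypothetical container for the two tunables discussed here. */
struct dbs_thresholds {
	unsigned int up_threshold;	/* scale up above this load (%) */
	unsigned int down_differential;	/* margin below up_threshold */
};

/* Pick the guard band depending on whether micro-accounting works. */
struct dbs_thresholds pick_thresholds(bool micro_accounting)
{
	struct dbs_thresholds t;

	if (micro_accounting) {
		t.up_threshold = 95;
		t.down_differential = 3;
	} else {
		t.up_threshold = 80;
		t.down_differential = 10;
	}
	return t;
}

/* Load percentage below which the governor considers scaling down. */
unsigned int down_threshold(struct dbs_thresholds t)
{
	return t.up_threshold - t.down_differential;
}
```

Under these values the scale-down divisor moves from 70 to 92, so the same
load-times-frequency product of 100000000 maps to a target of 1086956 kHz
instead of 1428571 kHz.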

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

---
 drivers/cpufreq/cpufreq_ondemand.c |   46 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 13:19:08.000000000 -0700
+++ linux-2.6/drivers/cpufreq/cpufreq_ondemand.c	2008-07-17 13:27:33.000000000 -0700
@@ -18,6 +18,9 @@
 #include <linux/jiffies.h>
 #include <linux/kernel_stat.h>
 #include <linux/mutex.h>
+#include <linux/hrtimer.h>
+#include <linux/tick.h>
+#include <linux/ktime.h>
 
 /*
  * dbs is used in this file as a shortform for demandbased switching
@@ -26,6 +29,8 @@
 
 #define DEF_FREQUENCY_DOWN_DIFFERENTIAL		(10)
 #define DEF_FREQUENCY_UP_THRESHOLD		(80)
+#define MICRO_FREQUENCY_DOWN_DIFFERENTIAL	(3)
+#define MICRO_FREQUENCY_UP_THRESHOLD		(95)
 #define MIN_FREQUENCY_UP_THRESHOLD		(11)
 #define MAX_FREQUENCY_UP_THRESHOLD		(100)
 
@@ -58,6 +63,7 @@ enum {DBS_NORMAL_SAMPLE, DBS_SUB_SAMPLE}
 struct cpu_dbs_info_s {
 	cputime64_t prev_cpu_idle;
 	cputime64_t prev_cpu_wall;
+	cputime64_t prev_cpu_nice;
 	struct cpufreq_policy *cur_policy;
  	struct delayed_work work;
 	struct cpufreq_frequency_table *freq_table;
@@ -97,7 +103,8 @@ static struct dbs_tuners {
 	.powersave_bias = 0,
 };
 
-static inline cputime64_t get_cpu_idle_time(unsigned int cpu, cputime64_t *wall)
+static inline cputime64_t get_cpu_idle_time_jiffy(unsigned int cpu,
+							cputime64_t *wall)
 {
 	cputime64_t idle_time;
 	cputime64_t cur_wall_time;
@@ -123,6 +130,33 @@ static inline cputime64_t get_cpu_idle_t
 	return idle_time;
 }
 
+static inline cputime64_t get_cpu_idle_time(unsigned int cpu, cputime64_t *wall)
+{
+	u64 idle_time = get_cpu_idle_time_us(cpu, wall);
+
+	if (idle_time == -1ULL)
+		return get_cpu_idle_time_jiffy(cpu, wall);
+
+	if (dbs_tuners_ins.ignore_nice) {
+		cputime64_t cur_nice;
+		unsigned long cur_nice_jiffies;
+		struct cpu_dbs_info_s *dbs_info;
+
+		dbs_info = &per_cpu(cpu_dbs_info, cpu);
+		cur_nice = cputime64_sub(kstat_cpu(cpu).cpustat.nice,
+					 dbs_info->prev_cpu_nice);
+		/*
+		 * Assumption: nice time between sampling periods will be
+		 * less than 2^32 jiffies for 32 bit sys
+		 */
+		cur_nice_jiffies = (unsigned long)
+					cputime64_to_jiffies64(cur_nice);
+		dbs_info->prev_cpu_nice = kstat_cpu(cpu).cpustat.nice;
+		return idle_time + jiffies_to_usecs(cur_nice_jiffies);
+	}
+	return idle_time;
+}
+
 /*
  * Find right freq to be set now with powersave_bias on.
  * Returns the freq_hi to be used right now and will set freq_hi_jiffies,
@@ -602,6 +636,16 @@ EXPORT_SYMBOL(cpufreq_gov_ondemand);
 
 static int __init cpufreq_gov_dbs_init(void)
 {
+	cputime64_t wall;
+	u64 idle_time = get_cpu_idle_time_us(smp_processor_id(), &wall);
+
+	if (idle_time != -1ULL) {
+		/* Idle micro accounting is supported. Use finer thresholds */
+		dbs_tuners_ins.up_threshold = MICRO_FREQUENCY_UP_THRESHOLD;
+		dbs_tuners_ins.down_differential =
+					MICRO_FREQUENCY_DOWN_DIFFERENTIAL;
+	}
+
 	kondemand_wq = create_workqueue("kondemand");
 	if (!kondemand_wq) {
 		printk(KERN_ERR "Creation of kondemand failed\n");

-- 


* Re: [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor
  2008-07-17 20:55 [patch 0/6] cpufreq: Use idle micro-accounting information in ondemand governor venkatesh.pallipadi
                   ` (5 preceding siblings ...)
  2008-07-17 20:56 ` [patch 6/6] cpufreq: Add idle microaccounting " venkatesh.pallipadi
@ 2008-07-30 16:56 ` Dave Jones
  6 siblings, 0 replies; 8+ messages in thread
From: Dave Jones @ 2008-07-30 16:56 UTC
  To: venkatesh.pallipadi; +Cc: davej, cpufreq

On Thu, Jul 17, 2008 at 01:55:54PM -0700, Venki Pallipadi wrote:
 > Currently, ondemand governor uses jiffy based idle statistics which is
 > very coarse grained. As a result, ondemand uses a big guard band around
 > CPU utilization while calculating the next freq.
 > 
 > This patchset lets ondemand use idle microaccounting when available
 > (when tickless feature is present) and switches to a smaller guard band.
 > This lets us do more aggressive power savings in partial idle workloads.
 > 
 > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

I had problems applying these.

Applying [CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()
Applying [CPUFREQ][2/6] cpufreq: Change load calculation in ondemand for software coordination
error: patch failed: drivers/cpufreq/cpufreq_ondemand.c:365

Can you rediff them on top of the next branch of git-cpufreq?

thanks,

	Dave

-- 
http://www.codemonkey.org.uk
