RE: [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Doug Smythies" <dsmythies@telus.net>
To: 'Thomas Ilsche' <thomas.ilsche@tu-dresden.de>
Cc: "'Marcus Hähnel'" <mhaehnel@os.inf.tu-dresden.de>,
	"'Daniel Hackenberg'" <daniel.hackenberg@tu-dresden.de>,
	"'Robert Schöne'" <robert.schoene@tu-dresden.de>,
	mario.bielert@tu-dresden.de,
	"'Rafael J. Wysocki'" <rafael.j.wysocki@intel.com>,
	"'Alex Shi'" <alex.shi@linaro.org>,
	"'Ingo Molnar'" <mingo@kernel.org>,
	"'Rik van Riel'" <riel@redhat.com>,
	"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
	"'Nicholas Piggin'" <npiggin@gmail.com>,
	linux-pm@vger.kernel.org, "'Len Brown'" <lenb@kernel.org>
Subject: RE: [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time
Date: Thu, 19 Oct 2017 17:17:42 -0700	[thread overview]
Message-ID: <000101d34938$da740870$8f5c1950$@net> (raw)
In-Reply-To: aiVvdxj7gLueDaiVwdfXxs

Hi Thomas,

Note 1: I think it is just a coincidence that Len Brown also replied to
	this old e-mail earlier today.

Note 2: Rafael referred me to this e-mail about a week ago, after I was
	complaining about excessive energy consumption due to unused CPUs
	not going idle when they should. My example use case was a 100%
	load on one CPU.

Note 3: The patch does not work for me as it was sent. See notes in-line
	further below.

On 2017.07.27 05:50 Thomas Ilsche wrote:

> On several contemporary Intel systems, we have observed that the idle
> power consumption is sometimes significantly too high, e.g. 105 W instead
> of 74 W for several seconds.

On my test system, I do not observe this magnitude of excessive idle energy.
While I do observe occurrences of excessive time spent in idle state 0, they
do not add up to significant energy waste.

> We tracked this issue down to patterns that confuse the heuristic of the
> menu idle governor. If the governor observes several consecutive short
> sleeps, it does expect another short sleep duration despite no immediate
> timers, sending the CPU into a shallow sleep state. On a dyntick-idle
> kernel, there are no means for the core to enter a more efficient sleep
> state if the prediction was wrong. Ironically this is particularly
> problematic if some cores of the system have very infrequent activity, i.e.
> many cores and optimized configuration for low idle power.

If I use your "powernightmare" method (or idle-virus as Len called it)
I get a drastic package power consumption difference between
fallback_timer_enabled being enabled or disabled. It seems mainly due
to changes in time spent in idle state 2. There was also a contribution
from idle state 0 on one CPU. This was a surprise to me as my thinking was
that wasted energy was dominated by extra time in idle state 0.

Example data (kernel 4.14-rc5 + this patch):

Average package power with fallback timer disabled: ~8.06 watts
Average package power with fallback timer enabled: ~3.95 watts

> The effect, cause, triggers, mitigation technique and verification thereof
> are described in detail in the a paper that is pending publication [1].

The paper is very interesting. Thank you.

> Fixing the heuristic to make a better prediction does not seem to be
> generally feasible. The following patch addresses this issue by setting a
> timer such that when the an expected immediate wake-up fails to appear, the
> CPU is woken up to go into a deeper sleep state. If the heuristic was
> right, the timer is simply canceled.
>
> Please consider the patch a draft for discussion. We need to address:
>
>  * Avoid programming the fallback timer when the deepest sleep state is
>    already chosen.
>  * Determine good default values for the introduced configurables. This
>    is difficult due to the large variety of system configurations affected
>    by the menu governor. Nevertheless we believe this can be done such
>     that many systems benefit without significant degradation in any case.
>  * Document (or remove) the sysfs entries.
>
> But first, I would like to invite comments if this is going in the right
> direction, or if this should be addressed in a different way.

The problem I am struggling with is the patch makes no difference
for my example use case of 100% load on one CPU, others idle.
The wasted power is entirely from idle state 0, and idle state 0
times remain about the same with or without the fallback timer.
If I merely disable idle state 0, things are great. However,
just disabling idle state 0 does not help when "powermightmares"
are present.

Example data (kernel 4.14-rc5 + this patch):

Average package power with fallback timer disabled: ~27.2 watts
Average package power with fallback timer enabled: ~28 watts
Average package power with state 0 disabled, fallback timer disabled: ~23.9 watts
Average package power with state 0 disabled, fallback timer enabled: ~23.9 watts

Question: Could this be because I made a mistake re-basing the patch to kernel 4.14-rc5?
Answer: Perhaps. I am unfamiliar with this area of the code.

>
> [1] https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
>
> Signed-off-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
> Signed-off-by: Marcus Hähnel <mhaehnel@os.inf.tu-dresden.de>
> ---
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 61b64c2b2cb8..45bbeb362809 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -22,6 +22,7 @@
>   #include <linux/sched/stat.h>
>   #include <linux/math64.h>
>   #include <linux/cpu.h>
> +#include <linux/smp.h>
>   
>   /*
>    * Please note when changing the tuning values:
> @@ -130,6 +131,10 @@ struct menu_device {
>   	unsigned int	correction_factor[BUCKETS];
>   	unsigned int	intervals[INTERVALS];
>   	int		interval_ptr;
> +
> +	struct hrtimer	fallback_timer;
> +	int             have_timer;
> +	unsigned int	disregard_past;
>   };
>   
>   
> @@ -190,6 +195,85 @@ static inline int performance_multiplier(unsigned long nr_iowaiters, unsigned lo
>   
>   static DEFINE_PER_CPU(struct menu_device, menu_devices);
>   
> +static unsigned int fallback_timer_disregard_past = 1;
> +static unsigned int diff_threshold_bits = 8;
> +static unsigned int fallback_timer_enabled;

Shouldn't fallback_timer_enabled be initialized?
On my system it defaults to 0, or disabled, and due to another problem,
see further down, this patch wasn't working.

> +static unsigned int fallback_timer_interval_us = 10000;
> +
> +#define MENU_ATTR_RW(name, var, range_min, range_max, wfun) \
> +	static ssize_t show_##name(struct device *dev, \
> +				   struct device_attribute *attr, char *buf) \
> +	{ \
> +		return snprintf(buf, 12, "%i\n", var); \
> +	} \
> +	static ssize_t store_##name(struct device *dev, \
> +				    struct device_attribute *attr, \
> +				    const char *buf, size_t count) \
> +	{ \
> +		unsigned int tmp; \
> +		int ret = kstrtouint(buf, 10, &tmp); \
> +		if (ret != 1) \

Shouldn't this be:

 +		if (ret != 0) \

(or maybe just " if (ret) \". I did (and therefore tested) the previous.

?
Anyway, it doesn't work unless I make this change.

> +			return -EINVAL; \
> +		if (tmp > range_max || tmp < range_min) { \
> +			pr_warn("value outside of valid range [%u, %u]\n", \
> +				range_min, range_max); \
> +			return -EINVAL; \
> +		} \
> +		var = tmp; \
> +		wfun \
> +		return count; \
> +	} \
> +	static DEVICE_ATTR(fallback_timer_##name, 0644, \
> +			   show_##name, store_##name)
> +
> +MENU_ATTR_RW(threshold_bits, diff_threshold_bits, 0, 31, {});
> +
> +MENU_ATTR_RW(enable, fallback_timer_enabled, 0, 1, {
> +	int i;
> +
> +	for_each_possible_cpu(i) {
> +		struct menu_device *data = per_cpu_ptr(&menu_devices, i);
> +
> +		if (!fallback_timer_enabled) {
> +			data->have_timer = 0;
> +			hrtimer_cancel(&(data->fallback_timer));
> +		}
> +	} });
> +
> +MENU_ATTR_RW(interval_us, fallback_timer_interval_us, 1, 1000000, {});
> +
> +MENU_ATTR_RW(disregard_past, fallback_timer_disregard_past, 0, 1, {
> +	int i;
> +
> +	for_each_possible_cpu(i) {
> +		struct menu_device *data = per_cpu_ptr(&menu_devices, i);
> +
> +		data->disregard_past = 0;
> +	} });
> +
> +static struct attribute *menu_attrs[] = {
> +	&dev_attr_fallback_timer_threshold_bits.attr,
> +	&dev_attr_fallback_timer_enable.attr,
> +	&dev_attr_fallback_timer_interval_us.attr,
> +	&dev_attr_fallback_timer_disregard_past.attr,
> +	NULL
> +};
> +
> +static struct attribute_group menu_attr_group = {
> +	.attrs = menu_attrs,
> +	.name = "cpuidle_menu",
> + };
> +
> +int menu_add_interface(struct device *dev)
> +{
> +	return sysfs_create_group(&dev->kobj, &menu_attr_group);
> +}
> +
> +void menu_remove_interface(struct device *dev)
> +{
> +	sysfs_remove_group(&dev->kobj, &menu_attr_group);
> +}
> +
>   static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev);
>   
>   /*
> @@ -275,6 +359,14 @@ static unsigned int get_typical_interval(struct menu_device *data)
>   	goto again;
>   }
>   
> +static enum hrtimer_restart fallback_timer_fun(struct hrtimer *tmr)
> +{
> +	struct menu_device *mdata = this_cpu_ptr(&menu_devices);
> +
> +	mdata->disregard_past = 1;
> +	return HRTIMER_NORESTART;
> +}
> +
>   /**
>    * menu_select - selects the next idle state to enter
>    * @drv: cpuidle driver containing state data
> @@ -293,6 +385,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   	unsigned long nr_iowaiters, cpu_load;
>   	int resume_latency = dev_pm_qos_raw_read_value(device);
>   
> +	if (fallback_timer_enabled && data->have_timer) {
> +		data->have_timer = 0;
> +		hrtimer_cancel(&(data->fallback_timer));
> +	}
> +
>   	if (data->needs_update) {
>   		menu_update(drv, dev);
>   		data->needs_update = 0;
> @@ -322,7 +419,32 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   					 RESOLUTION * DECAY);
>   
>   	expected_interval = get_typical_interval(data);
> -	expected_interval = min(expected_interval, data->next_timer_us);
> +
> +	if (fallback_timer_enabled && fallback_timer_disregard_past
> +	    && data->disregard_past)  {
> +		expected_interval = data->next_timer_us;
> +		// Only disregard the past once! Then try again
> +		data->disregard_past = 0;
> +	} else {
> +		if (fallback_timer_enabled
> +		    && expected_interval < (data->next_timer_us >> diff_threshold_bits)
> +		    && data->next_timer_us > fallback_timer_interval_us * 2) {
> +			/*
> +			 * Program the fallback timer if the gap between the
> +			 * expected interval by heuristic and the next regular
> +			 * timer are too far apart.
> +			 * However, only do this when we didn't just wakup from
> +			 * a timer and are told to disregard the heuristic
> +			 */
> +			ktime_t interval =
> +				ktime_set(0, fallback_timer_interval_us * 1000);
> +
> +			hrtimer_start(&(data->fallback_timer), interval,
> +				      HRTIMER_MODE_REL_PINNED);
> +			data->have_timer = 1;
> +		}
> +		expected_interval = min(expected_interval, data->next_timer_us);
> +	}
>  
>  	if (CPUIDLE_DRIVER_STATE_START > 0) {
>  		struct cpuidle_state *s = &drv->states[CPUIDLE_DRIVER_STATE_START];
> @@ -398,6 +520,11 @@ static void menu_reflect(struct cpuidle_device *dev, int index)
>  {
>  	struct menu_device *data = this_cpu_ptr(&menu_devices);
>  
> +	if (fallback_timer_enabled && data->have_timer) {
> +		data->have_timer = 0;
> +		hrtimer_cancel(&data->fallback_timer);
> +	}
> +
>  	data->last_state_idx = index;
>  	data->needs_update = 1;
>  }
> @@ -486,6 +613,10 @@ static int menu_enable_device(struct cpuidle_driver *drv,
>  
>  	memset(data, 0, sizeof(struct menu_device));
>   
> +	hrtimer_init(&(data->fallback_timer),
> +		     CLOCK_REALTIME, HRTIMER_MODE_REL_PINNED);
> +	data->fallback_timer.function = fallback_timer_fun;
> +
>  	/*
>  	 * if the correction factor is 0 (eg first time init or cpu hotplug
>  	 * etc), we actually want to start out with a unity factor.
> @@ -509,6 +640,10 @@ static struct cpuidle_governor menu_governor = {
>   */
>  static int __init init_menu(void)
>  {
> +	int ret = menu_add_interface(cpu_subsys.dev_root);
> +
> +	if (ret)
> +		return ret;
>  	return cpuidle_register_governor(&menu_governor);
>  }

next prev parent reply	other threads:[~2017-10-20  0:17 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <a181bf42-9462-476c-6dcd-39fc7151957f@tu-dresden.de>
2017-07-27 12:50 ` [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time Thomas Ilsche
2017-10-19  7:46   ` Len Brown
2017-10-20 16:31     ` Thomas Ilsche
2017-10-21 14:27     ` Doug Smythies
2017-10-20  0:17 ` Doug Smythies [this message]
2017-10-20 17:13   ` Thomas Ilsche
2017-10-21 14:28   ` Doug Smythies
2017-11-07 23:04     ` Thomas Ilsche
2017-11-08  4:53       ` Len Brown
2017-11-08  6:01         ` Yu Chen
2017-11-08 16:26         ` Doug Smythies
2017-11-08 16:26     ` Doug Smythies
2017-11-10 17:42     ` Doug Smythies
2017-11-14  6:12     ` Doug Smythies
2017-11-16 16:11       ` Thomas Ilsche
2017-11-16 22:47       ` Doug Smythies
2017-11-24 17:36       ` Doug Smythies
2017-12-02 12:56         ` Thomas Ilsche
2017-12-15 10:44           ` Thomas Ilsche
2017-12-15 14:23             ` Rafael J. Wysocki
2017-12-21  9:43               ` Thomas Ilsche
2017-12-22 19:37                 ` Rafael J. Wysocki
2017-12-15 16:16             ` Doug Smythies
2017-12-16  2:34               ` Rafael J. Wysocki
2017-11-25 16:30       ` Doug Smythies

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000101d34938$da740870$8f5c1950$@net' \
    --to=dsmythies@telus.net \
    --cc=alex.shi@linaro.org \
    --cc=daniel.hackenberg@tu-dresden.de \
    --cc=daniel.lezcano@linaro.org \
    --cc=lenb@kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mario.bielert@tu-dresden.de \
    --cc=mhaehnel@os.inf.tu-dresden.de \
    --cc=mingo@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@redhat.com \
    --cc=robert.schoene@tu-dresden.de \
    --cc=thomas.ilsche@tu-dresden.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).