All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Doug Smythies" <dsmythies@telus.net>
To: 'Thomas Ilsche' <thomas.ilsche@tu-dresden.de>
Cc: "'Marcus Hähnel'" <mhaehnel@os.inf.tu-dresden.de>,
	"'Daniel Hackenberg'" <daniel.hackenberg@tu-dresden.de>,
	"'Robert Schöne'" <robert.schoene@tu-dresden.de>,
	mario.bielert@tu-dresden.de,
	"'Rafael J. Wysocki'" <rafael.j.wysocki@intel.com>,
	"'Alex Shi'" <alex.shi@linaro.org>,
	"'Ingo Molnar'" <mingo@kernel.org>,
	"'Rik van Riel'" <riel@redhat.com>,
	"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
	"'Nicholas Piggin'" <npiggin@gmail.com>,
	linux-pm@vger.kernel.org, "'Len Brown'" <lenb@kernel.org>
Subject: RE: [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time
Date: Thu, 19 Oct 2017 17:17:42 -0700	[thread overview]
Message-ID: <000101d34938$da740870$8f5c1950$@net> (raw)
In-Reply-To: aiVvdxj7gLueDaiVwdfXxs

Hi Thomas,

Note 1: I think it is just a coincidence that Len Brown also replied to
	this old e-mail earlier today.

Note 2: Rafael referred me to this e-mail about a week ago, after I was
	complaining about excessive energy consumption due to unused CPUs
	not going idle when they should. My example use case was a 100%
	load on one CPU.

Note 3: The patch does not work for me as it was sent. See notes in-line
	further below.

On 2017.07.27 05:50 Thomas Ilsche wrote:

> On several contemporary Intel systems, we have observed that the idle
> power consumption is sometimes significantly too high, e.g. 105 W instead
> of 74 W for several seconds.

On my test system, I do not observe this magnitude of excessive idle energy.
While I do observe occurrences of excessive time spent in idle state 0, they
do not add up to significant energy waste.

> We tracked this issue down to patterns that confuse the heuristic of the
> menu idle governor. If the governor observes several consecutive short
> sleeps, it does expect another short sleep duration despite no immediate
> timers, sending the CPU into a shallow sleep state. On a dyntick-idle
> kernel, there are no means for the core to enter a more efficient sleep
> state if the prediction was wrong. Ironically this is particularly
> problematic if some cores of the system have very infrequent activity, i.e.
> many cores and optimized configuration for low idle power.

If I use your "powernightmare" method (or idle-virus as Len called it)
I get a drastic package power consumption difference between
fallback_timer_enabled being enabled or disabled. It seems mainly due
to changes in time spent in idle state 2. There was also a contribution
from idle state 0 on one CPU. This was a surprise to me as my thinking was
that wasted energy was dominated by extra time in idle state 0.

Example data (kernel 4.14-rc5 + this patch):

Average package power with fallback timer disabled: ~8.06 watts
Average package power with fallback timer enabled: ~3.95 watts

> The effect, cause, triggers, mitigation technique and verification thereof
> are described in detail in the a paper that is pending publication [1].

The paper is very interesting. Thank you.

> Fixing the heuristic to make a better prediction does not seem to be
> generally feasible. The following patch addresses this issue by setting a
> timer such that when the an expected immediate wake-up fails to appear, the
> CPU is woken up to go into a deeper sleep state. If the heuristic was
> right, the timer is simply canceled.
>
> Please consider the patch a draft for discussion. We need to address:
>
>  * Avoid programming the fallback timer when the deepest sleep state is
>    already chosen.
>  * Determine good default values for the introduced configurables. This
>    is difficult due to the large variety of system configurations affected
>    by the menu governor. Nevertheless we believe this can be done such
>     that many systems benefit without significant degradation in any case.
>  * Document (or remove) the sysfs entries.
>
> But first, I would like to invite comments if this is going in the right
> direction, or if this should be addressed in a different way.

The problem I am struggling with is the patch makes no difference
for my example use case of 100% load on one CPU, others idle.
The wasted power is entirely from idle state 0, and idle state 0
times remain about the same with or without the fallback timer.
If I merely disable idle state 0, things are great. However,
just disabling idle state 0 does not help when "powermightmares"
are present.

Example data (kernel 4.14-rc5 + this patch):

Average package power with fallback timer disabled: ~27.2 watts
Average package power with fallback timer enabled: ~28 watts
Average package power with state 0 disabled, fallback timer disabled: ~23.9 watts
Average package power with state 0 disabled, fallback timer enabled: ~23.9 watts

Question: Could this be because I made a mistake re-basing the patch to kernel 4.14-rc5?
Answer: Perhaps. I am unfamiliar with this area of the code.

>
> [1] https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
>
> Signed-off-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
> Signed-off-by: Marcus Hähnel <mhaehnel@os.inf.tu-dresden.de>
> ---
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 61b64c2b2cb8..45bbeb362809 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -22,6 +22,7 @@
>   #include <linux/sched/stat.h>
>   #include <linux/math64.h>
>   #include <linux/cpu.h>
> +#include <linux/smp.h>
>   
>   /*
>    * Please note when changing the tuning values:
> @@ -130,6 +131,10 @@ struct menu_device {
>   	unsigned int	correction_factor[BUCKETS];
>   	unsigned int	intervals[INTERVALS];
>   	int		interval_ptr;
> +
> +	struct hrtimer	fallback_timer;
> +	int             have_timer;
> +	unsigned int	disregard_past;
>   };
>   
>   
> @@ -190,6 +195,85 @@ static inline int performance_multiplier(unsigned long nr_iowaiters, unsigned lo
>   
>   static DEFINE_PER_CPU(struct menu_device, menu_devices);
>   
> +static unsigned int fallback_timer_disregard_past = 1;
> +static unsigned int diff_threshold_bits = 8;
> +static unsigned int fallback_timer_enabled;

Shouldn't fallback_timer_enabled be initialized?
On my system it defaults to 0, or disabled, and due to another problem,
see further down, this patch wasn't working.

> +static unsigned int fallback_timer_interval_us = 10000;
> +
> +#define MENU_ATTR_RW(name, var, range_min, range_max, wfun) \
> +	static ssize_t show_##name(struct device *dev, \
> +				   struct device_attribute *attr, char *buf) \
> +	{ \
> +		return snprintf(buf, 12, "%i\n", var); \
> +	} \
> +	static ssize_t store_##name(struct device *dev, \
> +				    struct device_attribute *attr, \
> +				    const char *buf, size_t count) \
> +	{ \
> +		unsigned int tmp; \
> +		int ret = kstrtouint(buf, 10, &tmp); \
> +		if (ret != 1) \

Shouldn't this be:

 +		if (ret != 0) \

(or maybe just " if (ret) \". I did (and therefore tested) the previous.

?
Anyway, it doesn't work unless I make this change.

> +			return -EINVAL; \
> +		if (tmp > range_max || tmp < range_min) { \
> +			pr_warn("value outside of valid range [%u, %u]\n", \
> +				range_min, range_max); \
> +			return -EINVAL; \
> +		} \
> +		var = tmp; \
> +		wfun \
> +		return count; \
> +	} \
> +	static DEVICE_ATTR(fallback_timer_##name, 0644, \
> +			   show_##name, store_##name)
> +
> +MENU_ATTR_RW(threshold_bits, diff_threshold_bits, 0, 31, {});
> +
> +MENU_ATTR_RW(enable, fallback_timer_enabled, 0, 1, {
> +	int i;
> +
> +	for_each_possible_cpu(i) {
> +		struct menu_device *data = per_cpu_ptr(&menu_devices, i);
> +
> +		if (!fallback_timer_enabled) {
> +			data->have_timer = 0;
> +			hrtimer_cancel(&(data->fallback_timer));
> +		}
> +	} });
> +
> +MENU_ATTR_RW(interval_us, fallback_timer_interval_us, 1, 1000000, {});
> +
> +MENU_ATTR_RW(disregard_past, fallback_timer_disregard_past, 0, 1, {
> +	int i;
> +
> +	for_each_possible_cpu(i) {
> +		struct menu_device *data = per_cpu_ptr(&menu_devices, i);
> +
> +		data->disregard_past = 0;
> +	} });
> +
> +static struct attribute *menu_attrs[] = {
> +	&dev_attr_fallback_timer_threshold_bits.attr,
> +	&dev_attr_fallback_timer_enable.attr,
> +	&dev_attr_fallback_timer_interval_us.attr,
> +	&dev_attr_fallback_timer_disregard_past.attr,
> +	NULL
> +};
> +
> +static struct attribute_group menu_attr_group = {
> +	.attrs = menu_attrs,
> +	.name = "cpuidle_menu",
> + };
> +
> +int menu_add_interface(struct device *dev)
> +{
> +	return sysfs_create_group(&dev->kobj, &menu_attr_group);
> +}
> +
> +void menu_remove_interface(struct device *dev)
> +{
> +	sysfs_remove_group(&dev->kobj, &menu_attr_group);
> +}
> +
>   static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev);
>   
>   /*
> @@ -275,6 +359,14 @@ static unsigned int get_typical_interval(struct menu_device *data)
>   	goto again;
>   }
>   
> +static enum hrtimer_restart fallback_timer_fun(struct hrtimer *tmr)
> +{
> +	struct menu_device *mdata = this_cpu_ptr(&menu_devices);
> +
> +	mdata->disregard_past = 1;
> +	return HRTIMER_NORESTART;
> +}
> +
>   /**
>    * menu_select - selects the next idle state to enter
>    * @drv: cpuidle driver containing state data
> @@ -293,6 +385,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   	unsigned long nr_iowaiters, cpu_load;
>   	int resume_latency = dev_pm_qos_raw_read_value(device);
>   
> +	if (fallback_timer_enabled && data->have_timer) {
> +		data->have_timer = 0;
> +		hrtimer_cancel(&(data->fallback_timer));
> +	}
> +
>   	if (data->needs_update) {
>   		menu_update(drv, dev);
>   		data->needs_update = 0;
> @@ -322,7 +419,32 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   					 RESOLUTION * DECAY);
>   
>   	expected_interval = get_typical_interval(data);
> -	expected_interval = min(expected_interval, data->next_timer_us);
> +
> +	if (fallback_timer_enabled && fallback_timer_disregard_past
> +	    && data->disregard_past)  {
> +		expected_interval = data->next_timer_us;
> +		// Only disregard the past once! Then try again
> +		data->disregard_past = 0;
> +	} else {
> +		if (fallback_timer_enabled
> +		    && expected_interval < (data->next_timer_us >> diff_threshold_bits)
> +		    && data->next_timer_us > fallback_timer_interval_us * 2) {
> +			/*
> +			 * Program the fallback timer if the gap between the
> +			 * expected interval by heuristic and the next regular
> +			 * timer are too far apart.
> +			 * However, only do this when we didn't just wakup from
> +			 * a timer and are told to disregard the heuristic
> +			 */
> +			ktime_t interval =
> +				ktime_set(0, fallback_timer_interval_us * 1000);
> +
> +			hrtimer_start(&(data->fallback_timer), interval,
> +				      HRTIMER_MODE_REL_PINNED);
> +			data->have_timer = 1;
> +		}
> +		expected_interval = min(expected_interval, data->next_timer_us);
> +	}
>  
>  	if (CPUIDLE_DRIVER_STATE_START > 0) {
>  		struct cpuidle_state *s = &drv->states[CPUIDLE_DRIVER_STATE_START];
> @@ -398,6 +520,11 @@ static void menu_reflect(struct cpuidle_device *dev, int index)
>  {
>  	struct menu_device *data = this_cpu_ptr(&menu_devices);
>  
> +	if (fallback_timer_enabled && data->have_timer) {
> +		data->have_timer = 0;
> +		hrtimer_cancel(&data->fallback_timer);
> +	}
> +
>  	data->last_state_idx = index;
>  	data->needs_update = 1;
>  }
> @@ -486,6 +613,10 @@ static int menu_enable_device(struct cpuidle_driver *drv,
>  
>  	memset(data, 0, sizeof(struct menu_device));
>   
> +	hrtimer_init(&(data->fallback_timer),
> +		     CLOCK_REALTIME, HRTIMER_MODE_REL_PINNED);
> +	data->fallback_timer.function = fallback_timer_fun;
> +
>  	/*
>  	 * if the correction factor is 0 (eg first time init or cpu hotplug
>  	 * etc), we actually want to start out with a unity factor.
> @@ -509,6 +640,10 @@ static struct cpuidle_governor menu_governor = {
>   */
>  static int __init init_menu(void)
>  {
> +	int ret = menu_add_interface(cpu_subsys.dev_root);
> +
> +	if (ret)
> +		return ret;
>  	return cpuidle_register_governor(&menu_governor);
>  }
 

  parent reply	other threads:[~2017-10-20  0:17 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <a181bf42-9462-476c-6dcd-39fc7151957f@tu-dresden.de>
2017-07-27 12:50 ` [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time Thomas Ilsche
2017-10-19  7:46   ` Len Brown
2017-10-20 16:31     ` Thomas Ilsche
2017-10-21 14:27     ` Doug Smythies
2017-10-20  0:17 ` Doug Smythies [this message]
2017-10-20 17:13   ` Thomas Ilsche
2017-10-21 14:28   ` Doug Smythies
2017-11-07 23:04     ` Thomas Ilsche
2017-11-08  4:53       ` Len Brown
2017-11-08  6:01         ` Yu Chen
2017-11-08 16:26         ` Doug Smythies
2017-11-08 16:26     ` Doug Smythies
2017-11-10 17:42     ` Doug Smythies
2017-11-14  6:12     ` Doug Smythies
2017-11-16 16:11       ` Thomas Ilsche
2017-11-16 22:47       ` Doug Smythies
2017-11-24 17:36       ` Doug Smythies
2017-12-02 12:56         ` Thomas Ilsche
2017-12-15 10:44           ` Thomas Ilsche
2017-12-15 14:23             ` Rafael J. Wysocki
2017-12-21  9:43               ` Thomas Ilsche
2017-12-22 19:37                 ` Rafael J. Wysocki
2017-12-15 16:16             ` Doug Smythies
2017-12-16  2:34               ` Rafael J. Wysocki
2017-11-25 16:30       ` Doug Smythies

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000101d34938$da740870$8f5c1950$@net' \
    --to=dsmythies@telus.net \
    --cc=alex.shi@linaro.org \
    --cc=daniel.hackenberg@tu-dresden.de \
    --cc=daniel.lezcano@linaro.org \
    --cc=lenb@kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mario.bielert@tu-dresden.de \
    --cc=mhaehnel@os.inf.tu-dresden.de \
    --cc=mingo@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@redhat.com \
    --cc=robert.schoene@tu-dresden.de \
    --cc=thomas.ilsche@tu-dresden.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.