Re: [RFC 1/1] Weighted approach to gather and use history in TEO governor

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: Pratik Rajesh Sampat <psampat@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, rafael.j.wysocki@intel.com,
	peterz@infradead.org, dsmythies@telus.net,
	daniel.lezcano@linaro.org, ego@linux.vnet.ibm.com,
	svaidy@linux.ibm.com, pratik.sampat@in.ibm.com,
	pratik.r.sampat@gmail.com
Subject: Re: [RFC 1/1] Weighted approach to gather and use history in TEO governor
Date: Tue, 25 Feb 2020 11:48:57 +0530	[thread overview]
Message-ID: <20200225061857.GH12846@in.ibm.com> (raw)
In-Reply-To: <20200222070002.12897-2-psampat@linux.ibm.com>

On Sat, Feb 22, 2020 at 12:30:02PM +0530, Pratik Rajesh Sampat wrote:
> Complementing the current self correcting window algorithm, an alternate
> approach is devised to leverage lifetime history with constant overhead
> 
> Each CPU maintains a matrix wherein each idle state maintains a
> probability distribution.
> 
> The probability distribution is nothing but a n*n matrix, where
> n = drv->state_count.
> Each entry in the array signifies a weight for that row.
> The weights can vary from the range [0-10000].
> 
> For example:
> state_mat[1][2] = 3000 means that previously when state 1 was selected,
> the probability that state 2 will occur is 30%.
> The trailing zeros correspond to having more resolution while increasing
> or reducing the weights for correction.
> 
> Initially the weights are distributed in a way such that the index of
> that state in question has a higher probability of choosing itself, as
> we have no reason to believe otherwise yet. Initial bias to itself is
> 70% and the rest 30% is equally distributed to the rest of the states.
> 
> Selection of an idle state:
> When the TEO governor chooses an idle state, the probability
> distribution for that state is looked at. A weighted random number
> generator is used using the weights as bias to choose the next idle
> state. The algorithm leans to choose that or a shallower state than that
> for its next prediction
> 
> Correction of the probability distribution:
> On wakeup, the weights are updated. The state which it should have woken
> up with (could be the hit / miss / early hit state) is increased in
> weight by the "LEARNING_RATE" % and the rest of the states for that
> index are reduced by the same factor.
> The LEARNING RATE is experimentally chosen to be 5 %


I know this is an RFC patch, not meant for inclusion, but it is good
practice to have your Signed-off-by.


> ---
>  drivers/cpuidle/governors/teo.c | 95 +++++++++++++++++++++++++++++++--
>  1 file changed, 90 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> index de7e706efd46..8060c287f5e4 100644
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -50,6 +50,7 @@
>  #include <linux/kernel.h>
>  #include <linux/sched/clock.h>
>  #include <linux/tick.h>
> +#include <linux/random.h>
> 
>  /*
>   * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
> @@ -64,6 +65,12 @@
>   */
>  #define INTERVALS	8
> 
> +/*
> + * Percentage of the amount of weight to be shifted in the idle state weight
> + * distribution for correction
> + */
> +#define LEARNING_RATE	5
> +
>  /**
>   * struct teo_idle_state - Idle state data used by the TEO cpuidle governor.
>   * @early_hits: "Early" CPU wakeups "matching" this state.
> @@ -98,6 +105,8 @@ struct teo_idle_state {
>   * @states: Idle states data corresponding to this CPU.
>   * @interval_idx: Index of the most recent saved idle interval.
>   * @intervals: Saved idle duration values.
> + * @state_mat: Each idle state maintains a weights corresponding to that
> + * state, storing the probablity distribution of occurance for that state
>   */
>  struct teo_cpu {
>  	u64 time_span_ns;
> @@ -105,6 +114,7 @@ struct teo_cpu {
>  	struct teo_idle_state states[CPUIDLE_STATE_MAX];
>  	int interval_idx;
>  	u64 intervals[INTERVALS];
> +	int state_mat[CPUIDLE_STATE_MAX][CPUIDLE_STATE_MAX];
>  };
> 
>  static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
> @@ -117,7 +127,7 @@ static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
>  static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  {
>  	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
> -	int i, idx_hit = -1, idx_timer = -1;
> +	int i, idx_hit = -1, idx_timer = -1, idx = -1, last_idx = dev->last_state_idx;
>  	u64 measured_ns;
> 
>  	if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
> @@ -183,16 +193,50 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> 
>  		if (idx_timer > idx_hit) {
>  			misses += PULSE;
> -			if (idx_hit >= 0)
> +			idx = idx_timer;
> +			if (idx_hit >= 0) {
>  				cpu_data->states[idx_hit].early_hits += PULSE;
> +				idx = idx_hit;
> +			}
>  		} else {
>  			hits += PULSE;
> +			idx = last_idx;
>  		}
> 
>  		cpu_data->states[idx_timer].misses = misses;
>  		cpu_data->states[idx_timer].hits = hits;
>  	}
> 
> +	/*
> +	 * Rearrange the weight distribution of the state, increase the weight
> +	 * by the LEARNING RATE % for the idle state that was supposed to be
> +	 * chosen and reduce by the same amount for rest of the states
> +	 *
> +	 * If the weights are greater than (100 - LEARNING_RATE) % or lesser
> +	 * than LEARNING_RATE %, do not increase or decrease the confidence
> +	 * respectively
> +	 */
> +	for (i = 0; i < drv->state_count; i++) {
> +		unsigned int delta;
> +
> +		if (idx == -1)
> +			break;
> +		if (i ==  idx) {
> +			delta = (LEARNING_RATE * cpu_data->state_mat[last_idx][i]) / 100;
> +			if (cpu_data->state_mat[last_idx][i] + delta >=
> +			    (100 - LEARNING_RATE) * 100)
> +				continue;
> +			cpu_data->state_mat[last_idx][i] += delta;
> +			continue;
> +		}
> +		delta = (LEARNING_RATE * cpu_data->state_mat[last_idx][i]) /
> +			((drv->state_count - 1) * 100);


What happens when drv->state_count == 1?

> +		if (cpu_data->state_mat[last_idx][i] - delta <=
> +		    LEARNING_RATE * 100)
> +			continue;
> +		cpu_data->state_mat[last_idx][i] -= delta;

So, even update takes O(n) time, since we have to increase the weight
for one state, and correspondingly decrease the state for all the
other states.


> +	}
> +
>  	/*
>  	 * Save idle duration values corresponding to non-timer wakeups for
>  	 * pattern detection.
> @@ -244,7 +288,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
>  	u64 duration_ns;
>  	unsigned int hits, misses, early_hits;
> -	int max_early_idx, prev_max_early_idx, constraint_idx, idx, i;
> +	int max_early_idx, prev_max_early_idx, constraint_idx, idx, i, og_idx;
>  	ktime_t delta_tick;
> 
>  	if (dev->last_state_idx >= 0) {
> @@ -374,10 +418,13 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  	if (constraint_idx < idx)
>  		idx = constraint_idx;
> 
> +	og_idx = idx;
> +
>  	if (idx < 0) {
>  		idx = 0; /* No states enabled. Must use 0. */
>  	} else if (idx > 0) {
> -		unsigned int count = 0;
> +		unsigned int count = 0, sum_weights = 0, weights_list[CPUIDLE_STATE_MAX];
> +		int i, j = 0, rnd_wt, rnd_num = 0;
>  		u64 sum = 0;
> 
>  		/*
> @@ -412,6 +459,29 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  								       idx, avg_ns);
>  			}
>  		}
> +		/*
> +		 * In case, the recent history yields a shallower state, then
> +		 * the probability distribution is looked at.
> +		 * The weighted random number generator uses the weights as a
> +		 * bias to choose the next idle state
> +		 */
> +		if (og_idx != idx) {
> +			for (i = 0; i <= idx; i++) {


So it seems like we are restricting our choice to states no deeper
than the selected state.

Is it not possible that cpu_data->state_mat[idx][j] has the
maximum weight when j > idx ? If yes, why are we leaving those states
out ?

> +				if (dev->states_usage[i].disable)
> +					continue;
> +				sum_weights += cpu_data->state_mat[idx][i];
> +				weights_list[j++] = sum_weights;
> +			}

Assume that cpu_data->stat_mat[idx] = {4, 5, 6, 9}
weight_list[] = {4, 9, 15, 24}

> +			get_random_bytes(&rnd_num, sizeof(rnd_num));
> +			rnd_num = rnd_num % 100;

Assume rnd_num = 50.
> +			rnd_wt = (rnd_num * sum_weights) / 100;


Then, rnd_wt = 12.

From the logic, below, it appears that you want to pick the shallowest
state i at which rnd_wt < weights_list[i]. In which case it would be
the state corresponding to the weight 6 (as the cumulative weight at
that point is 15).


> +			for (i = 0; i < j; i++) {
> +				if (rnd_wt < weights_list[i])
> +					break;
> +				rnd_wt -= weights_list[i];


And yet, because of this additional subtraction, after the first
iteration of this loop, rnd_wt = 12 - 4 = 8, which means that you will
end up picking the state corresponding to weight 5 (cumulative weight
being 9 at this point).

This doesn't seem right.

> +			}
> +			idx = i;
> +		}
>  	}
> 
>  	/*
> @@ -468,13 +538,28 @@ static int teo_enable_device(struct cpuidle_driver *drv,
>  			     struct cpuidle_device *dev)
>  {
>  	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
> -	int i;
> +	int i, j;
> 
>  	memset(cpu_data, 0, sizeof(*cpu_data));
> 
>  	for (i = 0; i < INTERVALS; i++)
>  		cpu_data->intervals[i] = U64_MAX;
> 
> +	/*
> +	 * Populate initial weights for each state
> +	 * The stop state is initially more biased for itself.
> +	 *
> +	 * Currently the initial distribution of probabilities are 70%-30%.
> +	 * The trailing 0s are for increased resolution.
> +	 */
> +	for (i = 0; i < drv->state_count; i++) {
> +		for (j = 0; j < drv->state_count; j++) {
> +			if (i == j)
> +				cpu_data->state_mat[i][j] = 7000;
> +			else
> +				cpu_data->state_mat[i][j] = 3000 / (drv->state_count - 1);


> +		}
> +	}
>  	return 0;
>  }
> 
> -- 
> 2.17.1
>

next prev parent reply	other threads:[~2020-02-25  6:25 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-22  7:00 [RFC 0/1] Weighted approach to gather and use history in TEO governor Pratik Rajesh Sampat
2020-02-22  7:00 ` [RFC 1/1] " Pratik Rajesh Sampat
2020-02-25  6:18   ` Gautham R Shenoy [this message]
2020-02-29  8:58     ` Pratik Sampat
2020-02-25  5:13 ` [RFC 0/1] " Gautham R Shenoy
2020-02-27 16:14   ` Doug Smythies
2020-02-29  8:58     ` Pratik Sampat
2020-02-29  8:58   ` Pratik Sampat
  -- strict thread matches above, loose matches on Subject: below --
2020-05-11 14:10 [RFC 0/1] Alternate history mechanism for the " Pratik Rajesh Sampat
2020-05-11 14:10 ` [RFC 1/1] Weighted approach to gather and use history in " Pratik Rajesh Sampat
2020-05-12 17:37   ` Peter Zijlstra
2020-05-13  5:31     ` Pratik Sampat
2020-05-13 14:49       ` Rafael J. Wysocki
2020-05-14 15:35         ` Pratik Sampat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200225061857.GH12846@in.ibm.com \
    --to=ego@linux.vnet.ibm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=dsmythies@telus.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=pratik.r.sampat@gmail.com \
    --cc=pratik.sampat@in.ibm.com \
    --cc=psampat@linux.ibm.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=svaidy@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.