[PATCH v1] cpuidle: menu: Update documentation after previous changes

Linux Power Management development

* [PATCH v1] cpuidle: menu: Update documentation after previous changes
@ 2025-01-10 12:46 Rafael J. Wysocki
  2025-01-10 14:19 ` Christian Loehle
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2025-01-10 12:46 UTC
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy, Christian Loehle

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

After commit 38f83090f515 ("cpuidle: menu: Remove iowait influence") and
other previous changes, the description of the menu governor in the
documentation does not match the code any more, so update it as
appropriate.

Fixes: 38f83090f515 ("cpuidle: menu: Remove iowait influence")
Fixes: 5484e31bbbff ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases")
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/admin-guide/pm/cpuidle.rst |   72 +++++++++++++------------------
 1 file changed, 31 insertions(+), 41 deletions(-)

--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -269,27 +269,7 @@
 the CPU will ask the processor hardware to enter), it attempts to predict the
 idle duration and uses the predicted value for idle state selection.
 
-It first obtains the time until the closest timer event with the assumption
-that the scheduler tick will be stopped.  That time, referred to as the *sleep
-length* in what follows, is the upper bound on the time before the next CPU
-wakeup.  It is used to determine the sleep length range, which in turn is needed
-to get the sleep length correction factor.
-
-The ``menu`` governor maintains two arrays of sleep length correction factors.
-One of them is used when tasks previously running on the given CPU are waiting
-for some I/O operations to complete and the other one is used when that is not
-the case.  Each array contains several correction factor values that correspond
-to different sleep length ranges organized so that each range represented in the
-array is approximately 10 times wider than the previous one.
-
-The correction factor for the given sleep length range (determined before
-selecting the idle state for the CPU) is updated after the CPU has been woken
-up and the closer the sleep length is to the observed idle duration, the closer
-to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
-The sleep length is multiplied by the correction factor for the range that it
-falls into to obtain the first approximation of the predicted idle duration.
-
-Next, the governor uses a simple pattern recognition algorithm to refine its
+It first uses a simple pattern recognition algorithm to obtain a preliminary
 idle duration prediction.  Namely, it saves the last 8 observed idle duration
 values and, when predicting the idle duration next time, it computes the average
 and variance of them.  If the variance is small (smaller than 400 square
@@ -301,29 +281,39 @@
 taken as the "typical interval" value and so on, until either the "typical
 interval" is determined or too many data points are disregarded, in which case
 the "typical interval" is assumed to equal "infinity" (the maximum unsigned
-integer value).  The "typical interval" computed this way is compared with the
-sleep length multiplied by the correction factor and the minimum of the two is
-taken as the predicted idle duration.
-
-Then, the governor computes an extra latency limit to help "interactive"
-workloads.  It uses the observation that if the exit latency of the selected
-idle state is comparable with the predicted idle duration, the total time spent
-in that state probably will be very short and the amount of energy to save by
-entering it will be relatively small, so likely it is better to avoid the
-overhead related to entering that state and exiting it.  Thus selecting a
-shallower state is likely to be a better option then.   The first approximation
-of the extra latency limit is the predicted idle duration itself which
-additionally is divided by a value depending on the number of tasks that
-previously ran on the given CPU and now they are waiting for I/O operations to
-complete.  The result of that division is compared with the latency limit coming
-from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
-framework and the minimum of the two is taken as the limit for the idle states'
-exit latency.
+integer value).
+
+If the "typical interval" computed this way is long enough, the governor obtains
+the time until the closest timer event with the assumption that the scheduler
+tick will be stopped.  That time, referred to as the *sleep length* in what follows,
+is the upper bound on the time before the next CPU wakeup.  It is used to determine
+the sleep length range, which in turn is needed to get the sleep length correction
+factor.
+
+The ``menu`` governor maintains an array containing several correction factor
+values that correspond to different sleep length ranges organized so that each
+range represented in the array is approximately 10 times wider than the previous
+one.
+
+The correction factor for the given sleep length range (determined before
+selecting the idle state for the CPU) is updated after the CPU has been woken
+up and the closer the sleep length is to the observed idle duration, the closer
+to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
+The sleep length is multiplied by the correction factor for the range that it
+falls into to obtain an approximation of the predicted idle duration that is
+compared to the "typical interval" determined previously and the minimum of
+the two is taken as the idle duration prediction.
+
+If the "typical interval" value is small, which means that the CPU is likely
+to be woken up soon enough, the sleep length computation is skipped as it may
+be costly and the idle duration is simply predicted to equal the "typical
+interval" value.
 
 Now, the governor is ready to walk the list of idle states and choose one of
 them.  For this purpose, it compares the target residency of each state with
-the predicted idle duration and the exit latency of it with the computed latency
-limit.  It selects the state with the target residency closest to the predicted
+the predicted idle duration and the exit latency of it with the latency
+limit coming from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
+framework.  It selects the state with the target residency closest to the predicted
 idle duration, but still below it, and exit latency that does not exceed the
 limit.
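
The last two steps described in the updated text (combining the "typical
interval" with the corrected sleep length, then walking the idle states) can
be sketched roughly as below. This is only an illustration under stated
assumptions, not the code in menu.c: struct idle_state, predict_idle(),
select_state(), the short_limit threshold, the per-mille encoding of the
correction factor and the example data are all hypothetical names and values
chosen for the sketch. It also assumes that the states are listed from the
shallowest to the deepest and that state 0 is always acceptable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle state description; field names follow the text above. */
struct idle_state {
	uint32_t target_residency;	/* shortest idle time worth entering the state */
	uint32_t exit_latency;		/* time needed to leave the state */
};

/*
 * Combine the "typical interval" with the corrected sleep length.
 * short_limit and corr_permille are assumptions of this sketch: the text
 * only says "long enough" and "between 0 and 1 inclusive".
 */
uint32_t predict_idle(uint32_t typical, uint32_t short_limit,
		      uint32_t (*get_sleep_length)(void),
		      uint32_t corr_permille)
{
	uint64_t corrected;

	/* Short typical interval: skip the possibly costly sleep length lookup. */
	if (typical <= short_limit)
		return typical;

	corrected = (uint64_t)get_sleep_length() * corr_permille / 1000;

	/* The minimum of the two is taken as the idle duration prediction. */
	return corrected < typical ? (uint32_t)corrected : typical;
}

/*
 * Pick the deepest state whose target residency does not exceed the
 * prediction and whose exit latency does not exceed the latency limit.
 * Assumes states[] is ordered by increasing target residency.
 */
size_t select_state(const struct idle_state *states, size_t count,
		    uint32_t predicted, uint32_t latency_limit)
{
	size_t i, best = 0;

	for (i = 1; i < count; i++) {
		if (states[i].target_residency > predicted)
			break;		/* deeper states need even longer idle time */
		if (states[i].exit_latency > latency_limit)
			continue;	/* too slow to wake up for this limit */
		best = i;
	}
	return best;
}

/* Example data, made up for illustration only. */
unsigned int sleep_length_calls;

uint32_t example_sleep_length(void)
{
	sleep_length_calls++;
	return 10000;
}

const struct idle_state example_states[] = {
	{ 0, 0 }, { 20, 2 }, { 200, 50 }, { 2000, 300 },
};
```

Passing the sleep length as a callback is a choice made for the sketch so that
the "skipped as it may be costly" case is visible: with a typical interval at
or below short_limit, the callback is never invoked.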
 





* Re: [PATCH v1] cpuidle: menu: Update documentation after previous changes
  2025-01-10 12:46 [PATCH v1] cpuidle: menu: Update documentation after previous changes Rafael J. Wysocki
@ 2025-01-10 14:19 ` Christian Loehle
  0 siblings, 0 replies; 5+ messages in thread
From: Christian Loehle @ 2025-01-10 14:19 UTC
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy

On 1/10/25 12:46, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> After commit 38f83090f515 ("cpuidle: menu: Remove iowait influence") and
> other previous changes, the description of the menu governor in the
> documentation does not match the code any more, so update it as
> appropriate.
> 
> Fixes: 38f83090f515 ("cpuidle: menu: Remove iowait influence")
> Fixes: 5484e31bbbff ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases")
> Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  Documentation/admin-guide/pm/cpuidle.rst |   72 +++++++++++++------------------
>  1 file changed, 31 insertions(+), 41 deletions(-)
> 
> --- a/Documentation/admin-guide/pm/cpuidle.rst
> +++ b/Documentation/admin-guide/pm/cpuidle.rst
> @@ -269,27 +269,7 @@
>  the CPU will ask the processor hardware to enter), it attempts to predict the
>  idle duration and uses the predicted value for idle state selection.
>  
> -It first obtains the time until the closest timer event with the assumption
> -that the scheduler tick will be stopped.  That time, referred to as the *sleep
> -length* in what follows, is the upper bound on the time before the next CPU
> -wakeup.  It is used to determine the sleep length range, which in turn is needed
> -to get the sleep length correction factor.
> -
> -The ``menu`` governor maintains two arrays of sleep length correction factors.
> -One of them is used when tasks previously running on the given CPU are waiting
> -for some I/O operations to complete and the other one is used when that is not
> -the case.  Each array contains several correction factor values that correspond
> -to different sleep length ranges organized so that each range represented in the
> -array is approximately 10 times wider than the previous one.
> -
> -The correction factor for the given sleep length range (determined before
> -selecting the idle state for the CPU) is updated after the CPU has been woken
> -up and the closer the sleep length is to the observed idle duration, the closer
> -to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
> -The sleep length is multiplied by the correction factor for the range that it
> -falls into to obtain the first approximation of the predicted idle duration.
> -
> -Next, the governor uses a simple pattern recognition algorithm to refine its
> +It first uses a simple pattern recognition algorithm to obtain a preliminary
>  idle duration prediction.  Namely, it saves the last 8 observed idle duration
>  values and, when predicting the idle duration next time, it computes the average
>  and variance of them.  If the variance is small (smaller than 400 square
> @@ -301,29 +281,39 @@
>  taken as the "typical interval" value and so on, until either the "typical
>  interval" is determined or too many data points are disregarded, in which case
>  the "typical interval" is assumed to equal "infinity" (the maximum unsigned
> -integer value).  The "typical interval" computed this way is compared with the
> -sleep length multiplied by the correction factor and the minimum of the two is
> -taken as the predicted idle duration.
> -
> -Then, the governor computes an extra latency limit to help "interactive"
> -workloads.  It uses the observation that if the exit latency of the selected
> -idle state is comparable with the predicted idle duration, the total time spent
> -in that state probably will be very short and the amount of energy to save by
> -entering it will be relatively small, so likely it is better to avoid the
> -overhead related to entering that state and exiting it.  Thus selecting a
> -shallower state is likely to be a better option then.   The first approximation
> -of the extra latency limit is the predicted idle duration itself which
> -additionally is divided by a value depending on the number of tasks that
> -previously ran on the given CPU and now they are waiting for I/O operations to
> -complete.  The result of that division is compared with the latency limit coming
> -from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
> -framework and the minimum of the two is taken as the limit for the idle states'
> -exit latency.
> +integer value).
> +
> +If the "typical interval" computed this way is long enough, the governor obtains
> +the time until the closest timer event with the assumption that the scheduler
> +tick will be stopped.  That time, referred to as the *sleep length* in what follows,
> +is the upper bound on the time before the next CPU wakeup.  It is used to determine
> +the sleep length range, which in turn is needed to get the sleep length correction
> +factor.

The sleep length of course isn't really an upper bound before the next CPU wakeup,
we just treat it as such, but I guess the doc doesn't need to be too pedantic.

> +
> +The ``menu`` governor maintains an array containing several correction factor
> +values that correspond to different sleep length ranges organized so that each
> +range represented in the array is approximately 10 times wider than the previous
> +one.
> +
> +The correction factor for the given sleep length range (determined before
> +selecting the idle state for the CPU) is updated after the CPU has been woken
> +up and the closer the sleep length is to the observed idle duration, the closer
> +to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
> +The sleep length is multiplied by the correction factor for the range that it
> +falls into to obtain an approximation of the predicted idle duration that is
> +compared to the "typical interval" determined previously and the minimum of
> +the two is taken as the idle duration prediction.
> +
> +If the "typical interval" value is small, which means that the CPU is likely
> +to be woken up soon enough, the sleep length computation is skipped as it may
> +be costly and the idle duration is simply predicted to equal the "typical
> +interval" value.
>  
>  Now, the governor is ready to walk the list of idle states and choose one of
>  them.  For this purpose, it compares the target residency of each state with
> -the predicted idle duration and the exit latency of it with the computed latency
> -limit.  It selects the state with the target residency closest to the predicted
> +the predicted idle duration and the exit latency of it with the latency
> +limit coming from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
> +framework.  It selects the state with the target residency closest to the predicted
>  idle duration, but still below it, and exit latency that does not exceed the
>  limit.

Reviewed-by: Christian Loehle <christian.loehle@arm.com>



* [PATCH v1] cpuidle: menu: Update documentation after previous changes
@ 2025-02-20 20:13 Rafael J. Wysocki
  2025-02-24 12:41 ` Christian Loehle
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2025-02-20 20:13 UTC
  To: Linux PM
  Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy,
	Doug Smythies

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The documentaion of the menu cpuidle governor needs to be updated
to match the code bevavior after some changes made recently.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/admin-guide/pm/cpuidle.rst |   27 ++++++++++++++++-----------
 drivers/cpuidle/governors/menu.c         |   29 ++++++++++-------------------
 2 files changed, 26 insertions(+), 30 deletions(-)

--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -275,20 +275,25 @@
 and variance of them.  If the variance is small (smaller than 400 square
 milliseconds) or it is small relative to the average (the average is greater
 that 6 times the standard deviation), the average is regarded as the "typical
-interval" value.  Otherwise, the longest of the saved observed idle duration
+interval" value.  Otherwise, either the longest or the shortest (depending on
+which one is farther from the average) of the saved observed idle duration
 values is discarded and the computation is repeated for the remaining ones.
+
 Again, if the variance of them is small (in the above sense), the average is
 taken as the "typical interval" value and so on, until either the "typical
-interval" is determined or too many data points are disregarded, in which case
-the "typical interval" is assumed to equal "infinity" (the maximum unsigned
-integer value).
+interval" is determined or too many data points are disregarded.  In the latter
+case, if the size of the set of data points still under consideration is
+sufficiently large, the next idle duration is not likely to be above the largest
+idle duration value still in that set, so that value is taken as the predicted
+next idle duration.  Finally, if the set of data points still under
+consideration is too small, no prediction is made.
 
-If the "typical interval" computed this way is long enough, the governor obtains
-the time until the closest timer event with the assumption that the scheduler
-tick will be stopped.  That time, referred to as the *sleep length* in what follows,
-is the upper bound on the time before the next CPU wakeup.  It is used to determine
-the sleep length range, which in turn is needed to get the sleep length correction
-factor.
+If the preliminary prediction of the next idle duration computed this way is
+long enough, the governor obtains the time until the closest timer event with
+the assumption that the scheduler tick will be stopped.  That time, referred to
+as the *sleep length* in what follows, is the upper bound on the time before the
+next CPU wakeup.  It is used to determine the sleep length range, which in turn
+is needed to get the sleep length correction factor.
 
 The ``menu`` governor maintains an array containing several correction factor
 values that correspond to different sleep length ranges organized so that each
@@ -302,7 +307,7 @@
 The sleep length is multiplied by the correction factor for the range that it
 falls into to obtain an approximation of the predicted idle duration that is
 compared to the "typical interval" determined previously and the minimum of
-the two is taken as the idle duration prediction.
+the two is taken as the final idle duration prediction.
 
 If the "typical interval" value is small, which means that the CPU is likely
 to be woken up soon enough, the sleep length computation is skipped as it may
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -41,7 +41,7 @@
  * the  C state is required to actually break even on this cost. CPUIDLE
  * provides us this duration in the "target_residency" field. So all that we
  * need is a good prediction of how long we'll be idle. Like the traditional
- * menu governor, we start with the actual known "next timer event" time.
+ * menu governor, we take the actual known "next timer event" time.
  *
  * Since there are other source of wakeups (interrupts for example) than
  * the next timer event, this estimation is rather optimistic. To get a
@@ -50,30 +50,21 @@
  * duration always was 50% of the next timer tick, the correction factor will
  * be 0.5.
  *
- * menu uses a running average for this correction factor, however it uses a
- * set of factors, not just a single factor. This stems from the realization
- * that the ratio is dependent on the order of magnitude of the expected
- * duration; if we expect 500 milliseconds of idle time the likelihood of
- * getting an interrupt very early is much higher than if we expect 50 micro
- * seconds of idle time. A second independent factor that has big impact on
- * the actual factor is if there is (disk) IO outstanding or not.
- * (as a special twist, we consider every sleep longer than 50 milliseconds
- * as perfect; there are no power gains for sleeping longer than this)
- *
- * For these two reasons we keep an array of 12 independent factors, that gets
- * indexed based on the magnitude of the expected duration as well as the
- * "is IO outstanding" property.
+ * menu uses a running average for this correction factor, but it uses a set of
+ * factors, not just a single factor. This stems from the realization that the
+ * ratio is dependent on the order of magnitude of the expected duration; if we
+ * expect 500 milliseconds of idle time the likelihood of getting an interrupt
+ * very early is much higher than if we expect 50 micro seconds of idle time.
+ * For this reason, menu keeps an array of 6 independent factors, that gets
+ * indexed based on the magnitude of the expected duration.
  *
  * Repeatable-interval-detector
  * ----------------------------
  * There are some cases where "next timer" is a completely unusable predictor:
  * Those cases where the interval is fixed, for example due to hardware
- * interrupt mitigation, but also due to fixed transfer rate devices such as
- * mice.
+ * interrupt mitigation, but also due to fixed transfer rate devices like mice.
  * For this, we use a different predictor: We track the duration of the last 8
- * intervals and if the stand deviation of these 8 intervals is below a
- * threshold value, we use the average of these intervals as prediction.
- *
+ * intervals and use them to estimate the duration of the next one.
  */
 
 struct menu_device {
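
For readers following the documentation hunk above, the "typical interval"
procedure it describes can be sketched roughly as follows. This is a
simplified illustration, not the function in menu.c: typical_interval(),
STOP_DISCARDING, MIN_FOR_MAX and NO_PREDICTION are hypothetical names, and the
two set-size thresholds are assumptions, because the text only says "too many
data points" and "sufficiently large". The sketch is unit-agnostic and assumes
sample values below 2^30 so that the sums fit in 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SAMPLES	8		/* "the last 8 observed idle duration values" */
#define VARIANCE_LIMIT	400u		/* "small" variance, in squared time units */
#define STOP_DISCARDING	6		/* assumption: stop discarding at this set size */
#define MIN_FOR_MAX	4		/* assumption: set size needed to trust the maximum */
#define NO_PREDICTION	UINT32_MAX	/* sentinel for "no prediction is made" */

/* count is the number of valid samples, at most NR_SAMPLES. */
uint32_t typical_interval(const uint32_t *samples, size_t count)
{
	uint32_t set[NR_SAMPLES];
	size_t n = count < NR_SAMPLES ? count : NR_SAMPLES;
	size_t i;

	if (n == 0)
		return NO_PREDICTION;

	for (i = 0; i < n; i++)
		set[i] = samples[i];

	for (;;) {
		uint64_t sum = 0, sum_sq = 0, avg, variance;
		size_t min_i = 0, max_i = 0, drop;

		for (i = 0; i < n; i++) {
			sum += set[i];
			sum_sq += (uint64_t)set[i] * set[i];
			if (set[i] < set[min_i])
				min_i = i;
			if (set[i] > set[max_i])
				max_i = i;
		}
		avg = sum / n;
		variance = sum_sq / n - avg * avg;

		/*
		 * Small variance, or average greater than 6 standard
		 * deviations (avg^2 > 36 * variance): the average is the
		 * "typical interval".
		 */
		if (variance <= VARIANCE_LIMIT ||
		    (variance <= UINT64_MAX / 36 && avg * avg > 36 * variance))
			return (uint32_t)avg;

		/*
		 * Too many data points disregarded (or too few to begin
		 * with): fall back to the largest remaining value if the set
		 * is large enough, otherwise make no prediction.
		 */
		if (n <= STOP_DISCARDING)
			return n >= MIN_FOR_MAX ? set[max_i] : NO_PREDICTION;

		/* Discard whichever extreme is farther from the average. */
		drop = (avg - set[min_i] > set[max_i] - avg) ? min_i : max_i;
		set[drop] = set[n - 1];
		n--;
	}
}

/* Example inputs, made up for illustration only. */
const uint32_t example_steady[NR_SAMPLES]  = { 100, 100, 100, 100, 100, 100, 100, 100 };
const uint32_t example_outlier[NR_SAMPLES] = { 100, 100, 100, 100, 100, 100, 100, 5000 };
const uint32_t example_erratic[NR_SAMPLES] = { 10, 5000, 200, 90000, 3, 700, 40000, 1500 };
```

With these inputs, the steady and single-outlier sets both settle on the
average of the regular values, while the erratic set runs out of discards and
falls back to the largest value still under consideration.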





* Re: [PATCH v1] cpuidle: menu: Update documentation after previous changes
  2025-02-20 20:13 Rafael J. Wysocki
@ 2025-02-24 12:41 ` Christian Loehle
  2025-02-24 12:57   ` Rafael J. Wysocki
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Loehle @ 2025-02-24 12:41 UTC
  To: Rafael J. Wysocki, Linux PM
  Cc: LKML, Daniel Lezcano, Artem Bityutskiy, Doug Smythies

On 2/20/25 20:13, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The documentaion of the menu cpuidle governor needs to be updated
s/documentaion/documentation/
> to match the code bevavior after some changes made recently.

s/bevavior/behavior/

> 
> No functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  Documentation/admin-guide/pm/cpuidle.rst |   27 ++++++++++++++++-----------
>  drivers/cpuidle/governors/menu.c         |   29 ++++++++++-------------------
>  2 files changed, 26 insertions(+), 30 deletions(-)
> 
> --- a/Documentation/admin-guide/pm/cpuidle.rst
> +++ b/Documentation/admin-guide/pm/cpuidle.rst
> @@ -275,20 +275,25 @@
>  and variance of them.  If the variance is small (smaller than 400 square
>  milliseconds) or it is small relative to the average (the average is greater
>  that 6 times the standard deviation), the average is regarded as the "typical
> -interval" value.  Otherwise, the longest of the saved observed idle duration
> +interval" value.  Otherwise, either the longest or the shortest (depending on
> +which one is farther from the average) of the saved observed idle duration
>  values is discarded and the computation is repeated for the remaining ones.
> +
>  Again, if the variance of them is small (in the above sense), the average is
>  taken as the "typical interval" value and so on, until either the "typical
> -interval" is determined or too many data points are disregarded, in which case
> -the "typical interval" is assumed to equal "infinity" (the maximum unsigned
> -integer value).
> +interval" is determined or too many data points are disregarded.  In the latter
> +case, if the size of the set of data points still under consideration is
> +sufficiently large, the next idle duration is not likely to be above the largest
> +idle duration value still in that set, so that value is taken as the predicted
> +next idle duration.  Finally, if the set of data points still under
> +consideration is too small, no prediction is made.
>  
> -If the "typical interval" computed this way is long enough, the governor obtains
> -the time until the closest timer event with the assumption that the scheduler
> -tick will be stopped.  That time, referred to as the *sleep length* in what follows,
> -is the upper bound on the time before the next CPU wakeup.  It is used to determine
> -the sleep length range, which in turn is needed to get the sleep length correction
> -factor.
> +If the preliminary prediction of the next idle duration computed this way is
> +long enough, the governor obtains the time until the closest timer event with
> +the assumption that the scheduler tick will be stopped.  That time, referred to
> +as the *sleep length* in what follows, is the upper bound on the time before the
> +next CPU wakeup.  It is used to determine the sleep length range, which in turn
> +is needed to get the sleep length correction factor.
>  
>  The ``menu`` governor maintains an array containing several correction factor
>  values that correspond to different sleep length ranges organized so that each
> @@ -302,7 +307,7 @@
>  The sleep length is multiplied by the correction factor for the range that it
>  falls into to obtain an approximation of the predicted idle duration that is
>  compared to the "typical interval" determined previously and the minimum of
> -the two is taken as the idle duration prediction.
> +the two is taken as the final idle duration prediction.
>  
>  If the "typical interval" value is small, which means that the CPU is likely
>  to be woken up soon enough, the sleep length computation is skipped as it may
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -41,7 +41,7 @@
>   * the  C state is required to actually break even on this cost. CPUIDLE
>   * provides us this duration in the "target_residency" field. So all that we
>   * need is a good prediction of how long we'll be idle. Like the traditional
> - * menu governor, we start with the actual known "next timer event" time.
> + * menu governor, we take the actual known "next timer event" time.
>   *
>   * Since there are other source of wakeups (interrupts for example) than
>   * the next timer event, this estimation is rather optimistic. To get a
> @@ -50,30 +50,21 @@
>   * duration always was 50% of the next timer tick, the correction factor will
>   * be 0.5.
>   *
> - * menu uses a running average for this correction factor, however it uses a
> - * set of factors, not just a single factor. This stems from the realization
> - * that the ratio is dependent on the order of magnitude of the expected
> - * duration; if we expect 500 milliseconds of idle time the likelihood of
> - * getting an interrupt very early is much higher than if we expect 50 micro
> - * seconds of idle time. A second independent factor that has big impact on
> - * the actual factor is if there is (disk) IO outstanding or not.
> - * (as a special twist, we consider every sleep longer than 50 milliseconds
> - * as perfect; there are no power gains for sleeping longer than this)
> - *
> - * For these two reasons we keep an array of 12 independent factors, that gets
> - * indexed based on the magnitude of the expected duration as well as the
> - * "is IO outstanding" property.
> + * menu uses a running average for this correction factor, but it uses a set of
> + * factors, not just a single factor. This stems from the realization that the
> + * ratio is dependent on the order of magnitude of the expected duration; if we
> + * expect 500 milliseconds of idle time the likelihood of getting an interrupt
> + * very early is much higher than if we expect 50 micro seconds of idle time.
> + * For this reason, menu keeps an array of 6 independent factors, that gets
> + * indexed based on the magnitude of the expected duration.
>   *
>   * Repeatable-interval-detector
>   * ----------------------------
>   * There are some cases where "next timer" is a completely unusable predictor:
>   * Those cases where the interval is fixed, for example due to hardware
> - * interrupt mitigation, but also due to fixed transfer rate devices such as
> - * mice.
> + * interrupt mitigation, but also due to fixed transfer rate devices like mice.
>   * For this, we use a different predictor: We track the duration of the last 8
> - * intervals and if the stand deviation of these 8 intervals is below a
> - * threshold value, we use the average of these intervals as prediction.
> - *
> + * intervals and use them to estimate the duration of the next one.
>   */

Assuming you fix up the typos in the commit message when applying:
Reviewed-by: Christian Loehle <christian.loehle@arm.com>


* Re: [PATCH v1] cpuidle: menu: Update documentation after previous changes
  2025-02-24 12:41 ` Christian Loehle
@ 2025-02-24 12:57   ` Rafael J. Wysocki
  0 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2025-02-24 12:57 UTC
  To: Christian Loehle
  Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano,
	Artem Bityutskiy, Doug Smythies

On Mon, Feb 24, 2025 at 1:41 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 2/20/25 20:13, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The documentaion of the menu cpuidle governor needs to be updated
> s/documentaion/documentation/
> > to match the code bevavior after some changes made recently.
>
> s/bevavior/behavior/
>
> >
> > No functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  Documentation/admin-guide/pm/cpuidle.rst |   27 ++++++++++++++++-----------
> >  drivers/cpuidle/governors/menu.c         |   29 ++++++++++-------------------
> >  2 files changed, 26 insertions(+), 30 deletions(-)
> >
> > --- a/Documentation/admin-guide/pm/cpuidle.rst
> > +++ b/Documentation/admin-guide/pm/cpuidle.rst
> > @@ -275,20 +275,25 @@
> >  and variance of them.  If the variance is small (smaller than 400 square
> >  milliseconds) or it is small relative to the average (the average is greater
> >  that 6 times the standard deviation), the average is regarded as the "typical
> > -interval" value.  Otherwise, the longest of the saved observed idle duration
> > +interval" value.  Otherwise, either the longest or the shortest (depending on
> > +which one is farther from the average) of the saved observed idle duration
> >  values is discarded and the computation is repeated for the remaining ones.
> > +
> >  Again, if the variance of them is small (in the above sense), the average is
> >  taken as the "typical interval" value and so on, until either the "typical
> > -interval" is determined or too many data points are disregarded, in which case
> > -the "typical interval" is assumed to equal "infinity" (the maximum unsigned
> > -integer value).
> > +interval" is determined or too many data points are disregarded.  In the latter
> > +case, if the size of the set of data points still under consideration is
> > +sufficiently large, the next idle duration is not likely to be above the largest
> > +idle duration value still in that set, so that value is taken as the predicted
> > +next idle duration.  Finally, if the set of data points still under
> > +consideration is too small, no prediction is made.
> >
> > -If the "typical interval" computed this way is long enough, the governor obtains
> > -the time until the closest timer event with the assumption that the scheduler
> > -tick will be stopped.  That time, referred to as the *sleep length* in what follows,
> > -is the upper bound on the time before the next CPU wakeup.  It is used to determine
> > -the sleep length range, which in turn is needed to get the sleep length correction
> > -factor.
> > +If the preliminary prediction of the next idle duration computed this way is
> > +long enough, the governor obtains the time until the closest timer event with
> > +the assumption that the scheduler tick will be stopped.  That time, referred to
> > +as the *sleep length* in what follows, is the upper bound on the time before the
> > +next CPU wakeup.  It is used to determine the sleep length range, which in turn
> > +is needed to get the sleep length correction factor.
> >
> >  The ``menu`` governor maintains an array containing several correction factor
> >  values that correspond to different sleep length ranges organized so that each
> > @@ -302,7 +307,7 @@
> >  The sleep length is multiplied by the correction factor for the range that it
> >  falls into to obtain an approximation of the predicted idle duration that is
> >  compared to the "typical interval" determined previously and the minimum of
> > -the two is taken as the idle duration prediction.
> > +the two is taken as the final idle duration prediction.
> >
> >  If the "typical interval" value is small, which means that the CPU is likely
> >  to be woken up soon enough, the sleep length computation is skipped as it may
> > --- a/drivers/cpuidle/governors/menu.c
> > +++ b/drivers/cpuidle/governors/menu.c
> > @@ -41,7 +41,7 @@
> >   * the  C state is required to actually break even on this cost. CPUIDLE
> >   * provides us this duration in the "target_residency" field. So all that we
> >   * need is a good prediction of how long we'll be idle. Like the traditional
> > - * menu governor, we start with the actual known "next timer event" time.
> > + * menu governor, we take the actual known "next timer event" time.
> >   *
> >   * Since there are other source of wakeups (interrupts for example) than
> >   * the next timer event, this estimation is rather optimistic. To get a
> > @@ -50,30 +50,21 @@
> >   * duration always was 50% of the next timer tick, the correction factor will
> >   * be 0.5.
> >   *
> > - * menu uses a running average for this correction factor, however it uses a
> > - * set of factors, not just a single factor. This stems from the realization
> > - * that the ratio is dependent on the order of magnitude of the expected
> > - * duration; if we expect 500 milliseconds of idle time the likelihood of
> > - * getting an interrupt very early is much higher than if we expect 50 micro
> > - * seconds of idle time. A second independent factor that has big impact on
> > - * the actual factor is if there is (disk) IO outstanding or not.
> > - * (as a special twist, we consider every sleep longer than 50 milliseconds
> > - * as perfect; there are no power gains for sleeping longer than this)
> > - *
> > - * For these two reasons we keep an array of 12 independent factors, that gets
> > - * indexed based on the magnitude of the expected duration as well as the
> > - * "is IO outstanding" property.
> > + * menu uses a running average for this correction factor, but it uses a set of
> > + * factors, not just a single factor. This stems from the realization that the
> > + * ratio is dependent on the order of magnitude of the expected duration; if we
> > + * expect 500 milliseconds of idle time the likelihood of getting an interrupt
> > + * very early is much higher than if we expect 50 micro seconds of idle time.
> > + * For this reason, menu keeps an array of 6 independent factors, that gets
> > + * indexed based on the magnitude of the expected duration.
> >   *
> >   * Repeatable-interval-detector
> >   * ----------------------------
> >   * There are some cases where "next timer" is a completely unusable predictor:
> >   * Those cases where the interval is fixed, for example due to hardware
> > - * interrupt mitigation, but also due to fixed transfer rate devices such as
> > - * mice.
> > + * interrupt mitigation, but also due to fixed transfer rate devices like mice.
> >   * For this, we use a different predictor: We track the duration of the last 8
> > - * intervals and if the stand deviation of these 8 intervals is below a
> > - * threshold value, we use the average of these intervals as prediction.
> > - *
> > + * intervals and use them to estimate the duration of the next one.
> >   */
>
> Assuming you fix up the typos in the commit message when applying:

I will.

> Reviewed-by: Christian Loehle <christian.loehle@arm.com>

Thank you!
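
For readers following the quoted documentation hunk, the last step it
describes (scale the sleep length by the correction factor for its range,
then take the minimum of that and the "typical interval") can be sketched
roughly as below.  This is not the menu.c code.  Every name here
(predict_idle_us(), sleep_length_bucket(), NO_PREDICTION, the per-mille
scaling, the power-of-ten bucket limits, the example factor values and the
short_threshold_us parameter) is an assumption made for illustration.  Only
the number of factors (6), the "minimum of the two" rule and the skipping of
the sleep length computation for small values come from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BUCKETS	6		/* "an array of 6 independent factors" */
#define NO_PREDICTION	UINT64_MAX	/* hypothetical "no prediction" marker */

/* Assumed bucket layout: one bucket per order of magnitude, in microseconds. */
static size_t sleep_length_bucket(uint64_t sleep_length_us)
{
	size_t bucket = 0;
	uint64_t limit = 10;

	while (bucket < NUM_BUCKETS - 1 && sleep_length_us >= limit) {
		limit *= 10;
		bucket++;
	}
	return bucket;
}

/* Made-up correction factors scaled by 1000 (0.5 is stored as 500). */
static const unsigned int example_factors[NUM_BUCKETS] = {
	1000, 900, 800, 500, 400, 300
};

/* Stand-in for the real "time until the closest timer event" query. */
static uint64_t example_sleep_length_us(void)
{
	return 2000;
}

/*
 * typical_us is the preliminary prediction (or NO_PREDICTION).  The sleep
 * length callback is only invoked when that prediction is not small.
 * Overflow of the multiplication is ignored in this sketch.
 */
static uint64_t predict_idle_us(uint64_t typical_us,
				uint64_t short_threshold_us,
				uint64_t (*get_sleep_length_us)(void),
				const unsigned int factor_permille[NUM_BUCKETS])
{
	uint64_t sleep_length_us, corrected_us;

	/* Small preliminary prediction: skip the sleep length computation. */
	if (typical_us != NO_PREDICTION && typical_us < short_threshold_us)
		return typical_us;

	sleep_length_us = get_sleep_length_us();
	corrected_us = sleep_length_us *
		factor_permille[sleep_length_bucket(sleep_length_us)] / 1000;

	/* The minimum of the two is the final idle duration prediction. */
	return corrected_us < typical_us ? corrected_us : typical_us;
}
```

With the made-up values above, a 2000 us sleep length falls into the fourth
range and is scaled to 1000 us, so a preliminary prediction of 5000 us is
capped at 1000 us, while one of 600 us is kept as it is.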


Thread overview: 5+ messages
2025-01-10 12:46 [PATCH v1] cpuidle: menu: Update documentation after previous changes Rafael J. Wysocki
2025-01-10 14:19 ` Christian Loehle
  -- strict thread matches above, loose matches on Subject: below --
2025-02-20 20:13 Rafael J. Wysocki
2025-02-24 12:41 ` Christian Loehle
2025-02-24 12:57   ` Rafael J. Wysocki
