[PATCH] cpuidle: Improve perf for certain workloads

* [PATCH] cpuidle: Improve perf for certain workloads
@ 2014-06-13 10:23 Ross Lagerwall
  2014-06-13 11:37 ` Jan Beulich
  0 siblings, 1 reply; 3+ messages in thread
From: Ross Lagerwall @ 2014-06-13 10:23 UTC
  To: xen-devel; +Cc: Liu Jinsong, Keir Fraser, Jan Beulich, Ross Lagerwall

The existing mechanism of using interrupt frequency as a heuristic does
not work well for certain workloads.  As an example, synchronous dd on a
small block size uses deep C-states because much of the time is spent
doing processing so the interrupt frequency is not too high, but when an
IOP is submitted, the interrupt occurs soon after going idle.  This
causes exit latency to be a significant factor.

To fix this, add a new factor which limits the exit latency to be no
more than 10% of the decaying measured idle time.  This improves
performance for workloads with a medium interrupt frequency but a short
idle duration.

In the workload given previously, throughput improves by 20% with this
patch.

A side effect of this patch is to fix the use of MAX_INTERESTING.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/x86/acpi/cpuidle_menu.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/acpi/cpuidle_menu.c b/xen/arch/x86/acpi/cpuidle_menu.c
index 6952776..89f532c 100644
--- a/xen/arch/x86/acpi/cpuidle_menu.c
+++ b/xen/arch/x86/acpi/cpuidle_menu.c
@@ -36,6 +36,7 @@
 #define RESOLUTION 1024
 #define DECAY 4
 #define MAX_INTERESTING 50000
+#define LATENCY_MULTIPLIER 10
 
 /*
  * Concepts and ideas behind the menu governor
@@ -88,6 +89,10 @@
  * the average interrupt interval is, the smaller C state latency should be
  * and thus the less likely a busy CPU will hit such a deep C state.
  *
+ * As an additional rule to reduce the performance impact, menu tries to
+ * limit the exit latency duration to be no more than 10% of the decaying
+ * measured idle time.
+ *
  */
 
 struct perf_factor{
@@ -102,6 +107,7 @@ struct menu_device
     int             last_state_idx;
     unsigned int    expected_us;
     u64             predicted_us;
+    u64             latency_factor;
     unsigned int    measured_us;
     unsigned int    exit_us;
     unsigned int    bucket;
@@ -199,6 +205,10 @@ static int menu_select(struct acpi_processor_power *power)
 
     io_interval = avg_intr_interval_us();
 
+    data->latency_factor = DIV_ROUND(
+            data->latency_factor * (DECAY - 1) + data->measured_us,
+            DECAY);
+
     /*
      * if the correction factor is 0 (eg first time init or cpu hotplug
      * etc), we actually want to start out with a unity factor.
@@ -220,6 +230,8 @@ static int menu_select(struct acpi_processor_power *power)
             break;
         if (s->latency * IO_MULTIPLIER > io_interval)
             break;
+        if (s->latency * LATENCY_MULTIPLIER > data->latency_factor)
+            break;
         /* TBD: we need to check the QoS requirment in future */
         data->exit_us = s->latency;
         data->last_state_idx = i;
@@ -231,18 +243,16 @@ static int menu_select(struct acpi_processor_power *power)
 static void menu_reflect(struct acpi_processor_power *power)
 {
     struct menu_device *data = &__get_cpu_var(menu_devices);
-    unsigned int last_idle_us = power->last_residency;
-    unsigned int measured_us;
     u64 new_factor;
 
-    measured_us = last_idle_us;
+    data->measured_us = power->last_residency;
 
     /*
      * We correct for the exit latency; we are assuming here that the
      * exit latency happens after the event that we're interested in.
      */
-    if (measured_us > data->exit_us)
-        measured_us -= data->exit_us;
+    if (data->measured_us > data->exit_us)
+        data->measured_us -= data->exit_us;
 
     /* update our correction ratio */
 
@@ -250,7 +260,7 @@ static void menu_reflect(struct acpi_processor_power *power)
         * (DECAY - 1) / DECAY;
 
     if (data->expected_us > 0 && data->measured_us < MAX_INTERESTING)
-        new_factor += RESOLUTION * measured_us / data->expected_us;
+        new_factor += RESOLUTION * data->measured_us / data->expected_us;
     else
         /*
          * we were idle so long that we count it as a perfect
-- 
1.9.3
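
The rule added by the patch above can be tried in isolation. The following is a minimal standalone sketch, not the Xen code itself: div_round() is assumed to behave like Xen's DIV_ROUND() (rounded integer division), and update_latency_factor() and latency_allowed() are hypothetical helper names introduced only for illustration. DECAY and LATENCY_MULTIPLIER carry the values shown in the diff.

```c
#include <stdint.h>

#define DECAY 4
#define LATENCY_MULTIPLIER 10

/* Rounded integer division; assumed to match Xen's DIV_ROUND(). */
static uint64_t div_round(uint64_t x, uint64_t y)
{
    return (x + y / 2) / y;
}

/*
 * Fold one measured idle period (in us) into the decaying average,
 * mirroring the latency_factor update in menu_select().
 * (Hypothetical helper name.)
 */
static uint64_t update_latency_factor(uint64_t latency_factor,
                                      unsigned int measured_us)
{
    return div_round(latency_factor * (DECAY - 1) + measured_us, DECAY);
}

/*
 * Return 1 if a C-state with the given exit latency passes the new
 * check, i.e. its exit latency is at most 10% of the decaying
 * measured idle time. (Hypothetical helper name.)
 */
static int latency_allowed(unsigned int exit_latency_us,
                           uint64_t latency_factor)
{
    return (uint64_t)exit_latency_us * LATENCY_MULTIPLIER <= latency_factor;
}
```

With a decayed idle time of 400us, for example, a state with an exit latency of 40us passes the check while one with 41us does not.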

* Re: [PATCH] cpuidle: Improve perf for certain workloads
  2014-06-13 10:23 [PATCH] cpuidle: Improve perf for certain workloads Ross Lagerwall
@ 2014-06-13 11:37 ` Jan Beulich
  2014-06-13 12:53   ` Ross Lagerwall
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Beulich @ 2014-06-13 11:37 UTC
  To: ross.lagerwall, xen-devel; +Cc: jinsong.liu, keir

>>> Ross Lagerwall <ross.lagerwall@citrix.com> 06/13/14 12:23 PM >>>
>The existing mechanism of using interrupt frequency as a heuristic does
>not work well for certain workloads.  As an example, synchronous dd on a
>small block size uses deep C-states because much of the time is spent
>doing processing so the interrupt frequency is not too high, but when an
>IOP is submitted, the interrupt occurs soon after going idle.  This
>causes exit latency to be a significant factor.
>
>To fix this, add a new factor which limits the exit latency to be no
>more than 10% of the decaying measured idle time.  This improves
>performance for workloads with a medium interrupt frequency but a short
>idle duration.

Does this have a Linux counterpart (after all the code here is a clone of
Linux's)? If so, adding a cross reference would be appreciated. If not, I'd
expect you to explain why Xen needs what Linux doesn't need.

>In the workload given previously, throughput improves by 20% with this
>patch.

This is the positive side. Did you also check for no negative effects?

>A side effect of this patch is to fix the use of MAX_INTERESTING.

What does this sentence refer to?

>@@ -88,6 +89,10 @@
>  * the average interrupt interval is, the smaller C state latency should be
>  * and thus the less likely a busy CPU will hit such a deep C state.
>  *
>+ * As an additional rule to reduce the performance impact, menu tries to
>+ * limit the exit latency duration to be no more than 10% of the decaying
>+ * measured idle time.
>+ *
>  */
 
Even if previously there was a blank comment line at the end, please avoid
retaining such when you add further text anyway.

Jan

* Re: [PATCH] cpuidle: Improve perf for certain workloads
  2014-06-13 11:37 ` Jan Beulich
@ 2014-06-13 12:53   ` Ross Lagerwall
  0 siblings, 0 replies; 3+ messages in thread
From: Ross Lagerwall @ 2014-06-13 12:53 UTC
  To: Jan Beulich, ross.lagerwall, xen-devel; +Cc: jinsong.liu, keir

On 06/13/2014 12:37 PM, Jan Beulich wrote:
>>>> Ross Lagerwall <ross.lagerwall@citrix.com> 06/13/14 12:23 PM >>>
>> The existing mechanism of using interrupt frequency as a heuristic does
>> not work well for certain workloads.  As an example, synchronous dd on a
>> small block size uses deep C-states because much of the time is spent
>> doing processing so the interrupt frequency is not too high, but when an
>> IOP is submitted, the interrupt occurs soon after going idle.  This
>> causes exit latency to be a significant factor.
>>
>> To fix this, add a new factor which limits the exit latency to be no
>> more than 10% of the decaying measured idle time.  This improves
>> performance for workloads with a medium interrupt frequency but a short
>> idle duration.
>
> Does this have a Linux counterpart (after all the code here is a clone of
> Linux's)? If so, adding a cross reference would be appreciated. If not, I'd
> expect you to explain why Xen needs what Linux doesn't need.

No, it does not have a Linux counterpart. The Linux equivalent code for
exit latency uses a combination of the PM QoS interface, the load
average of the system and the number of processes in IO wait state on
that CPU. If a process is in IO wait state, it compares the exit latency
with the predicted residency reduced by a factor of 10, which is
somewhat similar to what this patch does.

The use of average interrupt frequency was introduced by Keir Fraser in 
353533232730 ("cpuidle: fix the menu governor to enhance IO 
performance") when porting a Linux patch to Xen.

>
>> In the workload given previously, throughput improves by 20% with this
>> patch.
>
> This is the positive side. Did you also check for no negative effects?

From a performance perspective, this patch will never cause the machine
to go into a deeper C-state than it would have previously.
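
The reasoning here is that the patch only adds one more break condition to the selection loop. A simplified sketch of that loop, modelling only the two checks visible in the diff, illustrates this. It is not the actual menu_select(): struct cstate, example_states, select_state() and the use_latency_rule flag are hypothetical names, the value of IO_MULTIPLIER is assumed for illustration, and treating index 0 as the shallowest fallback state is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

#define IO_MULTIPLIER 10       /* value assumed for illustration */
#define LATENCY_MULTIPLIER 10  /* value from the patch */

/* Hypothetical, simplified stand-in for one C-state entry. */
struct cstate {
    unsigned int latency;      /* exit latency in us */
};

/* Hypothetical table, ordered from shallowest to deepest state. */
static const struct cstate example_states[] = { {1}, {20}, {100} };

/*
 * Return the index of the deepest state that passes the checks.
 * With use_latency_rule == 0 only the pre-existing interrupt-interval
 * check is modelled; with use_latency_rule != 0 the check added by
 * the patch is applied as well.
 */
static int select_state(const struct cstate *states, size_t n,
                        uint64_t io_interval, uint64_t latency_factor,
                        int use_latency_rule)
{
    int idx = 0;  /* shallowest state is the fallback */

    for (size_t i = 1; i < n; i++) {
        const struct cstate *s = &states[i];

        if ((uint64_t)s->latency * IO_MULTIPLIER > io_interval)
            break;
        if (use_latency_rule &&
            (uint64_t)s->latency * LATENCY_MULTIPLIER > latency_factor)
            break;
        idx = (int)i;
    }
    return idx;
}
```

Because the extra check can only end the loop earlier, the index chosen with the rule enabled is never larger than the index chosen without it for the same inputs.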

From a power perspective, I don't have a power meter to do actual
measurements, but watching xenpm on an idle system shows that almost all
of the time is spent in the deepest C-state.

>
>> A side effect of this patch is to fix the use of MAX_INTERESTING.
>
> What does this sentence refer to?

I will make that more explicit in the commit message.

>
>> @@ -88,6 +89,10 @@
>>   * the average interrupt interval is, the smaller C state latency should be
>>   * and thus the less likely a busy CPU will hit such a deep C state.
>>   *
>> + * As an additional rule to reduce the performance impact, menu tries to
>> + * limit the exit latency duration to be no more than 10% of the decaying
>> + * measured idle time.
>> + *
>>   */
>
> Even if previously there was a blank comment line at the end, please avoid
> retaining such when you add further text anyway.
>

OK.

Thanks for the review,
-- 
Ross Lagerwall
