linux-s390.vger.kernel.org archive mirror
* [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler
@ 2012-07-18 13:37 Raghavendra K T
  2012-07-18 13:37 ` [PATCH RFC V5 1/3] kvm/config: Add config to support ple or cpu relax optimization Raghavendra K T
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Raghavendra K T @ 2012-07-18 13:37 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Marcelo Tosatti, Ingo Molnar,
	Avi Kivity, Rik van Riel
  Cc: Srikar, S390, Carsten Otte, Christian Borntraeger, KVM,
	Raghavendra K T, chegu vinod, Andrew M. Theurer, LKML, X86,
	Gleb Natapov, linux390, Srivatsa Vaddagiri, Joerg Roedel


Currently, the Pause Loop Exit (PLE) handler does a directed yield to a
random vcpu on PL-exit. We already have some filtering while choosing
the candidate to yield_to; this change adds more checks while choosing
a candidate to yield_to.

On guests with many vcpus, there is a high probability of
yielding to the same vcpu that recently did a pause-loop exit itself.
Such a yield can lead to that vcpu spinning again.

The patchset keeps track of the pause loop exit and gives a chance to a
vcpu which has:

 (a) Not done a pause loop exit at all (it is probably a preempted lock-holder)

 (b) Been skipped in the last iteration because it did a pause loop exit, and
 has probably become eligible now (the next eligible lock holder)

This concept also helps in the cpu relax interception cases, which use the same handler.

Changes since V4:
 - Naming Change (Avi):
  struct ple ==> struct spin_loop
  cpu_relax_intercepted ==> in_spin_loop
  vcpu_check_and_update_eligible ==> vcpu_eligible_for_directed_yield
 - mark vcpu in spinloop as not eligible to avoid influence of previous exit

Changes since V3:
 - arch specific fix/changes (Christian)

Changes since v2:
 - Move ple structure to common code (Avi)
 - rename pause_loop_exited to cpu_relax_intercepted (Avi)
 - add config HAVE_KVM_CPU_RELAX_INTERCEPT (Avi)
 - Drop superfluous curly braces (Ingo)

Changes since v1:
 - Add more documentation for structure and algorithm and Rename
   plo ==> ple (Rik).
 - change dy_eligible initial value to false (otherwise the very first
   directed yield will not be skipped) (Nikunj)
 - fixup signoff/from issue

Future enhancements:
  (1) Currently we have a boolean to decide on the eligibility of a vcpu. It
    would be nice to get feedback on large guests (>32 vcpus) on whether we
    can do better with an integer counter (with counter = say f(log n)).

  (2) We have not considered system load during the iteration over vcpus. With
   that information we can limit the scan and also decide whether schedule()
   is better. [I am able to use the number of kicked vcpus to decide on this,
   but there may be better ideas, such as information from the global loadavg.]

  (3) We can exploit this further with the PV patches, since they also know
   the next eligible lock-holder.

Summary: There is a very good improvement for KVM-based guests on PLE machines.
V5 has a huge improvement for kernbench.

+-----------+-----------+-----------+------------+-----------+
   base_rik    stdev       patched      stdev       %improve
+-----------+-----------+-----------+------------+-----------+
              kernbench (time in sec; lower is better)
+-----------+-----------+-----------+------------+-----------+
 1x    49.2300     1.0171    22.6842     0.3073    117.0233 %
 2x    91.9358     1.7768    53.9608     1.0154    70.37516 %
+-----------+-----------+-----------+------------+-----------+

+-----------+-----------+-----------+------------+-----------+
              ebizzy (records/sec; higher is better)
+-----------+-----------+-----------+------------+-----------+
 1x  1129.2500    28.6793    2125.6250    32.8239    88.23334 %
 2x  1892.3750    75.1112    2377.1250   181.6822    25.61596 %
+-----------+-----------+-----------+------------+-----------+

Note: The patches are tested on x86.

 Links
  V4: https://lkml.org/lkml/2012/7/16/80
  V3: https://lkml.org/lkml/2012/7/12/437
  V2: https://lkml.org/lkml/2012/7/10/392
  V1: https://lkml.org/lkml/2012/7/9/32

 Raghavendra K T (3):
   config: Add config to support ple or cpu relax optimization
   kvm: Note down when cpu relax intercepted or pause loop exited
   kvm: Choose a better candidate for directed yield
---
 arch/s390/kvm/Kconfig    |    1 +
 arch/x86/kvm/Kconfig     |    1 +
 include/linux/kvm_host.h |   39 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/Kconfig         |    3 +++
 virt/kvm/kvm_main.c      |   41 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 85 insertions(+), 0 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH RFC V5 1/3] kvm/config: Add config to support ple or cpu relax optimization
  2012-07-18 13:37 [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Raghavendra K T
@ 2012-07-18 13:37 ` Raghavendra K T
  2012-07-18 13:37 ` [PATCH RFC V5 2/3] kvm: Note down when cpu relax intercepted or pause loop exited Raghavendra K T
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Raghavendra K T @ 2012-07-18 13:37 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Avi Kivity, Ingo Molnar,
	Marcelo Tosatti, Rik van Riel
  Cc: Srikar, S390, Carsten Otte, Christian Borntraeger, KVM,
	Raghavendra K T, chegu vinod, Andrew M. Theurer, LKML, X86,
	Gleb Natapov, linux390, Srivatsa Vaddagiri, Joerg Roedel

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

Suggested-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/s390/kvm/Kconfig |    1 +
 arch/x86/kvm/Kconfig  |    1 +
 virt/kvm/Kconfig      |    3 +++
 3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 78eb984..a6e2677 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
 	depends on HAVE_KVM && EXPERIMENTAL
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
+	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	---help---
 	  Support hosting paravirtualized guest machines using the SIE
 	  virtualization capability on the mainframe. This should work
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a28f338..45c044f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -37,6 +37,7 @@ config KVM
 	select TASK_DELAY_ACCT
 	select PERF_EVENTS
 	select HAVE_KVM_MSI
+	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	---help---
 	  Support hosting fully virtualized guest machines using hardware
 	  virtualization extensions.  You will need a fairly recent
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 28694f4..d01b24b 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -21,3 +21,6 @@ config KVM_ASYNC_PF
 
 config HAVE_KVM_MSI
        bool
+
+config HAVE_KVM_CPU_RELAX_INTERCEPT
+       bool


* [PATCH RFC V5 2/3] kvm: Note down when cpu relax intercepted or pause loop exited
  2012-07-18 13:37 [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Raghavendra K T
  2012-07-18 13:37 ` [PATCH RFC V5 1/3] kvm/config: Add config to support ple or cpu relax optimization Raghavendra K T
@ 2012-07-18 13:37 ` Raghavendra K T
  2012-07-18 13:38 ` [PATCH RFC V5 3/3] kvm: Choose better candidate for directed yield Raghavendra K T
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Raghavendra K T @ 2012-07-18 13:37 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Marcelo Tosatti, Ingo Molnar,
	Avi Kivity, Rik van Riel
  Cc: Srikar, S390, Carsten Otte, Christian Borntraeger, KVM,
	Raghavendra K T, chegu vinod, Andrew M. Theurer, LKML, X86,
	Gleb Natapov, linux390, Srivatsa Vaddagiri, Joerg Roedel

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

Noting down when a vcpu has pause-loop exited or had cpu relax intercepted
helps in filtering the right candidate to yield to. A wrong selection of
vcpu, i.e., one that just did a PL-exit or had cpu relax intercepted, may
contribute to performance degradation.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
V2 was:
Reviewed-by: Rik van Riel <riel@redhat.com>

 include/linux/kvm_host.h |   34 ++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |    5 +++++
 2 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c446435..34ce296 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -183,6 +183,18 @@ struct kvm_vcpu {
 	} async_pf;
 #endif
 
+#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
+	/*
+	 * CPU relax intercept or pause loop exit optimization.
+	 * in_spin_loop: set when a vcpu does a pause loop exit
+	 *  or has cpu relax intercepted.
+	 * dy_eligible: indicates whether the vcpu is eligible for directed yield.
+	 */
+	struct {
+		bool in_spin_loop;
+		bool dy_eligible;
+	} spin_loop;
+#endif
 	struct kvm_vcpu_arch arch;
 };
 
@@ -890,5 +902,27 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
 	}
 }
 
+#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
+
+static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
+{
+	vcpu->spin_loop.in_spin_loop = val;
+}
+static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
+{
+	vcpu->spin_loop.dy_eligible = val;
+}
+
+#else /* !CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
+
+static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
+{
+}
+
+static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
+{
+}
+
+#endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
 #endif
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..3d6ffc8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -236,6 +236,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	}
 	vcpu->run = page_address(page);
 
+	kvm_vcpu_set_in_spin_loop(vcpu, false);
+	kvm_vcpu_set_dy_eligible(vcpu, false);
+
 	r = kvm_arch_vcpu_init(vcpu);
 	if (r < 0)
 		goto fail_free_run;
@@ -1577,6 +1580,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	int pass;
 	int i;
 
+	kvm_vcpu_set_in_spin_loop(me, true);
 	/*
 	 * We boost the priority of a VCPU that is runnable but not
 	 * currently running, because it got preempted by something
@@ -1602,6 +1606,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 			}
 		}
 	}
+	kvm_vcpu_set_in_spin_loop(me, false);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
 


* [PATCH RFC V5 3/3] kvm: Choose better candidate for directed yield
  2012-07-18 13:37 [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Raghavendra K T
  2012-07-18 13:37 ` [PATCH RFC V5 1/3] kvm/config: Add config to support ple or cpu relax optimization Raghavendra K T
  2012-07-18 13:37 ` [PATCH RFC V5 2/3] kvm: Note down when cpu relax intercepted or pause loop exited Raghavendra K T
@ 2012-07-18 13:38 ` Raghavendra K T
  2012-07-18 14:39   ` Raghavendra K T
  2012-07-20 17:36 ` [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Marcelo Tosatti
  2012-07-23 10:03 ` Avi Kivity
  4 siblings, 1 reply; 12+ messages in thread
From: Raghavendra K T @ 2012-07-18 13:38 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Avi Kivity, Ingo Molnar,
	Marcelo Tosatti, Rik van Riel
  Cc: Srikar, S390, Carsten Otte, Christian Borntraeger, KVM,
	Raghavendra K T, chegu vinod, Andrew M. Theurer, LKML, X86,
	Gleb Natapov, linux390, Srivatsa Vaddagiri, Joerg Roedel

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

Currently, on guests with many vcpus, there is a high probability of
yielding to the same vcpu that had recently done a pause-loop exit or
had cpu relax intercepted. Such a yield can lead to the vcpu spinning
again and hence degrade performance.

The patchset keeps track of the pause loop exit/cpu relax interception
and gives a chance to a vcpu which:
 (a) Has not done a pause loop exit or had cpu relax intercepted at all
     (it is probably a preempted lock-holder)
 (b) Was skipped in the last iteration because it did a pause loop exit or
     had cpu relax intercepted, and has probably become eligible now
     (the next eligible lock holder)

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
V2 was:
Reviewed-by: Rik van Riel <riel@redhat.com>

 include/linux/kvm_host.h |    5 +++++
 virt/kvm/kvm_main.c      |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 34ce296..952427d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -923,6 +923,11 @@ static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
 {
 }
 
+static inline bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
 #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
 #endif
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3d6ffc8..bf9fb97 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1571,6 +1571,39 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
 
+#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
+/*
+ * Helper that checks whether a VCPU is eligible for directed yield.
+ * The most eligible candidate to yield to is decided by these heuristics:
+ *
+ *  (a) A VCPU which has not done a PL-exit or had cpu relax intercepted
+ *  recently (a preempted lock holder), indicated by @in_spin_loop.
+ *  Set at the beginning and cleared at the end of the interception/PLE handler.
+ *
+ *  (b) A VCPU which did a PL-exit/had cpu relax intercepted but did not get
+ *  a chance last time (mostly it has become eligible now, since we have
+ *  probably yielded to the lockholder in the last iteration). This is done by
+ *  toggling @dy_eligible each time a VCPU is checked for eligibility.
+ *
+ *  Yielding to a recently PL-exited/cpu-relax-intercepted VCPU before yielding
+ *  to a preempted lock-holder could result in wrong VCPU selection and CPU
+ *  burning. Giving priority to a potential lock-holder increases lock
+ *  progress.
+ */
+bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
+{
+	bool eligible;
+
+	eligible = !vcpu->spin_loop.in_spin_loop ||
+			(vcpu->spin_loop.in_spin_loop &&
+			 vcpu->spin_loop.dy_eligible);
+
+	if (vcpu->spin_loop.in_spin_loop)
+		vcpu->spin_loop.dy_eligible = !vcpu->spin_loop.dy_eligible;
+
+	return eligible;
+}
+#endif
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
@@ -1599,6 +1632,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (waitqueue_active(&vcpu->wq))
 				continue;
+			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
+				continue;
 			if (kvm_vcpu_yield_to(vcpu)) {
 				kvm->last_boosted_vcpu = i;
 				yielded = 1;
@@ -1607,6 +1642,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 		}
 	}
 	kvm_vcpu_set_in_spin_loop(me, false);
+	kvm_vcpu_set_dy_eligible(me, false);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
 


* Re: [PATCH RFC V5 3/3] kvm: Choose better candidate for directed yield
  2012-07-18 13:38 ` [PATCH RFC V5 3/3] kvm: Choose better candidate for directed yield Raghavendra K T
@ 2012-07-18 14:39   ` Raghavendra K T
  2012-07-19  9:47     ` [RESEND PATCH " Raghavendra K T
  0 siblings, 1 reply; 12+ messages in thread
From: Raghavendra K T @ 2012-07-18 14:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Raghavendra K T, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Marcelo Tosatti, Rik van Riel, Srikar, S390, Carsten Otte,
	Christian Borntraeger, KVM, chegu vinod, Andrew M. Theurer, LKML,
	X86, Gleb Natapov, linux390, Srivatsa Vaddagiri, Joerg Roedel

On 07/18/2012 07:08 PM, Raghavendra K T wrote:
> From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> +bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
> +{
> +	bool eligible;
> +
> +	eligible = !vcpu->spin_loop.in_spin_loop ||
> +			(vcpu->spin_loop.in_spin_loop &&
> +			 vcpu->spin_loop.dy_eligible);
> +
> +	if (vcpu->spin_loop.in_spin_loop)
> +		vcpu->spin_loop.dy_eligible = !vcpu->spin_loop.dy_eligible;
> +
> +	return eligible;
> +}

I should have added a comment like:
Since the algorithm is based on heuristics, accessing another vcpu's data
without locking does no harm. It may result in trying to yield to the same
VCPU, failing, and continuing with the next, and so on.


* [RESEND PATCH RFC V5 3/3] kvm: Choose better candidate for directed yield
  2012-07-18 14:39   ` Raghavendra K T
@ 2012-07-19  9:47     ` Raghavendra K T
  0 siblings, 0 replies; 12+ messages in thread
From: Raghavendra K T @ 2012-07-19  9:47 UTC (permalink / raw)
  To: Raghavendra K T, Avi Kivity, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Marcelo Tosatti, Rik van Riel, Srikar
  Cc: S390, Carsten Otte, Christian Borntraeger, KVM, chegu vinod,
	Andrew M. Theurer, LKML, X86, Gleb Natapov, linux390,
	Srivatsa Vaddagiri, Joerg Roedel

Currently, on guests with many vcpus, there is a high probability of
yielding to the same vcpu that had recently done a pause-loop exit or
had cpu relax intercepted. Such a yield can lead to the vcpu spinning
again and hence degrade performance.

The patchset keeps track of the pause loop exit/cpu relax interception
and gives a chance to a vcpu which:
 (a) Has not done a pause loop exit or had cpu relax intercepted at all
     (it is probably a preempted lock-holder)
 (b) Was skipped in the last iteration because it did a pause loop exit or
     had cpu relax intercepted, and has probably become eligible now
     (the next eligible lock holder)

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
V2 was:
Reviewed-by: Rik van Riel <riel@redhat.com>

 Changelog: Added comment on locking as suggested by Avi

 include/linux/kvm_host.h |    5 +++++
 virt/kvm/kvm_main.c      |   42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 34ce296..952427d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -923,6 +923,11 @@ static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
 {
 }
 
+static inline bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
 #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
 #endif
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3d6ffc8..8fda756 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1571,6 +1571,43 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
 
+#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
+/*
+ * Helper that checks whether a VCPU is eligible for directed yield.
+ * The most eligible candidate to yield to is decided by these heuristics:
+ *
+ *  (a) A VCPU which has not done a PL-exit or had cpu relax intercepted
+ *  recently (a preempted lock holder), indicated by @in_spin_loop.
+ *  Set at the beginning and cleared at the end of the interception/PLE handler.
+ *
+ *  (b) A VCPU which did a PL-exit/had cpu relax intercepted but did not get
+ *  a chance last time (mostly it has become eligible now, since we have
+ *  probably yielded to the lockholder in the last iteration). This is done by
+ *  toggling @dy_eligible each time a VCPU is checked for eligibility.
+ *
+ *  Yielding to a recently PL-exited/cpu-relax-intercepted VCPU before yielding
+ *  to a preempted lock-holder could result in wrong VCPU selection and CPU
+ *  burning. Giving priority to a potential lock-holder increases lock
+ *  progress.
+ *
+ *  Since the algorithm is based on heuristics, accessing another VCPU's data
+ *  without locking does no harm. It may result in trying to yield to the same
+ *  VCPU, failing, and continuing with the next VCPU, and so on.
+ */
+bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
+{
+	bool eligible;
+
+	eligible = !vcpu->spin_loop.in_spin_loop ||
+			(vcpu->spin_loop.in_spin_loop &&
+			 vcpu->spin_loop.dy_eligible);
+
+	if (vcpu->spin_loop.in_spin_loop)
+		kvm_vcpu_set_dy_eligible(vcpu, !vcpu->spin_loop.dy_eligible);
+
+	return eligible;
+}
+#endif
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
@@ -1599,6 +1636,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (waitqueue_active(&vcpu->wq))
 				continue;
+			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
+				continue;
 			if (kvm_vcpu_yield_to(vcpu)) {
 				kvm->last_boosted_vcpu = i;
 				yielded = 1;
@@ -1607,6 +1646,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 		}
 	}
 	kvm_vcpu_set_in_spin_loop(me, false);
+
+	/* Ensure vcpu is not eligible during next spinloop */
+	kvm_vcpu_set_dy_eligible(me, false);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
 


* Re: [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler
  2012-07-18 13:37 [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Raghavendra K T
                   ` (2 preceding siblings ...)
  2012-07-18 13:38 ` [PATCH RFC V5 3/3] kvm: Choose better candidate for directed yield Raghavendra K T
@ 2012-07-20 17:36 ` Marcelo Tosatti
  2012-07-22 12:34   ` Raghavendra K T
  2012-07-23 10:03 ` Avi Kivity
  4 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2012-07-20 17:36 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Avi Kivity,
	Rik van Riel, Srikar, S390, Carsten Otte, Christian Borntraeger,
	KVM, chegu vinod, Andrew M. Theurer, LKML, X86, Gleb Natapov,
	linux390, Srivatsa Vaddagiri, Joerg Roedel

On Wed, Jul 18, 2012 at 07:07:17PM +0530, Raghavendra K T wrote:
> 
> [...]

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>


* Re: [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler
  2012-07-20 17:36 ` [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Marcelo Tosatti
@ 2012-07-22 12:34   ` Raghavendra K T
  2012-07-22 12:43     ` Avi Kivity
  2012-07-22 17:58     ` Rik van Riel
  0 siblings, 2 replies; 12+ messages in thread
From: Raghavendra K T @ 2012-07-22 12:34 UTC (permalink / raw)
  To: Marcelo Tosatti, Avi Kivity, Rik van Riel, Christian Borntraeger
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Srikar, S390,
	Carsten Otte, KVM, chegu vinod, Andrew M. Theurer, LKML, X86,
	Gleb Natapov, linux390, Srivatsa Vaddagiri, Joerg Roedel

On 07/20/2012 11:06 PM, Marcelo Tosatti wrote:
> On Wed, Jul 18, 2012 at 07:07:17PM +0530, Raghavendra K T wrote:
>>
>> [...]
>
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
>

Thanks Marcelo for the review. Avi, Rik, Christian, please let me know
if this series looks good now.


* Re: [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler
  2012-07-22 12:34   ` Raghavendra K T
@ 2012-07-22 12:43     ` Avi Kivity
  2012-07-23  7:35       ` Christian Borntraeger
  2012-07-22 17:58     ` Rik van Riel
  1 sibling, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2012-07-22 12:43 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Marcelo Tosatti, Rik van Riel, Christian Borntraeger,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Srikar, S390,
	Carsten Otte, KVM, chegu vinod, Andrew M. Theurer, LKML, X86,
	Gleb Natapov, linux390, Srivatsa Vaddagiri, Joerg Roedel

On 07/22/2012 03:34 PM, Raghavendra K T wrote:
> 
> Thanks Marcelo for the review. Avi, Rik, Christian, please let me know
> if this series looks good now.
> 

It looks fine to me.  Christian, is this okay for s390?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler
  2012-07-22 12:34   ` Raghavendra K T
  2012-07-22 12:43     ` Avi Kivity
@ 2012-07-22 17:58     ` Rik van Riel
  1 sibling, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-07-22 17:58 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Marcelo Tosatti, Avi Kivity, Christian Borntraeger,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Srikar, S390,
	Carsten Otte, KVM, chegu vinod, Andrew M. Theurer, LKML, X86,
	Gleb Natapov, linux390, Srivatsa Vaddagiri, Joerg Roedel

On 07/22/2012 08:34 AM, Raghavendra K T wrote:
> On 07/20/2012 11:06 PM, Marcelo Tosatti wrote:
>> On Wed, Jul 18, 2012 at 07:07:17PM +0530, Raghavendra K T wrote:
>>>
>>> Currently the Pause Loop Exit (PLE) handler does a directed yield to a
>>> random vcpu on PL-exit. We already have filtering while choosing
>>> the candidate to yield_to. This change adds more checks while choosing
>>> a candidate to yield_to.
>>>
>>> On guests with many vcpus, there is a high probability of
>>> yielding to the same vcpu that recently did a pause-loop exit.
>>> Such a yield can leave that vcpu spinning again.
>>>
>>> The patchset keeps track of pause loop exits and gives a chance to a
>>> vcpu which has:
>>>
>>> (a) not done a pause loop exit at all (probably a preempted
>>> lock holder)
>>>
>>> (b) been skipped in the last iteration because it did a pause loop
>>> exit, and has probably become eligible now (the next eligible lock holder)
>>>
>>> This concept also helps in the cpu-relax interception cases, which use
>>> the same handler.
>>>
>>> Changes since V4:
>>> - Naming Change (Avi):
>>> struct ple ==> struct spin_loop
>>> cpu_relax_intercepted ==> in_spin_loop
>>> vcpu_check_and_update_eligible ==> vcpu_eligible_for_directed_yield
>>> - mark vcpu in spinloop as not eligible to avoid influence of
>>> previous exit
>>>
>>> Changes since V3:
>>> - arch specific fix/changes (Christian)
>>>
>>> Changes since v2:
>>> - Move ple structure to common code (Avi)
>>> - rename pause_loop_exited to cpu_relax_intercepted (Avi)
>>> - add config HAVE_KVM_CPU_RELAX_INTERCEPT (Avi)
>>> - Drop superfluous curly braces (Ingo)
>>>
>>> Changes since v1:
>>> - Add more documentation for structure and algorithm and Rename
>>> plo ==> ple (Rik).
>>> - change dy_eligible initial value to false. (otherwise very first
>>> directed
>>> yield will not be skipped. (Nikunj)
>>> - fixup signoff/from issue
>>>
>>> Future enhancements:
>>> (1) Currently we have a boolean to decide on the eligibility of a vcpu.
>>> It would be nice to get feedback from large guests (>32 vcpus) on
>>> whether we can improve further with an integer counter (with counter
>>> = say f(log n)).
>>>
>>> (2) We have not considered system load during the iteration over vcpus.
>>> With that information we could limit the scan and also decide whether
>>> schedule() is better. [I am able to use the number of kicked vcpus to
>>> decide on this, but maybe there are better ideas, like information
>>> from the global loadavg.]
>>>
>>> (3) We can exploit this further with the PV patches, since they also
>>> know about the next eligible lock holder.
>>>
>>> Summary: There is a very good improvement for KVM-based guests on PLE
>>> machines. V5 shows a huge improvement for kernbench.
>>>
>>> +-----------+-----------+-----------+------------+-----------+
>>>       base_rik      stdev     patched       stdev    %improve
>>> +-----------+-----------+-----------+------------+-----------+
>>>  kernbench (time in sec, lesser is better)
>>> +-----------+-----------+-----------+------------+-----------+
>>>  1x    49.2300     1.0171    22.6842     0.3073   117.0233 %
>>>  2x    91.9358     1.7768    53.9608     1.0154    70.37516 %
>>> +-----------+-----------+-----------+------------+-----------+
>>>
>>> +-----------+-----------+-----------+------------+-----------+
>>>  ebizzy (records/sec, more is better)
>>> +-----------+-----------+-----------+------------+-----------+
>>>  1x  1129.2500    28.6793  2125.6250    32.8239    88.23334 %
>>>  2x  1892.3750    75.1112  2377.1250   181.6822    25.61596 %
>>> +-----------+-----------+-----------+------------+-----------+
>>>
>>> Note: The patches are tested on x86.
>>>
>>> Links
>>> V4: https://lkml.org/lkml/2012/7/16/80
>>> V3: https://lkml.org/lkml/2012/7/12/437
>>> V2: https://lkml.org/lkml/2012/7/10/392
>>> V1: https://lkml.org/lkml/2012/7/9/32
>>>
>>> Raghavendra K T (3):
>>> config: Add config to support ple or cpu relax optimzation
>>> kvm : Note down when cpu relax intercepted or pause loop exited
>>> kvm : Choose a better candidate for directed yield
>>> ---
>>> arch/s390/kvm/Kconfig | 1 +
>>> arch/x86/kvm/Kconfig | 1 +
>>> include/linux/kvm_host.h | 39 +++++++++++++++++++++++++++++++++++++++
>>> virt/kvm/Kconfig | 3 +++
>>> virt/kvm/kvm_main.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>> 5 files changed, 85 insertions(+), 0 deletions(-)
>>
>> Reviewed-by: Marcelo Tosatti<mtosatti@redhat.com>
>>
>
> Thanks Marcelo for the review. Avi, Rik, Christian, please let me know
> if this series looks good now.

The series looks good to me.

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler
  2012-07-22 12:43     ` Avi Kivity
@ 2012-07-23  7:35       ` Christian Borntraeger
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2012-07-23  7:35 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Raghavendra K T, Marcelo Tosatti, Rik van Riel, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Srikar, S390, Carsten Otte, KVM,
	chegu vinod, Andrew M. Theurer, LKML, X86, Gleb Natapov, linux390,
	Srivatsa Vaddagiri, Joerg Roedel

On 22/07/12 14:43, Avi Kivity wrote:
> On 07/22/2012 03:34 PM, Raghavendra K T wrote:
>>
>> Thanks Marcelo for the review. Avi, Rik, Christian, please let me know
>> if this series looks good now.
>>
> 
> It looks fine to me.  Christian, is this okay for s390?
> 
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler
  2012-07-18 13:37 [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Raghavendra K T
                   ` (3 preceding siblings ...)
  2012-07-20 17:36 ` [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Marcelo Tosatti
@ 2012-07-23 10:03 ` Avi Kivity
  4 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2012-07-23 10:03 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: H. Peter Anvin, Thomas Gleixner, Marcelo Tosatti, Ingo Molnar,
	Rik van Riel, Srikar, S390, Carsten Otte, Christian Borntraeger,
	KVM, chegu vinod, Andrew M. Theurer, LKML, X86, Gleb Natapov,
	linux390, Srivatsa Vaddagiri, Joerg Roedel

On 07/18/2012 04:37 PM, Raghavendra K T wrote:
> Currently the Pause Loop Exit (PLE) handler does a directed yield to a
> random vcpu on PL-exit. We already have filtering while choosing
> the candidate to yield_to. This change adds more checks while choosing
> a candidate to yield_to.
> 
> On guests with many vcpus, there is a high probability of
> yielding to the same vcpu that recently did a pause-loop exit.
> Such a yield can leave that vcpu spinning again.
> 
> The patchset keeps track of pause loop exits and gives a chance to a
> vcpu which has:
> 
>  (a) not done a pause loop exit at all (probably a preempted lock holder)
> 
>  (b) been skipped in the last iteration because it did a pause loop exit,
>  and has probably become eligible now (the next eligible lock holder)
> 
> This concept also helps in the cpu-relax interception cases, which use the same handler.
> 
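The skip-every-other-pass logic described in the cover letter can be sketched roughly as below. This is a simplified user-space model, not the actual kernel code: the struct and function names follow the cover letter's naming (in_spin_loop, dy_eligible, vcpu_eligible_for_directed_yield), but the fields and signature here are illustrative assumptions.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-vcpu state; mirrors the naming in the
 * cover letter but is not the real struct kvm_vcpu layout. */
struct vcpu_spin_loop {
	bool in_spin_loop;  /* set when a PLE / cpu-relax intercept fired */
	bool dy_eligible;   /* currently eligible for a directed yield */
};

/* A vcpu is a good yield_to candidate if it
 *  (a) never pause-loop-exited (likely a preempted lock holder), or
 *  (b) was skipped last round and has become eligible again.
 * Toggling dy_eligible makes a spinning vcpu get skipped on every
 * other pass, rather than being picked repeatedly. */
static bool vcpu_eligible_for_directed_yield(struct vcpu_spin_loop *sl)
{
	bool eligible = !sl->in_spin_loop ||
			(sl->in_spin_loop && sl->dy_eligible);

	if (sl->in_spin_loop)
		sl->dy_eligible = !sl->dy_eligible;

	return eligible;
}
```

With dy_eligible initialized to false (as the v1 changelog notes), a vcpu that just pause-loop-exited is skipped on the first pass and becomes eligible on the next one.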

Thanks, applied to 'queue'.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-07-23 10:03 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-18 13:37 [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Raghavendra K T
2012-07-18 13:37 ` [PATCH RFC V5 1/3] kvm/config: Add config to support ple or cpu relax optimzation Raghavendra K T
2012-07-18 13:37 ` [PATCH RFC V5 2/3] kvm: Note down when cpu relax intercepted or pause loop exited Raghavendra K T
2012-07-18 13:38 ` [PATCH RFC V5 3/3] kvm: Choose better candidate for directed yield Raghavendra K T
2012-07-18 14:39   ` Raghavendra K T
2012-07-19  9:47     ` [RESEND PATCH " Raghavendra K T
2012-07-20 17:36 ` [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler Marcelo Tosatti
2012-07-22 12:34   ` Raghavendra K T
2012-07-22 12:43     ` Avi Kivity
2012-07-23  7:35       ` Christian Borntraeger
2012-07-22 17:58     ` Rik van Riel
2012-07-23 10:03 ` Avi Kivity
