From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Andrew Jones <drjones@redhat.com>
Cc: riel@redhat.com, Peter Zijlstra <peterz@infradead.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Avi Kivity <avi@redhat.com>,
	Gleb Natapov <gleb@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Srikar <srikar@linux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	KVM <kvm@vger.kernel.org>, Jiannan Ouyang <ouyang@cs.pitt.edu>,
	Chegu Vinod <chegu_vinod@hp.com>,
	"Andrew M. Theurer" <habanero@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@gmail.com>
Subject: Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case
Date: Tue, 27 Nov 2012 15:57:25 +0530
Message-ID: <50B4958D.104@linux.vnet.ibm.com>
In-Reply-To: <20121126134301.GB9830@turtle.usersys.redhat.com>

On 11/26/2012 07:13 PM, Andrew Jones wrote:
> On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
>> From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>>
>> yield_to returns -ESRCH when the run queues of both the source and
>> the target of yield_to have only one task. When we see three
>> successive failures of yield_to, we assume we are in a potential
>> undercommit case and abort from the PLE handler.
>> The assumption is backed by the low probability of a wrong decision
>> even in worst-case scenarios such as an average runqueue length
>> between 1 and 2.
>>
>> Note that we do not update the last boosted vcpu in failure cases.
>> Thanks to Avi for raising the question of aborting after the first
>> failure of yield_to.
>>
>> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>> ---
>>   virt/kvm/kvm_main.c |   26 ++++++++++++++++----------
>>   1 file changed, 16 insertions(+), 10 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index be70035..053f494 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
>>   {
>>   	struct pid *pid;
>>   	struct task_struct *task = NULL;
>> +	bool ret = false;
>>
>>   	rcu_read_lock();
>>   	pid = rcu_dereference(target->pid);
>> @@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
>>   		task = get_pid_task(target->pid, PIDTYPE_PID);
>>   	rcu_read_unlock();
>>   	if (!task)
>> -		return false;
>> +		return ret;
>>   	if (task->flags & PF_VCPU) {
>>   		put_task_struct(task);
>> -		return false;
>> -	}
>> -	if (yield_to(task, 1)) {
>> -		put_task_struct(task);
>> -		return true;
>> +		return ret;
>>   	}
>> +	ret = yield_to(task, 1);
>>   	put_task_struct(task);
>> -	return false;
>> +
>> +	return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
>>
>> @@ -1697,12 +1696,14 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
>>   	return eligible;
>>   }
>>   #endif
>> +
>>   void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>   {
>>   	struct kvm *kvm = me->kvm;
>>   	struct kvm_vcpu *vcpu;
>>   	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>>   	int yielded = 0;
>> +	int try = 3;
>>   	int pass;
>>   	int i;
>>
>> @@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>   	 * VCPU is holding the lock that we need and will release it.
>>   	 * We approximate round-robin by starting at the last boosted VCPU.
>>   	 */
>> -	for (pass = 0; pass < 2 && !yielded; pass++) {
>> +	for (pass = 0; pass < 2 && !yielded && try; pass++) {
>>   		kvm_for_each_vcpu(i, vcpu, kvm) {
>>   			if (!pass && i <= last_boosted_vcpu) {
>>   				i = last_boosted_vcpu;
>> @@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>   				continue;
>>   			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>>   				continue;
>> -			if (kvm_vcpu_yield_to(vcpu)) {
>> +
>> +			yielded = kvm_vcpu_yield_to(vcpu);
>> +			if (yielded > 0) {
>>   				kvm->last_boosted_vcpu = i;
>> -				yielded = 1;
>>   				break;
>> +			} else if (yielded < 0) {
>> +				try--;
>> +				if (!try)
>> +					break;
>>   			}
>>   		}
>>   	}
>>

Drew, thanks for reviewing this.
>
> The check done in patch 1/2 is done before the double_rq_lock, so it's
> cheap. Now, this patch is to avoid doing too many get_pid_task calls. I
> wonder if it would make more sense to change the vcpu state from tracking
> the pid to tracking the task. If that was done, then I don't believe this
> patch is necessary.

We would still need logic not to break out upon the first failure of
yield_to (which is what happens with patch 1 alone). Breaking out of
the PLE handler on the first failure resulted in degradation under
moderate overcommit, due to false exits even when there was more than
one task in the other CPUs' run queues.
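
To make the heuristic concrete, here is a small user-space sketch (not
the kernel code; try_yield_to() and its result sequence are made up
for illustration) of how the try counter in the hunk above treats the
three possible outcomes of kvm_vcpu_yield_to():

#include <stdio.h>

/* Stand-in for kvm_vcpu_yield_to():
 *   > 0   yield succeeded
 *   == 0  candidate not runnable/eligible, keep scanning
 *   < 0   -ESRCH-style failure (source and target rq length is one)
 */
static int fake_yield_results[] = { 0, -1, 0, -1, -1 };

static int try_yield_to(int i)
{
	return fake_yield_results[i];
}

int main(void)
{
	int try = 3;	/* failure budget, as in the patch */
	int yielded = 0;
	int i;

	for (i = 0; i < 5; i++) {
		yielded = try_yield_to(i);
		if (yielded > 0) {
			printf("boosted candidate %d\n", i);
			break;
		} else if (yielded < 0 && --try == 0) {
			printf("third yield_to failure: assume undercommit, abort\n");
			break;
		}
		/* yielded == 0: skip this candidate and continue */
	}
	return 0;
}

A single -ESRCH does not abort the scan; only the third failure does,
which is what keeps false exits rare under moderate overcommit.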

But your suggestion triggered an idea: what would be the cost of
iterating over all vcpus despite yield_to failures, breaking out of
the PLE handler only on a successful yield (i.e. yielded > 0)?
Something like this:

-       for (pass = 0; pass < 2 && !yielded; pass++) {
+       for (pass = 0; pass < 2 && yielded <=0 ; pass++) {
                 kvm_for_each_vcpu(i, vcpu, kvm) {
                         if (!pass && i <= last_boosted_vcpu) {
                                 i = last_boosted_vcpu;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
                                 continue;
                         if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
                                 continue;
-                       if (kvm_vcpu_yield_to(vcpu)) {
+
+                       yielded = kvm_vcpu_yield_to(vcpu);
+                       if (yielded > 0) {
                                 kvm->last_boosted_vcpu = i;
-                               yielded = 1;
                                 break;

Here are the results for the above patch w.r.t. the base and the
current patch series:

benchmark     improvement w.r.t. base   improvement w.r.t. current patch
ebizzy_1x      131.22287                  -9.76%
ebizzy_4x       -7.97198                 -21.1%

dbench_1x       25.67077                 -25.55%
dbench_4x      -69.19086                -122.46%


The current patches perform better, so iterating over all vcpus does
have some overhead. IMO this overhead is significant on bigger
machines with large guests.

Let me know if this patch sounds good to you.
