From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: habanero@linux.vnet.ibm.com, Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Ingo Molnar <mingo@redhat.com>, Rik van Riel <riel@redhat.com>,
Srikar <srikar@linux.vnet.ibm.com>,
"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
KVM <kvm@vger.kernel.org>, Jiannan Ouyang <ouyang@cs.pitt.edu>,
Chegu Vinod <chegu_vinod@hp.com>,
LKML <linux-kernel@vger.kernel.org>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@gmail.com>,
Gleb Natapov <gleb@redhat.com>, Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH V2 RFC 0/3] kvm: Improving undercommit,overcommit scenarios
Date: Wed, 31 Oct 2012 12:06:34 +0530 [thread overview]
Message-ID: <5090C6F2.5030103@linux.vnet.ibm.com> (raw)
In-Reply-To: <1351599420.23105.14.camel@oc6622382223.ibm.com>
On 10/30/2012 05:47 PM, Andrew Theurer wrote:
> On Mon, 2012-10-29 at 19:36 +0530, Raghavendra K T wrote:
>> In some special scenarios, such as #vcpu <= #pcpu, the PLE handler may
>> prove very costly, because there is no need to iterate over vcpus
>> and burn CPU on unsuccessful yield_to attempts.
>>
>> Similarly, when we have a large number of small guests, it is
>> possible that a spinning vcpu fails to yield_to any vcpu of the same
>> VM and goes back to spinning. This is also not effective when we are
>> over-committed. Instead, we do a yield() so that we give other VMs
>> a chance to run.
>>
>> This patch tries to optimize above scenarios.
>>
>> The first patch optimizes all the yield_to by bailing out when there
>> is no need to continue yield_to (i.e., when there is only one task
>> in source and target rq).
>>
>> Second patch uses that in PLE handler.
>>
>> Third patch uses overall system load knowledge to take a decision on
>> continuing in the yield_to handler, and also on yielding in overcommit.
>> To be precise,
>> * loadavg is converted to a scale of 2048 per CPU
>> * a load value of less than 1024 is considered undercommit, and we
>> return from the PLE handler in those cases
>> * a load value of greater than 3584 (1.75 * 2048) is considered overcommit,
>> and we yield to other VMs in such cases.
>>
>> (let threshold = 2048)
>> Rationale for using threshold/2 for undercommit limit:
>> Having a load below (0.5 * threshold) is used to avoid (the concern raised by Rik)
>> scenarios where we still have lock holder preempted vcpu waiting to be
>> scheduled. (scenario arises when rq length is > 1 even when we are under
>> committed)
>>
>> Rationale for using (1.75 * threshold) for overcommit scenario:
>> This is a heuristic where we should probably see rq length > 1
>> and a vcpu of a different VM is waiting to be scheduled.
>>
>> Related future work (independent of this series):
>>
>> - Dynamically changing PLE window depending on system load.
>>
>> Results on the 3.7.0-rc1 kernel show around 146% improvement for ebizzy 1x
>> on a 32-core PLE machine with a 32-vcpu guest.
>> I believe we should get very good improvements for overcommit (especially > 2)
>> on large machines with small vcpu guests. (Could not test this as I do not have
>> access to a bigger machine)
>>
>> base = 3.7.0-rc1
>> machine: 32 core mx3850 x5 PLE mc
>>
>> --+-----------+-----------+-----------+------------+-----------+
>> ebizzy (rec/sec, higher is better)
>> --+-----------+-----------+-----------+------------+-----------+
>> base stdev patched stdev %improve
>> --+-----------+-----------+-----------+------------+-----------+
>> 1x 2543.3750 20.2903 6279.3750 82.5226 146.89143
>> 2x 2410.8750 96.4327 2450.7500 207.8136 1.65396
>> 3x 2184.9167 205.5226 2178.3333 97.2034 -0.30131
>> --+-----------+-----------+-----------+------------+-----------+
>>
>> --+-----------+-----------+-----------+------------+-----------+
>> dbench (throughput in MB/sec. higher is better)
>> --+-----------+-----------+-----------+------------+-----------+
>> base stdev patched stdev %improve
>> --+-----------+-----------+-----------+------------+-----------+
>> 1x 5545.4330 596.4344 7042.8510 1012.0924 27.00272
>> 2x 1993.0970 43.6548 1990.6200 75.7837 -0.12428
>> 3x 1295.3867 22.3997 1315.5208 36.0075 1.55429
>> --+-----------+-----------+-----------+------------+-----------+
>
> Could you include a PLE-off result for 1x over-commit, so we know what
> the best possible result is?
Yes,
base, no PLE:
ebizzy_1x  7651.3000 rec/sec
ebizzy_2x    51.5000 rec/sec
(for ebizzy we are closer to the no-PLE number)
dbench_1x 12631.4210 MB/sec
dbench_2x    45.0842 MB/sec
(strangely, the dbench 1x result is sometimes not consistent even with 10 runs
of 3 min + 30 sec warmup on a 3G tmpfs, but it surely shows the trend)
>
> Looks like skipping the yield_to() for rq = 1 helps, but I'd like to
> know if the performance is the same as PLE off for 1x. I am concerned
> the vcpu to task lookup is still expensive.
>
Yes. I still see that.
> Based on Peter's comments I would say the 3rd patch and the 2x,3x
> results are not conclusive at this time.
Avi, IMO patch 1 and 2 seem to be good to go. Please let me know.
>
> I think we should also discuss what we think a good target is. We
> should know what our high-water mark is, and IMO, if we cannot get
> close, then I do not feel we are heading down the right path. For
> example, if dbench aggregate throughput for 1x with PLE off is 10000
> MB/sec, then the best possible 2x,3x result, should be a little lower
> than that due to task switching the vcpus and sharing caches. This
> should be quite evident with current PLE handler and smaller VMs (like
> 10 vcpus or less).
Very much agree here. Looking at the 2x/3x results (all/any of them), the
aggregate is not near the 1x number. Maybe even 70% is a good target.
Thread overview: 21+ messages
2012-10-29 14:06 [PATCH V2 RFC 0/3] kvm: Improving undercommit,overcommit scenarios Raghavendra K T
2012-10-29 14:06 ` [PATCH V2 RFC 1/3] sched: Bail out of yield_to when source and target runqueue has one task Raghavendra K T
2012-10-29 14:07 ` [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case Raghavendra K T
2012-10-31 12:38 ` Avi Kivity
2012-10-31 12:41 ` Raghavendra K T
2012-10-31 13:15 ` Raghavendra K T
2012-10-31 13:41 ` Avi Kivity
2012-10-31 17:06 ` Raghavendra K T
2012-11-07 10:25 ` Raghavendra K T
2012-11-09 8:38 ` [PATCH V2 RESEND " Raghavendra K T
2012-10-29 14:07 ` [PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly Raghavendra K T
2012-10-29 17:54 ` Peter Zijlstra
2012-10-30 5:57 ` Raghavendra K T
2012-10-30 6:34 ` Andrew Jones
2012-10-30 7:31 ` Raghavendra K T
2012-10-30 9:07 ` Andrew Jones
2012-10-31 12:24 ` Raghavendra K T
2012-10-30 8:14 ` Peter Zijlstra
2012-10-31 6:10 ` Raghavendra K T
2012-10-30 12:17 ` [PATCH V2 RFC 0/3] kvm: Improving undercommit,overcommit scenarios Andrew Theurer
2012-10-31 6:36 ` Raghavendra K T [this message]