kvm.vger.kernel.org archive mirror
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Ingo Molnar <mingo@redhat.com>, Avi Kivity <avi@redhat.com>,
	Rik van Riel <riel@redhat.com>
Cc: Srikar <srikar@linux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	KVM <kvm@vger.kernel.org>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Jiannan Ouyang <ouyang@cs.pitt.edu>,
	Chegu Vinod <chegu_vinod@hp.com>,
	"Andrew M. Theurer" <habanero@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@gmail.com>,
	Gleb Natapov <gleb@redhat.com>, Andrew Jones <drjones@redhat.com>
Subject: [PATCH V2 RFC 0/3] kvm: Improving undercommit,overcommit scenarios
Date: Mon, 29 Oct 2012 19:36:22 +0530
Message-ID: <20121029140621.15448.92083.sendpatchset@codeblue>

 In some special scenarios, such as #vcpu <= #pcpu, the PLE handler may
prove very costly, because there is no need to iterate over the vcpus
and attempt yield_to calls that fail and only burn CPU.

 Similarly, when we have a large number of small guests, it is
possible that a spinning vcpu fails to yield_to any vcpu of the same
VM and goes back to spinning. This is also not effective when we are
over-committed. Instead, we do a yield() so that we give other VMs
a chance to run.

This patch series tries to optimize the above scenarios.

 The first patch optimizes yield_to by bailing out when there is no
 need to continue (i.e., when there is only one task in both the
 source and the target rq).
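
 The bail-out condition described above can be sketched in user-space C
 as follows. This is only an illustration of the check, not the code
 from patch 1/3: struct rq_sketch and yield_to_is_pointless() are
 hypothetical names, and the real scheduler runqueue carries far more
 state than a single counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a runqueue; only the field the check needs. */
struct rq_sketch {
	unsigned int nr_running;	/* runnable tasks on this CPU */
};

/*
 * A directed yield cannot help when the yielding task's runqueue and
 * the target task's runqueue each hold exactly one task: nothing is
 * waiting behind either of them, so the caller can bail out early.
 */
static bool yield_to_is_pointless(const struct rq_sketch *src,
				  const struct rq_sketch *dst)
{
	return src->nr_running == 1 && dst->nr_running == 1;
}
```

 A caller would skip the yield attempt whenever this returns true, and
 fall through to the normal yield_to path otherwise.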

 The second patch uses that in the PLE handler.
 
 The third patch uses overall system load knowledge to decide whether
 to continue in the yield_to handler, and also to yield in overcommit cases.
 To be precise, 
 * loadavg is converted to a scale of 2048 per CPU
 * a load value of less than 1024 is considered undercommit, and we
 return from the PLE handler in those cases
 * a load value of greater than 3584 (1.75 * 2048) is considered overcommit,
  and we yield to other VMs in such cases.
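
 The three load ranges listed above can be written down as a small
 classifier. This is a sketch under stated assumptions, not the code
 from patch 3/3: all names are hypothetical, the input is assumed to be
 a load average already expressed as a fixed-point value in which 2048
 represents 1.0, and the overcommit limit is taken as 1.75 * 2048,
 which evaluates to 3584.

```c
/* Hypothetical names; the values follow the list above. */
#define LOAD_SCALE		2048u			/* 1.0 per CPU */
#define UNDERCOMMIT_LIMIT	(LOAD_SCALE / 2)	/* 0.5  * 2048 = 1024 */
#define OVERCOMMIT_LIMIT	(LOAD_SCALE * 7 / 4)	/* 1.75 * 2048 = 3584 */

enum commit_state { UNDERCOMMIT, NORMAL_COMMIT, OVERCOMMIT };

/* Spread a system-wide fixed-point load value over the online CPUs. */
static unsigned int load_per_cpu(unsigned long system_load, unsigned int ncpus)
{
	return (unsigned int)(system_load / ncpus);
}

/* Map a per-CPU load value onto the three cases in the cover letter. */
static enum commit_state classify_load(unsigned int per_cpu_load)
{
	if (per_cpu_load < UNDERCOMMIT_LIMIT)
		return UNDERCOMMIT;	/* return from the PLE handler */
	if (per_cpu_load > OVERCOMMIT_LIMIT)
		return OVERCOMMIT;	/* yield() so other VMs can run */
	return NORMAL_COMMIT;		/* keep trying yield_to */
}
```

 For example, a system-wide value of 32 * 2048 on 32 CPUs gives a
 per-CPU load of 2048, which falls between the two limits.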

(let threshold = 2048)
Rationale for using threshold/2 as the undercommit limit:
 Requiring a load below (0.5 * threshold) avoids (the concern raised by Rik)
scenarios where we still have a lock-holder-preempted vcpu waiting to be
scheduled. (This scenario arises when rq length is > 1 even though we are
undercommitted.)

Rationale for using (1.75 * threshold) for the overcommit scenario:
This is a heuristic: at this load we should probably see rq length > 1,
with a vcpu of a different VM waiting to be scheduled.

 Related future work (independent of this series):
 
 - Dynamically changing PLE window depending on system load.

 Results on a 3.7.0-rc1 kernel show around 146% improvement for ebizzy 1x
 on a 32-core PLE machine with a 32-vcpu guest.
 I believe we should get very good improvements for overcommit (especially > 2)
 on large machines with small-vcpu guests. (I could not test this, as I do not
 have access to a bigger machine.)

base = 3.7.0-rc1 
machine: 32 core mx3850 x5 PLE mc

--+-----------+-----------+-----------+------------+-----------+
               ebizzy (rec/sec, higher is better)
--+-----------+-----------+-----------+------------+-----------+
    base        stdev       patched     stdev       %improve     
--+-----------+-----------+-----------+------------+-----------+
1x  2543.3750    20.2903    6279.3750    82.5226   146.89143   
2x  2410.8750    96.4327    2450.7500   207.8136     1.65396
3x  2184.9167   205.5226    2178.3333    97.2034    -0.30131
--+-----------+-----------+-----------+------------+-----------+

--+-----------+-----------+-----------+------------+-----------+
        dbench (throughput in MB/sec, higher is better)
--+-----------+-----------+-----------+------------+-----------+
    base        stdev       patched     stdev       %improve     
--+-----------+-----------+-----------+------------+-----------+
1x  5545.4330   596.4344    7042.8510  1012.0924    27.00272
2x  1993.0970    43.6548    1990.6200    75.7837    -0.12428
3x  1295.3867    22.3997    1315.5208    36.0075     1.55429
--+-----------+-----------+-----------+------------+-----------+
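
 The %improve column in both tables is consistent with the relative
 change of the patched result over the base result. A minimal sketch
 of that arithmetic follows; the function name is hypothetical and is
 not taken from any benchmark script.

```c
/* Relative change of the patched result over the base result, in percent. */
static double percent_improvement(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

 Feeding in the ebizzy 1x row (base 2543.3750, patched 6279.3750)
 gives a value between 146.89 and 146.90, in line with the table.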

 Changes since V1:
 - Discard the idea of exporting nr_running and optimize in the core scheduler instead (Peter)
 - Use yield() instead of schedule() in overcommit scenarios (Rik)
 - Use loadavg knowledge to detect undercommit/overcommit

 Peter Zijlstra (1):
  Bail out of yield_to when source and target runqueue has one task

 Raghavendra K T (2):
  Handle yield_to failure return for potential undercommit case
  Check system load and handle different commit cases accordingly

 Please let me know your comments and suggestions.

 Link for V1:
 https://lkml.org/lkml/2012/9/21/168

 kernel/sched/core.c | 25 +++++++++++++++++++------
 virt/kvm/kvm_main.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 65 insertions(+), 16 deletions(-)

Thread overview: 21+ messages
2012-10-29 14:06 Raghavendra K T [this message]
2012-10-29 14:06 ` [PATCH V2 RFC 1/3] sched: Bail out of yield_to when source and target runqueue has one task Raghavendra K T
2012-10-29 14:07 ` [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case Raghavendra K T
2012-10-31 12:38   ` Avi Kivity
2012-10-31 12:41     ` Raghavendra K T
2012-10-31 13:15       ` Raghavendra K T
2012-10-31 13:41         ` Avi Kivity
2012-10-31 17:06           ` Raghavendra K T
2012-11-07 10:25             ` Raghavendra K T
2012-11-09  8:38               ` [PATCH V2 RESEND " Raghavendra K T
2012-10-29 14:07 ` [PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly Raghavendra K T
2012-10-29 17:54   ` Peter Zijlstra
2012-10-30  5:57     ` Raghavendra K T
2012-10-30  6:34       ` Andrew Jones
2012-10-30  7:31         ` Raghavendra K T
2012-10-30  9:07           ` Andrew Jones
2012-10-31 12:24             ` Raghavendra K T
2012-10-30  8:14       ` Peter Zijlstra
2012-10-31  6:10         ` Raghavendra K T
2012-10-30 12:17 ` [PATCH V2 RFC 0/3] kvm: Improving undercommit,overcommit scenarios Andrew Theurer
2012-10-31  6:36   ` Raghavendra K T
