Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler


From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: habanero@linux.vnet.ibm.com
Cc: Avi Kivity <avi@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Ingo Molnar <mingo@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Srikar <srikar@linux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	KVM <kvm@vger.kernel.org>, Jiannan Ouyang <ouyang@cs.pitt.edu>,
	chegu vinod <chegu_vinod@hp.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@gmail.com>,
	Gleb Natapov <gleb@redhat.com>, Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
Date: Fri, 19 Oct 2012 14:00:40 +0530	[thread overview]
Message-ID: <50810FB0.9000507@linux.vnet.ibm.com> (raw)
In-Reply-To: <1350311695.22418.86.camel@oc2024037011.ibm.com>

On 10/15/2012 08:04 PM, Andrew Theurer wrote:
> On Mon, 2012-10-15 at 17:40 +0530, Raghavendra K T wrote:
>> On 10/11/2012 01:06 AM, Andrew Theurer wrote:
>>> On Wed, 2012-10-10 at 23:24 +0530, Raghavendra K T wrote:
>>>> On 10/10/2012 08:29 AM, Andrew Theurer wrote:
>>>>> On Wed, 2012-10-10 at 00:21 +0530, Raghavendra K T wrote:
>>>>>> * Avi Kivity <avi@redhat.com> [2012-10-04 17:00:28]:
>>>>>>
>>>>>>> On 10/04/2012 03:07 PM, Peter Zijlstra wrote:
>>>>>>>> On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
>>>>>>>>>
>> [...]
>>>>> A big concern I have (if this is 1x overcommit) for ebizzy is that it
>>>>> has just terrible scalability to begin with.  I do not think we should
>>>>> try to optimize such a bad workload.
>>>>>
>>>>
>>>> I think my way of running dbench has some flaw, so I went to ebizzy.
>>>> Could you let me know how you generally run dbench?
>>>
>>> I mount a tmpfs and then specify that mount for dbench to run on.  This
>>> eliminates all IO.  I use a 300 second run time and number of threads is
>>> equal to number of vcpus.  All of the VMs of course need to have a
>>> synchronized start.
>>>
>>> I would also make sure you are using a recent kernel for dbench, where
>>> the dcache scalability is much improved.  Without any lock-holder
>>> preemption, the time in spin_lock should be very low:
>>>
>>>
>>>>       21.54%      78016         dbench  [kernel.kallsyms]   [k] copy_user_generic_unrolled
>>>>        3.51%      12723         dbench  libc-2.12.so        [.] __strchr_sse42
>>>>        2.81%      10176         dbench  dbench              [.] child_run
>>>>        2.54%       9203         dbench  [kernel.kallsyms]   [k] _raw_spin_lock
>>>>        2.33%       8423         dbench  dbench              [.] next_token
>>>>        2.02%       7335         dbench  [kernel.kallsyms]   [k] __d_lookup_rcu
>>>>        1.89%       6850         dbench  libc-2.12.so        [.] __strstr_sse42
>>>>        1.53%       5537         dbench  libc-2.12.so        [.] __memset_sse2
>>>>        1.47%       5337         dbench  [kernel.kallsyms]   [k] link_path_walk
>>>>        1.40%       5084         dbench  [kernel.kallsyms]   [k] kmem_cache_alloc
>>>>        1.38%       5009         dbench  libc-2.12.so        [.] memmove
>>>>        1.24%       4496         dbench  libc-2.12.so        [.] vfprintf
>>>>        1.15%       4169         dbench  [kernel.kallsyms]   [k] __audit_syscall_exit
>>>
>>
>> Hi Andrew,
>> I ran dbench on tmpfs. I do not see any improvement in dbench
>> with a 16k ple_window.
>>
>> So it seems that, apart from ebizzy, no workload benefits from it, and I
>> agree that it may not be good to optimize for ebizzy.
>> I shall drop the change to a 16k default window and continue with the
>> rest of the original patch series. I need to experiment with the latest kernel.
>
> Thanks for running this again.  I do believe there are some workloads that,
> when run at 1x overcommit, would benefit from a larger ple_window [with
> the current ple handling code], but I also do not want to potentially
> degrade >1x with a larger window.  I do, however, think there may be
> another option.  I have not fully worked this out, but I think I am on
> to something.
>
> I decided to revert to just a yield() instead of a yield_to().  My
> motivation was that yield_to() [for large VMs] is like a dog chasing its
> tail, round and round we go.  Just yield(), in particular a yield()
> which results in yielding to something -other- than the current VM's
> vcpus, helps synchronize the execution of sibling vcpus by deferring
> them until the lock holder vcpu is running again.  The more we can do to
> get all vcpus running at the same time, the less we have to deal with the
> preemption problem.  The other benefit is that yield() has far, far lower
> overhead than yield_to().
>
> This does assume that vcpus from the same VM do not share the same runqueue.
> Yielding to a sibling vcpu with yield() is not productive for larger VMs,
> in the same way that yield_to() is not.  My recent results include
> restricting vcpu placement so that sibling vcpus do not get to run on
> the same runqueue.  I do believe we could implement an initial placement
> and load balance policy to strive for this restriction (making it purely
> optional, but I bet it could also help user apps which use spin locks).
>
> For 1x VMs that still vm_exit due to PLE, I believe we could probably
> just leave the ple_window alone, as long as we mostly use yield()
> instead of yield_to().  The problem with the unneeded exits in this case
> has been the overhead in the routines leading up to yield_to() and in
> yield_to() itself.  If we use yield() most of the time, this overhead
> will go away.
>
> Here is a comparison of yield_to() and yield():
>
> dbench with 20-way VMs, 8 of them on an 80-way host:
>
> no PLE			  426 +/- 11.03%
> no PLE w/ gangsched	32001 +/- .37%
> PLE with yield()	29207 +/- .28%
> PLE with yield_to()	 8175 +/- 1.37%
>
> yield() is far and away better than yield_to() here and almost approaches
> the gang-scheduling result.  Here is a link to the perf sched map bitmap:
>
> https://docs.google.com/open?id=0B6tfUNlZ-14weXBfVnFFZGw1akU
>
> The thrashing is way down and sibling vcpus tend to run together,
> approximating the behavior of the gang scheduling without needing to
> actually implement gang scheduling.
>
> I did test a smaller VM:
>
> dbench with 10-way VMs, 16 of them on an 80-way host:
>
> no PLE			 6248 +/- 7.69%	
> no PLE w/ gangsched	28379 +/- .07%
> PLE with yield()	29196 +/- 1.62%
> PLE with yield_to()	32217 +/- 1.76%

Hi Andrew, the results are encouraging.
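
To quantify the comparison, here is a small C helper that computes relative throughput from the dbench figures quoted above. The enum, array and function names are made up for illustration; only the numbers come from the posted results.

```c
/* dbench throughput figures from the results above (units as reported by dbench). */
enum config { NO_PLE, NO_PLE_GANGSCHED, PLE_YIELD, PLE_YIELD_TO, NR_CONFIGS };

/* 8 x 20-way VMs on an 80-way host */
static const double large_vm[NR_CONFIGS] = { 426, 32001, 29207, 8175 };
/* 16 x 10-way VMs on an 80-way host */
static const double small_vm[NR_CONFIGS] = { 6248, 28379, 29196, 32217 };

/* Throughput of configuration a relative to configuration b for one VM size. */
static double relative(const double *res, enum config a, enum config b)
{
	return res[a] / res[b];
}
```

relative(large_vm, PLE_YIELD, PLE_YIELD_TO) is about 3.57, while relative(small_vm, PLE_YIELD, PLE_YIELD_TO) is about 0.91. So yield() gives up roughly 9% on the 10-way VMs but gains about 3.6x on the 20-way VMs, where it also reaches about 91% of the gang-scheduled result.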

>
> yield() is somewhat slower than yield_to() here, but the loss is not
> nearly as large as the uplift we see on the larger VMs.  Regardless, I
> have an idea to fix that: instead of using yield() all the time, we could
> use yield_to(), but limit the rate per vcpu to something like 1 per jiffy.
> All other exits use yield().  That rate of yield_to() should be more
> than enough for the smaller VMs, and hopefully the result would be the
> same as with the current code.  I have not coded this up yet, but it's my
> next step.

I personally feel rate limiting yield_to may be a good idea.
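
A minimal sketch of that per-vcpu rate limit, assuming a hypothetical per-vcpu field that records the jiffy of the last directed yield and a `now` argument standing in for the kernel's jiffies counter (neither is existing KVM code): at most one yield_to() per vcpu per jiffy, with every other PLE exit falling back to yield().

```c
#include <stdbool.h>

/* Hypothetical per-vcpu state; in KVM this would have to live in struct kvm_vcpu. */
struct ple_rate_state {
	unsigned long last_yield_to;	/* jiffy of the last directed yield */
	bool yielded_to_before;		/* false until the first yield_to() */
};

enum ple_action { ACTION_YIELD, ACTION_YIELD_TO };

/*
 * Allow at most one yield_to() per vcpu per jiffy; every other PLE exit uses
 * yield(). 'now' stands in for jiffies. The signed difference keeps the
 * comparison correct across counter wraparound, as time_after() does.
 */
static enum ple_action ple_pick_action(struct ple_rate_state *s, unsigned long now)
{
	if (!s->yielded_to_before || (long)(now - s->last_yield_to) >= 1) {
		s->last_yield_to = now;
		s->yielded_to_before = true;
		return ACTION_YIELD_TO;
	}
	return ACTION_YIELD;
}
```

Repeated PLE exits within the same jiffy get ACTION_YIELD; the first exit in a new jiffy gets ACTION_YIELD_TO again. The interval is fixed at one jiffy to match the proposal, but it could just as well be a tunable.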

>
> I am also hopeful that limiting yield_to() will make the 1x issue go
> away as well (even with a 4096 ple_window).  The vast majority of exits
> will result in yield(), which should be harmless.
>
> Keep in mind this did require ensuring sibling vcpus do not share host
> runqueues; I do think that is possible given some optional scheduler
> tweaks.

I think placement is a concern. Having the rate limit alone may
suffice. Maybe tuning it to take the overcommitted/non-overcommitted
scenario into account would be better.
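
For reference, the placement restriction your results rely on (no two vcpus of the same VM queued on the same host runqueue) can be written as a simple check. This is only an illustration: the `vcpu_cpu` array is a hypothetical snapshot of which host CPU each vcpu sits on, not a KVM or scheduler data structure.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if no two vcpus of the same VM are on the same host CPU
 * runqueue. vcpu_cpu[i] is the host CPU that vcpu i is queued on
 * (hypothetical snapshot, for illustration only).
 */
static bool siblings_on_distinct_runqueues(const int *vcpu_cpu, size_t nr_vcpus)
{
	for (size_t i = 0; i < nr_vcpus; i++)
		for (size_t j = i + 1; j < nr_vcpus; j++)
			if (vcpu_cpu[i] == vcpu_cpu[j])
				return false;	/* two siblings share a runqueue */
	return true;
}
```

The pairwise loop is quadratic in the vcpu count, which is harmless for 10- or 20-way guests; an actual placement or load-balance policy would more likely track a per-VM cpumask than re-check pairs.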

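Okay, below is the V2 implementation I am experimenting with:

1) Check the source -and- target runqueues to decide whether to exit the
   PLE handler.
2) If the directed yield fails and we are overcommitted, fall back to yield():

vcpu_on_spin()
{
	.....
	if (yield_to a vcpu of the same VM did not succeed && we are overcommitted)
		yield();
}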
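
A slightly more concrete C sketch of the two steps above, written as a pure decision function so it can be reasoned about in isolation. The struct and enum are hypothetical, and treating more than one runnable task on the source runqueue as "overcommitted" is an assumption made for illustration, not the actual V2 code.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what the PLE handler can observe. */
struct ple_view {
	unsigned int src_nr_running;	/* tasks on the spinning vcpu's runqueue */
	unsigned int dst_nr_running;	/* tasks on the candidate target's runqueue */
	bool yield_to_succeeded;	/* directed yield to a sibling vcpu worked */
};

enum ple_decision {
	PLE_EXIT_EARLY,		/* undercommitted: leave the handler without yielding */
	PLE_YIELDED_TO,		/* directed yield to a sibling vcpu happened */
	PLE_FALLBACK_YIELD,	/* step 2: plain yield() to other tasks */
	PLE_RESUME,		/* nothing useful to do; go back to the guest */
};

static enum ple_decision ple_decide(const struct ple_view *v)
{
	/* Step 1: both runqueues hold only one task, so yielding cannot help. */
	if (v->src_nr_running == 1 && v->dst_nr_running == 1)
		return PLE_EXIT_EARLY;
	if (v->yield_to_succeeded)
		return PLE_YIELDED_TO;
	/* Step 2: yield_to() failed; assume a busy source runqueue means overcommit. */
	if (v->src_nr_running > 1)
		return PLE_FALLBACK_YIELD;
	return PLE_RESUME;
}
```

In a real handler the step 1 check would run before attempting yield_to(); folding its outcome into one struct just keeps the sketch testable.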

I think combining your idea with (2) complicates the scenario a bit.
Anyway, let me see how my experiment goes. I will also check how yield()
performs without any pinning.


Thread overview: 126+ messages
2012-09-21 11:59 [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler Raghavendra K T
2012-09-21 12:00 ` [PATCH RFC 1/2] kvm: Handle undercommitted guest case " Raghavendra K T
2012-09-21 13:02   ` Rik van Riel
2012-09-21 17:24     ` Raghavendra K T
2012-09-24 15:41       ` Avi Kivity
2012-09-24 16:06         ` Avi Kivity
2012-09-24 16:14           ` Peter Zijlstra
2012-09-24 16:25             ` Avi Kivity
2012-09-25  8:09           ` Raghavendra K T
2012-09-25  8:54             ` Avi Kivity
2012-09-25 13:49               ` Raghavendra K T
2012-09-27  7:44               ` Gleb Natapov
2012-09-27  8:59                 ` Avi Kivity
2012-09-27  9:11                   ` Gleb Natapov
2012-09-27  9:33                     ` Avi Kivity
2012-09-27  9:58                       ` Gleb Natapov
2012-09-27 10:04                         ` Avi Kivity
2012-09-27 10:08                           ` Gleb Natapov
2012-09-27 10:15                             ` Avi Kivity
     [not found]               ` <CAJocwcf+8u84_yDC-PK0Yni93YSTWzYvr69nq6b3pNv1MwVJzQ@mail.gmail.com>
2012-09-27  8:50                 ` Avi Kivity
2012-09-27 11:26                   ` Raghavendra K T
2012-09-27 12:06                     ` Avi Kivity
2012-09-28 18:18                       ` Konrad Rzeszutek Wilk
2012-09-30  8:16                         ` Avi Kivity
     [not found]                   ` <CAJocwcc19F+PtsQ5okGMvYeVnkEigpZRpwWY9JgeRPFqfcVoXA@mail.gmail.com>
2012-09-28  6:16                     ` Raghavendra K T
2012-09-30  8:18                       ` Avi Kivity
2012-09-30 11:07                         ` Gleb Natapov
2012-09-30 11:13                           ` Avi Kivity
2012-10-03 14:17                             ` Raghavendra K T
2012-10-03 14:56                               ` Avi Kivity
2012-10-04  7:29                                 ` Gleb Natapov
2012-10-05  8:36                                   ` Raghavendra K T
2012-10-07  9:51                                     ` Avi Kivity
2012-09-25  7:36         ` Raghavendra K T
2012-09-25  8:12           ` Avi Kivity
2012-09-25 14:21             ` Takuya Yoshikawa
2012-09-27  8:43               ` Avi Kivity
2012-10-03 12:22         ` Raghavendra K T
2012-10-03 17:05           ` Avi Kivity
2012-10-04 10:49             ` Raghavendra K T
2012-10-04 12:41               ` Avi Kivity
2012-10-04 13:07                 ` Peter Zijlstra
2012-10-04 15:00                   ` Avi Kivity
2012-10-09 18:51                     ` Raghavendra K T
2012-10-10  2:59                       ` Andrew Theurer
2012-10-10 17:54                         ` Raghavendra K T
2012-10-10 18:03                           ` David Ahern
2012-10-10 18:14                             ` Raghavendra K T
2012-10-10 19:36                           ` Andrew Theurer
2012-10-15 12:10                             ` Raghavendra K T
2012-10-15 14:34                               ` Andrew Theurer
2012-10-19  8:30                                 ` Raghavendra K T [this message]
2012-10-19 13:31                                   ` Andrew Theurer
2012-10-10 14:24                       ` Andrew Theurer
2012-10-10 17:43                         ` Raghavendra K T
2012-10-10 19:27                           ` Andrew Theurer
2012-10-11 17:13                             ` Raghavendra K T
2012-10-11 10:39                         ` Nikunj A Dadhania
2012-10-18 12:39                       ` Avi Kivity
2012-10-19  8:19                         ` Raghavendra K T
2012-10-04 14:41                 ` Andrew Theurer
2012-10-05  9:06                   ` Raghavendra K T
2012-10-05  9:02                 ` Raghavendra K T
2012-09-24 11:33   ` Peter Zijlstra
2012-09-24 11:40     ` Raghavendra K T
2012-09-21 12:00 ` [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario " Raghavendra K T
2012-09-21 13:22   ` Rik van Riel
2012-09-21 13:46   ` Takuya Yoshikawa
2012-09-21 13:52     ` Rik van Riel
2012-09-21 17:45       ` Raghavendra K T
2012-09-24 13:43         ` Takuya Yoshikawa
2012-09-24 15:26   ` Avi Kivity
2012-09-24 15:34     ` Peter Zijlstra
2012-09-24 15:43       ` Avi Kivity
2012-09-24 15:52         ` Peter Zijlstra
2012-09-24 15:58           ` Avi Kivity
2012-09-24 16:05             ` Peter Zijlstra
2012-09-24 16:10               ` Avi Kivity
2012-09-24 16:13                 ` Peter Zijlstra
2012-09-24 16:21                   ` Avi Kivity
2012-09-25 10:11                     ` Avi Kivity
2012-09-21 13:18 ` [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios " Chegu Vinod
2012-09-21 17:36   ` Raghavendra K T
2012-09-24  8:42     ` Dor Laor
2012-09-24 12:02       ` Raghavendra K T
2012-09-25 15:00         ` Dor Laor
2012-09-26 12:27           ` Konrad Rzeszutek Wilk
2012-09-27 10:07             ` Raghavendra K T
2012-09-27  9:49           ` Raghavendra K T
2012-09-27 10:28             ` Andrew Jones
2012-09-27 10:44               ` Avi Kivity
2012-09-27 11:31               ` Raghavendra K T
2012-09-27 10:33             ` Dor Laor
2012-09-24 11:34 ` Peter Zijlstra
2012-09-24 11:52   ` Raghavendra K T
2012-09-24 12:36     ` Peter Zijlstra
2012-09-24 13:29       ` Raghavendra K T
2012-09-24 13:54         ` Peter Zijlstra
2012-09-24 14:16           ` Raghavendra K T
2012-09-25 13:40             ` Raghavendra K T
2012-09-27  8:36               ` Avi Kivity
2012-09-27 11:23                 ` Raghavendra K T
2012-09-27 12:03                   ` Avi Kivity
2012-09-27 12:25                     ` Andrew Theurer
2012-09-28  5:38                     ` Raghavendra K T
2012-09-28  5:45                       ` H. Peter Anvin
2012-09-28  6:03                         ` Raghavendra K T
2012-09-28  8:38                       ` Peter Zijlstra
2012-09-28 11:40                       ` Andrew Theurer
2012-09-28 14:11                         ` Raghavendra K T
2012-09-28 14:13                         ` Peter Zijlstra
2012-09-30  8:24                         ` Avi Kivity
2012-10-03 14:29                     ` Raghavendra K T
2012-10-03 17:25                       ` Avi Kivity
2012-10-04 10:56                         ` Raghavendra K T
2012-10-04 12:44                           ` Avi Kivity
2012-10-05  9:04                             ` Raghavendra K T
2012-09-24 15:51           ` Avi Kivity
2012-09-24 16:03             ` Peter Zijlstra
2012-09-24 16:20               ` Avi Kivity
2012-09-26 13:20                 ` Andrew Jones
2012-09-26 13:26                   ` Peter Zijlstra
2012-09-26 13:39                     ` Andrew Jones
2012-09-26 13:45                       ` Peter Zijlstra
2012-09-26 12:57       ` Andrew Jones
2012-09-27 10:21         ` Raghavendra K T
