Date: Fri, 21 Sep 2012 06:18:21 -0700
From: Chegu Vinod
To: Raghavendra K T
Cc: Peter Zijlstra, "H. Peter Anvin", Marcelo Tosatti, Ingo Molnar,
    Avi Kivity, Rik van Riel, Srikar, "Nikunj A. Dadhania", KVM,
    Jiannan Ouyang, "Andrew M. Theurer", LKML, Srivatsa Vaddagiri,
    Gleb Natapov
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
Message-ID: <505C691D.4080801@hp.com>
In-Reply-To: <20120921115942.27611.67488.sendpatchset@codeblue>

On 9/21/2012 4:59 AM, Raghavendra K T wrote:
> In some special scenarios like #vcpu <= #pcpu, the PLE handler may
> prove very costly,

Yes.

> because there is no need to iterate over vcpus
> and do unsuccessful yield_to, burning CPU.
>
> An idea to solve this is:
> 1) As Avi had proposed, we can modify the hardware ple_window
> dynamically to avoid frequent PLE exits.

Yes. We had to do this to get around some scaling issues for large
(>20-way) guests (with no overcommitment). As part of some
experimentation we even tried "switching off" PLE too :(

> (IMHO, it is difficult to
> decide when we have mixed types of VMs).

Agree. Not sure if the following alternatives have also been looked at:

- Could the behavior associated with the "ple_window" be modified to be
  a function of some [new] per-guest attribute (which can be conveyed
  to the host as part of the guest launch sequence)? The user can
  choose to set this [new] attribute for a given guest. This would help
  avoid the frequent exits due to PLE (as Avi had mentioned earlier).

- Can the PLE feature (in VT) be "enhanced" to be made a per-guest
  attribute?

IMHO, the approach of not taking a frequent exit is better than taking
an exit and returning back from the handler, etc.

Thanks
Vinod

> Another idea, proposed in the first patch, is to identify the
> non-overcommit case and just return from the PLE handler.
>
> There are many ways to identify a non-overcommit scenario:
> 1) Using loadavg etc. (get_avenrun/calc_global_load
> /this_cpu_load)
>
> 2) Explicitly check nr_running()/num_online_cpus()
>
> 3) Check the source vcpu's runqueue length.
>
> Not sure how we can make use of (1) effectively.
> (2) has significant overhead since it iterates over all cpus.
> So this patch uses the third method. (I feel it is uglier to export
> the runqueue length, but am expecting suggestions on this.)
>
> In the second patch: when we have a large number of small guests, it
> is possible that a spinning vcpu fails to yield_to any vcpu of the
> same VM and goes back to spinning. This is also not effective when we
> are over-committed. Instead, we do a schedule() so that we give other
> VMs a chance to run.
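
(To make the two mechanisms above concrete, here is a rough sketch of
the resulting PLE-handler flow. This is not the actual patch:
rq_nr_running() is a placeholder name for the runqueue-length helper,
and the directed-yield candidate filtering that kvm_vcpu_on_spin()
already does upstream is elided.)

/*
 * Rough sketch only -- not the actual patch.
 */
void kvm_vcpu_on_spin(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	struct kvm_vcpu *vcpu;
	bool yielded = false;
	int i;

	/*
	 * Patch 1 (undercommit): if this cpu's runqueue holds only the
	 * current task, no other vcpu thread is waiting here, so
	 * hunting for a yield_to() target is wasted work -- just return
	 * to the guest and spin a little more.
	 */
	if (rq_nr_running() == 1)
		return;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (vcpu == me)
			continue;
		if (kvm_vcpu_yield_to(vcpu) > 0) {
			yielded = true;
			break;
		}
	}

	/*
	 * Patch 2 (overcommit, many small guests): every yield_to()
	 * attempt within this VM failed, so instead of spinning again,
	 * schedule() and give vcpus of other VMs a chance to run.
	 */
	if (!yielded)
		schedule();
}

The point being that the undercommit check is O(1) and happens before
the vcpu iteration, while the schedule() fallback only triggers once
every yield_to() attempt has failed.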
>
> Raghavendra K T (2):
>   Handle undercommitted guest case in PLE handler
>   Be courteous to other VMs in overcommitted scenario in PLE handler
>
> Results:
> base = 3.6.0-rc5 + PLE handler optimization patches from the kvm tree.
> patched = base + patch1 + patch2
> machine: x240 with 16 cores with HT enabled (32 CPU threads).
> 32-vcpu guest with 8GB RAM.
>
> +-----------+-----------+-----------+------------+-----------+
>              ebizzy (records/sec, higher is better)
> +-----------+-----------+-----------+------------+-----------+
>     base       stddev     patched      stddev     %improve
> +-----------+-----------+-----------+------------+-----------+
>  11293.3750   624.4378  18209.6250    371.7061    61.24166
>   3641.8750   468.9400   3725.5000    253.7823     2.29621
> +-----------+-----------+-----------+------------+-----------+
>
> +-----------+-----------+-----------+------------+-----------+
>             kernbench (time in sec, lower is better)
> +-----------+-----------+-----------+------------+-----------+
>     base       stddev     patched      stddev     %improve
> +-----------+-----------+-----------+------------+-----------+
>     30.6020     1.3018     30.8287      1.1517    -0.74080
>     64.0825     2.3764     63.4721      5.0191     0.95252
>     95.8638     8.7030     94.5988      8.3832     1.31958
> +-----------+-----------+-----------+------------+-----------+
>
> Note:
> On an mx3850x5 machine with 32 cores and HT disabled, I got around
> 209% (ebizzy) and 6% (kernbench) improvement for the 1x scenario.
>
> Thanks to Srikar for his active participation in discussing ideas and
> reviewing the patch.
>
> Please let me know your suggestions and comments.
> ---
>  include/linux/sched.h |    1 +
>  kernel/sched/core.c   |    6 ++++++
>  virt/kvm/kvm_main.c   |    7 +++++++
>  3 files changed, 14 insertions(+), 0 deletions(-)
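
(For completeness, the one-line sched.h hunk and six-line sched/core.c
hunk in the diffstat presumably amount to exporting the current
runqueue length, roughly like the following. The helper name is
guessed, not taken from the patch, and preemption/accuracy concerns are
ignored in this sketch.)

/* include/linux/sched.h -- declaration visible to kvm (guessed name) */
extern unsigned long rq_nr_running(void);

/* kernel/sched/core.c -- exported so the kvm module can call it */
unsigned long rq_nr_running(void)
{
	/*
	 * nr_running of the runqueue of the cpu we are currently on;
	 * a value of 1 means only the current (spinning vcpu) task is
	 * runnable here, i.e. the undercommit case.
	 */
	return this_rq()->nr_running;
}
EXPORT_SYMBOL(rq_nr_running);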