From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>,
Gleb Natapov <gleb@kernel.org>, Vinod Chegu <chegu_vinod@hp.com>,
Hui-Zhi Zhao <hui-zhi.zhao@hp.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Lisa Mitchell <lisa.mitchell@hp.com>
Subject: Re: [PATCH v3 4/7] KVM: VMX: dynamise PLE window
Date: Fri, 22 Aug 2014 00:40:03 +0530 [thread overview]
Message-ID: <53F6440B.8010709@linux.vnet.ibm.com> (raw)
In-Reply-To: <1408637291-18533-5-git-send-email-rkrcmar@redhat.com>
On 08/21/2014 09:38 PM, Radim Krčmář wrote:
> Window is increased on every PLE exit and decreased on every sched_in.
> The idea is that we don't want to PLE exit if there is no preemption
> going on.
> We do this with sched_in() because it does not hold rq lock.
>
> There are two new kernel parameters for changing the window:
> ple_window_grow and ple_window_shrink
> ple_window_grow affects the window on PLE exit and ple_window_shrink
> does it on sched_in; depending on their value, the window is modifier
> like this: (ple_window is kvm_intel's global)
>
> ple_window_shrink/ |
> ple_window_grow | PLE exit | sched_in
> -------------------+--------------------+---------------------
> < 1 | = ple_window | = ple_window
> < ple_window | *= ple_window_grow | /= ple_window_shrink
> otherwise | += ple_window_grow | -= ple_window_shrink
>
> A third new parameter, ple_window_max, controls the maximal ple_window;
> it is internally rounded down to a closest multiple of ple_window_grow.
>
> VCPU's PLE window is never allowed below ple_window.
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
> arch/x86/kvm/vmx.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 85 insertions(+), 2 deletions(-)
>
Thanks for the nice patch.
default of grow = 2 and shrink = 0 is very good, which aids fast
clamping in overcommit and less ple_exits in undercommit.
with a small concern over modifier (shrinker) value in patch 6,
Reviewed-by: Raghavendra KT <raghavendra.kt@linux.vnet.ibm.com>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 18e0e52..a2fa6ba 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -125,14 +125,32 @@ module_param(nested, bool, S_IRUGO);
> * Time is measured based on a counter that runs at the same rate as the TSC,
> * refer SDM volume 3b section 21.6.13 & 22.1.3.
> */
> -#define KVM_VMX_DEFAULT_PLE_GAP 128
> -#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
> +#define KVM_VMX_DEFAULT_PLE_GAP 128
> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
> +#define KVM_VMX_DEFAULT_PLE_WINDOW_GROW 2
> +#define KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK 0
> +#define KVM_VMX_DEFAULT_PLE_WINDOW_MAX \
> + INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW
> +
> static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
> module_param(ple_gap, int, S_IRUGO);
>
> static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
> module_param(ple_window, int, S_IRUGO);
>
> +/* Default doubles per-vcpu window every exit. */
> +static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
> +module_param(ple_window_grow, int, S_IRUGO);
> +
> +/* Default resets per-vcpu window every exit to ple_window. */
> +static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
> +module_param(ple_window_shrink, int, S_IRUGO);
> +
> +/* Default is to compute the maximum so we can never overflow. */
> +static int ple_window_actual_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
> +static int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
> +module_param(ple_window_max, int, S_IRUGO);
Some observations:
1. I think a large ple_window for e.g., 16MB itself
would be very large value after which the ple_window change would not
have much effect.
So could KVM_VMX_DEFAULT_PLE_WINDOW_MAX be just
KVM_VMX_DEFAULT_PLE_WINDOW^2 or something ? or otherwise..
How about ..
define KVM_VMX_DEFAULT_PLE_WINDOW_MAX = INT_MAX
/KVM_VMX_DEFAULT_PLE_WINDOW.
(considering we don't allow default grow to be greater than default ple
window).
2. can we consider min_ple_window value of 2k. tracing showed that in
overcommit there were several occations of 4k <- 4k situations.
3. Do you think using ( << and >> ) instead of (*, /) saves some cycles
considering we could have potentially have grow,shrink numbers != power of 2
> +
> extern const ulong vmx_return;
>
> #define NR_AUTOLOAD_MSRS 8
> @@ -5679,12 +5697,73 @@ out:
> return ret;
> }
>
> +static int __grow_ple_window(int val)
> +{
> + if (ple_window_grow < 1)
why not ple_window_grow <= 1 ?
> + return ple_window;
> +
> + val = min(val, ple_window_actual_max);
> +
> + if (ple_window_grow < ple_window)
> + val *= ple_window_grow;
> + else
> + val += ple_window_grow;
> +
> + return val;
> +}
> +
> +static int __shrink_ple_window(int val, int modifier, int minimum)
> +{
> + if (modifier < 1)
why not modifier < 1. May be this would address a concern in patch 6.
> + return ple_window;
> +
> + if (modifier < ple_window)
> + val /= modifier;
> + else
> + val -= modifier;
> +
> + return max(val, minimum);
> +}
> +
> +static void grow_ple_window(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> + vmx->ple_window = __grow_ple_window(vmx->ple_window);
> +}
> +
> +static void shrink_ple_window(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> + vmx->ple_window = __shrink_ple_window(vmx->ple_window,
> + ple_window_shrink, ple_window);
> +}
> +
> +/*
> + * ple_window_actual_max is computed to be one grow_ple_window() below
> + * ple_window_max. (See __grow_ple_window for the reason.)
> + * This prevents overflows, because ple_window_max is int.
> + * ple_window_max effectively rounded down to a multiple of ple_window_grow in
> + * this process.
> + * ple_window_max is also prevented from setting vmx->ple_window < ple_window.
> + */
> +static void update_ple_window_actual_max(void)
> +{
> + ple_window_actual_max =
> + __shrink_ple_window(max(ple_window_max, ple_window),
> + ple_window_grow, INT_MIN);
> +}
> +
> /*
> * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
> * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
> */
> static int handle_pause(struct kvm_vcpu *vcpu)
> {
> + if (ple_gap)
> + grow_ple_window(vcpu);
> +
> skip_emulated_instruction(vcpu);
> kvm_vcpu_on_spin(vcpu);
>
> @@ -8854,6 +8933,8 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
>
> void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
> {
> + if (ple_gap)
> + shrink_ple_window(vcpu);
> }
>
> static struct kvm_x86_ops vmx_x86_ops = {
> @@ -9077,6 +9158,8 @@ static int __init vmx_init(void)
> } else
> kvm_disable_tdp();
>
> + update_ple_window_actual_max();
> +
> return 0;
>
> out7:
>
next prev parent reply other threads:[~2014-08-21 19:09 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-21 16:08 [PATCH v3 0/7] Dynamic Pause Loop Exiting window Radim Krčmář
2014-08-21 16:08 ` [PATCH v3 1/7] KVM: add kvm_arch_sched_in Radim Krčmář
2014-08-21 18:49 ` Raghavendra K T
2014-08-21 20:31 ` Radim Krčmář
2014-08-21 21:12 ` Paolo Bonzini
2014-08-21 16:08 ` [PATCH v3 2/7] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
2014-08-21 18:50 ` Raghavendra K T
2014-08-21 16:08 ` [PATCH v3 3/7] KVM: VMX: make PLE window per-VCPU Radim Krčmář
2014-08-21 18:52 ` Raghavendra K T
2014-08-21 16:08 ` [PATCH v3 4/7] KVM: VMX: dynamise PLE window Radim Krčmář
2014-08-21 19:10 ` Raghavendra K T [this message]
2014-08-21 20:31 ` Paolo Bonzini
2014-08-21 20:59 ` Radim Krčmář
2014-08-21 16:08 ` [PATCH v3 5/7] KVM: trace kvm_ple_window grow/shrink Radim Krčmář
2014-08-25 13:53 ` Sabrina Dubroca
2014-08-25 14:32 ` Radim Krčmář
2014-08-25 14:44 ` Paolo Bonzini
2014-08-21 16:08 ` [PATCH v3 6/7] KVM: VMX: runtime knobs for dynamic PLE window Radim Krčmář
2014-08-21 19:17 ` Raghavendra K T
2014-08-21 21:03 ` Radim Krčmář
2014-08-21 16:08 ` [PATCH v3 7/7] KVM: VMX: optimize ple_window updates to VMCS Radim Krčmář
2014-08-21 19:18 ` Raghavendra K T
2014-08-21 16:30 ` [PATCH v3 0/7] Dynamic Pause Loop Exiting window Paolo Bonzini
2014-08-21 16:50 ` Radim Krčmář
2014-08-21 16:53 ` Paolo Bonzini
2014-08-22 4:45 ` Wanpeng Li
2014-08-25 15:11 ` Radim Krčmář
2014-08-21 17:03 ` Raghavendra K T
2014-08-21 18:40 ` Raghavendra K T
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53F6440B.8010709@linux.vnet.ibm.com \
--to=raghavendra.kt@linux.vnet.ibm.com \
--cc=borntraeger@de.ibm.com \
--cc=chegu_vinod@hp.com \
--cc=gleb@kernel.org \
--cc=hui-zhi.zhao@hp.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lisa.mitchell@hp.com \
--cc=pbonzini@redhat.com \
--cc=rkrcmar@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox