All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm-devel <kvm@vger.kernel.org>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Wanpeng Li" <kernellwp@gmail.com>,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	"Raslan, KarimAllah" <karahmed@amazon.de>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Ankur Arora" <ankur.a.arora@oracle.com>
Subject: Re: [PATCH] x86: add cpuidle_kvm driver to allow guest side halt polling
Date: Mon, 20 May 2019 09:49:10 -0300	[thread overview]
Message-ID: <20190520124910.GA3800@amt.cnet> (raw)
In-Reply-To: <fd5caf49-6d98-4887-0052-ccbc999fc077@redhat.com>

On Mon, May 20, 2019 at 01:51:57PM +0200, Paolo Bonzini wrote:
> On 17/05/19 19:48, Marcelo Tosatti wrote:
> > 
> > The cpuidle_kvm driver allows the guest vcpus to poll for a specified
> > amount of time before halting. This provides the following benefits
> > to host side polling:
> > 
> > 	1) The POLL flag is set while polling is performed, which allows
> > 	   a remote vCPU to avoid sending an IPI (and the associated
> >  	   cost of handling the IPI) when performing a wakeup.
> > 
> > 	2) The HLT VM-exit cost can be avoided.
> > 
> > The downside of guest side polling is that polling is performed
> > even with other runnable tasks in the host.
> > 
> > Results comparing halt_poll_ns and server/client application
> > where a small packet is ping-ponged:
> > 
> > host                                        --> 31.33	
> > halt_poll_ns=300000 / no guest busy spin    --> 33.40	(93.8%)
> > halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73	(95.7%)
> > 
> > For the SAP HANA benchmarks (where idle_spin is a parameter 
> > of the previous version of the patch, results should be the
> > same):
> > 
> > hpns == halt_poll_ns
> > 
> >                           idle_spin=0/   idle_spin=800/	   idle_spin=0/
> > 			  hpns=200000    hpns=0            hpns=800000
> > DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78	  (+1%)
> > InsertC16T02 (100 thread) 2.14     	 2.07 (-3%)        2.18   (+1.8%)
> > DeleteC00T01 (1 thread)   1.34 		 1.28 (-4.5%)	   1.29   (-3.7%)
> > UpdateC00T03 (1 thread)	  4.72		 4.18 (-12%)	   4.53   (-5%)
> 
> Hi Marcelo,
> 
> some quick observations:
> 
> 1) This is actually not KVM-specific, so the name and placement of the
> docs should be adjusted.

Agreed. Will call it: cpuidle_halt_poll, move it to drivers/cpuidle/

> 2) Regarding KVM-specific code, however, we could add an MSR so that KVM
> disables halt_poll_ns for this VM when this is active in the guest?

Sure.

> 3) The spin time could use the same adaptive algorithm that KVM uses in
> the host.

Agreed... This can be done later, i suppose (the current fixed
setting works sufficiently well for our needs).

> Thanks,
> 
> Paolo
> 
> 
> > ---
> >  Documentation/virtual/kvm/guest-halt-polling.txt |   39 ++++++++
> >  arch/x86/Kconfig                                 |    9 +
> >  arch/x86/kernel/Makefile                         |    1 
> >  arch/x86/kernel/cpuidle_kvm.c                    |  105 +++++++++++++++++++++++
> >  arch/x86/kernel/process.c                        |    2 
> >  5 files changed, 155 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6.git/arch/x86/Kconfig
> > ===================================================================
> > --- linux-2.6.git.orig/arch/x86/Kconfig	2019-04-22 13:49:42.858303265 -0300
> > +++ linux-2.6.git/arch/x86/Kconfig	2019-05-16 14:18:41.254852745 -0300
> > @@ -805,6 +805,15 @@
> >  	  underlying device model, the host provides the guest with
> >  	  timing infrastructure such as time of day, and system time
> >  
> > +config KVM_CPUIDLE
> > +	tristate "KVM cpuidle driver"
> > +	depends on KVM_GUEST
> > +	default y
> > +	help
> > +	  This option enables KVM cpuidle driver, which allows to poll
> > +	  before halting in the guest (more efficient than polling in the
> > +	  host via halt_poll_ns for some scenarios).
> > +
> >  config PVH
> >  	bool "Support for running PVH guests"
> >  	---help---
> > Index: linux-2.6.git/arch/x86/kernel/Makefile
> > ===================================================================
> > --- linux-2.6.git.orig/arch/x86/kernel/Makefile	2019-04-22 13:49:42.869303331 -0300
> > +++ linux-2.6.git/arch/x86/kernel/Makefile	2019-05-17 12:59:51.673274881 -0300
> > @@ -112,6 +112,7 @@
> >  obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
> >  
> >  obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
> > +obj-$(CONFIG_KVM_CPUIDLE)	+= cpuidle_kvm.o
> >  obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
> >  obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
> >  obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
> > Index: linux-2.6.git/arch/x86/kernel/process.c
> > ===================================================================
> > --- linux-2.6.git.orig/arch/x86/kernel/process.c	2019-04-22 13:49:42.876303374 -0300
> > +++ linux-2.6.git/arch/x86/kernel/process.c	2019-05-17 13:19:18.055435117 -0300
> > @@ -580,7 +580,7 @@
> >  	safe_halt();
> >  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> >  }
> > -#ifdef CONFIG_APM_MODULE
> > +#if defined(CONFIG_APM_MODULE) || defined(CONFIG_KVM_CPUIDLE_MODULE)
> >  EXPORT_SYMBOL(default_idle);
> >  #endif
> >  
> > Index: linux-2.6.git/arch/x86/kernel/cpuidle_kvm.c
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6.git/arch/x86/kernel/cpuidle_kvm.c	2019-05-17 13:38:02.553941356 -0300
> > @@ -0,0 +1,105 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * cpuidle driver for KVM guests.
> > + *
> > + * Copyright 2019 Red Hat, Inc. and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > + * the COPYING file in the top-level directory.
> > + *
> > + * Authors: Marcelo Tosatti <mtosatti@redhat.com>
> > + */
> > +
> > +#include <linux/init.h>
> > +#include <linux/cpuidle.h>
> > +#include <linux/module.h>
> > +#include <linux/timekeeping.h>
> > +#include <linux/sched/idle.h>
> > +
> > +unsigned int guest_halt_poll_ns;
> > +module_param(guest_halt_poll_ns, uint, 0644);
> > +
> > +static int kvm_enter_idle(struct cpuidle_device *dev,
> > +			  struct cpuidle_driver *drv, int index)
> > +{
> > +	int do_halt = 0;
> > +
> > +	/* No polling */
> > +	if (guest_halt_poll_ns == 0) {
> > +		if (current_clr_polling_and_test()) {
> > +			local_irq_enable();
> > +			return index;
> > +		}
> > +		default_idle();
> > +		return index;
> > +	}
> > +
> > +	local_irq_enable();
> > +	if (!current_set_polling_and_test()) {
> > +		ktime_t now, end_spin;
> > +
> > +		now = ktime_get();
> > +		end_spin = ktime_add_ns(now, guest_halt_poll_ns);
> > +
> > +		while (!need_resched()) {
> > +			cpu_relax();
> > +			now = ktime_get();
> > +
> > +			if (!ktime_before(now, end_spin)) {
> > +				do_halt = 1;
> > +				break;
> > +			}
> > +		}
> > +	}
> > +
> > +	if (do_halt) {
> > +		/*
> > +		 * No events while busy spin window passed,
> > +		 * halt.
> > +		 */
> > +		local_irq_disable();
> > +		if (current_clr_polling_and_test()) {
> > +			local_irq_enable();
> > +			return index;
> > +		}
> > +		default_idle();
> > +	} else {
> > +		current_clr_polling();
> > +	}
> > +
> > +	return index;
> > +}
> > +
> > +static struct cpuidle_driver kvm_idle_driver = {
> > +	.name = "kvm_idle",
> > +	.owner = THIS_MODULE,
> > +	.states = {
> > +		{ /* entry 0 is for polling */ },
> > +		{
> > +			.enter			= kvm_enter_idle,
> > +			.exit_latency		= 0,
> > +			.target_residency	= 0,
> > +			.power_usage		= -1,
> > +			.name			= "KVM",
> > +			.desc			= "KVM idle",
> > +		},
> > +	},
> > +	.safe_state_index = 0,
> > +	.state_count = 2,
> > +};
> > +
> > +static int __init kvm_cpuidle_init(void)
> > +{
> > +	return cpuidle_register(&kvm_idle_driver, NULL);
> > +}
> > +
> > +static void __exit kvm_cpuidle_exit(void)
> > +{
> > +	cpuidle_unregister(&kvm_idle_driver);
> > +}
> > +
> > +module_init(kvm_cpuidle_init);
> > +module_exit(kvm_cpuidle_exit);
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Marcelo Tosatti <mtosatti@redhat.com>");
> > +
> > Index: linux-2.6.git/Documentation/virtual/kvm/guest-halt-polling.txt
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6.git/Documentation/virtual/kvm/guest-halt-polling.txt	2019-05-17 13:36:39.274703710 -0300
> > @@ -0,0 +1,39 @@
> > +KVM guest halt polling
> > +======================
> > +
> > +The cpuidle_kvm driver allows the guest vcpus to poll for a specified
> > +amount of time before halting. This provides the following benefits
> > +to host side polling:
> > +
> > +	1) The POLL flag is set while polling is performed, which allows
> > +	   a remote vCPU to avoid sending an IPI (and the associated
> > + 	   cost of handling the IPI) when performing a wakeup.
> > +
> > +	2) The HLT VM-exit cost can be avoided.
> > +
> > +The downside of guest side polling is that polling is performed
> > +even with other runnable tasks in the host.
> > +
> > +Module Parameters
> > +=================
> > +
> > +The cpuidle_kvm module has 1 tuneable module parameter: guest_halt_poll_ns,
> > +the amount of time, in nanoseconds, that polling is performed before
> > +halting.
> > +
> > +This module parameter can be set from the debugfs files in:
> > +
> > +	/sys/module/cpuidle_kvm/parameters/
> > +
> > +Further Notes
> > +=============
> > +
> > +- Care should be taken when setting the guest_halt_poll_ns parameter as a
> > +large value has the potential to drive the cpu usage to 100% on a machine which
> > +would be almost entirely idle otherwise.
> > +
> > +- The effective amount of time that polling is performed is the host poll
> > +value (see halt-polling.txt) plus guest_halt_poll_ns. If all guests
> > +on a host system support and have properly configured guest_halt_poll_ns,
> > +then setting halt_poll_ns to 0 in the host is probably the best choice.
> > +
> > 

  parent reply	other threads:[~2019-05-20 15:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-17 17:48 [PATCH] x86: add cpuidle_kvm driver to allow guest side halt polling Marcelo Tosatti
2019-05-20 11:51 ` Paolo Bonzini
2019-05-20 12:07   ` Christian Borntraeger
2019-05-20 12:49     ` Marcelo Tosatti
2019-05-20 13:46     ` Christian Borntraeger
2019-05-22 15:07       ` Marcelo Tosatti
2019-05-20 12:49   ` Marcelo Tosatti [this message]
2019-05-22 15:04   ` Marcelo Tosatti
2019-05-22 15:44     ` Paolo Bonzini
2019-05-22 16:45       ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190520124910.GA3800@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=ankur.a.arora@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=karahmed@amazon.de \
    --cc=kernellwp@gmail.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rkrcmar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.