* [PATCH 1/6] kvm: export kvm module parameter variables
2016-10-14 0:53 [PATCH 0/6] kvm: powerpc halt polling updates Suraj Jitindar Singh
@ 2016-10-14 0:53 ` Suraj Jitindar Singh
2016-10-14 0:53 ` [PATCH 2/6] powerpc/kvm: Use generic kvm module parameters in kvm-hv Suraj Jitindar Singh
` (6 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Suraj Jitindar Singh @ 2016-10-14 0:53 UTC (permalink / raw)
To: pbonzini, rkrcmar, agraf, corbet
Cc: paulus, mpe, sam.bobroff, kvm, kvm-ppc, linuxppc-dev, benh,
linux-doc, Suraj Jitindar Singh
The kvm module has the parameters halt_poll_ns, halt_poll_ns_grow, and
halt_poll_ns_shrink. Halt polling was recently added to the powerpc kvm-hv
module and these parameters were essentially duplicated for that. There is
no benefit to this duplication and it can lead to confusion when trying to
tune halt polling.
Thus declare these variables in kvm_host.h and export them from kvm_main.c.
This will allow the kvm-hv module to use the same module parameters by
accessing these variables directly, which will be implemented in the next
patch, so that they are no longer duplicated.
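As a rough sketch of the pattern (hypothetical helper, assuming the
declarations land in kvm_host.h as in the hunk below), an arch module can
then read the shared limits directly:
    #include <linux/kvm_host.h>	/* halt_poll_ns and friends are now extern */
    /*
     * Hypothetical: compute a grown per-vcpu poll interval, capped at the
     * global maximum exposed by the generic kvm module parameters.
     */
    static unsigned int next_poll_ns(unsigned int cur)
    {
    	unsigned int next = cur ? cur * halt_poll_ns_grow : 10000;
    	if (next > halt_poll_ns)
    		next = halt_poll_ns;
    	return next;
    }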
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
include/linux/kvm_host.h | 4 ++++
virt/kvm/kvm_main.c | 9 ++++++---
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 01c0b9c..29b500a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1107,6 +1107,10 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
extern bool kvm_rebooting;
+extern unsigned int halt_poll_ns;
+extern unsigned int halt_poll_ns_grow;
+extern unsigned int halt_poll_ns_shrink;
+
struct kvm_device {
struct kvm_device_ops *ops;
struct kvm *kvm;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 81dfc73..675d7b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -70,16 +70,19 @@ MODULE_AUTHOR("Qumranet");
MODULE_LICENSE("GPL");
/* Architectures should define their poll value according to the halt latency */
-static unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
+unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
+EXPORT_SYMBOL_GPL(halt_poll_ns);
/* Default doubles per-vcpu halt_poll_ns. */
-static unsigned int halt_poll_ns_grow = 2;
+unsigned int halt_poll_ns_grow = 2;
module_param(halt_poll_ns_grow, uint, S_IRUGO | S_IWUSR);
+EXPORT_SYMBOL_GPL(halt_poll_ns_grow);
/* Default resets per-vcpu halt_poll_ns . */
-static unsigned int halt_poll_ns_shrink;
+unsigned int halt_poll_ns_shrink;
module_param(halt_poll_ns_shrink, uint, S_IRUGO | S_IWUSR);
+EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
/*
* Ordering of locks:
--
2.5.5
* [PATCH 2/6] powerpc/kvm: Use generic kvm module parameters in kvm-hv
2016-10-14 0:53 [PATCH 0/6] kvm: powerpc halt polling updates Suraj Jitindar Singh
2016-10-14 0:53 ` [PATCH 1/6] kvm: export kvm module parameter variables Suraj Jitindar Singh
@ 2016-10-14 0:53 ` Suraj Jitindar Singh
2016-10-14 0:53 ` [PATCH 3/6] powerpc/kvm: Add check for module parameter halt_poll_ns Suraj Jitindar Singh
` (5 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Suraj Jitindar Singh @ 2016-10-14 0:53 UTC (permalink / raw)
To: pbonzini, rkrcmar, agraf, corbet
Cc: paulus, mpe, sam.bobroff, kvm, kvm-ppc, linuxppc-dev, benh,
linux-doc, Suraj Jitindar Singh
The previous patch exported the variables which back the generic kvm module's
module parameters. Now use those variables in the kvm-hv module so that any
change to the generic module parameters also takes effect in the kvm-hv
module. This removes the redundant duplication of the kvm module parameters
and should reduce confusion when tuning them.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 29 ++++++-----------------------
1 file changed, 6 insertions(+), 23 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3686471..daad638 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -104,23 +104,6 @@ module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect,
MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
#endif
-/* Maximum halt poll interval defaults to KVM_HALT_POLL_NS_DEFAULT */
-static unsigned int halt_poll_max_ns = KVM_HALT_POLL_NS_DEFAULT;
-module_param(halt_poll_max_ns, uint, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(halt_poll_max_ns, "Maximum halt poll time in ns");
-
-/* Factor by which the vcore halt poll interval is grown, default is to double
- */
-static unsigned int halt_poll_ns_grow = 2;
-module_param(halt_poll_ns_grow, int, S_IRUGO);
-MODULE_PARM_DESC(halt_poll_ns_grow, "Factor halt poll time is grown by");
-
-/* Factor by which the vcore halt poll interval is shrunk, default is to reset
- */
-static unsigned int halt_poll_ns_shrink;
-module_param(halt_poll_ns_shrink, int, S_IRUGO);
-MODULE_PARM_DESC(halt_poll_ns_shrink, "Factor halt poll time is shrunk by");
-
static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -2544,8 +2527,8 @@ static void grow_halt_poll_ns(struct kvmppc_vcore *vc)
else
vc->halt_poll_ns *= halt_poll_ns_grow;
- if (vc->halt_poll_ns > halt_poll_max_ns)
- vc->halt_poll_ns = halt_poll_max_ns;
+ if (vc->halt_poll_ns > halt_poll_ns)
+ vc->halt_poll_ns = halt_poll_ns;
}
static void shrink_halt_poll_ns(struct kvmppc_vcore *vc)
@@ -2655,15 +2638,15 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
}
/* Adjust poll time */
- if (halt_poll_max_ns) {
+ if (halt_poll_ns) {
if (block_ns <= vc->halt_poll_ns)
;
/* We slept and blocked for longer than the max halt time */
- else if (vc->halt_poll_ns && block_ns > halt_poll_max_ns)
+ else if (vc->halt_poll_ns && block_ns > halt_poll_ns)
shrink_halt_poll_ns(vc);
/* We slept and our poll time is too small */
- else if (vc->halt_poll_ns < halt_poll_max_ns &&
- block_ns < halt_poll_max_ns)
+ else if (vc->halt_poll_ns < halt_poll_ns &&
+ block_ns < halt_poll_ns)
grow_halt_poll_ns(vc);
} else
vc->halt_poll_ns = 0;
--
2.5.5
* [PATCH 3/6] powerpc/kvm: Add check for module parameter halt_poll_ns
2016-10-14 0:53 [PATCH 0/6] kvm: powerpc halt polling updates Suraj Jitindar Singh
2016-10-14 0:53 ` [PATCH 1/6] kvm: export kvm module parameter variables Suraj Jitindar Singh
2016-10-14 0:53 ` [PATCH 2/6] powerpc/kvm: Use generic kvm module parameters in kvm-hv Suraj Jitindar Singh
@ 2016-10-14 0:53 ` Suraj Jitindar Singh
2016-10-14 0:53 ` [PATCH 4/6] powerpc/kvm: Decrease the powerpc default halt poll max value Suraj Jitindar Singh
` (4 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Suraj Jitindar Singh @ 2016-10-14 0:53 UTC (permalink / raw)
To: pbonzini, rkrcmar, agraf, corbet
Cc: paulus, mpe, sam.bobroff, kvm, kvm-ppc, linuxppc-dev, benh,
linux-doc, Suraj Jitindar Singh
The kvm module parameter halt_poll_ns defines the global maximum halt
polling interval and can be changed dynamically by writing to the
/sys/module/kvm/parameters/halt_poll_ns sysfs file. However in kvm-hv this
value is only checked when the polling interval for a given vcore is grown.
This means that if halt_poll_ns is decreased below the current polling
interval, the change has no effect until we try to grow the polling interval
above the new maximum, or the interval happens to be shrunk below the new
halt_poll_ns value.
Update the halt polling code to always check the current halt_poll_ns value
and clamp the vcore's polling interval to it whenever the interval exceeds
the new maximum. The equivalent check in grow_halt_poll_ns() is now
redundant, so remove it from there.
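In other words (a simplified sketch of the resulting logic, not the literal
hunk below), the clamp is applied after every adjustment decision rather than
only inside grow_halt_poll_ns():
    /* after growing/shrinking (or leaving) vc->halt_poll_ns ... */
    if (vc->halt_poll_ns > halt_poll_ns)
    	vc->halt_poll_ns = halt_poll_ns;	/* honour a lowered global max */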
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index daad638..6503a63 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2526,9 +2526,6 @@ static void grow_halt_poll_ns(struct kvmppc_vcore *vc)
vc->halt_poll_ns = 10000;
else
vc->halt_poll_ns *= halt_poll_ns_grow;
-
- if (vc->halt_poll_ns > halt_poll_ns)
- vc->halt_poll_ns = halt_poll_ns;
}
static void shrink_halt_poll_ns(struct kvmppc_vcore *vc)
@@ -2648,6 +2645,8 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
else if (vc->halt_poll_ns < halt_poll_ns &&
block_ns < halt_poll_ns)
grow_halt_poll_ns(vc);
+ if (vc->halt_poll_ns > halt_poll_ns)
+ vc->halt_poll_ns = halt_poll_ns;
} else
vc->halt_poll_ns = 0;
--
2.5.5
* [PATCH 4/6] powerpc/kvm: Decrease the powerpc default halt poll max value
2016-10-14 0:53 [PATCH 0/6] kvm: powerpc halt polling updates Suraj Jitindar Singh
` (2 preceding siblings ...)
2016-10-14 0:53 ` [PATCH 3/6] powerpc/kvm: Add check for module parameter halt_poll_ns Suraj Jitindar Singh
@ 2016-10-14 0:53 ` Suraj Jitindar Singh
2016-10-14 0:53 ` [PATCH 5/6] powerpc/kvm: Comment style and print format fixups Suraj Jitindar Singh
` (3 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Suraj Jitindar Singh @ 2016-10-14 0:53 UTC (permalink / raw)
To: pbonzini, rkrcmar, agraf, corbet
Cc: paulus, mpe, sam.bobroff, kvm, kvm-ppc, linuxppc-dev, benh,
linux-doc, Suraj Jitindar Singh
KVM_HALT_POLL_NS_DEFAULT is an arch specific constant which sets the
default value of the halt_poll_ns kvm module parameter which determines
the global maximum halt polling interval.
The current value for powerpc is 500000 (500us) which means that any
repetitive workload with a period of less than that can drive the cpu
usage to 100% where it may have been mostly idle without halt polling.
This presents the possibility of a large increase in power usage with
a comparatively small performance benefit.
Reduce the default to 10000 (10us). A user can then tune this themselves
based on the trade off between power and performance which they are willing
to make.
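As a purely illustrative example (hypothetical numbers): a guest that wakes
every 100us to do 5us of work keeps the host cpu roughly 5% busy without
halt polling. With the old 500us ceiling the host can end up polling the
entire 95us gap between wakeups, pushing utilisation to 100%; with a 10us
ceiling it polls at most 10us of each gap, bounding the polling overhead at
around 10%.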
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
arch/powerpc/include/asm/kvm_host.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 28350a2..037b6a1 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -48,7 +48,7 @@
#ifdef CONFIG_KVM_MMIO
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
#endif
-#define KVM_HALT_POLL_NS_DEFAULT 500000
+#define KVM_HALT_POLL_NS_DEFAULT 10000 /* 10 us */
/* These values are internal and can be increased later */
#define KVM_NR_IRQCHIPS 1
--
2.5.5
* [PATCH 5/6] powerpc/kvm: Comment style and print format fixups
2016-10-14 0:53 [PATCH 0/6] kvm: powerpc halt polling updates Suraj Jitindar Singh
` (3 preceding siblings ...)
2016-10-14 0:53 ` [PATCH 4/6] powerpc/kvm: Decrease the powerpc default halt poll max value Suraj Jitindar Singh
@ 2016-10-14 0:53 ` Suraj Jitindar Singh
2016-10-14 0:53 ` [PATCH 6/6] doc/kvm: Add halt polling documentation Suraj Jitindar Singh
` (2 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Suraj Jitindar Singh @ 2016-10-14 0:53 UTC (permalink / raw)
To: pbonzini, rkrcmar, agraf, corbet
Cc: paulus, mpe, sam.bobroff, kvm, kvm-ppc, linuxppc-dev, benh,
linux-doc, Suraj Jitindar Singh
Fix comment block to match kernel comment style.
Fix print format from signed to unsigned.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 3 ++-
arch/powerpc/kvm/trace_hv.h | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6503a63..431950d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2536,7 +2536,8 @@ static void shrink_halt_poll_ns(struct kvmppc_vcore *vc)
vc->halt_poll_ns /= halt_poll_ns_shrink;
}
-/* Check to see if any of the runnable vcpus on the vcore have pending
+/*
+ * Check to see if any of the runnable vcpus on the vcore have pending
* exceptions or are no longer ceded
*/
static int kvmppc_vcore_check_block(struct kvmppc_vcore *vc)
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index fb21990..ebc6dd4 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -449,7 +449,7 @@ TRACE_EVENT(kvmppc_vcore_wakeup,
__entry->tgid = current->tgid;
),
- TP_printk("%s time %lld ns, tgid=%d",
+ TP_printk("%s time %llu ns, tgid=%d",
__entry->waited ? "wait" : "poll",
__entry->ns, __entry->tgid)
);
--
2.5.5
* [PATCH 6/6] doc/kvm: Add halt polling documentation
2016-10-14 0:53 [PATCH 0/6] kvm: powerpc halt polling updates Suraj Jitindar Singh
` (4 preceding siblings ...)
2016-10-14 0:53 ` [PATCH 5/6] powerpc/kvm: Comment style and print format fixups Suraj Jitindar Singh
@ 2016-10-14 0:53 ` Suraj Jitindar Singh
2016-10-14 1:16 ` Wanpeng Li
2016-10-14 3:28 ` [PATCH 0/6] kvm: powerpc halt polling updates Sam Bobroff
2016-10-14 6:27 ` Nicholas Piggin
7 siblings, 1 reply; 11+ messages in thread
From: Suraj Jitindar Singh @ 2016-10-14 0:53 UTC (permalink / raw)
To: pbonzini, rkrcmar, agraf, corbet
Cc: paulus, mpe, sam.bobroff, kvm, kvm-ppc, linuxppc-dev, benh,
linux-doc, Suraj Jitindar Singh
There is currently no documentation of the halt polling capabilities of the
kvm module. Add some documentation describing the mechanism, as well as the
module parameters, to allow a better understanding of how halt polling should
be used and of the effect of tuning the module parameters.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
Documentation/virtual/kvm/00-INDEX | 2 +
Documentation/virtual/kvm/halt-polling.txt | 127 +++++++++++++++++++++++++++++
2 files changed, 129 insertions(+)
create mode 100644 Documentation/virtual/kvm/halt-polling.txt
diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX
index fee9f2b..69fe1a8 100644
--- a/Documentation/virtual/kvm/00-INDEX
+++ b/Documentation/virtual/kvm/00-INDEX
@@ -6,6 +6,8 @@ cpuid.txt
- KVM-specific cpuid leaves (x86).
devices/
- KVM_CAP_DEVICE_CTRL userspace API.
+halt-polling.txt
+ - notes on halt-polling
hypercalls.txt
- KVM hypercalls.
locking.txt
diff --git a/Documentation/virtual/kvm/halt-polling.txt b/Documentation/virtual/kvm/halt-polling.txt
new file mode 100644
index 0000000..4a84183
--- /dev/null
+++ b/Documentation/virtual/kvm/halt-polling.txt
@@ -0,0 +1,127 @@
+The KVM halt polling system
+===========================
+
+The KVM halt polling system provides a feature within KVM whereby the latency
+of a guest can, under some circumstances, be reduced by polling in the host
+for some time period after the guest has elected to no longer run by ceding.
+That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
+vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
+before giving up the cpu to the scheduler in order to let something else run.
+
+Polling provides a latency advantage in cases where the guest can be run again
+very quickly by at least saving us a trip through the scheduler, normally on
+the order of a few microseconds, although performance benefits are workload
+dependent. In the event that no wakeup source arrives during the polling
+interval, or some other task on the runqueue is runnable, the scheduler is
+invoked. Thus halt polling is especially useful on workloads with very short
+wakeup periods where the time spent halt polling is minimised and the time
+savings of not invoking the scheduler are most noticeable.
+
+The generic halt polling code is implemented in:
+
+ virt/kvm/kvm_main.c: kvm_vcpu_block()
+
+The powerpc kvm-hv specific case is implemented in:
+
+ arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()
+
+Halt Polling Interval
+=====================
+
+The maximum time for which to poll before invoking the scheduler, referred to
+as the halt polling interval, is increased and decreased based on the perceived
+effectiveness of the polling in an attempt to limit pointless polling.
+This value is stored in either the vcpu struct:
+
+ kvm_vcpu->halt_poll_ns
+
+or in the case of powerpc kvm-hv, in the vcore struct:
+
+ kvmppc_vcore->halt_poll_ns
+
+Thus this is a per vcpu (or vcore) value.
+
+During polling if a wakeup source is received within the halt polling interval,
+the interval is left unchanged. In the event that a wakeup source isn't
+received during the polling interval (and thus schedule is invoked) there are
+two possibilities: either the polling interval and total block time[0] were
+less than the global max polling interval (see module params below), or the
+total block time was greater than the global max polling interval.
+
+In the event that both the polling interval and total block time were less
+than the global max polling interval, the polling interval can be increased in
+the hope that, with a longer polling interval, the next wakeup source will
+arrive while the host is still polling and the latency benefit will be
+realised. The polling interval is grown in the function grow_halt_poll_ns()
+and is multiplied by the module parameter halt_poll_ns_grow.
+
+In the event that the total block time was greater than the global max polling
+interval then the host will never poll for long enough (limited by the global
+max) to wakeup during the polling interval so it may as well be shrunk in order
+to avoid pointless polling. The polling interval is shrunk in the function
+shrink_halt_poll_ns() and is divided by the module parameter
+halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.
+
+It is worth noting that this adjustment process attempts to home in on some
+steady state polling interval but will only really do a good job for wakeups
+which come at an approximately constant rate, otherwise there will be constant
+adjustment of the polling interval.
+
+[0] total block time: the time between when the halt polling function is
+ invoked and a wakeup source received (irrespective of
+ whether the scheduler is invoked within that function).
+
+Module Parameters
+=================
+
+The kvm module has 3 tuneable module parameters to adjust the global max
+polling interval as well as the rate at which the polling interval is grown and
+shrunk. These variables are defined in include/linux/kvm_host.h and as module
+parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
+powerpc kvm-hv case.
+
+Module Parameter | Description | Default Value
+--------------------------------------------------------------------------------
+halt_poll_ns | The global max polling interval | KVM_HALT_POLL_NS_DEFAULT
+ | which defines the ceiling value |
+ | of the polling interval for | (per arch value)
+ | each vcpu. |
+--------------------------------------------------------------------------------
+halt_poll_ns_grow | The value by which the halt | 2
+ | polling interval is multiplied |
+ | in the grow_halt_poll_ns() |
+ | function. |
+--------------------------------------------------------------------------------
+halt_poll_ns_shrink | The value by which the halt | 0
+ | polling interval is divided in |
+ | the shrink_halt_poll_ns() |
+ | function. |
+--------------------------------------------------------------------------------
+
+These module parameters can be set from the sysfs files in:
+
+ /sys/module/kvm/parameters/
+
+Note: these module parameters are system wide values and cannot be tuned on
+      a per vm basis.
+
+Further Notes
+=============
+
+- Care should be taken when setting the halt_poll_ns module parameter as a
+large value has the potential to drive the cpu usage to 100% on a machine which
+would be almost entirely idle otherwise. This is because even if a guest's
+wakeups do very little work and are relatively far apart, as long as the
+wakeup period is shorter than the global max polling interval (halt_poll_ns)
+the host will always poll for the entire block time and thus cpu utilisation
+will go to 100%.
+
+- Halt polling essentially presents a trade off between power usage and
+latency, and the module parameters should be used to tune the preferred
+balance. Idle cpu time is essentially converted to host kernel polling time
+with the aim of decreasing latency when entering the guest.
+
+- Halt polling will only be conducted by the host when no other tasks are
+runnable on that cpu, otherwise the polling will cease immediately and
+schedule will be invoked to allow that other task to run. Thus halt polling
+cannot be used by a guest to mount a denial of service attack on the cpu.
--
2.5.5
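The grow/shrink behaviour described in the documentation above can be
summarised with a small sketch (simplified and hypothetical; it uses a generic
poll_state structure rather than the actual kvm_vcpu or kvmppc_vcore fields):
    struct poll_state {
    	unsigned int poll_ns;	/* current per-vcpu (or vcore) polling interval */
    };
    static void grow(struct poll_state *p)
    {
    	/* start from a small non-zero value, otherwise multiply by the grow factor */
    	p->poll_ns = p->poll_ns ? p->poll_ns * halt_poll_ns_grow : 10000;
    	if (p->poll_ns > halt_poll_ns)
    		p->poll_ns = halt_poll_ns;	/* never exceed the global max */
    }
    static void shrink(struct poll_state *p)
    {
    	if (halt_poll_ns_shrink == 0)
    		p->poll_ns = 0;			/* reset: stop polling entirely */
    	else
    		p->poll_ns /= halt_poll_ns_shrink;
    }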
* Re: [PATCH 6/6] doc/kvm: Add halt polling documentation
2016-10-14 0:53 ` [PATCH 6/6] doc/kvm: Add halt polling documentation Suraj Jitindar Singh
@ 2016-10-14 1:16 ` Wanpeng Li
2016-10-14 2:32 ` Suraj Jitindar Singh
0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-10-14 1:16 UTC (permalink / raw)
To: Suraj Jitindar Singh
Cc: Paolo Bonzini, Radim Krcmar, agraf, Jonathan Corbet,
Paul Mackerras, mpe, sam.bobroff, kvm, kvm-ppc, linuxppc-dev,
benh, linux-doc
2016-10-14 8:53 GMT+08:00 Suraj Jitindar Singh <sjitindarsingh@gmail.com>:
> There is currently no documentation about the halt polling capabilities
> of the kvm module. Add some documentation describing the mechanism as well
> as the module parameters to all better understanding of how halt polling
> should be used and the effect of tuning the module parameters.
How about replace "halt-polling" by "Adaptive halt-polling"? Btw,
thanks for your docs.
Regards,
Wanpeng Li
>
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> ---
> Documentation/virtual/kvm/00-INDEX | 2 +
> Documentation/virtual/kvm/halt-polling.txt | 127 +++++++++++++++++++++++++++++
> 2 files changed, 129 insertions(+)
> create mode 100644 Documentation/virtual/kvm/halt-polling.txt
>
> diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX
> index fee9f2b..69fe1a8 100644
> --- a/Documentation/virtual/kvm/00-INDEX
> +++ b/Documentation/virtual/kvm/00-INDEX
> @@ -6,6 +6,8 @@ cpuid.txt
> - KVM-specific cpuid leaves (x86).
> devices/
> - KVM_CAP_DEVICE_CTRL userspace API.
> +halt-polling.txt
> + - notes on halt-polling
> hypercalls.txt
> - KVM hypercalls.
> locking.txt
> diff --git a/Documentation/virtual/kvm/halt-polling.txt b/Documentation/virtual/kvm/halt-polling.txt
> new file mode 100644
> index 0000000..4a84183
> --- /dev/null
> +++ b/Documentation/virtual/kvm/halt-polling.txt
> @@ -0,0 +1,127 @@
> +The KVM halt polling system
> +===========================
> +
> +The KVM halt polling system provides a feature within KVM whereby the latency
> +of a guest can, under some circumstances, be reduced by polling in the host
> +for some time period after the guest has elected to no longer run by cedeing.
> +That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
> +vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
> +before giving up the cpu to the scheduler in order to let something else run.
> +
> +Polling provides a latency advantage in cases where the guest can be run again
> +very quickly by at least saving us a trip through the scheduler, normally on
> +the order of a few micro-seconds, although performance benefits are workload
> +dependant. In the event that no wakeup source arrives during the polling
> +interval or some other task on the runqueue is runnable the scheduler is
> +invoked. Thus halt polling is especially useful on workloads with very short
> +wakeup periods where the time spent halt polling is minimised and the time
> +savings of not invoking the scheduler are distinguishable.
> +
> +The generic halt polling code is implemented in:
> +
> + virt/kvm/kvm_main.c: kvm_vcpu_block()
> +
> +The powerpc kvm-hv specific case is implemented in:
> +
> + arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()
> +
> +Halt Polling Interval
> +=====================
> +
> +The maximum time for which to poll before invoking the scheduler, referred to
> +as the halt polling interval, is increased and decreased based on the perceived
> +effectiveness of the polling in an attempt to limit pointless polling.
> +This value is stored in either the vcpu struct:
> +
> + kvm_vcpu->halt_poll_ns
> +
> +or in the case of powerpc kvm-hv, in the vcore struct:
> +
> + kvmppc_vcore->halt_poll_ns
> +
> +Thus this is a per vcpu (or vcore) value.
> +
> +During polling if a wakeup source is received within the halt polling interval,
> +the interval is left unchanged. In the event that a wakeup source isn't
> +received during the polling interval (and thus schedule is invoked) there are
> +two options, either the polling interval and total block time[0] were less than
> +the global max polling interval (see module params below), or the total block
> +time was greater than the global max polling interval.
> +
> +In the event that both the polling interval and total block time were less than
> +the global max polling interval then the polling interval can be increased in
> +the hope that next time during the longer polling interval the wake up source
> +will be received while the host is polling and the latency benefits will be
> +received. The polling interval is grown in the function grow_halt_poll_ns() and
> +is multiplied by the module parameter halt_poll_ns_grow.
> +
> +In the event that the total block time was greater than the global max polling
> +interval then the host will never poll for long enough (limited by the global
> +max) to wakeup during the polling interval so it may as well be shrunk in order
> +to avoid pointless polling. The polling interval is shrunk in the function
> +shrink_halt_poll_ns() and is divided by the module parameter
> +halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.
> +
> +It is worth noting that this adjustment process attempts to hone in on some
> +steady state polling interval but will only really do a good job for wakeups
> +which come at an approximately constant rate, otherwise there will be constant
> +adjustment of the polling interval.
> +
> +[0] total block time: the time between when the halt polling function is
> + invoked and a wakeup source received (irrespective of
> + whether the scheduler is invoked within that function).
> +
> +Module Parameters
> +=================
> +
> +The kvm module has 3 tuneable module parameters to adjust the global max
> +polling interval as well as the rate at which the polling interval is grown and
> +shrunk. These variables are defined in include/linux/kvm_host.h and as module
> +parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
> +powerpc kvm-hv case.
> +
> +Module Parameter | Description | Default Value
> +--------------------------------------------------------------------------------
> +halt_poll_ns | The global max polling interval | KVM_HALT_POLL_NS_DEFAULT
> + | which defines the ceiling value |
> + | of the polling interval for | (per arch value)
> + | each vcpu. |
> +--------------------------------------------------------------------------------
> +halt_poll_ns_grow | The value by which the halt | 2
> + | polling interval is multiplied |
> + | in the grow_halt_poll_ns() |
> + | function. |
> +--------------------------------------------------------------------------------
> +halt_poll_ns_shrink | The value by which the halt | 0
> + | polling interval is divided in |
> + | the shrink_halt_poll_ns() |
> + | function. |
> +--------------------------------------------------------------------------------
> +
> +These module parameters can be set from the debugfs files in:
> +
> + /sys/module/kvm/parameters/
> +
> +Note: that these module parameters are system wide values and are not able to
> + be tuned on a per vm basis.
> +
> +Further Notes
> +=============
> +
> +- Care should be taken when setting the halt_poll_ns module parameter as a
> +large value has the potential to drive the cpu usage to 100% on a machine which
> +would be almost entirely idle otherwise. This is because even if a guest has
> +wakeups during which very little work is done and which are quite far apart, if
> +the period is shorter than the global max polling interval (halt_poll_ns) then
> +the host will always poll for the entire block time and thus cpu utilisation
> +will go to 100%.
> +
> +- Halt polling essentially presents a trade off between power usage and latency
> +and the module parameters should be used to tune the affinity for this. Idle
> +cpu time is essentially converted to host kernel time with the aim of decreasing
> +latency when entering the guest.
> +
> +- Halt polling will only be conducted by the host when no other tasks are
> +runnable on that cpu, otherwise the polling will cease immediately and
> +schedule will be invoked to allow that other task to run. Thus this doesn't
> +allow a guest to denial of service the cpu.
> --
> 2.5.5
>
* Re: [PATCH 6/6] doc/kvm: Add halt polling documentation
2016-10-14 1:16 ` Wanpeng Li
@ 2016-10-14 2:32 ` Suraj Jitindar Singh
0 siblings, 0 replies; 11+ messages in thread
From: Suraj Jitindar Singh @ 2016-10-14 2:32 UTC (permalink / raw)
To: Wanpeng Li
Cc: Paolo Bonzini, Radim Krcmar, agraf, Jonathan Corbet,
Paul Mackerras, mpe, sam.bobroff, kvm, kvm-ppc, linuxppc-dev,
benh, linux-doc
On Fri, 2016-10-14 at 09:16 +0800, Wanpeng Li wrote:
> 2016-10-14 8:53 GMT+08:00 Suraj Jitindar Singh <sjitindarsingh@gmail.
> com>:
> >
> > There is currently no documentation about the halt polling
> > capabilities
> > of the kvm module. Add some documentation describing the mechanism
> > as well
> > as the module parameters to all better understanding of how halt
> > polling
> > should be used and the effect of tuning the module parameters.
> How about replace "halt-polling" by "Adaptive halt-polling"? Btw,
Yeah that's slightly more descriptive I guess
> thanks for your docs.
>
> Regards,
> Wanpeng Li
>
> >
> >
> > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> > ---
> > Documentation/virtual/kvm/00-INDEX | 2 +
> > Documentation/virtual/kvm/halt-polling.txt | 127
> > +++++++++++++++++++++++++++++
> > 2 files changed, 129 insertions(+)
> > create mode 100644 Documentation/virtual/kvm/halt-polling.txt
> >
> > diff --git a/Documentation/virtual/kvm/00-INDEX
> > b/Documentation/virtual/kvm/00-INDEX
> > index fee9f2b..69fe1a8 100644
> > --- a/Documentation/virtual/kvm/00-INDEX
> > +++ b/Documentation/virtual/kvm/00-INDEX
> > @@ -6,6 +6,8 @@ cpuid.txt
> > - KVM-specific cpuid leaves (x86).
> > devices/
> > - KVM_CAP_DEVICE_CTRL userspace API.
> > +halt-polling.txt
> > + - notes on halt-polling
> > hypercalls.txt
> > - KVM hypercalls.
> > locking.txt
> > diff --git a/Documentation/virtual/kvm/halt-polling.txt
> > b/Documentation/virtual/kvm/halt-polling.txt
> > new file mode 100644
> > index 0000000..4a84183
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/halt-polling.txt
> > @@ -0,0 +1,127 @@
> > +The KVM halt polling system
> > +===========================
> > +
> > +The KVM halt polling system provides a feature within KVM whereby
> > the latency
> > +of a guest can, under some circumstances, be reduced by polling in
> > the host
> > +for some time period after the guest has elected to no longer run
> > by cedeing.
> > +That is, when a guest vcpu has ceded, or in the case of powerpc
> > when all of the
> > +vcpus of a single vcore have ceded, the host kernel polls for
> > wakeup conditions
> > +before giving up the cpu to the scheduler in order to let
> > something else run.
> > +
> > +Polling provides a latency advantage in cases where the guest can
> > be run again
> > +very quickly by at least saving us a trip through the scheduler,
> > normally on
> > +the order of a few micro-seconds, although performance benefits
> > are workload
> > +dependant. In the event that no wakeup source arrives during the
> > polling
> > +interval or some other task on the runqueue is runnable the
> > scheduler is
> > +invoked. Thus halt polling is especially useful on workloads with
> > very short
> > +wakeup periods where the time spent halt polling is minimised and
> > the time
> > +savings of not invoking the scheduler are distinguishable.
> > +
> > +The generic halt polling code is implemented in:
> > +
> > + virt/kvm/kvm_main.c: kvm_vcpu_block()
> > +
> > +The powerpc kvm-hv specific case is implemented in:
> > +
> > + arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()
> > +
> > +Halt Polling Interval
> > +=====================
> > +
> > +The maximum time for which to poll before invoking the scheduler,
> > referred to
> > +as the halt polling interval, is increased and decreased based on
> > the perceived
> > +effectiveness of the polling in an attempt to limit pointless
> > polling.
> > +This value is stored in either the vcpu struct:
> > +
> > + kvm_vcpu->halt_poll_ns
> > +
> > +or in the case of powerpc kvm-hv, in the vcore struct:
> > +
> > + kvmppc_vcore->halt_poll_ns
> > +
> > +Thus this is a per vcpu (or vcore) value.
> > +
> > +During polling if a wakeup source is received within the halt
> > polling interval,
> > +the interval is left unchanged. In the event that a wakeup source
> > isn't
> > +received during the polling interval (and thus schedule is
> > invoked) there are
> > +two options, either the polling interval and total block time[0]
> > were less than
> > +the global max polling interval (see module params below), or the
> > total block
> > +time was greater than the global max polling interval.
> > +
> > +In the event that both the polling interval and total block time
> > were less than
> > +the global max polling interval then the polling interval can be
> > increased in
> > +the hope that next time during the longer polling interval the
> > wake up source
> > +will be received while the host is polling and the latency
> > benefits will be
> > +received. The polling interval is grown in the function
> > grow_halt_poll_ns() and
> > +is multiplied by the module parameter halt_poll_ns_grow.
> > +
> > +In the event that the total block time was greater than the global
> > max polling
> > +interval then the host will never poll for long enough (limited by
> > the global
> > +max) to wakeup during the polling interval so it may as well be
> > shrunk in order
> > +to avoid pointless polling. The polling interval is shrunk in the
> > function
> > +shrink_halt_poll_ns() and is divided by the module parameter
> > +halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.
> > +
> > +It is worth noting that this adjustment process attempts to hone
> > in on some
> > +steady state polling interval but will only really do a good job
> > for wakeups
> > +which come at an approximately constant rate, otherwise there will
> > be constant
> > +adjustment of the polling interval.
> > +
> > +[0] total block time: the time between when the halt polling
> > function is
> > + invoked and a wakeup source received
> > (irrespective of
> > + whether the scheduler is invoked within that
> > function).
> > +
> > +Module Parameters
> > +=================
> > +
> > +The kvm module has 3 tuneable module parameters to adjust the
> > global max
> > +polling interval as well as the rate at which the polling interval
> > is grown and
> > +shrunk. These variables are defined in include/linux/kvm_host.h
> > and as module
> > +parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c
> > in the
> > +powerpc kvm-hv case.
> > +
> > +Module
> > Parameter | Description | Default Value
> > +----------------------------------------------------------------
> > ----------------
> > +halt_poll_ns | The global max polling interval |
> > KVM_HALT_POLL_NS_DEFAULT
> > + | which defines the ceiling value |
> > + | of the polling interval for | (per arch
> > value)
> > + | each vcpu. |
> > +----------------------------------------------------------------
> > ----------------
> > +halt_poll_ns_grow | The value by which the halt | 2
> > + | polling interval is multiplied |
> > + | in the grow_halt_poll_ns() |
> > + | function. |
> > +----------------------------------------------------------------
> > ----------------
> > +halt_poll_ns_shrink | The value by which the halt | 0
> > + | polling interval is divided in |
> > + | the shrink_halt_poll_ns() |
> > + | function. |
> > +----------------------------------------------------------------
> > ----------------
> > +
> > +These module parameters can be set from the debugfs files in:
> > +
> > + /sys/module/kvm/parameters/
> > +
> > +Note: that these module parameters are system wide values and are
> > not able to
> > + be tuned on a per vm basis.
> > +
> > +Further Notes
> > +=============
> > +
> > +- Care should be taken when setting the halt_poll_ns module
> > parameter as a
> > +large value has the potential to drive the cpu usage to 100% on a
> > machine which
> > +would be almost entirely idle otherwise. This is because even if a
> > guest has
> > +wakeups during which very little work is done and which are quite
> > far apart, if
> > +the period is shorter than the global max polling interval
> > (halt_poll_ns) then
> > +the host will always poll for the entire block time and thus cpu
> > utilisation
> > +will go to 100%.
> > +
> > +- Halt polling essentially presents a trade off between power
> > usage and latency
> > +and the module parameters should be used to tune the affinity for
> > this. Idle
> > +cpu time is essentially converted to host kernel time with the aim
> > of decreasing
> > +latency when entering the guest.
> > +
> > +- Halt polling will only be conducted by the host when no other
> > tasks are
> > +runnable on that cpu, otherwise the polling will cease immediately
> > and
> > +schedule will be invoked to allow that other task to run. Thus
> > this doesn't
> > +allow a guest to denial of service the cpu.
> > --
> > 2.5.5
> >
* Re: [PATCH 0/6] kvm: powerpc halt polling updates
2016-10-14 0:53 [PATCH 0/6] kvm: powerpc halt polling updates Suraj Jitindar Singh
` (5 preceding siblings ...)
2016-10-14 0:53 ` [PATCH 6/6] doc/kvm: Add halt polling documentation Suraj Jitindar Singh
@ 2016-10-14 3:28 ` Sam Bobroff
2016-10-14 6:27 ` Nicholas Piggin
7 siblings, 0 replies; 11+ messages in thread
From: Sam Bobroff @ 2016-10-14 3:28 UTC (permalink / raw)
To: Suraj Jitindar Singh
Cc: pbonzini, rkrcmar, agraf, corbet, paulus, mpe, kvm, kvm-ppc,
linuxppc-dev, benh, linux-doc
On Fri, Oct 14, 2016 at 11:53:18AM +1100, Suraj Jitindar Singh wrote:
> This patch series makes some updates and bug fixes to the powerpc kvm-hv
> halt polling code.
>
> The first two patches are concerned with exporting the generic kvm module
> parameter variables and accessing these from the powerpc specific code.
>
> The third patch fixes a bug where changing the global max halt polling
> interval module parameter can sometimes have no effect.
>
> The fourth patch decreases the default global max halt polling interval
> to something more sensible.
>
> The fifth patch contains generic fixups with no functional effect.
>
> The last patch adds halt polling documentation.
>
> Suraj Jitindar Singh (6):
> kvm: export kvm module parameter variables
> powerpc/kvm: Use generic kvm module parameters in kvm-hv
> powerpc/kvm: Add check for module parameter halt_poll_ns
> powerpc/kvm: Decrease the powerpc default halt poll max value
> powerpc/kvm: Comment style and print format fixups
> doc/kvm: Add halt polling documentation
>
> Documentation/virtual/kvm/00-INDEX | 2 +
> Documentation/virtual/kvm/halt-polling.txt | 127 +++++++++++++++++++++++++++++
> arch/powerpc/include/asm/kvm_host.h | 2 +-
> arch/powerpc/kvm/book3s_hv.c | 33 ++------
> arch/powerpc/kvm/trace_hv.h | 2 +-
> include/linux/kvm_host.h | 4 +
> virt/kvm/kvm_main.c | 9 +-
> 7 files changed, 149 insertions(+), 30 deletions(-)
> create mode 100644 Documentation/virtual/kvm/halt-polling.txt
>
> --
> 2.5.5
Hi Suraj,
I've given this set a quick test and it seems to work fine. I used a repetitive
wakeup, using a nanosleep loop in guest userspace (with real time prio), and I
was able to cause halt polling to switch on and off as I adjusted halt_poll_ns.
I think the new default value is much better: halt polling started (e.g. CPU
utilization rose to 100%) once CPU utilization had already risen to about 75%.
Cheers,
Sam.
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
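For reference, the kind of guest-side wakeup generator described above can be
approximated with a short C loop (a sketch only; the actual test program was
not posted, the 200us period is purely illustrative, and a real-time policy
can be applied e.g. via chrt):
    #include <time.h>
    int main(void)
    {
    	/* sleep for 200us per iteration to generate periodic wakeups */
    	struct timespec ts = { .tv_sec = 0, .tv_nsec = 200 * 1000 };
    	for (;;)
    		nanosleep(&ts, NULL);
    	return 0;
    }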
* Re: [PATCH 0/6] kvm: powerpc halt polling updates
2016-10-14 0:53 [PATCH 0/6] kvm: powerpc halt polling updates Suraj Jitindar Singh
` (6 preceding siblings ...)
2016-10-14 3:28 ` [PATCH 0/6] kvm: powerpc halt polling updates Sam Bobroff
@ 2016-10-14 6:27 ` Nicholas Piggin
7 siblings, 0 replies; 11+ messages in thread
From: Nicholas Piggin @ 2016-10-14 6:27 UTC (permalink / raw)
To: Suraj Jitindar Singh
Cc: pbonzini, rkrcmar, agraf, corbet, kvm, linux-doc, kvm-ppc, paulus,
linuxppc-dev, sam.bobroff
On Fri, 14 Oct 2016 11:53:18 +1100
Suraj Jitindar Singh <sjitindarsingh@gmail.com> wrote:
> This patch series makes some updates and bug fixes to the powerpc kvm-hv
> halt polling code.
>
> The first two patches are concerned with exporting the generic kvm module
> parameter variables and accessing these from the powerpc specific code.
>
> The third patch fixes a bug where changing the global max halt polling
> interval module parameter can sometimes have no effect.
>
> The fourth patch decreases the default global max halt polling interval
> to something more sensible.
>
> The fifth patch contains generic fixups with no functional effect.
>
> The last patch adds halt polling documentation.
I want to enable polling idle in Linux for SPLPAR/KVM as we do for
dedicated mode. Essentially the guest OS will spin for a small time
before ceding.
There will be a lot of interaction between this and halt polling. I
think guest polling may still be worthwhile if you have halt polling in
the host, although it might be less effective. We should set up some
performance testing with various guest/host parameters and see what
works best. What have you been testing with so far?
Thanks,
Nick