* [Qemu-devel] [PATCH v5 0/2] kvm: limited x86 CPU power management
From: Michael S. Tsirkin @ 2018-06-22 19:09 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcelo Tosatti, kvm
This adds the ability to expose some host CPU power management capabilities
to guests. For Intel guests, this is sufficient for the guest to enable
low-power CPU states on idle. For AMD guests it isn't sufficient, because
deeper C-states are entered using System-IO.
When enabled, this puts the CPU in a low-power state with exit latencies
that can go up to multiple milliseconds, and makes the host scheduler as
well as host utilities such as top and powertop think the CPU is constantly
busy. Thus it has the effect of dedicating a host CPU to this guest.
mwait-based power management is tied closely to the specifics of CPUID,
making migration challenging. At this point only the non-migratable
-cpu host is supported.
With this patch applied, VM latency is within the noise of
baremetal for some benchmarks.
perf bench sched pipe results:
Before:
6.452 sec
After:
4.382 sec
Baremetal:
4.136 sec
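As a rough reading of the figures above, the sketch below computes each run's slowdown relative to the bare-metal time. The helper names overhead_pct and report_overheads are hypothetical and not part of this series. By this arithmetic the gap to bare metal shrinks from roughly 56% to about 6%.

```c
#include <stdio.h>

/* Relative slowdown of a run versus a baseline, in percent. */
double overhead_pct(double run_sec, double baseline_sec)
{
    return (run_sec / baseline_sec - 1.0) * 100.0;
}

/* Print the overhead of each VM run against the bare-metal figure. */
void report_overheads(void)
{
    /* Seconds, copied from the perf bench sched pipe results above. */
    const double baremetal = 4.136;
    const double before = 6.452;
    const double after = 4.382;

    printf("before: %.1f%% over baremetal\n",
           overhead_pct(before, baremetal));
    printf("after:  %.1f%% over baremetal\n",
           overhead_pct(after, baremetal));
}
```

Calling report_overheads() from a small main() prints both percentages.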
Changes since v4:
See v3, now for real.
Changes since v3:
At Paolo's suggestion, rename -dedicated to -overcommit.
Changes since v2:
At Daniel's suggestion, don't use the -realtime flag.
At Paolo's suggestion, group this with memory lock flag
which has a similar effect of dedicating memory to this VM.
Michael S. Tsirkin (2):
kvm: support -overcommit cpu-pm=on|off
i386/cpu: make -cpu host support monitor/mwait
include/sysemu/sysemu.h | 1 +
target/i386/cpu.h | 9 +++++++++
migration/migration.c | 1 +
target/i386/cpu.c | 19 ++++++++++++++-----
target/i386/kvm.c | 32 ++++++++++++++++++++++++++++++++
vl.c | 32 +++++++++++++++++++++++++++++++-
qemu-options.hx | 27 +++++++++++++++++++++++++--
7 files changed, 113 insertions(+), 8 deletions(-)
--
MST
* [Qemu-devel] [PATCH v5 1/2] kvm: support -overcommit cpu-pm=on|off
From: Michael S. Tsirkin @ 2018-06-22 19:09 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcelo Tosatti, kvm, Juan Quintela, Dr. David Alan Gilbert
With this flag, KVM allows the guest to control the host CPU power state.
This increases latency for other processes using the same host CPU in an
unpredictable way, but it decreases idle entry/exit times for the
running VCPU, so to use it QEMU needs a hint about whether the host CPU is
overcommitted, hence the flag name.
Follow-up patches will expose this capability to guest
(using mwait leaf).
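The capability handling described here can be sketched in isolation as follows: start from the bitmask the host kernel reports for KVM_CAP_X86_DISABLE_EXITS and keep only the exits this patch wants to stop intercepting. The EX_* constants and pick_disabled_exits are illustrative stand-ins, an assumption for this sketch; real code should take the KVM_X86_DISABLE_EXITS_* values from <linux/kvm.h>.

```c
#include <stdint.h>

/*
 * Illustrative stand-ins for the KVM_X86_DISABLE_EXITS_* bits.
 * Assumed values for this sketch only; use <linux/kvm.h> in real code.
 */
#define EX_DISABLE_EXITS_MWAIT (1u << 0)
#define EX_DISABLE_EXITS_HLT   (1u << 1)
#define EX_DISABLE_EXITS_PAUSE (1u << 2)

/*
 * Given the bitmask the host reports as supported, return the subset
 * that would be passed on when enabling the capability.
 * Bits the host does not support, or that are not wanted, are dropped.
 */
uint32_t pick_disabled_exits(uint32_t host_supported)
{
    const uint32_t wanted = EX_DISABLE_EXITS_MWAIT |
                            EX_DISABLE_EXITS_HLT |
                            EX_DISABLE_EXITS_PAUSE;

    return host_supported & wanted;
}
```

Masking this way means a newer kernel that advertises additional exit types does not have them disabled implicitly.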
Based on a patch by Wanpeng Li <kernellwp@gmail.com> .
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/sysemu/sysemu.h | 1 +
migration/migration.c | 1 +
target/i386/kvm.c | 23 +++++++++++++++++++++++
vl.c | 32 +++++++++++++++++++++++++++++++-
qemu-options.hx | 27 +++++++++++++++++++++++++--
5 files changed, 81 insertions(+), 3 deletions(-)
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index e893f72f3b..b921c6f3b7 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -128,6 +128,7 @@ extern bool boot_strict;
extern uint8_t *boot_splash_filedata;
extern size_t boot_splash_filedata_size;
extern bool enable_mlock;
+extern bool enable_cpu_pm;
extern uint8_t qemu_extra_params_fw[2];
extern QEMUClockType rtc_clock;
extern const char *mem_path;
diff --git a/migration/migration.c b/migration/migration.c
index 1e99ec9b7e..e468b50c4f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -45,6 +45,7 @@
#include "migration/colo.h"
#include "hw/boards.h"
#include "monitor/monitor.h"
+#include "qemu/ptr_ring.h"
#define MAX_THROTTLE (32 << 20) /* Migration transfer speed throttling */
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 44f70733e7..cf9107be4b 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1357,6 +1357,29 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
smram_machine_done.notify = register_smram_listener;
qemu_add_machine_init_done_notifier(&smram_machine_done);
}
+
+ if (enable_cpu_pm) {
+ int disable_exits = kvm_check_extension(s, KVM_CAP_X86_DISABLE_EXITS);
+ int ret;
+
+/* Work around for kernel header with a typo. TODO: fix header and drop. */
+#if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT)
+#define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL
+#endif
+ if (disable_exits) {
+ disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |
+ KVM_X86_DISABLE_EXITS_HLT |
+ KVM_X86_DISABLE_EXITS_PAUSE);
+ }
+
+ ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,
+ disable_exits);
+ if (ret < 0) {
+ error_report("kvm: guest stopping CPU not supported: %s",
+ strerror(-ret));
+ }
+ }
+
return 0;
}
diff --git a/vl.c b/vl.c
index 06031715ac..c9530efed5 100644
--- a/vl.c
+++ b/vl.c
@@ -142,6 +142,7 @@ ram_addr_t ram_size;
const char *mem_path = NULL;
int mem_prealloc = 0; /* force preallocation of physical target memory */
bool enable_mlock = false;
+bool enable_cpu_pm = false;
int nb_nics;
NICInfo nd_table[MAX_NICS];
int autostart;
@@ -390,6 +391,22 @@ static QemuOptsList qemu_realtime_opts = {
},
};
+static QemuOptsList qemu_overcommit_opts = {
+ .name = "overcommit",
+ .head = QTAILQ_HEAD_INITIALIZER(qemu_overcommit_opts.head),
+ .desc = {
+ {
+ .name = "mem-lock",
+ .type = QEMU_OPT_BOOL,
+ },
+ {
+ .name = "cpu-pm",
+ .type = QEMU_OPT_BOOL,
+ },
+ { /* end of list */ }
+ },
+};
+
static QemuOptsList qemu_msg_opts = {
.name = "msg",
.head = QTAILQ_HEAD_INITIALIZER(qemu_msg_opts.head),
@@ -3903,7 +3920,20 @@ int main(int argc, char **argv, char **envp)
if (!opts) {
exit(1);
}
- enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
+ /* Don't override the -overcommit option if set */
+ enable_mlock = enable_mlock ||
+ qemu_opt_get_bool(opts, "mlock", true);
+ break;
+ case QEMU_OPTION_overcommit:
+ opts = qemu_opts_parse_noisily(qemu_find_opts("overcommit"),
+ optarg, false);
+ if (!opts) {
+ exit(1);
+ }
+ /* Don't override the -realtime option if set */
+ enable_mlock = enable_mlock ||
+ qemu_opt_get_bool(opts, "mem-lock", false);
+ enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
break;
case QEMU_OPTION_msg:
opts = qemu_opts_parse_noisily(qemu_find_opts("msg"), optarg,
diff --git a/qemu-options.hx b/qemu-options.hx
index c0d3951e9f..1bba3d258b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3328,8 +3328,7 @@ DEF("realtime", HAS_ARG, QEMU_OPTION_realtime,
"-realtime [mlock=on|off]\n"
" run qemu with realtime features\n"
" mlock=on|off controls mlock support (default: on)\n",
- QEMU_ARCH_ALL)
-STEXI
+ QEMU_ARCH_ALL) STEXI
@item -realtime mlock=on|off
@findex -realtime
Run qemu with realtime features.
@@ -3337,6 +3336,30 @@ mlocking qemu and guest memory can be enabled via @option{mlock=on}
(enabled by default).
ETEXI
+DEF("overcommit", HAS_ARG, QEMU_OPTION_overcommit,
+ "--overcommit [mem-lock=on|off][cpu-pm=on|off]\n"
+ " run qemu with overcommit hints\n"
+ " mem-lock=on|off controls memory lock support (default: off)\n"
+ " cpu-pm=on|off controls cpu power management (default: off)\n",
+ QEMU_ARCH_ALL)
+STEXI
+@item -overcommit mem-lock=on|off
+@item -overcommit cpu-pm=on|off
+@findex -overcommit
+Run qemu with hints about host resource overcommit. The default is
+to assume that host overcommits all resources.
+
+Locking qemu and guest memory can be enabled via @option{mem-lock=on} (disabled
+by default). This works when host memory is not overcommitted and reduces the
+worst-case latency for guest. This is equivalent to @option{realtime}.
+
+Guest ability to manage power state of host cpus (increasing latency for other
+processes on the same host cpu, but decreasing latency for guest) can be
+enabled via @option{cpu-pm=on} (disabled by default). This works best when
+host CPU is not overcommitted. When used, host estimates of CPU cycle and power
+utilization will be incorrect, not taking into account guest idle time.
+ETEXI
+
DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \
"-gdb dev wait for gdb connection on 'dev'\n", QEMU_ARCH_ALL)
STEXI
--
MST
* [Qemu-devel] [PATCH v5 2/2] i386/cpu: make -cpu host support monitor/mwait
From: Michael S. Tsirkin @ 2018-06-22 19:09 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcelo Tosatti, kvm
When guest CPU PM is enabled, and with -cpu host, expose the host CPU
MWAIT leaf in CPUID so the guest can make good PM decisions.
Note: the result is 100% CPU utilization reported by the host, as the host
no longer knows that the CPU is halted.
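For readers unfamiliar with the MWAIT leaf (CPUID leaf 5), the sketch below shows how a guest might interpret the four registers once they are forwarded. It is a minimal illustration following the field layout in the Intel SDM. The struct and function names, and the EX_MWAIT_* constants, are hypothetical and not taken from QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECX flag bits of CPUID leaf 5 (names are local to this sketch). */
#define EX_MWAIT_EMX (1u << 0) /* MONITOR/MWAIT extensions enumerated */
#define EX_MWAIT_IBE (1u << 1) /* interrupts can act as break events */

/* Raw register values returned by CPUID leaf 5. */
struct mwait_leaf {
    uint32_t eax, ebx, ecx, edx;
};

/* Decoded view of the same leaf. */
struct mwait_info {
    unsigned smallest_line;     /* smallest monitor-line size, bytes */
    unsigned largest_line;      /* largest monitor-line size, bytes */
    bool has_extensions;
    bool interrupt_break_event;
    unsigned substates[8];      /* MWAIT sub-states for C0..C7 */
};

struct mwait_info decode_mwait_leaf(struct mwait_leaf r)
{
    struct mwait_info info;
    int i;

    /* Line sizes live in the low 16 bits of EAX and EBX. */
    info.smallest_line = r.eax & 0xffffu;
    info.largest_line = r.ebx & 0xffffu;
    info.has_extensions = (r.ecx & EX_MWAIT_EMX) != 0;
    info.interrupt_break_event = (r.ecx & EX_MWAIT_IBE) != 0;

    /* EDX holds one 4-bit sub-state count per C-state. */
    for (i = 0; i < 8; i++) {
        info.substates[i] = (r.edx >> (4 * i)) & 0xfu;
    }
    return info;
}
```

A guest idle driver would typically use the per-C-state counts to choose which MWAIT hint to issue; with the previous all-zero EDX it had nothing to choose from.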
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
---
target/i386/cpu.h | 9 +++++++++
target/i386/cpu.c | 19 ++++++++++++++-----
target/i386/kvm.c | 9 +++++++++
3 files changed, 32 insertions(+), 5 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 664504610e..309f804573 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1378,6 +1378,15 @@ struct X86CPU {
/* if true the CPUID code directly forward host cache leaves to the guest */
bool cache_info_passthrough;
+ /* if true the CPUID code directly forwards
+ * host monitor/mwait leaves to the guest */
+ struct {
+ uint32_t eax;
+ uint32_t ebx;
+ uint32_t ecx;
+ uint32_t edx;
+ } mwait;
+
/* Features that were filtered out because of missing host capabilities */
uint32_t filtered_features[FEATURE_WORDS];
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 94260412e2..a4fb856d58 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3760,11 +3760,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
}
break;
case 5:
- /* mwait info: needed for Core compatibility */
- *eax = 0; /* Smallest monitor-line size in bytes */
- *ebx = 0; /* Largest monitor-line size in bytes */
- *ecx = CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
- *edx = 0;
+ /* MONITOR/MWAIT Leaf */
+ *eax = cpu->mwait.eax; /* Smallest monitor-line size in bytes */
+ *ebx = cpu->mwait.ebx; /* Largest monitor-line size in bytes */
+ *ecx = cpu->mwait.ecx; /* flags */
+ *edx = cpu->mwait.edx; /* mwait substates */
break;
case 6:
/* Thermal and Power Leaf */
@@ -4595,6 +4595,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
goto out;
}
+ if (xcc->host_cpuid_required && enable_cpu_pm) {
+ host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
+ &cpu->mwait.ecx, &cpu->mwait.edx);
+ env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
+ }
+ /* mwait extended info: needed for Core compatibility */
+ /* We always wake on interrupt even if host does not have the capability */
+ cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
+
if (cpu->apic_id == UNASSIGNED_APIC_ID) {
error_setg(errp, "apic-id property was not initialized properly");
return;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index cf9107be4b..805968d5b7 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -366,6 +366,15 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
if (!kvm_irqchip_in_kernel()) {
ret &= ~CPUID_EXT_X2APIC;
}
+
+ if (enable_cpu_pm) {
+ int disable_exits = kvm_check_extension(s,
+ KVM_CAP_X86_DISABLE_EXITS);
+
+ if (disable_exits & KVM_X86_DISABLE_EXITS_MWAIT) {
+ ret |= CPUID_EXT_MONITOR;
+ }
+ }
} else if (function == 6 && reg == R_EAX) {
ret |= CPUID_6_EAX_ARAT; /* safe to allow because of emulated APIC */
} else if (function == 7 && index == 0 && reg == R_EBX) {
--
MST
* Re: [Qemu-devel] [PATCH v5 1/2] kvm: support -overcommit cpu-pm=on|off
From: Juan Quintela @ 2018-06-25 8:48 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcelo Tosatti, kvm, Dr. David Alan Gilbert
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> With this flag, KVM allows the guest to control the host CPU power state.
> This increases latency for other processes using the same host CPU in an
> unpredictable way, but it decreases idle entry/exit times for the
> running VCPU, so to use it QEMU needs a hint about whether the host CPU is
> overcommitted, hence the flag name.
>
> Follow-up patches will expose this capability to guest
> (using mwait leaf).
>
> Based on a patch by Wanpeng Li <kernellwp@gmail.com> .
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[...]
> diff --git a/migration/migration.c b/migration/migration.c
> index 1e99ec9b7e..e468b50c4f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -45,6 +45,7 @@
> #include "migration/colo.h"
> #include "hw/boards.h"
> #include "monitor/monitor.h"
> +#include "qemu/ptr_ring.h"
>
> #define MAX_THROTTLE (32 << 20) /* Migration transfer speed throttling */
>
Why is this chunk needed? I can't see a reason.
Thanks, Juan.