* [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2)
@ 2009-11-19 13:34 Avi Kivity
2009-11-19 13:34 ` [PATCH 01/35] KVM: SVM: Add tracepoint for #vmexit because intr pending Avi Kivity
` (34 more replies)
0 siblings, 35 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
Highlights:
- improved kernel context switching speed
- better interoperation with other users of virtualization extensions
- improved irq scaling
- nested svm improvements and tracing
- improved cpufreq integration
- spin loop detection on newer hardware
Notes:
- kvm/ppc64 support will be merged through the powerpc tree
- depends on tip x86/entry branch (user return notifiers)
Arnd Bergmann (1):
KVM: Enable 32bit dirty log pointers on 64bit host
Avi Kivity (6):
KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area
KVM: x86 shared msr infrastructure
KVM: VMX: Use shared msr infrastructure
KVM: VMX: Remove vmx->msr_offset_efer
KVM: Allow internal errors reported to userspace to carry extra data
KVM: VMX: Report unexpected simultaneous exceptions as internal
errors
Ed Swierk (1):
KVM: Xen PV-on-HVM guest support
Eduardo Habkost (3):
KVM: VMX: Use macros instead of hex value on cr0 initialization
KVM: SVM: Reset cr0 properly on vcpu reset
KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization
Glauber Costa (1):
KVM: allow userspace to adjust kvmclock offset
Gleb Natapov (1):
KVM: remove duplicated task_switch check
Hollis Blanchard (1):
KVM: powerpc: Fix BUILD_BUG_ON condition
Jan Kiszka (6):
KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check
KVM: x86: Fix guest single-stepping while interruptible
KVM: SVM: Cleanup NMI singlestep
KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
KVM: Reorder IOCTLs in main kvm.h
KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
Joerg Roedel (4):
KVM: SVM: Add tracepoint for #vmexit because intr pending
KVM: SVM: Add tracepoint for invlpga instruction
KVM: SVM: Add tracepoint for skinit instruction
KVM: SVM: Remove nsvm_printk debugging code
Marcelo Tosatti (7):
KVM: VMX: fix handle_pause declaration
KVM: fix irq_source_id size verification
KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
KVM: MMU: update invlpg handler comment
KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel
lapic
KVM: only clear irq_source_id if irqchip is present
Mark Langsdorf (1):
KVM: SVM: Support Pause Filter in AMD processors
Zachary Amsden (1):
KVM: x86: Harden against cpufreq
Zhai, Edwin (2):
KVM: introduce kvm_vcpu_on_spin
KVM: VMX: Add support for Pause-Loop Exiting
Documentation/kvm/api.txt | 109 +++++++++++
arch/powerpc/kvm/timing.h | 2 +-
arch/x86/include/asm/kvm.h | 29 +++
arch/x86/include/asm/kvm_host.h | 13 ++-
arch/x86/include/asm/svm.h | 3 +-
arch/x86/include/asm/vmx.h | 4 +
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/irq.h | 6 +-
arch/x86/kvm/mmu.c | 1 +
arch/x86/kvm/paging_tmpl.h | 1 -
arch/x86/kvm/svm.c | 107 ++++++-----
arch/x86/kvm/trace.h | 63 +++++++
arch/x86/kvm/vmx.c | 253 ++++++++++++++++----------
arch/x86/kvm/x86.c | 379 ++++++++++++++++++++++++++++++++++-----
include/linux/kvm.h | 264 +++++++++++++++------------
include/linux/kvm_host.h | 1 +
virt/kvm/irq_comm.c | 12 +-
virt/kvm/kvm_main.c | 67 +++++++-
18 files changed, 1002 insertions(+), 313 deletions(-)
* [PATCH 01/35] KVM: SVM: Add tracepoint for #vmexit because intr pending
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 02/35] KVM: SVM: Add tracepoint for invlpga instruction Avi Kivity
` (33 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Joerg Roedel <joerg.roedel@amd.com>
This patch adds a special tracepoint for the event that a
nested #vmexit is injected because kvm wants to inject an
interrupt into the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/kvm/svm.c | 2 +-
arch/x86/kvm/trace.h | 18 ++++++++++++++++++
arch/x86/kvm/x86.c | 1 +
3 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 369eeb8..78a391c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1390,7 +1390,7 @@ static inline int nested_svm_intr(struct vcpu_svm *svm)
* the #vmexit here.
*/
svm->nested.exit_required = true;
- nsvm_printk("VMexit -> INTR\n");
+ trace_kvm_nested_intr_vmexit(svm->vmcb->save.rip);
return 1;
}
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4d6bb5e..3cc8f44 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -451,6 +451,24 @@ TRACE_EVENT(kvm_nested_vmexit_inject,
__entry->exit_info1, __entry->exit_info2,
__entry->exit_int_info, __entry->exit_int_info_err)
);
+
+/*
+ * Tracepoint for nested #vmexit because of interrupt pending
+ */
+TRACE_EVENT(kvm_nested_intr_vmexit,
+ TP_PROTO(__u64 rip),
+ TP_ARGS(rip),
+
+ TP_STRUCT__entry(
+ __field( __u64, rip )
+ ),
+
+ TP_fast_assign(
+ __entry->rip = rip
+ ),
+
+ TP_printk("rip: 0x%016llx\n", __entry->rip)
+);
#endif /* _TRACE_KVM_H */
/* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a522d9b..2cf4146 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4987,3 +4987,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit);
--
1.6.5.2
* [PATCH 02/35] KVM: SVM: Add tracepoint for invlpga instruction
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
2009-11-19 13:34 ` [PATCH 01/35] KVM: SVM: Add tracepoint for #vmexit because intr pending Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 03/35] KVM: SVM: Add tracepoint for skinit instruction Avi Kivity
` (32 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Joerg Roedel <joerg.roedel@amd.com>
This patch adds a tracepoint for the event that the guest
executed the INVLPGA instruction.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/kvm/svm.c | 3 +++
arch/x86/kvm/trace.h | 23 +++++++++++++++++++++++
arch/x86/kvm/x86.c | 1 +
3 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 78a391c..ba18fb7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1976,6 +1976,9 @@ static int invlpga_interception(struct vcpu_svm *svm)
struct kvm_vcpu *vcpu = &svm->vcpu;
nsvm_printk("INVLPGA\n");
+ trace_kvm_invlpga(svm->vmcb->save.rip, vcpu->arch.regs[VCPU_REGS_RCX],
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 3cc8f44..7e1f08e 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -469,6 +469,29 @@ TRACE_EVENT(kvm_nested_intr_vmexit,
TP_printk("rip: 0x%016llx\n", __entry->rip)
);
+
+/*
+ * Tracepoint for the INVLPGA instruction
+ */
+TRACE_EVENT(kvm_invlpga,
+ TP_PROTO(__u64 rip, int asid, u64 address),
+ TP_ARGS(rip, asid, address),
+
+ TP_STRUCT__entry(
+ __field( __u64, rip )
+ __field( int, asid )
+ __field( __u64, address )
+ ),
+
+ TP_fast_assign(
+ __entry->rip = rip;
+ __entry->asid = asid;
+ __entry->address = address;
+ ),
+
+ TP_printk("rip: 0x%016llx asid: %d address: 0x%016llx\n",
+ __entry->rip, __entry->asid, __entry->address)
+);
#endif /* _TRACE_KVM_H */
/* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2cf4146..86596fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4988,3 +4988,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
--
1.6.5.2
* [PATCH 03/35] KVM: SVM: Add tracepoint for skinit instruction
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
2009-11-19 13:34 ` [PATCH 01/35] KVM: SVM: Add tracepoint for #vmexit because intr pending Avi Kivity
2009-11-19 13:34 ` [PATCH 02/35] KVM: SVM: Add tracepoint for invlpga instruction Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 04/35] KVM: SVM: Remove nsvm_printk debugging code Avi Kivity
` (31 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Joerg Roedel <joerg.roedel@amd.com>
This patch adds a tracepoint for the event that the guest
executed the SKINIT instruction. This information is
important because SKINIT is an SVM extension not yet
implemented by nested SVM and we may need this information
for debugging hypervisors that do not yet run on nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/kvm/svm.c | 10 +++++++++-
arch/x86/kvm/trace.h | 22 ++++++++++++++++++++++
arch/x86/kvm/x86.c | 1 +
3 files changed, 32 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ba18fb7..8b9f6fb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1987,6 +1987,14 @@ static int invlpga_interception(struct vcpu_svm *svm)
return 1;
}
+static int skinit_interception(struct vcpu_svm *svm)
+{
+ trace_kvm_skinit(svm->vmcb->save.rip, svm->vcpu.arch.regs[VCPU_REGS_RAX]);
+
+ kvm_queue_exception(&svm->vcpu, UD_VECTOR);
+ return 1;
+}
+
static int invalid_op_interception(struct vcpu_svm *svm)
{
kvm_queue_exception(&svm->vcpu, UD_VECTOR);
@@ -2350,7 +2358,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_VMSAVE] = vmsave_interception,
[SVM_EXIT_STGI] = stgi_interception,
[SVM_EXIT_CLGI] = clgi_interception,
- [SVM_EXIT_SKINIT] = invalid_op_interception,
+ [SVM_EXIT_SKINIT] = skinit_interception,
[SVM_EXIT_WBINVD] = emulate_on_interception,
[SVM_EXIT_MONITOR] = invalid_op_interception,
[SVM_EXIT_MWAIT] = invalid_op_interception,
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 7e1f08e..816e044 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -492,6 +492,28 @@ TRACE_EVENT(kvm_invlpga,
TP_printk("rip: 0x%016llx asid: %d address: 0x%016llx\n",
__entry->rip, __entry->asid, __entry->address)
);
+
+/*
+ * Tracepoint for the SKINIT instruction
+ */
+TRACE_EVENT(kvm_skinit,
+ TP_PROTO(__u64 rip, __u32 slb),
+ TP_ARGS(rip, slb),
+
+ TP_STRUCT__entry(
+ __field( __u64, rip )
+ __field( __u32, slb )
+ ),
+
+ TP_fast_assign(
+ __entry->rip = rip;
+ __entry->slb = slb;
+ ),
+
+ TP_printk("rip: 0x%016llx slb: 0x%08x\n",
+ __entry->rip, __entry->slb)
+);
+
#endif /* _TRACE_KVM_H */
/* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 86596fc..098e7f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4989,3 +4989,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
--
1.6.5.2
* [PATCH 04/35] KVM: SVM: Remove nsvm_printk debugging code
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (2 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 03/35] KVM: SVM: Add tracepoint for skinit instruction Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 05/35] KVM: introduce kvm_vcpu_on_spin Avi Kivity
` (30 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Joerg Roedel <joerg.roedel@amd.com>
With all important information now delivered through
tracepoints we can safely remove the nsvm_printk debugging
code for nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/kvm/svm.c | 34 ----------------------------------
1 files changed, 0 insertions(+), 34 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8b9f6fb..69610c5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -53,15 +53,6 @@ MODULE_LICENSE("GPL");
#define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
-/* Turn on to get debugging output*/
-/* #define NESTED_DEBUG */
-
-#ifdef NESTED_DEBUG
-#define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
-#else
-#define nsvm_printk(fmt, args...) do {} while(0)
-#endif
-
static const u32 host_save_user_msrs[] = {
#ifdef CONFIG_X86_64
MSR_STAR, MSR_LSTAR, MSR_CSTAR, MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
@@ -1540,14 +1531,12 @@ static int nested_svm_exit_handled(struct vcpu_svm *svm)
}
default: {
u64 exit_bits = 1ULL << (exit_code - SVM_EXIT_INTR);
- nsvm_printk("exit code: 0x%x\n", exit_code);
if (svm->nested.intercept & exit_bits)
vmexit = NESTED_EXIT_DONE;
}
}
if (vmexit == NESTED_EXIT_DONE) {
- nsvm_printk("#VMEXIT reason=%04x\n", exit_code);
nested_svm_vmexit(svm);
}
@@ -1658,10 +1647,6 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
/* Restore the original control entries */
copy_vmcb_control_area(vmcb, hsave);
- /* Kill any pending exceptions */
- if (svm->vcpu.arch.exception.pending == true)
- nsvm_printk("WARNING: Pending Exception\n");
-
kvm_clear_exception_queue(&svm->vcpu);
kvm_clear_interrupt_queue(&svm->vcpu);
@@ -1826,25 +1811,14 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
force_new_asid(&svm->vcpu);
svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
- if (nested_vmcb->control.int_ctl & V_IRQ_MASK) {
- nsvm_printk("nSVM Injecting Interrupt: 0x%x\n",
- nested_vmcb->control.int_ctl);
- }
if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
svm->vcpu.arch.hflags |= HF_VINTR_MASK;
else
svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
- nsvm_printk("nSVM exit_int_info: 0x%x | int_state: 0x%x\n",
- nested_vmcb->control.exit_int_info,
- nested_vmcb->control.int_state);
-
svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;
svm->vmcb->control.int_state = nested_vmcb->control.int_state;
svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset;
- if (nested_vmcb->control.event_inj & SVM_EVTINJ_VALID)
- nsvm_printk("Injecting Event: 0x%x\n",
- nested_vmcb->control.event_inj);
svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
@@ -1913,8 +1887,6 @@ static int vmsave_interception(struct vcpu_svm *svm)
static int vmrun_interception(struct vcpu_svm *svm)
{
- nsvm_printk("VMrun\n");
-
if (nested_svm_check_permissions(svm))
return 1;
@@ -1974,7 +1946,6 @@ static int clgi_interception(struct vcpu_svm *svm)
static int invlpga_interception(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
- nsvm_printk("INVLPGA\n");
trace_kvm_invlpga(svm->vmcb->save.rip, vcpu->arch.regs[VCPU_REGS_RCX],
vcpu->arch.regs[VCPU_REGS_RAX]);
@@ -2389,10 +2360,6 @@ static int handle_exit(struct kvm_vcpu *vcpu)
svm->vmcb->control.exit_int_info,
svm->vmcb->control.exit_int_info_err);
- nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n",
- exit_code, svm->vmcb->control.exit_info_1,
- svm->vmcb->control.exit_info_2, svm->vmcb->save.rip);
-
vmexit = nested_svm_exit_special(svm);
if (vmexit == NESTED_EXIT_CONTINUE)
@@ -2539,7 +2506,6 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
static void enable_irq_window(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- nsvm_printk("Trying to open IRQ window\n");
nested_svm_intr(svm);
--
1.6.5.2
* [PATCH 05/35] KVM: introduce kvm_vcpu_on_spin
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (3 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 04/35] KVM: SVM: Remove nsvm_printk debugging code Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 06/35] KVM: VMX: Add support for Pause-Loop Exiting Avi Kivity
` (29 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Zhai, Edwin <edwin.zhai@intel.com>
Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.
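As a rough userspace analogue of this yield, here is a minimal sketch, assuming a POSIX system that provides `clock_nanosleep`. The names `timespec_add_ns` and `backoff_on_spin` are hypothetical and not part of the patch, which sleeps in the kernel via `schedule_hrtimeout`. The 100 us figure is taken from the patch itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define SPIN_BACKOFF_NS 100000L  /* 100 us, the value used in the patch */

/* Hypothetical helper: add less than one second to a timespec, normalizing. */
static struct timespec timespec_add_ns(struct timespec t, long ns)
{
    assert(ns >= 0 && ns < 1000000000L);
    t.tv_nsec += ns;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_nsec -= 1000000000L;
        t.tv_sec += 1;
    }
    return t;
}

/*
 * Hypothetical userspace stand-in for the kernel-side sleep: compute an
 * absolute deadline 100 us from now and sleep until it, giving other
 * threads (the presumed lock holder) a chance to run.
 * Returns 0 on success, nonzero on failure.
 */
static int backoff_on_spin(void)
{
    struct timespec deadline;

    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return -1;
    deadline = timespec_add_ns(deadline, SPIN_BACKOFF_NS);
    return clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
}
```

The absolute deadline mirrors the patch's use of `HRTIMER_MODE_ABS`: the sleep ends at a fixed point in time regardless of how long it takes to get there.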
Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 15 +++++++++++++++
2 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b985a29..bd5a616 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -286,6 +286,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
void kvm_resched(struct kvm_vcpu *vcpu);
void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70c8cbe..cac69c4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1108,6 +1108,21 @@ void kvm_resched(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_resched);
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
+{
+ ktime_t expires;
+ DEFINE_WAIT(wait);
+
+ prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+
+ /* Sleep for 100 us, and hope lock-holder got scheduled */
+ expires = ktime_add_ns(ktime_get(), 100000UL);
+ schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+
+ finish_wait(&vcpu->wq, &wait);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
+
static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct kvm_vcpu *vcpu = vma->vm_file->private_data;
--
1.6.5.2
* [PATCH 06/35] KVM: VMX: Add support for Pause-Loop Exiting
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (4 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 05/35] KVM: introduce kvm_vcpu_on_spin Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 07/35] KVM: SVM: Support Pause Filter in AMD processors Avi Kivity
` (28 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Zhai, Edwin <edwin.zhai@intel.com>
New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap - upper bound on the amount of time between two successive
executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
a PAUSE loop
If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop (since
the 1st PAUSE in this loop), and triggers a VM exit if the total time exceeds
PLE_Window.
* Refer to SDM volume 3b sections 21.6.13 & 22.1.3.
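The gap/window rule above can be sketched as a small software model. This is only an illustration of the description in this message, not the hardware's actual implementation: `struct ple_state` and `ple_on_pause` are hypothetical names, and what the processor does with its state after a VM exit is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state tracked for pause-loop detection. */
struct ple_state {
    uint64_t first_pause;  /* time of the first PAUSE in the current loop */
    uint64_t last_pause;   /* time of the most recent PAUSE */
    bool     in_loop;      /* false until the first PAUSE has been seen */
};

/*
 * Feed one PAUSE executed at time `now` (in TSC-rate ticks) into the model.
 * Returns true if, per the rule described above, this PAUSE would trigger
 * a VM exit.
 */
static bool ple_on_pause(struct ple_state *s, uint64_t now,
                         uint32_t ple_gap, uint32_t ple_window)
{
    if (!s->in_loop || now - s->last_pause > ple_gap) {
        /* Gap exceeded (or first PAUSE ever): this PAUSE starts a new loop. */
        s->first_pause = now;
        s->last_pause = now;
        s->in_loop = true;
        return false;
    }

    s->last_pause = now;
    /* Same loop: exit once the time since its first PAUSE exceeds the window. */
    return now - s->first_pause > ple_window;
}
```

With a gap of 41 and a window of 100, PAUSEs at ticks 0, 40, 80 and 120 stay within one loop, and the one at 120 is the first to exceed the window. A later PAUSE at tick 300 exceeds the gap and starts a fresh loop.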
Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is scheduled out while holding a spinlock, and other VPs waiting for the same
lock are then scheduled in only to waste CPU time.
Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2X LP over-commitment we can get a +2% perf
improvement for kernel build (even more perf gain with more LPs).
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/include/asm/vmx.h | 4 +++
arch/x86/kvm/vmx.c | 51 +++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 54 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 272514c..2b49454 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -56,6 +56,7 @@
#define SECONDARY_EXEC_ENABLE_VPID 0x00000020
#define SECONDARY_EXEC_WBINVD_EXITING 0x00000040
#define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400
#define PIN_BASED_EXT_INTR_MASK 0x00000001
@@ -144,6 +145,8 @@ enum vmcs_field {
VM_ENTRY_INSTRUCTION_LEN = 0x0000401a,
TPR_THRESHOLD = 0x0000401c,
SECONDARY_VM_EXEC_CONTROL = 0x0000401e,
+ PLE_GAP = 0x00004020,
+ PLE_WINDOW = 0x00004022,
VM_INSTRUCTION_ERROR = 0x00004400,
VM_EXIT_REASON = 0x00004402,
VM_EXIT_INTR_INFO = 0x00004404,
@@ -248,6 +251,7 @@ enum vmcs_field {
#define EXIT_REASON_MSR_READ 31
#define EXIT_REASON_MSR_WRITE 32
#define EXIT_REASON_MWAIT_INSTRUCTION 36
+#define EXIT_REASON_PAUSE_INSTRUCTION 40
#define EXIT_REASON_MCE_DURING_VMENTRY 41
#define EXIT_REASON_TPR_BELOW_THRESHOLD 43
#define EXIT_REASON_APIC_ACCESS 44
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 70020e5..a4580d6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,6 +61,25 @@ module_param_named(unrestricted_guest,
static int __read_mostly emulate_invalid_guest_state = 0;
module_param(emulate_invalid_guest_state, bool, S_IRUGO);
+/*
+ * These 2 parameters are used to config the controls for Pause-Loop Exiting:
+ * ple_gap: upper bound on the amount of time between two successive
+ * executions of PAUSE in a loop. Also indicates whether PLE is enabled.
+ * According to tests, this time is usually smaller than 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ * in a PAUSE loop. Tests indicate that most spinlocks are held for
+ * less than 2^12 cycles
+ * Time is measured based on a counter that runs at the same rate as the TSC,
+ * refer to SDM volume 3b sections 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP 41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);
+
struct vmcs {
u32 revision_id;
u32 abort;
@@ -319,6 +338,12 @@ static inline int cpu_has_vmx_unrestricted_guest(void)
SECONDARY_EXEC_UNRESTRICTED_GUEST;
}
+static inline int cpu_has_vmx_ple(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl &
+ SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+}
+
static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
{
return flexpriority_enabled &&
@@ -1240,7 +1265,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
SECONDARY_EXEC_WBINVD_EXITING |
SECONDARY_EXEC_ENABLE_VPID |
SECONDARY_EXEC_ENABLE_EPT |
- SECONDARY_EXEC_UNRESTRICTED_GUEST;
+ SECONDARY_EXEC_UNRESTRICTED_GUEST |
+ SECONDARY_EXEC_PAUSE_LOOP_EXITING;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
&_cpu_based_2nd_exec_control) < 0)
@@ -1386,6 +1412,9 @@ static __init int hardware_setup(void)
if (enable_ept && !cpu_has_vmx_ept_2m_page())
kvm_disable_largepages();
+ if (!cpu_has_vmx_ple())
+ ple_gap = 0;
+
return alloc_kvm_area();
}
@@ -2298,9 +2327,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
if (!enable_unrestricted_guest)
exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+ if (!ple_gap)
+ exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
}
+ if (ple_gap) {
+ vmcs_write32(PLE_GAP, ple_gap);
+ vmcs_write32(PLE_WINDOW, ple_window);
+ }
+
vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, !!bypass_guest_pf);
vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
vmcs_write32(CR3_TARGET_COUNT, 0); /* 22.2.1 */
@@ -3348,6 +3384,18 @@ out:
}
/*
+ * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
+ * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
+ */
+static int handle_pause(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+ skip_emulated_instruction(vcpu);
+ kvm_vcpu_on_spin(vcpu);
+
+ return 1;
+}
+
+/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
* to be done to userspace and return 0.
@@ -3383,6 +3431,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check,
[EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
[EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
+ [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
};
static const int kvm_vmx_max_exit_handlers =
--
1.6.5.2
* [PATCH 07/35] KVM: SVM: Support Pause Filter in AMD processors
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (5 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 06/35] KVM: VMX: Add support for Pause-Loop Exiting Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 08/35] KVM: x86: Harden against cpufreq Avi Kivity
` (27 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Mark Langsdorf <mark.langsdorf@amd.com>
New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature. This feature creates a new field in the VMCB
called Pause Filter Count. If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting. When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.
This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.
Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand. Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.
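The counting behaviour described above can be sketched as a small software model. It follows this message's wording only: `struct pause_filter` and `pause_filter_on_pause` are hypothetical names, and restarting the counter after an intercept is an assumption, since the text does not say what happens to it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the pause filter state. */
struct pause_filter {
    uint16_t filter_count; /* value of the VMCB Pause Filter Count field */
    uint16_t counter;      /* internal counter, as described in the text */
};

/*
 * Feed one guest PAUSE into the model.
 * Returns true if this PAUSE would cause a PAUSE intercept.
 */
static bool pause_filter_on_pause(struct pause_filter *pf, bool intercept_pause)
{
    if (!intercept_pause)
        return false;       /* PAUSE is not intercepted at all */
    if (pf->filter_count == 0)
        return true;        /* no filtering: every PAUSE is intercepted */
    if (++pf->counter >= pf->filter_count) {
        pf->counter = 0;    /* assumption: counting restarts after an intercept */
        return true;
    }
    return false;           /* counted, but not intercepted yet */
}
```

With the default of 3000 chosen in this patch, a lock that is released within about 1000 PAUSEs never reaches the threshold, while a waiter spinning for a few thousand PAUSEs does.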
Processor support for this feature is indicated by a CPUID
bit.
On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm that caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/include/asm/svm.h | 3 ++-
arch/x86/kvm/svm.c | 13 +++++++++++++
2 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 85574b7..1fecb7e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -57,7 +57,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u16 intercept_dr_write;
u32 intercept_exceptions;
u64 intercept;
- u8 reserved_1[44];
+ u8 reserved_1[42];
+ u16 pause_filter_count;
u64 iopm_base_pa;
u64 msrpm_base_pa;
u64 tsc_offset;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 69610c5..170b2d9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -46,6 +46,7 @@ MODULE_LICENSE("GPL");
#define SVM_FEATURE_NPT (1 << 0)
#define SVM_FEATURE_LBRV (1 << 1)
#define SVM_FEATURE_SVML (1 << 2)
+#define SVM_FEATURE_PAUSE_FILTER (1 << 10)
#define NESTED_EXIT_HOST 0 /* Exit handled on host level */
#define NESTED_EXIT_DONE 1 /* Exit caused nested vmexit */
@@ -654,6 +655,11 @@ static void init_vmcb(struct vcpu_svm *svm)
svm->nested.vmcb = 0;
svm->vcpu.arch.hflags = 0;
+ if (svm_has(SVM_FEATURE_PAUSE_FILTER)) {
+ control->pause_filter_count = 3000;
+ control->intercept |= (1ULL << INTERCEPT_PAUSE);
+ }
+
enable_gif(svm);
}
@@ -2281,6 +2287,12 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
return 1;
}
+static int pause_interception(struct vcpu_svm *svm)
+{
+ kvm_vcpu_on_spin(&(svm->vcpu));
+ return 1;
+}
+
static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_READ_CR0] = emulate_on_interception,
[SVM_EXIT_READ_CR3] = emulate_on_interception,
@@ -2316,6 +2328,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_CPUID] = cpuid_interception,
[SVM_EXIT_IRET] = iret_interception,
[SVM_EXIT_INVD] = emulate_on_interception,
+ [SVM_EXIT_PAUSE] = pause_interception,
[SVM_EXIT_HLT] = halt_interception,
[SVM_EXIT_INVLPG] = invlpg_interception,
[SVM_EXIT_INVLPGA] = invlpga_interception,
--
1.6.5.2
* [PATCH 08/35] KVM: x86: Harden against cpufreq
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (6 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 07/35] KVM: SVM: Support Pause Filter in AMD processors Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 09/35] KVM: VMX: fix handle_pause declaration Avi Kivity
` (26 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Zachary Amsden <zamsden@redhat.com>
If cpufreq can't determine the CPU khz, or cpufreq is not compiled in,
we should fall back to the measured TSC khz.
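The selection rule can be stated as a tiny standalone sketch. `effective_tsc_khz` is a hypothetical helper name; the patch open-codes the same check at both call sites, relying on a return value of 0 from cpufreq meaning that the frequency is unknown.

```c
/*
 * Hypothetical helper: prefer the frequency reported by cpufreq, but fall
 * back to the measured TSC frequency when cpufreq reports 0 (unknown).
 */
static unsigned long effective_tsc_khz(unsigned long cpufreq_khz,
                                       unsigned long measured_tsc_khz)
{
    return cpufreq_khz ? cpufreq_khz : measured_tsc_khz;
}
```

Without the fallback, a zero would be stored in the per-cpu `cpu_tsc_khz` and used as if it were a valid frequency.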
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/kvm/x86.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 098e7f8..3cffa2c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1348,8 +1348,12 @@ out:
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
kvm_x86_ops->vcpu_load(vcpu, cpu);
- if (unlikely(per_cpu(cpu_tsc_khz, cpu) == 0))
- per_cpu(cpu_tsc_khz, cpu) = cpufreq_quick_get(cpu);
+ if (unlikely(per_cpu(cpu_tsc_khz, cpu) == 0)) {
+ unsigned long khz = cpufreq_quick_get(cpu);
+ if (!khz)
+ khz = tsc_khz;
+ per_cpu(cpu_tsc_khz, cpu) = khz;
+ }
kvm_request_guest_time_update(vcpu);
}
@@ -3144,8 +3148,12 @@ static void kvm_timer_init(void)
if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
CPUFREQ_TRANSITION_NOTIFIER);
- for_each_online_cpu(cpu)
- per_cpu(cpu_tsc_khz, cpu) = cpufreq_get(cpu);
+ for_each_online_cpu(cpu) {
+ unsigned long khz = cpufreq_get(cpu);
+ if (!khz)
+ khz = tsc_khz;
+ per_cpu(cpu_tsc_khz, cpu) = khz;
+ }
} else {
for_each_possible_cpu(cpu)
per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
--
1.6.5.2
* [PATCH 09/35] KVM: VMX: fix handle_pause declaration
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (7 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 08/35] KVM: x86: Harden against cpufreq Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 10/35] KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check Avi Kivity
` (25 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Marcelo Tosatti <mtosatti@redhat.com>
There's no kvm_run argument anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/kvm/vmx.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4580d6..364263a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3387,7 +3387,7 @@ out:
* Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
* exiting, so only get here on cpu with PAUSE-Loop-Exiting.
*/
-static int handle_pause(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+static int handle_pause(struct kvm_vcpu *vcpu)
{
skip_emulated_instruction(vcpu);
kvm_vcpu_on_spin(vcpu);
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 10/35] KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (8 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 09/35] KVM: VMX: fix handle_pause declaration Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 11/35] KVM: Xen PV-on-HVM guest support Avi Kivity
` (24 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Jan Kiszka <jan.kiszka@web.de>
This (broken) check dates back to the days when this code was shared
across architectures. x86 has IOMEM, so drop it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/kvm/x86.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3cffa2c..5d450cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3814,7 +3814,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
if (r)
goto out;
}
-#if CONFIG_HAS_IOMEM
if (vcpu->mmio_needed) {
memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
vcpu->mmio_read_completed = 1;
@@ -3832,7 +3831,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
goto out;
}
}
-#endif
if (kvm_run->exit_reason == KVM_EXIT_HYPERCALL)
kvm_register_write(vcpu, VCPU_REGS_RAX,
kvm_run->hypercall.ret);
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 11/35] KVM: Xen PV-on-HVM guest support
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (9 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 10/35] KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 12/35] KVM: x86: Fix guest single-stepping while interruptible Avi Kivity
` (23 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Ed Swierk <eswierk@aristanetworks.com>
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.
A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future. Thus this patch
adds special support for the Xen HVM MSR.
I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files. When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.
I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
[jan: fix i386 build warning]
[avi: future proof abi with a flags field]
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
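Note for readers (not part of the patch): a minimal userspace sketch of
how the value written to the MSR is split, following xen_hvm_config()
in the diff below. The low bits select a page within the blob and the
page-aligned part is the guest address to copy to. The names
decode_xen_msr, xen_msr_request and the SKETCH_* macros are hypothetical,
and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical container for the two halves of the MSR value. */
struct xen_msr_request {
	uint64_t guest_page_addr;	/* page-aligned guest destination */
	uint32_t blob_page;		/* index of the requested blob page */
};

/* Split the written value the same way the patch does. */
static struct xen_msr_request decode_xen_msr(uint64_t data)
{
	struct xen_msr_request req;

	req.blob_page = (uint32_t)(data & ~SKETCH_PAGE_MASK);
	req.guest_page_addr = data & SKETCH_PAGE_MASK;
	assert(req.guest_page_addr % SKETCH_PAGE_SIZE == 0);
	return req;
}

/* Mirrors the -E2BIG check: the page must lie inside the blob. */
static int xen_msr_request_valid(struct xen_msr_request req, uint8_t blob_size)
{
	return req.blob_page < blob_size;
}
```

With a one-page blob only page index 0 passes the check; any other index
corresponds to the -E2BIG path in the patch.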
Documentation/kvm/api.txt | 24 ++++++++++++++++++++
arch/x86/include/asm/kvm.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/x86.c | 46 +++++++++++++++++++++++++++++++++++++++
include/linux/kvm.h | 16 +++++++++++++
5 files changed, 89 insertions(+), 0 deletions(-)
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 5a4bc8c..3e8684e 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -593,6 +593,30 @@ struct kvm_irqchip {
} chip;
};
+4.27 KVM_XEN_HVM_CONFIG
+
+Capability: KVM_CAP_XEN_HVM
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_xen_hvm_config (in)
+Returns: 0 on success, -1 on error
+
+Sets the MSR that the Xen HVM guest uses to initialize its hypercall
+page, and provides the starting address and size of the hypercall
+blobs in userspace. When the guest writes the MSR, kvm copies one
+page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
+memory.
+
+struct kvm_xen_hvm_config {
+ __u32 flags;
+ __u32 msr;
+ __u64 blob_addr_32;
+ __u64 blob_addr_64;
+ __u8 blob_size_32;
+ __u8 blob_size_64;
+ __u8 pad2[30];
+};
+
5. The kvm_run structure
Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index f02e87a..ef9b4b7 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -19,6 +19,7 @@
#define __KVM_HAVE_MSIX
#define __KVM_HAVE_MCE
#define __KVM_HAVE_PIT_STATE2
+#define __KVM_HAVE_XEN_HVM
/* Architectural interrupt line count. */
#define KVM_NR_INTERRUPTS 256
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 179a919..36f3b53 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -410,6 +410,8 @@ struct kvm_arch{
unsigned long irq_sources_bitmap;
u64 vm_init_tsc;
+
+ struct kvm_xen_hvm_config xen_hvm_config;
};
struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d450cc..bb842db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -857,6 +857,38 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data)
return 0;
}
+static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
+{
+ struct kvm *kvm = vcpu->kvm;
+ int lm = is_long_mode(vcpu);
+ u8 *blob_addr = lm ? (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_64
+ : (u8 *)(long)kvm->arch.xen_hvm_config.blob_addr_32;
+ u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64
+ : kvm->arch.xen_hvm_config.blob_size_32;
+ u32 page_num = data & ~PAGE_MASK;
+ u64 page_addr = data & PAGE_MASK;
+ u8 *page;
+ int r;
+
+ r = -E2BIG;
+ if (page_num >= blob_size)
+ goto out;
+ r = -ENOMEM;
+ page = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!page)
+ goto out;
+ r = -EFAULT;
+ if (copy_from_user(page, blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE))
+ goto out_free;
+ if (kvm_write_guest(kvm, page_addr, page, PAGE_SIZE))
+ goto out_free;
+ r = 0;
+out_free:
+ kfree(page);
+out:
+ return r;
+}
+
int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
switch (msr) {
@@ -972,6 +1004,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
"0x%x data 0x%llx\n", msr, data);
break;
default:
+ if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
+ return xen_hvm_config(vcpu, data);
if (!ignore_msrs) {
pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n",
msr, data);
@@ -1246,6 +1280,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PIT2:
case KVM_CAP_PIT_STATE2:
case KVM_CAP_SET_IDENTITY_MAP_ADDR:
+ case KVM_CAP_XEN_HVM:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -2441,6 +2476,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+ case KVM_XEN_HVM_CONFIG: {
+ r = -EFAULT;
+ if (copy_from_user(&kvm->arch.xen_hvm_config, argp,
+ sizeof(struct kvm_xen_hvm_config)))
+ goto out;
+ r = -EINVAL;
+ if (kvm->arch.xen_hvm_config.flags)
+ goto out;
+ r = 0;
+ break;
+ }
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..b694c1d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -436,6 +436,9 @@ struct kvm_ioeventfd {
#endif
#define KVM_CAP_IOEVENTFD 36
#define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
+#ifdef __KVM_HAVE_XEN_HVM
+#define KVM_CAP_XEN_HVM 38
+#endif
#ifdef KVM_CAP_IRQ_ROUTING
@@ -488,6 +491,18 @@ struct kvm_x86_mce {
};
#endif
+#ifdef KVM_CAP_XEN_HVM
+struct kvm_xen_hvm_config {
+ __u32 flags;
+ __u32 msr;
+ __u64 blob_addr_32;
+ __u64 blob_addr_64;
+ __u8 blob_size_32;
+ __u8 blob_size_64;
+ __u8 pad2[30];
+};
+#endif
+
#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
struct kvm_irqfd {
@@ -546,6 +561,7 @@ struct kvm_irqfd {
#define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
#define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
#define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config)
/*
* ioctls for vcpu fds
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 12/35] KVM: x86: Fix guest single-stepping while interruptible
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (10 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 11/35] KVM: Xen PV-on-HVM guest support Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 13/35] KVM: SVM: Cleanup NMI singlestep Avi Kivity
` (22 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Jan Kiszka <jan.kiszka@web.de>
Commit 705c5323 opened the doors of hell by unconditionally injecting
single-step flags as long as guest_debug signaled this. This doesn't
work when the guest branches into some interrupt or exception handler
and triggers a vmexit with flag reloading.
Fix it by saving cs:rip when user space requests single-stepping and
restricting the trace flag injection to this guest code position.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
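Note for readers (not part of the patch): a minimal sketch of the rule
described above, assuming the flag bit positions from the x86
architecture (TF is bit 8, RF is bit 16). The struct and function names
are hypothetical; the real logic lives in kvm_set_rflags() in the diff
below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EFLAGS_TF (1UL << 8)	/* trap flag */
#define SKETCH_EFLAGS_RF (1UL << 16)	/* resume flag */

/* Hypothetical stand-in for the state saved when stepping is requested. */
struct step_state {
	int singlestep_requested;
	uint16_t saved_cs;
	unsigned long saved_rip;
};

/*
 * Inject TF/RF only while the guest is still at the code position that
 * was recorded when userspace asked for single-stepping.
 */
static unsigned long apply_singlestep(const struct step_state *s,
				      uint16_t cs, unsigned long rip,
				      unsigned long rflags)
{
	assert(s != NULL);
	if (s->singlestep_requested &&
	    s->saved_cs == cs && s->saved_rip == rip)
		rflags |= SKETCH_EFLAGS_TF | SKETCH_EFLAGS_RF;
	return rflags;
}
```

Once the guest has branched into a handler, cs:rip no longer matches and
the flags are left as the guest set them.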
arch/x86/include/asm/kvm_host.h | 4 +++
arch/x86/kvm/x86.c | 47 +++++++++++++++++++++++---------------
2 files changed, 32 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 36f3b53..2536fbd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -371,6 +371,10 @@ struct kvm_vcpu_arch {
u64 mcg_status;
u64 mcg_ctl;
u64 *mce_banks;
+
+ /* used for guest single stepping over the given code position */
+ u16 singlestep_cs;
+ unsigned long singlestep_rip;
};
struct kvm_mem_alias {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb842db..13f30aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -235,25 +235,6 @@ bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl)
}
EXPORT_SYMBOL_GPL(kvm_require_cpl);
-unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
-{
- unsigned long rflags;
-
- rflags = kvm_x86_ops->get_rflags(vcpu);
- if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
- rflags &= ~(unsigned long)(X86_EFLAGS_TF | X86_EFLAGS_RF);
- return rflags;
-}
-EXPORT_SYMBOL_GPL(kvm_get_rflags);
-
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
-{
- if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
- rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
- kvm_x86_ops->set_rflags(vcpu, rflags);
-}
-EXPORT_SYMBOL_GPL(kvm_set_rflags);
-
/*
* Load the pae pdptrs. Return true is they are all valid.
*/
@@ -4565,6 +4546,12 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
vcpu->arch.switch_db_regs = (vcpu->arch.dr7 & DR7_BP_EN_MASK);
}
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
+ vcpu->arch.singlestep_cs =
+ get_segment_selector(vcpu, VCPU_SREG_CS);
+ vcpu->arch.singlestep_rip = kvm_rip_read(vcpu);
+ }
+
/*
* Trigger an rflags update that will inject or remove the trace
* flags.
@@ -5031,6 +5018,28 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
return kvm_x86_ops->interrupt_allowed(vcpu);
}
+unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
+{
+ unsigned long rflags;
+
+ rflags = kvm_x86_ops->get_rflags(vcpu);
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+ rflags &= ~(unsigned long)(X86_EFLAGS_TF | X86_EFLAGS_RF);
+ return rflags;
+}
+EXPORT_SYMBOL_GPL(kvm_get_rflags);
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
+ vcpu->arch.singlestep_cs ==
+ get_segment_selector(vcpu, VCPU_SREG_CS) &&
+ vcpu->arch.singlestep_rip == kvm_rip_read(vcpu))
+ rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
+ kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+EXPORT_SYMBOL_GPL(kvm_set_rflags);
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 13/35] KVM: SVM: Cleanup NMI singlestep
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (11 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 12/35] KVM: x86: Fix guest single-stepping while interruptible Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 14/35] KVM: fix irq_source_id size verification Avi Kivity
` (21 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Jan Kiszka <jan.kiszka@web.de>
Push the NMI-related singlestep variable into vcpu_svm. It's dealing
with an AMD-specific deficit, nothing generic for x86.
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/svm.c | 12 +++++++-----
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2536fbd..4d994ad 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -354,7 +354,6 @@ struct kvm_vcpu_arch {
unsigned int time_offset;
struct page *time_page;
- bool singlestep; /* guest is single stepped by KVM */
bool nmi_pending;
bool nmi_injected;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 170b2d9..ffa6ad2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -107,6 +107,8 @@ struct vcpu_svm {
u32 *msrpm;
struct nested_state nested;
+
+ bool nmi_singlestep;
};
/* enable NPT for AMD64 and X86 with PAE */
@@ -1050,7 +1052,7 @@ static void update_db_intercept(struct kvm_vcpu *vcpu)
svm->vmcb->control.intercept_exceptions &=
~((1 << DB_VECTOR) | (1 << BP_VECTOR));
- if (vcpu->arch.singlestep)
+ if (svm->nmi_singlestep)
svm->vmcb->control.intercept_exceptions |= (1 << DB_VECTOR);
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
@@ -1195,13 +1197,13 @@ static int db_interception(struct vcpu_svm *svm)
if (!(svm->vcpu.guest_debug &
(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) &&
- !svm->vcpu.arch.singlestep) {
+ !svm->nmi_singlestep) {
kvm_queue_exception(&svm->vcpu, DB_VECTOR);
return 1;
}
- if (svm->vcpu.arch.singlestep) {
- svm->vcpu.arch.singlestep = false;
+ if (svm->nmi_singlestep) {
+ svm->nmi_singlestep = false;
if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
svm->vmcb->save.rflags &=
~(X86_EFLAGS_TF | X86_EFLAGS_RF);
@@ -2543,7 +2545,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
/* Something prevents NMI from been injected. Single step over
possible problem (IRET or exception injection or interrupt
shadow) */
- vcpu->arch.singlestep = true;
+ svm->nmi_singlestep = true;
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
update_db_intercept(vcpu);
}
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 14/35] KVM: fix irq_source_id size verification
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (12 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 13/35] KVM: SVM: Cleanup NMI singlestep Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset Avi Kivity
` (20 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Marcelo Tosatti <mtosatti@redhat.com>
find_first_zero_bit() takes its size argument in bits, not bytes, so
sizeof() is the wrong bound for both the search and the range checks.
Fixes
https://sourceforge.net/tracker/?func=detail&aid=2847560&group_id=180599&atid=893831
Reported-by: "Xu, Jiajun" <jiajun.xu@intel.com>
Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
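Note for readers (not part of the patch): a minimal userspace sketch of
why the bound matters. first_zero_bit() here is a hypothetical
single-word stand-in for the kernel's find_first_zero_bit(), and
SKETCH_BITS_PER_LONG stands in for BITS_PER_LONG.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first clear bit among the low nbits of word,
 * or nbits if all of them are set.
 */
static size_t first_zero_bit(unsigned long word, size_t nbits)
{
	size_t i;

	assert(nbits <= SKETCH_BITS_PER_LONG);
	for (i = 0; i < nbits; i++)
		if (!(word & (1UL << i)))
			return i;
	return nbits;
}

/* The caller treats "result >= limit" as "no source id left". */
static int source_id_available(unsigned long bitmap, size_t limit)
{
	return first_zero_bit(bitmap, limit) < limit;
}
```

With sizeof() as the limit, only the first 8 ids (4 on 32-bit) can ever
be handed out, although the bitmap has room for BITS_PER_LONG of them.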
virt/kvm/irq_comm.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 00c68d2..0d454d3 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -215,10 +215,9 @@ int kvm_request_irq_source_id(struct kvm *kvm)
int irq_source_id;
mutex_lock(&kvm->irq_lock);
- irq_source_id = find_first_zero_bit(bitmap,
- sizeof(kvm->arch.irq_sources_bitmap));
+ irq_source_id = find_first_zero_bit(bitmap, BITS_PER_LONG);
- if (irq_source_id >= sizeof(kvm->arch.irq_sources_bitmap)) {
+ if (irq_source_id >= BITS_PER_LONG) {
printk(KERN_WARNING "kvm: exhaust allocatable IRQ sources!\n");
irq_source_id = -EFAULT;
goto unlock;
@@ -240,7 +239,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
mutex_lock(&kvm->irq_lock);
if (irq_source_id < 0 ||
- irq_source_id >= sizeof(kvm->arch.irq_sources_bitmap)) {
+ irq_source_id >= BITS_PER_LONG) {
printk(KERN_ERR "kvm: IRQ source ID out of range!\n");
goto unlock;
}
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (13 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 14/35] KVM: fix irq_source_id size verification Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2010-01-29 13:32 ` Alexander Graf
2009-11-19 13:34 ` [PATCH 16/35] KVM: Enable 32bit dirty log pointers on 64bit host Avi Kivity
` (19 subsequent siblings)
34 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Glauber Costa <glommer@redhat.com>
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clocks of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the guest's view of
time will be significantly impacted.
The situation is much worse when we do the opposite and migrate to a host
with a smaller monotonic clock.
This proposed ioctl allows userspace to tell us the monotonic clock value
of the source host, so we can keep the time skew short and, more
importantly, make sure time never goes backwards. Userspace may also need
to read back the current value, since from the first migration onwards it
won't be reflected by a simple call to clock_gettime() anymore.
[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
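Note for readers (not part of the patch): a minimal sketch of the offset
arithmetic used by KVM_SET_CLOCK and KVM_GET_CLOCK in the diff below. The
host monotonic time is passed in explicitly instead of calling
ktime_get_ts(), and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VM state added by the patch. */
struct sketch_vm {
	int64_t kvmclock_offset;
};

/* KVM_SET_CLOCK: remember how far the guest clock is from host time. */
static void sketch_set_clock(struct sketch_vm *vm, uint64_t user_clock,
			     uint64_t host_now_ns)
{
	assert(vm != NULL);
	vm->kvmclock_offset = (int64_t)(user_clock - host_now_ns);
}

/* KVM_GET_CLOCK: host time plus the stored offset. */
static uint64_t sketch_get_clock(const struct sketch_vm *vm,
				 uint64_t host_now_ns)
{
	assert(vm != NULL);
	return (uint64_t)vm->kvmclock_offset + host_now_ns;
}
```

The offset is signed, so the same two functions cover a destination host
whose monotonic clock is ahead of the guest clock as well as one that is
behind it.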
Documentation/kvm/api.txt | 36 +++++++++++++++++++++++++++++++++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 42 ++++++++++++++++++++++++++++++++++++++-
include/linux/kvm.h | 10 +++++++++
4 files changed, 88 insertions(+), 1 deletions(-)
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 3e8684e..36594ba 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -617,6 +617,42 @@ struct kvm_xen_hvm_config {
__u8 pad2[30];
};
+4.27 KVM_GET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (out)
+Returns: 0 on success, -1 on error
+
+Gets the current timestamp of kvmclock as seen by the current guest. In
+conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+ __u64 clock; /* kvmclock current value */
+ __u32 flags;
+ __u32 pad[9];
+};
+
+4.28 KVM_SET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (in)
+Returns: 0 on success, -1 on error
+
+Sets the current timestamp of kvmclock to the value specified in its parameter.
+In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+ __u64 clock; /* kvmclock current value */
+ __u32 flags;
+ __u32 pad[9];
+};
+
5. The kvm_run structure
Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4d994ad..0558ff8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -413,6 +413,7 @@ struct kvm_arch{
unsigned long irq_sources_bitmap;
u64 vm_init_tsc;
+ s64 kvmclock_offset;
struct kvm_xen_hvm_config xen_hvm_config;
};
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 13f30aa..e16cdc9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -680,7 +680,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
/* With all the info we got, fill in the values */
vcpu->hv_clock.system_time = ts.tv_nsec +
- (NSEC_PER_SEC * (u64)ts.tv_sec);
+ (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
+
/*
* The interface expects us to write an even number signaling that the
* update is finished. Since the guest won't see the intermediate
@@ -1262,6 +1263,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PIT_STATE2:
case KVM_CAP_SET_IDENTITY_MAP_ADDR:
case KVM_CAP_XEN_HVM:
+ case KVM_CAP_ADJUST_CLOCK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -2468,6 +2470,44 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+ case KVM_SET_CLOCK: {
+ struct timespec now;
+ struct kvm_clock_data user_ns;
+ u64 now_ns;
+ s64 delta;
+
+ r = -EFAULT;
+ if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
+ goto out;
+
+ r = -EINVAL;
+ if (user_ns.flags)
+ goto out;
+
+ r = 0;
+ ktime_get_ts(&now);
+ now_ns = timespec_to_ns(&now);
+ delta = user_ns.clock - now_ns;
+ kvm->arch.kvmclock_offset = delta;
+ break;
+ }
+ case KVM_GET_CLOCK: {
+ struct timespec now;
+ struct kvm_clock_data user_ns;
+ u64 now_ns;
+
+ ktime_get_ts(&now);
+ now_ns = timespec_to_ns(&now);
+ user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
+ user_ns.flags = 0;
+
+ r = -EFAULT;
+ if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
+ goto out;
+ r = 0;
+ break;
+ }
+
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b694c1d..6ed1a12 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -439,6 +439,7 @@ struct kvm_ioeventfd {
#ifdef __KVM_HAVE_XEN_HVM
#define KVM_CAP_XEN_HVM 38
#endif
+#define KVM_CAP_ADJUST_CLOCK 39
#ifdef KVM_CAP_IRQ_ROUTING
@@ -512,6 +513,12 @@ struct kvm_irqfd {
__u8 pad[20];
};
+struct kvm_clock_data {
+ __u64 clock;
+ __u32 flags;
+ __u32 pad[9];
+};
+
/*
* ioctls for VM fds
*/
@@ -562,6 +569,9 @@ struct kvm_irqfd {
#define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
#define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
#define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config)
+#define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
+#define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
+
/*
* ioctls for vcpu fds
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 16/35] KVM: Enable 32bit dirty log pointers on 64bit host
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (14 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 17/35] KVM: VMX: Use macros instead of hex value on cr0 initialization Avi Kivity
` (18 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Arnd Bergmann <arnd@arndb.de>
With big endian userspace, we can't tell whether a pointer is 32 bit
(and therefore sits in the upper half of the field) or 64 bit when we
read it as a 64 bit value. This is what happens with dirty logging. To
get the pointer interpreted correctly, we thus need Arnd's patch to
implement a compat layer for the ioctl:
A better way to do this is to add a separate compat_ioctl() method that
converts this for you.
Based on initial patch from Arnd Bergmann.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
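Note for readers (not part of the patch): a minimal sketch of the layout
problem. A 32-bit task stores its pointer in the first four bytes of the
8-byte union in struct kvm_dirty_log, and a 64-bit kernel then reads all
eight bytes. The function below simulates both byte orders explicitly, so
it behaves the same on any build machine; its name and the enum are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_byte_order { SKETCH_LITTLE_ENDIAN, SKETCH_BIG_ENDIAN };

/*
 * Store a 32-bit pointer at the start of an 8-byte field in the given
 * byte order, then read the whole field back as one 64-bit number in
 * that same byte order.
 */
static uint64_t read_union_as_u64(uint32_t ptr32, enum sketch_byte_order order)
{
	unsigned char field[8] = { 0 };
	uint64_t value = 0;
	int i;

	assert(order == SKETCH_LITTLE_ENDIAN || order == SKETCH_BIG_ENDIAN);

	/* The 32-bit task writes only the first four bytes of the union. */
	for (i = 0; i < 4; i++) {
		int shift = (order == SKETCH_BIG_ENDIAN) ? 24 - 8 * i : 8 * i;

		field[i] = (unsigned char)(ptr32 >> shift);
	}

	/* The 64-bit kernel reads all eight bytes as one number. */
	for (i = 0; i < 8; i++) {
		int idx = (order == SKETCH_BIG_ENDIAN) ? i : 7 - i;

		value = (value << 8) | field[idx];
	}
	return value;
}
```

On little endian the 64-bit read happens to give back the pointer, which
is why reusing the native handler as .compat_ioctl appeared to work there.
On big endian the pointer ends up in the upper half, hence the explicit
compat_ptr() conversion in the patch.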
virt/kvm/kvm_main.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 50 insertions(+), 1 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cac69c4..bd44fb4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -43,6 +43,7 @@
#include <linux/swap.h>
#include <linux/bitops.h>
#include <linux/spinlock.h>
+#include <linux/compat.h>
#include <asm/processor.h>
#include <asm/io.h>
@@ -1542,6 +1543,52 @@ out:
return r;
}
+#ifdef CONFIG_COMPAT
+struct compat_kvm_dirty_log {
+ __u32 slot;
+ __u32 padding1;
+ union {
+ compat_uptr_t dirty_bitmap; /* one bit per page */
+ __u64 padding2;
+ };
+};
+
+static long kvm_vm_compat_ioctl(struct file *filp,
+ unsigned int ioctl, unsigned long arg)
+{
+ struct kvm *kvm = filp->private_data;
+ int r;
+
+ if (kvm->mm != current->mm)
+ return -EIO;
+ switch (ioctl) {
+ case KVM_GET_DIRTY_LOG: {
+ struct compat_kvm_dirty_log compat_log;
+ struct kvm_dirty_log log;
+
+ r = -EFAULT;
+ if (copy_from_user(&compat_log, (void __user *)arg,
+ sizeof(compat_log)))
+ goto out;
+ log.slot = compat_log.slot;
+ log.padding1 = compat_log.padding1;
+ log.padding2 = compat_log.padding2;
+ log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);
+
+ r = kvm_vm_ioctl_get_dirty_log(kvm, &log);
+ if (r)
+ goto out;
+ break;
+ }
+ default:
+ r = kvm_vm_ioctl(filp, ioctl, arg);
+ }
+
+out:
+ return r;
+}
+#endif
+
static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct page *page[1];
@@ -1576,7 +1623,9 @@ static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma)
static struct file_operations kvm_vm_fops = {
.release = kvm_vm_release,
.unlocked_ioctl = kvm_vm_ioctl,
- .compat_ioctl = kvm_vm_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = kvm_vm_compat_ioctl,
+#endif
.mmap = kvm_vm_mmap,
};
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 17/35] KVM: VMX: Use macros instead of hex value on cr0 initialization
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (15 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 16/35] KVM: Enable 32bit dirty log pointers on 64bit host Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 18/35] KVM: SVM: Reset cr0 properly on vcpu reset Avi Kivity
` (17 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Eduardo Habkost <ehabkost@redhat.com>
This should have no effect, it is just to make the code clearer.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
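Note for readers (not part of the patch): a small check that the macro
expression matches the old constant. The SKETCH_* values are the
architectural CR0 bit positions (ET is bit 4, NW is bit 29, CD is bit
30); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CR0_ET (UINT32_C(1) << 4)	/* extension type */
#define SKETCH_CR0_NW (UINT32_C(1) << 29)	/* not write-through */
#define SKETCH_CR0_CD (UINT32_C(1) << 30)	/* cache disable */

/* The cr0 value at reset, spelled with named bits. */
static uint32_t sketch_reset_cr0(void)
{
	uint32_t cr0 = SKETCH_CR0_NW | SKETCH_CR0_CD | SKETCH_CR0_ET;

	assert(cr0 == UINT32_C(0x60000010));
	return cr0;
}
```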
arch/x86/kvm/vmx.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 364263a..1773017 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2538,7 +2538,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
if (vmx->vpid != 0)
vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
- vmx->vcpu.arch.cr0 = 0x60000010;
+ vmx->vcpu.arch.cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
vmx_set_cr0(&vmx->vcpu, vmx->vcpu.arch.cr0); /* enter rmode */
vmx_set_cr4(&vmx->vcpu, 0);
vmx_set_efer(&vmx->vcpu, 0);
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 18/35] KVM: SVM: Reset cr0 properly on vcpu reset
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (16 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 17/35] KVM: VMX: Use macros instead of hex value on cr0 initialization Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 19/35] KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization Avi Kivity
` (16 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Eduardo Habkost <ehabkost@redhat.com>
svm_vcpu_reset() was not properly resetting the contents of the guest-visible
cr0 register, causing the following issue:
https://bugzilla.redhat.com/show_bug.cgi?id=525699
Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
with paging enabled, making the vcpu take a page fault exception while trying to
run it.
Instead of setting vmcb->save.cr0 directly, the new code just resets
vcpu->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
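Note for readers (not part of the patch): a minimal sketch of the split
between the guest-visible cr0 and the value placed in vmcb->save.cr0, as
described above (PG and WP forced on, CD and NW forced off). It assumes
shadow paging, and sketch_hw_cr0() is a hypothetical stand-in, not the
actual svm_set_cr0().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CR0_ET (UINT32_C(1) << 4)
#define SKETCH_CR0_WP (UINT32_C(1) << 16)
#define SKETCH_CR0_NW (UINT32_C(1) << 29)
#define SKETCH_CR0_CD (UINT32_C(1) << 30)
#define SKETCH_CR0_PG (UINT32_C(1) << 31)

/*
 * Derive the value the hardware runs with from the value the guest
 * believes it has. The guest-visible value itself is left untouched.
 */
static uint32_t sketch_hw_cr0(uint32_t guest_cr0)
{
	uint32_t hw = guest_cr0;

	hw |= SKETCH_CR0_PG | SKETCH_CR0_WP;	/* shadow paging needs these */
	hw &= ~(SKETCH_CR0_CD | SKETCH_CR0_NW);	/* keep host caching enabled */
	assert((hw & (SKETCH_CR0_CD | SKETCH_CR0_NW)) == 0);
	return hw;
}
```

The guest still sees PG clear after reset, so it starts in non-paged real
mode even though the hardware value has PG set.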
arch/x86/kvm/svm.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ffa6ad2..c9ef6c0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -628,11 +628,12 @@ static void init_vmcb(struct vcpu_svm *svm)
save->rip = 0x0000fff0;
svm->vcpu.arch.regs[VCPU_REGS_RIP] = save->rip;
- /*
- * cr0 val on cpu init should be 0x60000010, we enable cpu
- * cache by default. the orderly way is to enable cache in bios.
+ /* This is the guest-visible cr0 value.
+ * svm_set_cr0() sets PG and WP and clears NW and CD on save->cr0.
*/
- save->cr0 = 0x00000010 | X86_CR0_PG | X86_CR0_WP;
+ svm->vcpu.arch.cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
+ kvm_set_cr0(&svm->vcpu, svm->vcpu.arch.cr0);
+
save->cr4 = X86_CR4_PAE;
/* rdx = ?? */
--
1.6.5.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 19/35] KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (17 preceding siblings ...)
2009-11-19 13:34 ` [PATCH 18/35] KVM: SVM: Reset cr0 properly on vcpu reset Avi Kivity
@ 2009-11-19 13:34 ` Avi Kivity
2009-11-19 13:34 ` [PATCH 20/35] KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area Avi Kivity
` (15 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Eduardo Habkost <ehabkost@redhat.com>
The svm_set_cr0() call will initialize save->cr0 properly even when npt is
enabled, clearing the NW and CD bits as expected, so we don't need to
initialize it manually for npt_enabled anymore.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/svm.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c9ef6c0..34b700f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -648,8 +648,6 @@ static void init_vmcb(struct vcpu_svm *svm)
control->intercept_cr_write &= ~(INTERCEPT_CR0_MASK|
INTERCEPT_CR3_MASK);
save->g_pat = 0x0007040600070406ULL;
- /* enable caching because the QEMU Bios doesn't enable it */
- save->cr0 = X86_CR0_ET;
save->cr3 = 0;
save->cr4 = 0;
}
--
1.6.5.2
* [PATCH 20/35] KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
Currently MSR_KERNEL_GS_BASE is saved and restored as part of the
guest/host msr reloading. Since we wish to lazy-restore all the other
msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using
the common code.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++++++-------------
1 files changed, 26 insertions(+), 13 deletions(-)
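The explicit save/reload described above amounts to a symmetric swap around guest entry and exit. The following user-space sketch illustrates it under stated assumptions: fake_kernel_gs_base stands in for the hardware MSR, struct gs_state and its guest_loaded flag are hypothetical, and the is_long_mode() gating of the real patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the hardware MSR_KERNEL_GS_BASE register. */
uint64_t fake_kernel_gs_base;

/* Hypothetical per-vcpu state, modelled on the two new u64 fields. */
struct gs_state {
	uint64_t host_kernel_gs_base;
	uint64_t guest_kernel_gs_base;
	bool guest_loaded;
};

/* Before running the guest: remember the host value, load the guest's. */
void load_guest_gs(struct gs_state *s)
{
	if (s->guest_loaded)
		return;
	s->host_kernel_gs_base = fake_kernel_gs_base;
	fake_kernel_gs_base = s->guest_kernel_gs_base;
	s->guest_loaded = true;
}

/* On the way back: capture what the guest left, restore the host value. */
void load_host_gs(struct gs_state *s)
{
	if (!s->guest_loaded)
		return;
	s->guest_kernel_gs_base = fake_kernel_gs_base;
	fake_kernel_gs_base = s->host_kernel_gs_base;
	s->guest_loaded = false;
}

/* Reading the guest value first forces the cached copy to be current. */
uint64_t read_guest_kernel_gs_base(struct gs_state *s)
{
	load_host_gs(s);
	assert(!s->guest_loaded);
	return s->guest_kernel_gs_base;
}
```

The last helper mirrors why the MSR_KERNEL_GS_BASE cases in vmx_get_msr() and vmx_set_msr() call vmx_load_host_state() before touching the cached guest value: while the guest state is loaded, the up-to-date value lives in the register, not in the struct.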
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1773017..3251251 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -99,7 +99,8 @@ struct vcpu_vmx {
int save_nmsrs;
int msr_offset_efer;
#ifdef CONFIG_X86_64
- int msr_offset_kernel_gs_base;
+ u64 msr_host_kernel_gs_base;
+ u64 msr_guest_kernel_gs_base;
#endif
struct vmcs *vmcs;
struct {
@@ -202,7 +203,7 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
*/
static const u32 vmx_msr_index[] = {
#ifdef CONFIG_X86_64
- MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR, MSR_KERNEL_GS_BASE,
+ MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
#endif
MSR_EFER, MSR_K6_STAR,
};
@@ -674,10 +675,10 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
#endif
#ifdef CONFIG_X86_64
- if (is_long_mode(&vmx->vcpu))
- save_msrs(vmx->host_msrs +
- vmx->msr_offset_kernel_gs_base, 1);
-
+ if (is_long_mode(&vmx->vcpu)) {
+ rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+ wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+ }
#endif
load_msrs(vmx->guest_msrs, vmx->save_nmsrs);
load_transition_efer(vmx);
@@ -711,6 +712,12 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
save_msrs(vmx->guest_msrs, vmx->save_nmsrs);
load_msrs(vmx->host_msrs, vmx->save_nmsrs);
reload_host_efer(vmx);
+#ifdef CONFIG_X86_64
+ if (is_long_mode(&vmx->vcpu)) {
+ rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+ wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+ }
+#endif
}
static void vmx_load_host_state(struct vcpu_vmx *vmx)
@@ -940,9 +947,6 @@ static void setup_msrs(struct vcpu_vmx *vmx)
index = __find_msr_index(vmx, MSR_CSTAR);
if (index >= 0)
move_msr_up(vmx, index, save_nmsrs++);
- index = __find_msr_index(vmx, MSR_KERNEL_GS_BASE);
- if (index >= 0)
- move_msr_up(vmx, index, save_nmsrs++);
/*
* MSR_K6_STAR is only needed on long mode guests, and only
* if efer.sce is enabled.
@@ -954,10 +958,6 @@ static void setup_msrs(struct vcpu_vmx *vmx)
#endif
vmx->save_nmsrs = save_nmsrs;
-#ifdef CONFIG_X86_64
- vmx->msr_offset_kernel_gs_base =
- __find_msr_index(vmx, MSR_KERNEL_GS_BASE);
-#endif
vmx->msr_offset_efer = __find_msr_index(vmx, MSR_EFER);
if (cpu_has_vmx_msr_bitmap()) {
@@ -1015,6 +1015,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
case MSR_GS_BASE:
data = vmcs_readl(GUEST_GS_BASE);
break;
+ case MSR_KERNEL_GS_BASE:
+ vmx_load_host_state(to_vmx(vcpu));
+ data = to_vmx(vcpu)->msr_guest_kernel_gs_base;
+ break;
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_index, pdata);
#endif
@@ -1068,6 +1072,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
case MSR_GS_BASE:
vmcs_writel(GUEST_GS_BASE, data);
break;
+ case MSR_KERNEL_GS_BASE:
+ vmx_load_host_state(vmx);
+ vmx->msr_guest_kernel_gs_base = data;
+ break;
#endif
case MSR_IA32_SYSENTER_CS:
vmcs_write32(GUEST_SYSENTER_CS, data);
@@ -1559,6 +1567,11 @@ static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct kvm_msr_entry *msr = find_msr_entry(vmx, MSR_EFER);
+ /*
+ * Force kernel_gs_base reloading before EFER changes, as control
+ * of this msr depends on is_long_mode().
+ */
+ vmx_load_host_state(to_vmx(vcpu));
vcpu->arch.shadow_efer = efer;
if (!msr)
return;
--
1.6.5.2
* [PATCH 21/35] KVM: x86 shared msr infrastructure
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
The various syscall-related MSRs are fairly expensive to switch. Currently
we switch them on every vcpu preemption, which is far too often:
- if we're switching to a kernel thread (idle task, threaded interrupt,
kernel-mode virtio server (vhost-net), for example) and back, then
there's no need to switch those MSRs since kernel threads won't
be exiting to userspace.
- if we're switching to another guest running an identical OS, most likely
those MSRs will have the same value, so there's little point in reloading
them.
- if we're running the same OS on the guest and host, the MSRs will have
identical values and reloading is unnecessary.
This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 3 +
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/x86.c | 81 +++++++++++++++++++++++++++++++++++++++
3 files changed, 85 insertions(+), 0 deletions(-)
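The lazy-switching scheme from the changelog can be modelled in user space as follows. This is only a sketch: fake_msr[] and fake_wrmsr() stand in for the hardware, a single struct merges what the patch splits into a global table and per-cpu state, and the notifier is reduced to a boolean. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SHARED_MSRS 16

uint64_t fake_msr[NR_SHARED_MSRS];   /* stand-in for the hardware MSRs */
unsigned int fake_wrmsr_count;       /* counts the expensive writes */

struct shared_msrs {
	bool notifier_registered;
	uint64_t host_value[NR_SHARED_MSRS];
	uint64_t current_value[NR_SHARED_MSRS];
};

static void fake_wrmsr(unsigned int slot, uint64_t value)
{
	fake_msr[slot] = value;
	fake_wrmsr_count++;
}

/* Record the host value of a slot; the hardware already holds it. */
void define_shared_msr(struct shared_msrs *m, unsigned int slot, uint64_t host)
{
	assert(slot < NR_SHARED_MSRS);
	m->host_value[slot] = host;
	m->current_value[slot] = host;
	fake_msr[slot] = host;
}

/* Load a guest value, skipping the write when nothing would change. */
void set_shared_msr(struct shared_msrs *m, unsigned int slot, uint64_t value)
{
	assert(slot < NR_SHARED_MSRS);
	if (value == m->current_value[slot])
		return;
	m->current_value[slot] = value;
	fake_wrmsr(slot, value);
	m->notifier_registered = true;   /* restore host values later */
}

/* Runs just before returning to userspace: restore only what differs. */
void on_user_return(struct shared_msrs *m)
{
	unsigned int slot;

	for (slot = 0; slot < NR_SHARED_MSRS; ++slot) {
		if (m->host_value[slot] != m->current_value[slot]) {
			fake_wrmsr(slot, m->host_value[slot]);
			m->current_value[slot] = m->host_value[slot];
		}
	}
	m->notifier_registered = false;
}
```

In this model a guest whose MSR values match the host's never triggers a write, and a switch to a kernel thread and back costs nothing because on_user_return() is not invoked until the cpu actually heads for userspace.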
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0558ff8..26a74b7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -809,4 +809,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
+void kvm_define_shared_msr(unsigned index, u32 msr);
+void kvm_set_shared_msr(unsigned index, u64 val);
+
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b84e571..4cd4983 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select HAVE_KVM_IRQCHIP
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
+ select USER_RETURN_NOTIFIER
---help---
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e16cdc9..58c5cdd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -37,6 +37,7 @@
#include <linux/iommu.h>
#include <linux/intel-iommu.h>
#include <linux/cpufreq.h>
+#include <linux/user-return-notifier.h>
#include <trace/events/kvm.h>
#undef TRACE_INCLUDE_FILE
#define CREATE_TRACE_POINTS
@@ -87,6 +88,25 @@ EXPORT_SYMBOL_GPL(kvm_x86_ops);
int ignore_msrs = 0;
module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR);
+#define KVM_NR_SHARED_MSRS 16
+
+struct kvm_shared_msrs_global {
+ int nr;
+ struct kvm_shared_msr {
+ u32 msr;
+ u64 value;
+ } msrs[KVM_NR_SHARED_MSRS];
+};
+
+struct kvm_shared_msrs {
+ struct user_return_notifier urn;
+ bool registered;
+ u64 current_value[KVM_NR_SHARED_MSRS];
+};
+
+static struct kvm_shared_msrs_global __read_mostly shared_msrs_global;
+static DEFINE_PER_CPU(struct kvm_shared_msrs, shared_msrs);
+
struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "pf_fixed", VCPU_STAT(pf_fixed) },
{ "pf_guest", VCPU_STAT(pf_guest) },
@@ -123,6 +143,64 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};
+static void kvm_on_user_return(struct user_return_notifier *urn)
+{
+ unsigned slot;
+ struct kvm_shared_msr *global;
+ struct kvm_shared_msrs *locals
+ = container_of(urn, struct kvm_shared_msrs, urn);
+
+ for (slot = 0; slot < shared_msrs_global.nr; ++slot) {
+ global = &shared_msrs_global.msrs[slot];
+ if (global->value != locals->current_value[slot]) {
+ wrmsrl(global->msr, global->value);
+ locals->current_value[slot] = global->value;
+ }
+ }
+ locals->registered = false;
+ user_return_notifier_unregister(urn);
+}
+
+void kvm_define_shared_msr(unsigned slot, u32 msr)
+{
+ int cpu;
+ u64 value;
+
+ if (slot >= shared_msrs_global.nr)
+ shared_msrs_global.nr = slot + 1;
+ shared_msrs_global.msrs[slot].msr = msr;
+ rdmsrl_safe(msr, &value);
+ shared_msrs_global.msrs[slot].value = value;
+ for_each_online_cpu(cpu)
+ per_cpu(shared_msrs, cpu).current_value[slot] = value;
+}
+EXPORT_SYMBOL_GPL(kvm_define_shared_msr);
+
+static void kvm_shared_msr_cpu_online(void)
+{
+ unsigned i;
+ struct kvm_shared_msrs *locals = &__get_cpu_var(shared_msrs);
+
+ for (i = 0; i < shared_msrs_global.nr; ++i)
+ locals->current_value[i] = shared_msrs_global.msrs[i].value;
+}
+
+void kvm_set_shared_msr(unsigned slot, u64 value)
+{
+ struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs);
+
+ if (value == smsr->current_value[slot])
+ return;
+ smsr->current_value[slot] = value;
+ wrmsrl(shared_msrs_global.msrs[slot].msr, value);
+ if (!smsr->registered) {
+ smsr->urn.on_user_return = kvm_on_user_return;
+ user_return_notifier_register(&smsr->urn);
+ smsr->registered = true;
+ }
+}
+EXPORT_SYMBOL_GPL(kvm_set_shared_msr);
+
unsigned long segment_base(u16 selector)
{
struct descriptor_table gdt;
@@ -4815,6 +4893,9 @@ int kvm_arch_hardware_enable(void *garbage)
int cpu = raw_smp_processor_id();
per_cpu(cpu_tsc_khz, cpu) = 0;
}
+
+ kvm_shared_msr_cpu_online();
+
return kvm_x86_ops->hardware_enable(garbage);
}
--
1.6.5.2
* [PATCH 22/35] KVM: VMX: Use shared msr infrastructure
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
Instead of reloading syscall MSRs on every preemption, use the new shared
msr infrastructure to reload them at the last possible minute (just before
exit to userspace).
Improves vcpu/idle/vcpu switches by about 2000 cycles (when EFER needs to be
reloaded as well).
[jan: fix slot index missing indirection]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/vmx.c | 112 +++++++++++++++++++--------------------------------
1 files changed, 42 insertions(+), 70 deletions(-)
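The masking logic in update_transition_efer() can be tried out in isolation. The sketch below is a user-space rendering under assumptions: transition_efer() is a hypothetical name, the ignore mask is passed in by the caller because its construction is only partly visible in the diff, and the EFER_* values are the architectural bit positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural EFER bit positions. */
#define EFER_SCE (UINT64_C(1) << 0)    /* syscall enable */
#define EFER_LME (UINT64_C(1) << 8)    /* long mode enable */
#define EFER_LMA (UINT64_C(1) << 10)   /* long mode active */
#define EFER_NX  (UINT64_C(1) << 11)   /* no-execute enable */

/*
 * Returns false when guest and host agree on every bit that matters,
 * so no EFER switch is needed. Otherwise stores in *out the value to
 * load: guest bits where they matter, host bits where they are ignored.
 */
bool transition_efer(uint64_t host_efer, uint64_t guest_efer,
		     uint64_t ignore_bits, uint64_t *out)
{
	assert(out != NULL);
	if ((guest_efer & ~ignore_bits) == (host_efer & ~ignore_bits))
		return false;
	guest_efer &= ~ignore_bits;
	guest_efer |= host_efer & ignore_bits;
	*out = guest_efer;
	return true;
}
```

Taking the host's value for the ignored bits keeps the loaded value as close to the host's as possible, which fits the shared msr code's habit of skipping writes when the value is unchanged.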
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3251251..bf46253 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -86,6 +86,11 @@ struct vmcs {
char data[0];
};
+struct shared_msr_entry {
+ unsigned index;
+ u64 data;
+};
+
struct vcpu_vmx {
struct kvm_vcpu vcpu;
struct list_head local_vcpus_link;
@@ -93,8 +98,7 @@ struct vcpu_vmx {
int launched;
u8 fail;
u32 idt_vectoring_info;
- struct kvm_msr_entry *guest_msrs;
- struct kvm_msr_entry *host_msrs;
+ struct shared_msr_entry *guest_msrs;
int nmsrs;
int save_nmsrs;
int msr_offset_efer;
@@ -108,7 +112,6 @@ struct vcpu_vmx {
u16 fs_sel, gs_sel, ldt_sel;
int gs_ldt_reload_needed;
int fs_reload_needed;
- int guest_efer_loaded;
} host_state;
struct {
int vm86_active;
@@ -195,6 +198,8 @@ static struct kvm_vmx_segment_field {
VMX_SEGMENT_FIELD(LDTR),
};
+static u64 host_efer;
+
static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
/*
@@ -209,22 +214,6 @@ static const u32 vmx_msr_index[] = {
};
#define NR_VMX_MSR ARRAY_SIZE(vmx_msr_index)
-static void load_msrs(struct kvm_msr_entry *e, int n)
-{
- int i;
-
- for (i = 0; i < n; ++i)
- wrmsrl(e[i].index, e[i].data);
-}
-
-static void save_msrs(struct kvm_msr_entry *e, int n)
-{
- int i;
-
- for (i = 0; i < n; ++i)
- rdmsrl(e[i].index, e[i].data);
-}
-
static inline int is_page_fault(u32 intr_info)
{
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VECTOR_MASK |
@@ -373,7 +362,7 @@ static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
int i;
for (i = 0; i < vmx->nmsrs; ++i)
- if (vmx->guest_msrs[i].index == msr)
+ if (vmx_msr_index[vmx->guest_msrs[i].index] == msr)
return i;
return -1;
}
@@ -404,7 +393,7 @@ static inline void __invept(int ext, u64 eptp, gpa_t gpa)
: : "a" (&operand), "c" (ext) : "cc", "memory");
}
-static struct kvm_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
+static struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
{
int i;
@@ -595,17 +584,15 @@ static void reload_tss(void)
load_TR_desc();
}
-static void load_transition_efer(struct vcpu_vmx *vmx)
+static bool update_transition_efer(struct vcpu_vmx *vmx)
{
int efer_offset = vmx->msr_offset_efer;
- u64 host_efer;
u64 guest_efer;
u64 ignore_bits;
if (efer_offset < 0)
- return;
- host_efer = vmx->host_msrs[efer_offset].data;
- guest_efer = vmx->guest_msrs[efer_offset].data;
+ return false;
+ guest_efer = vmx->vcpu.arch.shadow_efer;
/*
* NX is emulated; LMA and LME handled by hardware; SCE meaninless
@@ -619,26 +606,18 @@ static void load_transition_efer(struct vcpu_vmx *vmx)
ignore_bits &= ~(u64)EFER_SCE;
#endif
if ((guest_efer & ~ignore_bits) == (host_efer & ~ignore_bits))
- return;
+ return false;
- vmx->host_state.guest_efer_loaded = 1;
guest_efer &= ~ignore_bits;
guest_efer |= host_efer & ignore_bits;
- wrmsrl(MSR_EFER, guest_efer);
- vmx->vcpu.stat.efer_reload++;
-}
-
-static void reload_host_efer(struct vcpu_vmx *vmx)
-{
- if (vmx->host_state.guest_efer_loaded) {
- vmx->host_state.guest_efer_loaded = 0;
- load_msrs(vmx->host_msrs + vmx->msr_offset_efer, 1);
- }
+ vmx->guest_msrs[efer_offset].data = guest_efer;
+ return true;
}
static void vmx_save_host_state(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ int i;
if (vmx->host_state.loaded)
return;
@@ -680,8 +659,9 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
}
#endif
- load_msrs(vmx->guest_msrs, vmx->save_nmsrs);
- load_transition_efer(vmx);
+ for (i = 0; i < vmx->save_nmsrs; ++i)
+ kvm_set_shared_msr(vmx->guest_msrs[i].index,
+ vmx->guest_msrs[i].data);
}
static void __vmx_load_host_state(struct vcpu_vmx *vmx)
@@ -709,9 +689,6 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
local_irq_restore(flags);
}
reload_tss();
- save_msrs(vmx->guest_msrs, vmx->save_nmsrs);
- load_msrs(vmx->host_msrs, vmx->save_nmsrs);
- reload_host_efer(vmx);
#ifdef CONFIG_X86_64
if (is_long_mode(&vmx->vcpu)) {
rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
@@ -908,19 +885,14 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
/*
* Swap MSR entry in host/guest MSR entry array.
*/
-#ifdef CONFIG_X86_64
static void move_msr_up(struct vcpu_vmx *vmx, int from, int to)
{
- struct kvm_msr_entry tmp;
+ struct shared_msr_entry tmp;
tmp = vmx->guest_msrs[to];
vmx->guest_msrs[to] = vmx->guest_msrs[from];
vmx->guest_msrs[from] = tmp;
- tmp = vmx->host_msrs[to];
- vmx->host_msrs[to] = vmx->host_msrs[from];
- vmx->host_msrs[from] = tmp;
}
-#endif
/*
* Set up the vmcs to automatically save and restore system
@@ -929,15 +901,13 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to)
*/
static void setup_msrs(struct vcpu_vmx *vmx)
{
- int save_nmsrs;
+ int save_nmsrs, index;
unsigned long *msr_bitmap;
vmx_load_host_state(vmx);
save_nmsrs = 0;
#ifdef CONFIG_X86_64
if (is_long_mode(&vmx->vcpu)) {
- int index;
-
index = __find_msr_index(vmx, MSR_SYSCALL_MASK);
if (index >= 0)
move_msr_up(vmx, index, save_nmsrs++);
@@ -956,9 +926,11 @@ static void setup_msrs(struct vcpu_vmx *vmx)
move_msr_up(vmx, index, save_nmsrs++);
}
#endif
- vmx->save_nmsrs = save_nmsrs;
+ vmx->msr_offset_efer = index = __find_msr_index(vmx, MSR_EFER);
+ if (index >= 0 && update_transition_efer(vmx))
+ move_msr_up(vmx, index, save_nmsrs++);
- vmx->msr_offset_efer = __find_msr_index(vmx, MSR_EFER);
+ vmx->save_nmsrs = save_nmsrs;
if (cpu_has_vmx_msr_bitmap()) {
if (is_long_mode(&vmx->vcpu))
@@ -1000,7 +972,7 @@ static void guest_write_tsc(u64 guest_tsc, u64 host_tsc)
static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
{
u64 data;
- struct kvm_msr_entry *msr;
+ struct shared_msr_entry *msr;
if (!pdata) {
printk(KERN_ERR "BUG: get_msr called with NULL pdata\n");
@@ -1019,9 +991,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
vmx_load_host_state(to_vmx(vcpu));
data = to_vmx(vcpu)->msr_guest_kernel_gs_base;
break;
+#endif
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_index, pdata);
-#endif
case MSR_IA32_TSC:
data = guest_read_tsc();
break;
@@ -1035,6 +1007,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
data = vmcs_readl(GUEST_SYSENTER_ESP);
break;
default:
+ vmx_load_host_state(to_vmx(vcpu));
msr = find_msr_entry(to_vmx(vcpu), msr_index);
if (msr) {
vmx_load_host_state(to_vmx(vcpu));
@@ -1056,7 +1029,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- struct kvm_msr_entry *msr;
+ struct shared_msr_entry *msr;
u64 host_tsc;
int ret = 0;
@@ -1565,7 +1538,10 @@ continue_rmode:
static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- struct kvm_msr_entry *msr = find_msr_entry(vmx, MSR_EFER);
+ struct shared_msr_entry *msr = find_msr_entry(vmx, MSR_EFER);
+
+ if (!msr)
+ return;
/*
* Force kernel_gs_base reloading before EFER changes, as control
@@ -2417,10 +2393,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
if (wrmsr_safe(index, data_low, data_high) < 0)
continue;
data = data_low | ((u64)data_high << 32);
- vmx->host_msrs[j].index = index;
- vmx->host_msrs[j].reserved = 0;
- vmx->host_msrs[j].data = data;
- vmx->guest_msrs[j] = vmx->host_msrs[j];
+ vmx->guest_msrs[j].index = i;
+ vmx->guest_msrs[j].data = 0;
++vmx->nmsrs;
}
@@ -3821,7 +3795,6 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
__clear_bit(vmx->vpid, vmx_vpid_bitmap);
spin_unlock(&vmx_vpid_lock);
vmx_free_vmcs(vcpu);
- kfree(vmx->host_msrs);
kfree(vmx->guest_msrs);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vmx);
@@ -3848,10 +3821,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
goto uninit_vcpu;
}
- vmx->host_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
- if (!vmx->host_msrs)
- goto free_guest_msrs;
-
vmx->vmcs = alloc_vmcs();
if (!vmx->vmcs)
goto free_msrs;
@@ -3882,8 +3851,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
free_vmcs:
free_vmcs(vmx->vmcs);
free_msrs:
- kfree(vmx->host_msrs);
-free_guest_msrs:
kfree(vmx->guest_msrs);
uninit_vcpu:
kvm_vcpu_uninit(&vmx->vcpu);
@@ -4033,7 +4000,12 @@ static struct kvm_x86_ops vmx_x86_ops = {
static int __init vmx_init(void)
{
- int r;
+ int r, i;
+
+ rdmsrl_safe(MSR_EFER, &host_efer);
+
+ for (i = 0; i < NR_VMX_MSR; ++i)
+ kvm_define_shared_msr(i, vmx_msr_index[i]);
vmx_io_bitmap_a = (unsigned long *)__get_free_page(GFP_KERNEL);
if (!vmx_io_bitmap_a)
--
1.6.5.2
* [PATCH 23/35] KVM: powerpc: Fix BUILD_BUG_ON condition
From: Avi Kivity @ 2009-11-19 13:34 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Hollis Blanchard <hollisb@us.ibm.com>
The old BUILD_BUG_ON implementation didn't work with __builtin_constant_p().
Fixing that revealed this test had been inverted for a long time without
anybody noticing...
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/powerpc/kvm/timing.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
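To illustrate the corrected polarity, here is a small sketch that assumes GCC or Clang (for __builtin_constant_p) and a simplified array-size BUILD_BUG_ON. HYPOTHETICAL_OTHER_EXITS, exit_counts, count_exit() and the account_exit_stat() macro are made up for the example; the patch itself leaves the check commented out.

```c
#include <assert.h>

/* Negative array size, hence a build error, when cond is a nonzero constant. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

enum exit_type { EXT_INTR_EXITS, HYPOTHETICAL_OTHER_EXITS, NR_EXIT_TYPES };

unsigned int exit_counts[NR_EXIT_TYPES];

void count_exit(enum exit_type type)
{
	assert(type < NR_EXIT_TYPES);
	exit_counts[type]++;
}

/*
 * The intent is "complain unless type is a compile-time constant",
 * so the condition handed to BUILD_BUG_ON must be the negation.
 */
#define account_exit_stat(type)                            \
	do {                                               \
		BUILD_BUG_ON(!__builtin_constant_p(type)); \
		count_exit(type);                          \
	} while (0)
```

With the inverted test, BUILD_BUG_ON(__builtin_constant_p(type)), a literal such as EXT_INTR_EXITS would be exactly the case that gets rejected, which is the opposite of what the check is for.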
diff --git a/arch/powerpc/kvm/timing.h b/arch/powerpc/kvm/timing.h
index 806ef67..8167d42 100644
--- a/arch/powerpc/kvm/timing.h
+++ b/arch/powerpc/kvm/timing.h
@@ -51,7 +51,7 @@ static inline void kvmppc_account_exit_stat(struct kvm_vcpu *vcpu, int type)
/* The BUILD_BUG_ON below breaks in funny ways, commented out
* for now ... -BenH
- BUILD_BUG_ON(__builtin_constant_p(type));
+ BUILD_BUG_ON(!__builtin_constant_p(type));
*/
switch (type) {
case EXT_INTR_EXITS:
--
1.6.5.2
* [PATCH 24/35] KVM: remove duplicated task_switch check
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Gleb Natapov <gleb@redhat.com>
The same check appears twice in a row in kvm_task_switch(); the duplicate
was probably introduced by a bad merge. Remove it.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/x86.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58c5cdd..dbddcc2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4530,11 +4530,6 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason)
if (reason != TASK_SWITCH_CALL && reason != TASK_SWITCH_GATE)
old_tss_sel = 0xffff;
- /* set back link to prev task only if NT bit is set in eflags
- note that old_tss_sel is not used afetr this point */
- if (reason != TASK_SWITCH_CALL && reason != TASK_SWITCH_GATE)
- old_tss_sel = 0xffff;
-
if (nseg_desc.type & 8)
ret = kvm_task_switch_32(vcpu, tss_selector, old_tss_sel,
old_tss_base, &nseg_desc);
--
1.6.5.2
* [PATCH 25/35] KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Marcelo Tosatti <mtosatti@redhat.com>
GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
outside guest context. Similarly pdptrs are updated via load_pdptrs.
Let kvm_set_cr3 perform the update, removing it from the vcpu_run
fast path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/vmx.c | 5 +----
arch/x86/kvm/x86.c | 4 +++-
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bf46253..a5f3f3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1737,6 +1737,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
vmcs_write64(EPT_POINTER, eptp);
guest_cr3 = is_paging(vcpu) ? vcpu->arch.cr3 :
vcpu->kvm->arch.ept_identity_map_addr;
+ ept_load_pdptrs(vcpu);
}
vmx_flush_tlb(vcpu);
@@ -3625,10 +3626,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- if (enable_ept && is_paging(vcpu)) {
- vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
- ept_load_pdptrs(vcpu);
- }
/* Record the guest's net vcpu time for enforced NMI injections. */
if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
vmx->entry_time = ktime_get();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dbddcc2..719f31e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4591,8 +4591,10 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
mmu_reset_needed |= vcpu->arch.cr4 != sregs->cr4;
kvm_x86_ops->set_cr4(vcpu, sregs->cr4);
- if (!is_long_mode(vcpu) && is_pae(vcpu))
+ if (!is_long_mode(vcpu) && is_pae(vcpu)) {
load_pdptrs(vcpu, vcpu->arch.cr3);
+ mmu_reset_needed = 1;
+ }
if (mmu_reset_needed)
kvm_mmu_reset_context(vcpu);
--
1.6.5.2
* [PATCH 26/35] KVM: MMU: update invlpg handler comment
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Marcelo Tosatti <mtosatti@redhat.com>
Large page translations are always synchronized (either in level 3
or level 2), so it's not necessary to properly deal with them
in the invlpg handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/paging_tmpl.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 72558f8..a601713 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -467,7 +467,6 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
level = iterator.level;
sptep = iterator.sptep;
- /* FIXME: properly handle invlpg on large guest pages */
if (level == PT_PAGE_TABLE_LEVEL ||
((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
--
1.6.5.2
* [PATCH 27/35] KVM: VMX: Remove vmx->msr_offset_efer
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
This variable is used to communicate between a caller and a callee; switch
to a function argument instead.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/vmx.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a5f3f3e..c9cc959 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -101,7 +101,6 @@ struct vcpu_vmx {
struct shared_msr_entry *guest_msrs;
int nmsrs;
int save_nmsrs;
- int msr_offset_efer;
#ifdef CONFIG_X86_64
u64 msr_host_kernel_gs_base;
u64 msr_guest_kernel_gs_base;
@@ -584,14 +583,11 @@ static void reload_tss(void)
load_TR_desc();
}
-static bool update_transition_efer(struct vcpu_vmx *vmx)
+static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
{
- int efer_offset = vmx->msr_offset_efer;
u64 guest_efer;
u64 ignore_bits;
- if (efer_offset < 0)
- return false;
guest_efer = vmx->vcpu.arch.shadow_efer;
/*
@@ -926,8 +922,8 @@ static void setup_msrs(struct vcpu_vmx *vmx)
move_msr_up(vmx, index, save_nmsrs++);
}
#endif
- vmx->msr_offset_efer = index = __find_msr_index(vmx, MSR_EFER);
- if (index >= 0 && update_transition_efer(vmx))
+ index = __find_msr_index(vmx, MSR_EFER);
+ if (index >= 0 && update_transition_efer(vmx, index))
move_msr_up(vmx, index, save_nmsrs++);
vmx->save_nmsrs = save_nmsrs;
--
1.6.5.2
* [PATCH 28/35] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Marcelo Tosatti <mtosatti@redhat.com>
Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.
Also serialize multiple accesses with kvm->lock.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/irq.h | 6 +++++-
arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++--------
2 files changed, 27 insertions(+), 9 deletions(-)
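The create-once pattern this patch introduces can be sketched in user space as below. It is an illustration under assumptions: a pthread mutex plays the role of kvm->lock, struct fake_vm and struct fake_pic are hypothetical, and the ioapic setup, irq routing and memory barriers of the real code are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct fake_pic {
	int initialized;
};

struct fake_vm {
	pthread_mutex_t lock;    /* plays the role of kvm->lock */
	struct fake_pic *vpic;   /* NULL until an irqchip exists */
};

/* Returns 0 on success, -EEXIST on a repeated call, -ENOMEM on failure. */
int create_irqchip(struct fake_vm *vm)
{
	struct fake_pic *vpic;
	int r;

	assert(vm != NULL);
	pthread_mutex_lock(&vm->lock);
	r = -EEXIST;
	if (vm->vpic)
		goto unlock;
	r = -ENOMEM;
	vpic = calloc(1, sizeof(*vpic));
	if (!vpic)
		goto unlock;
	vpic->initialized = 1;
	vm->vpic = vpic;         /* publish only after full setup */
	r = 0;
unlock:
	pthread_mutex_unlock(&vm->lock);
	return r;
}
```

Building the object in a local variable and assigning vm->vpic last means a second caller either sees NULL and waits on the lock, or sees a finished object and gets -EEXIST; the previous allocation is never overwritten and leaked.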
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index c025a23..be399e2 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -86,7 +86,11 @@ static inline struct kvm_pic *pic_irqchip(struct kvm *kvm)
static inline int irqchip_in_kernel(struct kvm *kvm)
{
- return pic_irqchip(kvm) != NULL;
+ int ret;
+
+ ret = (pic_irqchip(kvm) != NULL);
+ smp_rmb();
+ return ret;
}
void kvm_pic_reset(struct kvm_kpic_state *s);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 719f31e..97f6f95 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2362,25 +2362,39 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (r)
goto out;
break;
- case KVM_CREATE_IRQCHIP:
+ case KVM_CREATE_IRQCHIP: {
+ struct kvm_pic *vpic;
+
+ mutex_lock(&kvm->lock);
+ r = -EEXIST;
+ if (kvm->arch.vpic)
+ goto create_irqchip_unlock;
r = -ENOMEM;
- kvm->arch.vpic = kvm_create_pic(kvm);
- if (kvm->arch.vpic) {
+ vpic = kvm_create_pic(kvm);
+ if (vpic) {
r = kvm_ioapic_init(kvm);
if (r) {
- kfree(kvm->arch.vpic);
- kvm->arch.vpic = NULL;
- goto out;
+ kfree(vpic);
+ goto create_irqchip_unlock;
}
} else
- goto out;
+ goto create_irqchip_unlock;
+ smp_wmb();
+ kvm->arch.vpic = vpic;
+ smp_wmb();
r = kvm_setup_default_irq_routing(kvm);
if (r) {
+ mutex_lock(&kvm->irq_lock);
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
- goto out;
+ kvm->arch.vpic = NULL;
+ kvm->arch.vioapic = NULL;
+ mutex_unlock(&kvm->irq_lock);
}
+ create_irqchip_unlock:
+ mutex_unlock(&kvm->lock);
break;
+ }
case KVM_CREATE_PIT:
u.pit_config.flags = KVM_PIT_SPEAKER_DUMMY;
goto create_pit;
--
1.6.5.2
* [PATCH 29/35] KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Marcelo Tosatti <mtosatti@redhat.com>
Otherwise kvm might attempt to dereference a NULL pointer.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/x86.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
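The guard added here follows a common shape: reject the request before any work is done if the object it operates on was never allocated. A minimal user-space sketch, with hypothetical struct fake_vcpu, struct fake_lapic and get_lapic() standing in for the kernel types and the ioctl handler:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct fake_lapic {
	unsigned char regs[16];
};

struct fake_vcpu {
	struct fake_lapic *apic;   /* NULL when no in-kernel lapic exists */
};

/* Copies the lapic state to *out, or returns -EINVAL if there is none. */
int get_lapic(const struct fake_vcpu *vcpu, struct fake_lapic *out)
{
	assert(out != NULL);
	if (!vcpu->apic)
		return -EINVAL;    /* bail out before dereferencing */
	memcpy(out, vcpu->apic, sizeof(*out));
	return 0;
}
```

Checking first also means nothing has been allocated yet when the error path is taken, so there is nothing to free.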
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 97f6f95..cd6fe0a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1893,6 +1893,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
switch (ioctl) {
case KVM_GET_LAPIC: {
+ r = -EINVAL;
+ if (!vcpu->arch.apic)
+ goto out;
lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
r = -ENOMEM;
@@ -1908,6 +1911,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
break;
}
case KVM_SET_LAPIC: {
+ r = -EINVAL;
+ if (!vcpu->arch.apic)
+ goto out;
lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
r = -ENOMEM;
if (!lapic)
--
1.6.5.2
* [PATCH 30/35] KVM: only clear irq_source_id if irqchip is present
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (28 preceding siblings ...)
2009-11-19 13:35 ` [PATCH 29/35] KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic Avi Kivity
@ 2009-11-19 13:35 ` Avi Kivity
2009-11-19 13:35 ` [PATCH 31/35] KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG Avi Kivity
` (4 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Marcelo Tosatti <mtosatti@redhat.com>
Otherwise kvm might attempt to dereference a NULL pointer when no
in-kernel irqchip has been created.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
virt/kvm/irq_comm.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 0d454d3..9b07734 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -243,6 +243,10 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
printk(KERN_ERR "kvm: IRQ source ID out of range!\n");
goto unlock;
}
+ clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
+ if (!irqchip_in_kernel(kvm))
+ goto unlock;
+
for (i = 0; i < KVM_IOAPIC_NUM_PINS; i++) {
clear_bit(irq_source_id, &kvm->arch.vioapic->irq_states[i]);
if (i >= 16)
@@ -251,7 +255,6 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
clear_bit(irq_source_id, &pic_irqchip(kvm)->irq_states[i]);
#endif
}
- clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
unlock:
mutex_unlock(&kvm->irq_lock);
}
--
1.6.5.2
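
The ordering change in the patch above can be illustrated with a small
user-space model. This is only a sketch: model_vm, model_irqchip,
MODEL_NUM_PINS and the -EINVAL return value are hypothetical stand-ins,
not kernel definitions, and locking is left out. It shows the source ID
being released first and the per-pin state being touched only when an
irqchip exists.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_NUM_PINS 24	/* hypothetical stand-in for KVM_IOAPIC_NUM_PINS */

/* Hypothetical model of the per-pin state kept by an in-kernel irqchip. */
struct model_irqchip {
	unsigned long irq_states[MODEL_NUM_PINS];
};

/* Hypothetical model of the VM; irqchip is NULL if none was created. */
struct model_vm {
	unsigned long irq_sources_bitmap;
	struct model_irqchip *irqchip;
};

static int model_free_irq_source_id(struct model_vm *vm, int id)
{
	size_t i;

	if (id < 0 || (size_t)id >= sizeof(unsigned long) * CHAR_BIT)
		return -EINVAL;

	/* Release the ID first; this needs no irqchip at all. */
	vm->irq_sources_bitmap &= ~(1UL << id);

	/* Without an irqchip there is no per-pin state to clean up. */
	if (!vm->irqchip)
		return 0;

	for (i = 0; i < MODEL_NUM_PINS; i++)
		vm->irqchip->irq_states[i] &= ~(1UL << id);

	return 0;
}
```

With irqchip left NULL the call only clears the bit in
irq_sources_bitmap, which is the case the patch makes safe.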
* [PATCH 31/35] KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (29 preceding siblings ...)
2009-11-19 13:35 ` [PATCH 30/35] KVM: only clear irq_source_id if irqchip is present Avi Kivity
@ 2009-11-19 13:35 ` Avi Kivity
2009-11-19 13:35 ` [PATCH 32/35] KVM: Reorder IOCTLs in main kvm.h Avi Kivity
` (3 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Jan Kiszka <jan.kiszka@siemens.com>
Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
KVM_GUESTDBG_ENABLE; they are actually orthogonal. While at it,
avoid triggering the WARN_ON in kvm_queue_exception if an exception is
already pending, and reject such invalid requests with -EBUSY.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
arch/x86/kvm/x86.c | 20 ++++++++++++++------
1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd6fe0a..ba8958d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4656,10 +4656,20 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
{
unsigned long rflags;
- int i;
+ int i, r;
vcpu_load(vcpu);
+ if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) {
+ r = -EBUSY;
+ if (vcpu->arch.exception.pending)
+ goto unlock_out;
+ if (dbg->control & KVM_GUESTDBG_INJECT_DB)
+ kvm_queue_exception(vcpu, DB_VECTOR);
+ else
+ kvm_queue_exception(vcpu, BP_VECTOR);
+ }
+
/*
* Read rflags as long as potentially injected trace flags are still
* filtered out.
@@ -4695,14 +4705,12 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
kvm_x86_ops->set_guest_debug(vcpu, dbg);
- if (vcpu->guest_debug & KVM_GUESTDBG_INJECT_DB)
- kvm_queue_exception(vcpu, DB_VECTOR);
- else if (vcpu->guest_debug & KVM_GUESTDBG_INJECT_BP)
- kvm_queue_exception(vcpu, BP_VECTOR);
+ r = 0;
+unlock_out:
vcpu_put(vcpu);
- return 0;
+ return r;
}
/*
--
1.6.5.2
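
The injection path introduced by the patch above can be modelled in
user space roughly as follows. This is a sketch under assumptions:
model_vcpu and the MODEL_DBG_* bits are hypothetical stand-ins for the
vcpu state and the KVM_GUESTDBG_* flags, and the rest of the debug
setup is left out. The vector numbers 1 (#DB) and 3 (#BP) are the
architectural x86 values.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the KVM_GUESTDBG_* control bits. */
#define MODEL_DBG_ENABLE	0x1u
#define MODEL_DBG_INJECT_DB	0x2u
#define MODEL_DBG_INJECT_BP	0x4u

#define MODEL_DB_VECTOR	1	/* x86 debug exception */
#define MODEL_BP_VECTOR	3	/* x86 breakpoint exception */

/* Hypothetical model of the exception slot of a vcpu. */
struct model_vcpu {
	bool exception_pending;
	int exception_nr;
};

static int model_set_guest_debug(struct model_vcpu *vcpu, uint32_t control)
{
	/* Injection is evaluated whether or not MODEL_DBG_ENABLE is set. */
	if (control & (MODEL_DBG_INJECT_DB | MODEL_DBG_INJECT_BP)) {
		/* Only one exception can be queued; refuse a second one. */
		if (vcpu->exception_pending)
			return -EBUSY;
		vcpu->exception_pending = true;
		vcpu->exception_nr = (control & MODEL_DBG_INJECT_DB) ?
			MODEL_DB_VECTOR : MODEL_BP_VECTOR;
	}

	/* The remaining debug register and flag setup is elided here. */
	return 0;
}
```

A caller that gets -EBUSY back would presumably let the pending
exception be delivered and retry the injection afterwards.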
* [PATCH 32/35] KVM: Reorder IOCTLs in main kvm.h
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (30 preceding siblings ...)
2009-11-19 13:35 ` [PATCH 31/35] KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG Avi Kivity
@ 2009-11-19 13:35 ` Avi Kivity
2009-11-19 13:35 ` [PATCH 33/35] KVM: Allow internal errors reported to userspace to carry extra data Avi Kivity
` (2 subsequent siblings)
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Jan Kiszka <jan.kiszka@siemens.com>
Obviously, people tend to extend this header at the bottom - more or
less blindly. Ensure that deprecated stuff gets its own corner again by
moving things to the top. Also add some comments and reindent IOCTLs to
make them more readable and reduce the risk of number collisions.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
include/linux/kvm.h | 235 +++++++++++++++++++++++++--------------------------
1 files changed, 117 insertions(+), 118 deletions(-)
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6ed1a12..ca62b8e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -14,12 +14,76 @@
#define KVM_API_VERSION 12
-/* for KVM_TRACE_ENABLE, deprecated */
+/* *** Deprecated interfaces *** */
+
+#define KVM_TRC_SHIFT 16
+
+#define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT)
+#define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1))
+
+#define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01)
+#define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02)
+#define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01)
+
+#define KVM_TRC_HEAD_SIZE 12
+#define KVM_TRC_CYCLE_SIZE 8
+#define KVM_TRC_EXTRA_MAX 7
+
+#define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02)
+#define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03)
+#define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04)
+#define KVM_TRC_IO_READ (KVM_TRC_HANDLER + 0x05)
+#define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06)
+#define KVM_TRC_CR_READ (KVM_TRC_HANDLER + 0x07)
+#define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08)
+#define KVM_TRC_DR_READ (KVM_TRC_HANDLER + 0x09)
+#define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A)
+#define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B)
+#define KVM_TRC_MSR_WRITE (KVM_TRC_HANDLER + 0x0C)
+#define KVM_TRC_CPUID (KVM_TRC_HANDLER + 0x0D)
+#define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E)
+#define KVM_TRC_NMI (KVM_TRC_HANDLER + 0x0F)
+#define KVM_TRC_VMMCALL (KVM_TRC_HANDLER + 0x10)
+#define KVM_TRC_HLT (KVM_TRC_HANDLER + 0x11)
+#define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12)
+#define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13)
+#define KVM_TRC_APIC_ACCESS (KVM_TRC_HANDLER + 0x14)
+#define KVM_TRC_TDP_FAULT (KVM_TRC_HANDLER + 0x15)
+#define KVM_TRC_GTLB_WRITE (KVM_TRC_HANDLER + 0x16)
+#define KVM_TRC_STLB_WRITE (KVM_TRC_HANDLER + 0x17)
+#define KVM_TRC_STLB_INVAL (KVM_TRC_HANDLER + 0x18)
+#define KVM_TRC_PPC_INSTR (KVM_TRC_HANDLER + 0x19)
+
struct kvm_user_trace_setup {
- __u32 buf_size; /* sub_buffer size of each per-cpu */
- __u32 buf_nr; /* the number of sub_buffers of each per-cpu */
+ __u32 buf_size;
+ __u32 buf_nr;
+};
+
+#define __KVM_DEPRECATED_MAIN_W_0x06 \
+ _IOW(KVMIO, 0x06, struct kvm_user_trace_setup)
+#define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07)
+#define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08)
+
+#define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq)
+
+struct kvm_breakpoint {
+ __u32 enabled;
+ __u32 padding;
+ __u64 address;
+};
+
+struct kvm_debug_guest {
+ __u32 enabled;
+ __u32 pad;
+ struct kvm_breakpoint breakpoints[4];
+ __u32 singlestep;
};
+#define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest)
+
+/* *** End of deprecated interfaces *** */
+
+
/* for KVM_CREATE_MEMORY_REGION */
struct kvm_memory_region {
__u32 slot;
@@ -329,24 +393,6 @@ struct kvm_ioeventfd {
__u8 pad[36];
};
-#define KVM_TRC_SHIFT 16
-/*
- * kvm trace categories
- */
-#define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT)
-#define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1)) /* only 12 bits */
-
-/*
- * kvm trace action
- */
-#define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01)
-#define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02)
-#define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01)
-
-#define KVM_TRC_HEAD_SIZE 12
-#define KVM_TRC_CYCLE_SIZE 8
-#define KVM_TRC_EXTRA_MAX 7
-
#define KVMIO 0xAE
/*
@@ -367,12 +413,10 @@ struct kvm_ioeventfd {
*/
#define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* in bytes */
#define KVM_GET_SUPPORTED_CPUID _IOWR(KVMIO, 0x05, struct kvm_cpuid2)
-/*
- * ioctls for kvm trace
- */
-#define KVM_TRACE_ENABLE _IOW(KVMIO, 0x06, struct kvm_user_trace_setup)
-#define KVM_TRACE_PAUSE _IO(KVMIO, 0x07)
-#define KVM_TRACE_DISABLE _IO(KVMIO, 0x08)
+#define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06
+#define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07
+#define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08
+
/*
* Extension capability list.
*/
@@ -522,56 +566,57 @@ struct kvm_clock_data {
/*
* ioctls for VM fds
*/
-#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region)
+#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region)
/*
* KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
* a vcpu fd.
*/
-#define KVM_CREATE_VCPU _IO(KVMIO, 0x41)
-#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log)
-#define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias)
-#define KVM_SET_NR_MMU_PAGES _IO(KVMIO, 0x44)
-#define KVM_GET_NR_MMU_PAGES _IO(KVMIO, 0x45)
-#define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46,\
+#define KVM_CREATE_VCPU _IO(KVMIO, 0x41)
+#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log)
+#define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias)
+#define KVM_SET_NR_MMU_PAGES _IO(KVMIO, 0x44)
+#define KVM_GET_NR_MMU_PAGES _IO(KVMIO, 0x45)
+#define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \
struct kvm_userspace_memory_region)
-#define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47)
-#define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)
+#define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47)
+#define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)
/* Device model IOC */
-#define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60)
-#define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level)
-#define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip)
-#define KVM_SET_IRQCHIP _IOR(KVMIO, 0x63, struct kvm_irqchip)
-#define KVM_CREATE_PIT _IO(KVMIO, 0x64)
-#define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state)
-#define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state)
-#define KVM_IRQ_LINE_STATUS _IOWR(KVMIO, 0x67, struct kvm_irq_level)
+#define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60)
+#define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level)
+#define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip)
+#define KVM_SET_IRQCHIP _IOR(KVMIO, 0x63, struct kvm_irqchip)
+#define KVM_CREATE_PIT _IO(KVMIO, 0x64)
+#define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state)
+#define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state)
+#define KVM_IRQ_LINE_STATUS _IOWR(KVMIO, 0x67, struct kvm_irq_level)
#define KVM_REGISTER_COALESCED_MMIO \
_IOW(KVMIO, 0x67, struct kvm_coalesced_mmio_zone)
#define KVM_UNREGISTER_COALESCED_MMIO \
_IOW(KVMIO, 0x68, struct kvm_coalesced_mmio_zone)
-#define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \
- struct kvm_assigned_pci_dev)
-#define KVM_SET_GSI_ROUTING _IOW(KVMIO, 0x6a, struct kvm_irq_routing)
+#define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \
+ struct kvm_assigned_pci_dev)
+#define KVM_SET_GSI_ROUTING _IOW(KVMIO, 0x6a, struct kvm_irq_routing)
/* deprecated, replaced by KVM_ASSIGN_DEV_IRQ */
-#define KVM_ASSIGN_IRQ _IOR(KVMIO, 0x70, \
- struct kvm_assigned_irq)
-#define KVM_ASSIGN_DEV_IRQ _IOW(KVMIO, 0x70, struct kvm_assigned_irq)
-#define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71)
-#define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \
- struct kvm_assigned_pci_dev)
-#define KVM_ASSIGN_SET_MSIX_NR \
- _IOW(KVMIO, 0x73, struct kvm_assigned_msix_nr)
-#define KVM_ASSIGN_SET_MSIX_ENTRY \
- _IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
-#define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
-#define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd)
-#define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
-#define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
-#define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
-#define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config)
-#define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
-#define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
-
+#define KVM_ASSIGN_IRQ __KVM_DEPRECATED_VM_R_0x70
+#define KVM_ASSIGN_DEV_IRQ _IOW(KVMIO, 0x70, struct kvm_assigned_irq)
+#define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71)
+#define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \
+ struct kvm_assigned_pci_dev)
+#define KVM_ASSIGN_SET_MSIX_NR _IOW(KVMIO, 0x73, \
+ struct kvm_assigned_msix_nr)
+#define KVM_ASSIGN_SET_MSIX_ENTRY _IOW(KVMIO, 0x74, \
+ struct kvm_assigned_msix_entry)
+#define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
+#define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd)
+#define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
+#define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
+#define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config)
+#define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
+#define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_PIT_STATE2 */
+#define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2)
+#define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
/*
* ioctls for vcpu fds
@@ -584,7 +629,7 @@ struct kvm_clock_data {
#define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation)
#define KVM_INTERRUPT _IOW(KVMIO, 0x86, struct kvm_interrupt)
/* KVM_DEBUG_GUEST is no longer supported, use KVM_SET_GUEST_DEBUG instead */
-#define KVM_DEBUG_GUEST __KVM_DEPRECATED_DEBUG_GUEST
+#define KVM_DEBUG_GUEST __KVM_DEPRECATED_VCPU_W_0x87
#define KVM_GET_MSRS _IOWR(KVMIO, 0x88, struct kvm_msrs)
#define KVM_SET_MSRS _IOW(KVMIO, 0x89, struct kvm_msrs)
#define KVM_SET_CPUID _IOW(KVMIO, 0x8a, struct kvm_cpuid)
@@ -596,7 +641,7 @@ struct kvm_clock_data {
#define KVM_SET_CPUID2 _IOW(KVMIO, 0x90, struct kvm_cpuid2)
#define KVM_GET_CPUID2 _IOWR(KVMIO, 0x91, struct kvm_cpuid2)
/* Available with KVM_CAP_VAPIC */
-#define KVM_TPR_ACCESS_REPORTING _IOWR(KVMIO, 0x92, struct kvm_tpr_access_ctl)
+#define KVM_TPR_ACCESS_REPORTING _IOWR(KVMIO, 0x92, struct kvm_tpr_access_ctl)
/* Available with KVM_CAP_VAPIC */
#define KVM_SET_VAPIC_ADDR _IOW(KVMIO, 0x93, struct kvm_vapic_addr)
/* valid for virtual machine (for floating interrupt)_and_ vcpu */
@@ -608,67 +653,21 @@ struct kvm_clock_data {
/* initial ipl psw for s390 */
#define KVM_S390_SET_INITIAL_PSW _IOW(KVMIO, 0x96, struct kvm_s390_psw)
/* initial reset for s390 */
-#define KVM_S390_INITIAL_RESET _IO(KVMIO, 0x97)
+#define KVM_S390_INITIAL_RESET _IO(KVMIO, 0x97)
#define KVM_GET_MP_STATE _IOR(KVMIO, 0x98, struct kvm_mp_state)
#define KVM_SET_MP_STATE _IOW(KVMIO, 0x99, struct kvm_mp_state)
/* Available with KVM_CAP_NMI */
-#define KVM_NMI _IO(KVMIO, 0x9a)
+#define KVM_NMI _IO(KVMIO, 0x9a)
/* Available with KVM_CAP_SET_GUEST_DEBUG */
#define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug)
/* MCE for x86 */
#define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64)
#define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64)
#define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce)
-
-/*
- * Deprecated interfaces
- */
-struct kvm_breakpoint {
- __u32 enabled;
- __u32 padding;
- __u64 address;
-};
-
-struct kvm_debug_guest {
- __u32 enabled;
- __u32 pad;
- struct kvm_breakpoint breakpoints[4];
- __u32 singlestep;
-};
-
-#define __KVM_DEPRECATED_DEBUG_GUEST _IOW(KVMIO, 0x87, struct kvm_debug_guest)
-
+/* IA64 stack access */
#define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *)
#define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *)
-#define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2)
-#define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
-
-#define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02)
-#define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03)
-#define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04)
-#define KVM_TRC_IO_READ (KVM_TRC_HANDLER + 0x05)
-#define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06)
-#define KVM_TRC_CR_READ (KVM_TRC_HANDLER + 0x07)
-#define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08)
-#define KVM_TRC_DR_READ (KVM_TRC_HANDLER + 0x09)
-#define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A)
-#define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B)
-#define KVM_TRC_MSR_WRITE (KVM_TRC_HANDLER + 0x0C)
-#define KVM_TRC_CPUID (KVM_TRC_HANDLER + 0x0D)
-#define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E)
-#define KVM_TRC_NMI (KVM_TRC_HANDLER + 0x0F)
-#define KVM_TRC_VMMCALL (KVM_TRC_HANDLER + 0x10)
-#define KVM_TRC_HLT (KVM_TRC_HANDLER + 0x11)
-#define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12)
-#define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13)
-#define KVM_TRC_APIC_ACCESS (KVM_TRC_HANDLER + 0x14)
-#define KVM_TRC_TDP_FAULT (KVM_TRC_HANDLER + 0x15)
-#define KVM_TRC_GTLB_WRITE (KVM_TRC_HANDLER + 0x16)
-#define KVM_TRC_STLB_WRITE (KVM_TRC_HANDLER + 0x17)
-#define KVM_TRC_STLB_INVAL (KVM_TRC_HANDLER + 0x18)
-#define KVM_TRC_PPC_INSTR (KVM_TRC_HANDLER + 0x19)
-
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
struct kvm_assigned_pci_dev {
@@ -722,4 +721,4 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
};
-#endif
+#endif /* __LINUX_KVM_H */
--
1.6.5.2
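
On the collision risk mentioned in the changelog above: the header
reuses sequence numbers 0x67 and 0x70, which works because the
direction and payload size are also encoded into the request value. A
rough user-space sketch of a collision check follows. The demo_* names
and payload structs are hypothetical stand-ins (the real payload
structs are not reproduced here); only the _IO* macros from
<sys/ioctl.h> are real.

```c
#include <stddef.h>
#include <sys/ioctl.h>

#define DEMO_KVMIO 0xAE

/* Hypothetical payloads; only their sizes matter for the encoding. */
struct demo_payload_a { unsigned int words[4]; };
struct demo_payload_b { unsigned int words[8]; };

/* Same sequence numbers as in the header, differing in direction or size. */
static const unsigned long demo_requests[] = {
	_IOR(DEMO_KVMIO, 0x70, struct demo_payload_a),
	_IOW(DEMO_KVMIO, 0x70, struct demo_payload_a),
	_IOWR(DEMO_KVMIO, 0x67, struct demo_payload_a),
	_IOW(DEMO_KVMIO, 0x67, struct demo_payload_b),
};

/* Return the index of the first request that repeats an earlier one, or -1. */
static long demo_first_collision(const unsigned long *req, size_t n)
{
	size_t i, j;

	for (i = 1; i < n; i++)
		for (j = 0; j < i; j++)
			if (req[i] == req[j])
				return (long)i;
	return -1;
}
```

Two definitions collide only when type, number, direction and size all
match, so a check like this would still flag a genuinely duplicated
entry.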
* [PATCH 33/35] KVM: Allow internal errors reported to userspace to carry extra data
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (31 preceding siblings ...)
2009-11-19 13:35 ` [PATCH 32/35] KVM: Reorder IOCTLs in main kvm.h Avi Kivity
@ 2009-11-19 13:35 ` Avi Kivity
2009-11-19 13:35 ` [PATCH 34/35] KVM: VMX: Report unexpected simultaneous exceptions as internal errors Avi Kivity
2009-11-19 13:35 ` [PATCH 35/35] KVM: x86: Add KVM_GET/SET_VCPU_EVENTS Avi Kivity
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available. Add extra data to internal error
reporting so we can expose it to the debugger. Extra data is specific
to the suberror.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 1 +
arch/x86/kvm/vmx.c | 1 +
include/linux/kvm.h | 4 ++++
virt/kvm/kvm_main.c | 1 +
4 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a902479..4c3e5b2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2800,6 +2800,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
case EMULATE_FAIL:
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+ vcpu->run->internal.ndata = 0;
return 0;
default:
BUG();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c9cc959..c0e66dd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3352,6 +3352,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
kvm_report_emulation_failure(vcpu, "emulation failure");
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+ vcpu->run->internal.ndata = 0;
ret = 0;
goto out;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ca62b8e..172639e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -251,6 +251,9 @@ struct kvm_run {
} dcr;
struct {
__u32 suberror;
+ /* Available with KVM_CAP_INTERNAL_ERROR_DATA: */
+ __u32 ndata;
+ __u64 data[16];
} internal;
/* Fix the size of the union. */
char padding[256];
@@ -484,6 +487,7 @@ struct kvm_ioeventfd {
#define KVM_CAP_XEN_HVM 38
#endif
#define KVM_CAP_ADJUST_CLOCK 39
+#define KVM_CAP_INTERNAL_ERROR_DATA 40
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bd44fb4..f92ba13 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1653,6 +1653,7 @@ static long kvm_dev_ioctl_check_extension_generic(long arg)
#ifdef CONFIG_KVM_APIC_ARCHITECTURE
case KVM_CAP_SET_BOOT_CPU_ID:
#endif
+ case KVM_CAP_INTERNAL_ERROR_DATA:
return 1;
#ifdef CONFIG_HAVE_KVM_IRQCHIP
case KVM_CAP_IRQ_ROUTING:
--
1.6.5.2
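
As an illustration of how user space might consume the extra data
added above, here is a minimal sketch. struct demo_internal_error is a
local mirror of the new "internal" member of struct kvm_run, and
demo_format_internal_error is a hypothetical helper, not part of any
existing tool. It clamps ndata to the array size before reading, since
the value comes from a shared page.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_INTERNAL_DATA_MAX 16

/* Local mirror of the "internal" member of struct kvm_run. */
struct demo_internal_error {
	uint32_t suberror;
	uint32_t ndata;
	uint64_t data[DEMO_INTERNAL_DATA_MAX];
};

/*
 * Format the error into buf and return the number of data words that
 * were considered valid (ndata clamped to the array size).
 */
static uint32_t demo_format_internal_error(const struct demo_internal_error *err,
					   char *buf, size_t len)
{
	uint32_t n = err->ndata;
	uint32_t i;
	size_t used;
	int w;

	if (n > DEMO_INTERNAL_DATA_MAX)
		n = DEMO_INTERNAL_DATA_MAX;
	if (len == 0)
		return n;

	w = snprintf(buf, len, "suberror=%" PRIu32, err->suberror);
	used = (w < 0) ? 0 : (size_t)w;

	/* Stop appending once the buffer is full; output is then truncated. */
	for (i = 0; i < n && used < len; i++) {
		w = snprintf(buf + used, len - used,
			     " data[%" PRIu32 "]=0x%" PRIx64, i, err->data[i]);
		if (w < 0)
			break;
		used += (size_t)w;
	}
	return n;
}
```

A monitor or debugger front end could print this string when KVM_RUN
returns with KVM_EXIT_INTERNAL_ERROR, after checking
KVM_CAP_INTERNAL_ERROR_DATA.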
* [PATCH 34/35] KVM: VMX: Report unexpected simultaneous exceptions as internal errors
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (32 preceding siblings ...)
2009-11-19 13:35 ` [PATCH 33/35] KVM: Allow internal errors reported to userspace to carry extra data Avi Kivity
@ 2009-11-19 13:35 ` Avi Kivity
2009-11-19 13:35 ` [PATCH 35/35] KVM: x86: Add KVM_GET/SET_VCPU_EVENTS Avi Kivity
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
These happen when we trap an exception while another exception is being
delivered; we only expect these with MCEs and page faults. If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/vmx.c | 11 ++++++++---
include/linux/kvm.h | 1 +
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c0e66dd..22fcd27 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2744,9 +2744,14 @@ static int handle_exception(struct kvm_vcpu *vcpu)
return handle_machine_check(vcpu);
if ((vect_info & VECTORING_INFO_VALID_MASK) &&
- !is_page_fault(intr_info))
- printk(KERN_ERR "%s: unexpected, vectoring info 0x%x "
- "intr info 0x%x\n", __func__, vect_info, intr_info);
+ !is_page_fault(intr_info)) {
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_SIMUL_EX;
+ vcpu->run->internal.ndata = 2;
+ vcpu->run->internal.data[0] = vect_info;
+ vcpu->run->internal.data[1] = intr_info;
+ return 0;
+ }
if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR)
return 1; /* already handled by vmx_vcpu_run() */
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 172639e..976f4d1 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -163,6 +163,7 @@ struct kvm_pit_config {
/* For KVM_EXIT_INTERNAL_ERROR */
#define KVM_INTERNAL_ERROR_EMULATION 1
+#define KVM_INTERNAL_ERROR_SIMUL_EX 2
/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
struct kvm_run {
--
1.6.5.2
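
The two words reported with KVM_INTERNAL_ERROR_SIMUL_EX are the raw
vectoring info and exit interruption info. A debugger could split them
along the lines of the sketch below. demo_event_info and
demo_decode_event_info are hypothetical names; the bit positions
follow the VMX interruption-information layout (vector in bits 7:0,
type in bits 10:8, error-code-valid in bit 11, valid in bit 31).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a VMX interruption-information word. */
struct demo_event_info {
	bool valid;		/* bit 31 */
	bool has_error_code;	/* bit 11 */
	uint8_t type;		/* bits 10:8, e.g. 3 = hardware exception */
	uint8_t vector;		/* bits 7:0 */
};

static struct demo_event_info demo_decode_event_info(uint64_t raw)
{
	struct demo_event_info info;

	info.valid = (raw >> 31) & 1;
	info.has_error_code = (raw >> 11) & 1;
	info.type = (uint8_t)((raw >> 8) & 0x7);
	info.vector = (uint8_t)(raw & 0xff);
	return info;
}
```

Applied to data[0] and data[1], this tells which event was being
delivered and which exception interrupted the delivery.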
* [PATCH 35/35] KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
` (33 preceding siblings ...)
2009-11-19 13:35 ` [PATCH 34/35] KVM: VMX: Report unexpected simultaneous exceptions as internal errors Avi Kivity
@ 2009-11-19 13:35 ` Avi Kivity
34 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2009-11-19 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
From: Jan Kiszka <jan.kiszka@web.de>
These new IOCTLs export all state related to exceptions, interrupts, and
NMIs that was so far invisible to user space. Together with appropriate
user space changes, this fixes sporadic problems with vmsave/restore, live
migration, and system reset.
[avi: future-proof abi by adding a flags field]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
Documentation/kvm/api.txt | 49 +++++++++++++++++++++++++
arch/x86/include/asm/kvm.h | 28 ++++++++++++++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/svm.c | 22 +++++++++++
arch/x86/kvm/vmx.c | 30 +++++++++++++++
arch/x86/kvm/x86.c | 77 +++++++++++++++++++++++++++++++++++++++
include/linux/kvm.h | 6 +++
7 files changed, 214 insertions(+), 0 deletions(-)
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 36594ba..e1a1141 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -653,6 +653,55 @@ struct kvm_clock_data {
__u32 pad[9];
};
+4.29 KVM_GET_VCPU_EVENTS
+
+Capability: KVM_CAP_VCPU_EVENTS
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_events (out)
+Returns: 0 on success, -1 on error
+
+Gets currently pending exceptions, interrupts, and NMIs as well as related
+states of the vcpu.
+
+struct kvm_vcpu_events {
+ struct {
+ __u8 injected;
+ __u8 nr;
+ __u8 has_error_code;
+ __u8 pad;
+ __u32 error_code;
+ } exception;
+ struct {
+ __u8 injected;
+ __u8 nr;
+ __u8 soft;
+ __u8 pad;
+ } interrupt;
+ struct {
+ __u8 injected;
+ __u8 pending;
+ __u8 masked;
+ __u8 pad;
+ } nmi;
+ __u32 sipi_vector;
+ __u32 flags; /* must be zero */
+};
+
+4.30 KVM_SET_VCPU_EVENTS
+
+Capability: KVM_CAP_VCPU_EVENTS
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_events (in)
+Returns: 0 on success, -1 on error
+
+Set pending exceptions, interrupts, and NMIs as well as related states of the
+vcpu.
+
+See KVM_GET_VCPU_EVENTS for the data structure.
+
+
5. The kvm_run structure
Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index ef9b4b7..950df43 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -20,6 +20,7 @@
#define __KVM_HAVE_MCE
#define __KVM_HAVE_PIT_STATE2
#define __KVM_HAVE_XEN_HVM
+#define __KVM_HAVE_VCPU_EVENTS
/* Architectural interrupt line count. */
#define KVM_NR_INTERRUPTS 256
@@ -252,4 +253,31 @@ struct kvm_reinject_control {
__u8 pit_reinject;
__u8 reserved[31];
};
+
+/* for KVM_GET/SET_VCPU_EVENTS */
+struct kvm_vcpu_events {
+ struct {
+ __u8 injected;
+ __u8 nr;
+ __u8 has_error_code;
+ __u8 pad;
+ __u32 error_code;
+ } exception;
+ struct {
+ __u8 injected;
+ __u8 nr;
+ __u8 soft;
+ __u8 pad;
+ } interrupt;
+ struct {
+ __u8 injected;
+ __u8 pending;
+ __u8 masked;
+ __u8 pad;
+ } nmi;
+ __u32 sipi_vector;
+ __u32 flags;
+ __u32 reserved[10];
+};
+
#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26a74b7..06e0856 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -523,6 +523,8 @@ struct kvm_x86_ops {
bool has_error_code, u32 error_code);
int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
int (*nmi_allowed)(struct kvm_vcpu *vcpu);
+ bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
+ void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked);
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 34b700f..3de0b37 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2499,6 +2499,26 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
!(svm->vcpu.arch.hflags & HF_NMI_MASK);
}
+static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ return !!(svm->vcpu.arch.hflags & HF_NMI_MASK);
+}
+
+static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (masked) {
+ svm->vcpu.arch.hflags |= HF_NMI_MASK;
+ svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+ } else {
+ svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
+ svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+ }
+}
+
static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -2946,6 +2966,8 @@ static struct kvm_x86_ops svm_x86_ops = {
.queue_exception = svm_queue_exception,
.interrupt_allowed = svm_interrupt_allowed,
.nmi_allowed = svm_nmi_allowed,
+ .get_nmi_mask = svm_get_nmi_mask,
+ .set_nmi_mask = svm_set_nmi_mask,
.enable_nmi_window = enable_nmi_window,
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 22fcd27..778f059 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2639,6 +2639,34 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
GUEST_INTR_STATE_NMI));
}
+static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+ if (!cpu_has_virtual_nmis())
+ return to_vmx(vcpu)->soft_vnmi_blocked;
+ else
+ return !!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+ GUEST_INTR_STATE_NMI);
+}
+
+static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ if (!cpu_has_virtual_nmis()) {
+ if (vmx->soft_vnmi_blocked != masked) {
+ vmx->soft_vnmi_blocked = masked;
+ vmx->vnmi_blocked_time = 0;
+ }
+ } else {
+ if (masked)
+ vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+ else
+ vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+ }
+}
+
static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
{
return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
@@ -3985,6 +4013,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.queue_exception = vmx_queue_exception,
.interrupt_allowed = vmx_interrupt_allowed,
.nmi_allowed = vmx_nmi_allowed,
+ .get_nmi_mask = vmx_get_nmi_mask,
+ .set_nmi_mask = vmx_set_nmi_mask,
.enable_nmi_window = enable_nmi_window,
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ba8958d..35eea30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1342,6 +1342,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_SET_IDENTITY_MAP_ADDR:
case KVM_CAP_XEN_HVM:
case KVM_CAP_ADJUST_CLOCK:
+ case KVM_CAP_VCPU_EVENTS:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -1883,6 +1884,61 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
return 0;
}
+static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
+ struct kvm_vcpu_events *events)
+{
+ vcpu_load(vcpu);
+
+ events->exception.injected = vcpu->arch.exception.pending;
+ events->exception.nr = vcpu->arch.exception.nr;
+ events->exception.has_error_code = vcpu->arch.exception.has_error_code;
+ events->exception.error_code = vcpu->arch.exception.error_code;
+
+ events->interrupt.injected = vcpu->arch.interrupt.pending;
+ events->interrupt.nr = vcpu->arch.interrupt.nr;
+ events->interrupt.soft = vcpu->arch.interrupt.soft;
+
+ events->nmi.injected = vcpu->arch.nmi_injected;
+ events->nmi.pending = vcpu->arch.nmi_pending;
+ events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
+
+ events->sipi_vector = vcpu->arch.sipi_vector;
+
+ events->flags = 0;
+
+ vcpu_put(vcpu);
+}
+
+static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
+ struct kvm_vcpu_events *events)
+{
+ if (events->flags)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+
+ vcpu->arch.exception.pending = events->exception.injected;
+ vcpu->arch.exception.nr = events->exception.nr;
+ vcpu->arch.exception.has_error_code = events->exception.has_error_code;
+ vcpu->arch.exception.error_code = events->exception.error_code;
+
+ vcpu->arch.interrupt.pending = events->interrupt.injected;
+ vcpu->arch.interrupt.nr = events->interrupt.nr;
+ vcpu->arch.interrupt.soft = events->interrupt.soft;
+ if (vcpu->arch.interrupt.pending && irqchip_in_kernel(vcpu->kvm))
+ kvm_pic_clear_isr_ack(vcpu->kvm);
+
+ vcpu->arch.nmi_injected = events->nmi.injected;
+ vcpu->arch.nmi_pending = events->nmi.pending;
+ kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
+
+ vcpu->arch.sipi_vector = events->sipi_vector;
+
+ vcpu_put(vcpu);
+
+ return 0;
+}
+
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -2040,6 +2096,27 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = kvm_vcpu_ioctl_x86_set_mce(vcpu, &mce);
break;
}
+ case KVM_GET_VCPU_EVENTS: {
+ struct kvm_vcpu_events events;
+
+ kvm_vcpu_ioctl_x86_get_vcpu_events(vcpu, &events);
+
+ r = -EFAULT;
+ if (copy_to_user(argp, &events, sizeof(struct kvm_vcpu_events)))
+ break;
+ r = 0;
+ break;
+ }
+ case KVM_SET_VCPU_EVENTS: {
+ struct kvm_vcpu_events events;
+
+ r = -EFAULT;
+ if (copy_from_user(&events, argp, sizeof(struct kvm_vcpu_events)))
+ break;
+
+ r = kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events);
+ break;
+ }
default:
r = -EINVAL;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 976f4d1..92045a9 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -489,6 +489,9 @@ struct kvm_ioeventfd {
#endif
#define KVM_CAP_ADJUST_CLOCK 39
#define KVM_CAP_INTERNAL_ERROR_DATA 40
+#ifdef __KVM_HAVE_VCPU_EVENTS
+#define KVM_CAP_VCPU_EVENTS 41
+#endif
#ifdef KVM_CAP_IRQ_ROUTING
@@ -672,6 +675,9 @@ struct kvm_clock_data {
/* IA64 stack access */
#define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *)
#define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *)
+/* Available with KVM_CAP_VCPU_EVENTS */
+#define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events)
+#define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events)
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
--
1.6.5.2
* Re: [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset
2009-11-19 13:34 ` [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset Avi Kivity
@ 2010-01-29 13:32 ` Alexander Graf
2010-02-01 18:54 ` Marcelo Tosatti
0 siblings, 1 reply; 39+ messages in thread
From: Alexander Graf @ 2010-01-29 13:32 UTC (permalink / raw)
To: Avi Kivity; +Cc: KVM list
On 19.11.2009, at 14:34, Avi Kivity wrote:
> From: Glauber Costa <glommer@redhat.com>
>
> When we migrate a kvm guest that uses pvclock between two hosts, we may
> suffer a large skew. This is because there can be significant differences
> between the monotonic clock of the hosts involved. When a new host with
> a much larger monotonic time starts running the guest, the view of time
> will be significantly impacted.
>
> Situation is much worse when we do the opposite, and migrate to a host with
> a smaller monotonic clock.
>
> This proposed ioctl will allow userspace to inform us what is the monotonic
> clock value in the source host, so we can keep the time skew short, and
> more importantly, never goes backwards. Userspace may also need to trigger
> the current data, since from the first migration onwards, it won't be
> reflected by a simple call to clock_gettime() anymore.
So I assume without this feature there's no way to have a reliable kvmclock inside the guest? Isn't it stable material then?
Alex
* Re: [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset
2010-01-29 13:32 ` Alexander Graf
@ 2010-02-01 18:54 ` Marcelo Tosatti
2010-02-01 21:42 ` patch kvm-allow-userspace-to-adjust-kvmclock-offset.patch added to 2.6.32-stable tree gregkh
0 siblings, 1 reply; 39+ messages in thread
From: Marcelo Tosatti @ 2010-02-01 18:54 UTC (permalink / raw)
To: Alexander Graf, Greg KH; +Cc: Avi Kivity, KVM list
On Fri, Jan 29, 2010 at 02:32:43PM +0100, Alexander Graf wrote:
>
> On 19.11.2009, at 14:34, Avi Kivity wrote:
>
> > From: Glauber Costa <glommer@redhat.com>
> >
> > When we migrate a kvm guest that uses pvclock between two hosts, we may
> > suffer a large skew. This is because there can be significant differences
> > between the monotonic clock of the hosts involved. When a new host with
> > a much larger monotonic time starts running the guest, the view of time
> > will be significantly impacted.
> >
> > Situation is much worse when we do the opposite, and migrate to a host with
> > a smaller monotonic clock.
> >
> > This proposed ioctl will allow userspace to inform us what is the monotonic
> > clock value in the source host, so we can keep the time skew short, and
> > more importantly, never goes backwards. Userspace may also need to trigger
> > the current data, since from the first migration onwards, it won't be
> > reflected by a simple call to clock_gettime() anymore.
>
> So I assume without this feature there's no way to have a reliable kvmclock inside the guest? Isn't it stable material then?
It's unreliable only with migration. Yes, it is stable material.
Here is a backport for 2.6.32. Greg, can you please include it?
Thanks
------------------
From: Glauber Costa <glommer@redhat.com>
KVM: allow userspace to adjust kvmclock offset
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.
The situation is much worse when we do the opposite, and migrate to a host
with a smaller monotonic clock.
This proposed ioctl allows userspace to tell us the monotonic clock value of
the source host, so that we can keep the time skew small and, more
importantly, ensure the guest clock never goes backwards. Userspace may also
need to read back the current value, since from the first migration onwards
it is no longer reflected by a simple call to clock_gettime().
[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
(cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d)
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 5a4bc8c..db3a706 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -593,6 +593,42 @@ struct kvm_irqchip {
} chip;
};
+4.27 KVM_GET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (out)
+Returns: 0 on success, -1 on error
+
+Gets the current timestamp of kvmclock as seen by the current guest. In
+conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+ __u64 clock; /* kvmclock current value */
+ __u32 flags;
+ __u32 pad[9];
+};
+
+4.28 KVM_SET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (in)
+Returns: 0 on success, -1 on error
+
+Sets the current timestamp of kvmclock to the value specified in its parameter.
+In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+ __u64 clock; /* kvmclock current value */
+ __u32 flags;
+ __u32 pad[9];
+};
+
5. The kvm_run structure
Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d838922..d759a1f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -412,6 +412,7 @@ struct kvm_arch{
unsigned long irq_sources_bitmap;
unsigned long irq_states[KVM_IOAPIC_NUM_PINS];
u64 vm_init_tsc;
+ s64 kvmclock_offset;
};
struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ae07d26..adb7912 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -677,7 +677,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
/* With all the info we got, fill in the values */
vcpu->hv_clock.system_time = ts.tv_nsec +
- (NSEC_PER_SEC * (u64)ts.tv_sec);
+ (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
+
/*
* The interface expects us to write an even number signaling that the
* update is finished. Since the guest won't see the intermediate
@@ -1224,6 +1225,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PIT2:
case KVM_CAP_PIT_STATE2:
case KVM_CAP_SET_IDENTITY_MAP_ADDR:
+ case KVM_CAP_ADJUST_CLOCK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -2421,6 +2423,44 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+ case KVM_SET_CLOCK: {
+ struct timespec now;
+ struct kvm_clock_data user_ns;
+ u64 now_ns;
+ s64 delta;
+
+ r = -EFAULT;
+ if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
+ goto out;
+
+ r = -EINVAL;
+ if (user_ns.flags)
+ goto out;
+
+ r = 0;
+ ktime_get_ts(&now);
+ now_ns = timespec_to_ns(&now);
+ delta = user_ns.clock - now_ns;
+ kvm->arch.kvmclock_offset = delta;
+ break;
+ }
+ case KVM_GET_CLOCK: {
+ struct timespec now;
+ struct kvm_clock_data user_ns;
+ u64 now_ns;
+
+ ktime_get_ts(&now);
+ now_ns = timespec_to_ns(&now);
+ user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
+ user_ns.flags = 0;
+
+ r = -EFAULT;
+ if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
+ goto out;
+ r = 0;
+ break;
+ }
+
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..b80fec1 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -436,6 +436,7 @@ struct kvm_ioeventfd {
#endif
#define KVM_CAP_IOEVENTFD 36
#define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
+#define KVM_CAP_ADJUST_CLOCK 39
#ifdef KVM_CAP_IRQ_ROUTING
@@ -497,6 +498,12 @@ struct kvm_irqfd {
__u8 pad[20];
};
+struct kvm_clock_data {
+ __u64 clock;
+ __u32 flags;
+ __u32 pad[9];
+};
+
/*
* ioctls for VM fds
*/
@@ -546,6 +553,8 @@ struct kvm_irqfd {
#define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
#define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
#define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
+#define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
/*
* ioctls for vcpu fds
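
The offset arithmetic in the KVM_SET_CLOCK and KVM_GET_CLOCK handlers of the
patch above can be modelled outside the kernel. What follows is a minimal
sketch and not kernel code: struct model_vm and the model_* helpers are names
invented for illustration, and the host monotonic time is passed in as a
parameter where the patch reads it with ktime_get_ts().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for per-VM state: only the field the patch adds. */
struct model_vm {
	int64_t kvmclock_offset;	/* plays the role of kvm->arch.kvmclock_offset */
};

/* As in KVM_SET_CLOCK: offset = requested clock - host monotonic time. */
void model_set_clock(struct model_vm *vm, uint64_t clock_ns,
		     uint64_t host_now_ns)
{
	vm->kvmclock_offset = (int64_t)(clock_ns - host_now_ns);
}

/* As in KVM_GET_CLOCK: reported clock = host monotonic time + offset. */
uint64_t model_get_clock(const struct model_vm *vm, uint64_t host_now_ns)
{
	return (uint64_t)vm->kvmclock_offset + host_now_ns;
}

/* Carry the clock from a source VM to a destination VM on another host. */
void model_migrate(const struct model_vm *src, uint64_t src_now_ns,
		   struct model_vm *dst, uint64_t dst_now_ns)
{
	assert(src && dst);
	model_set_clock(dst, model_get_clock(src, src_now_ns), dst_now_ns);
}
```

With a source host whose monotonic clock reads 1000 s and a destination that
reads 10 s, model_migrate() leaves the destination with an offset of +990 s,
so the clock continues from the source value. Migrating to a host with a
larger monotonic clock yields a negative offset, which is why the field is
signed.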
* patch kvm-allow-userspace-to-adjust-kvmclock-offset.patch added to 2.6.32-stable tree
2010-02-01 18:54 ` Marcelo Tosatti
@ 2010-02-01 21:42 ` gregkh
0 siblings, 0 replies; 39+ messages in thread
From: gregkh @ 2010-02-01 21:42 UTC (permalink / raw)
To: mtosatti, agraf, avi, glommer, gregkh, kvm; +Cc: stable, stable-commits
This is a note to let you know that we have just queued up the patch titled
Subject: KVM: allow userspace to adjust kvmclock offset
to the 2.6.32-stable tree. Its filename is
kvm-allow-userspace-to-adjust-kvmclock-offset.patch
A git repo of this tree can be found at
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
From mtosatti@redhat.com Mon Feb 1 13:35:13 2010
From: Marcelo Tosatti <mtosatti@redhat.com>
Date: Mon, 1 Feb 2010 16:54:05 -0200
Subject: KVM: allow userspace to adjust kvmclock offset
To: Alexander Graf <agraf@suse.de>, Greg KH <gregkh@suse.de>
Cc: Avi Kivity <avi@redhat.com>, KVM list <kvm@vger.kernel.org>
Message-ID: <20100201185405.GB5381@amt.cnet>
Content-Disposition: inline
From: Glauber Costa <glommer@redhat.com>
(cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d)
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.
The situation is much worse when we do the opposite, and migrate to a host
with a smaller monotonic clock.
This proposed ioctl allows userspace to tell us the monotonic clock value of
the source host, so that we can keep the time skew small and, more
importantly, ensure the guest clock never goes backwards. Userspace may also
need to read back the current value, since from the first migration onwards
it is no longer reflected by a simple call to clock_gettime().
[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
Documentation/kvm/api.txt | 36 ++++++++++++++++++++++++++++++++++
arch/x86/include/asm/kvm_host.h | 1
arch/x86/kvm/x86.c | 42 +++++++++++++++++++++++++++++++++++++++-
include/linux/kvm.h | 9 ++++++++
4 files changed, 87 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -412,6 +412,7 @@ struct kvm_arch{
unsigned long irq_sources_bitmap;
unsigned long irq_states[KVM_IOAPIC_NUM_PINS];
u64 vm_init_tsc;
+ s64 kvmclock_offset;
};
struct kvm_vm_stat {
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -680,7 +680,8 @@ static void kvm_write_guest_time(struct
/* With all the info we got, fill in the values */
vcpu->hv_clock.system_time = ts.tv_nsec +
- (NSEC_PER_SEC * (u64)ts.tv_sec);
+ (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
+
/*
* The interface expects us to write an even number signaling that the
* update is finished. Since the guest won't see the intermediate
@@ -1227,6 +1228,7 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_PIT2:
case KVM_CAP_PIT_STATE2:
case KVM_CAP_SET_IDENTITY_MAP_ADDR:
+ case KVM_CAP_ADJUST_CLOCK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -2424,6 +2426,44 @@ long kvm_arch_vm_ioctl(struct file *filp
r = 0;
break;
}
+ case KVM_SET_CLOCK: {
+ struct timespec now;
+ struct kvm_clock_data user_ns;
+ u64 now_ns;
+ s64 delta;
+
+ r = -EFAULT;
+ if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
+ goto out;
+
+ r = -EINVAL;
+ if (user_ns.flags)
+ goto out;
+
+ r = 0;
+ ktime_get_ts(&now);
+ now_ns = timespec_to_ns(&now);
+ delta = user_ns.clock - now_ns;
+ kvm->arch.kvmclock_offset = delta;
+ break;
+ }
+ case KVM_GET_CLOCK: {
+ struct timespec now;
+ struct kvm_clock_data user_ns;
+ u64 now_ns;
+
+ ktime_get_ts(&now);
+ now_ns = timespec_to_ns(&now);
+ user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
+ user_ns.flags = 0;
+
+ r = -EFAULT;
+ if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
+ goto out;
+ r = 0;
+ break;
+ }
+
default:
;
}
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -593,6 +593,42 @@ struct kvm_irqchip {
} chip;
};
+4.27 KVM_GET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (out)
+Returns: 0 on success, -1 on error
+
+Gets the current timestamp of kvmclock as seen by the current guest. In
+conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+ __u64 clock; /* kvmclock current value */
+ __u32 flags;
+ __u32 pad[9];
+};
+
+4.28 KVM_SET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (in)
+Returns: 0 on success, -1 on error
+
+Sets the current timestamp of kvmclock to the value specified in its parameter.
+In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+ __u64 clock; /* kvmclock current value */
+ __u32 flags;
+ __u32 pad[9];
+};
+
5. The kvm_run structure
Application code obtains a pointer to the kvm_run structure by
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -439,6 +439,7 @@ struct kvm_ioeventfd {
#endif
#define KVM_CAP_IOEVENTFD 36
#define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
+#define KVM_CAP_ADJUST_CLOCK 39
#ifdef KVM_CAP_IRQ_ROUTING
@@ -501,6 +502,12 @@ struct kvm_irqfd {
__u8 pad[20];
};
+struct kvm_clock_data {
+ __u64 clock;
+ __u32 flags;
+ __u32 pad[9];
+};
+
/*
* ioctls for VM fds
*/
@@ -550,6 +557,8 @@ struct kvm_irqfd {
#define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
#define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
#define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
+#define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
/*
* ioctls for vcpu fds
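
From userspace, the two ioctls documented in the api.txt hunk above might be
used roughly as follows during migration. This is a sketch under assumptions,
not code from any existing VMM: the helper names are invented, kvm_fd is
assumed to be an open /dev/kvm descriptor, and the VM descriptors are assumed
to come from KVM_CREATE_VM. Each helper returns 0 on success or a negative
errno value.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 1 if the kernel advertises KVM_CAP_ADJUST_CLOCK, 0 otherwise. */
int has_adjust_clock(int kvm_fd)
{
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ADJUST_CLOCK) > 0;
}

/* Source side: read the current kvmclock value of the VM. */
int save_kvmclock(int src_vm_fd, struct kvm_clock_data *data)
{
	assert(data != NULL);
	memset(data, 0, sizeof(*data));
	if (ioctl(src_vm_fd, KVM_GET_CLOCK, data) < 0)
		return -errno;
	return 0;
}

/* Destination side: make kvmclock continue from the saved value. */
int restore_kvmclock(int dst_vm_fd, __u64 clock_ns)
{
	struct kvm_clock_data data;

	/* The handler in this patch rejects non-zero flags with EINVAL. */
	memset(&data, 0, sizeof(data));
	data.clock = clock_ns;
	if (ioctl(dst_vm_fd, KVM_SET_CLOCK, &data) < 0)
		return -errno;
	return 0;
}
```

Only the clock field is carried across here; flags and padding are zeroed
before KVM_SET_CLOCK. The time that passes between the save on the source and
the restore on the destination is not compensated in this sketch, so the guest
clock would resume from the saved value.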
Patches currently in stable-queue which might be from mtosatti@redhat.com are
queue-2.6.32/kvm-allow-userspace-to-adjust-kvmclock-offset.patch
end of thread, other threads:[~2010-02-01 21:44 UTC | newest]
Thread overview: 39+ messages
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
2009-11-19 13:34 ` [PATCH 01/35] KVM: SVM: Add tracepoint for #vmexit because intr pending Avi Kivity
2009-11-19 13:34 ` [PATCH 02/35] KVM: SVM: Add tracepoint for invlpga instruction Avi Kivity
2009-11-19 13:34 ` [PATCH 03/35] KVM: SVM: Add tracepoint for skinit instruction Avi Kivity
2009-11-19 13:34 ` [PATCH 04/35] KVM: SVM: Remove nsvm_printk debugging code Avi Kivity
2009-11-19 13:34 ` [PATCH 05/35] KVM: introduce kvm_vcpu_on_spin Avi Kivity
2009-11-19 13:34 ` [PATCH 06/35] KVM: VMX: Add support for Pause-Loop Exiting Avi Kivity
2009-11-19 13:34 ` [PATCH 07/35] KVM: SVM: Support Pause Filter in AMD processors Avi Kivity
2009-11-19 13:34 ` [PATCH 08/35] KVM: x86: Harden against cpufreq Avi Kivity
2009-11-19 13:34 ` [PATCH 09/35] KVM: VMX: fix handle_pause declaration Avi Kivity
2009-11-19 13:34 ` [PATCH 10/35] KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check Avi Kivity
2009-11-19 13:34 ` [PATCH 11/35] KVM: Xen PV-on-HVM guest support Avi Kivity
2009-11-19 13:34 ` [PATCH 12/35] KVM: x86: Fix guest single-stepping while interruptible Avi Kivity
2009-11-19 13:34 ` [PATCH 13/35] KVM: SVM: Cleanup NMI singlestep Avi Kivity
2009-11-19 13:34 ` [PATCH 14/35] KVM: fix irq_source_id size verification Avi Kivity
2009-11-19 13:34 ` [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset Avi Kivity
2010-01-29 13:32 ` Alexander Graf
2010-02-01 18:54 ` Marcelo Tosatti
2010-02-01 21:42 ` patch kvm-allow-userspace-to-adjust-kvmclock-offset.patch added to 2.6.32-stable tree gregkh
2009-11-19 13:34 ` [PATCH 16/35] KVM: Enable 32bit dirty log pointers on 64bit host Avi Kivity
2009-11-19 13:34 ` [PATCH 17/35] KVM: VMX: Use macros instead of hex value on cr0 initialization Avi Kivity
2009-11-19 13:34 ` [PATCH 18/35] KVM: SVM: Reset cr0 properly on vcpu reset Avi Kivity
2009-11-19 13:34 ` [PATCH 19/35] KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization Avi Kivity
2009-11-19 13:34 ` [PATCH 20/35] KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area Avi Kivity
2009-11-19 13:34 ` [PATCH 21/35] KVM: x86 shared msr infrastructure Avi Kivity
2009-11-19 13:34 ` [PATCH 22/35] KVM: VMX: Use " Avi Kivity
2009-11-19 13:34 ` [PATCH 23/35] KVM: powerpc: Fix BUILD_BUG_ON condition Avi Kivity
2009-11-19 13:35 ` [PATCH 24/35] KVM: remove duplicated task_switch check Avi Kivity
2009-11-19 13:35 ` [PATCH 25/35] KVM: VMX: move CR3/PDPTR update to vmx_set_cr3 Avi Kivity
2009-11-19 13:35 ` [PATCH 26/35] KVM: MMU: update invlpg handler comment Avi Kivity
2009-11-19 13:35 ` [PATCH 27/35] KVM: VMX: Remove vmx->msr_offset_efer Avi Kivity
2009-11-19 13:35 ` [PATCH 28/35] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP Avi Kivity
2009-11-19 13:35 ` [PATCH 29/35] KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic Avi Kivity
2009-11-19 13:35 ` [PATCH 30/35] KVM: only clear irq_source_id if irqchip is present Avi Kivity
2009-11-19 13:35 ` [PATCH 31/35] KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG Avi Kivity
2009-11-19 13:35 ` [PATCH 32/35] KVM: Reorder IOCTLs in main kvm.h Avi Kivity
2009-11-19 13:35 ` [PATCH 33/35] KVM: Allow internal errors reported to userspace to carry extra data Avi Kivity
2009-11-19 13:35 ` [PATCH 34/35] KVM: VMX: Report unexpected simultaneous exceptions as internal errors Avi Kivity
2009-11-19 13:35 ` [PATCH 35/35] KVM: x86: Add KVM_GET/SET_VCPU_EVENTS Avi Kivity