* [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4)
@ 2009-08-19 13:01 Avi Kivity
2009-08-19 13:01 ` [PATCH 01/47] KVM: Return to userspace on emulation failure Avi Kivity
` (46 more replies)
0 siblings, 47 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:01 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
Second batch of the KVM patch queue. Happy reviewing.
Alexander Graf (4):
x86: Add definition for IGNNE MSR
KVM: Implement MSRs used by Hyper-V
KVM: SVM: Implement INVLPGA
KVM: SVM: Improve nested interrupt injection
Andre Przywara (10):
KVM: Move performance counter MSR access interception to generic x86
path
KVM: Allow emulation of syscalls instructions on #UD
KVM: x86 emulator: Add missing EFLAGS bit definitions
KVM: x86 emulator: Prepare for emulation of syscall instructions
KVM: x86 emulator: add syscall emulation
KVM: x86 emulator: Add sysenter emulation
KVM: x86 emulator: Add sysexit emulation
KVM: ignore AMDs HWCR register access to set the FFDIS bit
KVM: ignore reads from AMDs C1E enabled MSR
KVM: introduce module parameter for ignoring unknown MSRs accesses
Avi Kivity (3):
KVM: Return to userspace on emulation failure
KVM: VMX: Only reload guest cr2 if different from host cr2
KVM: SVM: Don't save/restore host cr2
Christian Borntraeger (1):
KVM: s390: Fix memslot initialization for userspace_addr != 0
Gleb Natapov (2):
KVM: Replace pending exception by PF if it happens serially
KVM: Optimize searching for highest IRR
Jan Kiszka (2):
KVM: Fix racy event propagation in timer
KVM: Drop useless atomic test from timer function
Jiri Slaby (1):
KVM: fix lock imbalance
Joerg Roedel (2):
hugetlbfs: export vma_kernel_pagesize to modules
KVM: Prepare memslot data structures for multiple hugepage sizes
Marcelo Tosatti (16):
KVM: MMU: introduce is_last_spte helper
KVM: MMU audit: update count_writable_mappings / count_rmaps
KVM: MMU audit: update audit_write_protection
KVM: MMU audit: nontrapping ptes in nonleaf level
KVM: MMU audit: audit_mappings tweaks
KVM: MMU audit: largepage handling
KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits
KVM: MMU: make for_each_shadow_entry aware of largepages
KVM: MMU: add kvm_mmu_get_spte_hierarchy helper
KVM: VMX: EPT misconfiguration handler
KVM: VMX: conditionally disable 2M pages
KVM: convert custom marker based tracing to event traces
KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths
KVM: powerpc: convert marker probes to event trace
KVM: remove old KVMTRACE support code
KVM: use vcpu_id instead of bsp_vcpu pointer in kvm_vcpu_is_bsp
Michael S. Tsirkin (6):
KVM: document locking for kvm_io_device_ops
KVM: switch coalesced mmio changes to slots_lock
KVM: switch pit creation to slots_lock
KVM: convert bus to slots_lock
KVM: remove in_range from io devices
KVM: document lock nesting rule
arch/ia64/include/asm/kvm_host.h | 3 +-
arch/ia64/kvm/Kconfig | 3 -
arch/ia64/kvm/kvm-ia64.c | 28 +---
arch/powerpc/include/asm/kvm_host.h | 3 +-
arch/powerpc/kvm/44x_tlb.c | 11 +-
arch/powerpc/kvm/Kconfig | 11 --
arch/powerpc/kvm/Makefile | 4 +-
arch/powerpc/kvm/e500_tlb.c | 16 +-
arch/powerpc/kvm/emulate.c | 3 +-
arch/powerpc/kvm/powerpc.c | 3 +
arch/powerpc/kvm/trace.h | 104 +++++++++++++
arch/s390/include/asm/kvm_host.h | 6 +-
arch/s390/kvm/Kconfig | 3 -
arch/x86/include/asm/kvm_host.h | 14 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 7 +
arch/x86/kvm/Kconfig | 12 --
arch/x86/kvm/Makefile | 5 +-
arch/x86/kvm/i8254.c | 56 ++++----
arch/x86/kvm/i8259.c | 22 ++-
arch/x86/kvm/lapic.c | 75 ++++++----
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/mmu.c | 222 ++++++++++++++++++++-------
arch/x86/kvm/mmu.h | 2 +
arch/x86/kvm/paging_tmpl.h | 3 +-
arch/x86/kvm/svm.c | 175 +++++++++++++---------
arch/x86/kvm/timer.c | 14 +-
arch/x86/kvm/trace.h | 260 +++++++++++++++++++++++++++++++
arch/x86/kvm/vmx.c | 208 +++++++++++++++++++------
arch/x86/kvm/x86.c | 290 ++++++++++++++++++++---------------
arch/x86/kvm/x86_emulate.c | 240 ++++++++++++++++++++++++++++-
include/linux/kvm.h | 38 +----
include/linux/kvm_host.h | 47 ++-----
include/trace/events/kvm.h | 57 +++++++
mm/hugetlb.c | 1 +
virt/kvm/coalesced_mmio.c | 28 ++--
virt/kvm/ioapic.c | 25 ++--
virt/kvm/iodev.h | 42 +++---
virt/kvm/irq_comm.c | 5 +
virt/kvm/kvm_main.c | 119 +++++++++++----
virt/kvm/kvm_trace.c | 285 ----------------------------------
41 files changed, 1574 insertions(+), 878 deletions(-)
create mode 100644 arch/powerpc/kvm/trace.h
create mode 100644 arch/x86/kvm/trace.h
create mode 100644 include/trace/events/kvm.h
delete mode 100644 virt/kvm/kvm_trace.c
* [PATCH 01/47] KVM: Return to userspace on emulation failure
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
@ 2009-08-19 13:01 ` Avi Kivity
2009-08-19 13:01 ` [PATCH 02/47] KVM: MMU: introduce is_last_spte helper Avi Kivity
` (45 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:01 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
Instead of mindlessly retrying to execute the instruction, report the
failure to userspace.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 5 +++--
include/linux/kvm.h | 7 +++++++
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5f97dbd..b6e4cda 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2673,8 +2673,9 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
++vcpu->stat.mmio_exits;
return 0;
case EMULATE_FAIL:
- kvm_report_emulation_failure(vcpu, "pagetable");
- return 1;
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+ return 0;
default:
BUG();
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 5037e17..6710518 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -95,6 +95,10 @@ struct kvm_pit_config {
#define KVM_EXIT_S390_RESET 14
#define KVM_EXIT_DCR 15
#define KVM_EXIT_NMI 16
+#define KVM_EXIT_INTERNAL_ERROR 17
+
+/* For KVM_EXIT_INTERNAL_ERROR */
+#define KVM_INTERNAL_ERROR_EMULATION 1
/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
struct kvm_run {
@@ -181,6 +185,9 @@ struct kvm_run {
__u32 data;
__u8 is_write;
} dcr;
+ struct {
+ __u32 suberror;
+ } internal;
/* Fix the size of the union. */
char padding[256];
};
--
1.6.3.3
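(For illustration only, not part of this patch: a userspace VMM built against the
updated header might consume the new exit reason roughly like the sketch below;
vcpu_fd and the mmap'ed kvm_run area are assumed to be set up elsewhere.)

#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Minimal sketch: run the vcpu once and report an emulation failure. */
static int run_vcpu_once(int vcpu_fd, struct kvm_run *run)
{
	if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
		return -1;

	switch (run->exit_reason) {
	case KVM_EXIT_INTERNAL_ERROR:
		if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION)
			fprintf(stderr, "kvm: instruction emulation failed, stopping vcpu\n");
		return -1;
	default:
		/* MMIO, PIO, ... handled elsewhere in a real VMM */
		return 0;
	}
}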
* [PATCH 02/47] KVM: MMU: introduce is_last_spte helper
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
2009-08-19 13:01 ` [PATCH 01/47] KVM: Return to userspace on emulation failure Avi Kivity
@ 2009-08-19 13:01 ` Avi Kivity
2009-08-19 13:01 ` [PATCH 03/47] KVM: MMU audit: update count_writable_mappings / count_rmaps Avi Kivity
` (44 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:01 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Hide some of the last-largepage / level interaction (which is useful
for gbpages and for zero-based levels).
Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 26 +++++++++++++-------------
1 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b6e4cda..f85d995 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -250,6 +250,15 @@ static int is_rmap_spte(u64 pte)
return is_shadow_present_pte(pte);
}
+static int is_last_spte(u64 pte, int level)
+{
+ if (level == PT_PAGE_TABLE_LEVEL)
+ return 1;
+ if (level == PT_DIRECTORY_LEVEL && is_large_pte(pte))
+ return 1;
+ return 0;
+}
+
static pfn_t spte_to_pfn(u64 pte)
{
return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -1313,25 +1322,17 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
pt = sp->spt;
- if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
- for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
- if (is_shadow_present_pte(pt[i]))
- rmap_remove(kvm, &pt[i]);
- pt[i] = shadow_trap_nonpresent_pte;
- }
- return;
- }
-
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
ent = pt[i];
if (is_shadow_present_pte(ent)) {
- if (!is_large_pte(ent)) {
+ if (!is_last_spte(ent, sp->role.level)) {
ent &= PT64_BASE_ADDR_MASK;
mmu_page_remove_parent_pte(page_header(ent),
&pt[i]);
} else {
- --kvm->stat.lpages;
+ if (is_large_pte(ent))
+ --kvm->stat.lpages;
rmap_remove(kvm, &pt[i]);
}
}
@@ -2381,8 +2382,7 @@ static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
pte = *spte;
if (is_shadow_present_pte(pte)) {
- if (sp->role.level == PT_PAGE_TABLE_LEVEL ||
- is_large_pte(pte))
+ if (is_last_spte(pte, sp->role.level))
rmap_remove(vcpu->kvm, spte);
else {
child = page_header(pte & PT64_BASE_ADDR_MASK);
--
1.6.3.3
* [PATCH 03/47] KVM: MMU audit: update count_writable_mappings / count_rmaps
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
2009-08-19 13:01 ` [PATCH 01/47] KVM: Return to userspace on emulation failure Avi Kivity
2009-08-19 13:01 ` [PATCH 02/47] KVM: MMU: introduce is_last_spte helper Avi Kivity
@ 2009-08-19 13:01 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 04/47] KVM: MMU audit: update audit_write_protection Avi Kivity
` (43 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:01 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Under testing, count_writable_mappings returns a value that is 2 larger than
what count_rmaps returns.
The suspicion is that one of the two functions is counting a duplicate (either
positively or negatively).
Modifying check_writable_mappings_rmap to check for rmap existence on
all present MMU pages fails to trigger an error, which should keep Avi
happy.
Also introduce mmu_spte_walk to invoke a callback on all present sptes visible
to the current vcpu; this might be useful in the future.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 94 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f85d995..fd5579c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3045,6 +3045,55 @@ static gva_t canonicalize(gva_t gva)
return gva;
}
+
+typedef void (*inspect_spte_fn) (struct kvm *kvm, struct kvm_mmu_page *sp,
+ u64 *sptep);
+
+static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp,
+ inspect_spte_fn fn)
+{
+ int i;
+
+ for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+ u64 ent = sp->spt[i];
+
+ if (is_shadow_present_pte(ent)) {
+ if (sp->role.level > 1 && !is_large_pte(ent)) {
+ struct kvm_mmu_page *child;
+ child = page_header(ent & PT64_BASE_ADDR_MASK);
+ __mmu_spte_walk(kvm, child, fn);
+ }
+ if (sp->role.level == 1)
+ fn(kvm, sp, &sp->spt[i]);
+ }
+ }
+}
+
+static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
+{
+ int i;
+ struct kvm_mmu_page *sp;
+
+ if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
+ return;
+ if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+ hpa_t root = vcpu->arch.mmu.root_hpa;
+ sp = page_header(root);
+ __mmu_spte_walk(vcpu->kvm, sp, fn);
+ return;
+ }
+ for (i = 0; i < 4; ++i) {
+ hpa_t root = vcpu->arch.mmu.pae_root[i];
+
+ if (root && VALID_PAGE(root)) {
+ root &= PT64_BASE_ADDR_MASK;
+ sp = page_header(root);
+ __mmu_spte_walk(vcpu->kvm, sp, fn);
+ }
+ }
+ return;
+}
+
static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
gva_t va, int level)
{
@@ -3137,9 +3186,47 @@ static int count_rmaps(struct kvm_vcpu *vcpu)
return nmaps;
}
-static int count_writable_mappings(struct kvm_vcpu *vcpu)
+void inspect_spte_has_rmap(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *sptep)
+{
+ unsigned long *rmapp;
+ struct kvm_mmu_page *rev_sp;
+ gfn_t gfn;
+
+ if (*sptep & PT_WRITABLE_MASK) {
+ rev_sp = page_header(__pa(sptep));
+ gfn = rev_sp->gfns[sptep - rev_sp->spt];
+
+ if (!gfn_to_memslot(kvm, gfn)) {
+ if (!printk_ratelimit())
+ return;
+ printk(KERN_ERR "%s: no memslot for gfn %ld\n",
+ audit_msg, gfn);
+ printk(KERN_ERR "%s: index %ld of sp (gfn=%lx)\n",
+ audit_msg, sptep - rev_sp->spt,
+ rev_sp->gfn);
+ dump_stack();
+ return;
+ }
+
+ rmapp = gfn_to_rmap(kvm, rev_sp->gfns[sptep - rev_sp->spt], 0);
+ if (!*rmapp) {
+ if (!printk_ratelimit())
+ return;
+ printk(KERN_ERR "%s: no rmap for writable spte %llx\n",
+ audit_msg, *sptep);
+ dump_stack();
+ }
+ }
+
+}
+
+void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu)
+{
+ mmu_spte_walk(vcpu, inspect_spte_has_rmap);
+}
+
+static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu)
{
- int nmaps = 0;
struct kvm_mmu_page *sp;
int i;
@@ -3156,20 +3243,16 @@ static int count_writable_mappings(struct kvm_vcpu *vcpu)
continue;
if (!(ent & PT_WRITABLE_MASK))
continue;
- ++nmaps;
+ inspect_spte_has_rmap(vcpu->kvm, sp, &pt[i]);
}
}
- return nmaps;
+ return;
}
static void audit_rmap(struct kvm_vcpu *vcpu)
{
- int n_rmap = count_rmaps(vcpu);
- int n_actual = count_writable_mappings(vcpu);
-
- if (n_rmap != n_actual)
- printk(KERN_ERR "%s: (%s) rmap %d actual %d\n",
- __func__, audit_msg, n_rmap, n_actual);
+ check_writable_mappings_rmap(vcpu);
+ count_rmaps(vcpu);
}
static void audit_write_protection(struct kvm_vcpu *vcpu)
@@ -3203,6 +3286,7 @@ static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg)
audit_rmap(vcpu);
audit_write_protection(vcpu);
audit_mappings(vcpu);
+ audit_writable_sptes_have_rmaps(vcpu);
dbg = olddbg;
}
--
1.6.3.3
* [PATCH 04/47] KVM: MMU audit: update audit_write_protection
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (2 preceding siblings ...)
2009-08-19 13:01 ` [PATCH 03/47] KVM: MMU audit: update count_writable_mappings / count_rmaps Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 05/47] KVM: MMU audit: nontrapping ptes in nonleaf level Avi Kivity
` (42 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
- Unsync pages contain writable sptes in the rmap.
- rmaps do not exclusively contain writable sptes anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fd5579c..4c2585c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3260,20 +3260,28 @@ static void audit_write_protection(struct kvm_vcpu *vcpu)
struct kvm_mmu_page *sp;
struct kvm_memory_slot *slot;
unsigned long *rmapp;
+ u64 *spte;
gfn_t gfn;
list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) {
if (sp->role.direct)
continue;
+ if (sp->unsync)
+ continue;
gfn = unalias_gfn(vcpu->kvm, sp->gfn);
slot = gfn_to_memslot_unaliased(vcpu->kvm, sp->gfn);
rmapp = &slot->rmap[gfn - slot->base_gfn];
- if (*rmapp)
- printk(KERN_ERR "%s: (%s) shadow page has writable"
- " mappings: gfn %lx role %x\n",
+
+ spte = rmap_next(vcpu->kvm, rmapp, NULL);
+ while (spte) {
+ if (*spte & PT_WRITABLE_MASK)
+ printk(KERN_ERR "%s: (%s) shadow page has "
+ "writable mappings: gfn %lx role %x\n",
__func__, audit_msg, sp->gfn,
sp->role.word);
+ spte = rmap_next(vcpu->kvm, rmapp, spte);
+ }
}
}
--
1.6.3.3
* [PATCH 05/47] KVM: MMU audit: nontrapping ptes in nonleaf level
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (3 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 04/47] KVM: MMU audit: update audit_write_protection Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 06/47] KVM: MMU audit: audit_mappings tweaks Avi Kivity
` (41 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
It is valid to set non-leaf sptes as notrap.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 7 +------
1 files changed, 1 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4c2585c..8643351 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3109,12 +3109,7 @@ static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
va = canonicalize(va);
if (level > 1) {
- if (ent == shadow_notrap_nonpresent_pte)
- printk(KERN_ERR "audit: (%s) nontrapping pte"
- " in nonleaf level: levels %d gva %lx"
- " level %d pte %llx\n", audit_msg,
- vcpu->arch.mmu.root_level, va, level, ent);
- else
+ if (is_shadow_present_pte(ent))
audit_mappings_page(vcpu, ent, va, level - 1);
} else {
gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, va);
--
1.6.3.3
* [PATCH 06/47] KVM: MMU audit: audit_mappings tweaks
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (4 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 05/47] KVM: MMU audit: nontrapping ptes in nonleaf level Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 07/47] KVM: MMU audit: largepage handling Avi Kivity
` (40 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
- Fail early in case gfn_to_pfn returns is_error_pfn.
- For the pre pte write case, avoid spurious "gva is valid but spte is notrap"
messages (the emulation code does the guest write first, so this particular
case is OK).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8643351..50fe854 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3117,6 +3117,11 @@ static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn);
hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT;
+ if (is_error_pfn(pfn)) {
+ kvm_release_pfn_clean(pfn);
+ continue;
+ }
+
if (is_shadow_present_pte(ent)
&& (ent & PT64_BASE_ADDR_MASK) != hpa)
printk(KERN_ERR "xx audit error: (%s) levels %d"
@@ -3288,7 +3293,8 @@ static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg)
audit_msg = msg;
audit_rmap(vcpu);
audit_write_protection(vcpu);
- audit_mappings(vcpu);
+ if (strcmp("pre pte write", audit_msg) != 0)
+ audit_mappings(vcpu);
audit_writable_sptes_have_rmaps(vcpu);
dbg = olddbg;
}
--
1.6.3.3
* [PATCH 07/47] KVM: MMU audit: largepage handling
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (5 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 06/47] KVM: MMU audit: audit_mappings tweaks Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 08/47] KVM: Move performance counter MSR access interception to generic x86 path Avi Kivity
` (39 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Make the audit code aware of largepages.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 15 +++++++--------
1 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 50fe854..780ce3f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3058,12 +3058,11 @@ static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp,
u64 ent = sp->spt[i];
if (is_shadow_present_pte(ent)) {
- if (sp->role.level > 1 && !is_large_pte(ent)) {
+ if (!is_last_spte(ent, sp->role.level)) {
struct kvm_mmu_page *child;
child = page_header(ent & PT64_BASE_ADDR_MASK);
__mmu_spte_walk(kvm, child, fn);
- }
- if (sp->role.level == 1)
+ } else
fn(kvm, sp, &sp->spt[i]);
}
}
@@ -3108,10 +3107,9 @@ static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
continue;
va = canonicalize(va);
- if (level > 1) {
- if (is_shadow_present_pte(ent))
- audit_mappings_page(vcpu, ent, va, level - 1);
- } else {
+ if (is_shadow_present_pte(ent) && !is_last_spte(ent, level))
+ audit_mappings_page(vcpu, ent, va, level - 1);
+ else {
gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, va);
gfn_t gfn = gpa >> PAGE_SHIFT;
pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn);
@@ -3208,7 +3206,8 @@ void inspect_spte_has_rmap(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *sptep)
return;
}
- rmapp = gfn_to_rmap(kvm, rev_sp->gfns[sptep - rev_sp->spt], 0);
+ rmapp = gfn_to_rmap(kvm, rev_sp->gfns[sptep - rev_sp->spt],
+ is_large_pte(*sptep));
if (!*rmapp) {
if (!printk_ratelimit())
return;
--
1.6.3.3
* [PATCH 08/47] KVM: Move performance counter MSR access interception to generic x86 path
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (6 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 07/47] KVM: MMU audit: largepage handling Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 09/47] KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits Avi Kivity
` (38 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
The performance counter MSRs are different for AMD and Intel CPUs and they
are chosen mainly by the CPUID vendor string. This patch catches writes to
all of these MSR addresses (regardless of the VMX/SVM path) and handles them
in the generic
MSR handler routine. Writing a 0 into the event select register is something
we perfectly emulate ;-), so don't print out a warning to dmesg in this
case.
This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/svm.c | 16 ----------------
arch/x86/kvm/vmx.c | 12 ------------
arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 30 insertions(+), 28 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a7fa87b..e1dd47d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2143,22 +2143,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
else
svm_disable_lbrv(svm);
break;
- case MSR_K7_EVNTSEL0:
- case MSR_K7_EVNTSEL1:
- case MSR_K7_EVNTSEL2:
- case MSR_K7_EVNTSEL3:
- case MSR_K7_PERFCTR0:
- case MSR_K7_PERFCTR1:
- case MSR_K7_PERFCTR2:
- case MSR_K7_PERFCTR3:
- /*
- * Just discard all writes to the performance counters; this
- * should keep both older linux and windows 64-bit guests
- * happy
- */
- pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", ecx, data);
-
- break;
case MSR_VM_HSAVE_PA:
svm->hsave_msr = data;
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c08bb4c..6ee9292 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1025,18 +1025,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
rdtscll(host_tsc);
guest_write_tsc(data, host_tsc);
break;
- case MSR_P6_PERFCTR0:
- case MSR_P6_PERFCTR1:
- case MSR_P6_EVNTSEL0:
- case MSR_P6_EVNTSEL1:
- /*
- * Just discard all writes to the performance counters; this
- * should keep both older linux and windows 64-bit guests
- * happy
- */
- pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", msr_index, data);
-
- break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
vmcs_write64(GUEST_IA32_PAT, data);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 89862a8..30492f0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -886,6 +886,36 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
return set_msr_mce(vcpu, msr, data);
+
+ /* Performance counters are not protected by a CPUID bit,
+ * so we should check all of them in the generic path for the sake of
+ * cross vendor migration.
+ * Writing a zero into the event select MSRs disables them,
+ * which we perfectly emulate ;-). Any other value should be at least
+ * reported, some guests depend on them.
+ */
+ case MSR_P6_EVNTSEL0:
+ case MSR_P6_EVNTSEL1:
+ case MSR_K7_EVNTSEL0:
+ case MSR_K7_EVNTSEL1:
+ case MSR_K7_EVNTSEL2:
+ case MSR_K7_EVNTSEL3:
+ if (data != 0)
+ pr_unimpl(vcpu, "unimplemented perfctr wrmsr: "
+ "0x%x data 0x%llx\n", msr, data);
+ break;
+ /* at least RHEL 4 unconditionally writes to the perfctr registers,
+ * so we ignore writes to make it happy.
+ */
+ case MSR_P6_PERFCTR0:
+ case MSR_P6_PERFCTR1:
+ case MSR_K7_PERFCTR0:
+ case MSR_K7_PERFCTR1:
+ case MSR_K7_PERFCTR2:
+ case MSR_K7_PERFCTR3:
+ pr_unimpl(vcpu, "unimplemented perfctr wrmsr: "
+ "0x%x data 0x%llx\n", msr, data);
+ break;
default:
pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
return 1;
--
1.6.3.3
* [PATCH 09/47] KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (7 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 08/47] KVM: Move performance counter MSR access interception to generic x86 path Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 10/47] KVM: MMU: make for_each_shadow_entry aware of largepages Avi Kivity
` (37 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Required for EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/include/asm/vmx.h | 7 +++++++
arch/x86/kvm/vmx.c | 20 ++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index e7927a6..272514c 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -352,9 +352,16 @@ enum vmcs_field {
#define VMX_EPT_EXTENT_INDIVIDUAL_ADDR 0
#define VMX_EPT_EXTENT_CONTEXT 1
#define VMX_EPT_EXTENT_GLOBAL 2
+
+#define VMX_EPT_EXECUTE_ONLY_BIT (1ull)
+#define VMX_EPT_PAGE_WALK_4_BIT (1ull << 6)
+#define VMX_EPTP_UC_BIT (1ull << 8)
+#define VMX_EPTP_WB_BIT (1ull << 14)
+#define VMX_EPT_2MB_PAGE_BIT (1ull << 16)
#define VMX_EPT_EXTENT_INDIVIDUAL_BIT (1ull << 24)
#define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25)
#define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26)
+
#define VMX_EPT_DEFAULT_GAW 3
#define VMX_EPT_MAX_GAW 0x4
#define VMX_EPT_MT_EPTE_SHIFT 3
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6ee9292..6610181 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -270,6 +270,26 @@ static inline bool cpu_has_vmx_flexpriority(void)
cpu_has_vmx_virtualize_apic_accesses();
}
+static inline bool cpu_has_vmx_ept_execute_only(void)
+{
+ return !!(vmx_capability.ept & VMX_EPT_EXECUTE_ONLY_BIT);
+}
+
+static inline bool cpu_has_vmx_eptp_uncacheable(void)
+{
+ return !!(vmx_capability.ept & VMX_EPTP_UC_BIT);
+}
+
+static inline bool cpu_has_vmx_eptp_writeback(void)
+{
+ return !!(vmx_capability.ept & VMX_EPTP_WB_BIT);
+}
+
+static inline bool cpu_has_vmx_ept_2m_page(void)
+{
+ return !!(vmx_capability.ept & VMX_EPT_2MB_PAGE_BIT);
+}
+
static inline int cpu_has_vmx_invept_individual_addr(void)
{
return !!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT);
--
1.6.3.3
* [PATCH 10/47] KVM: MMU: make for_each_shadow_entry aware of largepages
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (8 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 09/47] KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 11/47] KVM: MMU: add kvm_mmu_get_spte_hierarchy helper Avi Kivity
` (36 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
This way there is no need to add explicit checks in every
for_each_shadow_entry user.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 780ce3f..e18f65b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1302,6 +1302,11 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
{
if (iterator->level < PT_PAGE_TABLE_LEVEL)
return false;
+
+ if (iterator->level == PT_PAGE_TABLE_LEVEL)
+ if (is_large_pte(*iterator->sptep))
+ return false;
+
iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
return true;
--
1.6.3.3
* [PATCH 11/47] KVM: MMU: add kvm_mmu_get_spte_hierarchy helper
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (9 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 10/47] KVM: MMU: make for_each_shadow_entry aware of largepages Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 12/47] KVM: VMX: EPT misconfiguration handler Avi Kivity
` (35 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Required by EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/mmu.c | 18 ++++++++++++++++++
arch/x86/kvm/mmu.h | 2 ++
2 files changed, 20 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e18f65b..12974de 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3038,6 +3038,24 @@ out:
return r;
}
+int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
+{
+ struct kvm_shadow_walk_iterator iterator;
+ int nr_sptes = 0;
+
+ spin_lock(&vcpu->kvm->mmu_lock);
+ for_each_shadow_entry(vcpu, addr, iterator) {
+ sptes[iterator.level-1] = *iterator.sptep;
+ nr_sptes++;
+ if (!is_shadow_present_pte(*iterator.sptep))
+ break;
+ }
+ spin_unlock(&vcpu->kvm->mmu_lock);
+
+ return nr_sptes;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
+
#ifdef AUDIT
static const char *audit_msg;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 016bf71..61a1b38 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -37,6 +37,8 @@
#define PT32_ROOT_LEVEL 2
#define PT32E_ROOT_LEVEL 3
+int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
+
static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
{
if (unlikely(vcpu->kvm->arch.n_free_mmu_pages < KVM_MIN_FREE_MMU_PAGES))
--
1.6.3.3
* [PATCH 12/47] KVM: VMX: EPT misconfiguration handler
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (10 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 11/47] KVM: MMU: add kvm_mmu_get_spte_hierarchy helper Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 13/47] KVM: VMX: conditionally disable 2M pages Avi Kivity
` (34 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Handler for EPT misconfiguration which checks for valid state
in the shadow pagetables, printing the spte on each level.
The separate WARN_ONs are useful for kerneloops.org.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/vmx.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 85 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6610181..94c07ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3227,6 +3227,89 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
return kvm_mmu_page_fault(vcpu, gpa & PAGE_MASK, 0);
}
+static u64 ept_rsvd_mask(u64 spte, int level)
+{
+ int i;
+ u64 mask = 0;
+
+ for (i = 51; i > boot_cpu_data.x86_phys_bits; i--)
+ mask |= (1ULL << i);
+
+ if (level > 2)
+ /* bits 7:3 reserved */
+ mask |= 0xf8;
+ else if (level == 2) {
+ if (spte & (1ULL << 7))
+ /* 2MB ref, bits 20:12 reserved */
+ mask |= 0x1ff000;
+ else
+ /* bits 6:3 reserved */
+ mask |= 0x78;
+ }
+
+ return mask;
+}
+
+static void ept_misconfig_inspect_spte(struct kvm_vcpu *vcpu, u64 spte,
+ int level)
+{
+ printk(KERN_ERR "%s: spte 0x%llx level %d\n", __func__, spte, level);
+
+ /* 010b (write-only) */
+ WARN_ON((spte & 0x7) == 0x2);
+
+ /* 110b (write/execute) */
+ WARN_ON((spte & 0x7) == 0x6);
+
+ /* 100b (execute-only) and value not supported by logical processor */
+ if (!cpu_has_vmx_ept_execute_only())
+ WARN_ON((spte & 0x7) == 0x4);
+
+ /* not 000b */
+ if ((spte & 0x7)) {
+ u64 rsvd_bits = spte & ept_rsvd_mask(spte, level);
+
+ if (rsvd_bits != 0) {
+ printk(KERN_ERR "%s: rsvd_bits = 0x%llx\n",
+ __func__, rsvd_bits);
+ WARN_ON(1);
+ }
+
+ if (level == 1 || (level == 2 && (spte & (1ULL << 7)))) {
+ u64 ept_mem_type = (spte & 0x38) >> 3;
+
+ if (ept_mem_type == 2 || ept_mem_type == 3 ||
+ ept_mem_type == 7) {
+ printk(KERN_ERR "%s: ept_mem_type=0x%llx\n",
+ __func__, ept_mem_type);
+ WARN_ON(1);
+ }
+ }
+ }
+}
+
+static int handle_ept_misconfig(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+ u64 sptes[4];
+ int nr_sptes, i;
+ gpa_t gpa;
+
+ gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+
+ printk(KERN_ERR "EPT: Misconfiguration.\n");
+ printk(KERN_ERR "EPT: GPA: 0x%llx\n", gpa);
+
+ nr_sptes = kvm_mmu_get_spte_hierarchy(vcpu, gpa, sptes);
+
+ for (i = PT64_ROOT_LEVEL; i > PT64_ROOT_LEVEL - nr_sptes; --i)
+ ept_misconfig_inspect_spte(vcpu, sptes[i-1], i);
+
+ kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
+ kvm_run->hw.hardware_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+
+ return 0;
+}
+
static int handle_nmi_window(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
u32 cpu_based_vm_exec_control;
@@ -3306,8 +3389,9 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu,
[EXIT_REASON_APIC_ACCESS] = handle_apic_access,
[EXIT_REASON_WBINVD] = handle_wbinvd,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
- [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
[EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check,
+ [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
+ [EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
};
static const int kvm_vmx_max_exit_handlers =
--
1.6.3.3
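(Background for readers not versed in the EPT entry format, per the Intel SDM:
bits 2:0 of an EPT entry are the read/write/execute permissions, bits 5:3 the
memory type of a leaf entry, and bit 7 marks a 2MB page at the directory level.
The standalone sketch below, which is not kernel code, decodes a hypothetical
spte the same way the handler above does.)

#include <stdio.h>

int main(void)
{
	unsigned long long spte = 0x77ULL;	/* hypothetical example value */
	unsigned perm  = spte & 0x7;		/* 111b = read/write/execute */
	unsigned mtype = (spte >> 3) & 0x7;	/* 6 = write-back */
	int large      = !!(spte & (1ULL << 7));

	printf("perm=%u mtype=%u large=%d\n", perm, mtype, large);
	/*
	 * The handler warns on perm 010b (write-only), 110b (write+execute)
	 * and, without the execute-only capability, 100b (execute-only);
	 * memory types 2, 3 and 7 are reserved.
	 */
	return 0;
}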
* [PATCH 13/47] KVM: VMX: conditionally disable 2M pages
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (11 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 12/47] KVM: VMX: EPT misconfiguration handler Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 14/47] KVM: Replace pending exception by PF if it happens serially Avi Kivity
` (33 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.
[avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/vmx.c | 3 +++
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 14 ++++++++++++--
3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 94c07ad..fc8d49c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1381,6 +1381,9 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_tpr_shadow())
kvm_x86_ops->update_cr8_intercept = NULL;
+ if (enable_ept && !cpu_has_vmx_ept_2m_page())
+ kvm_disable_largepages();
+
return alloc_kvm_area();
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c6e4d02..6988858 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -224,6 +224,7 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
struct kvm_memory_slot old,
int user_alloc);
+void kvm_disable_largepages(void);
void kvm_arch_flush_shadow(struct kvm *kvm);
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 777fe53..48d5e69 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -85,6 +85,8 @@ static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
static bool kvm_rebooting;
+static bool largepages_enabled = true;
+
#ifdef KVM_CAP_DEVICE_ASSIGNMENT
static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
int assigned_dev_id)
@@ -1174,9 +1176,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
ugfn = new.userspace_addr >> PAGE_SHIFT;
/*
* If the gfn and userspace address are not aligned wrt each
- * other, disable large page support for this slot
+ * other, or if explicitly asked to, disable large page
+ * support for this slot
*/
- if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1))
+ if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1) ||
+ !largepages_enabled)
for (i = 0; i < largepages; ++i)
new.lpage_info[i].write_count = 1;
}
@@ -1291,6 +1295,12 @@ out:
return r;
}
+void kvm_disable_largepages(void)
+{
+ largepages_enabled = false;
+}
+EXPORT_SYMBOL_GPL(kvm_disable_largepages);
+
int is_error_page(struct page *page)
{
return page == bad_page;
--
1.6.3.3
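(Side note on the alignment test touched in kvm_main.c above: a slot may only
use large pages when its guest-physical and host-virtual start addresses share
the same offset within a 2MB window. A small standalone illustration with
made-up numbers follows; KVM_PAGES_PER_HPAGE is 512 for 2MB huge pages over
4KB base pages.)

#include <stdio.h>

#define PAGE_SHIFT		12
#define KVM_PAGES_PER_HPAGE	512ULL	/* 2MB / 4KB */

int main(void)
{
	unsigned long long base_gfn = 0x100000;			/* slot start, guest frames */
	unsigned long long userspace_addr = 0x7f0000201000ULL;	/* slot start, host VA */
	unsigned long long ugfn = userspace_addr >> PAGE_SHIFT;

	if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1))
		printf("offsets differ within the 2MB window: large pages disabled for this slot\n");
	else
		printf("slot is hugepage-aligned: large pages allowed (unless disabled globally)\n");
	return 0;
}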
* [PATCH 14/47] KVM: Replace pending exception by PF if it happens serially
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (12 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 13/47] KVM: VMX: conditionally disable 2M pages Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 15/47] KVM: Optimize searching for highest IRR Avi Kivity
` (32 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Gleb Natapov <gleb@redhat.com>
Replace the previous exception with a new one in the hope that instruction
re-execution will regenerate the lost exception.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/x86.c | 20 +++++++++++++-------
1 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 30492f0..a066876 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -181,16 +181,22 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
++vcpu->stat.pf_guest;
if (vcpu->arch.exception.pending) {
- if (vcpu->arch.exception.nr == PF_VECTOR) {
- printk(KERN_DEBUG "kvm: inject_page_fault:"
- " double fault 0x%lx\n", addr);
- vcpu->arch.exception.nr = DF_VECTOR;
- vcpu->arch.exception.error_code = 0;
- } else if (vcpu->arch.exception.nr == DF_VECTOR) {
+ switch(vcpu->arch.exception.nr) {
+ case DF_VECTOR:
/* triple fault -> shutdown */
set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
+ return;
+ case PF_VECTOR:
+ vcpu->arch.exception.nr = DF_VECTOR;
+ vcpu->arch.exception.error_code = 0;
+ return;
+ default:
+ /* replace previous exception with a new one in a hope
+ that instruction re-execution will regenerate lost
+ exception */
+ vcpu->arch.exception.pending = false;
+ break;
}
- return;
}
vcpu->arch.cr2 = addr;
kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
--
1.6.3.3
* [PATCH 15/47] KVM: Optimize searching for highest IRR
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (13 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 14/47] KVM: Replace pending exception by PF if it happens serially Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 16/47] KVM: Fix racy event propagation in timer Avi Kivity
` (31 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Gleb Natapov <gleb@redhat.com>
Most of the time the IRR is empty, so instead of scanning the whole IRR on
each VM entry, keep a variable that tells us whether the IRR is non-empty. The
IRR then has to be scanned twice on each IRQ delivery, but IRQ delivery is much
rarer than VM entry.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/lapic.c | 24 +++++++++++++++++++++---
arch/x86/kvm/lapic.h | 1 +
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b1694dc..3bde43c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -165,29 +165,46 @@ static int find_highest_vector(void *bitmap)
static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
{
+ apic->irr_pending = true;
return apic_test_and_set_vector(vec, apic->regs + APIC_IRR);
}
-static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
+static inline int apic_search_irr(struct kvm_lapic *apic)
{
- apic_clear_vector(vec, apic->regs + APIC_IRR);
+ return find_highest_vector(apic->regs + APIC_IRR);
}
static inline int apic_find_highest_irr(struct kvm_lapic *apic)
{
int result;
- result = find_highest_vector(apic->regs + APIC_IRR);
+ if (!apic->irr_pending)
+ return -1;
+
+ result = apic_search_irr(apic);
ASSERT(result == -1 || result >= 16);
return result;
}
+static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
+{
+ apic->irr_pending = false;
+ apic_clear_vector(vec, apic->regs + APIC_IRR);
+ if (apic_search_irr(apic) != -1)
+ apic->irr_pending = true;
+}
+
int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
int highest_irr;
+ /* This may race with setting of irr in __apic_accept_irq() and
+ * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
+ * will cause vmexit immediately and the value will be recalculated
+ * on the next vmentry.
+ */
if (!apic)
return 0;
highest_irr = apic_find_highest_irr(apic);
@@ -843,6 +860,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
}
+ apic->irr_pending = false;
update_divide_count(apic);
atomic_set(&apic->lapic_timer.pending, 0);
if (kvm_vcpu_is_bsp(vcpu))
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a587f83..3f3ecc6 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -12,6 +12,7 @@ struct kvm_lapic {
struct kvm_timer lapic_timer;
u32 divide_count;
struct kvm_vcpu *vcpu;
+ bool irr_pending;
struct page *regs_page;
void *regs;
gpa_t vapic_addr;
--
1.6.3.3
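(The pattern here generalizes: cache a "maybe non-empty" flag next to a bitmap
so the common empty case costs a single test instead of a scan, set the flag
before setting a bit, and rescan once when clearing. A distilled standalone
model of that idea, not the KVM code itself:)

#include <stdbool.h>
#include <stdio.h>

#define NVECS 256
#define BITS_PER_WORD (8 * sizeof(unsigned long))

struct irr_model {
	unsigned long bits[NVECS / BITS_PER_WORD];
	bool pending;	/* true if any bit may be set */
};

static void irr_set(struct irr_model *irr, int vec)
{
	irr->pending = true;	/* flag first, as in apic_test_and_set_irr() */
	irr->bits[vec / BITS_PER_WORD] |= 1UL << (vec % BITS_PER_WORD);
}

static int irr_highest(const struct irr_model *irr)
{
	if (!irr->pending)
		return -1;	/* fast path: skip the scan entirely */
	for (int vec = NVECS - 1; vec >= 0; vec--)
		if (irr->bits[vec / BITS_PER_WORD] & (1UL << (vec % BITS_PER_WORD)))
			return vec;
	return -1;
}

static void irr_clear(struct irr_model *irr, int vec)
{
	irr->bits[vec / BITS_PER_WORD] &= ~(1UL << (vec % BITS_PER_WORD));
	irr->pending = false;	/* rescan once, mirroring apic_clear_irr() */
	for (unsigned i = 0; i < NVECS / BITS_PER_WORD; i++)
		if (irr->bits[i])
			irr->pending = true;
}

int main(void)
{
	struct irr_model irr = { { 0 }, false };

	irr_set(&irr, 0xed);
	printf("%d\n", irr_highest(&irr));	/* 237 */
	irr_clear(&irr, 0xed);
	printf("%d\n", irr_highest(&irr));	/* -1 */
	return 0;
}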
* [PATCH 16/47] KVM: Fix racy event propagation in timer
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (14 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 15/47] KVM: Optimize searching for highest IRR Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 17/47] KVM: Drop useless atomic test from timer function Avi Kivity
` (30 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Jan Kiszka <jan.kiszka@siemens.com>
Minor issue that likely had no practical relevance: so far the kvm timer
function incremented the pending counter and then possibly reset it to 1
again in case reinjection was disabled. This opened a small racy window
with the corresponding VCPU loop, which may have happened to run on
another (real) CPU and already consumed the value.
Fix it by skipping the increment in case pending is already > 0. This
opens a different race window, but it can only rarely cause lost events,
and only in a case where we do not care about them anyway (!reinject).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/timer.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index 85cc743..1baed41 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -9,12 +9,16 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer)
int restart_timer = 0;
wait_queue_head_t *q = &vcpu->wq;
- /* FIXME: this code should not know anything about vcpus */
- if (!atomic_inc_and_test(&ktimer->pending))
- set_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
-
- if (!ktimer->reinject)
- atomic_set(&ktimer->pending, 1);
+ /*
+ * There is a race window between reading and incrementing, but we do
+ * not care about potentially loosing timer events in the !reinject
+ * case anyway.
+ */
+ if (ktimer->reinject || !atomic_read(&ktimer->pending)) {
+ /* FIXME: this code should not know anything about vcpus */
+ if (!atomic_inc_and_test(&ktimer->pending))
+ set_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
+ }
if (waitqueue_active(q))
wake_up_interruptible(q);
--
1.6.3.3
* [PATCH 17/47] KVM: Drop useless atomic test from timer function
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (15 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 16/47] KVM: Fix racy event propagation in timer Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 18/47] KVM: VMX: Only reload guest cr2 if different from host cr2 Avi Kivity
` (29 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Jan Kiszka <jan.kiszka@siemens.com>
The current code tries to optimize the setting of KVM_REQ_PENDING_TIMER,
but the !atomic_inc_and_test() check it uses is always true unless pending
had the invalid value of -1 on entry. This patch drops the test part,
preserving the original semantics while expressing them less confusingly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/timer.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index 1baed41..eea4043 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -15,9 +15,9 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer)
* case anyway.
*/
if (ktimer->reinject || !atomic_read(&ktimer->pending)) {
+ atomic_inc(&ktimer->pending);
/* FIXME: this code should not know anything about vcpus */
- if (!atomic_inc_and_test(&ktimer->pending))
- set_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
+ set_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
}
if (waitqueue_active(q))
--
1.6.3.3
* [PATCH 18/47] KVM: VMX: Only reload guest cr2 if different from host cr2
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (16 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 17/47] KVM: Drop useless atomic test from timer function Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 19/47] KVM: SVM: Don't save/restore " Avi Kivity
` (28 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
cr2 changes only rarely, and writing it is expensive. Avoid the costly cr2
write by skipping it when cr2 already holds the desired value.
Shaves 70 cycles off the vmexit latency.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/vmx.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fc8d49c..1a84ca1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3651,11 +3651,16 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
"mov %%"R"sp, %c[host_rsp](%0) \n\t"
__ex(ASM_VMX_VMWRITE_RSP_RDX) "\n\t"
"1: \n\t"
+ /* Reload cr2 if changed */
+ "mov %c[cr2](%0), %%"R"ax \n\t"
+ "mov %%cr2, %%"R"dx \n\t"
+ "cmp %%"R"ax, %%"R"dx \n\t"
+ "je 2f \n\t"
+ "mov %%"R"ax, %%cr2 \n\t"
+ "2: \n\t"
/* Check if vmlaunch of vmresume is needed */
"cmpl $0, %c[launched](%0) \n\t"
/* Load guest registers. Don't clobber flags. */
- "mov %c[cr2](%0), %%"R"ax \n\t"
- "mov %%"R"ax, %%cr2 \n\t"
"mov %c[rax](%0), %%"R"ax \n\t"
"mov %c[rbx](%0), %%"R"bx \n\t"
"mov %c[rdx](%0), %%"R"dx \n\t"
--
1.6.3.3
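(In C terms the added assembly amounts to the sketch below; the real code has
to live in the guest-entry asm block because nothing else may run between it
and VMLAUNCH/VMRESUME.)

/* C rendition of the new asm sequence (sketch only). */
static void reload_guest_cr2(unsigned long guest_cr2)
{
	unsigned long host_cr2;

	asm volatile("mov %%cr2, %0" : "=r"(host_cr2));
	if (host_cr2 != guest_cr2)
		asm volatile("mov %0, %%cr2" : : "r"(guest_cr2));
}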
* [PATCH 19/47] KVM: SVM: Don't save/restore host cr2
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (17 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 18/47] KVM: VMX: Only reload guest cr2 if different from host cr2 Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 20/47] x86: Add definition for IGNNE MSR Avi Kivity
` (27 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
The host never reads cr2 in process context, so we are free to clobber it. The
vmx code already does this, so we can safely remove the save/restore code.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/svm.c | 17 -----------------
1 files changed, 0 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e1dd47d..5a1f26c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -82,7 +82,6 @@ struct vcpu_svm {
u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS];
u64 host_gs_base;
- unsigned long host_cr2;
u32 *msrpm;
struct vmcb *hsave;
@@ -187,19 +186,6 @@ static inline void invlpga(unsigned long addr, u32 asid)
asm volatile (__ex(SVM_INVLPGA) :: "a"(addr), "c"(asid));
}
-static inline unsigned long kvm_read_cr2(void)
-{
- unsigned long cr2;
-
- asm volatile ("mov %%cr2, %0" : "=r" (cr2));
- return cr2;
-}
-
-static inline void kvm_write_cr2(unsigned long val)
-{
- asm volatile ("mov %0, %%cr2" :: "r" (val));
-}
-
static inline void force_new_asid(struct kvm_vcpu *vcpu)
{
to_svm(vcpu)->asid_generation--;
@@ -2528,7 +2514,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
fs_selector = kvm_read_fs();
gs_selector = kvm_read_gs();
ldt_selector = kvm_read_ldt();
- svm->host_cr2 = kvm_read_cr2();
if (!is_nested(svm))
svm->vmcb->save.cr2 = vcpu->arch.cr2;
/* required for live migration with NPT */
@@ -2615,8 +2600,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
- kvm_write_cr2(svm->host_cr2);
-
kvm_load_fs(fs_selector);
kvm_load_gs(gs_selector);
kvm_load_ldt(ldt_selector);
--
1.6.3.3
* [PATCH 20/47] x86: Add definition for IGNNE MSR
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (18 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 19/47] KVM: SVM: Don't save/restore " Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 21/47] KVM: Implement MSRs used by Hyper-V Avi Kivity
` (26 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Alexander Graf <agraf@suse.de>
Hyper-V accesses MSR_IGNNE while running under KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/include/asm/msr-index.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6be7fc2..bd55490 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -374,6 +374,7 @@
/* AMD-V MSRs */
#define MSR_VM_CR 0xc0010114
+#define MSR_VM_IGNNE 0xc0010115
#define MSR_VM_HSAVE_PA 0xc0010117
#endif /* _ASM_X86_MSR_INDEX_H */
--
1.6.3.3
* [PATCH 21/47] KVM: Implement MSRs used by Hyper-V
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (19 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 20/47] x86: Add definition for IGNNE MSR Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 22/47] KVM: SVM: Implement INVLPGA Avi Kivity
` (25 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Alexander Graf <agraf@suse.de>
Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage.
But let's be nice today and let it have its way, because otherwise it fails
terribly.
[jaswinder: fix build for linux-next changes]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/svm.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5a1f26c..8f80190 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2132,6 +2132,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
case MSR_VM_HSAVE_PA:
svm->hsave_msr = data;
break;
+ case MSR_VM_CR:
+ case MSR_VM_IGNNE:
+ case MSR_K7_HWCR:
+ pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
+ break;
default:
return kvm_set_msr_common(vcpu, ecx, data);
}
--
1.6.3.3
* [PATCH 22/47] KVM: SVM: Implement INVLPGA
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (20 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 21/47] KVM: Implement MSRs used by Hyper-V Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 23/47] KVM: SVM: Improve nested interrupt injection Avi Kivity
` (24 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Alexander Graf <agraf@suse.de>
SVM adds another way to do INVLPG, by ASID, which Hyper-V makes use of,
so let's implement it!
For now we just do the same thing invlpg does, as ASID switching
means we flush the mmu anyway. That might change one day, though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/svm.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8f80190..cd5a081 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1883,6 +1883,19 @@ static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
return 1;
}
+static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ nsvm_printk("INVLPGA\n");
+
+ /* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
+ kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
+
+ svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+ skip_emulated_instruction(&svm->vcpu);
+ return 1;
+}
+
static int invalid_op_interception(struct vcpu_svm *svm,
struct kvm_run *kvm_run)
{
@@ -2228,7 +2241,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
[SVM_EXIT_INVD] = emulate_on_interception,
[SVM_EXIT_HLT] = halt_interception,
[SVM_EXIT_INVLPG] = invlpg_interception,
- [SVM_EXIT_INVLPGA] = invalid_op_interception,
+ [SVM_EXIT_INVLPGA] = invlpga_interception,
[SVM_EXIT_IOIO] = io_interception,
[SVM_EXIT_MSR] = msr_interception,
[SVM_EXIT_TASK_SWITCH] = task_switch_interception,
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 23/47] KVM: SVM: Improve nested interrupt injection
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (21 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 22/47] KVM: SVM: Implement INVLPGA Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 24/47] KVM: convert custom marker based tracing to event traces Avi Kivity
` (23 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Alexander Graf <agraf@suse.de>
While trying to get Hyper-V running, I realized that the interrupt injection
mechanisms that are in place right now are not 100% correct.
This patch makes nested SVM's interrupt injection behave more like it does on
a real machine.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/svm.c | 39 ++++++++++++++++++++++++---------------
1 files changed, 24 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index cd5a081..b2e23c8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1615,7 +1615,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
/* Kill any pending exceptions */
if (svm->vcpu.arch.exception.pending == true)
nsvm_printk("WARNING: Pending Exception\n");
- svm->vcpu.arch.exception.pending = false;
+ kvm_clear_exception_queue(&svm->vcpu);
+ kvm_clear_interrupt_queue(&svm->vcpu);
/* Restore selected save entries */
svm->vmcb->save.es = hsave->save.es;
@@ -1683,7 +1684,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
svm->nested_vmcb = svm->vmcb->save.rax;
/* Clear internal status */
- svm->vcpu.arch.exception.pending = false;
+ kvm_clear_exception_queue(&svm->vcpu);
+ kvm_clear_interrupt_queue(&svm->vcpu);
/* Save the old vmcb, so we don't need to pick what we save, but
can restore everything when a VMEXIT occurs */
@@ -2363,21 +2365,14 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
}
-static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
-
- svm->vmcb->control.event_inj = nr |
- SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
-}
-
static void svm_set_irq(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- nested_svm_intr(svm);
+ BUG_ON(!(svm->vcpu.arch.hflags & HF_GIF_MASK));
- svm_queue_irq(vcpu, vcpu->arch.interrupt.nr);
+ svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr |
+ SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
}
static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -2405,13 +2400,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
struct vmcb *vmcb = svm->vmcb;
return (vmcb->save.rflags & X86_EFLAGS_IF) &&
!(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
- (svm->vcpu.arch.hflags & HF_GIF_MASK);
+ (svm->vcpu.arch.hflags & HF_GIF_MASK) &&
+ !is_nested(svm);
}
static void enable_irq_window(struct kvm_vcpu *vcpu)
{
- svm_set_vintr(to_svm(vcpu));
- svm_inject_irq(to_svm(vcpu), 0x0);
+ struct vcpu_svm *svm = to_svm(vcpu);
+ nsvm_printk("Trying to open IRQ window\n");
+
+ nested_svm_intr(svm);
+
+ /* In case GIF=0 we can't rely on the CPU to tell us when
+ * GIF becomes 1, because that's a separate STGI/VMRUN intercept.
+ * The next time we get that intercept, this function will be
+ * called again though and we'll get the vintr intercept. */
+ if (svm->vcpu.arch.hflags & HF_GIF_MASK) {
+ svm_set_vintr(svm);
+ svm_inject_irq(svm, 0x0);
+ }
}
static void enable_nmi_window(struct kvm_vcpu *vcpu)
@@ -2490,6 +2497,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
case SVM_EXITINTINFO_TYPE_EXEPT:
/* In case of software exception do not reinject an exception
vector, but re-execute and instruction instead */
+ if (is_nested(svm))
+ break;
if (kvm_exception_is_soft(vector))
break;
if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 24/47] KVM: convert custom marker based tracing to event traces
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (22 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 23/47] KVM: SVM: Improve nested interrupt injection Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 25/47] KVM: Allow emulation of syscalls instructions on #UD Avi Kivity
` (22 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
This allows use of the powerful ftrace infrastructure.
See Documentation/trace/ for usage information.
[avi, stephen: various build fixes]
[sheng: fix control register breakage]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
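As a usage illustration (not part of this patch, and assuming debugfs is
mounted at /sys/kernel/debug with ftrace enabled), the new events can be
consumed from userspace roughly like this:

#include <stdio.h>

int main(void)
{
        FILE *enable, *tp;
        char line[512];

        /* enable every event in the new "kvm" trace subsystem */
        enable = fopen("/sys/kernel/debug/tracing/events/kvm/enable", "w");
        if (!enable)
                return 1;
        fputs("1", enable);
        fclose(enable);

        /* stream formatted records, e.g. "kvm_exit: reason npf rip 0x..." */
        tp = fopen("/sys/kernel/debug/tracing/trace_pipe", "r");
        if (!tp)
                return 1;
        while (fgets(line, sizeof(line), tp))
                fputs(line, stdout);
        fclose(tp);
        return 0;
}

The same can of course be done with echo and cat; the point is that the
exit_reasons_str hook added to kvm_x86_ops lets ftrace print symbolic exit
reasons instead of raw numbers.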
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/Makefile | 4 +
arch/x86/kvm/lapic.c | 7 +-
arch/x86/kvm/svm.c | 84 +++++++++----
arch/x86/kvm/trace.h | 260 +++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx.c | 78 +++++++-----
arch/x86/kvm/x86.c | 48 +++-----
include/trace/events/kvm.h | 57 +++++++++
virt/kvm/irq_comm.c | 5 +
virt/kvm/kvm_main.c | 4 +
10 files changed, 463 insertions(+), 86 deletions(-)
create mode 100644 arch/x86/kvm/trace.h
create mode 100644 include/trace/events/kvm.h
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c7b0cc2..19027ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -14,6 +14,7 @@
#include <linux/types.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
+#include <linux/tracepoint.h>
#include <linux/kvm.h>
#include <linux/kvm_para.h>
@@ -527,6 +528,7 @@ struct kvm_x86_ops {
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+ const struct trace_print_flags *exit_reasons_str;
};
extern struct kvm_x86_ops *kvm_x86_ops;
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 01e3c61..7c56850 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -1,6 +1,10 @@
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
+CFLAGS_x86.o := -I.
+CFLAGS_svm.o := -I.
+CFLAGS_vmx.o := -I.
+
kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o)
kvm-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3bde43c..2e02865 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -34,6 +34,7 @@
#include <asm/atomic.h>
#include "kvm_cache_regs.h"
#include "irq.h"
+#include "trace.h"
#ifndef CONFIG_X86_64
#define mod_64(x, y) ((x) - (y) * div64_u64(x, y))
@@ -515,8 +516,6 @@ static u32 __apic_read(struct kvm_lapic *apic, unsigned int offset)
{
u32 val = 0;
- KVMTRACE_1D(APIC_ACCESS, apic->vcpu, (u32)offset, handler);
-
if (offset >= LAPIC_MMIO_LENGTH)
return 0;
@@ -562,6 +561,8 @@ static void apic_mmio_read(struct kvm_io_device *this,
}
result = __apic_read(apic, offset & ~0xf);
+ trace_kvm_apic_read(offset, result);
+
switch (len) {
case 1:
case 2:
@@ -657,7 +658,7 @@ static void apic_mmio_write(struct kvm_io_device *this,
offset &= 0xff0;
- KVMTRACE_1D(APIC_ACCESS, apic->vcpu, (u32)offset, handler);
+ trace_kvm_apic_write(offset, val);
switch (offset) {
case APIC_ID: /* Local APIC ID */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b2e23c8..081d00a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -25,10 +25,12 @@
#include <linux/vmalloc.h>
#include <linux/highmem.h>
#include <linux/sched.h>
+#include <linux/ftrace_event.h>
#include <asm/desc.h>
#include <asm/virtext.h>
+#include "trace.h"
#define __ex(x) __kvm_handle_fault_on_reboot(x)
@@ -1099,7 +1101,6 @@ static unsigned long svm_get_dr(struct kvm_vcpu *vcpu, int dr)
val = 0;
}
- KVMTRACE_2D(DR_READ, vcpu, (u32)dr, (u32)val, handler);
return val;
}
@@ -1108,8 +1109,6 @@ static void svm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long value,
{
struct vcpu_svm *svm = to_svm(vcpu);
- KVMTRACE_2D(DR_WRITE, vcpu, (u32)dr, (u32)value, handler);
-
*exception = 0;
switch (dr) {
@@ -1157,14 +1156,7 @@ static int pf_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
fault_address = svm->vmcb->control.exit_info_2;
error_code = svm->vmcb->control.exit_info_1;
- if (!npt_enabled)
- KVMTRACE_3D(PAGE_FAULT, &svm->vcpu, error_code,
- (u32)fault_address, (u32)(fault_address >> 32),
- handler);
- else
- KVMTRACE_3D(TDP_FAULT, &svm->vcpu, error_code,
- (u32)fault_address, (u32)(fault_address >> 32),
- handler);
+ trace_kvm_page_fault(fault_address, error_code);
/*
* FIXME: Tis shouldn't be necessary here, but there is a flush
* missing in the MMU code. Until we find this bug, flush the
@@ -1291,14 +1283,12 @@ static int io_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
static int nmi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
{
- KVMTRACE_0D(NMI, &svm->vcpu, handler);
return 1;
}
static int intr_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
{
++svm->vcpu.stat.irq_exits;
- KVMTRACE_0D(INTR, &svm->vcpu, handler);
return 1;
}
@@ -2080,8 +2070,7 @@ static int rdmsr_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
if (svm_get_msr(&svm->vcpu, ecx, &data))
kvm_inject_gp(&svm->vcpu, 0);
else {
- KVMTRACE_3D(MSR_READ, &svm->vcpu, ecx, (u32)data,
- (u32)(data >> 32), handler);
+ trace_kvm_msr_read(ecx, data);
svm->vcpu.arch.regs[VCPU_REGS_RAX] = data & 0xffffffff;
svm->vcpu.arch.regs[VCPU_REGS_RDX] = data >> 32;
@@ -2164,8 +2153,7 @@ static int wrmsr_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
u64 data = (svm->vcpu.arch.regs[VCPU_REGS_RAX] & -1u)
| ((u64)(svm->vcpu.arch.regs[VCPU_REGS_RDX] & -1u) << 32);
- KVMTRACE_3D(MSR_WRITE, &svm->vcpu, ecx, (u32)data, (u32)(data >> 32),
- handler);
+ trace_kvm_msr_write(ecx, data);
svm->next_rip = kvm_rip_read(&svm->vcpu) + 2;
if (svm_set_msr(&svm->vcpu, ecx, data))
@@ -2186,8 +2174,6 @@ static int msr_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
static int interrupt_window_interception(struct vcpu_svm *svm,
struct kvm_run *kvm_run)
{
- KVMTRACE_0D(PEND_INTR, &svm->vcpu, handler);
-
svm_clear_vintr(svm);
svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
/*
@@ -2266,8 +2252,7 @@ static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
u32 exit_code = svm->vmcb->control.exit_code;
- KVMTRACE_3D(VMEXIT, vcpu, exit_code, (u32)svm->vmcb->save.rip,
- (u32)((u64)svm->vmcb->save.rip >> 32), entryexit);
+ trace_kvm_exit(exit_code, svm->vmcb->save.rip);
if (is_nested(svm)) {
nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n",
@@ -2355,7 +2340,7 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
{
struct vmcb_control_area *control;
- KVMTRACE_1D(INJ_VIRQ, &svm->vcpu, (u32)irq, handler);
+ trace_kvm_inj_virq(irq);
++svm->vcpu.stat.irq_injections;
control = &svm->vmcb->control;
@@ -2718,6 +2703,59 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
return 0;
}
+static const struct trace_print_flags svm_exit_reasons_str[] = {
+ { SVM_EXIT_READ_CR0, "read_cr0" },
+ { SVM_EXIT_READ_CR3, "read_cr3" },
+ { SVM_EXIT_READ_CR4, "read_cr4" },
+ { SVM_EXIT_READ_CR8, "read_cr8" },
+ { SVM_EXIT_WRITE_CR0, "write_cr0" },
+ { SVM_EXIT_WRITE_CR3, "write_cr3" },
+ { SVM_EXIT_WRITE_CR4, "write_cr4" },
+ { SVM_EXIT_WRITE_CR8, "write_cr8" },
+ { SVM_EXIT_READ_DR0, "read_dr0" },
+ { SVM_EXIT_READ_DR1, "read_dr1" },
+ { SVM_EXIT_READ_DR2, "read_dr2" },
+ { SVM_EXIT_READ_DR3, "read_dr3" },
+ { SVM_EXIT_WRITE_DR0, "write_dr0" },
+ { SVM_EXIT_WRITE_DR1, "write_dr1" },
+ { SVM_EXIT_WRITE_DR2, "write_dr2" },
+ { SVM_EXIT_WRITE_DR3, "write_dr3" },
+ { SVM_EXIT_WRITE_DR5, "write_dr5" },
+ { SVM_EXIT_WRITE_DR7, "write_dr7" },
+ { SVM_EXIT_EXCP_BASE + DB_VECTOR, "DB excp" },
+ { SVM_EXIT_EXCP_BASE + BP_VECTOR, "BP excp" },
+ { SVM_EXIT_EXCP_BASE + UD_VECTOR, "UD excp" },
+ { SVM_EXIT_EXCP_BASE + PF_VECTOR, "PF excp" },
+ { SVM_EXIT_EXCP_BASE + NM_VECTOR, "NM excp" },
+ { SVM_EXIT_EXCP_BASE + MC_VECTOR, "MC excp" },
+ { SVM_EXIT_INTR, "interrupt" },
+ { SVM_EXIT_NMI, "nmi" },
+ { SVM_EXIT_SMI, "smi" },
+ { SVM_EXIT_INIT, "init" },
+ { SVM_EXIT_VINTR, "vintr" },
+ { SVM_EXIT_CPUID, "cpuid" },
+ { SVM_EXIT_INVD, "invd" },
+ { SVM_EXIT_HLT, "hlt" },
+ { SVM_EXIT_INVLPG, "invlpg" },
+ { SVM_EXIT_INVLPGA, "invlpga" },
+ { SVM_EXIT_IOIO, "io" },
+ { SVM_EXIT_MSR, "msr" },
+ { SVM_EXIT_TASK_SWITCH, "task_switch" },
+ { SVM_EXIT_SHUTDOWN, "shutdown" },
+ { SVM_EXIT_VMRUN, "vmrun" },
+ { SVM_EXIT_VMMCALL, "hypercall" },
+ { SVM_EXIT_VMLOAD, "vmload" },
+ { SVM_EXIT_VMSAVE, "vmsave" },
+ { SVM_EXIT_STGI, "stgi" },
+ { SVM_EXIT_CLGI, "clgi" },
+ { SVM_EXIT_SKINIT, "skinit" },
+ { SVM_EXIT_WBINVD, "wbinvd" },
+ { SVM_EXIT_MONITOR, "monitor" },
+ { SVM_EXIT_MWAIT, "mwait" },
+ { SVM_EXIT_NPF, "npf" },
+ { -1, NULL }
+};
+
static struct kvm_x86_ops svm_x86_ops = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -2779,6 +2817,8 @@ static struct kvm_x86_ops svm_x86_ops = {
.set_tss_addr = svm_set_tss_addr,
.get_tdp_level = get_npt_level,
.get_mt_mask = svm_get_mt_mask,
+
+ .exit_reasons_str = svm_exit_reasons_str,
};
static int __init svm_init(void)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
new file mode 100644
index 0000000..cd8c90d
--- /dev/null
+++ b/arch/x86/kvm/trace.h
@@ -0,0 +1,260 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+#define TRACE_INCLUDE_PATH arch/x86/kvm
+#define TRACE_INCLUDE_FILE trace
+
+/*
+ * Tracepoint for guest mode entry.
+ */
+TRACE_EVENT(kvm_entry,
+ TP_PROTO(unsigned int vcpu_id),
+ TP_ARGS(vcpu_id),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, vcpu_id )
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ ),
+
+ TP_printk("vcpu %u", __entry->vcpu_id)
+);
+
+/*
+ * Tracepoint for hypercall.
+ */
+TRACE_EVENT(kvm_hypercall,
+ TP_PROTO(unsigned long nr, unsigned long a0, unsigned long a1,
+ unsigned long a2, unsigned long a3),
+ TP_ARGS(nr, a0, a1, a2, a3),
+
+ TP_STRUCT__entry(
+ __field( unsigned long, nr )
+ __field( unsigned long, a0 )
+ __field( unsigned long, a1 )
+ __field( unsigned long, a2 )
+ __field( unsigned long, a3 )
+ ),
+
+ TP_fast_assign(
+ __entry->nr = nr;
+ __entry->a0 = a0;
+ __entry->a1 = a1;
+ __entry->a2 = a2;
+ __entry->a3 = a3;
+ ),
+
+ TP_printk("nr 0x%lx a0 0x%lx a1 0x%lx a2 0x%lx a3 0x%lx",
+ __entry->nr, __entry->a0, __entry->a1, __entry->a2,
+ __entry->a3)
+);
+
+/*
+ * Tracepoint for PIO.
+ */
+TRACE_EVENT(kvm_pio,
+ TP_PROTO(unsigned int rw, unsigned int port, unsigned int size,
+ unsigned int count),
+ TP_ARGS(rw, port, size, count),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, rw )
+ __field( unsigned int, port )
+ __field( unsigned int, size )
+ __field( unsigned int, count )
+ ),
+
+ TP_fast_assign(
+ __entry->rw = rw;
+ __entry->port = port;
+ __entry->size = size;
+ __entry->count = count;
+ ),
+
+ TP_printk("pio_%s at 0x%x size %d count %d",
+ __entry->rw ? "write" : "read",
+ __entry->port, __entry->size, __entry->count)
+);
+
+/*
+ * Tracepoint for cpuid.
+ */
+TRACE_EVENT(kvm_cpuid,
+ TP_PROTO(unsigned int function, unsigned long rax, unsigned long rbx,
+ unsigned long rcx, unsigned long rdx),
+ TP_ARGS(function, rax, rbx, rcx, rdx),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, function )
+ __field( unsigned long, rax )
+ __field( unsigned long, rbx )
+ __field( unsigned long, rcx )
+ __field( unsigned long, rdx )
+ ),
+
+ TP_fast_assign(
+ __entry->function = function;
+ __entry->rax = rax;
+ __entry->rbx = rbx;
+ __entry->rcx = rcx;
+ __entry->rdx = rdx;
+ ),
+
+ TP_printk("func %x rax %lx rbx %lx rcx %lx rdx %lx",
+ __entry->function, __entry->rax,
+ __entry->rbx, __entry->rcx, __entry->rdx)
+);
+
+/*
+ * Tracepoint for apic access.
+ */
+TRACE_EVENT(kvm_apic,
+ TP_PROTO(unsigned int rw, unsigned int reg, unsigned int val),
+ TP_ARGS(rw, reg, val),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, rw )
+ __field( unsigned int, reg )
+ __field( unsigned int, val )
+ ),
+
+ TP_fast_assign(
+ __entry->rw = rw;
+ __entry->reg = reg;
+ __entry->val = val;
+ ),
+
+ TP_printk("apic_%s 0x%x = 0x%x",
+ __entry->rw ? "write" : "read",
+ __entry->reg, __entry->val)
+);
+
+#define trace_kvm_apic_read(reg, val) trace_kvm_apic(0, reg, val)
+#define trace_kvm_apic_write(reg, val) trace_kvm_apic(1, reg, val)
+
+/*
+ * Tracepoint for kvm guest exit:
+ */
+TRACE_EVENT(kvm_exit,
+ TP_PROTO(unsigned int exit_reason, unsigned long guest_rip),
+ TP_ARGS(exit_reason, guest_rip),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, exit_reason )
+ __field( unsigned long, guest_rip )
+ ),
+
+ TP_fast_assign(
+ __entry->exit_reason = exit_reason;
+ __entry->guest_rip = guest_rip;
+ ),
+
+ TP_printk("reason %s rip 0x%lx",
+ ftrace_print_symbols_seq(p, __entry->exit_reason,
+ kvm_x86_ops->exit_reasons_str),
+ __entry->guest_rip)
+);
+
+/*
+ * Tracepoint for kvm interrupt injection:
+ */
+TRACE_EVENT(kvm_inj_virq,
+ TP_PROTO(unsigned int irq),
+ TP_ARGS(irq),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, irq )
+ ),
+
+ TP_fast_assign(
+ __entry->irq = irq;
+ ),
+
+ TP_printk("irq %u", __entry->irq)
+);
+
+/*
+ * Tracepoint for page fault.
+ */
+TRACE_EVENT(kvm_page_fault,
+ TP_PROTO(unsigned long fault_address, unsigned int error_code),
+ TP_ARGS(fault_address, error_code),
+
+ TP_STRUCT__entry(
+ __field( unsigned long, fault_address )
+ __field( unsigned int, error_code )
+ ),
+
+ TP_fast_assign(
+ __entry->fault_address = fault_address;
+ __entry->error_code = error_code;
+ ),
+
+ TP_printk("address %lx error_code %x",
+ __entry->fault_address, __entry->error_code)
+);
+
+/*
+ * Tracepoint for guest MSR access.
+ */
+TRACE_EVENT(kvm_msr,
+ TP_PROTO(unsigned int rw, unsigned int ecx, unsigned long data),
+ TP_ARGS(rw, ecx, data),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, rw )
+ __field( unsigned int, ecx )
+ __field( unsigned long, data )
+ ),
+
+ TP_fast_assign(
+ __entry->rw = rw;
+ __entry->ecx = ecx;
+ __entry->data = data;
+ ),
+
+ TP_printk("msr_%s %x = 0x%lx",
+ __entry->rw ? "write" : "read",
+ __entry->ecx, __entry->data)
+);
+
+#define trace_kvm_msr_read(ecx, data) trace_kvm_msr(0, ecx, data)
+#define trace_kvm_msr_write(ecx, data) trace_kvm_msr(1, ecx, data)
+
+/*
+ * Tracepoint for guest CR access.
+ */
+TRACE_EVENT(kvm_cr,
+ TP_PROTO(unsigned int rw, unsigned int cr, unsigned long val),
+ TP_ARGS(rw, cr, val),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, rw )
+ __field( unsigned int, cr )
+ __field( unsigned long, val )
+ ),
+
+ TP_fast_assign(
+ __entry->rw = rw;
+ __entry->cr = cr;
+ __entry->val = val;
+ ),
+
+ TP_printk("cr_%s %x = 0x%lx",
+ __entry->rw ? "write" : "read",
+ __entry->cr, __entry->val)
+);
+
+#define trace_kvm_cr_read(cr, val) trace_kvm_cr(0, cr, val)
+#define trace_kvm_cr_write(cr, val) trace_kvm_cr(1, cr, val)
+
+#endif /* _TRACE_KVM_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a84ca1..c6256b9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -25,6 +25,7 @@
#include <linux/highmem.h>
#include <linux/sched.h>
#include <linux/moduleparam.h>
+#include <linux/ftrace_event.h>
#include "kvm_cache_regs.h"
#include "x86.h"
@@ -34,6 +35,8 @@
#include <asm/virtext.h>
#include <asm/mce.h>
+#include "trace.h"
+
#define __ex(x) __kvm_handle_fault_on_reboot(x)
MODULE_AUTHOR("Qumranet");
@@ -2550,7 +2553,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
uint32_t intr;
int irq = vcpu->arch.interrupt.nr;
- KVMTRACE_1D(INJ_VIRQ, vcpu, (u32)irq, handler);
+ trace_kvm_inj_virq(irq);
++vcpu->stat.irq_injections;
if (vmx->rmode.vm86_active) {
@@ -2751,8 +2754,8 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
if (enable_ept)
BUG();
cr2 = vmcs_readl(EXIT_QUALIFICATION);
- KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
- (u32)((u64)cr2 >> 32), handler);
+ trace_kvm_page_fault(cr2, error_code);
+
if (kvm_event_needs_reinjection(vcpu))
kvm_mmu_unprotect_page_virt(vcpu, cr2);
return kvm_mmu_page_fault(vcpu, cr2, error_code);
@@ -2799,7 +2802,6 @@ static int handle_external_interrupt(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
++vcpu->stat.irq_exits;
- KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler);
return 1;
}
@@ -2847,7 +2849,7 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
- unsigned long exit_qualification;
+ unsigned long exit_qualification, val;
int cr;
int reg;
@@ -2856,21 +2858,19 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
reg = (exit_qualification >> 8) & 15;
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
- KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr,
- (u32)kvm_register_read(vcpu, reg),
- (u32)((u64)kvm_register_read(vcpu, reg) >> 32),
- handler);
+ val = kvm_register_read(vcpu, reg);
+ trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
- kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg));
+ kvm_set_cr0(vcpu, val);
skip_emulated_instruction(vcpu);
return 1;
case 3:
- kvm_set_cr3(vcpu, kvm_register_read(vcpu, reg));
+ kvm_set_cr3(vcpu, val);
skip_emulated_instruction(vcpu);
return 1;
case 4:
- kvm_set_cr4(vcpu, kvm_register_read(vcpu, reg));
+ kvm_set_cr4(vcpu, val);
skip_emulated_instruction(vcpu);
return 1;
case 8: {
@@ -2892,23 +2892,19 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
vcpu->arch.cr0 &= ~X86_CR0_TS;
vmcs_writel(CR0_READ_SHADOW, vcpu->arch.cr0);
vmx_fpu_activate(vcpu);
- KVMTRACE_0D(CLTS, vcpu, handler);
skip_emulated_instruction(vcpu);
return 1;
case 1: /*mov from cr*/
switch (cr) {
case 3:
kvm_register_write(vcpu, reg, vcpu->arch.cr3);
- KVMTRACE_3D(CR_READ, vcpu, (u32)cr,
- (u32)kvm_register_read(vcpu, reg),
- (u32)((u64)kvm_register_read(vcpu, reg) >> 32),
- handler);
+ trace_kvm_cr_read(cr, vcpu->arch.cr3);
skip_emulated_instruction(vcpu);
return 1;
case 8:
- kvm_register_write(vcpu, reg, kvm_get_cr8(vcpu));
- KVMTRACE_2D(CR_READ, vcpu, (u32)cr,
- (u32)kvm_register_read(vcpu, reg), handler);
+ val = kvm_get_cr8(vcpu);
+ kvm_register_write(vcpu, reg, val);
+ trace_kvm_cr_read(cr, val);
skip_emulated_instruction(vcpu);
return 1;
}
@@ -2976,7 +2972,6 @@ static int handle_dr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
val = 0;
}
kvm_register_write(vcpu, reg, val);
- KVMTRACE_2D(DR_READ, vcpu, (u32)dr, (u32)val, handler);
} else {
val = vcpu->arch.regs[reg];
switch (dr) {
@@ -3009,7 +3004,6 @@ static int handle_dr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
break;
}
- KVMTRACE_2D(DR_WRITE, vcpu, (u32)dr, (u32)val, handler);
}
skip_emulated_instruction(vcpu);
return 1;
@@ -3031,8 +3025,7 @@ static int handle_rdmsr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
return 1;
}
- KVMTRACE_3D(MSR_READ, vcpu, ecx, (u32)data, (u32)(data >> 32),
- handler);
+ trace_kvm_msr_read(ecx, data);
/* FIXME: handling of bits 32:63 of rax, rdx */
vcpu->arch.regs[VCPU_REGS_RAX] = data & -1u;
@@ -3047,8 +3040,7 @@ static int handle_wrmsr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u)
| ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32);
- KVMTRACE_3D(MSR_WRITE, vcpu, ecx, (u32)data, (u32)(data >> 32),
- handler);
+ trace_kvm_msr_write(ecx, data);
if (vmx_set_msr(vcpu, ecx, data) != 0) {
kvm_inject_gp(vcpu, 0);
@@ -3075,7 +3067,6 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu,
cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
- KVMTRACE_0D(PEND_INTR, vcpu, handler);
++vcpu->stat.irq_window_exits;
/*
@@ -3227,6 +3218,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+ trace_kvm_page_fault(gpa, exit_qualification);
return kvm_mmu_page_fault(vcpu, gpa & PAGE_MASK, 0);
}
@@ -3410,8 +3402,7 @@ static int vmx_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
u32 exit_reason = vmx->exit_reason;
u32 vectoring_info = vmx->idt_vectoring_info;
- KVMTRACE_3D(VMEXIT, vcpu, exit_reason, (u32)kvm_rip_read(vcpu),
- (u32)((u64)kvm_rip_read(vcpu) >> 32), entryexit);
+ trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));
/* If we need to emulate an MMIO from handle_invalid_guest_state
* we just return 0 */
@@ -3500,10 +3491,8 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
/* We need to handle NMIs before interrupts are enabled */
if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
- (exit_intr_info & INTR_INFO_VALID_MASK)) {
- KVMTRACE_0D(NMI, &vmx->vcpu, handler);
+ (exit_intr_info & INTR_INFO_VALID_MASK))
asm("int $2");
- }
idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
@@ -3891,6 +3880,29 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
return ret;
}
+static const struct trace_print_flags vmx_exit_reasons_str[] = {
+ { EXIT_REASON_EXCEPTION_NMI, "exception" },
+ { EXIT_REASON_EXTERNAL_INTERRUPT, "ext_irq" },
+ { EXIT_REASON_TRIPLE_FAULT, "triple_fault" },
+ { EXIT_REASON_NMI_WINDOW, "nmi_window" },
+ { EXIT_REASON_IO_INSTRUCTION, "io_instruction" },
+ { EXIT_REASON_CR_ACCESS, "cr_access" },
+ { EXIT_REASON_DR_ACCESS, "dr_access" },
+ { EXIT_REASON_CPUID, "cpuid" },
+ { EXIT_REASON_MSR_READ, "rdmsr" },
+ { EXIT_REASON_MSR_WRITE, "wrmsr" },
+ { EXIT_REASON_PENDING_INTERRUPT, "interrupt_window" },
+ { EXIT_REASON_HLT, "halt" },
+ { EXIT_REASON_INVLPG, "invlpg" },
+ { EXIT_REASON_VMCALL, "hypercall" },
+ { EXIT_REASON_TPR_BELOW_THRESHOLD, "tpr_below_thres" },
+ { EXIT_REASON_APIC_ACCESS, "apic_access" },
+ { EXIT_REASON_WBINVD, "wbinvd" },
+ { EXIT_REASON_TASK_SWITCH, "task_switch" },
+ { EXIT_REASON_EPT_VIOLATION, "ept_violation" },
+ { -1, NULL }
+};
+
static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -3950,6 +3962,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.set_tss_addr = vmx_set_tss_addr,
.get_tdp_level = get_ept_level,
.get_mt_mask = vmx_get_mt_mask,
+
+ .exit_reasons_str = vmx_exit_reasons_str,
};
static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a066876..892a7a6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -37,6 +37,8 @@
#include <linux/iommu.h>
#include <linux/intel-iommu.h>
#include <linux/cpufreq.h>
+#define CREATE_TRACE_POINTS
+#include "trace.h"
#include <asm/uaccess.h>
#include <asm/msr.h>
@@ -347,9 +349,6 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0);
void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
{
kvm_set_cr0(vcpu, (vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f));
- KVMTRACE_1D(LMSW, vcpu,
- (u32)((vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f)),
- handler);
}
EXPORT_SYMBOL_GPL(kvm_lmsw);
@@ -2568,7 +2567,6 @@ int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address)
int emulate_clts(struct kvm_vcpu *vcpu)
{
- KVMTRACE_0D(CLTS, vcpu, handler);
kvm_x86_ops->set_cr0(vcpu, vcpu->arch.cr0 & ~X86_CR0_TS);
return X86EMUL_CONTINUE;
}
@@ -2851,12 +2849,8 @@ int kvm_emulate_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
vcpu->arch.pio.down = 0;
vcpu->arch.pio.rep = 0;
- if (vcpu->run->io.direction == KVM_EXIT_IO_IN)
- KVMTRACE_2D(IO_READ, vcpu, vcpu->run->io.port, (u32)size,
- handler);
- else
- KVMTRACE_2D(IO_WRITE, vcpu, vcpu->run->io.port, (u32)size,
- handler);
+ trace_kvm_pio(vcpu->run->io.direction == KVM_EXIT_IO_OUT, port,
+ size, 1);
val = kvm_register_read(vcpu, VCPU_REGS_RAX);
memcpy(vcpu->arch.pio_data, &val, 4);
@@ -2892,12 +2886,8 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
vcpu->arch.pio.down = down;
vcpu->arch.pio.rep = rep;
- if (vcpu->run->io.direction == KVM_EXIT_IO_IN)
- KVMTRACE_2D(IO_READ, vcpu, vcpu->run->io.port, (u32)size,
- handler);
- else
- KVMTRACE_2D(IO_WRITE, vcpu, vcpu->run->io.port, (u32)size,
- handler);
+ trace_kvm_pio(vcpu->run->io.direction == KVM_EXIT_IO_OUT, port,
+ size, count);
if (!count) {
kvm_x86_ops->skip_emulated_instruction(vcpu);
@@ -3075,7 +3065,6 @@ void kvm_arch_exit(void)
int kvm_emulate_halt(struct kvm_vcpu *vcpu)
{
++vcpu->stat.halt_exits;
- KVMTRACE_0D(HLT, vcpu, handler);
if (irqchip_in_kernel(vcpu->kvm)) {
vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
return 1;
@@ -3106,7 +3095,7 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
a2 = kvm_register_read(vcpu, VCPU_REGS_RDX);
a3 = kvm_register_read(vcpu, VCPU_REGS_RSI);
- KVMTRACE_1D(VMMCALL, vcpu, (u32)nr, handler);
+ trace_kvm_hypercall(nr, a0, a1, a2, a3);
if (!is_long_mode(vcpu)) {
nr &= 0xFFFFFFFF;
@@ -3206,8 +3195,6 @@ unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
return 0;
}
- KVMTRACE_3D(CR_READ, vcpu, (u32)cr, (u32)value,
- (u32)((u64)value >> 32), handler);
return value;
}
@@ -3215,9 +3202,6 @@ unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
unsigned long *rflags)
{
- KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr, (u32)val,
- (u32)((u64)val >> 32), handler);
-
switch (cr) {
case 0:
kvm_set_cr0(vcpu, mk_cr_64(vcpu->arch.cr0, val));
@@ -3327,11 +3311,11 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx);
}
kvm_x86_ops->skip_emulated_instruction(vcpu);
- KVMTRACE_5D(CPUID, vcpu, function,
- (u32)kvm_register_read(vcpu, VCPU_REGS_RAX),
- (u32)kvm_register_read(vcpu, VCPU_REGS_RBX),
- (u32)kvm_register_read(vcpu, VCPU_REGS_RCX),
- (u32)kvm_register_read(vcpu, VCPU_REGS_RDX), handler);
+ trace_kvm_cpuid(function,
+ kvm_register_read(vcpu, VCPU_REGS_RAX),
+ kvm_register_read(vcpu, VCPU_REGS_RBX),
+ kvm_register_read(vcpu, VCPU_REGS_RCX),
+ kvm_register_read(vcpu, VCPU_REGS_RDX));
}
EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
@@ -3527,7 +3511,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
set_debugreg(vcpu->arch.eff_db[3], 3);
}
- KVMTRACE_0D(VMENTRY, vcpu, entryexit);
+ trace_kvm_entry(vcpu->vcpu_id);
kvm_x86_ops->run(vcpu, kvm_run);
if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -4842,3 +4826,9 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
{
return kvm_x86_ops->interrupt_allowed(vcpu);
}
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
new file mode 100644
index 0000000..d74b23d
--- /dev/null
+++ b/include/trace/events/kvm.h
@@ -0,0 +1,57 @@
+#if !defined(_TRACE_KVM_MAIN_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_MAIN_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+#define TRACE_INCLUDE_FILE kvm
+
+#if defined(__KVM_HAVE_IOAPIC)
+TRACE_EVENT(kvm_set_irq,
+ TP_PROTO(unsigned int gsi),
+ TP_ARGS(gsi),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, gsi )
+ ),
+
+ TP_fast_assign(
+ __entry->gsi = gsi;
+ ),
+
+ TP_printk("gsi %u", __entry->gsi)
+);
+
+
+#define kvm_irqchips \
+ {KVM_IRQCHIP_PIC_MASTER, "PIC master"}, \
+ {KVM_IRQCHIP_PIC_SLAVE, "PIC slave"}, \
+ {KVM_IRQCHIP_IOAPIC, "IOAPIC"}
+
+TRACE_EVENT(kvm_ack_irq,
+ TP_PROTO(unsigned int irqchip, unsigned int pin),
+ TP_ARGS(irqchip, pin),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, irqchip )
+ __field( unsigned int, pin )
+ ),
+
+ TP_fast_assign(
+ __entry->irqchip = irqchip;
+ __entry->pin = pin;
+ ),
+
+ TP_printk("irqchip %s pin %u",
+ __print_symbolic(__entry->irqchip, kvm_irqchips),
+ __entry->pin)
+);
+
+
+
+#endif /* defined(__KVM_HAVE_IOAPIC) */
+#endif /* _TRACE_KVM_MAIN_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index bb8a1b5..94759ed 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -20,6 +20,7 @@
*/
#include <linux/kvm_host.h>
+#include <trace/events/kvm.h>
#include <asm/msidef.h>
#ifdef CONFIG_IA64
@@ -125,6 +126,8 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
unsigned long *irq_state, sig_level;
int ret = -1;
+ trace_kvm_set_irq(irq);
+
WARN_ON(!mutex_is_locked(&kvm->irq_lock));
if (irq < KVM_IOAPIC_NUM_PINS) {
@@ -161,6 +164,8 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
struct hlist_node *n;
unsigned gsi = pin;
+ trace_kvm_ack_irq(irqchip, pin);
+
list_for_each_entry(e, &kvm->irq_routing, link)
if (e->type == KVM_IRQ_ROUTING_IRQCHIP &&
e->irqchip.irqchip == irqchip &&
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 48d5e69..04bdedd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -59,6 +59,9 @@
#include "irq.h"
#endif
+#define CREATE_TRACE_POINTS
+#include <trace/events/kvm.h>
+
MODULE_AUTHOR("Qumranet");
MODULE_LICENSE("GPL");
@@ -2718,6 +2721,7 @@ EXPORT_SYMBOL_GPL(kvm_init);
void kvm_exit(void)
{
kvm_trace_cleanup();
+ tracepoint_synchronize_unregister();
misc_deregister(&kvm_dev);
kmem_cache_destroy(kvm_vcpu_cache);
sysdev_unregister(&kvm_sysdev);
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 25/47] KVM: Allow emulation of syscalls instructions on #UD
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (23 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 24/47] KVM: convert custom marker based tracing to event traces Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 26/47] KVM: x86 emulator: Add missing EFLAGS bit definitions Avi Kivity
` (21 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
Add the opcodes for syscall, sysenter and sysexit to the list of instructions
handled by the undefined opcode handler.
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++-------
1 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 892a7a6..57e76b3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2667,14 +2667,33 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
- /* Reject the instructions other than VMCALL/VMMCALL when
- * try to emulate invalid opcode */
+ /* Only allow emulation of specific instructions on #UD
+ * (namely VMMCALL, sysenter, sysexit, syscall)*/
c = &vcpu->arch.emulate_ctxt.decode;
- if ((emulation_type & EMULTYPE_TRAP_UD) &&
- (!(c->twobyte && c->b == 0x01 &&
- (c->modrm_reg == 0 || c->modrm_reg == 3) &&
- c->modrm_mod == 3 && c->modrm_rm == 1)))
- return EMULATE_FAIL;
+ if (emulation_type & EMULTYPE_TRAP_UD) {
+ if (!c->twobyte)
+ return EMULATE_FAIL;
+ switch (c->b) {
+ case 0x01: /* VMMCALL */
+ if (c->modrm_mod != 3 || c->modrm_rm != 1)
+ return EMULATE_FAIL;
+ break;
+ case 0x34: /* sysenter */
+ case 0x35: /* sysexit */
+ if (c->modrm_mod != 0 || c->modrm_rm != 0)
+ return EMULATE_FAIL;
+ break;
+ case 0x05: /* syscall */
+ if (c->modrm_mod != 0 || c->modrm_rm != 0)
+ return EMULATE_FAIL;
+ break;
+ default:
+ return EMULATE_FAIL;
+ }
+
+ if (!(c->modrm_reg == 0 || c->modrm_reg == 3))
+ return EMULATE_FAIL;
+ }
++vcpu->stat.insn_emulation;
if (r) {
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 26/47] KVM: x86 emulator: Add missing EFLAGS bit definitions
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (24 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 25/47] KVM: Allow emulation of syscalls instructions on #UD Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 27/47] KVM: x86 emulator: Prepare for emulation of syscall instructions Avi Kivity
` (20 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/x86_emulate.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index ef4dfca..67af33a 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -320,8 +320,11 @@ static u32 group2_table[] = {
};
/* EFLAGS bit definitions. */
+#define EFLG_VM (1<<17)
+#define EFLG_RF (1<<16)
#define EFLG_OF (1<<11)
#define EFLG_DF (1<<10)
+#define EFLG_IF (1<<9)
#define EFLG_SF (1<<7)
#define EFLG_ZF (1<<6)
#define EFLG_AF (1<<4)
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 27/47] KVM: x86 emulator: Prepare for emulation of syscall instructions
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (25 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 26/47] KVM: x86 emulator: Add missing EFLAGS bit definitions Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 28/47] KVM: x86 emulator: add syscall emulation Avi Kivity
` (19 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
Add the flags needed for syscall, sysenter and sysexit to the opcode table.
Catch (but for now ignore) the opcodes in the emulation switch/case.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/x86_emulate.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 67af33a..b0da29d 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -32,6 +32,8 @@
#include <linux/module.h>
#include <asm/kvm_x86_emulate.h>
+#include "mmu.h" /* for is_long_mode() */
+
/*
* Opcode effective-address decode tables.
* Note that we only emulate instructions that have at least one memory
@@ -209,7 +211,7 @@ static u32 opcode_table[256] = {
static u32 twobyte_table[256] = {
/* 0x00 - 0x0F */
- 0, Group | GroupDual | Group7, 0, 0, 0, 0, ImplicitOps, 0,
+ 0, Group | GroupDual | Group7, 0, 0, 0, ImplicitOps, ImplicitOps, 0,
ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
/* 0x10 - 0x1F */
0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0, 0,
@@ -217,7 +219,9 @@ static u32 twobyte_table[256] = {
ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
/* 0x30 - 0x3F */
- ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ ImplicitOps, 0, ImplicitOps, 0,
+ ImplicitOps, ImplicitOps, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0,
/* 0x40 - 0x47 */
DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -1988,6 +1992,9 @@ twobyte_insn:
goto cannot_emulate;
}
break;
+ case 0x05: /* syscall */
+ goto cannot_emulate;
+ break;
case 0x06:
emulate_clts(ctxt->vcpu);
c->dst.type = OP_NONE;
@@ -2054,6 +2061,12 @@ twobyte_insn:
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;
break;
+ case 0x34: /* sysenter */
+ goto cannot_emulate;
+ break;
+ case 0x35: /* sysexit */
+ goto cannot_emulate;
+ break;
case 0x40 ... 0x4f: /* cmov */
c->dst.val = c->dst.orig_val = c->src.val;
if (!test_cc(c->b, ctxt->eflags))
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 28/47] KVM: x86 emulator: add syscall emulation
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (26 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 27/47] KVM: x86 emulator: Prepare for emulation of syscall instructions Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 29/47] KVM: x86 emulator: Add sysenter emulation Avi Kivity
` (18 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
Handle the #UD intercept of the syscall instruction in 32-bit compat mode on
an Intel host.
Set up the segment descriptors for CS and SS and the EIP/ESP registers
according to the manual. Save the RIP and EFLAGS to the correct registers.
[avi: fix build on i386 due to missing R11]
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
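As a rough sketch (not part of the patch, helper name hypothetical) of the
MSR_STAR layout the emulation below relies on: bits 47:32 hold the SYSCALL
CS selector, and the architecture defines SS as that selector plus 8.

static void example_star_to_selectors(u64 star, u16 *cs_sel, u16 *ss_sel)
{
        u32 hi = star >> 32;            /* SYSCALL CS/SS selector field */

        *cs_sel = (u16)(hi & 0xfffc);   /* clear RPL bits, as the patch does */
        *ss_sel = (u16)(hi + 8);        /* SS descriptor follows CS in the GDT */
}

In long mode the target RIP then comes from MSR_LSTAR (or MSR_CSTAR for a
compat-mode caller) and MSR_SYSCALL_MASK selects which RFLAGS bits to clear,
which is exactly the sequence of MSR reads visible in the hunk.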
arch/x86/kvm/x86_emulate.c | 84 +++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 83 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index b0da29d..4d7256d 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1397,6 +1397,85 @@ static void toggle_interruptibility(struct x86_emulate_ctxt *ctxt, u32 mask)
ctxt->interruptibility = mask;
}
+static inline void
+setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
+ struct kvm_segment *cs, struct kvm_segment *ss)
+{
+ memset(cs, 0, sizeof(struct kvm_segment));
+ kvm_x86_ops->get_segment(ctxt->vcpu, cs, VCPU_SREG_CS);
+ memset(ss, 0, sizeof(struct kvm_segment));
+
+ cs->l = 0; /* will be adjusted later */
+ cs->base = 0; /* flat segment */
+ cs->g = 1; /* 4kb granularity */
+ cs->limit = 0xffffffff; /* 4GB limit */
+ cs->type = 0x0b; /* Read, Execute, Accessed */
+ cs->s = 1;
+ cs->dpl = 0; /* will be adjusted later */
+ cs->present = 1;
+ cs->db = 1;
+
+ ss->unusable = 0;
+ ss->base = 0; /* flat segment */
+ ss->limit = 0xffffffff; /* 4GB limit */
+ ss->g = 1; /* 4kb granularity */
+ ss->s = 1;
+ ss->type = 0x03; /* Read/Write, Accessed */
+ ss->db = 1; /* 32bit stack segment */
+ ss->dpl = 0;
+ ss->present = 1;
+}
+
+static int
+emulate_syscall(struct x86_emulate_ctxt *ctxt)
+{
+ struct decode_cache *c = &ctxt->decode;
+ struct kvm_segment cs, ss;
+ u64 msr_data;
+
+ /* syscall is not available in real mode */
+ if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL
+ || !(ctxt->vcpu->arch.cr0 & X86_CR0_PE))
+ return -1;
+
+ setup_syscalls_segments(ctxt, &cs, &ss);
+
+ kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+ msr_data >>= 32;
+ cs.selector = (u16)(msr_data & 0xfffc);
+ ss.selector = (u16)(msr_data + 8);
+
+ if (is_long_mode(ctxt->vcpu)) {
+ cs.db = 0;
+ cs.l = 1;
+ }
+ kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+ kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+ c->regs[VCPU_REGS_RCX] = c->eip;
+ if (is_long_mode(ctxt->vcpu)) {
+#ifdef CONFIG_X86_64
+ c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF;
+
+ kvm_x86_ops->get_msr(ctxt->vcpu,
+ ctxt->mode == X86EMUL_MODE_PROT64 ?
+ MSR_LSTAR : MSR_CSTAR, &msr_data);
+ c->eip = msr_data;
+
+ kvm_x86_ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data);
+ ctxt->eflags &= ~(msr_data | EFLG_RF);
+#endif
+ } else {
+ /* legacy mode */
+ kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+ c->eip = (u32)msr_data;
+
+ ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
+ }
+
+ return 0;
+}
+
int
x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
{
@@ -1993,7 +2072,10 @@ twobyte_insn:
}
break;
case 0x05: /* syscall */
- goto cannot_emulate;
+ if (emulate_syscall(ctxt) == -1)
+ goto cannot_emulate;
+ else
+ goto writeback;
break;
case 0x06:
emulate_clts(ctxt->vcpu);
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 29/47] KVM: x86 emulator: Add sysenter emulation
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (27 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 28/47] KVM: x86 emulator: add syscall emulation Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 30/47] KVM: x86 emulator: Add sysexit emulation Avi Kivity
` (17 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
Handle the #UD intercept of the sysenter instruction in 32-bit compat mode on
an AMD host.
Set up the segment descriptors for CS and SS and the EIP/ESP registers
according to the manual.
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
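For reference, a minimal sketch (not part of the patch, helper name
hypothetical) of the MSR triple the emulation below reads: the CPL-0 code
segment comes from IA32_SYSENTER_CS (with SS defined as CS + 8), the entry
point from IA32_SYSENTER_EIP and the stack pointer from IA32_SYSENTER_ESP.

struct example_sysenter_target {
        u16 cs, ss;
        u64 eip, esp;
};

static void example_read_sysenter_msrs(struct kvm_vcpu *vcpu,
                                       struct example_sysenter_target *t)
{
        u64 msr;

        kvm_x86_ops->get_msr(vcpu, MSR_IA32_SYSENTER_CS, &msr);
        t->cs = (u16)msr & ~SELECTOR_RPL_MASK;  /* flat CPL-0 code segment */
        t->ss = t->cs + 8;                      /* stack segment follows CS */

        kvm_x86_ops->get_msr(vcpu, MSR_IA32_SYSENTER_EIP, &t->eip);
        kvm_x86_ops->get_msr(vcpu, MSR_IA32_SYSENTER_ESP, &t->esp);
}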
arch/x86/kvm/x86_emulate.c | 70 +++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 69 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 4d7256d..7a9bddb 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1476,6 +1476,71 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
return 0;
}
+static int
+emulate_sysenter(struct x86_emulate_ctxt *ctxt)
+{
+ struct decode_cache *c = &ctxt->decode;
+ struct kvm_segment cs, ss;
+ u64 msr_data;
+
+ /* inject #UD if LOCK prefix is used */
+ if (c->lock_prefix)
+ return -1;
+
+ /* inject #GP if in real mode or paging is disabled */
+ if (ctxt->mode == X86EMUL_MODE_REAL ||
+ !(ctxt->vcpu->arch.cr0 & X86_CR0_PE)) {
+ kvm_inject_gp(ctxt->vcpu, 0);
+ return -1;
+ }
+
+ /* XXX sysenter/sysexit have not been tested in 64bit mode.
+ * Therefore, we inject an #UD.
+ */
+ if (ctxt->mode == X86EMUL_MODE_PROT64)
+ return -1;
+
+ setup_syscalls_segments(ctxt, &cs, &ss);
+
+ kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+ switch (ctxt->mode) {
+ case X86EMUL_MODE_PROT32:
+ if ((msr_data & 0xfffc) == 0x0) {
+ kvm_inject_gp(ctxt->vcpu, 0);
+ return -1;
+ }
+ break;
+ case X86EMUL_MODE_PROT64:
+ if (msr_data == 0x0) {
+ kvm_inject_gp(ctxt->vcpu, 0);
+ return -1;
+ }
+ break;
+ }
+
+ ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
+ cs.selector = (u16)msr_data;
+ cs.selector &= ~SELECTOR_RPL_MASK;
+ ss.selector = cs.selector + 8;
+ ss.selector &= ~SELECTOR_RPL_MASK;
+ if (ctxt->mode == X86EMUL_MODE_PROT64
+ || is_long_mode(ctxt->vcpu)) {
+ cs.db = 0;
+ cs.l = 1;
+ }
+
+ kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+ kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+ kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data);
+ c->eip = msr_data;
+
+ kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data);
+ c->regs[VCPU_REGS_RSP] = msr_data;
+
+ return 0;
+}
+
int
x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
{
@@ -2144,7 +2209,10 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x34: /* sysenter */
- goto cannot_emulate;
+ if (emulate_sysenter(ctxt) == -1)
+ goto cannot_emulate;
+ else
+ goto writeback;
break;
case 0x35: /* sysexit */
goto cannot_emulate;
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 30/47] KVM: x86 emulator: Add sysexit emulation
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (28 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 29/47] KVM: x86 emulator: Add sysenter emulation Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 31/47] KVM: s390: Fix memslot initialization for userspace_addr != 0 Avi Kivity
` (16 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
Handle the #UD intercept of the sysexit instruction in 64-bit mode returning to
32-bit compat mode on an AMD host.
Set up the segment descriptors for CS and SS and the EIP/ESP registers
according to the manual.
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
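As a rough sketch (not part of the patch, helper name hypothetical) of the
selector arithmetic the SYSEXIT emulation below follows: the return CS is
derived from IA32_SYSENTER_CS, +16 for a 32-bit return or +32 for a 64-bit
return, SS sits 8 bytes above CS, and both selectors get RPL 3.

static void example_sysexit_selectors(u64 sysenter_cs, int to_64bit,
                                      u16 *cs_sel, u16 *ss_sel)
{
        u16 cs = (u16)sysenter_cs + (to_64bit ? 32 : 16);

        *cs_sel = cs | SELECTOR_RPL_MASK;       /* return to CPL 3 */
        *ss_sel = (cs + 8) | SELECTOR_RPL_MASK; /* SS follows CS */
}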
arch/x86/kvm/x86_emulate.c | 72 +++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 71 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 7a9bddb..c6663d4 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1541,6 +1541,73 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
return 0;
}
+static int
+emulate_sysexit(struct x86_emulate_ctxt *ctxt)
+{
+ struct decode_cache *c = &ctxt->decode;
+ struct kvm_segment cs, ss;
+ u64 msr_data;
+ int usermode;
+
+ /* inject #UD if LOCK prefix is used */
+ if (c->lock_prefix)
+ return -1;
+
+ /* inject #GP if in real mode or paging is disabled */
+ if (ctxt->mode == X86EMUL_MODE_REAL
+ || !(ctxt->vcpu->arch.cr0 & X86_CR0_PE)) {
+ kvm_inject_gp(ctxt->vcpu, 0);
+ return -1;
+ }
+
+ /* sysexit must be called from CPL 0 */
+ if (kvm_x86_ops->get_cpl(ctxt->vcpu) != 0) {
+ kvm_inject_gp(ctxt->vcpu, 0);
+ return -1;
+ }
+
+ setup_syscalls_segments(ctxt, &cs, &ss);
+
+ if ((c->rex_prefix & 0x8) != 0x0)
+ usermode = X86EMUL_MODE_PROT64;
+ else
+ usermode = X86EMUL_MODE_PROT32;
+
+ cs.dpl = 3;
+ ss.dpl = 3;
+ kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+ switch (usermode) {
+ case X86EMUL_MODE_PROT32:
+ cs.selector = (u16)(msr_data + 16);
+ if ((msr_data & 0xfffc) == 0x0) {
+ kvm_inject_gp(ctxt->vcpu, 0);
+ return -1;
+ }
+ ss.selector = (u16)(msr_data + 24);
+ break;
+ case X86EMUL_MODE_PROT64:
+ cs.selector = (u16)(msr_data + 32);
+ if (msr_data == 0x0) {
+ kvm_inject_gp(ctxt->vcpu, 0);
+ return -1;
+ }
+ ss.selector = cs.selector + 8;
+ cs.db = 0;
+ cs.l = 1;
+ break;
+ }
+ cs.selector |= SELECTOR_RPL_MASK;
+ ss.selector |= SELECTOR_RPL_MASK;
+
+ kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+ kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+ c->eip = ctxt->vcpu->arch.regs[VCPU_REGS_RDX];
+ c->regs[VCPU_REGS_RSP] = ctxt->vcpu->arch.regs[VCPU_REGS_RCX];
+
+ return 0;
+}
+
int
x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
{
@@ -2215,7 +2282,10 @@ twobyte_insn:
goto writeback;
break;
case 0x35: /* sysexit */
- goto cannot_emulate;
+ if (emulate_sysexit(ctxt) == -1)
+ goto cannot_emulate;
+ else
+ goto writeback;
break;
case 0x40 ... 0x4f: /* cmov */
c->dst.val = c->dst.orig_val = c->src.val;
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 31/47] KVM: s390: Fix memslot initialization for userspace_addr != 0
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (29 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 30/47] KVM: x86 emulator: Add sysexit emulation Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 32/47] hugetlbfs: export vma_kernel_pagsize to modules Avi Kivity
` (15 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Christian Borntraeger <borntraeger@de.ibm.com>
Since
commit 854b5338196b1175706e99d63be43a4f8d8ab607
Author: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
KVM: s390: streamline memslot handling
s390 uses the values of the memslot instead of doing everything in the arch
ioctl handler for KVM_SET_USER_MEMORY_REGION. Unfortunately we failed to
set the userspace_addr of our memslot because of our s390 ifdef in
__kvm_set_memory_region.
Old s390 userspace launchers did not notice, since they started the guest at
userspace address 0.
Because of CONFIG_DEFAULT_MMAP_MIN_ADDR we now put the guest at 1M userspace,
which does not work. This patch makes sure that new.userspace_addr is set
on s390.
This fix should go in quickly. Nevertheless, looking at the code we should
clean up that ifdef in the long term. Any kernel janitors?
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
virt/kvm/kvm_main.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 04bdedd..1da8072 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1199,6 +1199,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (old.npages)
kvm_arch_flush_shadow(kvm);
}
+#else /* not defined CONFIG_S390 */
+ new.user_alloc = user_alloc;
+ if (user_alloc)
+ new.userspace_addr = mem->userspace_addr;
#endif /* not defined CONFIG_S390 */
if (!npages)
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 32/47] hugetlbfs: export vma_kernel_pagsize to modules
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (30 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 31/47] KVM: s390: Fix memslot initialization for userspace_addr != 0 Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 33/47] KVM: Prepare memslot data structures for multiple hugepage sizes Avi Kivity
` (14 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Joerg Roedel <joerg.roedel@amd.com>
This function is required by KVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
mm/hugetlb.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cafdcee..b16d636 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -234,6 +234,7 @@ unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
return 1UL << (hstate->order + PAGE_SHIFT);
}
+EXPORT_SYMBOL_GPL(vma_kernel_pagesize);
/*
* Return the page size being used by the MMU to back a VMA. In the majority
--
1.6.3.3
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH 33/47] KVM: Prepare memslot data structures for multiple hugepage sizes
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (31 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 32/47] hugetlbfs: export vma_kernel_pagsize to modules Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 34/47] KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths Avi Kivity
` (13 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Joerg Roedel <joerg.roedel@amd.com>
[avi: fix build on non-x86]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
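For orientation, a worked example (illustration only, not part of the patch)
of what the new macros evaluate to on x86, where each extra paging level adds
9 bits of address:

/*
 *   KVM_HPAGE_SHIFT(1) = PAGE_SHIFT + 0 * 9 = 12  ->  4 KiB base pages
 *   KVM_HPAGE_SHIFT(2) = PAGE_SHIFT + 1 * 9 = 21  ->  2 MiB hugepages
 *   KVM_PAGES_PER_HPAGE(2) = (1 << 21) / (1 << 12) = 512
 *
 * On s390 the step is 8 bits, so KVM_PAGES_PER_HPAGE(2) = 256, matching the
 * old constant.  The mmu.c hunks below then index the per-slot hugepage
 * bookkeeping with
 *
 *   idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL))
 *       - (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
 *
 * picking the lpage_info[0][idx] entry that covers a given gfn.
 */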
arch/ia64/include/asm/kvm_host.h | 3 +-
arch/powerpc/include/asm/kvm_host.h | 3 +-
arch/s390/include/asm/kvm_host.h | 6 +++-
arch/x86/include/asm/kvm_host.h | 12 ++++----
arch/x86/kvm/mmu.c | 30 ++++++++++---------
arch/x86/kvm/paging_tmpl.h | 3 +-
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 56 ++++++++++++++++++++++++----------
8 files changed, 73 insertions(+), 42 deletions(-)
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 9cf1c4b..d9b6325 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -235,7 +235,8 @@ struct kvm_vm_data {
#define KVM_REQ_PTC_G 32
#define KVM_REQ_RESUME 33
-#define KVM_PAGES_PER_HPAGE 1
+#define KVM_NR_PAGE_SIZES 1
+#define KVM_PAGES_PER_HPAGE(x) 1
struct kvm;
struct kvm_vcpu;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d4caa61..c9c930e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -34,7 +34,8 @@
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
/* We don't currently support large pages. */
-#define KVM_PAGES_PER_HPAGE (1UL << 31)
+#define KVM_NR_PAGE_SIZES 1
+#define KVM_PAGES_PER_HPAGE(x) (1UL<<31)
struct kvm;
struct kvm_run;
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 75535d4..78e07a6 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -40,7 +40,11 @@ struct sca_block {
struct sca_entry cpu[64];
} __attribute__((packed));
-#define KVM_PAGES_PER_HPAGE 256
+#define KVM_NR_PAGE_SIZES 2
+#define KVM_HPAGE_SHIFT(x) (PAGE_SHIFT + ((x) - 1) * 8)
+#define KVM_HPAGE_SIZE(x) (1UL << KVM_HPAGE_SHIFT(x))
+#define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1))
+#define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)
#define CPUSTAT_HOST 0x80000000
#define CPUSTAT_WAIT 0x10000000
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 19027ab..30b625d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -54,12 +54,12 @@
#define INVALID_PAGE (~(hpa_t)0)
#define UNMAPPED_GVA (~(gpa_t)0)
-/* shadow tables are PAE even on non-PAE hosts */
-#define KVM_HPAGE_SHIFT 21
-#define KVM_HPAGE_SIZE (1UL << KVM_HPAGE_SHIFT)
-#define KVM_HPAGE_MASK (~(KVM_HPAGE_SIZE - 1))
-
-#define KVM_PAGES_PER_HPAGE (KVM_HPAGE_SIZE / PAGE_SIZE)
+/* KVM Hugepage definitions for x86 */
+#define KVM_NR_PAGE_SIZES 2
+#define KVM_HPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9))
+#define KVM_HPAGE_SIZE(x) (1UL << KVM_HPAGE_SHIFT(x))
+#define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1))
+#define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)
#define DE_VECTOR 0
#define DB_VECTOR 1
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 12974de..b67585c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -394,9 +394,9 @@ static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
{
unsigned long idx;
- idx = (gfn / KVM_PAGES_PER_HPAGE) -
- (slot->base_gfn / KVM_PAGES_PER_HPAGE);
- return &slot->lpage_info[idx].write_count;
+ idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
+ (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
+ return &slot->lpage_info[0][idx].write_count;
}
static void account_shadowed(struct kvm *kvm, gfn_t gfn)
@@ -485,10 +485,10 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int lpage)
if (!lpage)
return &slot->rmap[gfn - slot->base_gfn];
- idx = (gfn / KVM_PAGES_PER_HPAGE) -
- (slot->base_gfn / KVM_PAGES_PER_HPAGE);
+ idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
+ (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
- return &slot->lpage_info[idx].rmap_pde;
+ return &slot->lpage_info[0][idx].rmap_pde;
}
/*
@@ -731,11 +731,11 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
end = start + (memslot->npages << PAGE_SHIFT);
if (hva >= start && hva < end) {
gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+ int idx = gfn_offset /
+ KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL);
retval |= handler(kvm, &memslot->rmap[gfn_offset]);
retval |= handler(kvm,
- &memslot->lpage_info[
- gfn_offset /
- KVM_PAGES_PER_HPAGE].rmap_pde);
+ &memslot->lpage_info[0][idx].rmap_pde);
}
}
@@ -1876,8 +1876,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
pfn_t pfn;
unsigned long mmu_seq;
- if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
- gfn &= ~(KVM_PAGES_PER_HPAGE-1);
+ if (is_largepage_backed(vcpu, gfn &
+ ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1))) {
+ gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
largepage = 1;
}
@@ -2082,8 +2083,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
if (r)
return r;
- if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) {
- gfn &= ~(KVM_PAGES_PER_HPAGE-1);
+ if (is_largepage_backed(vcpu, gfn &
+ ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1))) {
+ gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
largepage = 1;
}
mmu_seq = vcpu->kvm->mmu_notifier_seq;
@@ -2485,7 +2487,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) {
- gfn &= ~(KVM_PAGES_PER_HPAGE-1);
+ gfn &= ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
vcpu->arch.update_pte.largepage = 1;
}
vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 322e811..53e129c 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -401,7 +401,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
if (walker.level == PT_DIRECTORY_LEVEL) {
gfn_t large_gfn;
- large_gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE-1);
+ large_gfn = walker.gfn &
+ ~(KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) - 1);
if (is_largepage_backed(vcpu, large_gfn)) {
walker.gfn = large_gfn;
largepage = 1;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6988858..06af936 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -103,7 +103,7 @@ struct kvm_memory_slot {
struct {
unsigned long rmap_pde;
int write_count;
- } *lpage_info;
+ } *lpage_info[KVM_NR_PAGE_SIZES - 1];
unsigned long userspace_addr;
int user_alloc;
};
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1da8072..8361662 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1001,19 +1001,25 @@ out:
static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
struct kvm_memory_slot *dont)
{
+ int i;
+
if (!dont || free->rmap != dont->rmap)
vfree(free->rmap);
if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
vfree(free->dirty_bitmap);
- if (!dont || free->lpage_info != dont->lpage_info)
- vfree(free->lpage_info);
+
+ for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+ if (!dont || free->lpage_info[i] != dont->lpage_info[i]) {
+ vfree(free->lpage_info[i]);
+ free->lpage_info[i] = NULL;
+ }
+ }
free->npages = 0;
free->dirty_bitmap = NULL;
free->rmap = NULL;
- free->lpage_info = NULL;
}
void kvm_free_physmem(struct kvm *kvm)
@@ -1087,7 +1093,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
int r;
gfn_t base_gfn;
unsigned long npages, ugfn;
- unsigned long largepages, i;
+ int lpages;
+ unsigned long i, j;
struct kvm_memory_slot *memslot;
struct kvm_memory_slot old, new;
@@ -1161,33 +1168,48 @@ int __kvm_set_memory_region(struct kvm *kvm,
else
new.userspace_addr = 0;
}
- if (npages && !new.lpage_info) {
- largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE;
- largepages -= base_gfn / KVM_PAGES_PER_HPAGE;
+ if (!npages)
+ goto skip_lpage;
- new.lpage_info = vmalloc(largepages * sizeof(*new.lpage_info));
+ for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+ int level = i + 2;
- if (!new.lpage_info)
+ /* Avoid unused variable warning if no large pages */
+ (void)level;
+
+ if (new.lpage_info[i])
+ continue;
+
+ lpages = 1 + (base_gfn + npages - 1) /
+ KVM_PAGES_PER_HPAGE(level);
+ lpages -= base_gfn / KVM_PAGES_PER_HPAGE(level);
+
+ new.lpage_info[i] = vmalloc(lpages * sizeof(*new.lpage_info[i]));
+
+ if (!new.lpage_info[i])
goto out_free;
- memset(new.lpage_info, 0, largepages * sizeof(*new.lpage_info));
+ memset(new.lpage_info[i], 0,
+ lpages * sizeof(*new.lpage_info[i]));
- if (base_gfn % KVM_PAGES_PER_HPAGE)
- new.lpage_info[0].write_count = 1;
- if ((base_gfn+npages) % KVM_PAGES_PER_HPAGE)
- new.lpage_info[largepages-1].write_count = 1;
+ if (base_gfn % KVM_PAGES_PER_HPAGE(level))
+ new.lpage_info[i][0].write_count = 1;
+ if ((base_gfn+npages) % KVM_PAGES_PER_HPAGE(level))
+ new.lpage_info[i][lpages - 1].write_count = 1;
ugfn = new.userspace_addr >> PAGE_SHIFT;
/*
* If the gfn and userspace address are not aligned wrt each
* other, or if explicitly asked to, disable large page
* support for this slot
*/
- if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1) ||
+ if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1) ||
!largepages_enabled)
- for (i = 0; i < largepages; ++i)
- new.lpage_info[i].write_count = 1;
+ for (j = 0; j < lpages; ++j)
+ new.lpage_info[i][j].write_count = 1;
}
+skip_lpage:
+
/* Allocate page dirty bitmap if needed */
if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
unsigned dirty_bytes = ALIGN(npages, BITS_PER_LONG) / 8;
--
1.6.3.3
* [PATCH 34/47] KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (32 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 33/47] KVM: Prepare memslot data structures for multiple hugepage sizes Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 35/47] KVM: ignore AMDs HWCR register access to set the FFDIS bit Avi Kivity
` (12 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Add the missing locking in a few places in x86's vm_ioctl handling paths.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
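A toy pthread illustration of the check-then-set race the KVM_SET_BOOT_CPU_ID
hunk closes (the vm struct and set_bsp_id are invented here; only the locking
pattern mirrors the patch):

/*
 * Userspace illustration of the check-then-set race closed for
 * KVM_SET_BOOT_CPU_ID.  Names are invented for this sketch.
 */
#include <pthread.h>
#include <stdio.h>

struct vm {
	pthread_mutex_t lock;
	int online_vcpus;
	int bsp_vcpu_id;
};

/*
 * Without the lock, a concurrent vcpu creation could raise online_vcpus
 * between the check and the assignment, changing the BSP id after vcpus
 * already exist.
 */
static int set_bsp_id(struct vm *vm, int id)
{
	int r = 0;

	pthread_mutex_lock(&vm->lock);
	if (vm->online_vcpus != 0)
		r = -1;			/* -EBUSY in the kernel */
	else
		vm->bsp_vcpu_id = id;
	pthread_mutex_unlock(&vm->lock);
	return r;
}

int main(void)
{
	struct vm vm = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

	printf("before vcpus: %d (bsp=%d)\n", set_bsp_id(&vm, 2), vm.bsp_vcpu_id);
	vm.online_vcpus = 1;
	printf("after vcpus:  %d (bsp=%d)\n", set_bsp_id(&vm, 3), vm.bsp_vcpu_id);
	return 0;
}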
arch/x86/kvm/i8254.c | 2 --
arch/x86/kvm/x86.c | 12 ++++++++++++
virt/kvm/kvm_main.c | 2 ++
3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 15fc95b..bcd00c7 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -345,9 +345,7 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
{
- mutex_lock(&kvm->arch.vpit->pit_state.lock);
pit_load_count(kvm, channel, val);
- mutex_unlock(&kvm->arch.vpit->pit_state.lock);
}
static inline struct kvm_pit *dev_to_pit(struct kvm_io_device *dev)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 57e76b3..e9b0982 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1987,19 +1987,25 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
r = 0;
switch (chip->chip_id) {
case KVM_IRQCHIP_PIC_MASTER:
+ spin_lock(&pic_irqchip(kvm)->lock);
memcpy(&pic_irqchip(kvm)->pics[0],
&chip->chip.pic,
sizeof(struct kvm_pic_state));
+ spin_unlock(&pic_irqchip(kvm)->lock);
break;
case KVM_IRQCHIP_PIC_SLAVE:
+ spin_lock(&pic_irqchip(kvm)->lock);
memcpy(&pic_irqchip(kvm)->pics[1],
&chip->chip.pic,
sizeof(struct kvm_pic_state));
+ spin_unlock(&pic_irqchip(kvm)->lock);
break;
case KVM_IRQCHIP_IOAPIC:
+ mutex_lock(&kvm->irq_lock);
memcpy(ioapic_irqchip(kvm),
&chip->chip.ioapic,
sizeof(struct kvm_ioapic_state));
+ mutex_unlock(&kvm->irq_lock);
break;
default:
r = -EINVAL;
@@ -2013,7 +2019,9 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
{
int r = 0;
+ mutex_lock(&kvm->arch.vpit->pit_state.lock);
memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state));
+ mutex_unlock(&kvm->arch.vpit->pit_state.lock);
return r;
}
@@ -2021,8 +2029,10 @@ static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
{
int r = 0;
+ mutex_lock(&kvm->arch.vpit->pit_state.lock);
memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+ mutex_unlock(&kvm->arch.vpit->pit_state.lock);
return r;
}
@@ -2031,7 +2041,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
{
if (!kvm->arch.vpit)
return -ENXIO;
+ mutex_lock(&kvm->arch.vpit->pit_state.lock);
kvm->arch.vpit->pit_state.pit_timer.reinject = control->pit_reinject;
+ mutex_unlock(&kvm->arch.vpit->pit_state.lock);
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8361662..f1e2e8c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2274,10 +2274,12 @@ static long kvm_vm_ioctl(struct file *filp,
#ifdef CONFIG_KVM_APIC_ARCHITECTURE
case KVM_SET_BOOT_CPU_ID:
r = 0;
+ mutex_lock(&kvm->lock);
if (atomic_read(&kvm->online_vcpus) != 0)
r = -EBUSY;
else
kvm->bsp_vcpu_id = arg;
+ mutex_unlock(&kvm->lock);
break;
#endif
default:
--
1.6.3.3
* [PATCH 35/47] KVM: ignore AMDs HWCR register access to set the FFDIS bit
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (33 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 34/47] KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 36/47] KVM: ignore reads from AMDs C1E enabled MSR Avi Kivity
` (11 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
Linux tries to disable the flush filter on all AMD K8 CPUs. Since KVM
does not handle the MSR involved, the injected #GP panics the Linux guest
kernel. Ignore setting of the HWCR.FFDIS bit in this MSR so that Linux can
boot with an AMD K8 family guest CPU.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
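A standalone sketch of the HWCR write filter added below: bit 6 (FFDIS) is
masked off and accepted, anything else stays unhandled (plain userspace C,
not the kernel function):

/*
 * Sketch of the HWCR write filter: drop FFDIS, reject everything else.
 */
#include <stdint.h>
#include <stdio.h>

static int hwcr_write_ok(uint64_t data)
{
	data &= ~(uint64_t)0x40;	/* drop FFDIS, the bit Linux sets */
	return data == 0;		/* anything else still raises #GP */
}

int main(void)
{
	printf("wrmsr HWCR 0x40 -> %s\n", hwcr_write_ok(0x40) ? "ignored" : "#GP");
	printf("wrmsr HWCR 0x01 -> %s\n", hwcr_write_ok(0x01) ? "ignored" : "#GP");
	return 0;
}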
arch/x86/kvm/svm.c | 1 -
arch/x86/kvm/x86.c | 8 ++++++++
2 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 081d00a..d664157 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2138,7 +2138,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
break;
case MSR_VM_CR:
case MSR_VM_IGNNE:
- case MSR_K7_HWCR:
pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
break;
default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9b0982..cae5b12 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -833,6 +833,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
case MSR_EFER:
set_efer(vcpu, data);
break;
+ case MSR_K7_HWCR:
+ data &= ~(u64)0x40; /* ignore flush filter disable */
+ if (data != 0) {
+ pr_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n",
+ data);
+ return 1;
+ }
+ break;
case MSR_IA32_DEBUGCTLMSR:
if (!data) {
/* We support the non-activated case already */
--
1.6.3.3
* [PATCH 36/47] KVM: ignore reads from AMDs C1E enabled MSR
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (34 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 35/47] KVM: ignore AMDs HWCR register access to set the FFDIS bit Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 37/47] KVM: introduce module parameter for ignoring unknown MSRs accesses Avi Kivity
` (10 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
If the Linux kernel detects a C1E-capable AMD processor (K8 RevF and
higher), it will access a certain MSR on every attempt to go to halt.
Explicitly handle this read and return 0 to let KVM run a Linux guest
with the native AMD host CPU propagated to the guest.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/x86.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cae5b12..6aace61 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1038,6 +1038,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_P6_EVNTSEL0:
case MSR_P6_EVNTSEL1:
case MSR_K7_EVNTSEL0:
+ case MSR_K8_INT_PENDING_MSG:
data = 0;
break;
case MSR_MTRRcap:
--
1.6.3.3
* [PATCH 37/47] KVM: introduce module parameter for ignoring unknown MSRs accesses
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (35 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 36/47] KVM: ignore reads from AMDs C1E enabled MSR Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 38/47] KVM: powerpc: convert marker probes to event trace Avi Kivity
` (9 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Andre Przywara <andre.przywara@amd.com>
KVM will inject a #GP into the guest if it tries to access unhandled
MSRs. This crashes many guests. Although actually handling these MSRs would
be the correct approach, we introduce a runtime-switchable module parameter
called "ignore_msrs" (defaults to 0). If it is set, unknown MSR reads return
0 while MSR writes are simply dropped; in both cases a message is printed to
dmesg to inform the user.
You can change the behaviour at any time by saying:
# echo 1 > /sys/module/kvm/parameters/ignore_msrs
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
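The same runtime toggle done from C instead of the shell one-liner above
(assuming the canonical sysfs path and root privileges):

/*
 * Toggle ignore_msrs at runtime.  Write "0" to restore the default
 * #GP-on-unknown-MSR behaviour.
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/module/kvm/parameters/ignore_msrs";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return 1;
	}
	fputs("1\n", f);	/* ignore unknown MSR accesses from now on */
	fclose(f);
	return 0;
}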
arch/x86/kvm/x86.c | 24 ++++++++++++++++++++----
1 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6aace61..0be75d5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -83,6 +83,9 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
struct kvm_x86_ops *kvm_x86_ops;
EXPORT_SYMBOL_GPL(kvm_x86_ops);
+int ignore_msrs = 0;
+module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR);
+
struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "pf_fixed", VCPU_STAT(pf_fixed) },
{ "pf_guest", VCPU_STAT(pf_guest) },
@@ -930,8 +933,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
"0x%x data 0x%llx\n", msr, data);
break;
default:
- pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
- return 1;
+ if (!ignore_msrs) {
+ pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n",
+ msr, data);
+ return 1;
+ } else {
+ pr_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n",
+ msr, data);
+ break;
+ }
}
return 0;
}
@@ -1078,8 +1088,14 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
return get_msr_mce(vcpu, msr, pdata);
default:
- pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
- return 1;
+ if (!ignore_msrs) {
+ pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
+ return 1;
+ } else {
+ pr_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr);
+ data = 0;
+ }
+ break;
}
*pdata = data;
return 0;
--
1.6.3.3
* [PATCH 38/47] KVM: powerpc: convert marker probes to event trace
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (36 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 37/47] KVM: introduce module parameter for ignoring unknown MSRs accesses Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 39/47] KVM: remove old KVMTRACE support code Avi Kivity
` (8 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
[avi: make it build]
[avi: fold trace-arch.h into trace.h]
CC: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
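A consumer-side sketch: once converted to event traces, the probes can be
read through the standard ftrace interface. The paths below are the usual
debugfs tracing layout, not a KVM API, and root is required:

/*
 * Enable the kvm event group and tail the trace pipe.  Assumes debugfs
 * is mounted at /sys/kernel/debug.
 */
#include <stdio.h>

int main(void)
{
	const char *base = "/sys/kernel/debug/tracing";
	char path[256], line[512];
	FILE *f;

	snprintf(path, sizeof(path), "%s/events/kvm/enable", base);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	fputs("1\n", f);
	fclose(f);

	snprintf(path, sizeof(path), "%s/trace_pipe", base);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	while (fgets(line, sizeof(line), f))	/* blocks until events arrive */
		fputs(line, stdout);
	fclose(f);
	return 0;
}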
arch/powerpc/kvm/44x_tlb.c | 11 +++--
arch/powerpc/kvm/Makefile | 4 ++
arch/powerpc/kvm/e500_tlb.c | 16 +++----
arch/powerpc/kvm/emulate.c | 3 +-
arch/powerpc/kvm/powerpc.c | 3 +
arch/powerpc/kvm/trace.h | 104 +++++++++++++++++++++++++++++++++++++++++++
6 files changed, 126 insertions(+), 15 deletions(-)
create mode 100644 arch/powerpc/kvm/trace.h
diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index 4a16f47..ff3cb63 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -30,6 +30,7 @@
#include "timing.h"
#include "44x_tlb.h"
+#include "trace.h"
#ifndef PPC44x_TLBE_SIZE
#define PPC44x_TLBE_SIZE PPC44x_TLB_4K
@@ -263,7 +264,7 @@ static void kvmppc_44x_shadow_release(struct kvmppc_vcpu_44x *vcpu_44x,
/* XXX set tlb_44x_index to stlb_index? */
- KVMTRACE_1D(STLB_INVAL, &vcpu_44x->vcpu, stlb_index, handler);
+ trace_kvm_stlb_inval(stlb_index);
}
void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
@@ -365,8 +366,8 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gpa_t gpaddr,
/* Insert shadow mapping into hardware TLB. */
kvmppc_44x_tlbe_set_modified(vcpu_44x, victim);
kvmppc_44x_tlbwe(victim, &stlbe);
- KVMTRACE_5D(STLB_WRITE, vcpu, victim, stlbe.tid, stlbe.word0, stlbe.word1,
- stlbe.word2, handler);
+ trace_kvm_stlb_write(victim, stlbe.tid, stlbe.word0, stlbe.word1,
+ stlbe.word2);
}
/* For a particular guest TLB entry, invalidate the corresponding host TLB
@@ -485,8 +486,8 @@ int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
kvmppc_mmu_map(vcpu, eaddr, gpaddr, gtlb_index);
}
- KVMTRACE_5D(GTLB_WRITE, vcpu, gtlb_index, tlbe->tid, tlbe->word0,
- tlbe->word1, tlbe->word2, handler);
+ trace_kvm_gtlb_write(gtlb_index, tlbe->tid, tlbe->word0, tlbe->word1,
+ tlbe->word2);
kvmppc_set_exit_type(vcpu, EMULATED_TLBWE_EXITS);
return EMULATE_DONE;
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 459c7ee..4f407f2 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -10,6 +10,10 @@ common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
common-objs-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o)
+CFLAGS_44x_tlb.o := -I.
+CFLAGS_e500_tlb.o := -I.
+CFLAGS_emulate.o := -I.
+
kvm-objs := $(common-objs-y) powerpc.o emulate.o
obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o
obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index a2048ac..fb1e1dc 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -22,6 +22,7 @@
#include "../mm/mmu_decl.h"
#include "e500_tlb.h"
+#include "trace.h"
#define to_htlb1_esel(esel) (tlb1_entry_num - (esel) - 1)
@@ -224,9 +225,8 @@ static void kvmppc_e500_stlbe_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,
kvmppc_e500_shadow_release(vcpu_e500, tlbsel, esel);
stlbe->mas1 = 0;
- KVMTRACE_5D(STLB_INVAL, &vcpu_e500->vcpu, index_of(tlbsel, esel),
- stlbe->mas1, stlbe->mas2, stlbe->mas3, stlbe->mas7,
- handler);
+ trace_kvm_stlb_inval(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2,
+ stlbe->mas3, stlbe->mas7);
}
static void kvmppc_e500_tlb1_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,
@@ -319,9 +319,8 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
vcpu_e500->vcpu.arch.msr & MSR_PR);
stlbe->mas7 = (hpaddr >> 32) & MAS7_RPN;
- KVMTRACE_5D(STLB_WRITE, &vcpu_e500->vcpu, index_of(tlbsel, esel),
- stlbe->mas1, stlbe->mas2, stlbe->mas3, stlbe->mas7,
- handler);
+ trace_kvm_stlb_write(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2,
+ stlbe->mas3, stlbe->mas7);
}
/* XXX only map the one-one case, for now use TLB0 */
@@ -535,9 +534,8 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
gtlbe->mas3 = vcpu_e500->mas3;
gtlbe->mas7 = vcpu_e500->mas7;
- KVMTRACE_5D(GTLB_WRITE, vcpu, vcpu_e500->mas0,
- gtlbe->mas1, gtlbe->mas2, gtlbe->mas3, gtlbe->mas7,
- handler);
+ trace_kvm_gtlb_write(vcpu_e500->mas0, gtlbe->mas1, gtlbe->mas2,
+ gtlbe->mas3, gtlbe->mas7);
/* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */
if (tlbe_is_host_safe(vcpu, gtlbe)) {
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 28a8237..7737146 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -29,6 +29,7 @@
#include <asm/kvm_ppc.h>
#include <asm/disassemble.h>
#include "timing.h"
+#include "trace.h"
#define OP_TRAP 3
@@ -419,7 +420,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
}
}
- KVMTRACE_3D(PPC_INSTR, vcpu, inst, (int)vcpu->arch.pc, emulated, entryexit);
+ trace_kvm_ppc_instr(inst, vcpu->arch.pc, emulated);
if (advance)
vcpu->arch.pc += 4; /* Advance past emulated instruction. */
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 7ad30e0..0341391 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -31,6 +31,9 @@
#include "timing.h"
#include "../mm/mmu_decl.h"
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
{
return gfn;
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
new file mode 100644
index 0000000..67f219d
--- /dev/null
+++ b/arch/powerpc/kvm/trace.h
@@ -0,0 +1,104 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+
+/*
+ * Tracepoint for guest mode entry.
+ */
+TRACE_EVENT(kvm_ppc_instr,
+ TP_PROTO(unsigned int inst, unsigned long pc, unsigned int emulate),
+ TP_ARGS(inst, pc, emulate),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, inst )
+ __field( unsigned long, pc )
+ __field( unsigned int, emulate )
+ ),
+
+ TP_fast_assign(
+ __entry->inst = inst;
+ __entry->pc = pc;
+ __entry->emulate = emulate;
+ ),
+
+ TP_printk("inst %u pc 0x%lx emulate %u\n",
+ __entry->inst, __entry->pc, __entry->emulate)
+);
+
+TRACE_EVENT(kvm_stlb_inval,
+ TP_PROTO(unsigned int stlb_index),
+ TP_ARGS(stlb_index),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, stlb_index )
+ ),
+
+ TP_fast_assign(
+ __entry->stlb_index = stlb_index;
+ ),
+
+ TP_printk("stlb_index %u", __entry->stlb_index)
+);
+
+TRACE_EVENT(kvm_stlb_write,
+ TP_PROTO(unsigned int victim, unsigned int tid, unsigned int word0,
+ unsigned int word1, unsigned int word2),
+ TP_ARGS(victim, tid, word0, word1, word2),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, victim )
+ __field( unsigned int, tid )
+ __field( unsigned int, word0 )
+ __field( unsigned int, word1 )
+ __field( unsigned int, word2 )
+ ),
+
+ TP_fast_assign(
+ __entry->victim = victim;
+ __entry->tid = tid;
+ __entry->word0 = word0;
+ __entry->word1 = word1;
+ __entry->word2 = word2;
+ ),
+
+ TP_printk("victim %u tid %u w0 %u w1 %u w2 %u",
+ __entry->victim, __entry->tid, __entry->word0,
+ __entry->word1, __entry->word2)
+);
+
+TRACE_EVENT(kvm_gtlb_write,
+ TP_PROTO(unsigned int gtlb_index, unsigned int tid, unsigned int word0,
+ unsigned int word1, unsigned int word2),
+ TP_ARGS(gtlb_index, tid, word0, word1, word2),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, gtlb_index )
+ __field( unsigned int, tid )
+ __field( unsigned int, word0 )
+ __field( unsigned int, word1 )
+ __field( unsigned int, word2 )
+ ),
+
+ TP_fast_assign(
+ __entry->gtlb_index = gtlb_index;
+ __entry->tid = tid;
+ __entry->word0 = word0;
+ __entry->word1 = word1;
+ __entry->word2 = word2;
+ ),
+
+ TP_printk("gtlb_index %u tid %u w0 %u w1 %u w2 %u",
+ __entry->gtlb_index, __entry->tid, __entry->word0,
+ __entry->word1, __entry->word2)
+);
+
+#endif /* _TRACE_KVM_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
1.6.3.3
* [PATCH 39/47] KVM: remove old KVMTRACE support code
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (37 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 38/47] KVM: powerpc: convert marker probes to event trace Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 40/47] KVM: use vcpu_id instead of bsp_vcpu pointer in kvm_vcpu_is_bsp Avi Kivity
` (7 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/ia64/kvm/Kconfig | 3 -
arch/powerpc/kvm/Kconfig | 11 --
arch/powerpc/kvm/Makefile | 2 -
arch/s390/kvm/Kconfig | 3 -
arch/x86/kvm/Kconfig | 12 --
arch/x86/kvm/Makefile | 1 -
include/linux/kvm.h | 31 +-----
include/linux/kvm_host.h | 31 -----
virt/kvm/kvm_main.c | 3 +-
virt/kvm/kvm_trace.c | 285 ---------------------------------------------
10 files changed, 2 insertions(+), 380 deletions(-)
delete mode 100644 virt/kvm/kvm_trace.c
diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index cbadd8a..ef3e7be 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -47,9 +47,6 @@ config KVM_INTEL
Provides support for KVM on Itanium 2 processors equipped with the VT
extensions.
-config KVM_TRACE
- bool
-
source drivers/virtio/Kconfig
endif # VIRTUALIZATION
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 46019dc..c299268 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -58,17 +58,6 @@ config KVM_E500
If unsure, say N.
-config KVM_TRACE
- bool "KVM trace support"
- depends on KVM && MARKERS && SYSFS
- select RELAY
- select DEBUG_FS
- default n
- ---help---
- This option allows reading a trace of kvm-related events through
- relayfs. Note the ABI is not considered stable and will be
- modified in future updates.
-
source drivers/virtio/Kconfig
endif # VIRTUALIZATION
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 4f407f2..37655fe 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -8,8 +8,6 @@ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/powerpc/kvm
common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
-common-objs-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o)
-
CFLAGS_44x_tlb.o := -I.
CFLAGS_e500_tlb.o := -I.
CFLAGS_emulate.o := -I.
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index ad75ce3..bf164fc 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -34,9 +34,6 @@ config KVM
If unsure, say N.
-config KVM_TRACE
- bool
-
# OK, it's a little counter-intuitive to do this, but it puts it neatly under
# the virtualization menu.
source drivers/virtio/Kconfig
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 7fbedfd..b84e571 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -62,18 +62,6 @@ config KVM_AMD
To compile this as a module, choose M here: the module
will be called kvm-amd.
-config KVM_TRACE
- bool "KVM trace support"
- depends on KVM && SYSFS
- select MARKERS
- select RELAY
- select DEBUG_FS
- default n
- ---help---
- This option allows reading a trace of kvm-related events through
- relayfs. Note the ABI is not considered stable and will be
- modified in future updates.
-
# OK, it's a little counter-intuitive to do this, but it puts it neatly under
# the virtualization menu.
source drivers/lguest/Kconfig
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 7c56850..afaaa76 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,6 @@ CFLAGS_vmx.o := -I.
kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o)
-kvm-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o)
kvm-$(CONFIG_IOMMU_API) += $(addprefix ../../../virt/kvm/, iommu.o)
kvm-y += x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6710518..76c6408 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -14,7 +14,7 @@
#define KVM_API_VERSION 12
-/* for KVM_TRACE_ENABLE */
+/* for KVM_TRACE_ENABLE, deprecated */
struct kvm_user_trace_setup {
__u32 buf_size; /* sub_buffer size of each per-cpu */
__u32 buf_nr; /* the number of sub_buffers of each per-cpu */
@@ -325,35 +325,6 @@ struct kvm_guest_debug {
#define KVM_TRC_CYCLE_SIZE 8
#define KVM_TRC_EXTRA_MAX 7
-/* This structure represents a single trace buffer record. */
-struct kvm_trace_rec {
- /* variable rec_val
- * is split into:
- * bits 0 - 27 -> event id
- * bits 28 -30 -> number of extra data args of size u32
- * bits 31 -> binary indicator for if tsc is in record
- */
- __u32 rec_val;
- __u32 pid;
- __u32 vcpu_id;
- union {
- struct {
- __u64 timestamp;
- __u32 extra_u32[KVM_TRC_EXTRA_MAX];
- } __attribute__((packed)) timestamp;
- struct {
- __u32 extra_u32[KVM_TRC_EXTRA_MAX];
- } notimestamp;
- } u;
-};
-
-#define TRACE_REC_EVENT_ID(val) \
- (0x0fffffff & (val))
-#define TRACE_REC_NUM_DATA_ARGS(val) \
- (0x70000000 & ((val) << 28))
-#define TRACE_REC_TCS(val) \
- (0x80000000 & ((val) << 31))
-
#define KVMIO 0xAE
/*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 06af936..0604d56 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -482,37 +482,6 @@ struct kvm_stats_debugfs_item {
extern struct kvm_stats_debugfs_item debugfs_entries[];
extern struct dentry *kvm_debugfs_dir;
-#define KVMTRACE_5D(evt, vcpu, d1, d2, d3, d4, d5, name) \
- trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
- vcpu, 5, d1, d2, d3, d4, d5)
-#define KVMTRACE_4D(evt, vcpu, d1, d2, d3, d4, name) \
- trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
- vcpu, 4, d1, d2, d3, d4, 0)
-#define KVMTRACE_3D(evt, vcpu, d1, d2, d3, name) \
- trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
- vcpu, 3, d1, d2, d3, 0, 0)
-#define KVMTRACE_2D(evt, vcpu, d1, d2, name) \
- trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
- vcpu, 2, d1, d2, 0, 0, 0)
-#define KVMTRACE_1D(evt, vcpu, d1, name) \
- trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
- vcpu, 1, d1, 0, 0, 0, 0)
-#define KVMTRACE_0D(evt, vcpu, name) \
- trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
- vcpu, 0, 0, 0, 0, 0, 0)
-
-#ifdef CONFIG_KVM_TRACE
-int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg);
-void kvm_trace_cleanup(void);
-#else
-static inline
-int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg)
-{
- return -EINVAL;
-}
-#define kvm_trace_cleanup() ((void)0)
-#endif
-
#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
{
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f1e2e8c..bbb4029 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2398,7 +2398,7 @@ static long kvm_dev_ioctl(struct file *filp,
case KVM_TRACE_ENABLE:
case KVM_TRACE_PAUSE:
case KVM_TRACE_DISABLE:
- r = kvm_trace_ioctl(ioctl, arg);
+ r = -EOPNOTSUPP;
break;
default:
return kvm_arch_dev_ioctl(filp, ioctl, arg);
@@ -2748,7 +2748,6 @@ EXPORT_SYMBOL_GPL(kvm_init);
void kvm_exit(void)
{
- kvm_trace_cleanup();
tracepoint_synchronize_unregister();
misc_deregister(&kvm_dev);
kmem_cache_destroy(kvm_vcpu_cache);
diff --git a/virt/kvm/kvm_trace.c b/virt/kvm/kvm_trace.c
deleted file mode 100644
index f598744..0000000
--- a/virt/kvm/kvm_trace.c
+++ /dev/null
@@ -1,285 +0,0 @@
-/*
- * kvm trace
- *
- * It is designed to allow debugging traces of kvm to be generated
- * on UP / SMP machines. Each trace entry can be timestamped so that
- * it's possible to reconstruct a chronological record of trace events.
- * The implementation refers to blktrace kernel support.
- *
- * Copyright (c) 2008 Intel Corporation
- * Copyright (C) 2006 Jens Axboe <axboe@kernel.dk>
- *
- * Authors: Feng(Eric) Liu, eric.e.liu@intel.com
- *
- * Date: Feb 2008
- */
-
-#include <linux/module.h>
-#include <linux/relay.h>
-#include <linux/debugfs.h>
-#include <linux/ktime.h>
-
-#include <linux/kvm_host.h>
-
-#define KVM_TRACE_STATE_RUNNING (1 << 0)
-#define KVM_TRACE_STATE_PAUSE (1 << 1)
-#define KVM_TRACE_STATE_CLEARUP (1 << 2)
-
-struct kvm_trace {
- int trace_state;
- struct rchan *rchan;
- struct dentry *lost_file;
- atomic_t lost_records;
-};
-static struct kvm_trace *kvm_trace;
-
-struct kvm_trace_probe {
- const char *name;
- const char *format;
- u32 timestamp_in;
- marker_probe_func *probe_func;
-};
-
-static inline int calc_rec_size(int timestamp, int extra)
-{
- int rec_size = KVM_TRC_HEAD_SIZE;
-
- rec_size += extra;
- return timestamp ? rec_size += KVM_TRC_CYCLE_SIZE : rec_size;
-}
-
-static void kvm_add_trace(void *probe_private, void *call_data,
- const char *format, va_list *args)
-{
- struct kvm_trace_probe *p = probe_private;
- struct kvm_trace *kt = kvm_trace;
- struct kvm_trace_rec rec;
- struct kvm_vcpu *vcpu;
- int i, size;
- u32 extra;
-
- if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING))
- return;
-
- rec.rec_val = TRACE_REC_EVENT_ID(va_arg(*args, u32));
- vcpu = va_arg(*args, struct kvm_vcpu *);
- rec.pid = current->tgid;
- rec.vcpu_id = vcpu->vcpu_id;
-
- extra = va_arg(*args, u32);
- WARN_ON(!(extra <= KVM_TRC_EXTRA_MAX));
- extra = min_t(u32, extra, KVM_TRC_EXTRA_MAX);
-
- rec.rec_val |= TRACE_REC_TCS(p->timestamp_in)
- | TRACE_REC_NUM_DATA_ARGS(extra);
-
- if (p->timestamp_in) {
- rec.u.timestamp.timestamp = ktime_to_ns(ktime_get());
-
- for (i = 0; i < extra; i++)
- rec.u.timestamp.extra_u32[i] = va_arg(*args, u32);
- } else {
- for (i = 0; i < extra; i++)
- rec.u.notimestamp.extra_u32[i] = va_arg(*args, u32);
- }
-
- size = calc_rec_size(p->timestamp_in, extra * sizeof(u32));
- relay_write(kt->rchan, &rec, size);
-}
-
-static struct kvm_trace_probe kvm_trace_probes[] = {
- { "kvm_trace_entryexit", "%u %p %u %u %u %u %u %u", 1, kvm_add_trace },
- { "kvm_trace_handler", "%u %p %u %u %u %u %u %u", 0, kvm_add_trace },
-};
-
-static int lost_records_get(void *data, u64 *val)
-{
- struct kvm_trace *kt = data;
-
- *val = atomic_read(&kt->lost_records);
- return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(kvm_trace_lost_ops, lost_records_get, NULL, "%llu\n");
-
-/*
- * The relay channel is used in "no-overwrite" mode, it keeps trace of how
- * many times we encountered a full subbuffer, to tell user space app the
- * lost records there were.
- */
-static int kvm_subbuf_start_callback(struct rchan_buf *buf, void *subbuf,
- void *prev_subbuf, size_t prev_padding)
-{
- struct kvm_trace *kt;
-
- if (!relay_buf_full(buf)) {
- if (!prev_subbuf) {
- /*
- * executed only once when the channel is opened
- * save metadata as first record
- */
- subbuf_start_reserve(buf, sizeof(u32));
- *(u32 *)subbuf = 0x12345678;
- }
-
- return 1;
- }
-
- kt = buf->chan->private_data;
- atomic_inc(&kt->lost_records);
-
- return 0;
-}
-
-static struct dentry *kvm_create_buf_file_callack(const char *filename,
- struct dentry *parent,
- int mode,
- struct rchan_buf *buf,
- int *is_global)
-{
- return debugfs_create_file(filename, mode, parent, buf,
- &relay_file_operations);
-}
-
-static int kvm_remove_buf_file_callback(struct dentry *dentry)
-{
- debugfs_remove(dentry);
- return 0;
-}
-
-static struct rchan_callbacks kvm_relay_callbacks = {
- .subbuf_start = kvm_subbuf_start_callback,
- .create_buf_file = kvm_create_buf_file_callack,
- .remove_buf_file = kvm_remove_buf_file_callback,
-};
-
-static int do_kvm_trace_enable(struct kvm_user_trace_setup *kuts)
-{
- struct kvm_trace *kt;
- int i, r = -ENOMEM;
-
- if (!kuts->buf_size || !kuts->buf_nr)
- return -EINVAL;
-
- kt = kzalloc(sizeof(*kt), GFP_KERNEL);
- if (!kt)
- goto err;
-
- r = -EIO;
- atomic_set(&kt->lost_records, 0);
- kt->lost_file = debugfs_create_file("lost_records", 0444, kvm_debugfs_dir,
- kt, &kvm_trace_lost_ops);
- if (!kt->lost_file)
- goto err;
-
- kt->rchan = relay_open("trace", kvm_debugfs_dir, kuts->buf_size,
- kuts->buf_nr, &kvm_relay_callbacks, kt);
- if (!kt->rchan)
- goto err;
-
- kvm_trace = kt;
-
- for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) {
- struct kvm_trace_probe *p = &kvm_trace_probes[i];
-
- r = marker_probe_register(p->name, p->format, p->probe_func, p);
- if (r)
- printk(KERN_INFO "Unable to register probe %s\n",
- p->name);
- }
-
- kvm_trace->trace_state = KVM_TRACE_STATE_RUNNING;
-
- return 0;
-err:
- if (kt) {
- if (kt->lost_file)
- debugfs_remove(kt->lost_file);
- if (kt->rchan)
- relay_close(kt->rchan);
- kfree(kt);
- }
- return r;
-}
-
-static int kvm_trace_enable(char __user *arg)
-{
- struct kvm_user_trace_setup kuts;
- int ret;
-
- ret = copy_from_user(&kuts, arg, sizeof(kuts));
- if (ret)
- return -EFAULT;
-
- ret = do_kvm_trace_enable(&kuts);
- if (ret)
- return ret;
-
- return 0;
-}
-
-static int kvm_trace_pause(void)
-{
- struct kvm_trace *kt = kvm_trace;
- int r = -EINVAL;
-
- if (kt == NULL)
- return r;
-
- if (kt->trace_state == KVM_TRACE_STATE_RUNNING) {
- kt->trace_state = KVM_TRACE_STATE_PAUSE;
- relay_flush(kt->rchan);
- r = 0;
- }
-
- return r;
-}
-
-void kvm_trace_cleanup(void)
-{
- struct kvm_trace *kt = kvm_trace;
- int i;
-
- if (kt == NULL)
- return;
-
- if (kt->trace_state == KVM_TRACE_STATE_RUNNING ||
- kt->trace_state == KVM_TRACE_STATE_PAUSE) {
-
- kt->trace_state = KVM_TRACE_STATE_CLEARUP;
-
- for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) {
- struct kvm_trace_probe *p = &kvm_trace_probes[i];
- marker_probe_unregister(p->name, p->probe_func, p);
- }
- marker_synchronize_unregister();
-
- relay_close(kt->rchan);
- debugfs_remove(kt->lost_file);
- kfree(kt);
- }
-}
-
-int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg)
-{
- void __user *argp = (void __user *)arg;
- long r = -EINVAL;
-
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
-
- switch (ioctl) {
- case KVM_TRACE_ENABLE:
- r = kvm_trace_enable(argp);
- break;
- case KVM_TRACE_PAUSE:
- r = kvm_trace_pause();
- break;
- case KVM_TRACE_DISABLE:
- r = 0;
- kvm_trace_cleanup();
- break;
- }
-
- return r;
-}
--
1.6.3.3
* [PATCH 40/47] KVM: use vcpu_id instead of bsp_vcpu pointer in kvm_vcpu_is_bsp
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (38 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 39/47] KVM: remove old KVMTRACE support code Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 41/47] KVM: document locking for kvm_io_device_ops Avi Kivity
` (6 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Marcelo Tosatti <mtosatti@redhat.com>
Change kvm_vcpu_is_bsp to use vcpu_id instead of bsp_vcpu pointer, which
is only initialized at the end of kvm_vm_ioctl_create_vcpu.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
include/linux/kvm_host.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0604d56..4ea42c9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -538,7 +538,7 @@ static inline void kvm_irqfd_release(struct kvm *kvm) {}
#ifdef CONFIG_KVM_APIC_ARCHITECTURE
static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
{
- return vcpu->kvm->bsp_vcpu == vcpu;
+ return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id;
}
#endif
#endif
--
1.6.3.3
* [PATCH 41/47] KVM: document locking for kvm_io_device_ops
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (39 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 40/47] KVM: use vcpu_id instead of bsp_vcpu pointer in kvm_vcpu_is_bsp Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 42/47] KVM: switch coalesced mmio changes to slots_lock Avi Kivity
` (5 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Michael S. Tsirkin <mst@redhat.com>
slots_lock is taken everywhere device ops are called.
Document this, as we will rely on it to rework the locking for io.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
virt/kvm/iodev.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 2c67f5a..06e38b2 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -20,6 +20,9 @@
struct kvm_io_device;
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ **/
struct kvm_io_device_ops {
void (*read)(struct kvm_io_device *this,
gpa_t addr,
--
1.6.3.3
* [PATCH 42/47] KVM: switch coalesced mmio changes to slots_lock
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (40 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 41/47] KVM: document locking for kvm_io_device_ops Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 43/47] KVM: switch pit creation " Avi Kivity
` (4 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Michael S. Tsirkin <mst@redhat.com>
Switch coalesced mmio to slots_lock. slots_lock is already taken for read
everywhere, so we only need to take it for write when changing zones.
This is in preparation for removing in_range and the kvm->lock around it.
[avi: fix build]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
virt/kvm/coalesced_mmio.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 397f419..b40946c 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -115,16 +115,16 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
if (dev == NULL)
return -EINVAL;
- mutex_lock(&kvm->lock);
+ down_write(&kvm->slots_lock);
if (dev->nb_zones >= KVM_COALESCED_MMIO_ZONE_MAX) {
- mutex_unlock(&kvm->lock);
+ up_write(&kvm->slots_lock);
return -ENOBUFS;
}
dev->zone[dev->nb_zones] = *zone;
dev->nb_zones++;
- mutex_unlock(&kvm->lock);
+ up_write(&kvm->slots_lock);
return 0;
}
@@ -138,7 +138,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
if (dev == NULL)
return -EINVAL;
- mutex_lock(&kvm->lock);
+ down_write(&kvm->slots_lock);
i = dev->nb_zones;
while(i) {
@@ -156,7 +156,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
i--;
}
- mutex_unlock(&kvm->lock);
+ up_write(&kvm->slots_lock);
return 0;
}
--
1.6.3.3
* [PATCH 43/47] KVM: switch pit creation to slots_lock
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (41 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 42/47] KVM: switch coalesced mmio changes to slots_lock Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 44/47] KVM: convert bus " Avi Kivity
` (3 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Michael S. Tsirkin <mst@redhat.com>
Switch PIT creation to slots_lock. slots_lock is already taken for read
everywhere, so we only need to take it for write when creating the PIT.
This is in preparation for removing in_range and the kvm->lock around it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
arch/x86/kvm/x86.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0be75d5..7ce6367 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2188,7 +2188,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
sizeof(struct kvm_pit_config)))
goto out;
create_pit:
- mutex_lock(&kvm->lock);
+ down_write(&kvm->slots_lock);
r = -EEXIST;
if (kvm->arch.vpit)
goto create_pit_unlock;
@@ -2197,7 +2197,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (kvm->arch.vpit)
r = 0;
create_pit_unlock:
- mutex_unlock(&kvm->lock);
+ up_write(&kvm->slots_lock);
break;
case KVM_IRQ_LINE_STATUS:
case KVM_IRQ_LINE: {
--
1.6.3.3
* [PATCH 44/47] KVM: convert bus to slots_lock
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (42 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 43/47] KVM: switch pit creation " Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 45/47] KVM: remove in_range from io devices Avi Kivity
` (2 subsequent siblings)
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Michael S. Tsirkin <mst@redhat.com>
Use slots_lock to protect the device list on the bus. slots_lock is already
taken for read everywhere, so we only need to take it for write when
registering devices. This is in preparation for removing in_range and the
kvm->lock around it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
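A userspace sketch of the locking pattern the bus moves to: lookups under
the read side of slots_lock, registration under the write side. A pthread
rwlock stands in for the kernel rwsem, and the bus is a toy array, not
struct kvm_io_bus:

/*
 * Toy reader/writer bus: many concurrent lookups, exclusive registration.
 */
#include <pthread.h>
#include <stdio.h>

#define MAX_DEVS 8

struct bus {
	pthread_rwlock_t slots_lock;
	int dev_count;
	int devs[MAX_DEVS];	/* stand-ins for struct kvm_io_device * */
};

static void bus_register(struct bus *b, int dev)
{
	pthread_rwlock_wrlock(&b->slots_lock);	/* writer excludes all readers */
	if (b->dev_count < MAX_DEVS)
		b->devs[b->dev_count++] = dev;
	pthread_rwlock_unlock(&b->slots_lock);
}

static int bus_find(struct bus *b, int dev)
{
	int i, found = 0;

	pthread_rwlock_rdlock(&b->slots_lock);	/* readers run in parallel */
	for (i = 0; i < b->dev_count; i++)
		if (b->devs[i] == dev)
			found = 1;
	pthread_rwlock_unlock(&b->slots_lock);
	return found;
}

int main(void)
{
	struct bus b = { .dev_count = 0 };

	pthread_rwlock_init(&b.slots_lock, NULL);
	bus_register(&b, 42);
	printf("dev 42 %s\n", bus_find(&b, 42) ? "found" : "missing");
	pthread_rwlock_destroy(&b.slots_lock);
	return 0;
}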
arch/x86/kvm/i8254.c | 5 +++--
arch/x86/kvm/i8259.c | 2 +-
include/linux/kvm_host.h | 5 ++++-
virt/kvm/coalesced_mmio.c | 2 +-
virt/kvm/ioapic.c | 2 +-
virt/kvm/kvm_main.c | 12 +++++++++++-
6 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index bcd00c7..4082cdd 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -583,6 +583,7 @@ static const struct kvm_io_device_ops speaker_dev_ops = {
.in_range = speaker_in_range,
};
+/* Caller must have writers lock on slots_lock */
struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
{
struct kvm_pit *pit;
@@ -621,11 +622,11 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
kvm_iodevice_init(&pit->dev, &pit_dev_ops);
- kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+ __kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
if (flags & KVM_PIT_SPEAKER_DUMMY) {
kvm_iodevice_init(&pit->speaker_dev, &speaker_dev_ops);
- kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
+ __kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
}
return pit;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 148c52a..1851aec 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -548,6 +548,6 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
* Initialize PIO device
*/
kvm_iodevice_init(&s->dev, &picdev_ops);
- kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+ kvm_io_bus_register_dev(kvm, &kvm->pio_bus, &s->dev);
return s;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4ea42c9..96c8c0b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -42,6 +42,7 @@
#define KVM_USERSPACE_IRQ_SOURCE_ID 0
+struct kvm;
struct kvm_vcpu;
extern struct kmem_cache *kvm_vcpu_cache;
@@ -61,7 +62,9 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
void kvm_io_bus_destroy(struct kvm_io_bus *bus);
struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
gpa_t addr, int len, int is_write);
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+void __kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+ struct kvm_io_device *dev);
+void kvm_io_bus_register_dev(struct kvm *kvm, struct kvm_io_bus *bus,
struct kvm_io_device *dev);
struct kvm_vcpu {
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index b40946c..7b7cc9f 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -102,7 +102,7 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
dev->kvm = kvm;
kvm->coalesced_mmio_dev = dev;
- kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+ kvm_io_bus_register_dev(kvm, &kvm->mmio_bus, &dev->dev);
return 0;
}
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 0532fa6..0eca54e 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -343,7 +343,7 @@ int kvm_ioapic_init(struct kvm *kvm)
kvm_ioapic_reset(ioapic);
kvm_iodevice_init(&ioapic->dev, &ioapic_mmio_ops);
ioapic->kvm = kvm;
- kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev);
+ kvm_io_bus_register_dev(kvm, &kvm->mmio_bus, &ioapic->dev);
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bbb4029..0edc366 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2527,7 +2527,17 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
return NULL;
}
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
+void kvm_io_bus_register_dev(struct kvm *kvm, struct kvm_io_bus *bus,
+ struct kvm_io_device *dev)
+{
+ down_write(&kvm->slots_lock);
+ __kvm_io_bus_register_dev(bus, dev);
+ up_write(&kvm->slots_lock);
+}
+
+/* An unlocked version. Caller must have write lock on slots_lock. */
+void __kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+ struct kvm_io_device *dev)
{
BUG_ON(bus->dev_count > (NR_IOBUS_DEVS-1));
--
1.6.3.3
* [PATCH 45/47] KVM: remove in_range from io devices
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (43 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 44/47] KVM: convert bus " Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 46/47] KVM: document lock nesting rule Avi Kivity
2009-08-19 13:02 ` [PATCH 47/47] KVM: fix lock imbalance Avi Kivity
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Michael S. Tsirkin <mst@redhat.com>
This changes bus accesses to use the high-level kvm_io_bus_read/kvm_io_bus_write
functions. in_range is now unused, so it is removed from the device ops in
favor of read/write callbacks that perform range checks internally.
This allows aliasing (mostly for in-kernel virtio), as well as better error
handling by making it possible to pass errors up to userspace.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
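A sketch of the new handler contract: each device range-checks its own
addresses and returns -EOPNOTSUPP for the rest, so the bus can move on to
the next device. The types and the scratch-register device are invented for
this sketch; only the return-value convention mirrors the patch:

/*
 * Toy io device following the read/write-with-range-check convention.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t gpa_t;

#define SCRATCH_BASE 0x1000
#define SCRATCH_LEN  4

static uint32_t scratch_reg;

static int scratch_write(gpa_t addr, int len, const void *data)
{
	if (addr < SCRATCH_BASE || addr + len > SCRATCH_BASE + SCRATCH_LEN)
		return -EOPNOTSUPP;	/* not ours, keep scanning the bus */
	memcpy(&scratch_reg, data, len);
	return 0;
}

static int scratch_read(gpa_t addr, int len, void *data)
{
	if (addr < SCRATCH_BASE || addr + len > SCRATCH_BASE + SCRATCH_LEN)
		return -EOPNOTSUPP;
	memcpy(data, &scratch_reg, len);
	return 0;
}

int main(void)
{
	uint32_t val = 0xabcd;

	printf("write in range:  %d\n", scratch_write(SCRATCH_BASE, 4, &val));
	printf("read  in range:  %d\n", scratch_read(SCRATCH_BASE, 4, &val));
	printf("write off range: %d\n", scratch_write(0x2000, 4, &val));
	return 0;
}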
arch/ia64/kvm/kvm-ia64.c | 28 +++--------
arch/x86/kvm/i8254.c | 49 +++++++++++---------
arch/x86/kvm/i8259.c | 20 +++++---
arch/x86/kvm/lapic.c | 44 ++++++++----------
arch/x86/kvm/x86.c | 110 +++++++++++++--------------------------------
include/linux/kvm_host.h | 6 ++-
virt/kvm/coalesced_mmio.c | 16 +++----
virt/kvm/ioapic.c | 22 +++++----
virt/kvm/iodev.h | 39 ++++++----------
virt/kvm/kvm_main.c | 26 +++++++----
10 files changed, 152 insertions(+), 208 deletions(-)
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 5c766bd..d7aa6bb 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -210,16 +210,6 @@ int kvm_dev_ioctl_check_extension(long ext)
}
-static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu,
- gpa_t addr, int len, int is_write)
-{
- struct kvm_io_device *dev;
-
- dev = kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr, len, is_write);
-
- return dev;
-}
-
static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
@@ -231,6 +221,7 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
struct kvm_mmio_req *p;
struct kvm_io_device *mmio_dev;
+ int r;
p = kvm_get_vcpu_ioreq(vcpu);
@@ -247,16 +238,13 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
kvm_run->exit_reason = KVM_EXIT_MMIO;
return 0;
mmio:
- mmio_dev = vcpu_find_mmio_dev(vcpu, p->addr, p->size, !p->dir);
- if (mmio_dev) {
- if (!p->dir)
- kvm_iodevice_write(mmio_dev, p->addr, p->size,
- &p->data);
- else
- kvm_iodevice_read(mmio_dev, p->addr, p->size,
- &p->data);
-
- } else
+ if (p->dir)
+ r = kvm_io_bus_read(&vcpu->kvm->mmio_bus, p->addr,
+ p->size, &p->data);
+ else
+ r = kvm_io_bus_write(&vcpu->kvm->mmio_bus, p->addr,
+ p->size, &p->data);
+ if (r)
printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr);
p->state = STATE_IORESP_READY;
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 4082cdd..8c3ac30 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -358,8 +358,14 @@ static inline struct kvm_pit *speaker_to_pit(struct kvm_io_device *dev)
return container_of(dev, struct kvm_pit, speaker_dev);
}
-static void pit_ioport_write(struct kvm_io_device *this,
- gpa_t addr, int len, const void *data)
+static inline int pit_in_range(gpa_t addr)
+{
+ return ((addr >= KVM_PIT_BASE_ADDRESS) &&
+ (addr < KVM_PIT_BASE_ADDRESS + KVM_PIT_MEM_LENGTH));
+}
+
+static int pit_ioport_write(struct kvm_io_device *this,
+ gpa_t addr, int len, const void *data)
{
struct kvm_pit *pit = dev_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
@@ -367,6 +373,8 @@ static void pit_ioport_write(struct kvm_io_device *this,
int channel, access;
struct kvm_kpit_channel_state *s;
u32 val = *(u32 *) data;
+ if (!pit_in_range(addr))
+ return -EOPNOTSUPP;
val &= 0xff;
addr &= KVM_PIT_CHANNEL_MASK;
@@ -429,16 +437,19 @@ static void pit_ioport_write(struct kvm_io_device *this,
}
mutex_unlock(&pit_state->lock);
+ return 0;
}
-static void pit_ioport_read(struct kvm_io_device *this,
- gpa_t addr, int len, void *data)
+static int pit_ioport_read(struct kvm_io_device *this,
+ gpa_t addr, int len, void *data)
{
struct kvm_pit *pit = dev_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
struct kvm *kvm = pit->kvm;
int ret, count;
struct kvm_kpit_channel_state *s;
+ if (!pit_in_range(addr))
+ return -EOPNOTSUPP;
addr &= KVM_PIT_CHANNEL_MASK;
s = &pit_state->channels[addr];
@@ -493,37 +504,36 @@ static void pit_ioport_read(struct kvm_io_device *this,
memcpy(data, (char *)&ret, len);
mutex_unlock(&pit_state->lock);
+ return 0;
}
-static int pit_in_range(struct kvm_io_device *this, gpa_t addr,
- int len, int is_write)
-{
- return ((addr >= KVM_PIT_BASE_ADDRESS) &&
- (addr < KVM_PIT_BASE_ADDRESS + KVM_PIT_MEM_LENGTH));
-}
-
-static void speaker_ioport_write(struct kvm_io_device *this,
- gpa_t addr, int len, const void *data)
+static int speaker_ioport_write(struct kvm_io_device *this,
+ gpa_t addr, int len, const void *data)
{
struct kvm_pit *pit = speaker_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
struct kvm *kvm = pit->kvm;
u32 val = *(u32 *) data;
+ if (addr != KVM_SPEAKER_BASE_ADDRESS)
+ return -EOPNOTSUPP;
mutex_lock(&pit_state->lock);
pit_state->speaker_data_on = (val >> 1) & 1;
pit_set_gate(kvm, 2, val & 1);
mutex_unlock(&pit_state->lock);
+ return 0;
}
-static void speaker_ioport_read(struct kvm_io_device *this,
- gpa_t addr, int len, void *data)
+static int speaker_ioport_read(struct kvm_io_device *this,
+ gpa_t addr, int len, void *data)
{
struct kvm_pit *pit = speaker_to_pit(this);
struct kvm_kpit_state *pit_state = &pit->pit_state;
struct kvm *kvm = pit->kvm;
unsigned int refresh_clock;
int ret;
+ if (addr != KVM_SPEAKER_BASE_ADDRESS)
+ return -EOPNOTSUPP;
/* Refresh clock toggles at about 15us. We approximate as 2^14ns. */
refresh_clock = ((unsigned int)ktime_to_ns(ktime_get()) >> 14) & 1;
@@ -535,12 +545,7 @@ static void speaker_ioport_read(struct kvm_io_device *this,
len = sizeof(ret);
memcpy(data, (char *)&ret, len);
mutex_unlock(&pit_state->lock);
-}
-
-static int speaker_in_range(struct kvm_io_device *this, gpa_t addr,
- int len, int is_write)
-{
- return (addr == KVM_SPEAKER_BASE_ADDRESS);
+ return 0;
}
void kvm_pit_reset(struct kvm_pit *pit)
@@ -574,13 +579,11 @@ static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask)
static const struct kvm_io_device_ops pit_dev_ops = {
.read = pit_ioport_read,
.write = pit_ioport_write,
- .in_range = pit_in_range,
};
static const struct kvm_io_device_ops speaker_dev_ops = {
.read = speaker_ioport_read,
.write = speaker_ioport_write,
- .in_range = speaker_in_range,
};
/* Caller must have writers lock on slots_lock */
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 1851aec..1d1bb75 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -430,8 +430,7 @@ static u32 elcr_ioport_read(void *opaque, u32 addr1)
return s->elcr;
}
-static int picdev_in_range(struct kvm_io_device *this, gpa_t addr,
- int len, int is_write)
+static int picdev_in_range(gpa_t addr)
{
switch (addr) {
case 0x20:
@@ -451,16 +450,18 @@ static inline struct kvm_pic *to_pic(struct kvm_io_device *dev)
return container_of(dev, struct kvm_pic, dev);
}
-static void picdev_write(struct kvm_io_device *this,
+static int picdev_write(struct kvm_io_device *this,
gpa_t addr, int len, const void *val)
{
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;
+ if (!picdev_in_range(addr))
+ return -EOPNOTSUPP;
if (len != 1) {
if (printk_ratelimit())
printk(KERN_ERR "PIC: non byte write\n");
- return;
+ return 0;
}
pic_lock(s);
switch (addr) {
@@ -476,18 +477,21 @@ static void picdev_write(struct kvm_io_device *this,
break;
}
pic_unlock(s);
+ return 0;
}
-static void picdev_read(struct kvm_io_device *this,
- gpa_t addr, int len, void *val)
+static int picdev_read(struct kvm_io_device *this,
+ gpa_t addr, int len, void *val)
{
struct kvm_pic *s = to_pic(this);
unsigned char data = 0;
+ if (!picdev_in_range(addr))
+ return -EOPNOTSUPP;
if (len != 1) {
if (printk_ratelimit())
printk(KERN_ERR "PIC: non byte read\n");
- return;
+ return 0;
}
pic_lock(s);
switch (addr) {
@@ -504,6 +508,7 @@ static void picdev_read(struct kvm_io_device *this,
}
*(unsigned char *)val = data;
pic_unlock(s);
+ return 0;
}
/*
@@ -526,7 +531,6 @@ static void pic_irq_request(void *opaque, int level)
static const struct kvm_io_device_ops picdev_ops = {
.read = picdev_read,
.write = picdev_write,
- .in_range = picdev_in_range,
};
struct kvm_pic *kvm_create_pic(struct kvm *kvm)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2e02865..265a765 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -546,18 +546,27 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
return container_of(dev, struct kvm_lapic, dev);
}
-static void apic_mmio_read(struct kvm_io_device *this,
- gpa_t address, int len, void *data)
+static int apic_mmio_in_range(struct kvm_lapic *apic, gpa_t addr)
+{
+ return apic_hw_enabled(apic) &&
+ addr >= apic->base_address &&
+ addr < apic->base_address + LAPIC_MMIO_LENGTH;
+}
+
+static int apic_mmio_read(struct kvm_io_device *this,
+ gpa_t address, int len, void *data)
{
struct kvm_lapic *apic = to_lapic(this);
unsigned int offset = address - apic->base_address;
unsigned char alignment = offset & 0xf;
u32 result;
+ if (!apic_mmio_in_range(apic, address))
+ return -EOPNOTSUPP;
if ((alignment + len) > 4) {
printk(KERN_ERR "KVM_APIC_READ: alignment error %lx %d",
(unsigned long)address, len);
- return;
+ return 0;
}
result = __apic_read(apic, offset & ~0xf);
@@ -574,6 +583,7 @@ static void apic_mmio_read(struct kvm_io_device *this,
"should be 1,2, or 4 instead\n", len);
break;
}
+ return 0;
}
static void update_divide_count(struct kvm_lapic *apic)
@@ -629,13 +639,15 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
apic->vcpu->kvm->arch.vapics_in_nmi_mode--;
}
-static void apic_mmio_write(struct kvm_io_device *this,
- gpa_t address, int len, const void *data)
+static int apic_mmio_write(struct kvm_io_device *this,
+ gpa_t address, int len, const void *data)
{
struct kvm_lapic *apic = to_lapic(this);
unsigned int offset = address - apic->base_address;
unsigned char alignment = offset & 0xf;
u32 val;
+ if (!apic_mmio_in_range(apic, address))
+ return -EOPNOTSUPP;
/*
* APIC register must be aligned on 128-bits boundary.
@@ -646,7 +658,7 @@ static void apic_mmio_write(struct kvm_io_device *this,
/* Don't shout loud, $infamous_os would cause only noise. */
apic_debug("apic write: bad size=%d %lx\n",
len, (long)address);
- return;
+ return 0;
}
val = *(u32 *) data;
@@ -729,7 +741,7 @@ static void apic_mmio_write(struct kvm_io_device *this,
hrtimer_cancel(&apic->lapic_timer.timer);
apic_set_reg(apic, APIC_TMICT, val);
start_apic_timer(apic);
- return;
+ return 0;
case APIC_TDCR:
if (val & 4)
@@ -743,22 +755,7 @@ static void apic_mmio_write(struct kvm_io_device *this,
offset);
break;
}
-
-}
-
-static int apic_mmio_range(struct kvm_io_device *this, gpa_t addr,
- int len, int size)
-{
- struct kvm_lapic *apic = to_lapic(this);
- int ret = 0;
-
-
- if (apic_hw_enabled(apic) &&
- (addr >= apic->base_address) &&
- (addr < (apic->base_address + LAPIC_MMIO_LENGTH)))
- ret = 1;
-
- return ret;
+ return 0;
}
void kvm_free_lapic(struct kvm_vcpu *vcpu)
@@ -938,7 +935,6 @@ static struct kvm_timer_ops lapic_timer_ops = {
static const struct kvm_io_device_ops apic_mmio_ops = {
.read = apic_mmio_read,
.write = apic_mmio_write,
- .in_range = apic_mmio_range,
};
int kvm_create_lapic(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7ce6367..96f0ae7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2333,35 +2333,23 @@ static void kvm_init_msr_list(void)
num_msrs_to_save = j;
}
-/*
- * Only apic need an MMIO device hook, so shortcut now..
- */
-static struct kvm_io_device *vcpu_find_pervcpu_dev(struct kvm_vcpu *vcpu,
- gpa_t addr, int len,
- int is_write)
+static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
+ const void *v)
{
- struct kvm_io_device *dev;
+ if (vcpu->arch.apic &&
+ !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
+ return 0;
- if (vcpu->arch.apic) {
- dev = &vcpu->arch.apic->dev;
- if (kvm_iodevice_in_range(dev, addr, len, is_write))
- return dev;
- }
- return NULL;
+ return kvm_io_bus_write(&vcpu->kvm->mmio_bus, addr, len, v);
}
-
-static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu,
- gpa_t addr, int len,
- int is_write)
+static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
{
- struct kvm_io_device *dev;
+ if (vcpu->arch.apic &&
+ !kvm_iodevice_read(&vcpu->arch.apic->dev, addr, len, v))
+ return 0;
- dev = vcpu_find_pervcpu_dev(vcpu, addr, len, is_write);
- if (dev == NULL)
- dev = kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr, len,
- is_write);
- return dev;
+ return kvm_io_bus_read(&vcpu->kvm->mmio_bus, addr, len, v);
}
static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes,
@@ -2430,7 +2418,6 @@ static int emulator_read_emulated(unsigned long addr,
unsigned int bytes,
struct kvm_vcpu *vcpu)
{
- struct kvm_io_device *mmio_dev;
gpa_t gpa;
if (vcpu->mmio_read_completed) {
@@ -2455,13 +2442,8 @@ mmio:
/*
* Is this MMIO handled locally?
*/
- mutex_lock(&vcpu->kvm->lock);
- mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
- mutex_unlock(&vcpu->kvm->lock);
- if (mmio_dev) {
- kvm_iodevice_read(mmio_dev, gpa, bytes, val);
+ if (!vcpu_mmio_read(vcpu, gpa, bytes, val))
return X86EMUL_CONTINUE;
- }
vcpu->mmio_needed = 1;
vcpu->mmio_phys_addr = gpa;
@@ -2488,7 +2470,6 @@ static int emulator_write_emulated_onepage(unsigned long addr,
unsigned int bytes,
struct kvm_vcpu *vcpu)
{
- struct kvm_io_device *mmio_dev;
gpa_t gpa;
gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
@@ -2509,13 +2490,8 @@ mmio:
/*
* Is this MMIO handled locally?
*/
- mutex_lock(&vcpu->kvm->lock);
- mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 1);
- mutex_unlock(&vcpu->kvm->lock);
- if (mmio_dev) {
- kvm_iodevice_write(mmio_dev, gpa, bytes, val);
+ if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
return X86EMUL_CONTINUE;
- }
vcpu->mmio_needed = 1;
vcpu->mmio_phys_addr = gpa;
@@ -2850,48 +2826,40 @@ int complete_pio(struct kvm_vcpu *vcpu)
return 0;
}
-static void kernel_pio(struct kvm_io_device *pio_dev,
- struct kvm_vcpu *vcpu,
- void *pd)
+static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
{
/* TODO: String I/O for in kernel device */
+ int r;
if (vcpu->arch.pio.in)
- kvm_iodevice_read(pio_dev, vcpu->arch.pio.port,
- vcpu->arch.pio.size,
- pd);
+ r = kvm_io_bus_read(&vcpu->kvm->pio_bus, vcpu->arch.pio.port,
+ vcpu->arch.pio.size, pd);
else
- kvm_iodevice_write(pio_dev, vcpu->arch.pio.port,
- vcpu->arch.pio.size,
- pd);
+ r = kvm_io_bus_write(&vcpu->kvm->pio_bus, vcpu->arch.pio.port,
+ vcpu->arch.pio.size, pd);
+ return r;
}
-static void pio_string_write(struct kvm_io_device *pio_dev,
- struct kvm_vcpu *vcpu)
+static int pio_string_write(struct kvm_vcpu *vcpu)
{
struct kvm_pio_request *io = &vcpu->arch.pio;
void *pd = vcpu->arch.pio_data;
- int i;
+ int i, r = 0;
for (i = 0; i < io->cur_count; i++) {
- kvm_iodevice_write(pio_dev, io->port,
- io->size,
- pd);
+ if (kvm_io_bus_write(&vcpu->kvm->pio_bus,
+ io->port, io->size, pd)) {
+ r = -EOPNOTSUPP;
+ break;
+ }
pd += io->size;
}
-}
-
-static struct kvm_io_device *vcpu_find_pio_dev(struct kvm_vcpu *vcpu,
- gpa_t addr, int len,
- int is_write)
-{
- return kvm_io_bus_find_dev(&vcpu->kvm->pio_bus, addr, len, is_write);
+ return r;
}
int kvm_emulate_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
int size, unsigned port)
{
- struct kvm_io_device *pio_dev;
unsigned long val;
vcpu->run->exit_reason = KVM_EXIT_IO;
@@ -2911,11 +2879,7 @@ int kvm_emulate_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
val = kvm_register_read(vcpu, VCPU_REGS_RAX);
memcpy(vcpu->arch.pio_data, &val, 4);
- mutex_lock(&vcpu->kvm->lock);
- pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
- mutex_unlock(&vcpu->kvm->lock);
- if (pio_dev) {
- kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
+ if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
complete_pio(vcpu);
return 1;
}
@@ -2929,7 +2893,6 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
{
unsigned now, in_page;
int ret = 0;
- struct kvm_io_device *pio_dev;
vcpu->run->exit_reason = KVM_EXIT_IO;
vcpu->run->io.direction = in ? KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
@@ -2973,12 +2936,6 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
vcpu->arch.pio.guest_gva = address;
- mutex_lock(&vcpu->kvm->lock);
- pio_dev = vcpu_find_pio_dev(vcpu, port,
- vcpu->arch.pio.cur_count,
- !vcpu->arch.pio.in);
- mutex_unlock(&vcpu->kvm->lock);
-
if (!vcpu->arch.pio.in) {
/* string PIO write */
ret = pio_copy_data(vcpu);
@@ -2986,16 +2943,13 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
kvm_inject_gp(vcpu, 0);
return 1;
}
- if (ret == 0 && pio_dev) {
- pio_string_write(pio_dev, vcpu);
+ if (ret == 0 && !pio_string_write(vcpu)) {
complete_pio(vcpu);
if (vcpu->arch.pio.count == 0)
ret = 1;
}
- } else if (pio_dev)
- pr_unimpl(vcpu, "no string pio read support yet, "
- "port %x size %d count %ld\n",
- port, size, count);
+ }
+ /* no string PIO read support yet */
return ret;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 96c8c0b..077e8bb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -60,8 +60,10 @@ struct kvm_io_bus {
void kvm_io_bus_init(struct kvm_io_bus *bus);
void kvm_io_bus_destroy(struct kvm_io_bus *bus);
-struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
- gpa_t addr, int len, int is_write);
+int kvm_io_bus_write(struct kvm_io_bus *bus, gpa_t addr, int len,
+ const void *val);
+int kvm_io_bus_read(struct kvm_io_bus *bus, gpa_t addr, int len,
+ void *val);
void __kvm_io_bus_register_dev(struct kvm_io_bus *bus,
struct kvm_io_device *dev);
void kvm_io_bus_register_dev(struct kvm *kvm, struct kvm_io_bus *bus,
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 7b7cc9f..0352f81 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -19,18 +19,14 @@ static inline struct kvm_coalesced_mmio_dev *to_mmio(struct kvm_io_device *dev)
return container_of(dev, struct kvm_coalesced_mmio_dev, dev);
}
-static int coalesced_mmio_in_range(struct kvm_io_device *this,
- gpa_t addr, int len, int is_write)
+static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
+ gpa_t addr, int len)
{
- struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
struct kvm_coalesced_mmio_zone *zone;
struct kvm_coalesced_mmio_ring *ring;
unsigned avail;
int i;
- if (!is_write)
- return 0;
-
/* Are we able to batch it ? */
/* last is the first free entry
@@ -60,11 +56,13 @@ static int coalesced_mmio_in_range(struct kvm_io_device *this,
return 0;
}
-static void coalesced_mmio_write(struct kvm_io_device *this,
- gpa_t addr, int len, const void *val)
+static int coalesced_mmio_write(struct kvm_io_device *this,
+ gpa_t addr, int len, const void *val)
{
struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
+ if (!coalesced_mmio_in_range(dev, addr, len))
+ return -EOPNOTSUPP;
spin_lock(&dev->lock);
@@ -76,6 +74,7 @@ static void coalesced_mmio_write(struct kvm_io_device *this,
smp_wmb();
ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
spin_unlock(&dev->lock);
+ return 0;
}
static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -87,7 +86,6 @@ static void coalesced_mmio_destructor(struct kvm_io_device *this)
static const struct kvm_io_device_ops coalesced_mmio_ops = {
.write = coalesced_mmio_write,
- .in_range = coalesced_mmio_in_range,
.destructor = coalesced_mmio_destructor,
};
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 0eca54e..ddf6aa9 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -227,20 +227,19 @@ static inline struct kvm_ioapic *to_ioapic(struct kvm_io_device *dev)
return container_of(dev, struct kvm_ioapic, dev);
}
-static int ioapic_in_range(struct kvm_io_device *this, gpa_t addr,
- int len, int is_write)
+static inline int ioapic_in_range(struct kvm_ioapic *ioapic, gpa_t addr)
{
- struct kvm_ioapic *ioapic = to_ioapic(this);
-
return ((addr >= ioapic->base_address &&
(addr < ioapic->base_address + IOAPIC_MEM_LENGTH)));
}
-static void ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
- void *val)
+static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
+ void *val)
{
struct kvm_ioapic *ioapic = to_ioapic(this);
u32 result;
+ if (!ioapic_in_range(ioapic, addr))
+ return -EOPNOTSUPP;
ioapic_debug("addr %lx\n", (unsigned long)addr);
ASSERT(!(addr & 0xf)); /* check alignment */
@@ -273,13 +272,16 @@ static void ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
printk(KERN_WARNING "ioapic: wrong length %d\n", len);
}
mutex_unlock(&ioapic->kvm->irq_lock);
+ return 0;
}
-static void ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
- const void *val)
+static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
+ const void *val)
{
struct kvm_ioapic *ioapic = to_ioapic(this);
u32 data;
+ if (!ioapic_in_range(ioapic, addr))
+ return -EOPNOTSUPP;
ioapic_debug("ioapic_mmio_write addr=%p len=%d val=%p\n",
(void*)addr, len, val);
@@ -290,7 +292,7 @@ static void ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
data = *(u32 *) val;
else {
printk(KERN_WARNING "ioapic: Unsupported size %d\n", len);
- return;
+ return 0;
}
addr &= 0xff;
@@ -312,6 +314,7 @@ static void ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
break;
}
mutex_unlock(&ioapic->kvm->irq_lock);
+ return 0;
}
void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
@@ -329,7 +332,6 @@ void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
static const struct kvm_io_device_ops ioapic_mmio_ops = {
.read = ioapic_mmio_read,
.write = ioapic_mmio_write,
- .in_range = ioapic_in_range,
};
int kvm_ioapic_init(struct kvm *kvm)
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 06e38b2..12fd3ca 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,23 +17,24 @@
#define __KVM_IODEV_H__
#include <linux/kvm_types.h>
+#include <asm/errno.h>
struct kvm_io_device;
/**
* kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
**/
struct kvm_io_device_ops {
- void (*read)(struct kvm_io_device *this,
+ int (*read)(struct kvm_io_device *this,
+ gpa_t addr,
+ int len,
+ void *val);
+ int (*write)(struct kvm_io_device *this,
gpa_t addr,
int len,
- void *val);
- void (*write)(struct kvm_io_device *this,
- gpa_t addr,
- int len,
- const void *val);
- int (*in_range)(struct kvm_io_device *this, gpa_t addr, int len,
- int is_write);
+ const void *val);
void (*destructor)(struct kvm_io_device *this);
};
@@ -48,26 +49,16 @@ static inline void kvm_iodevice_init(struct kvm_io_device *dev,
dev->ops = ops;
}
-static inline void kvm_iodevice_read(struct kvm_io_device *dev,
- gpa_t addr,
- int len,
- void *val)
+static inline int kvm_iodevice_read(struct kvm_io_device *dev,
+ gpa_t addr, int l, void *v)
{
- dev->ops->read(dev, addr, len, val);
+ return dev->ops->read ? dev->ops->read(dev, addr, l, v) : -EOPNOTSUPP;
}
-static inline void kvm_iodevice_write(struct kvm_io_device *dev,
- gpa_t addr,
- int len,
- const void *val)
+static inline int kvm_iodevice_write(struct kvm_io_device *dev,
+ gpa_t addr, int l, const void *v)
{
- dev->ops->write(dev, addr, len, val);
-}
-
-static inline int kvm_iodevice_in_range(struct kvm_io_device *dev,
- gpa_t addr, int len, int is_write)
-{
- return dev->ops->in_range(dev, addr, len, is_write);
+ return dev->ops->write ? dev->ops->write(dev, addr, l, v) : -EOPNOTSUPP;
}
static inline void kvm_iodevice_destructor(struct kvm_io_device *dev)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0edc366..5946065 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2512,19 +2512,25 @@ void kvm_io_bus_destroy(struct kvm_io_bus *bus)
}
}
-struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
- gpa_t addr, int len, int is_write)
+/* kvm_io_bus_write - called under kvm->slots_lock */
+int kvm_io_bus_write(struct kvm_io_bus *bus, gpa_t addr,
+ int len, const void *val)
{
int i;
+ for (i = 0; i < bus->dev_count; i++)
+ if (!kvm_iodevice_write(bus->devs[i], addr, len, val))
+ return 0;
+ return -EOPNOTSUPP;
+}
- for (i = 0; i < bus->dev_count; i++) {
- struct kvm_io_device *pos = bus->devs[i];
-
- if (kvm_iodevice_in_range(pos, addr, len, is_write))
- return pos;
- }
-
- return NULL;
+/* kvm_io_bus_read - called under kvm->slots_lock */
+int kvm_io_bus_read(struct kvm_io_bus *bus, gpa_t addr, int len, void *val)
+{
+ int i;
+ for (i = 0; i < bus->dev_count; i++)
+ if (!kvm_iodevice_read(bus->devs[i], addr, len, val))
+ return 0;
+ return -EOPNOTSUPP;
}
void kvm_io_bus_register_dev(struct kvm *kvm, struct kvm_io_bus *bus,
--
1.6.3.3
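
For readers following this conversion, here is a minimal sketch of a device written against the new contract (illustrative only, not part of the series; the foo_* names, FOO_BASE and FOO_LEN are made up). With in_range gone, the handler filters by address itself and returns -EOPNOTSUPP for accesses it does not claim, so kvm_io_bus_write()/kvm_io_bus_read() simply move on to the next device on the bus.

        /* Hypothetical example device, not part of this series. */
        #define FOO_BASE 0xfee10000ULL
        #define FOO_LEN  0x10

        struct foo_dev {
                struct kvm_io_device dev;
                u8 regs[FOO_LEN];
        };

        static inline struct foo_dev *to_foo(struct kvm_io_device *dev)
        {
                return container_of(dev, struct foo_dev, dev);
        }

        static int foo_in_range(gpa_t addr, int len)
        {
                return addr >= FOO_BASE && addr + len <= FOO_BASE + FOO_LEN;
        }

        static int foo_write(struct kvm_io_device *this, gpa_t addr, int len,
                             const void *val)
        {
                struct foo_dev *foo = to_foo(this);

                if (!foo_in_range(addr, len))
                        return -EOPNOTSUPP;     /* let the bus try the next device */
                memcpy(foo->regs + (addr - FOO_BASE), val, len);
                return 0;                       /* handled */
        }

        static int foo_read(struct kvm_io_device *this, gpa_t addr, int len,
                            void *val)
        {
                struct foo_dev *foo = to_foo(this);

                if (!foo_in_range(addr, len))
                        return -EOPNOTSUPP;
                memcpy(val, foo->regs + (addr - FOO_BASE), len);
                return 0;
        }

        static const struct kvm_io_device_ops foo_ops = {
                .read  = foo_read,
                .write = foo_write,
        };

        /* Registration (under slots_lock), mirroring the in-tree devices: */
        /*   kvm_iodevice_init(&foo->dev, &foo_ops);                       */
        /*   kvm_io_bus_register_dev(kvm, &kvm->mmio_bus, &foo->dev);      */
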
* [PATCH 46/47] KVM: document lock nesting rule
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (44 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 45/47] KVM: remove in_range from io devices Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
2009-08-19 13:02 ` [PATCH 47/47] KVM: fix lock imbalance Avi Kivity
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Michael S. Tsirkin <mst@redhat.com>
Document kvm->lock nesting within kvm->slots_lock
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
virt/kvm/kvm_main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5946065..fc1b58a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -68,7 +68,7 @@ MODULE_LICENSE("GPL");
/*
* Ordering of locks:
*
- * kvm->lock --> kvm->irq_lock
+ * kvm->slots_lock --> kvm->lock --> kvm->irq_lock
*/
DEFINE_SPINLOCK(kvm_lock);
--
1.6.3.3
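
As a usage sketch of the documented order (illustrative only, not taken from this series): slots_lock is a rw_semaphore (see the "writers lock on slots_lock" comment in i8254.c earlier in this series), while kvm->lock and kvm->irq_lock are mutexes, so nested acquisition and release would look like this, taking locks outermost-first and dropping them in reverse.

        /* Illustrative only: follow slots_lock --> kvm->lock --> irq_lock. */
        down_read(&kvm->slots_lock);            /* outermost */
        mutex_lock(&kvm->lock);
        mutex_lock(&kvm->irq_lock);             /* innermost */

        /* ... work on state guarded by these locks ... */

        mutex_unlock(&kvm->irq_lock);
        mutex_unlock(&kvm->lock);
        up_read(&kvm->slots_lock);
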
* [PATCH 47/47] KVM: fix lock imbalance
2009-08-19 13:01 [PATCH 00/47] KVM updates for 2.6.32 merge window (2/4) Avi Kivity
` (45 preceding siblings ...)
2009-08-19 13:02 ` [PATCH 46/47] KVM: document lock nesting rule Avi Kivity
@ 2009-08-19 13:02 ` Avi Kivity
46 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-08-19 13:02 UTC (permalink / raw)
To: kvm; +Cc: linux-kernel
From: Jiri Slaby <jirislaby@gmail.com>
ioapic_mmio_write() is missing an unlock on one failure path; fix that.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
virt/kvm/ioapic.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ddf6aa9..8a9c6cc 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -292,7 +292,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
data = *(u32 *) val;
else {
printk(KERN_WARNING "ioapic: Unsupported size %d\n", len);
- return 0;
+ goto unlock;
}
addr &= 0xff;
@@ -313,6 +313,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
default:
break;
}
+unlock:
mutex_unlock(&ioapic->kvm->irq_lock);
return 0;
}
--
1.6.3.3